How can I specify the language to be used by Tesseract when using OCRFeeder?

Question

I'm using the OCR utility OCRFeeder, which uses the Tesseract engine. I have installed the language packs needed for Tesseract. How can I set the language so that Tesseract uses the right language file when converting the scanned document into text?

score 4 · Answer 1 · answered Feb 11 '11 at 22:34


You need to set up the engine command line in the OCRFeeder settings. It should look like:

-l lang_id $IMAGE $FILE; cat $FILE.txt

where lang_id is the ID shown in the corresponding language package name.


João Pinto


Thank you, João. However, the language ID must be specified as the last argument, e.g. $IMAGE $FILE -l lang-id; cat $FILE.txt – Bernard Decock Feb 13 '11 at 08:55
I have now added several OCR engines to OCRFeeder: go to Tools, OCR Engines and add a new OCR engine. I keep using the Tesseract engine, but I gave each entry its own name and a specific language ID. So for each language I now have a separate OCR engine that can be selected in OCRFeeder. (Thanks to João Pinto for the hint.) – Bernard Decock Feb 13 '11 at 09:01

The language tags can be found in Synaptic, in the names of the Tesseract packages (spa = Spanish, fra = French, deu = German, nld = Dutch, ita = Italian, por = Portuguese). E.g., for scanning a French text, my Tesseract-French engine has the following command line: $IMAGE $FILE -l fra; cat $FILE.txt – Bernard Decock Feb 13 '11 at 09:07
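
To avoid typing one entry per language by hand, the per-language command lines described in the comments could be generated with a small shell sketch like the one below. It is only an illustration: the function name engine_args and the entry names of the form Tesseract-<code> are made up here, and it assumes that OCRFeeder itself substitutes $IMAGE and $FILE, so both are printed literally rather than expanded by the shell.

```shell
#!/bin/sh
# Hypothetical helper: print the OCRFeeder engine command line for one
# Tesseract language code, with -l placed last as the comments suggest.
# $IMAGE and $FILE are kept literal (single quotes) because they are
# assumed to be placeholders filled in by OCRFeeder, not shell variables.
engine_args() {
    printf '%s -l %s; cat %s.txt\n' '$IMAGE $FILE' "$1" '$FILE'
}

# One line per language pack mentioned above; the entry names are illustrative.
for lang in spa fra deu nld ita por; do
    printf 'Tesseract-%s: ' "$lang"
    engine_args "$lang"
done
```

Each printed line can then be pasted into a separate engine entry under Tools, OCR Engines.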
