I'm using the OCR utility of OCRFeeder, which uses the Tesseract engine. I have installed the language packs I need for Tesseract. How can I set the language so that Tesseract uses the right language file when converting the scanned document into text?
You need to set up the engine command line in the OCRFeeder settings. It should look like:
-l lang_id $IMAGE $FILE; cat $FILE.txt
where lang_id is the language id shown in the name of the corresponding language package.
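As a small illustration of reading the id off the package name, here is a hedged shell sketch. It assumes the Debian/Ubuntu convention that language packs are named tesseract-ocr-<id>, and that a hyphen inside the suffix stands for an underscore in the id (for example tesseract-ocr-chi-sim for chi_sim); the helper name lang_id_from_pkg is hypothetical.

```shell
#!/bin/sh
# Derive a Tesseract language id from a Debian/Ubuntu package name.
# Assumption: packs are named tesseract-ocr-<id>, and a hyphen inside the
# package suffix stands for an underscore in the id (chi-sim -> chi_sim).
lang_id_from_pkg() {
  suffix=${1#tesseract-ocr-}                 # strip the common prefix
  printf '%s\n' "$suffix" | sed 's/-/_/g'    # hyphens -> underscores
}

lang_id_from_pkg tesseract-ocr-fra       # prints: fra
lang_id_from_pkg tesseract-ocr-chi-sim   # prints: chi_sim
```

On a Debian-based system, `dpkg -l 'tesseract-ocr-*'` lists the matching packages, whose names can then be passed to the function.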

João Pinto
Thank you, João. However, the language id must be given as the last argument, e.g. $IMAGE $FILE -l lang_id; cat $FILE.txt – Bernard Decock Feb 13 '11 at 08:55
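A minimal sketch of what the engine command line does once OCRFeeder has filled in its placeholders, with the language option placed after the image and output base as the comment suggests. It assumes $IMAGE is the input image path and $FILE an output base name (how OCRFeeder substitutes them is not stated in this thread). Tesseract writes its result to <outputbase>.txt, which cat then prints to standard output. The function name ocr_image is hypothetical.

```shell
#!/bin/sh
# Reproduce the engine command line by hand.
# Assumed: $1 = input image, $2 = output base name, $3 = language id.
# Tesseract writes "<output base>.txt"; cat prints it to stdout.
ocr_image() {
  image=$1 base=$2 lang=$3
  # -l comes after the image and output base, as in the comment above.
  tesseract "$image" "$base" -l "$lang" && cat "$base.txt"
}
```

For example, `ocr_image scan.png /tmp/page fra` would run Tesseract with the French data and print the recognised text.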
I have now added several OCR engines to OCRFeeder. Go to Tools > OCR Engines and add a new OCR engine: I keep using the Tesseract engine, but I give each entry its own name and a specific language id. So I now have a separate OCR engine for each language that can be selected in OCRFeeder (thanks to João Pinto for the hint). – Bernard Decock Feb 13 '11 at 09:01
The language tags can be found in Synaptic in the names of the Tesseract packages (spa = Spanish, fra = French, deu = German, nld = Dutch, ita = Italian, por = Portuguese). For example, to scan a French text, my Tesseract-French engine uses the following command line: $IMAGE $FILE -l fra; cat $FILE.txt – Bernard Decock Feb 13 '11 at 09:07
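To set up one engine per language as described in these comments, the argument strings can be generated rather than typed by hand. This is only a sketch: the helper engine_args is hypothetical, the Tesseract-<Language> names just follow the Tesseract-French naming above, and each printed line still has to be entered manually under Tools > OCR Engines.

```shell
#!/bin/sh
# Print the arguments for one OCRFeeder engine entry per language.
# Single quotes keep $IMAGE and $FILE literal: OCRFeeder, not the shell,
# is assumed to replace them when it runs the engine.
engine_args() {
  printf '$IMAGE $FILE -l %s; cat $FILE.txt\n' "$1"
}

# Language name:code pairs taken from the comment above.
for pair in Spanish:spa French:fra German:deu Dutch:nld Italian:ita Portuguese:por; do
  name=${pair%%:*}   # text before the colon
  code=${pair#*:}    # text after the colon
  printf 'Tesseract-%s\t' "$name"
  engine_args "$code"
done
```

Each output line pairs an engine name with its argument string, e.g. the French line ends in `$IMAGE $FILE -l fra; cat $FILE.txt`.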