Questions tagged [ocr]

Optical Character Recognition (OCR): the process of converting images of printed or handwritten text into digitally encoded text on a computer, so that it can, for example, be reproduced, machine-translated, reformatted, edited, distributed, or used as input to software such as text-to-speech.

67 questions
6
votes
1 answer

How can I specify the language to be used by Tesseract when using OCRFeeder?

I'm using the OCR utility OCRFeeder, which uses the Tesseract engine. I have installed the language packs needed for Tesseract. How can I set the language so that Tesseract uses the right language file for converting the…
4
votes
1 answer

How do I prevent hocr2pdf from using a large font from a Tesseract-generated .hocr file?

Tesseract now creates an .hocr file rather than an .html file for OCR output, but that is not exactly the issue here. Since the upgrade, when hocr2pdf uses this output it renders a large text size with small bounding boxes. Most of the text…
user299889
3
votes
1 answer

pdfsandwich - how to not change page colour

I am using pdfsandwich, but it converts the pages from colour to black and white. My document has many coloured pictures, so how can I avoid this?
brasileiro
3
votes
2 answers

Optical character recognition for LibreOffice

I have a paper document. It has several pages containing a table with 3 columns (current number, name, and grade). I scanned it and got 16 JPEG files, one per scanned page. Now I need an OCR tool to convert each JPEG into text, in order to…
Mihaita
2
votes
2 answers

How to wildcard tesseract?

I want tesseract to convert all the files of a folder. I do not want to merge the files in any way as I am having trouble with programs like hocr2pdf and pdfbeads merging more than one file at a time. I run tesseract *.tif * hocr and end up with…
1
vote
1 answer

Tesseract and OCRopus

What is the relationship between Tesseract and OCRopus? Is OCRopus a wrapper around Tesseract, or are they now developed independently? What are the advantages of one over the other? Thanks and regards!
Tim
1
vote
1 answer

Tesseract OCR engine on Ubuntu: how-to

I've installed tesseract-ocr. I was looking at the manual, but I can't see an option to define image bounds (X, Y, W, H). Can someone help with this, or am I asking in the wrong place?
Ahmed Al-attar
1
vote
2 answers

"sh: 1: cannot open /tmp/pdfsandwich4e375e.html: No such file" when using pdfsandwich

I tried to add a text layer to some PDF files in order to make them searchable. This technique is explained in the German Ubuntu wiki: http://wiki.ubuntuusers.de/pdfsandwich . After installing the dependencies with sudo apt-get install imagemagick exactimage…
1
vote
3 answers

gImageReader OCR

I have recently installed gImageReader OCR. It is not obvious how to use it, and I have not yet worked out how to get an editable text file. My aim is to get a LibreOffice file to edit and save. Thanks in advance. The original text is standard English…
TonyB
0
votes
2 answers

Convert handwritten data log to Excel

I have to enter loads of handwritten data into Excel, and I was wondering if there is an easier way of doing it than typing all the data in manually. Any suggestions?
0
votes
2 answers

OCR for TAN list (online banking)

I have a TAN list on paper for online banking that looks like this: 001 123456 015 123456 029 123456 043 123456 ... 002 123456 ... ... I scanned it and now I want to use OCR to get the text. I tried tesseract, gocr and cuneiform. All programs…
guettli
-1
votes
1 answer

OCR can't recognize a specific image

I am trying to get these images, (8,0), recognized by OCR. I am using Tesseract, but I don't mind using another OCR engine if it can do it.
MRTgang