
I have a paper document. It has several pages containing a table with 3 columns (running number, name, and grade).

I scanned it and got 16 JPEG files. Each JPEG is one scanned page.

Now I need an OCR tool to convert each JPEG into text, so that I can insert the table into an Excel document.

I use LibreOffice and Ubuntu 12.04.

Mihaita

2 Answers


The Scanning and OCR page on Ubuntu Apps lists several alternatives. Of these, I suggest XSane Image Scanning Program or Simple Scan (usually pre-installed in 12.04 and maybe in earlier versions too) and/or gscan2pdf to get your documents scanned.

My favorite is gscan2pdf, which lets you run the whole scan and OCR process in the same GUI without any trouble.

enter image description here

Please note that in this example I am running OCR on a screenshot.

Simply scan or import the documents/images, open the Tools menu, and choose the OCR option. You'll be asked for an OCR engine; choose the one that gives the best results for you and click "Start OCR".

enter image description here

You'll find the result in the "OCR Output" tab, as shown in the next screenshot.

enter image description here
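If you would rather process all 16 pages from a script than click through the GUI, something like the following may work. It is only a rough sketch: it assumes the Tesseract command-line tool is installed (on Ubuntu it is usually packaged as `tesseract-ocr`), that a reasonably recent Python 3 is available, and that the scans sit in one folder as `.jpg` files. The function names and the `scans` folder are made up for illustration.

```python
import subprocess
from pathlib import Path


def tesseract_command(image, lang="eng"):
    """Build the command line: tesseract <image> <output base> -l <lang>.

    Tesseract appends ".txt" to the output base on its own.
    """
    image = Path(image)
    output_base = image.with_suffix("")  # scans/page01.jpg -> scans/page01
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_folder(folder, lang="eng"):
    """Run Tesseract on every .jpg in `folder`; return the .txt paths."""
    text_files = []
    for image in sorted(Path(folder).glob("*.jpg")):
        # check=True raises an error if Tesseract fails on a page
        subprocess.run(tesseract_command(image, lang), check=True)
        text_files.append(image.with_suffix(".txt"))
    return text_files


if __name__ == "__main__":
    # "scans" is a hypothetical folder name; point it at your own JPEGs.
    for text_file in ocr_folder("scans"):
        print("wrote", text_file)
```

Each page ends up as a plain text file next to its image. If the document is not in English, pass a different language code (for example `lang="ron"`), provided the matching Tesseract language data is installed.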

Please note that even with good-quality images, the OCR may fail to interpret certain characters, which can result in misspelled words or what looks like Egyptian hieroglyphics. Running OCR on a large set of documents may also take a while.

Here is a link to a comprehensive video that explains how to scan and OCR in gscan2pdf: http://www.youtube.com/watch?v=UjjogfWfWsQ
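Because of such misreads, it helps to check the OCR text before it goes into a spreadsheet. Below is a minimal sketch of one way to do that, assuming each table row comes out as one line of the form "number name grade" and that the grade is numeric (both are assumptions about the scanned table, not something the OCR guarantees). Lines that do not fit are set aside for manual correction, and the rest are written to a CSV file that LibreOffice Calc can open. All names here are hypothetical.

```python
import csv
import re

# number, optional "." or ")", name, numeric grade (e.g. 9 or 9.50 or 9,50)
ROW_PATTERN = re.compile(r"^\s*(\d+)[.)]?\s+(.+?)\s+(\d+(?:[.,]\d+)?)\s*$")


def parse_row(line):
    """Return (number, name, grade) as strings, or None if the line does not fit."""
    match = ROW_PATTERN.match(line)
    return match.groups() if match else None


def split_rows(lines):
    """Separate clean table rows from lines that need a manual look."""
    rows, rejected = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between rows
        row = parse_row(line)
        if row is None:
            rejected.append(line.strip())
        else:
            rows.append(row)
    return rows, rejected


def write_csv(rows, path):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["number", "name", "grade"])
        writer.writerows(rows)


if __name__ == "__main__":
    # "page01.txt" and "table.csv" are placeholder file names.
    with open("page01.txt", encoding="utf-8") as handle:
        rows, rejected = split_rows(handle)
    write_csv(rows, "table.csv")
    print(len(rows), "rows written;", len(rejected), "lines to fix by hand")
```

A line such as `l2 John Smith g` (with "1" read as "l" and "9" read as "g") would land in the rejected list instead of silently corrupting the table.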

Good luck!


I'm a bit late in answering this question, but for others who come to this page searching for an OCR solution for LibreOffice: I recently developed LibreOCR, an OCR plugin for LibreOffice.

It is part of the Indic-OCR project.

The extension can now be found on the LibreOffice Extensions website.