2

I want tesseract to convert all the files in a folder. I do not want to merge the files in any way, as I am having trouble with programs like hocr2pdf and pdfbeads merging more than one file at a time.

I run tesseract *.tif * hocr and end up with the following:

read_params_file: parameter not found: II*

  • If you have multiple tif files in a directory (let's say example1.tif, example2.tif, and example3.tif), then your command gets expanded by the shell to tesseract example1.tif example2.tif example3.tif hocr. This is done before tesseract gets to see any parameters. – Hennes Mar 30 '13 at 12:30
  • Here is another example that works on multiple scanned files (A&S) from https://personal.math.ubc.ca/~cbm/aands/ (link: tar.gz archive). Bash: for i in *.jpg ; do tesseract -l eng "$i" "$i" pdf; done; The result is searchable, except that particular instances of ".1.1" don't get found, whereas ".1.2" do. For instance, 17.1.1 fails but 17.1.2 succeeds, as does 18.1.1. Suggestions? – rrogers Jul 10 '22 at 19:29

2 Answers

6

I tried this and it works

for i in *.tif ; do tesseract "$i" outtext; done;

Make sure that, in the terminal, you first change directory to the location of the files.
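Since the question asks for one unmerged result per image, here is a minimal sketch of the same loop that gives each file its own output name and requests hOCR. It assumes the usual argument order of tesseract (image, output base, then config names such as hocr), which would also explain the error in the question: any argument after the output base is read as a config file, and "II*" is how a little-endian TIFF file starts. The function name ocr_each_tif is made up for illustration, and the exact extension of the output file depends on the tesseract version.

```shell
# Sketch only: OCR every .tif in the current directory into its own hOCR file.
# ocr_each_tif is a hypothetical helper name, not part of tesseract.
ocr_each_tif() {
    for i in *.tif; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$i" ] || continue
        # "${i%.tif}" strips the extension, so example1.tif gives example1
        # as the output base. Quoting keeps names with spaces intact.
        tesseract "$i" "${i%.tif}" hocr
    done
}

# Usage: cd /path/to/scans && ocr_each_tif
```

Each image is passed to tesseract on its own, so nothing is merged and no image name ends up in the config-file position.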

Meer Borg
  • 4,963
1

I've modified Meer Borg's answer slightly. Using that code, my output file only had input from the last file in the folder.

Using tesseract's stdout option with >> is a way to get all of the output appended to a single file:

for i in *.tif ; do tesseract "$i" stdout >> outtext; done;
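A possible refinement, sketched under the assumption that you want to see which text came from which image: empty the output file first, so that running the loop twice does not append everything a second time, and write a header line before each file's text. The function name ocr_all_to_one and the header format are made up for illustration.

```shell
# Sketch only: collect the OCR text of every .tif into one file.
# ocr_all_to_one is a hypothetical helper name, not part of tesseract.
ocr_all_to_one() {
    out=${1:-outtext}
    # Start from an empty file so reruns do not pile up.
    : > "$out"
    for i in *.tif; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$i" ] || continue
        # Header line marking which image the following text came from.
        printf '===== %s =====\n' "$i" >> "$out"
        # "stdout" as the output base makes tesseract print the text
        # instead of writing a file, so it can be appended with >>.
        tesseract "$i" stdout >> "$out"
    done
}

# Usage: cd /path/to/scans && ocr_all_to_one outtext
```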
MylSh
  • 11