I want to convert a DjVu file to PDF while preserving the OCR text layer. This page describes how to do so, but I am getting a blank HTML file.
In /home/steven/Documents/djvu2pdf/1/, running
djvu2hocr -p 1 Intro.djvu
gives me:
Converting 'Intro.djvu':
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="ocr-system" content="djvu2hocr 0.7.9" />
<meta name="ocr-capabilities" content="ocr_carea ocr_page ocr_par ocrx_block ocrx_line ocrx_word" />
<title>DjVu hidden text layer</title>
</head>
<body>
*** [1-11711] Failed to open 'Intro.djvu': No such file or directory.
*** (ByteStream.cpp:693)
*** 'DJVU::GUTF8String DJVU::ByteStream::Stdio::init(const DJVU::GURL&, const char*)'
</body>
</html>
Traceback (most recent call last):
File "/usr/bin/djvu2hocr", line 7, in <module>
_.main(sys.argv)
File "/usr/share/ocrodjvu/lib/cli/djvu2hocr.py", line 325, in main
djvused.wait()
File "/usr/share/ocrodjvu/lib/ipc.py", line 114, in wait
raise CalledProcessError(return_code, self.__command)
subprocess.CalledProcessError: Command 'djvused' returned non-zero exit status 10
This leads to a blank HTML file, so when I run
sed 's/ocrx/ocr/g' > pg1.html
it just runs indefinitely and never finishes.
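As far as I understand, sed reads standard input when it is given no input file, so run by itself it may simply be waiting for input rather than looping. My guess (an assumption, not something I have confirmed) is that the sed step is meant to receive the djvu2hocr output through a pipe. A minimal sketch, using a made-up sample line in place of real djvu2hocr output:

```shell
# sed with no file operand reads standard input; fed through a pipe it
# processes the input and exits instead of waiting on the terminal.
rewrite_classes() {
    sed 's/ocrx/ocr/g'
}

# Hypothetical sample line standing in for djvu2hocr output.
sample='<span class="ocrx_word">Intro</span>'
result=$(printf '%s\n' "$sample" | rewrite_classes)
printf '%s\n' "$result"

# Assumed form of the real pipeline, once Intro.djvu can be opened:
# djvu2hocr -p 1 Intro.djvu | sed 's/ocrx/ocr/g' > pg1.html
```

This only shows the sed step finishing when it has piped input; it does not address the "No such file or directory" error from djvu2hocr.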
I also have a second program called djvu2pdf, which I found at http://0x2a.at/s/projects/djvu2pdf, but
djvu2pdf Intro.djvu
gives me
-e Error: /usr/bin/djvu2pdf: File 'Intro.djvu' not found
The OCR'd DjVu file itself opens fine.
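Since both tools report the file as missing, one check I can think of is whether the name Intro.djvu actually resolves from the directory the commands run in. The helper below is a hypothetical sketch (check_file is my own name, and the scratch directory is only for demonstration):

```shell
# Report whether a file name resolves from the current directory.
check_file() {
    if [ -f "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Demonstration in a scratch directory (hypothetical file names).
tmpdir=$(mktemp -d)
touch "$tmpdir/Intro.djvu"
found_msg=$(cd "$tmpdir" && check_file Intro.djvu)
missing_msg=$(cd "$tmpdir" && check_file Other.djvu)
printf '%s\n%s\n' "$found_msg" "$missing_msg"
rm -rf "$tmpdir"
```

In my case the equivalent would be running check_file Intro.djvu from /home/steven/Documents/djvu2pdf/1/.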