
I want to convert a .pdf file to an .odt file so that I can then convert it to a .doc file. Is there any software or script that can do this? I tried copying the content of the .pdf file and pasting it into LibreOffice Writer, but the formatting isn't preserved.

The document is confidential, so I'd prefer not to use any online service for the conversion.

Any help is highly appreciated.

Ankit

5 Answers


You could take a look at PDF Utilities (the poppler-utils package, installable via Synaptic or apt-get), which includes pdftotext:

Poppler is a PDF rendering library based on Xpdf PDF viewer.

This package contains command line utilities (based on Poppler) for getting information of PDF documents, convert them to other formats, or manipulate them:
* pdfdetach -- lists or extracts embedded files (attachments)
* pdffonts -- font analyzer
* pdfimages -- image extractor
* pdfinfo -- document information
* pdfseparate -- page extraction tool
* pdftocairo -- PDF to PNG/JPEG/PDF/PS/EPS/SVG converter using Cairo
* pdftohtml -- PDF to HTML converter
* pdftoppm -- PDF to PPM/PNG/JPEG image converter
* pdftops -- PDF to PostScript (PS) converter
* pdftotext -- text extraction
* pdfunite -- document merging tool

Of course, success will depend on how the PDF file was generated. If the resulting text file contains what you want, you can open it in LibreOffice Writer and save it as an .odt file.
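As a rough sketch of that workflow, the two steps can be chained locally from a shell, assuming `pdftotext` (from poppler-utils) and LibreOffice's `soffice` command are on your PATH. The function name `pdf_text_to_odt` is hypothetical, and only the text is carried over, not the original layout:

```shell
#!/bin/sh
# Hypothetical helper: PDF -> plain text (pdftotext) -> ODT (LibreOffice).
# Only the text survives; fonts, images and most of the layout are lost.
pdf_text_to_odt() {
    pdf="$1"
    if [ ! -f "$pdf" ]; then
        echo "usage: pdf_text_to_odt file.pdf" >&2
        return 1
    fi
    txt="${pdf%.pdf}.txt"
    # -layout asks pdftotext to keep the physical layout (columns, spacing)
    pdftotext -layout "$pdf" "$txt" || return 1
    # Headless conversion; the .odt is written next to the .txt file.
    # Using "--convert-to doc" instead would go straight to a .doc file.
    soffice --headless --convert-to odt --outdir "$(dirname "$txt")" "$txt"
}
```

Usage: `pdf_text_to_odt confidential.pdf`. Nothing leaves your machine. On some installations the LibreOffice command is `libreoffice` rather than `soffice`.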

Edit: I forgot to provide the source for the quote. It's from the description tab in Synaptic for PDF Utilities (based on Poppler).


I was annoyed by the lack of a free PDF to ODT converter too. I didn't even need anything complicated, just a tool that generates ODT files I can then annotate in LibreOffice (e.g. to fill out forms).

I know how to do this manually, by converting the PDF document into graphics files and then importing them into LibreOffice, but that gets tedious quite fast.

So, I finally wrote a quick little shell script that does all the required steps automatically. You can find it at https://github.com/gutschke/pdf2odt

It can take any number of PDF and image files as input and generates an ODT file that can be opened and edited in LibreOffice. Images show up as the page background, so you can write over them freely. Each image is associated with its own page style. Keep that in mind when inserting page breaks, and adjust the page style as necessary.

I tested the script on both Linux and Mac. Given that it only needs a handful of reasonably standard tools, it should be quite portable.
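For reference, the manual first step that the script automates, rendering every PDF page as an image, can be done with `pdftoppm` from poppler-utils. This is only a sketch of that one step under the assumption that poppler-utils is installed; it is not taken from the pdf2odt script, and the function name `render_pages` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: render each page of a PDF as a PNG image.
# Only the manual first step; not code from the pdf2odt script itself.
render_pages() {
    pdf="$1"
    dpi="${2:-150}"   # resolution in dots per inch (assumed default: 150)
    if [ ! -f "$pdf" ]; then
        echo "usage: render_pages file.pdf [dpi]" >&2
        return 1
    fi
    # Writes page-1.png, page-2.png, ... into the current directory
    # (page numbers are zero-padded for longer documents).
    pdftoppm -png -r "$dpi" "$pdf" page
}
```

The images then still have to be inserted into LibreOffice page by page, which is the tedious part the script takes care of.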

gutschke
  • This script takes screenshots of each page and places them in the target format. Thanks for the script, Gutschke! – Oliver Mar 21 '15 at 13:56
  • I had used pdf2oo a few years ago, but it seems to now produce corrupt files for LibreOffice. This script does that and more - thanks! – eacousineau Jul 15 '15 at 14:28
  • The pdf2odt script, unfortunately, converts to an image format that is used as an ODT background. Don't expect to be able to "edit" any of the original text. – Richard Elkins Mar 14 '18 at 18:41
  • Can you use Inkscape to generate the "image" so that the image is an SVG? If I open a PDF by hand with Poppler import in Inkscape, I can export it as SVG, and this SVG seems to be very close to the PDF and can be imported in LibreOffice. – Ole Tange Feb 28 '23 at 19:07

LibreOffice can import .pdf files; simply open the file in a current version of LibreOffice for best results. However, it opens the document as a drawing, so you can only export it to one of the supported image formats, not save it as a Writer document.

Naturally, not all formatting is preserved, but some of it is.
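For a scripted, local version of the same import, LibreOffice can run headless. A sketch, assuming `soffice` is on your PATH and that LibreOffice hands the PDF to its Draw import (its usual default for PDF files); the pages are saved as an ODF drawing (`.odg`), not a Writer document. The function name `pdf_to_odg` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: import a PDF into LibreOffice Draw and save it as .odg.
# The result is a drawing, not an editable Writer (.odt) document.
pdf_to_odg() {
    pdf="$1"
    if [ ! -f "$pdf" ]; then
        echo "usage: pdf_to_odg file.pdf" >&2
        return 1
    fi
    # The .odg is written into the same directory as the input PDF
    soffice --headless --convert-to odg --outdir "$(dirname "$pdf")" "$pdf"
}
```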

bender
  • I tried it recently, and it's just awful: it doesn't preserve the formatting even a little. Moreover, it makes the text completely unreadable. – Hi-Angel Jul 15 '15 at 18:26

Try Calibre. It converts the PDF to HTML and then into other formats. It did a pretty good job on a large (183-page) file I would otherwise have had to print.

In my case I converted it to EPUB, but just for fun I also converted it to .docx, which turned out very well.
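Calibre also ships a command-line converter, `ebook-convert`, which infers the output format from the output file's extension, so the conversion can run locally without the GUI. A sketch, assuming Calibre is installed and your version supports `.docx` output; the function name `pdf_to_docx` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: convert a PDF to .docx with Calibre's ebook-convert.
# The output format is inferred from the extension of the second argument.
pdf_to_docx() {
    pdf="$1"
    if [ ! -f "$pdf" ]; then
        echo "usage: pdf_to_docx file.pdf" >&2
        return 1
    fi
    ebook-convert "$pdf" "${pdf%.pdf}.docx"
}
```

The resulting .docx can then be opened in LibreOffice Writer and saved as .odt or .doc.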


If the poppler-utils package is installed, a file manager script containing a command like the one below can convert a PDF file to HTML (remove the -i option to include images as well). The HTML file can then be opened in LibreOffice Writer and saved as ODT, although how well the formatting survives depends very much on how the PDF was created.

pdftohtml -noframes -q -s -c -i -p <filename>
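A minimal sketch of such a file manager script, assuming the file manager passes the selected files as command-line arguments (Nautilus scripts, for example, receive them that way) and that poppler-utils is installed. The function name `pdfs_to_html` is hypothetical; the flags are the ones from the command above:

```shell
#!/bin/sh
# Hypothetical file-manager script: convert every selected PDF to HTML.
# -s: one HTML document for all pages, -c: complex output,
# -i: ignore images, -p: replace .pdf links with .html, -q: quiet.
pdfs_to_html() {
    for f in "$@"; do
        case "$f" in
            *.pdf|*.PDF)
                pdftohtml -noframes -q -s -c -i -p "$f" ;;
            *)
                echo "skipping $f: not a PDF" >&2 ;;
        esac
    done
}

# When saved as a script, run the function on the selected files.
pdfs_to_html "$@"
```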
Sadi
  • Thank you for this helpful script. Just a small remark (from man pdftohtml): -noframes : generate no frames. Not supported in complex output mode. So -noframes won't have any effect with -c set. – Glutanimate Jan 17 '13 at 22:41
  • Thanks, I've removed this redundant option from my script now. A zenity-powered bash script to provide a GUI for all these options would be very nice, it seems ;-) – Sadi Jan 18 '13 at 08:24
  • @MHC, it seems this info is wrong; if we don't include -noframes, we get separate HTML files for the PDF pages, so I added it back to my script. – Sadi Jan 29 '13 at 17:17
  • That's strange. Must be a mistake in the documentation then. I'll change my copy of the script accordingly. Thanks for the heads up! – Glutanimate Feb 01 '13 at 04:47
  • This link is dead. – starbeamrainbowlabs Jan 30 '23 at 12:36
  • @starbeamrainbowlabs Thanks, I've updated the answer to provide just the command to use in a simple script, since the script's format and location may change over time. – Sadi Jan 31 '23 at 14:46