On Ubuntu 16.04, pdfimages -all produces image files whose combined size is greater than that of the PDF files they came from.
Is there any explanation for this? How can I extract the images so that they take up no more space than they do inside the PDF, without compromising on picture quality?
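To put a number on the size difference, a small script can add up the extracted files and compare the total with the PDF. This is a minimal sketch, assuming the images were written into their own directory (for example with mkdir extracted && pdfimages -all input.pdf extracted/img) so that nothing else is counted; the function names are hypothetical.

```python
import os
from pathlib import Path


def total_image_bytes(image_dir):
    """Sum the sizes of all regular files directly inside image_dir."""
    return sum(p.stat().st_size for p in Path(image_dir).iterdir() if p.is_file())


def compare_sizes(pdf_path, image_dir):
    """Return (pdf_bytes, image_bytes, ratio), where ratio = image_bytes / pdf_bytes.

    A ratio above 1.0 means the extracted images take more space than the PDF.
    """
    pdf_bytes = os.path.getsize(pdf_path)
    image_bytes = total_image_bytes(image_dir)
    return pdf_bytes, image_bytes, image_bytes / pdf_bytes
```

For example, compare_sizes("input.pdf", "extracted") returns the two byte counts and their ratio.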
Note: I've tried an approach that uses the pdftohtml command (Extracting embedded images from a PDF), but it fails on these files because of a permission related to extracting text. I get the error: Permission Error: Copying of text from this document is not allowed.
pdfimages outputs raster images such as JPEG. It is possible to extract vector graphics directly (to SVG or EPS, for example) using image programs such as Inkscape, although it may not be easy to automate. – steeldriver May 23 '16 at 02:06
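Since some of the raster files may hold the original compressed data (such as JPEG) while others may have been re-encoded into a different format, a per-format breakdown can show where the extra bytes come from. This is a minimal sketch that guesses each file's format from its leading "magic" bytes for a few common raster formats; the helper names are hypothetical, and any file it does not recognize is counted as "unknown".

```python
from pathlib import Path

# Leading bytes ("magic numbers") for a few common raster formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"P4", "PBM"),        # binary PBM
    (b"P5", "PGM"),        # binary PGM
    (b"P6", "PPM"),        # binary PPM
]


def detect_format(path):
    """Guess an image file's format from its first bytes; 'unknown' if unmatched."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"


def bytes_by_format(image_dir):
    """Total the sizes of the regular files in image_dir, grouped by detected format."""
    totals = {}
    for p in sorted(Path(image_dir).iterdir()):
        if p.is_file():
            fmt = detect_format(p)
            totals[fmt] = totals.get(fmt, 0) + p.stat().st_size
    return totals
```

For example, bytes_by_format("extracted") returns a dictionary mapping each detected format to the number of bytes it accounts for.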