
On Ubuntu 16.04, pdfimages -all produces image files whose combined size is greater than that of the PDF file they were extracted from.

Is there an explanation for this? How can I extract the images so that they take up no more space than they do inside the PDF, without compromising picture quality?

Note: I tried an approach based on the pdftohtml command (see "Extracting embedded images from a PDF"), but it fails on these files because of a permission restriction on text extraction. I get the error: Permission Error: Copying of text from this document is not allowed.
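To put numbers on the size difference described above, a small helper along these lines might be useful. This is only a sketch: the function names dir_bytes and compare_sizes are hypothetical (they are not part of pdfimages), and it assumes the extracted images were all written into a single directory.

```shell
#!/bin/sh
# Hypothetical helpers, not part of poppler-utils.

# Print the total size in bytes of the regular files directly inside
# the directory given as $1 (subdirectories are ignored).
dir_bytes() {
    total=0
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        total=$((total + $(wc -c < "$f")))
    done
    echo "$total"
}

# Print the size of the PDF ($1) next to the combined size of the
# extracted images in directory $2, so the two can be compared.
compare_sizes() {
    pdf_bytes=$(wc -c < "$1")
    images_bytes=$(dir_bytes "$2")
    # $((...)) strips any padding that some wc implementations add.
    echo "pdf_bytes=$((pdf_bytes)) images_bytes=$images_bytes"
}
```

For example, after running pdfimages -all input.pdf out/img (input.pdf and out are placeholder names), compare_sizes input.pdf out prints both totals. Running pdfimages -list input.pdf should additionally show how each image is encoded inside the PDF, which may hint at why the extracted files come out larger.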

Zanna
    It may be because the images are stored in the PDF as vector graphics, whereas pdfimages outputs raster images such as JPEG. It is possible to extract vector graphics directly (to SVG or EPS), for example with an image program such as Inkscape; however, it may not be easy to automate – steeldriver May 23 '16 at 02:06

0 Answers