Why are image files from pdfimages bigger than the PDFs they came from?

Asked May 22 '16 at 23:31

Active Dec 17 '18 at 10:47

On Ubuntu 16.04, pdfimages -all produces image files whose combined size is greater than that of the PDF files they were extracted from.
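One way to check this observation is to total the sizes on both sides. The following is a minimal sketch, not a definitive method: `total_bytes` is a hypothetical helper, and `input.pdf` and the `out/img` prefix are placeholder names that do not come from the question.

```shell
# Hypothetical helper: print the combined size in bytes of the files
# given as arguments. Arguments that are not regular files are skipped.
total_bytes() {
    total=0
    for f in "$@"; do
        [ -f "$f" ] || continue
        size=$(wc -c < "$f")
        total=$((total + size))
    done
    echo "$total"
}

# Example usage (placeholder file names, uncomment to run):
# mkdir -p out
# pdfimages -all input.pdf out/img     # extract images in their native formats
# echo "PDF:    $(total_bytes input.pdf) bytes"
# echo "images: $(total_bytes out/img-*) bytes"
# pdfimages -list input.pdf            # one row per embedded image, with its encoding
```

Comparing the two totals, together with the per-image encodings that `pdfimages -list` reports, may help show which images grow on extraction.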

Is there an explanation for this? How can I extract image files that are no larger than the space they occupy inside the PDF, without compromising picture quality?

Note: I also tried the approach based on the pdftohtml command (from "Extracting embedded images from a PDF"), but it fails on these files because of a permission restriction on text extraction. The error is: Permission Error: Copying of text from this document is not allowed.

edited Dec 17 '18 at 10:47 by Zanna

asked May 22 '16 at 23:31 by Orion751

It may be because the images are stored in the PDF as vector graphics, whereas pdfimages outputs raster images such as JPEG. It is possible to extract vector graphics directly (to SVG or EPS), for example with a graphics program such as Inkscape; however, that may not be easy to automate. – steeldriver May 23 '16 at 02:06

0 Answers