559

I need to convert PDF pages to images. My file has a background image with some text on top of it, and when I save a page as an image, only the background image gets saved.

Is there any software that can convert the complete page to an image?

Zanna
  • 70,465

13 Answers

735

You can use pdftoppm from the poppler-utils package to convert a PDF to a PNG:

pdftoppm input.pdf outputname -png

This will output each page in the PDF using the format outputname-01.png, with 01 being the index of the page.

Converting a single page or a range of pages of the PDF

pdftoppm input.pdf outputname -png -f {page} -singlefile

Change {page} to the page number. It's indexed at 1, so -f 1 would be the first page.

If you'd like to work on a range of pages, you can also specify a number for the flag -l (last page), so having -f 1 -l 30 would specify the pages from 1 to 30.

Note again that .png will be appended to outputname automatically, so there's no need to include the extension. Also, -singlefile removes the -01 suffix cited above, since the output is known to have only one file.

Specifying the converted image's resolution

The default resolution for this command is 150 DPI. Increasing it will result in both a larger file size and more detail.

To increase the resolution of the converted images, add the options -rx {resolution} and -ry {resolution}, or the single option -r {resolution} when both values are the same. For example:

pdftoppm input.pdf outputname -png -rx 300 -ry 300
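
To convert a whole folder of PDFs in one go, the command above can be wrapped in a small loop. This is a minimal sketch, not part of poppler itself: png_prefix and render_all are hypothetical helper names, and it uses the -r shorthand (mentioned in the comments) to set both resolutions at once.

```shell
#!/usr/bin/env bash
# Sketch: render every PDF in the current directory to PNG pages.
# png_prefix and render_all are hypothetical helpers, not poppler commands.

# Strip the directory and the .pdf extension to get an output prefix.
png_prefix() {
    local name
    name=$(basename "$1")
    printf '%s\n' "${name%.pdf}"
}

# Render all PDFs found in the current directory at the given DPI.
render_all() {
    local dpi=${1:-300} pdf
    for pdf in ./*.pdf; do
        [ -e "$pdf" ] || continue   # the glob matched nothing
        pdftoppm -png -r "$dpi" "$pdf" "$(png_prefix "$pdf")"
    done
}

# Only run the conversion when pdftoppm is actually installed.
if command -v pdftoppm >/dev/null 2>&1; then
    render_all 300
fi
```

Each PDF gets its own prefix, so pages from different documents should not overwrite each other.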
enzotib
  • 93,831
  • 35
    Thank you so much. Much better quality than with imagemagick or graphicsmagick! – dAnjou Jan 09 '13 at 00:18
  • 14
    pdftoppm is much faster than convert – zuo Nov 06 '13 at 04:52
  • with only one pdf in a folder the specific name of the pdf file is not needed: pdftoppm -png *.pdf prefix –  Nov 10 '14 at 13:36
  • 8
    This is really much better than imagemagick. Imagemagick actually changed the colors in an unexpected way in my case! – NoBackingDown Sep 17 '15 at 07:16
  • 34
    this is good!, but it's a bit easier to write -r 300 instead of specifying the x and y resolutions independently when you want to set them to the same value. – mlc Oct 20 '15 at 14:52
  • 1
    Awesome, even though this is in the askubuntu section, I was pleased to find out it works on OS X as well! – RocketNuts May 25 '16 at 23:16
  • How would we put them back into pdf? with this tool, to complete the circle. – Ray Foss Jun 24 '16 at 13:45
  • 1
    pdftohtml (listed at end of pdftoppm manpage) worked better for my use-case; thanks for the hint :-) – Abbafei Jan 20 '17 at 04:50
  • Also: pdftocairo -png page.pdf page.png – turdus-merula Sep 23 '17 at 13:25
  • 3
    Works fine. To obtain this software you can use brew install poppler on macos. – Pavel Vlasov Dec 24 '17 at 14:22
  • 5
    I had much more success with pdftoppm than with imagemagick. – Michael Hays Apr 14 '18 at 07:30
  • 1
    Is there a way to force max settings aka no compression? – William Apr 18 '18 at 19:48
  • 1
    I made the pdf plot with python matplotlib or ROOT. When I use pdftoppm or convert module to convert the plot into png, the result is placed at the top-right corner and it leaves a wide white space. I solved the problem by adding -cropbox option. – HD189733b Dec 04 '18 at 01:18
  • To make it a CBZ (e.g. for reading in an ebook reader like Gnome Books) you can chain commands and use pdftoppm myfile.pdf myfile -png && zip myfile.cbz myfile-*.png; rm myfile-*.png. This will give a "myfile.cbz" in the same directory as "myfile.pdf" – IBBoard May 04 '19 at 14:32
  • Or, to make it easier to do multiple PDFs, use FILE=filename-without-extension; pdftoppm $FILE.pdf $FILE -png && zip $FILE.cbz $FILE-*.png; rm $FILE-*.png. This will give a "filename-without-extension.cbz" in the same directory as "filename-without-extension.pdf". – IBBoard May 04 '19 at 14:51
  • pdftoppm works extremely well and supports a bunch of output image formats, including PPM, PNG, JPEG, TIFF. You can also specify the resolution with -r 300 for example, as well as the JPEG compression (quality) level. See my full answer with examples here: https://askubuntu.com/questions/150100/extracting-embedded-images-from-a-pdf/1187844#1187844 – Gabriel Staples Nov 11 '19 at 04:35
  • I first skipped this answer, because I didn't want to install extra software - only to find out I already had pdftoppm installed on Ubuntu 18.04 – Zoltán Mar 13 '20 at 10:12
  • 1
    Is there any way to set transparent background in png?

    The background is white with pdftoppm and transparent with convert, but convert has problems with big pdfs even if I increase memory limit in policy.xml.

    – Roah Nov 04 '20 at 14:45
  • is there any way to add password ? – Manohar Dec 20 '20 at 08:53
  • @turdus-merula Seemingly cairo is buggier than ppm. – Yai0Phah Mar 05 '21 at 12:07
  • -cropbox exported the pages as I expected, so try using this option if you don't like your initial results. – Denilson Sá Maia Dec 10 '21 at 00:04
  • Sidenote: To install the software on Ubuntu: sudo apt update then sudo apt install poppler-utils – Avatar Feb 22 '22 at 07:25
  • 1
    If you want to resize the resulting PNG use e.g. -scale-to 300. This will give a PNG with max height of 300px. Parameter -r is "kind of how blocky it will look, and -scale-to is how big the overall image will be (on one side)." https://askubuntu.com/a/1179820/238253 – Avatar Feb 22 '22 at 07:32
  • See pdftoppm docs/manual: https://www.systutorials.com/docs/linux/man/1-pdftoppm/ – Avatar Feb 22 '22 at 07:33
  • Thanks! That preserved the fonts, unlike inkscape. Afterwards I used convert -trim to get rid of whitespace because -cropbox didn't work for me. – SurpriseDog Mar 06 '22 at 20:17
  • Amazing solution, given that pdftoppm comes with the default Ubuntu 23.04! – Dan Doe May 30 '23 at 09:16
365

You can use ImageMagick for this. Note that newer versions of ImageMagick have disabled the ability to convert PDF files to images, because of security vulnerabilities that are being exploited in the wild. See the comments for more details and for a workaround.

  1. Install imagemagick by running:

    sudo apt install imagemagick
    
  2. Using a terminal where the PDF is located:

    • For the full document:

      convert -density 150 input.pdf -quality 90 output.png
      
    • For a single page:

      convert -density 150 input.pdf[666] -quality 90 output.png
      

Whereby:

  • PNG, JPG or (virtually) any other image format can be chosen.

  • -density xxx will set the DPI to xxx (common are 150 and 300).

  • -quality xxx will set the compression level to xxx for the PNG, JPG and MIFF file formats (for JPG, 100 means the least compression and the highest quality; PNG output is lossless at any setting).

  • [666] will convert only the 667th page to PNG (zero-based numbering so [0] is the 1st page).

  • All other options (such as trimming, grayscale, etc.) are documented on the ImageMagick website.
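
Since the [N] index is zero-based, it is easy to be off by one. A small wrapper can do the subtraction for you. This is only a sketch: page_index and convert_page are hypothetical helper names, and it assumes your ImageMagick build is still allowed to read PDFs.

```shell
#!/usr/bin/env bash
# page_index and convert_page are hypothetical helpers, not ImageMagick commands.

# Turn a 1-based page number into ImageMagick's zero-based index.
page_index() {
    printf '%d\n' "$(( $1 - 1 ))"
}

# Convert one page of a PDF, counting pages from 1 as a reader would.
convert_page() {
    local pdf=$1 page=$2 out=$3
    convert -density 150 "${pdf}[$(page_index "$page")]" -quality 90 "$out"
}

# Example (commented out so the sketch has no side effects):
# convert_page input.pdf 667 output.png   # equivalent to input.pdf[666]
```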

Flimm
  • 41,766
Binarylife
  • 16,442
  • 2
    The answer as is does work but the resolution is very poor. Therefore not currently an answer that is useful. Maybe if convert has some parameters that can be specified this could change. – Elijah Lynn Jan 16 '15 at 19:57
  • 52
    This answer is much higher quality http://askubuntu.com/a/50180/11929 – Elijah Lynn Jan 16 '15 at 20:06
  • 8
    You can change the density by adding the -density 300 parameter – Mokus Apr 01 '15 at 12:18
  • The image in your answer is broken. Perhaps you should update it. – Petr R. Oct 02 '15 at 08:44
  • 5
    So can anybody confirm that specifying density makes it "as good" as the other answers here, or not? Also as a note to followers, ImageMagick calls out to "ghostscript" to actually convert from pdf to png ex: gs -q NOPROMPT ...-sDEVICE=pngalpha -r150x150 -sOutputFile=/var/tmp/Yf%d -f/var/tmp/L -f/var/tmp/Fic1 and if you get convert: no images defined output.png it means you don't have ghostscript installed... – rogerdpack Mar 03 '17 at 17:29
  • 1
    This worked fine for me with the -density 300 parameter. – mghaoui Jun 02 '18 at 20:39
  • Using -density 500 -quality 100 I still get much poorer image quality compared to pdftoppm. – frozen-flame Aug 21 '18 at 05:55
  • And to convert back from images to pdf: convert output-0.png output-1.png output-2.png output.pdf. See: https://itsfoss.com/convert-multiple-images-pdf-ubuntu-1304/ – Gabriel Staples Sep 05 '18 at 20:56
  • 1
    I'm getting this error convert-im6.q16: not authorized 'test.pdf' @ error/constitute.c/ReadImage/412. – Joschua Nov 09 '18 at 20:21
  • 2
    I get convert-im6.q16: no images defined `output.png' @ error/convert.c/ConvertImageCommand/3258. I know @rogerdpack mentioned it already but I have ghostscript installed, I can use gs – hsandt Nov 27 '18 at 20:26
  • 10
    Parsing PDF in imagemagick has been disabled - https://bugs.archlinux.org/task/59778 - it can be enabled manually by editing /etc/ImageMagick-7/policy.xml file and removing PDF from <policy domain="coder" rights="none" pattern="{PS,PS2,PS3,EPS,PDF,XPS}" /> – Jezor Dec 10 '18 at 18:58
  • 2
    You might want to add -background white -alpha off to remove transparency. – Martin Thoma Jan 28 '19 at 11:52
  • I found GIMP produces a much higher quality conversion than imagemagick (as of the current respective versions packaged in Ubuntu 19.04) – durette Dec 01 '19 at 18:37
  • 1
    @ElijahLynn I have changed the accepted answer. – Deependra Solanky Feb 09 '20 at 05:40
  • Unfortunately, I couldn't make out a pragmatic, easy to follow routine with my favorite tool "convert". I'll have to agree with @ElijahLynn and point to solution http://askubuntu.com/a/50180/11929 – somethis Feb 14 '21 at 13:19
  • I got an error: "convert-im6.q16: attempt to perform an operation not allowed by the security policy `PDF'" . I think PDF parsing doesn't work in ImageMagick any more. – Flimm Jun 11 '23 at 08:52
  • @Flimm, see Jezor's comment above – SpinUp __ A Davis Oct 14 '23 at 18:46
33

IIRC GIMP is capable of using PDFs, i.e. converting them into images. So if you want to edit the images right away - GIMP is your friend.

tesseract
  • 486
  • GIMP can indeed open PDFs, each page as one layer. Choosing "Export As" seems to save only the current layer, but you can easily delete the layer after exporting and run "Export As" again. – Dan Dascalescu Aug 12 '19 at 08:34
  • As of the current respective versions packaged in Ubuntu 19.04, I found GIMP produces a much higher quality conversion than imagemagick. – durette Dec 01 '19 at 18:38
20

The currently accepted answer does the job but results in an output which is larger in size and suffers from quality loss.

The method in the answer given here results in an output which is comparable in size to the input and doesn't suffer from quality loss.

TLDR - Use pdfimages : pdfimages -j input.pdf output

Quoting the linked answer:

It's not clear what you mean by "quality loss". That could mean a lot of different things. Could you post some samples to illustrate? Perhaps cut the same section out of the poor quality and good quality versions (as a PNG to avoid further quality loss).

Perhaps you need to use -density to do the conversion at a higher dpi:

convert -density 300 file.pdf page_%04d.jpg

(You can prepend -units PixelsPerInch or -units PixelsPerCentimeter if necessary. My copy defaults to ppi.)

Update: As you pointed out, gscan2pdf (the way you're using it) is just a wrapper for pdfimages (from poppler). pdfimages does not do the same thing that convert does when given a PDF as input.

convert takes the PDF, renders it at some resolution, and uses the resulting bitmap as the source image.

pdfimages looks through the PDF for embedded bitmap images and exports each one to a file. It simply ignores any text or vector drawing commands in the PDF.

As a result, if what you have is a PDF that's just a wrapper around a series of bitmaps, pdfimages will do a much better job of extracting them, because it gets you the raw data at its original size. You probably also want to use the -j option to pdfimages, because a PDF can contain raw JPEG data. By default, pdfimages converts everything to PNM format, and converting JPEG > PPM > JPEG is a lossy process.

So, try

pdfimages -j file.pdf page

You may or may not need to follow that with a convert to .jpg step (depending on what bitmap format the PDF was using).

I tried this command on a PDF that I had made myself from a sequence of JPEG images. The extracted JPEGs were byte-for-byte identical to the source images. You can't get higher quality than that.
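
Before extracting, it can help to check whether the PDF contains any embedded bitmaps at all. A rough sketch using pdfimages -list follows; count_listed_images and extract_if_any are hypothetical helpers, and the parsing assumes the listing starts with two header lines (column names and a dashed rule), which may differ between poppler versions.

```shell
#!/usr/bin/env bash
# count_listed_images and extract_if_any are hypothetical helpers.

# Count the data rows in `pdfimages -list` output read from stdin,
# ASSUMING the first two lines are the header.
count_listed_images() {
    awk 'NR > 2 { n++ } END { print n + 0 }'
}

# Extract embedded images only if the PDF actually contains some.
extract_if_any() {
    local pdf=$1 prefix=$2 n
    n=$(pdfimages -list "$pdf" | count_listed_images)
    if [ "$n" -gt 0 ]; then
        pdfimages -j "$pdf" "$prefix"
    else
        echo "no embedded bitmaps in $pdf; render the pages instead" >&2
        return 1
    fi
}

# Example (commented out so the sketch has no side effects):
# extract_if_any file.pdf page
```

If the count is zero, the PDF is text or vector content only, and a page renderer such as pdftoppm is the better fit.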

  • 4
    This is the incorrect solution for the OPs question if the PDF is a print-ready PDF created by something like Illustrator or Acrobat, since pdfimages extracts only the images from the PDF but does not flatten each entire page and export the full pages to images. – GuyPaddock May 14 '20 at 15:22
  • @GuyPaddock Thanks for pointing it out. – Anmol Singh Jaggi May 15 '20 at 08:00
12

If your PDFs are scanned, the images are already stored as part of the PDF. You simply need to extract them with pdfimages:

pdfimages my-file.pdf prefix 
VitoshKa
  • 264
  • 6
    This is the perfect solution for scanned pdfs, as with this you can, with one command, extract the original jpgs, and without further recompressions. – Jose Gómez Jan 31 '16 at 22:49
  • 2
    This is the incorrect solution for the OPs question if the PDF is a print-ready PDF created by something like Illustrator or Acrobat, since pdfimages extracts only the images from the PDF but does not flatten each entire page and export the full pages to images. – GuyPaddock May 14 '20 at 15:22
5

If you only want to convert a specific page of a PDF to a PNG, you can pipe pdftk to convert (described above) like this:

pdftk document.pdf cat 12 output - | convert - document-page-12.png
IQAndreas
  • 3,188
3

You can do this with ghostscript:

gs -dSAFER -dBATCH -dNOPAUSE -r300 -sDEVICE=png16m -dFirstPage=1 -dLastPage=1 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -sOutputFile=output.png input.pdf

See https://www.ghostscript.com/doc/9.52/Devices.htm for details
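
To render a range of pages rather than one, Ghostscript accepts a %d-style pattern in -sOutputFile, so each page gets its own file. A sketch under that assumption: gs_png_args is a hypothetical helper that only builds the argument list, and the page-%03d.png name is an arbitrary choice.

```shell
#!/usr/bin/env bash
# gs_png_args is a hypothetical helper: it prints the Ghostscript arguments
# for a page range, one per line, without running anything.
gs_png_args() {
    local pdf=$1 first=$2 last=$3 dpi=${4:-300}
    printf '%s\n' -dSAFER -dBATCH -dNOPAUSE "-r$dpi" -sDEVICE=png16m \
        "-dFirstPage=$first" "-dLastPage=$last" \
        -dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
        "-sOutputFile=page-%03d.png" "$pdf"
}

# Usage (bash 4+, commented out so the sketch has no side effects).
# Note: this breaks on file names that contain newlines.
# mapfile -t args < <(gs_png_args input.pdf 1 3)
# gs "${args[@]}"
```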

vstepaniuk
  • 481
  • 3
  • 15
3

You can use convert and specify a higher density using the -density option.

e.g. convert -density 300 foo.pdf bar.png

Arjun
  • 139
  • 3
3

To get a single page from gm convert, add [N] (with N the page number starting at 0) to the PDF name, i.e. gm convert foo.pdf[11] out.png to get the 12th page from the PDF.

For pdftoppm use -f N -singlefile, where N is the page number starting at 1, i.e. pdftoppm -png -f 12 -singlefile foo.pdf out for the same result. It appears to always add ".png" to the output filename and there is no way to stop this.

jkt123
  • 3,530
  • 22
  • 24
2

pdftocairo file.pdf -png (was posted by Anthony Ebert as a comment at How to convert PDF to image?)

  • Provided by: poppler-utils_0.24.5-2ubuntu4_amd64. Docs: http://manpages.ubuntu.com/manpages/trusty/man1/pdftocairo.1.html – Avatar Feb 21 '22 at 08:29
2

Master PDF Editor (ver 2.2) has this option built in. Open the PDF file and then go to File > Export to > Images. It presents a dialog where you can define different options for the output. Extremely useful. Hope this info helps.

Zanna
  • 70,465
Rush
  • 29
  • Is that in the free or paid version? In my version, the option is greyed out? Does that mean I need to pay? Is there a paid version? – Joshua Robison Sep 07 '17 at 02:25
  • (In case it crashes at some point with pdf with many pages: print part of the original to pdf before extracting from the output with this tool) – cipricus Aug 24 '21 at 10:01
2

PDF Mod also allows exporting images of all or individual pages of PDF files.

  • Open PDF file in PDF Mod
  • Select page(s)
  • Edit > Export image(s)
Zanna
  • 70,465
nhylated
  • 451
1

For high-quality output, mutool does a great job if the output resolution is set to a high value (e.g., above 250). mutool comes from the mupdf-tools package, which accompanies the MuPDF viewer. The command can also do the opposite task, converting PNG back to PDF.

mutool convert -O resolution=600 -o out-pdf.png in-pdf.pdf