776

I have a 72.9 MB PDF file that I need to shrink to 500 KB or below.

The file was a JPEG image that I had scanned and then converted to PDF.

tamimym
  • 7,899
  • 1
    It depends on what consumes the space... we need a lot more information. Compressing the image data could help, but if you're attempting a large-file heap spray, that won't work. We seriously need more info. – RobotHumans Mar 16 '12 at 17:14
  • 1
    convert it to DjVu instead of trying to reduce the PDF to an impossible size (given the source) – zetah Mar 16 '12 at 17:22
  • 2
    PDF to PS is not effective for scanned PDF files: I tried converting a 56 MB pdf into a ps file, but the ps file came out at 1.3 GB, and running ps2pdf on it again only got back down to a 45 MB file –  Jan 18 '13 at 05:32
  • 1
    It only seems to help filesize a little bit, but pdfopt has a simple syntax and improves loading and page-turning speed in the iPad era. :-) – Ari B. Friedman May 31 '12 at 00:53
  • Please see this related Q&A for a number of GUI front ends to ghostscript that should make the process of reducing PDF filesizes easier. – Glutanimate Apr 11 '13 at 21:58
  • Note that the OP appears to have accidentally marked the wrong answer as accepted. His accompanying comment gives thanks for the ghostscript solution, which solved the problem, but ghostscript appears not in this answer but in a different one. – Ray Butterworth Jun 27 '19 at 15:00
  • I've tried nearly all the answers below to get a ~10 MB pdf under the required 1 MB, and only shrinkpdf worked for me. With it I could fine-tune the dpi and also grey-scale the file to get a still-readable but compact version. Great tool! – Wolfson Aug 27 '23 at 14:06

25 Answers

1088

Use the following Ghostscript command:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
-dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

Summary of -dPDFSETTINGS:

  • -dPDFSETTINGS=/screen selects lower-quality, smaller-size output (72 dpi)
  • -dPDFSETTINGS=/ebook selects better quality, but slightly larger pdfs (150 dpi)
  • -dPDFSETTINGS=/prepress selects output similar to the Acrobat Distiller "Prepress Optimized" setting (300 dpi)
  • -dPDFSETTINGS=/printer selects output similar to the Acrobat Distiller "Print Optimized" setting (300 dpi)
  • -dPDFSETTINGS=/default selects output intended to be useful across a wide variety of uses, possibly at the expense of a larger output file
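
If the goal is a hard size limit (such as the 500 KB target in the question), one pragmatic approach is to try the presets from highest quality downwards and stop at the first one that fits. A minimal sketch, assuming bash and GNU stat:

target=500000  # target size in bytes (~500 KB)
for preset in printer ebook screen; do
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/$preset \
    -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
    size=$(stat -c%s output.pdf)
    echo "/$preset: $size bytes"
    [ "$size" -le "$target" ] && break  # the first preset that is small enough wins
done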

Reference:

Controls and features specific to PostScript and PDF input

-dPDFSETTINGS=configuration

Presets the "distiller parameters" to one of four predefined settings:

  • /screen selects low-resolution output similar to the Acrobat Distiller (up to version X) "Screen Optimized" setting.
  • /ebook selects medium-resolution output similar to the Acrobat Distiller (up to version X) "eBook" setting.
  • /printer selects output similar to the Acrobat Distiller "Print Optimized" (up to version X) setting.
  • /prepress selects output similar to Acrobat Distiller "Prepress Optimized" (up to version X) setting.
  • /default selects output intended to be useful across a wide variety of uses, possibly at the expense of a larger output file.

The exact settings for each of these, including their DPI values, are shown in the dozens of options in this table.

Michael D
  • 11,035
  • 2
    One can also make a Nautilus script to access this function for every file. – Sina May 07 '13 at 15:51
  • 21
    This should be the accepted answer. ghostscript is the PDF, XPS and PS implementation for unices and can do basically everything delivering best quality... – dom0 Oct 02 '13 at 17:27
  • 9
    @Sina: There is actually a Nautilus Script with a simple Zenity-based GUI that utilizes this gs command with all its quality-level options: https://launchpad.net/compress-pdf – Sadi Oct 25 '13 at 11:47
  • 55
    This is the right answer for this question (compressing a pdf that is mostly bitmap data). I found that the screen setting was too low quality for me, but ebook worked well, cutting a 33Mb scan-based PDF down to 3.6Mb, and keeping it very readable. Other options for the -dPDFSETTINGS option are listed here: http://milan.kupcevic.net/ghostscript-ps-pdf/, and it might be a good idea to include them in this answer. – naught101 Dec 02 '14 at 01:13
  • 1
    The high quality default compressed a PDF scan in black&white from 38.2 MB to 6.4 MB without any significant quality loss. Apparently the original encoding was very inefficient. Thank you! – pietrodn Mar 05 '16 at 08:15
  • @pietrodn, the original encoding was probably a bitmap. If it saves every single pixel without compression, I wouldn't call that inefficient. Rather, it seems like GS just does a good job at compressing. – Turion Jun 19 '16 at 10:35
  • Any GUI for this? – Orion Sep 18 '17 at 09:05
  • 4
    on 17.10 it turned a 42 MB pdf into 127 MB :( – YaSh Chaudhary Oct 23 '17 at 03:24
  • 1
    My pdf file size increased too... :( – Millemila Apr 11 '18 at 04:02
  • 1
    Based on this, I created a handy script called pdf_compress.sh @ https://github.com/erikw/dotfiles/blob/personal/bin/pdf_compress.sh – Erikw Jun 09 '18 at 18:33
  • I used these instructions to compress a PDF. However, I get a lot of errors. I described the issue here in detail. I would appreciate it if you could take a look and help me with that. – Foad Jan 24 '20 at 12:50
  • Is the dCompatibilityLevel value important? Can it change over different ghostscript releases? – shevy Aug 16 '20 at 16:05
  • 1
    With Ubuntu 20.04 and gs 9.50, this failed with a Segmentation Fault ... – Nuno Oct 17 '20 at 05:47
  • On Ubuntu 20.04 it ran just fine, decreasing a 3 pg PDF from 3.8MB to 330KB, but it is completely unreadable. Looks like it is like 25DPI. Horribly pixelated. Here's the cmd I ran: gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf. – Gabriel Staples Dec 24 '20 at 11:25
  • ...and the other settings look great, but don't decrease the PDF size at all. :( Note this is on OCRed output from my pdf2searchablepdf program. – Gabriel Staples Dec 24 '20 at 11:27
  • Looks like this cmd is saved as a gist by somebody here too: https://gist.github.com/firstdoit/6390547. – Gabriel Staples Dec 27 '20 at 01:50
  • I was mistaken the other day, using -dPDFSETTINGS=/ebook actually works fine. Example: gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=out.pdf in.pdf. This is very similar to (if not identical to?) ps2pdf -dPDFSETTINGS=/ebook in.pdf out.pdf, since ps2pdf is actually just a wrapper around Ghostscript (gs), apparently. Also, for -dPDFSETTINGS= reference see https://www.ghostscript.com/doc/current/VectorDevices.htm#PSPDF_IN, as well as this table here: https://www.ghostscript.com/doc/current/VectorDevices.htm#distillerparams. – Gabriel Staples Dec 27 '20 at 01:59
  • I've added an answer here now, stemming from this and other answers: https://askubuntu.com/a/1303196/327339. – Gabriel Staples Dec 27 '20 at 17:58
  • This has been completely useless every time I've tried it, no matter what scanner the original images have come from and what quality settings I've used. – Peter Jun 14 '21 at 07:59
  • Do not use this. /ebook, /screen, etc downsample the PDF i.e. they reduce the number of pixels - a horribly inefficient/lossy way to compress a PDF. Instead of downsampling, use image compression - it results in a much nicer final result (at least when you have enough pixels): convert -density 300 input.pdf -quality 30 output.pdf – Zaz Oct 27 '21 at 08:15
  • This worked pretty well, but gave several instances of this error: **** Error: File encountered 'rangecheck' error while processing an image. Output may be incorrect. and seems to have deleted some of the images in the source PDF. – Caleb Stanford Jul 09 '22 at 14:29
  • Getting: Failed to initialise downsample filter, downsampling aborted – Sumit Wadhwa Aug 25 '22 at 06:24
  • 1
    @Zaz, the convert command you gave (convert -density 300 input.pdf -quality 30 output.pdf) just took a searchable 962 KB file and made it a non-searchable 4.5 MB file, making the "compressed" PDF nearly 5x larger. That's definitely not the command I'm looking for. – Gabriel Staples Oct 13 '22 at 05:23
  • 1
    @GabrielStaples: Sorry. To clarify, ImageMagick's convert is only for PDFs that are just images. So if you scan a document, convert will work better than the command above because it will compress the pages instead of downsampling them (reducing the number of pixels). – Zaz Oct 19 '22 at 23:00
  • Used this on a PDF file containing vector images and it seemed to work fine. However, after compression some of the vector images only show up as black boxes, but only in Safari. Other viewers or browsers (Chrome/Firefox) work perfectly fine. – Chris Mar 17 '23 at 17:18
  • Worked great on my Mac as well (Homebrew gs) – Magnus Apr 21 '23 at 17:57
298

My favorite way to do this is to convert the PDF to PostScript and back. It does not always work, but when it does, the results are nice:

ps2pdf input.pdf output.pdf

This also directly works on PDFs, as suggested in the comments.
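
The explicit round trip through an intermediate PostScript file (pdf2ps also ships with Ghostscript) looks like this:

pdf2ps input.pdf intermediate.ps
ps2pdf intermediate.ps output.pdf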

Some users also report more success when using the ebook settings as follows:

ps2pdf -dPDFSETTINGS=/ebook input.pdf output.pdf
don.joey
  • 28,662
  • 22
    Despite the fact that this one approach became my favorite solution to compress pdf files, it breaks up url links the document may have (which does not happen with @Michael D's approach). Apart from that, awesomeness is all I can think of running this snippet! (: – Rubens Dec 06 '13 at 11:01
  • 1
    @Rubens Ah. Did not know about the fact that it breaks the url links. Thanks for adding that. – don.joey Dec 06 '13 at 12:19
  • 4
    This bypasses password protection...just sayin' – j-i-l Jan 06 '15 at 20:28
  • Great solution! In contrast to printing to a postscript file in Evince, the quality didn't noticeably change. – balu Jan 28 '16 at 07:45
  • For the second command, I would use this shorthand: ps2pdf output.{ps,pdf} – GNUSupporter 8964民主女神 地下教會 Jun 18 '16 at 07:07
  • 13
    ps2pdf will take pdfs as inputs, so you can do this in one step: ps2pdf input.pdf output.pdf – frabjous Sep 01 '16 at 19:19
  • 1
    This command has actually increased the size of a pdf from Google books 10 times! But at least my Kindle now displays all characters in this processed pdf. – Vladimir F Героям слава Jan 21 '17 at 19:34
  • @VladimirF in that case the increase in size is justified – don.joey Jan 23 '17 at 10:30
  • @Pablo the editor of edit 2, I know you want to do well, but if you are adding that much info, it deserves its own answer. – don.joey Mar 23 '17 at 07:50
  • 2
    @don.joey I can't understand why, since it just extends your answer. The main thing here: ps2pdf also uses ghostscript, so you can use things like -dPDFSETTINGS=/ebook. – Pablo Bianchi Mar 23 '17 at 20:26
  • @PabloBianchi I think you answer is a valid alternative to mine. I like mine which is nice and short, but yours definitely has its place in the list. So please feel free to update it and add it as your own answer. – don.joey Mar 24 '17 at 10:05
  • 398MB scanned pdf to 397MB! – wbad Jan 06 '19 at 19:30
  • @wbad that means that probably all possible compression is already applied, I doubt you will be able to lower the size without altering the quality – don.joey Jan 06 '19 at 21:06
  • I was in a hurry and it worked like a charm, thanks a lot! :) – flawr Mar 01 '19 at 20:50
  • 9
    It didn't work (84 MB→82 MB), but ps2pdf -dPDFSETTINGS=/ebook in.pdf out.pdf, as suggested by @PabloBianchi, leads to 272 kB! Thanks a lot! – Frédéric Grosshans Mar 17 '19 at 17:32
  • ps2pdf -dPDFSETTINGS=/ebook worked really well. 14megs down to 4. Thanks! – ndstate Nov 07 '19 at 16:20
  • Thank you so much for your solution it was really helpful! Also is a quick way to reduce the size of pdf files. I wonder if an option like this should be included in the most popular document viewers in Linux, do you know anyone that includes this feature? – EnriqueBet Aug 06 '20 at 22:06
  • It worked! With ps2pdf -dPDFSETTINGS=/ebook input.pdf output.pdf, a 3 pg 3.8 MB input file which was an output from my pdf2searchablepdf program got reduced to 916 KB! Note, however, that ps2pdf input.pdf output.pdf did nothing. This was on Ubuntu 20.04, with an original PDF already at 300 DPI to start, considering my script ran pdftoppm -tiff -r 300 "$pdf_in" "$temp_dir/pg" first. – Gabriel Staples Dec 24 '20 at 11:33
  • Note: dead link :(: https://ghostscript.com/doc/current/Ps2pdf.htm#Options – Gabriel Staples Dec 27 '20 at 01:33
  • Try this: https://www.ghostscript.com/doc/current/VectorDevices.htm#PSPDF_IN. I posted it here too. – Gabriel Staples Dec 27 '20 at 01:41
  • To compress even further one can use ps2pdf -dPDFSETTINGS=/screen input.pdf output.pdf, though the quality with the /screen option is poorer than with /ebook. – dheerendra May 26 '22 at 18:17
  • Lmao this is so weird, it turned a bunch of the portrait pages into landscape!!! – Caleb Stanford Jul 09 '22 at 14:32
  • After fixing all the rotated pages manually, this solution worked best for me. 46 MB -> 20 MB with the default command. – Caleb Stanford Jul 09 '22 at 14:48
236

aking1012 is right: with more information about possible embedded images, hyperlinks, etc., it would be much easier to answer this question!

Here are a couple of script and command-line solutions. Use as you see fit.

v2r
  • 9,547
  • 27
    Thank you very much for your suggestions, the ghostscript shell worked wonders and shrank it down to 460KB :) – tamimym Mar 16 '12 at 19:56
  • That is not necessarily true. If the content went from an image to text, that is more than feasible. [That is assuming that the text is accurately OCRed] – monksy May 16 '15 at 20:11
  • 7
    I recommend the shrinkpdf.sh script; you can customize the code to use the ppi value you want (72 by default) and reach exactly the file size you need while sacrificing the least quality. This let me upload a scanned document of 11 MB with a max. size of 3 MB without losing a lot of quality. – Severo Raz Apr 09 '16 at 22:18
  • 9
    shrinkpdf works great! – AmanicA Feb 22 '17 at 22:14
  • First link, with setting "/ebook", reduced a 19MB scanned file to 4.2MB and the scanned text remains readable. – dremodaris Dec 13 '17 at 21:10
  • 4
    Where is the ghostscript shell that the OP is referring to https://askubuntu.com/questions/113544/how-can-i-reduce-the-file-size-of-a-scanned-pdf-file#comment134224_113547? – user13107 Mar 05 '18 at 06:44
  • shrinkpdf worked for me! You must install ghostscript and ensure the command 'gs' is available in PATH for this to work. In macos you can install ghostscript using 'brew install ghostscript" – pcx Oct 28 '19 at 04:51
  • 1
    @user13107 It is this answer - https://askubuntu.com/a/256449/171427 – callmekatootie Nov 15 '19 at 17:21
  • For me, https://docupub.com/pdfcompress/ [pdfcomress], got the best results. – Michael D Mar 08 '20 at 20:15
  • First link, /ebook setting, reduced 2.2MiB PDF to 144,5KiB – mydoghasworms Jun 25 '20 at 07:47
  • 1
    First link is a long page for a gs one-line command line. Second is best, a flexible script http://www.alfredklomp.com/programming/shrinkpdf/ . Third is limited (convert to ps and back to pdf, no options). Fourth implies sending your PDF to an unknown third party online (but hey, it's free!), no thanks. So, I recommend http://www.alfredklomp.com/programming/shrinkpdf/ – Stéphane Gourichon Nov 10 '20 at 12:48
224

If you have a pdf with scanned images, you can use convert (ImageMagick) to create a pdf with JPEG compression. (You can use this method on any pdf, but you'll lose all text information.)

For example:

convert -density 200x200 -quality 60 -compress jpeg input.pdf output.pdf

Adjust the parameters to your needs:

  • -density: the pixel density in dpi (e.g. 100x100). Higher pixel densities increase quality and size
  • -quality: the compression ratio. For jpg it is between 1 and 100, with 100 the best quality but the lowest compression
  • -compress: the compression algorithm. JPEG compression might not be the best choice due to compression artifacts. Alternatives are BZip, Fax, Group4, JPEG2000, Lossless, LZW, RLE or Zip (some only allow b/w images).

I was able to achieve great compression ratios for scanned/photographed documents (depending on the settings). Depending on the document source, you might want to reduce the color depth (-depth argument).
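
For example, for a scan that is effectively grayscale, you might combine JPEG compression with a grayscale conversion; a sketch, with values that are starting points to tune rather than recommendations:

convert -density 150x150 -quality 50 -compress jpeg -colorspace Gray input.pdf output.pdf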

someonr
  • 2,349
  • 4
    For a scanned document where the text is what you are interested in rather than the images, and preserving depth isn't an issue, jpeg compression is not a good idea because the artifacts tend to be extremely noticeable. If you use pdfimages input.pdf pages to extract pbm files, then you can do something like: for page in *.pbm; do convert $page -compress Group4 -type bilevel TIFF:-; done | convert - output.pdf. Any OCR will be lost so I usually then do pdfsandwich output.pdf, which seems to reduce file size even further. – Brian Z May 04 '15 at 11:57
  • 1
    @BrianZ sure, jpeg compression isn't always the best choice, but for me it was the best approach for mixed-type documents. I added some information about other compression methods to the answer. – someonr May 06 '15 at 23:43
  • 3
    This method ultimately uses gs behind the scenes. – alfC Jun 12 '15 at 04:55
  • 2
    I had to use double dash for the options to run the command --density --quality --compress vs -density -quality -compress. – Rotareti Nov 10 '16 at 18:22
  • 2
    If image quality is not the highest concern (and you just want to get that dang email attachment small enough to be sent), one might add -resize 50% too; change the percentage depending on how much DPI was used while scanning – chrki Jan 11 '17 at 00:45
  • Could you please explain the -quality option a bit. Like how can I achieve a Low, Med and High quality compress? – rahim.nagori Aug 18 '20 at 13:30
  • @rahim.nagori I added some more information and a link with more details about the quality flag to the answer. – someonr Aug 18 '20 at 14:11
  • This is the only thing that worked for me. I had a 5-page 62MB pdf scan and all other options (ghostscript, ps2pdf, libre office...) failed miserably. Instead it was enough to convert -resize 30% input.pdf output.pdf and it was shrunk to 12MB, a bit rough but still legible. – Gabriele Buondonno Sep 02 '20 at 23:41
  • 1
    It increased my scanned pdf instead: 14 MiB for the original and 24 MiB for the converted pdf – somenxavier Oct 06 '20 at 08:55
  • This used to work perfectly for me. But now, with Ubuntu 20.04 and ImageMagick 6.9.10-23 Q16 x86_64, this does not work anymore. It fails with the message: "convert-im6.q16: no images defined `output.pdf' @ error/convert.c/ConvertImageCommand/3258" – Nuno Oct 17 '20 at 05:50
  • @Nuno Seems like a policy issue, comment gs policy in /etc/ImageMagick-7/policy.xml – aksh1618 Apr 05 '21 at 07:05
  • For a moderately large file this completely hogged all my CPU cores for several minutes. Not recommended. – Peter Apr 27 '21 at 11:20
  • works on Arch but Ubuntu's image magick seems to be too dumb – france1 Oct 04 '22 at 11:29
  • This was a more flexible solution than ghostscript as I was dealing with a PDF made of JPGs – Ucodia Dec 22 '22 at 21:53
  • This allows more fine-tuning than ghostscript by tweaking the -density and -quality options: raising one increases the size of the output, but this can be compensated by decreasing the other depending on the visual output that one needs. – Giuseppe Jan 04 '23 at 17:37
  • I have a problem with this: if the pdf is huge, convert suddenly racks up the RAM and ends up OOM... With 1000 scanned documents and 20 GB of RAM, it got stuck and needed a forced restart... – Benyamin Limanto Jan 26 '23 at 01:35
  • GhostScript ps2pdf vs. ImageMagick convert:

    ps2pdf very cleanly just re-compresses the image assets.

    ❗️ Whereas convert, in addition to that, also adds a corresponding smask object (bitmap mask) to the PDF, as pdfimages -list shows.

    This mask is redundant, as it has the same pixel dimensions as the image, so it effectively acts as no mask at all and increases file size & rendering complexity unnecessarily.

    ❓ Does anyone know whether this is a bug or feature? And whether ImageMagick convert can be instructed to not produce such unnecessarily redundant masks?

    – porg Dec 18 '23 at 23:27
72

I needed to downsize a PDF that contained full color scans of a document. Each of my pages was a full color image as far as the file was concerned. They were images of pages containing text and images, but they were created by scanning to an image.

I used a combination of the below ghostscript command and one from another thread.

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dDownsampleColorImages=true \
-dColorImageResolution=150 -dNOPAUSE  -dBATCH -sOutputFile=output.pdf input.pdf

This reduced the image resolution to 150 dpi, cutting my file size in half. Looking at the document, there was almost no noticeable loss of image quality. The text is still perfectly readable on my 2012 Nexus 7.
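
To pick a sensible target resolution, it can help to first list the native resolution of the embedded images (pdfimages is part of poppler-utils), as a comment further down also suggests:

pdfimages -list input.pdf

The x-ppi and y-ppi columns show each image's effective resolution; downsampling only pays off when the target is meaningfully below those values.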

MadMike
  • 4,244
  • 8
  • 28
  • 50
mlitty
  • 729
  • 5
  • 2
  • 7
    +1 for downsampling images but keeping text as vectors. Made a huge difference in size without making my text pixelated. – Jason O'Neil Dec 08 '14 at 08:34
  • 2
    Fantastic that one can tune the resolution with this command - this gave me better results than just using -dPDFSETTINGS=/screen – exchange May 13 '19 at 10:04
  • See also: https://stackoverflow.com/questions/9497120/how-to-downsample-images-within-pdf-file/9571488 – sanmai Aug 12 '20 at 08:44
  • 1
    +1 for the option that allows you to specify the exact resolution. Useful when you have a scanned pdf (all raster) and want to reduce the size keeping the file still readable. – Michele Piccolini Nov 18 '21 at 09:14
42

Here is a script for rewriting scanned pdfs:

#!/bin/sh

gs  -q -dNOPAUSE -dBATCH -dSAFER \
    -sDEVICE=pdfwrite \
    -dCompatibilityLevel=1.3 \
    -dPDFSETTINGS=/screen \
    -dEmbedAllFonts=true \
    -dSubsetFonts=true \
    -dColorImageDownsampleType=/Bicubic \
    -dColorImageResolution=72 \
    -dGrayImageDownsampleType=/Bicubic \
    -dGrayImageResolution=72 \
    -dMonoImageDownsampleType=/Bicubic \
    -dMonoImageResolution=72 \
    -sOutputFile=out.pdf \
     $1

You could customise it a bit to make it more reusable but if you only have one pdf, you could just replace $1 with your pdf filename and bung it in a terminal.
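
A sketch of one way to parameterise it, with the output name and DPI as optional arguments (same gs options as above, just untested beyond the basics):

#!/bin/sh
# Usage: ./shrink.sh input.pdf [output.pdf] [dpi]
in="$1"
out="${2:-out.pdf}"
dpi="${3:-72}"
gs  -q -dNOPAUSE -dBATCH -dSAFER \
    -sDEVICE=pdfwrite \
    -dCompatibilityLevel=1.3 \
    -dPDFSETTINGS=/screen \
    -dEmbedAllFonts=true \
    -dSubsetFonts=true \
    -dColorImageDownsampleType=/Bicubic \
    -dColorImageResolution="$dpi" \
    -dGrayImageDownsampleType=/Bicubic \
    -dGrayImageResolution="$dpi" \
    -dMonoImageDownsampleType=/Bicubic \
    -dMonoImageResolution="$dpi" \
    -sOutputFile="$out" \
    "$in"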

Oli
  • 293,335
  • 1
    Works a treat, thanks Oli. You've answered pretty much everything I've asked on here so far :-D – Rob Cowell Sep 01 '10 at 08:15
  • This is a good answer but in my case at least it takes a lot of time to convert a somewhat large (>10Mb) PDF file (more than a minute). – Gabriel Jun 12 '13 at 19:20
  • 1
    I'm not sure what happens, but a 30 MB PDF results in a 68 MB file. Instead of reducing, it enlarges. Same output if using ps2pdf directly as stated in the next answer. – Ed Villegas Jun 23 '13 at 18:08
  • 1
    @EdVillegas The only thing I can think of (to explain that sort of increase) is that the images are of a lower resolution than the ones being generated (72dpi). Or somehow embedding the fonts is sucking in all the fonts. – Oli Jun 25 '13 at 07:31
  • 1
    use pdfimages -list file.pdf to see the native images resolution. – vstepaniuk May 21 '20 at 10:53
38
  1. I use LibreOffice Draw to open the pdf.
  2. I then "export as pdf"
  3. And set "jpeg compression quality" to 50% and "image resolution" to 150 dpi

This will have a good result.

31

I usually use ps2pdf to do this (easier syntax), something like this:

ps2pdf -dPDFSETTINGS=/ebook BiggerPdf SmallerPDF

I use the following Python script to reduce the size of all the pdf files in a directory on a production server (8.04), so it should work.

#!/usr/bin/python

import os

for fich in os.listdir('.'):
    if fich.endswith(".pdf"):
        # Quote the filenames so paths containing spaces survive the shell
        os.system('ps2pdf -dPDFSETTINGS=/ebook "%s" "reduc/%s"' % (fich, fich))
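
The same batch job as a shell one-liner, if you'd rather skip Python (make sure the reduc/ directory exists first):

mkdir -p reduc
for f in *.pdf; do ps2pdf -dPDFSETTINGS=/ebook "$f" "reduc/$f"; done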
Javier Rivera
  • 35,153
21

Best for me was

convert -compress Zip -density 150x150 input.pdf output.pdf

Other ways:

### gs
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf $INPUTFILE

### pdf2ps && ps2pdf
pdf2ps input.pdf output.ps && ps2pdf output.ps output.pdf

### Webservice
http://compress.smallpdf.com/de


oxidworks
  • 339
17

For my other, pdfsizeopt-based answer, see here.

Referencing this answer and this answer, and after trying a bunch of the answers here and doing a bunch of research and experimenting, I've come up with the following. Note that I've removed the -dCompatibilityLevel=1.4 part of the command used in some other answers here (including the most-upvoted answer), because this table indicates that 1.5 or 1.7 is automatically used for this setting today (27 Dec. 2020), and there's no need to override those values.

Use Ghostscript (gs) to compress input.pdf into output.pdf

3 Main levels of compression:
Note: you may also add -dQUIET to suppress all output to stdout. See: https://ghostscript.readthedocs.io/en/latest/Use.html.

  1. Low compression: 300 dpi (large file size)
    gs -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dNOPAUSE -dBATCH \
    -sOutputFile=output.pdf input.pdf
    
  2. [BEST in my testing] Medium compression (recommended): 150 dpi (medium file size)
    gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook   -dNOPAUSE -dBATCH \
    -sOutputFile=output.pdf input.pdf
    
  3. High compression: 72 dpi (small file size; may produce grainy or unreadable results in some cases, so give it a try and judge for yourself)
    gs -sDEVICE=pdfwrite -dPDFSETTINGS=/screen  -dNOPAUSE -dBATCH \
    -sOutputFile=output.pdf input.pdf
    

You can also add time in front of the command to see how long it takes (this works with any Linux command). Sample output:

$ time gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -sOutputFile=out.pdf in.pdf
GPL Ghostscript 9.50 (2019-10-15)
Copyright (C) 2019 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 15.
Page 1
Loading NimbusSans-Regular font from /usr/share/ghostscript/9.50/Resource/Font/NimbusSans-Regular... 5205104 3852122 2872760 1487237 3 done.
Page 2
Page 3
Page 4
Page 5
Page 6
Page 7
Page 8
Page 9
Page 10
Page 11
Page 12
Page 13
Page 14
Page 15

real    0m1.326s
user    0m1.142s
sys     0m0.048s

If you add -dQUIET to the command, none of the Ghostscript output is shown, and you get this (when using time in front):

$ time gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -dQUIET -sOutputFile=out.pdf in.pdf

real    0m1.018s
user    0m0.976s
sys     0m0.040s

You can also use ps2pdf, which is a wrapper around gs, and produces very similar, but not exactly identical, results. I prefer to just use gs directly, as shown above, however.

  1. Low compression: 300 dpi (large file size)
    ps2pdf -dPDFSETTINGS=/printer input.pdf output.pdf
    
  2. Medium compression (recommended): 150 dpi (medium file size)
    ps2pdf -dPDFSETTINGS=/ebook   input.pdf output.pdf
    
  3. High compression: 72 dpi (small file size; may produce grainy or unreadable results in some cases, so give it a try and judge for yourself)
    ps2pdf -dPDFSETTINGS=/screen  input.pdf output.pdf
    

PDF Compression Tests

Testing the gs commands above on output from my pdf2searchablepdf script here, I see the following:

  1. Low compression: has no significant effect since my script already produces 300 dpi output PDFs. So, a 3.8 MB 3 pg input PDF results in an output PDF of ~3.8 MB.
  2. [BEST] Medium compression: compresses the file nicely! A 3.8 MB 3 pg input PDF results in an output PDF of ~0.95 MB.
  3. High compression: may be too much. A 3.8 MB 3 pg input PDF results in an output PDF of ~0.37 MB, BUT in my particular test it is completely unreadable, since the input PDF was already of somewhat poor resolution to begin with. If you begin with a high-quality/high-resolution input PDF, you may have much better, readable results.

Ghostscript (gs) Documentation:

For all -d ("define") PDFSETTINGS available, see here: https://ghostscript.readthedocs.io/en/latest/VectorDevices.html#controls-and-features-specific-to-postscript-and-pdf-input. I have quoted that section below, except that I've added the DPI values for each setting in bold, as taken from this table here. You can refer to that table to see the dozens of lower-level settings chosen by gs for each PDFSETTINGS option.

Controls and features specific to PostScript and PDF input

-dPDFSETTINGS=configuration

Presets the "distiller parameters" to one of four predefined settings:

  • /screen (72 dpi) selects low-resolution output similar to the Acrobat Distiller (up to version X) "Screen Optimized" setting.
  • /ebook (150 dpi) selects medium-resolution output similar to the Acrobat Distiller (up to version X) "eBook" setting.
  • /printer (300 dpi) selects output similar to the Acrobat Distiller "Print Optimized" (up to version X) setting.
  • /prepress (300 dpi) selects output similar to Acrobat Distiller "Prepress Optimized" (up to version X) setting.
  • /default (72 dpi) selects output intended to be useful across a wide variety of uses, possibly at the expense of a larger output file.

You can also see definitions for various options on this page: https://ghostscript.readthedocs.io/en/latest/Use.html:

-dNOPAUSE
Disables the prompt and pause at the end of each page. Normally one should use this (along with -dBATCH) when producing output on a printer or to a file; it also may be desirable for applications where another program is "driving" Ghostscript.

-dBATCH
Causes Ghostscript to exit after processing all files named on the command line, rather than going into an interactive loop reading PostScript commands. Equivalent to putting -c quit at the end of the command line.

-dQUIET
Suppresses routine information comments on standard output. This is currently necessary when redirecting device output to standard output.

13

I strongly recommend pdfsizeopt.

It is much more efficient in terms of size reduction than any of the previous CLI and GUI software that I have tried (including convert, gs, pdftk, etc.), although possibly slower with pngout activated, and it does not have some of their issues (no heavily pixelated/degraded images, no loss of metadata such as the table of contents, etc.).

Now, if you need to attain a certain size whatever the consequences (including degrading images to the point of unreadability), it might not be the tool you need. But as an always-working go-to solution for reducing unnecessarily big PDFs without losing readability, information, or acceptable image quality, I think it is the best option. (Note: I tend to use it after having first done a vectorization-OCR in Adobe Acrobat [the function used to be called "ClearScan"], which can have a dramatic size impact on some scanned text documents.)


I recommend the generic Unix install:

  1. Install all required dependencies:
  2. Download and install the executable:
    curl -L -o pdfsizeopt.single https://raw.githubusercontent.com/pts/pdfsizeopt/master/pdfsizeopt.single
    chmod +x pdfsizeopt.single
    cp pdfsizeopt.single /usr/local/bin/pdfsizeopt
    

Usage:

pdfsizeopt original.pdf [compressed.pdf]
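
Since it takes one input and an optional output name, batch use is a simple loop; a sketch assuming bash, writing *_opt.pdf files next to the originals:

for f in *.pdf; do
    pdfsizeopt "$f" "${f%.pdf}_opt.pdf"
done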

Note: for Mac users finding this post (or Linuxbrew users), there is a Homebrew install formula:

brew install --HEAD pts/utils/pdfsizeopt
iNyar
  • 231
  • 2
    what a MARVELOUS GEM of software! Thank you very much for recommending this :-) – luca76 Nov 13 '19 at 13:39
  • No luck. Running pdfsizeopt on a 3.8 MB 3pg 300 DPI output PDF file from my pdf2searchablepdf script, the size remained 3.8 MB (it got smaller by a few KB is all). – Gabriel Staples Dec 27 '20 at 03:51
  • 1
    @GabrielStaples: pdfsizeopt will not always reduce filesize significantly. If that is what I'm after (strongest reduction), I use other software (e.g., PDF Squeezer) that reduces image quality more drastically. pdfsizeopt is my default CLI solution for batch PDF resizing. – iNyar Mar 04 '21 at 18:33
  • 1
    For anyone struggling to install the jbig2, pngout, and sam2p dependencies, I've detailed full installation instructions for those in my answer here. – Gabriel Staples May 02 '23 at 23:27
6

For me the gs /screen option was too low quality, and the /ebook one too big.

My original document contained text as colour and black and white images (depending on the page).

The best solution I came up with was:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
   -dDownsampleColorImages=true -dDownsampleGrayImages=true -dDownsampleMonoImages=true \
   -dColorImageResolution=130 -dGrayImageResolution=130 -dMonoImageResolution=130 \
   -r130 -dNOPAUSE -dBATCH -sOutputFile=output_lr.pdf input.pdf

Note that the compression level is not linear: when I specified 135 it didn't compress at all; I found 130 to be (in my case) the maximum resolution that achieved any compression.
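
As the comment below explains, the cliff comes from the downsample threshold, which defaults to 1.5. If you want downsampling to kick in for smaller reductions too, the threshold itself can be lowered; a sketch for color images only (matching Gray/Mono options exist as well):

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
   -dDownsampleColorImages=true -dColorImageResolution=135 \
   -dColorImageDownsampleThreshold=1.0 \
   -dNOPAUSE -dBATCH -sOutputFile=output_lr.pdf input.pdf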

Antonello
  • 735
  • 1
    The fact that specifying a target resolution of 135 didn’t reduce the file size is likely because of the ColorImageDownsampleThreshold option (and the other two for gray and monochrome), which defaults to 1.5 and tells ghostscript not to reduce the resolution of images whose resolution is not at least 1.5 times the target resolution. Your PDF probably contained images at 200 dpi, for which a target resolution of 135 dpi is a 1.48x decrease, but 130 dpi a 1.54x decrease.

    – Olivier 'Ölbaum' Scherler May 26 '23 at 06:47
6

I just encountered this problem myself. If you're using Simple Scan, select text mode for low-resolution scans and you won't need to worry about the command-line stuff. Just saying.

user179584
  • 77
  • 1
  • 1
  • 1
    This is the single answer in this thread that solved my problem. I had downplayed Simple Scan, but it really was the answer for me, instead of fighting against XSane in what seemed to be endless agony. – versvs Aug 31 '15 at 16:03
6

Control the compression quality:

#!/bin/sh
# Usage: <this script> input.pdf output.pdf
INPUT=$1; shift
OUTPUT=$1; shift
GS_BIN=/usr/bin/gs
QFACTOR="0.40"

# Image Compression Quality
#
# Quality HSamples VSamples QFactor
# Minimum [2 1 1 2] [2 1 1 2] 2.40
# Low     [2 1 1 2] [2 1 1 2] 1.30
# Medium  [2 1 1 2] [2 1 1 2] 0.76
# High    [1 1 1 1] [1 1 1 1] 0.40
# Maximum [1 1 1 1] [1 1 1 1] 0.15

# Note: the distiller parameters are /HSamples and /VSamples (plural),
# as pointed out in the comments.
${GS_BIN} -dBATCH -dSAFER -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=${OUTPUT} \
    -c "<< /ColorImageDict << /QFactor ${QFACTOR} /Blend 1 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> >> setdistillerparams" \
    -f ${INPUT}
muru
  • 197,895
  • 55
  • 485
  • 740
  • ...so both INPUT and OUTPUT are the same argument? You might want to add usage guidelines. – mikewhatever Apr 02 '16 at 11:46
  • 2
    Note the shift. First parameter is input file, second is the output file and rest of the parameters will be passed to gs as is. – Mikko Rantalainen May 13 '16 at 12:53
  • I think you want /HSamples and /VSamples, not /HSample and /VSample. See e.g. ps2pdf docs or the PostScript language reference manual. Also perhaps worth noting that the allowed QFactor range is 0 to 1,000,000 and lower values produce higher quality. – Pont Aug 02 '17 at 08:31
5

Since this link was the first hit for me when I searched in Google, I thought I'd add one more possibility. None of the above solutions worked for me on a pdf exported from Inkscape (15 MB), but I was at last able to shrink it down to 1 MB by opening it in GIMP and exporting it as pdf again.

Another option that came close (but text was a little fuzzy) was ImageMagick's convert utility:

convert -compress Zip input.pdf output.pdf
mbroshi
  • 158
  • 1
  • 3
  • 1
    I guess this is what you meant by "a little fuzzy", but just to clarify, convert -compress Zip appeared to rasterise all vectors. – Sparhawk Feb 22 '15 at 03:39
5

I was facing the same problem, and was glad to find this thread. Specifically, I had a pdf generated from scanned images and needed to reduce its byte size by a factor of 6.

Unfortunately, none of the solutions above worked :(. Then I realized that somewhere in the scanner -> jpeg -> pdf process the page size had gotten bloated by a factor of approximately 4. The documents I scanned were all Letter sized, but the pdf reported a print size (in inches) of

identify -verbose doc_orig.pdf | grep "Print size"
 Print size: 35.4167x48.7222

I finally got the desired results with a convert command that did both the resizing and the compression in one step:

convert -density 135x135 -quality 70 -compress jpeg -resize 22.588% doc_orig.pdf doc_lowres.pdf

Note that doc_orig had density of 72x72 dpi.
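
The resize percentage is just the ratio of the target width to the reported print-size width: 35.4167 in x 22.588% is about 8 in, slightly under the 8.5 in Letter width. A quick way to compute the factor for a full Letter width, assuming bc is installed:

echo "scale=3; 8.5 / 35.4167 * 100" | bc   # prints 23.900, i.e. resize to ~24%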

Kalpit
  • 59
  • 1
  • 1
  • Your answer is a life saver, Kalpit. I was faced with the same problem, and nothing else was even making a dent in the file size. By resizing my pages, I went from 40MB to 2MB. Hurray! – Nicolas Payette Sep 12 '20 at 15:38
4

In the end I wrote my own bash script to solve this; it uses mogrify, convert and gs to extract pdf pages as png, resize them, convert them to 1-bit bmp and then rebuild them as pdf. The file size reduction can be over 90%. Available at http://www.timedicer.co.uk/programs/help/pdf-compress.sh.php.
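
A minimal sketch of that idea (the real script at the link does much more; pnggray, the resolution, and the 50% resize are illustrative choices):

gs -sDEVICE=pnggray -r150 -o page_%03d.png input.pdf   # extract pages as grayscale PNGs
mogrify -resize 50% -monochrome page_*.png             # resize and threshold to 1-bit
convert page_*.png output.pdf                          # rebuild the pages as a PDF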

scoobydoo
  • 152
3

If converting to djvu would also be ok and if no colors are involved, you could try the following:

Convert the pdf to jpg files using pdfimages -j

If you get pbm files instead, you should do the intermediate step:

for FILENAME in *.pbm; do convert "$FILENAME" "${FILENAME%.*}.jpg"; done

The convert command is from the imagemagick package.

Then use scantailor to make tifs out of it.

In a last step, go to scantailor's output directory (where the tifs are located) and apply djvubind to that directory.

This should reduce the file size drastically without much quality loss in the text. If you want finer control over the OCR backend, you may try djvubind --no-ocr and use ocrodjvu to add the OCR layer afterwards.

If you have colors in your document, things get a bit more complicated. Instead of djvubind you could use didjvu, and in scantailor you have to change to mixed mode and sometimes select color images manually.

student
  • 2,312
2

You can try this :

$ time pdftk myFile.pdf output myFile__SMALLER.pdf compress
GC Warning: Repeated allocation of very large block (appr. size 16764928):
    May lead to memory leak and poor performance.
GC Warning: Repeated allocation of very large block (appr. size 8384512):
    May lead to memory leak and poor performance.
GC Warning: Repeated allocation of very large block (appr. size 11837440):
    May lead to memory leak and poor performance.
GC Warning: Repeated allocation of very large block (appr. size 8384512):
    May lead to memory leak and poor performance.
GC Warning: Repeated allocation of very large block (appr. size 33525760):
    May lead to memory leak and poor performance.
GC Warning: Repeated allocation of very large block (appr. size 7254016):
    May lead to memory leak and poor performance.
GC Warning: Repeated allocation of very large block (appr. size 34041856):
    May lead to memory leak and poor performance.
GC Warning: Repeated allocation of very large block (appr. size 33525760):
    May lead to memory leak and poor performance.

real    0m23.677s
user    0m23.142s
sys     0m0.540s
$ du myFile*.pdf
108M    myFile.pdf
74M     myFile__SMALLER.pdf

It is faster than gs, but it only compresses by up to 30% in this case, for a 107.5 MiB input file.

SebMa
  • 2,291
2

I use this zsh function for compressing scanned documents:

pdf-compress-gray () {
    local input="${1}"
    local out="${2:-${input:r}_cg.pdf}"
    local dpi="${pdf_compress_gray_dpi:-90}"
    gs -q -dNOPAUSE -dBATCH -dSAFER \
        -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray \
        -dDownsampleColorImages=true -dOverrideICC \
        -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen \
        -dColorImageDownsampleType=/Bicubic -dColorImageResolution=$dpi \
        -dGrayImageDownsampleType=/Bicubic -dGrayImageResolution=$dpi \
        -dMonoImageDownsampleType=/Bicubic -dMonoImageResolution=$dpi \
        -sOutputFile="$out" "$input"
}

Usage:

[pdf_compress_gray_dpi=100] pdf-compress-gray input.pdf [output.pdf]
HappyFace
  • 325
2

I normally simply use

gs -dQUIET -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dPDFSETTINGS=/printer \
   -sOutputFile=output.pdf input.pdf

I went through many questions on how to reduce the size of a pdf on Ask Ubuntu, Stack Overflow and Unix & Linux SE, and I wondered what all those options proposed in the answers meant.

Some are Interaction-related parameters:

-dQUIET
-dBATCH
-dNOPAUSE

Some are Device and output selection parameters:

-sDEVICE
-sOutputFile

Some are Common controls and features specific to device PDFWRITE:

-r<resolution>
-dCompressFonts

This important one presets the "Distiller Parameters", Adobe's documented parameters for controlling the conversion into PDF, to one of four predefined settings (screen, ebook, printer, prepress)

-dPDFSETTINGS

All the ones below are automatically preset according to -dPDFSETTINGS, as per this table. A command suggested by Kurt Pfeifle can be used to check these values (see the sketch after this list). You can fine-tune them if you want:

-dCompatibilityLevel
-dAutoRotatePages
-dEmbedAllFonts
-dSubsetFonts
-sColorConversionStrategy
-dDownsampleColorImages
-dDownsampleGrayImages
-dDownsampleMonoImages
-dColorImageResolution
-dGrayImageResolution
-dMonoImageResolution
-dColorImageDownsampleType
-dGrayImageDownsampleType
-dMonoImageDownsampleType
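
The command in question queries Ghostscript's built-in .distillersettings dictionary; a sketch for the /ebook preset (this pokes at PostScript internals, so it may vary between releases):

gs -q -dNODISPLAY -c ".distillersettings /ebook get {exch ==only ( ) print ==} forall quit"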
Glorfindel
  • 971
  • 3
  • 13
  • 20
toliveira
  • 349
2

Load the image or even the pdf file into Inkscape.

From Inkscape, save in a vector format (the native .svg).

Import the vector files into Scribus, edit the layout, and export/save as .pdf from there.

ape
  • 29
  • 1
1

pdfsizeopt full installation instructions

For my other, gs-based answer, see here.

For anyone trying to follow @iNyar's answer to install and try out the pdfsizeopt tool, installing the dependencies is tricky. If you run the tool without installing all dependencies, pdfsizeopt won't run. Here are the errors I got when trying to run it without jbig2, pngout, and sam2p installed:

$ ./pdfsizeopt in.pdf out.pdf
info: This is pdfsizeopt ZIP rUNKNOWN size=69856.
info: prepending to PATH: /home/gabriel/GS/Jobs/Edge Autonomy/Onboarding [29 May 2023 start date!]
error: image optimizer not found on PATH: jbig2
error: image optimizer not found on PATH: pngout
error: image optimizer not found on PATH: sam2p
error: image optimizer not found on PATH: sam2p
fatal: not all image optimizers found (see above), ignore with --do-require-image-optimizers=no

So, the solution is to install the jbig2, pngout, and sam2p dependencies manually. Here, therefore, are the full installation instructions for pdfsizeopt:

Tested in Linux Ubuntu 20.04.

# ================================================
# 1. Install `pdfsizeopt` dependencies
# ================================================

--------------------

jbig2:

- https://github.com/agl/jbig2enc

--------------------

install dependencies

sudo apt update
sudo apt install libleptonica-dev

git clone https://github.com/agl/jbig2enc.git
cd jbig2enc
./autogen.sh
./configure
time make
sudo make install

ensure it is installed

jbig2 --version

--------------------

pngout

- http://advsys.net/ken/utils.htm#pngout

- http://www.jonof.id.au/kenutils.html

--------------------

download it

wget https://www.jonof.id.au/files/kenutils/pngout-20200115-linux-static.tar.gz

extract it

tar -xf pngout-20200115-linux-static.tar.gz
cd pngout-20200115-linux-static

install it

sudo cp -i amd64/pngout-static /usr/local/bin/pngout

ensure it's installed

pngout

--------------------

sam2p

- https://github.com/pts/sam2p

--------------------

install dependencies

sudo apt install libgif-dev

go here and find the latest release:

https://github.com/pts/sam2p/releases

Use the correct URL from there in the following commands.

download it

wget https://github.com/pts/sam2p/releases/download/v0.49.4/sam2p-0.49.4.tar.gz

extract it

tar -xf sam2p-0.49.4.tar.gz
cd sam2p-0.49.4
./configure
time make
sudo make install

ensure it's installed by checking its version

sam2p --version

# ================================================
# 2. Install pdfsizeopt
# ================================================

curl -L -o pdfsizeopt https://raw.githubusercontent.com/pts/pdfsizeopt/master/pdfsizeopt.single
chmod +x pdfsizeopt
sudo cp -i pdfsizeopt /usr/local/bin/pdfsizeopt

Now use it:

# 1. Check the help menu to ensure it's installed
pdfsizeopt --help 2>&1 | less -RFX

# 2. Use it to optimize in.pdf into out.pdf
pdfsizeopt in.pdf out.pdf

Example run and output:

$ pdfsizeopt in.pdf out.pdf
info: This is pdfsizeopt ZIP rUNKNOWN size=69734.
info: prepending to PATH: /home/gabriel/Downloads/Install_Files/pdfsizeopt/pdfsizeopt/pdfsizeopt_libexec
info: loading PDF from: in.pdf
info: loaded PDF of 1955931 bytes
info: separated to 18 objs + xref + trailer
info: parsed 18 objs
info: found 0 Type1 fonts loaded
info: found 0 Type1C fonts loaded
info: optimized 6 streams, kept 6 zip
info: compressed 0 streams, kept 0 of them uncompressed
info: saving PDF with 18 objs to: out.pdf
info: generated object stream of 523 bytes in 9 objects (21%)
info: generated 1953795 bytes (100%)

Result of running pdfsizeopt above: no change. in.pdf is 2.0 MB, and out.pdf is 2.0 MB. Then again, gs, as described in my other answer, didn't work on this particular PDF either. I don't know why.

Note: for anyone who wants to experiment with this, here is how I created the PDF I'm experimenting on:

  1. Take 3 photos with your phone (Google Pixel 5 in my case). Let Google Photos upload them to the cloud.
  2. Download them into a directory.
  3. Run pdf2searchablepdf -c "path/to/dir" on that dir with the 3 images, to perform OCR on them and combine them into a single PDF. You'll see that
    1. My "compressed" PDF output, using gs under-the-hood, did nothing.
    2. All files are the same size, even though one should be large, medium, small, etc.
  4. Try to compress them with pdfsizeopt and it does nothing for me on these PDFs as well.

References:

  1. @iNyar's answer

  2. ChatGPT helped me a ton in figuring out how to install a lot of this mess, in particular calling sudo apt install libgif-dev prior to running time make to build sam2p, or else I'd get this error:

    g++ -s -O2 -DHAVE_CONFIG2_H   -fsigned-char -fno-rtti -fno-exceptions -ansi -pedantic -Wall -W -Wextra -c gensi.cpp
    g++ -s   sam2p_main.o appliers.o crc32.o in_ps.o in_tga.o in_pnm.o in_bmp.o in_gif.o in_lbm.o in_xpm.o mapping.o in_pcx.o in_jai.o in_png.o in_jpeg.o in_tiff.o rule.o minips.o encoder.o pts_lzw.o pts_fax.o pts_defl.o error.o image.o gensio.o snprintf.o gensi.o -o sam2p
    /usr/bin/ld: appliers.o: in function `out_gif89a_work(GenBuffer::Writable&, Rule::OutputRule*, Image::SampledInfo*)':
    appliers.cpp:(.text+0x2025): undefined reference to `out_gif_write(GenBuffer::Writable&, Image::Indexed*)'
    collect2: error: ld returned 1 exit status
    make: *** [Makedep:66: sam2p] Error 1
    

    All words and commands in this answer are my own, however, and I tested everything in this answer personally.

1

Super simple PDF compress tool: GitHub page.

Installation on Ubuntu:

sudo add-apt-repository ppa:jfswitz/released

sudo apt-get update

sudo apt-get install pdf-compressor

It uses ghostscript.

Raphael
  • 8,035
John
  • 34
  • 2
0

I used the commands below, but they didn't compress my pdf file substantially. Sometimes some portions were blackened after compression.

  1. gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf $INPUTFILE

  2. "ps2pdf -dPDFSETTINGS=/ebook %s %s" % (input_file_path, out_file_path)

After much wandering around the web, I just couldn't find the right compression library. Then I came across pdfcompressor.com. It is just an awesome website: it compresses pdfs by 95% (for 15 MB files). So I used Selenium and Tor to automate the compression. Check out my GitHub repository: https://github.com/gugli28/PdfCompressor

Prince
  • 21