I have a PDF file that is the result of scanning a book.

In this file, two pages of the book correspond to one page of the PDF, so when I look at a page of the PDF I'm actually seeing two pages of the book.

I would like to know if there's any way to convert this file into another PDF where one page of the book corresponds to one page of the PDF, i.e. the normal situation.

fossfreedom
JGNog

9 Answers

You can use mutool, a MuPDF command-line tool (sudo apt-get install mupdf-tools):

mutool poster -x 2 input.pdf output.pdf

Here -x 2 cuts each PDF page into two halves side by side (left and right). Use -y 2 instead if the two book pages sit one above the other (top and bottom).
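A minimal sketch that wraps this command, plus the mutool clean page-range trick from the comments (e.g. 2-N to drop a blank first page), into one shell function. The function name split_spreads and its FIRST_PAGE argument are hypothetical; it assumes bash, GNU mktemp and an installed mupdf-tools package.

```shell
#!/usr/bin/env bash
# split_spreads INPUT.pdf OUTPUT.pdf [FIRST_PAGE]
# Hypothetical helper: splits every page into a left and a right half with
# `mutool poster -x 2`. If FIRST_PAGE is greater than 1, the split result is
# passed through `mutool clean` so that only pages FIRST_PAGE..N are kept.
split_spreads() {
  local input=$1 output=$2 first=${3:-1}
  command -v mutool >/dev/null ||
    { echo "mutool not found (install mupdf-tools)" >&2; return 1; }

  if [ "$first" -le 1 ]; then
    mutool poster -x 2 "$input" "$output"
    return
  fi

  # Split into a temporary file first, then copy only the wanted page range.
  local tmp status
  tmp=$(mktemp --suffix=.pdf) || return 1
  mutool poster -x 2 "$input" "$tmp" &&
    mutool clean "$tmp" "$output" "$first-N"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, split_spreads input.pdf output.pdf 2 splits every page and then keeps pages 2 through the last one of the split result.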

Peque
  • In Ubuntu 16.04 the package name is mupdf-tools (so: sudo apt-get install mupdf-tools). – franzlorenzon Nov 03 '16 at 09:49
  • Did the job very well and very fast! Unfortunately I couldn't find a way to use it to remove the first, empty page from the output PDF. – Martin Scharrer Nov 18 '18 at 19:32
  • @MartinScharrer mutool clean input.pdf output.pdf 2-N – Peque Jul 01 '19 at 08:21
  • mutool poster only virtually crops the image: each output page still contains both book pages, but one is hidden. This can cause problems later when using other tools for OCR. gscan2pdf showed both pages and a bunch of programs crashed. – whitis Jan 12 '20 at 02:06
  • @whitis - That doesn't make much sense. The output PDF is the same size as the input; if each output page really contained both halves of the uncut input page, the output should be twice as large. – cipricus Jan 04 '22 at 08:14
  • The problem with this command is that when the dividing line between the two halves is not in the same place on every scanned page, it may cut some pages in the wrong place. Scantailor (mentioned in another answer) can "see" the real separation between the two pages, and it also has more detailed options. – cipricus Jan 04 '22 at 08:36
  • +1, thank you for your answer. I would like to ask you a question: the user wants to split along the vertical axis. Could you explain why you suggest a horizontal split? – F. Zer Jan 17 '22 at 20:01
  • @F.Zer I guess it all depends on your reference. Vertical split could mean splitting the input so that the output has one page above the other (vertically stacked). Or it could mean splitting with a vertical line (horizontally stacked). Anyway, you can use -x or -y depending on your use case. ^^ – Peque Jan 17 '22 at 20:27
  • Thank you, @Peque! – F. Zer Jan 17 '22 at 21:27

Try gscan2pdf, which you can install from the Software Centre or from the command line with sudo apt-get install gscan2pdf.

Open gscan2pdf, then:

  1. File > Import your PDF file;

    Now you have a single page (see the left column).

  2. Tools > Clean up;

  3. select double as the layout and set #output pages to 2, then click OK;

  4. gscan2pdf splits your document (among other things, it also cleans it up and deskews it). Now you have two pages.

  5. Save your PDF file if you're satisfied with the result.
neydroydrec
  • I've been looking for an easier way to use unpaper without having to produce ppm files and this is it. Very helpful answer. – To Do Jan 26 '12 at 09:53
  • For future readers: this doesn't do what you want with non-image PDFs; only the images are imported. gscan2pdf looks great for scanning, though :). – Andrew Aylett Nov 13 '12 at 22:00
  • gscan2pdf v2.13.2 no longer has an Import option. You simply use File > Open and then Tools > Cleanup. – helcim Jan 14 '24 at 21:59

I would use Briss. It lets you select several regions on each page and turns each region into a new page.

frabjous
  • I accepted the answer from Benjamin and not yours simply because Briss is not mature yet. I tried Briss and it looks good, but gscan2pdf installation is much quicker and cleaner.

    Thank you for your contribution, anyway!

    – JGNog Sep 05 '11 at 23:04
  • I've been using Briss for over a year now. It seems reasonably mature to me. – frabjous Sep 05 '11 at 23:12
  • I've been using Briss for years and it's great. It is currently maintained as Briss-2.0 at https://github.com/mbaeuerle/Briss-2.0, and I highly recommend it. (Note that when you load your file, you only have one box per page. You can resize that box and create a second one by simply clicking on the page.) – mikemtnbikes Aug 27 '21 at 17:06

Another option is ScanTailor. This program is particularly well suited to processing several scans at a time.

sudo apt-get install scantailor

Unfortunately it only accepts image files as input, but it's simple enough to convert a scanned PDF to images. Here's a one-liner I used to convert a whole directory of PDFs into PNG images at 300 dpi. If a PDF has n pages, it makes n PNG files.

for f in ./*.pdf; do gs -q -dSAFER -dBATCH -dNOPAUSE -r300 -dGraphicsAlphaBits=4 -dTextAlphaBits=4 -sDEVICE=png16m "-sOutputFile=$f%02d.png" "$f" -c quit; done;

I had screenshots ready to share, but I don't have enough rep to post them.

ScanTailor outputs TIFF files, so if you want the result back as a PDF, you can use this to make one PDF per page:

for f in ./*.tif; do tiff2pdf "$f" -o "$f".pdf -p letter -F; done;

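Then you can use this one-liner, or an application like PDFShuffler, to merge any or all of the page PDFs into one PDF. The *.tif.pdf pattern matches only the files made by the previous command, so the original scanned PDFs are not merged in by accident:

gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf *.tif.pdf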
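As a swapped-in alternative to the per-page tiff2pdf plus Ghostscript merge, libtiff's own tools can do the job in two steps: tiffcp concatenates the TIFFs into one multi-page TIFF, and tiff2pdf writes one PDF page per TIFF page. This is a sketch assuming bash, GNU mktemp and the libtiff-tools package; the function name tifs_to_pdf is hypothetical.

```shell
#!/usr/bin/env bash
# tifs_to_pdf OUTPUT.pdf FILE.tif...
# Hypothetical helper: joins single-page TIFFs into one multi-page TIFF with
# libtiff's tiffcp, then converts it with tiff2pdf, which writes one PDF page
# per TIFF page. -p letter -F are the same tiff2pdf options used above.
tifs_to_pdf() {
  if [ "$#" -lt 2 ]; then
    echo "usage: tifs_to_pdf OUTPUT.pdf FILE.tif..." >&2
    return 2
  fi
  local output=$1 tmp status
  shift
  tmp=$(mktemp --suffix=.tif) || return 1
  # tiffcp takes the source files first and the destination file last.
  tiffcp "$@" "$tmp" &&
    tiff2pdf -o "$output" -p letter -F "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, tifs_to_pdf book.pdf out/*.tif, where out stands for whatever directory ScanTailor wrote to. Pages end up in the shell's alphabetical glob order, so make sure the file names sort in page order.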

Curtis
  • This is a very good solution when detailed settings are needed (e.g. when the split between pages varies or is not exactly in the middle). More on Scantailor in this answer and linked ones. To extract pages as jpg, pdftoppm is best (see details in links). – cipricus Jan 04 '22 at 08:30
  • Scantailor is amazing! But unfortunately not available in Ubuntu 22.04+ :-( – starbeamrainbowlabs Apr 02 '23 at 17:28

A command line solution using ImageMagick:

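  1. Render each page of the PDF to a PNG image, here at 300 dpi resolution (%03d zero-pads the page numbers, so the files sort in page order even past page 10):

     convert -density 300 orig.pdf page-%03d.png
    
  2. Split each page image into a left and a right image (+repage discards the offset the crop leaves behind, so each half is a normal standalone image):

     for file in page-*.png;
       do convert "$file" -crop 50%x100% +repage "${file%.png}-split.png";
     done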
    
  3. Rename the page-###-split-#.png files to just 001.png, 002.png etc.:

     ls page-*-split-*.png | cat -n | 
       while read n f; do mv "$f" $(printf "%03d.png" $n); done
    
  4. Combine the resulting page images into a PDF again:

     convert [0-9][0-9][0-9].png result.pdf
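The four steps can be wrapped in one function that works in a scratch directory, so leftover page-*.png files from an earlier run cannot end up in the result. This is a minimal sketch assuming bash, GNU mktemp and ImageMagick 6's convert with Ghostscript available for reading PDFs (some distributions' ImageMagick security policy blocks PDF input until it is relaxed); the function name split_two_up_pdf is hypothetical.

```shell
#!/usr/bin/env bash
# split_two_up_pdf INPUT.pdf OUTPUT.pdf [DPI]
# Hypothetical helper that runs the four ImageMagick steps inside a
# scratch directory and removes it afterwards.
split_two_up_pdf() {
  local input=$1 output=$2 dpi=${3:-300}
  local work f n=0 status
  work=$(mktemp -d) || return 1

  # 1. Render every PDF page to a zero-padded PNG: page-000.png, page-001.png, ...
  convert -density "$dpi" "$input" "$work/page-%03d.png" ||
    { rm -rf "$work"; return 1; }

  # 2. Cut each page into a left (-split-0) and a right (-split-1) half;
  #    +repage drops the offset the crop leaves on each tile.
  for f in "$work"/page-*.png; do
    convert "$f" -crop 50%x100% +repage "${f%.png}-split-%d.png" ||
      { rm -rf "$work"; return 1; }
  done

  # 3. The glob sorts page-000-split-0, page-000-split-1, page-001-split-0, ...
  #    so numbering the halves in that order keeps left-to-right reading order.
  for f in "$work"/page-*-split-*.png; do
    n=$((n + 1))
    mv "$f" "$work/$(printf '%04d' "$n").png"
  done

  # 4. Reassemble all halves into a single PDF.
  convert "$work"/[0-9][0-9][0-9][0-9].png "$output"
  status=$?
  rm -rf "$work"
  return "$status"
}
```

Usage: split_two_up_pdf orig.pdf result.pdf 300. The four-digit numbering in step 3 allows up to 9999 book pages.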
    

Sources, variations and further tips:

tanius

Here is a Python script for this:

https://gist.github.com/tshrinivasan/23d8e4986cbae49b8a8c

Sejda can do this using either its web interface or its open-source command-line interface. The task is called splitdownthemiddle.

You could open the PDF in Okular or any PDF reader, use Print to File, and under Options > Copies > Pages select the pages you are interested in, then print. It will extract the selected pages. Simple and easy!

Knight71

There is a wonderful program, ScanKromsator. It is free and works quite well through Wine. More information here.

oromay