
Is there any way to convert a web page and its sub pages into one PDF file?

Tarek (edited by pa4080)
    Please [edit] your question to add some details of exactly what you want. Your comments on pa4080's answer suggest you have some specific requirements that aren't clear from the question. – Zanna Aug 03 '17 at 20:12
  • Sorry for my English. I have PHP files that represent the pages of a website, grouped in various subdirectories. I would like to create a single PDF containing the text of all the files, formatted as if they were displayed in the browser. – Tarek Aug 03 '17 at 20:19

1 Answer


Save a list of Web pages as PDF file

  • First install the wkhtmltopdf conversion tool (note that it requires a desktop environment; source):

    sudo apt install wkhtmltopdf 
    
  • Then create a file that contains a list of URLs of the target web pages, one per line. Let's call this file url-list.txt and place it in ~/Downloads/PDF/. For example, its content could be:

    https://askubuntu.com/users/721082/tarek
    https://askubuntu.com/users/566421/pa4080
    
  • Then run the following command, which will generate a PDF file for each URL, placed in the directory where the command is executed:

    while read -r i; do wkhtmltopdf "$i" "$(echo "$i" | sed -e 's/https\?:\/\///' -e 's/\//-/g').pdf"; done < ~/Downloads/PDF/url-list.txt
    

    The result of this command, executed within the directory ~/Downloads/PDF/, is:

    ~/Downloads/PDF/$ ls -1 *.pdf
    askubuntu.com-users-566421-pa4080.pdf
    askubuntu.com-users-721082-tarek.pdf
    
  • Merge the output files with the following command, executed in the same directory (source):

    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=merged-output.pdf $(ls -1 *.pdf)
    

    The result is:

    ~/Downloads/PDF/$ ls -1 *.pdf
    askubuntu.com-users-566421-pa4080.pdf
    askubuntu.com-users-721082-tarek.pdf
    merged-output.pdf
    
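    The filename in the loop above is derived from the URL by stripping the http(s):// scheme and replacing the remaining slashes with hyphens. A minimal sketch of just that transformation, runnable on its own:

    ```shell
    #!/bin/sh
    # Derive a PDF filename from a URL, as the while-read loop does:
    # drop the http(s):// scheme, then turn the remaining slashes into hyphens.
    url="https://askubuntu.com/users/721082/tarek"
    fname="$(echo "$url" | sed -e 's/https\?:\/\///' -e 's/\//-/g').pdf"
    echo "$fname"   # askubuntu.com-users-721082-tarek.pdf
    ```

    Note that gs receives the files in the order ls -1 prints them (alphabetical), so rename the generated PDFs first if you need a specific page order in merged-output.pdf.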

Save an entire Website as PDF file

  • First we must create a file (url-list.txt) that contains a URL map of the site. Run these commands (source):

    TARGET_SITE="https://www.yahoo.com/"
    wget --spider --force-html -r -l2 "$TARGET_SITE" 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\)$' > url-list.txt
    
  • Then go through the steps in the section above.
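  • Each download line in wget's log starts with --, and the URL is the third whitespace-separated field, which is what the grep/awk pair extracts. The same page is often visited more than once during a crawl, so piping through sort -u before writing url-list.txt avoids converting a page twice. A self-contained sketch on a simulated log fragment (the URLs are made up):

    ```shell
    #!/bin/sh
    # Simulated fragment of `wget --spider` output; a real log is much noisier.
    log='--2017-08-04 10:00:00--  https://www.example.com/
    --2017-08-04 10:00:01--  https://www.example.com/about
    --2017-08-04 10:00:02--  https://www.example.com/'

    # Keep only the download lines, take the URL field, and de-duplicate.
    echo "$log" | grep -- '--  http' | awk '{ print $3 }' | sort -u > url-list.txt
    cat url-list.txt
    ```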

Create a script that saves an entire website as a PDF file (recursively)

  • To automate the process, we can bring everything together in a script file.

  • Create an executable file, called site-to-pdf.sh:

    mkdir -p ~/Downloads/PDF/
    touch ~/Downloads/PDF/site-to-pdf.sh
    chmod +x ~/Downloads/PDF/site-to-pdf.sh
    nano ~/Downloads/PDF/site-to-pdf.sh
    
  • The script content is:

    #!/bin/sh
    TARGET_SITE="$1"
    wget --spider --force-html -r -l2 "$TARGET_SITE" 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\|txt\)$' > url-list.txt
    while read -r i; do wkhtmltopdf "$i" "$(echo "$i" | sed -e 's/https\?:\/\///' -e 's/\//-/g').pdf"; done < url-list.txt
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=merged-output.pdf $(ls -1 *.pdf)
    

    Copy the above content into nano, then press Shift+Insert to paste, Ctrl+O and Enter to save, and Ctrl+X to exit.

  • Usage:

    ~/Downloads/PDF/site-to-pdf.sh <target-site-url>


The answer to the original question:

Convert multiple PHP files to one PDF (recursively)

  • First install the package enscript, which converts text files to PostScript (the ps2pdf step below then produces the PDF):

    sudo apt update && sudo apt install enscript
    
  • Then run the following command, which will generate a file called output.pdf in the directory where it is executed, containing the content of all .php files within /path/to/folder/ and its sub-directories:

    find /path/to/folder/ -type f -name '*.php' -exec printf "\n\n{}\n\n" \; -exec cat "{}" \; | enscript -o - | ps2pdf - output.pdf
    
  • An example from my system, which generated this file:

    find /var/www/wordpress/ -type f -name '*.php' -exec printf "\n\n{}\n\n" \; -exec cat "{}" \; | enscript -o - | ps2pdf - output.pdf
    
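    The find command interleaves two -exec actions per file: printf prints the file's path as a header (GNU find substitutes {} even inside a longer argument), then cat appends its content. A small sketch of just that concatenation stage, on a throwaway directory, with the enscript/ps2pdf stage left off:

    ```shell
    #!/bin/sh
    # Build a throwaway tree with one PHP file in a sub-directory.
    demo="$(mktemp -d)"
    mkdir -p "$demo/sub"
    printf '<?php echo "hello"; ?>\n' > "$demo/sub/index.php"

    # Same header-then-content pattern as above; pipe this output into
    # `enscript -o - | ps2pdf - output.pdf` to get the PDF.
    out="$(find "$demo" -type f -name '*.php' -exec printf '\n\n{}\n\n' \; -exec cat '{}' \;)"
    echo "$out"

    rm -r "$demo"
    ```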
pa4080
  • To display the page as if it were html? – Tarek Aug 03 '17 at 18:57
  • @Tarek, please, be more specific. You mean not the PHP code but the result that you see into the web browser or the HTML output from the PHP code? – pa4080 Aug 03 '17 at 18:58
  • For example, if I download a php page "www .... com / index.php", how do I create a pdf from this view as in the browser and not in PHP code? – Tarek Aug 03 '17 at 19:03
  • @Tarek, you mean that you have saved a web page and you want to convert it to PDF? If so, why not just save it as a PDF? – pa4080 Aug 03 '17 at 19:10
  • Because I need a recursive solution to use for entire sites... – Tarek Aug 03 '17 at 19:33
  • @Tarek, I've updated the answer with a way that allows you to save an entire website as pdf. – pa4080 Aug 04 '17 at 08:00
  • Perfect, just what I needed, thank you for the help. Congratulations you're great! – Tarek Aug 04 '17 at 09:14