
I am using this command:

wget -nd -e robots=off --wait 0.25 -r -A.pdf http://yourWebsite.net/

but I can't get PDFs from the website.

For example, I have a root domain name:

www.example.com

and this site has PDFs, DOCs, HTML files, etc. I want to download all the PDFs by giving only the root domain name, not the exact address of the download page.

PEDY

2 Answers


The following command should work:

wget -r -A "*.pdf" "http://yourWebsite.net/"

See man wget for more info.
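For reference, a slightly expanded variant of the same idea (the extra options are optional, and the URL is the placeholder from the question, not a real site):

wget -r -np -nd -A "*.pdf" -e robots=off --wait=1 http://yourWebsite.net/

Here -np stops wget from climbing into the parent directory, -nd puts all files in the current directory instead of recreating the site's directory tree, -e robots=off ignores robots.txt, and --wait=1 pauses one second between requests to be gentler on the server.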

Radu Rădeanu
  • @Rădeanu It doesn't work: it gets the HTML page (index.html) and then the process stops. – PEDY May 18 '14 at 13:22
  • 1
    @PEDY the PDFs files must be linked by the index.html file, directly or indirectly, for wget to be able to find them. If they are just on the server, served by some script or dynamic php thing, wget will not be able to find them. The same problem happen if you want your PDF files searched by Google or similar thing; we used to have hidden pages with all the files statically linked to allow this... – Rmano May 18 '14 at 15:49
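One rough way to check what wget can actually discover is a spider run, which crawls links without keeping the files; the URL is again the placeholder from the question, and the grep is just a quick filter on wget's log output:

wget --spider -r -l 2 http://yourWebsite.net/ 2>&1 | grep -o 'http[^ ]*\.pdf'

If no .pdf URLs show up here, the files are not reachable through static links, and recursive wget will not fetch them.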

In case the above doesn't work, try this (replace the URL):

lynx -listonly -dump http://www.philipkdickfans.com/resources/journals/pkd-otaku/ | grep pdf | awk '/^[ ]*[1-9][0-9]*\./{sub("^ [^.]*.[ ]*","",$0); print;}' | xargs -L1 -I {} wget {} 

You might need to install lynx:

sudo apt install lynx
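If your lynx build supports the -nonumbers option, the pipeline above can be shortened a bit, since the link list then comes out as bare URLs and the awk step is no longer needed (same placeholder URL as above):

lynx -listonly -nonumbers -dump http://www.philipkdickfans.com/resources/journals/pkd-otaku/ | grep -i '\.pdf$' | xargs -n 1 wget

Note that this only grabs PDFs linked from that single page; unlike wget -r it does not follow links into subpages.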