163

How can I download files (that are listed in a text file) using wget or some other automatic way?

Sample file list:

www.example.com/1.pdf
www.example.com/2.pdf
www.example.com/3.pdf
Sourav

8 Answers

274

wget has a built-in flag for this: wget -i your_list, where your_list is a file containing URLs delimited by line breaks. You can find this kind of thing by reading man wget.
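As a sketch, using the question's sample URLs as the list file (the actual wget call is commented out here, since it needs network access):

```shell
# Build the list file from the question's sample URLs, one per line.
cat > your_list <<'EOF'
www.example.com/1.pdf
www.example.com/2.pdf
www.example.com/3.pdf
EOF

# A single invocation then fetches every URL in the file:
# wget -i your_list
wc -l < your_list    # 3 URLs queued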

aureianimus
91

Get them in parallel with

cat urlfile | parallel --gnu "wget {}"

By default it will run as many processes as you have cores; if you really want to pull them down quickly, you can probably ramp this up another 10x by adding "-j 20" after parallel.

meawoppl
    I for one just can't get it to work. I don't see any processes spawned, and switching echo for wget doesn't output anything – Jakub Bochenski May 09 '14 at 19:10
  • Some earlier ubuntus have a dumb defect in this space: http://stackoverflow.com/questions/16448887/gnu-parallel-not-working-at-all – meawoppl May 11 '14 at 02:30
  • I posted a change that should fix the above. – meawoppl May 20 '14 at 01:15
  • 2
    Note with the 'it will run as many processes as you have cores' - network bandwidth is likely going to be more of a limiting factor. – Wilf Jun 21 '14 at 17:10
  • 2
    It really depends. For a large number of small files this can be almost an order of magnitude faster, as most of the transfer time is the handshake/TCP round trips. Also, when you are downloading from a number of smaller hosts, sometimes the per-connection bandwidth is limited, so this will bump things up. – meawoppl Jun 23 '14 at 17:22
  • It works, but shows no output to the console. Is it possible to have some visual feedback? – lolmaus - Andrey Mikhaylov Sep 29 '14 at 11:11
  • OMG, after finishing, it spat out ALL the wget output, thousands of lines showing progress. So, the command is extremely useful but has poor output. Is there a way to improve the output? – lolmaus - Andrey Mikhaylov Sep 29 '14 at 16:44
  • Hahaha. That is kinda funny. There is a way to silence wget, try adding the --quiet flag in there? – meawoppl Oct 07 '14 at 02:42
  • Also, I suspect you could redirect the output of wget to /dev/null if nothing else. – meawoppl Oct 16 '14 at 11:58
  • 2
    This is pretty useful if you want to use a list of relative URLs (resource ID without hostnames) with different hostnames, example:

    cat urlfile | parallel --gnu "wget http://example1.com{}" and cat urlfile | parallel --gnu "wget http://example2.com{}"

    – Mauricio Sánchez May 14 '15 at 02:21
  • Can this take advantage of keep-alive? – Gardner Bickford Aug 05 '16 at 21:31
  • Also, I should report in all fairness that I typically used this over the 500Mb connection my work hosts. – meawoppl Sep 26 '16 at 23:41
  • Nice trick, super fast – bhappy Nov 19 '16 at 08:32
  • 1
    One might add that flooding a website with a massive number of parallel requests for large files is not particularly nice. It doesn't matter for big sites, but if it's a smaller one you should take care. – Magnus Sep 19 '19 at 10:02
14

parallel has a built-in flag --arg-file (-a) that reads its input from a file, so you can avoid the cat |. You can use

parallel --gnu -a urlfile wget

Or simply parallel --gnu wget < urlfile

yxogenium
10
xargs -i wget 'http://{}'  < your_list
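A quick way to sanity-check what xargs will run is to substitute echo for wget; note that GNU xargs has deprecated lowercase -i in favour of -I{}, which is also the more portable spelling. A sketch, using a hypothetical list built from the question's sample URLs:

```shell
# Hypothetical list file with the question's sample URLs.
printf '%s\n' www.example.com/1.pdf www.example.com/2.pdf > your_list

# Dry run: echo prints the command xargs would execute for each line.
xargs -I{} echo wget 'http://{}' < your_list
```

Drop the echo once the printed commands look right.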
5
awk '{print "http://" $0;}' list.txt | xargs -l1 wget

where list.txt is your list file
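The same echo substitution works here to preview the generated commands; the file names below are stand-ins, and -L1 is the documented spelling of -l1 in current xargs:

```shell
# Hypothetical list of host-less paths, as in the answer above.
printf '%s\n' www.example.com/1.pdf www.example.com/2.pdf > list.txt

# awk prefixes the scheme; xargs runs one command per line.
# echo is substituted for wget as a dry run.
awk '{print "http://" $0;}' list.txt | xargs -L1 echo wget
```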

cbix
5

I saw Florian Diesch's answer.

I got it to work by adding the -bqc options to the command.

xargs -i wget -bqc 'http://{}' < download.txt

All downloads started in parallel in the background.

  • -b: Background. Go to background immediately after start
  • -q: Quiet. Turn off wget's output
  • -c: Continue. Continue getting a partially-downloaded file
muru
1

Given a link file links.txt, download all of the links with:

wget -i links.txt

(Piping, as in cat links.txt | wget -i, does not work: -i expects a filename argument. To read URLs from stdin, use wget -i -.)
Kulfy
0

I just tested this:

xargs -a download_file -L1 wget

It works for me. Links inside the txt file must be on separate lines, one URL per line.
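Note that -a is a GNU xargs extension (it is absent from BSD/macOS xargs). A dry run with echo in place of wget shows the per-line invocations; the file contents here are a stand-in:

```shell
# download_file: one URL per line, per the answer above.
printf '%s\n' http://www.example.com/1.pdf http://www.example.com/2.pdf > download_file

# Dry run: echo in place of wget prints each command xargs -L1 would run.
xargs -a download_file -L1 echo wget
```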

Kulfy