13

I have a URL like this:

http://dl.minitoons.ir/longs/Khumba (2013) [EN] [BR-Rip 720p] - [www.minitoons.ir].rar

I want to download this URL using wget. If I pass it directly to wget, everything goes well. However, I only have the encoded versions of the download URLs. If I pass the encoded version of the URL above to wget, it throws the following error:

$ wget "http%3A%2F%2Fdl.minitoons.ir%2Flongs%2FKhumba%20(2013)%20%5BEN%5D%20%5BBR-Rip%20720p%5D%20-%20%5Bwww.minitoons.ir%5D.rar"
wget: unable to resolve host address `http://dl.minitoons.ir/longs/khumba (2013) [en] [br-rip 720p] - [www.minitoons.ir].rar'

Notice that wget changed the casing of the URL (for example, Khumba to khumba). What should I do to solve this problem?

Braiam
melmi

5 Answers

20

Since this is annoyingly common, various converters are available online (for example, this site). You can use one to decode the URL, so it converts this:

http%3A%2F%2Fdl.minitoons.ir%2Flongs%2FKhumba%20(2013)%20%5BEN%5D%20%5BBR-Rip%20720p%5D%20-%20%5Bwww.minitoons.ir%5D.rar

to:

http://dl.minitoons.ir/longs/Khumba (2013) [EN] [BR-Rip 720p] - [www.minitoons.ir].rar

It would be nice to have a command line version though...

EDIT:

Found a command line version - basically:

echo "http%3A%2F%2F-REST-OF-URL" | sed -e's/%\([0-9A-F][0-9A-F]\)/\\\\\x\1/g' | xargs echo -e

This can be implemented in a script like this to decode the URL:

#!/bin/bash
echo "$@" | sed -e's/%\([0-9A-F][0-9A-F]\)/\\\\\x\1/g' | xargs echo -e
exit

which if saved and made executable, works quite nicely.

Also, this script will download the URL as well:

#!/bin/bash
echo "$@" | sed -e's/%\([0-9A-F][0-9A-F]\)/\\\\\x\1/g' | xargs echo -e | wget -c -i -
exit

N.B. I think the letter case of the URL does not matter for most sites, e.g. HTTP://WWW.UBUNTU.COM
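For anyone who prefers Python 3 to the sed pipeline, the same decoding step can be sketched with the standard library's urllib.parse.unquote. The helper name decode_url is made up for illustration; it is not part of wget or of the scripts above.

```python
from urllib.parse import unquote


def decode_url(encoded: str) -> str:
    """Turn percent-escapes such as %3A or %2f back into literal characters."""
    # unquote accepts both upper- and lower-case hex digits.
    return unquote(encoded)


encoded = (
    "http%3A%2F%2Fdl.minitoons.ir%2Flongs%2FKhumba%20(2013)%20%5BEN%5D"
    "%20%5BBR-Rip%20720p%5D%20-%20%5Bwww.minitoons.ir%5D.rar"
)
print(decode_url(encoded))
```

The printed string can then be handed to wget in quotes, just like the output of the shell script.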

Wilf
  • 4
    python -c 'import urllib2; print urllib2.unquote("'${URL}'")' does approximately the same, if you put your url in environment variable URL. – taneli Mar 02 '14 at 20:54
  • 3
    The case for the domain is generally not important, but the case for what comes after can be if the server uses case-sensitive routing or does not redirect URLs with different case to the actual page. Case in point: http://developer.android.com/reference/android/view/View.html versus http://developer.android.com/reference/android/view/view.html. – JAB Mar 03 '14 at 15:32
7

You should use it like this:

wget "http://dl.minitoons.ir/longs/Khumba%20(2013)%20[EN]%20[BR-Rip%20720p]%20-%20[www.minitoons.ir].rar"

Just replace every space with %20. Or, better, copy your original link and paste it into the Chromium address bar; it will format it for you automatically. Then copy it from there to your terminal.
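If you would rather not do the replacement by hand, here is a minimal Python 3 sketch of the same idea using urllib.parse.quote. The function name encode_spaces is hypothetical, and the choice of characters to leave untouched (slashes, the colon, parentheses and square brackets) is an assumption picked to match the example URL.

```python
from urllib.parse import quote


def encode_spaces(url: str) -> str:
    """Percent-encode spaces and other unsafe characters in a plain URL."""
    # Assumption: keep URL delimiters plus the brackets and parentheses
    # that appear in the example link unescaped.
    return quote(url, safe="/:()[]")


plain = "http://dl.minitoons.ir/longs/Khumba (2013) [EN] [BR-Rip 720p] - [www.minitoons.ir].rar"
print(encode_spaces(plain))
```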

Wilf
g_p
  • 2
    This method can be a security risk in some cases. If you already have Chromium open, the fastest way is probably to press [Ctrl]+[Shift]+[J] (for the dev console) and enter decodeURIComponent("your-encoded-URI"). – ComFreek Mar 03 '14 at 09:02
4

Wget expects the URL to have the following format:

[protocol://]host/path

The protocol is optional. In absence of protocol, Wget assumes HTTP.

Wget accepts percent-encoded URLs just fine, but the delimiters between protocol, host and path cannot be percent-encoded.

This is also why Wget changed the casing of the URL. Since it doesn't find a single unencoded slash, it assumes that

http://dl.minitoons.ir/longs/khumba (2013) [en] [br-rip 720p] - [www.minitoons.ir].rar

is the hostname (which would be case-insensitive). The actual hostname is, of course, dl.minitoons.ir.

For an automatic solution, substituting %3A%2F%2F and the %2F after the hostname by :// and / would suffice, but it's just as easy to decode the URL at once. @Wilf already gave a good solution for this.
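The minimal substitution described above could be sketched in Python 3 roughly as follows. The function name restore_delimiters is made up for illustration, and the sketch assumes the encoded URL contains at most one scheme separator, followed by a hostname that ends at the first encoded slash.

```python
import re


def restore_delimiters(encoded: str) -> str:
    """Decode only the scheme separator and the first slash after the host."""
    # Replace the first encoded "://" (if present) with the literal separator.
    url = re.sub(r"%3A%2F%2F", "://", encoded, count=1, flags=re.IGNORECASE)
    # Replace the first remaining encoded slash, which ends the hostname.
    url = re.sub(r"%2F", "/", url, count=1, flags=re.IGNORECASE)
    return url


encoded = (
    "http%3A%2F%2Fdl.minitoons.ir%2Flongs%2FKhumba%20(2013)%20%5BEN%5D"
    "%20%5BBR-Rip%20720p%5D%20-%20%5Bwww.minitoons.ir%5D.rar"
)
print(restore_delimiters(encoded))
```

Everything after the hostname stays percent-encoded, which Wget accepts.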

However, if you're going to type the Wget command manually, just do this:

wget "dl.minitoons.ir/longs%2FKhumba%20(2013)%20%5BEN%5D%20%5BBR-Rip%20720p%5D%20-%20%5Bwww.minitoons.ir%5D.rar"
Dennis
1

You only need to put quotes around the URL and you're done:

wget "http://dl.minitoons.ir/longs/Khumba (2013) [EN] [BR-Rip 720p] - [www.minitoons.ir].rar"
Warning: wildcards not supported in HTTP.
--2014-03-02 20:40:20--  http://dl.minitoons.ir/longs/Khumba%20(2013)%20[EN]%20[BR-Rip%20720p]%20-%20[www.minitoons.ir].rar
Resolving dl.minitoons.ir (dl.minitoons.ir)... 79.127.127.41
Connecting to dl.minitoons.ir (dl.minitoons.ir)|79.127.127.41|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 594062365 (567M) [application/x-rar-compressed]
Saving to: ‘Khumba (2013) [EN] [BR-Rip 720p] - [www.minitoons.ir].rar’

 0% [                                       ] 73,288      44.9KB/s          

It's easier that way, and you don't have to trouble yourself with anything else.

Braiam
0

I ended up writing a Python script for it.

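from os import chdir, listdir, rename
from urllib.parse import unquote  # py2: from urllib import unquote

# Work inside the directory that holds the percent-encoded file names.
chdir('/mydir/')
for filename in listdir('.'):
    # Decode escapes such as %20 and rename the file accordingly.
    rename(filename, unquote(filename))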
frigen