
Many server administrators want their server to be used only by humans and not by retrieval programs like wget. One way to block such programs is to use log analysis. Log analysis identifies retrieval programs by looking for statistically significant similarities among the requests, often through timing.

Whenever I use wget to download packages through a shell script (similar to those generated by Synaptic; most of them are in fact generated by Synaptic), only a few packages download. Most fail with a "connection refused" error.

So I strongly suspect that the connections are refused because Ubuntu servers use log analysis to block such programs.

Do Ubuntu servers use log analysis to block (package retrieval) programs?

EDIT:
I ran some scripts that listed only small packages (i.e., ones that download quickly). Those scripts work as expected. The error occurs with large packages, which take longer to download.

jtd
Registered User
  • Interesting question. I would say No, they don't, but I just don't know. Gave you an UP, hope an Ubuntu employee answers this. – falconer Jan 26 '14 at 15:51
  • The majority of download locations are third-party mirror servers over which Ubuntu/Canonical do not have control, and as such there is unlikely to be any consistent policy on connection auditing. See https://launchpad.net/ubuntu/+archivemirrors – chronitis Feb 13 '14 at 10:29
  • @chronitis So wget must be selecting a random server, won't it? Some of the servers may not have been updated for a long while, so maybe packages fail to download because they are requested from a server that does not contain them. Could this be the reason? – Registered User Feb 13 '14 at 10:54
  • It depends which URL you are using. The default <countrycode>.archive.ubuntu.com redirects you to a local mirror, but I don't know if it does load balancing or just a random choice. You can always pick a direct URL from the link above to a local, up-to-date server. See also http://askubuntu.com/questions/39922/how-do-you-select-the-fastest-mirror-from-the-command-line – chronitis Feb 13 '14 at 12:35
  • How do I do it with wget [options] [link] command? Should I ask another question for it? – Registered User Feb 13 '14 at 13:10
  • I've used wget on repos and never had a problem. I have seen traffic blocked by governments, though. If you're not in Korea or the Middle East, I would suspect things like proxy connections not being processed correctly. You could use the --verbose option in wget to get more specific output. Check this out: http://www.thegeekstuff.com/2010/07/wget-connection-refused-error/ – j0h Feb 15 '14 at 14:55
  • @j0h You are really lucky to be in a country with the fastest and best internet connection in the world. In India, though, the internet is very poor in small cities like mine. BTW, the govt does little to control the web here; they don't even block porn. The secret services monitor the internet for communication between Pakistani terrorists and people here helping them. But I'm not a terrorist, so they won't block my connection. lol. – Registered User Feb 16 '14 at 08:11

2 Answers


wget has an option, --random-wait, that is designed to avert log analysis blocking. From the docs:

--random-wait

Some web sites may perform log analysis to identify retrieval programs such as Wget by looking for statistically significant similarities in the time between requests. This option causes the time between requests to vary between 0.5 and 1.5 * wait seconds, where wait was specified using the --wait option, in order to mask Wget's presence from such analysis.

A 2001 article in a publication devoted to development on a popular consumer platform provided code to perform this analysis on the fly. Its author suggested blocking at the class C address level to ensure automated retrieval programs were blocked despite changing DHCP-supplied addresses.

The --random-wait option was inspired by this ill-advised recommendation to block many unrelated users from a web site due to the actions of one.

So chances are, if the server accepts you with the --random-wait option turned on but not without it, it is using log analysis.
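
As a rough sketch of how that comparison might be set up: the script below prints the pause range that --random-wait would use for a given --wait value (0.5 to 1.5 times the wait, per the docs quoted above) and then fetches a list of URLs with randomized pauses. The file name urls.txt, the 4-second wait, and both function names are placeholders of mine, not anything from the question.

```shell
#!/bin/sh
# Sketch only. URL_LIST and WAIT_SECONDS are hypothetical names and values
# chosen for this example; adjust them to your own script.
URL_LIST="${URL_LIST:-urls.txt}"
WAIT_SECONDS="${WAIT_SECONDS:-4}"

# Print the shortest and longest pause, in milliseconds, that --random-wait
# picks for a whole-second --wait value: 0.5 * wait and 1.5 * wait.
random_wait_range() {
    echo "$(( $1 * 500 )) $(( $1 * 1500 ))"
}

# Download every URL listed in the given file, one per line, pausing a
# randomized interval between requests.
fetch_with_random_wait() {
    wget --wait="$WAIT_SECONDS" --random-wait --input-file="$1"
}

echo "pause range (ms): $(random_wait_range "$WAIT_SECONDS")"
if [ -f "$URL_LIST" ]; then
    fetch_with_random_wait "$URL_LIST"
else
    echo "no URL list at $URL_LIST; nothing downloaded"
fi
```

Running the same list once with --random-wait and once without it, against the same mirror, is what would let you compare the two behaviours.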

Richard

Most of the mirrors aren't controlled by Ubuntu, and their configuration is completely up to their sysadmins. By extension, there may be some blocking on some mirrors. I personally don't see why they would block anything, but with its defaults wget is pretty simple to fingerprint through its user-agent string, even before you start considering behavioural tracking.

You can make wget look like the current apt quite simply:

wget -U "Ubuntu APT-HTTP/1.3 (0.9.9.1~ubuntu3)" ...

And as another user pointed out, if your current mirror is controlled by somebody who doesn't want you using wget, you could just use another mirror. There are loads of them.
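
Putting the two suggestions together, a minimal sketch might look like this. PACKAGE_URL and build_wget_cmd are hypothetical names of mine; set PACKAGE_URL to a package URL on whichever mirror you picked. The user-agent string is the one quoted above and may not match what a current apt sends. I've also added --continue, on the assumption that resuming partial files helps with the large packages mentioned in the question.

```shell
#!/bin/sh
# User-agent string copied from the answer above; treat it as an example value.
APT_UA="Ubuntu APT-HTTP/1.3 (0.9.9.1~ubuntu3)"

# Hypothetical helper: print the wget command line for one package URL,
# so it can be inspected before anything is downloaded.
build_wget_cmd() {
    printf 'wget --user-agent="%s" --continue %s\n' "$APT_UA" "$1"
}

# PACKAGE_URL is a hypothetical variable. If it is set, download that URL;
# otherwise only show what the command would look like.
if [ -n "${PACKAGE_URL:-}" ]; then
    wget --user-agent="$APT_UA" --continue "$PACKAGE_URL"
else
    build_wget_cmd "<package-url>"
fi
```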

Oli