
I noticed that connections to some (not all) internal machines take about 10s to establish, both for ssh and for docker pull.

If I ping them, some hosts also take 10s before ping starts, while others respond immediately. The behaviour is usually the same for a given address, however often I rerun ping.

Either way, running nslookup always quickly prints a non-authoritative response from one server, then hangs while 'trying the next server' before timing out:

$ nslookup xxxx.internaldomain
Server:         10.10.x.x
Address:        10.10.x.x#53

Name:   xxxx.internaldomain
Address: 10.20.y.y
;; Got recursion not available from 10.10.x.x, trying next server    <---- 10s delay here
;; connection timed out; no servers could be reached

Another one is a bit more complex, but amounts to the same thing:

$ nslookup something.company.com
;; Got recursion not available from 10.10.x.x, trying next server
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
something.company.com   canonical name = docker-reg.internal.
docker-reg.internal     canonical name = something.internaldomain.
Name:   something.internaldomain
Address: 10.10.r.r
;; Got recursion not available from 10.10.x.x, trying next server    <---- 10s delay here
;; connection timed out; no servers could be reached

nslookup is fast and trouble-free with external names such as bbc.co.uk.

My resolv.conf looks like this:

domain internaldomain
nameserver 10.10.x.x
nameserver 127.0.0.53
search internaldomain some other internal tlds

I don't see any other nameservers mentioned, so I presume it falls back to the global nameservers. What I don't understand is why ssh and ping reliably hang for some internal hosts and not for others, while nslookup always hangs.

I believe this is a different question from "Very slow DNS lookup".


Update:

$ sudo -s netstat -anlp|grep ':53 '
tcp        0      0 192.168.122.1:53        0.0.0.0:*               LISTEN      2228/dnsmasq        
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      1121/systemd-resolv 
udp        0      0 192.168.122.1:53        0.0.0.0:*                           2228/dnsmasq        
udp        0      0 127.0.0.53:53           0.0.0.0:*                           1121/systemd-resolv 

Also, this issue seems to affect Ubuntu machines, not the Macbooks that most of our developers use: my colleague on Ubuntu has the same issue.


Another update!

My /etc/systemd/resolved.conf is all comments:

[Resolve]
#DNS=
#FallbackDNS=
#Domains=
#LLMNR=no
#MulticastDNS=no
#DNSSEC=no
#Cache=yes
#DNSStubListener=yes

Also, if I run 'nslookup -anything xxxx.internaldomain', I get the output below with no delay at all (I tried -anything after -debug didn't produce anything very useful):

$ nslookup -anything dockerio.badoo.com
Server:         10.10.x.x
Address:        10.10.x.x#53

Non-authoritative answer:
something.company.com   canonical name = docker-reg.internal.
docker-reg.internal     canonical name = something.internaldomain.
Name:   something.internaldomain
Address: 10.10.r.r

I can get a version though:

$ nslookup -version
nslookup 9.11.3-1ubuntu1.13-Ubuntu

Another update:

$ systemd-resolve --status
Global
         DNS Servers: 10.10.x.x
          DNS Domain: various
                      internal
                      domains
          DNSSEC NTA: 10.in-addr.arpa
                      xx1.172.in-addr.arpa
                      168.192.in-addr.arpa
                      xx2.172.in-addr.arpa  # Lots of these 172s
                      internal
                      x.x.ip6.arpa
                      various
                      other
                      internals

Link 191 (cscotun0)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 15 (docker0)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 14 (br-04d8e612xxxx)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 7 (virbr0-nic)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 6 (virbr0)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 5 (virbr1-nic)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 4 (virbr1)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 3 (wlp4s0)
      Current Scopes: DNS
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 194.168.4.100  # These are my home ISP
                      194.168.8.100
          DNS Domain: ~.

Link 2 (enp0s3xxx)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

  • Can you post the output of sudo netstat -anlp | grep LISTEN | grep 53? – Brian Turek Sep 07 '20 at 10:03
  • @BrianTurek Hi, I've added the output above: dnsmasq and systemd-resolv. Thanks. – Tim Baverstock Sep 07 '20 at 15:04
  • It appears that systemd-resolved (the "next server" being queried) isn't binding on a UDP port. Did you edit your /etc/systemd/resolved.conf at all? Can you check what the value of "DNSStubListener" is in resolved.conf? – Brian Turek Sep 07 '20 at 15:25
  • @BrianTurek Hi, and thanks. Resolved.conf is all commented out, so I presume DNSStubListener is expecting to listen on both tcp and udp. If I tweak the netstat command you suggested (above), I see it on both. – Tim Baverstock Sep 09 '20 at 09:12
  • Normally when you are VPNing you use Network Manager to instruct systemd-resolved or networkd to adjust the DNS lookups used by resolved for its DNS queries. Then, all lookups route via ResolveD and it handles recursive lookups to other DNS servers. Show the output of systemd-resolve --status please from when you're VPN'd into the remote network - SystemD may be trying only that remote VPN DNS server and failing to lookup as a result. – Thomas Ward Sep 09 '20 at 13:49
  • @ThomasWard Hi, and thanks. This is deeper than I've ever had to go with DNS - I haven't done serious sysadmin in a very long time! – Tim Baverstock Sep 09 '20 at 16:57
  • OK so we know that DNS exists here. What's happening is your computer first tries querying 10.10.x.x, and gets a "cannot recurse" response. It then tries to query SystemD ResolveD, which is also set to query 10.10.x.x which gets a "cannot recurse" response. It then tries to fail over to any other DNS server in the DNS config (your ISPs) and can't reach those servers so times out. The config in /etc/resolv.conf is partly to blame here, give me a bit to think how you should fix this... – Thomas Ward Sep 09 '20 at 17:21
  • @ThomasWard Thanks! On reflection, I'm not clear why our internal DNS doesn't claim to be authoritative about our own internal machines, which would seem to be a significant factor in the problem - I've asked about that. – Tim Baverstock Sep 11 '20 at 08:47
  • @TimBaverstock could be a misconfiguration, or like my odd DNS setup could be because the DNS request forwards to another server which is authoritative. – Thomas Ward Sep 11 '20 at 14:06
  • @ThomasWard The Head of Service Engineering found a solution. Many thanks for your help! – Tim Baverstock Sep 22 '20 at 16:26
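Thomas Ward's account of the failover depends on the order of the `nameserver` lines in /etc/resolv.conf: they are tried top to bottom, so with the file from the question 10.10.x.x is asked first and the 127.0.0.53 stub second. A minimal sketch that prints that order for any resolv.conf-style file; `nameservers_in_order` is a hypothetical helper, not an existing tool, and the cut-off at three entries assumes glibc's MAXNS limit:

```shell
#!/bin/sh
# Hypothetical helper: print the nameserver addresses from a resolv.conf-style
# file, in the order a resolver works through them.
# glibc only honours the first three nameserver lines (MAXNS = 3),
# so later entries are dropped here too.
nameservers_in_order() {
    # awk splits on whitespace, so indented lines still match;
    # commented-out lines start with '#' or ';' and are skipped.
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}" | head -n 3
}

# Example: nameservers_in_order /etc/resolv.conf
```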


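The problem was down to systemd-resolved, and was fixed by backing up /etc/resolv.conf and replacing it with a symlink to /run/systemd/resolve/resolv.conf, the file systemd-resolved generates that lists the real upstream DNS servers rather than its 127.0.0.53 stub.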

# mv /etc/resolv.conf /etc/resolv.conf_bak && \
  ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf
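After the swap, /etc/resolv.conf should list upstream servers directly instead of the 127.0.0.53 stub. A minimal sketch of a check, assuming the fix above has been applied; `uses_stub` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given resolv.conf still sends
# queries to the systemd-resolved stub listener on 127.0.0.53.
uses_stub() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+127\.0\.0\.53([[:space:]]|$)' \
        "${1:-/etc/resolv.conf}"
}

# Example:
#   readlink /etc/resolv.conf    # shows where the symlink now points
#   uses_stub && echo "still using the stub" || echo "querying upstream directly"
```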

I can't take credit for this - Head of Service Engineering took an interest in the internal ticket I raised, but that's why he's paid the big bucks.

After some experimentation and searching, he cited https://moss.sh/name-resolution-issue-systemd-resolved/

It seems the service was trying to handle everything, but it changes its MODE of operation depending on whether /etc/resolv.conf is a symlink to one of its own generated files or not!
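That mode can be read off /etc/resolv.conf itself. Per the systemd-resolved documentation, a symlink to stub-resolv.conf (or to the static resolv.conf shipped under lib/systemd) points clients at the 127.0.0.53 stub, a symlink to /run/systemd/resolve/resolv.conf hands clients the upstream servers directly, and anything else is treated as a foreign file that resolved reads as input. A minimal sketch that classifies a path this way; the function name `resolv_mode` and its output labels are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: classify how systemd-resolved relates to a resolv.conf
# path, following the modes described in systemd-resolved(8).
resolv_mode() {
    rc_path=${1:-/etc/resolv.conf}
    if [ ! -L "$rc_path" ]; then
        # Not a symlink (e.g. a regular file): resolved consumes it as input.
        echo "foreign"
        return
    fi
    # readlink without -f prints the immediate target, even if that target
    # does not exist on this machine; relative targets like ../run/... match too.
    case "$(readlink "$rc_path")" in
        */run/systemd/resolve/stub-resolv.conf) echo "stub" ;;
        */lib/systemd/resolv.conf)              echo "stub-static" ;; # /usr/lib or /lib
        */run/systemd/resolve/resolv.conf)      echo "direct" ;;
        *)                                      echo "foreign" ;;
    esac
}
```

Called with no argument it inspects /etc/resolv.conf; after the symlink created by the fix above it would print `direct`.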

One bewildering item: when I edited /etc/resolv.conf, either with vi or by appending lines with a shell redirect, the file was instantly restored or otherwise protected (though neither lsof nor lsattr showed anything).

  • When you edit the file while it is a symlink for SystemD to handle, systemd-resolved holds a watch on the file, and any changes you make manually will be reverted. – Thomas Ward Sep 22 '20 at 16:35
  • @ThomasWard Thanks, I thought it was something like that, but lsof didn't show me anything and I assumed that would be where something like that would show up. It's unfortunate that the developers didn't think to inject a comment or an accompanying backup file explaining that: to me, surprising unexplained behaviour is a kind of bug. – Tim Baverstock Sep 28 '20 at 09:07