For at least a week now, my Ubuntu 18.04 has intermittently lost internet access, even though the GUI still shows the Wi-Fi icon as normal.
Interestingly, dig @8.8.8.8 google.com works, but ping google.com does not. Websites in the browser do not load either.
(I intend to update this question with more detailed descriptions of what "does not work" means next time I see the error messages.)
When this happens, dhclient -r wlp0s20f3 usually does not fix it, but sudo dhclient wlp0s20f3 fixes it temporarily.
Sometimes that outputs RTNETLINK answers: File exists, and in that case it seems that I (sometimes?) need to use the GUI to turn the Wi-Fi off and on again. Doing the same with ifdown/ifup or sudo ifconfig wlp0s20f3 down/up does not seem to work reliably, but using the GUI does.
How can I fix this so that I no longer have to recover from this state manually?
The attempts below list what I've tried and some additional, possibly useful, information. I believe Observation 7 is the most insightful so far, so please scroll down :)
Attempt 1
I found somewhere the suggestion to modify /etc/network/interfaces to look like this:
# interfaces(5) file used by ifup(8) and ifdown(8)
auto lo
iface lo inet loopback
# adding this in the hopes that it will help me avoid
# that issue where I have to run
#   sudo dhclient wlp...
# every time.
auto wlp0s20f3
iface wlp0s20f3 inet dhcp
auto enp0s31f6
iface enp0s31f6 inet dhcp
but that did not seem to help, so I removed those changes again after a reboot.
Attempt 2
This issue seems common [1, 2, 3], but none of the answers explain much. This answer suggests it could be related to /etc/resolv.conf, and this answer talks about checking whether there is a default route.
Indeed, I had no default route (one time) before restarting the wifi. One time the following worked:
# down interface and delete dhcp leases, then up it again
sudo ifdown wlp0s20f3 ; sudo ifconfig wlp0s20f3 down ; sudo rm /var/lib/dhcp/dhclient.* ; sudo ifup wlp0s20f3 ;
# view routes
ip route
# still broken
# try this:
sudo ifconfig wlp0s20f3 down
sudo ifconfig wlp0s20f3 up
ip route
# now it works???
but next time it did not:
generic@motorbrot:~$ echo "bad:" && ip route
bad:
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
generic@motorbrot:~$ echo "bad:" && ip route
bad:
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
generic@motorbrot:~$ ping 1.1.1.1 -
ping: -: Name or service not known
generic@motorbrot:~$ ping 1.1.1.1
connect: Network is unreachable
generic@motorbrot:~$ dig @8.8.8.8 google.com
^C
generic@motorbrot:~$ echo "after down:" && ip route
after down:
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
generic@motorbrot:~$ echo "after up:" && ip route
after up:
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.0.0/24 dev wlp0s20f3 proto kernel scope link src 192.168.0.37
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
generic@motorbrot:~$ echo "after down-rm-up:" && ip route
after down-rm-up:
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.0.0/24 dev wlp0s20f3 proto kernel scope link src 192.168.0.37
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
generic@motorbrot:~$ echo "after gui turnoff turnon:" && ip route
after gui turnoff turnon:
default via 192.168.0.1 dev wlp0s20f3 proto dhcp metric 600
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.0.0/24 dev wlp0s20f3 proto kernel scope link src 192.168.0.37 metric 600
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
Notice that the final, working ip route output shows the default route that was initially missing. So something changed somehow.
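The comparison above comes down to whether any line of the ip route output starts with "default". A minimal sketch of that check, reading the route listing from standard input (the function name has_default_route is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if the `ip route` output given on standard
# input contains a default route, and exits 1 otherwise.
has_default_route() {
    grep -q '^default '
}
```

On a live system it could be used as `ip route | has_default_route || echo "default route missing"`, for example from a script that logs when the broken state begins.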
Attempt 3
My /etc/resolv.conf also looks shady every now and then:
# this was the state of the /etc/resolv.conf
# file at the time when my network was currently working after a
# wifi-off-wifi-on action in the gui, but generally had issues
# after some time when I reconnected to a wifi...
domain v.cablecom.net
search v.cablecom.net
nameserver 62.2.17.61
nameserver 62.2.24.158
But I run my own DNS resolver, dnscrypt-proxy, on localhost. So it should rather be something like:
nameserver 127.0.0.1
options edns0
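To notice when the file has drifted from that expected state, one could check that every nameserver entry points at localhost. A minimal sketch, assuming 127.0.0.1 is the only address the local resolver should appear under (the function name only_local_nameserver is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: reads resolv.conf-style text on standard input.
# Exits 0 only if at least one "nameserver" line exists and every such
# line points at 127.0.0.1; exits 1 otherwise.
only_local_nameserver() {
    awk '
        $1 == "nameserver" { seen = 1; if ($2 != "127.0.0.1") bad = 1 }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}
```

Example use: `only_local_nameserver < /etc/resolv.conf || echo "resolv.conf no longer points at the local resolver"`.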
According to my notes, I have had this issue before. This answer suggests adding dns=none to /etc/NetworkManager/NetworkManager.conf, but back then that did not work at all until I followed the comment by Chris Moore and also ran sudo service network-manager restart.
However, dns=none is currently already set in my NetworkManager.conf:
[main]
plugins=ifupdown,keyfile
# Added 30.07.2020 by LucidBrot to avoid /etc/resolv.conf being overwritten and hence breaking the DNS resolving.
dns=none
[ifupdown]
managed=false
[device]
wifi.scan-rand-mac-address=no
I can try running sudo service network-manager restart once more, but I would be surprised if it actually helped.
It is also worth pointing out that my /etc/resolv.conf is a symlink. According to Red Hat, this too should keep NetworkManager from modifying that file. But it evidently did anyway, because I kept track of what I had set that file's contents to.
I do not know what to try next, and I would like to understand what happened, and why, in addition to how to fix it.
generic@motorbrot:/etc$ ls -la | grep resolv
drwxr-xr-x 3 root root 3 Mai 7 2020 resolvconf
lrwxrwxrwx 1 root root 25 Mär 31 10:21 resolv.conf -> /etc/resolv.conf.localdns
-rw-r--r-- 1 root root 737 Jul 29 2020 resolv.conf.backup
-rw-r--r-- 1 root root 74 Jul 30 2020 resolv.conf.backup2
-rw-r--r-- 1 root root 364 Mär 31 10:17 resolv.conf.backup3
-rw-r--r-- 1 root root 89 Apr 5 00:06 resolv.conf.localdns
Observation 3
It happened again, so I turned the wifi off and on again. Still not working. At this point I ran the following commands:
generic@motorbrot:~$ ip route
default via 192.168.43.68 dev wlp0s20f3 proto dhcp metric 600
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.43.0/24 dev wlp0s20f3 proto kernel scope link src 192.168.43.143 metric 600
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
generic@motorbrot:~$ sudo dhclient wlp0s20f3
[sudo] password for generic:
generic@motorbrot:~$ ip route
default via 192.168.43.68 dev wlp0s20f3
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.43.0/24 dev wlp0s20f3 proto kernel scope link src 192.168.43.143
192.168.43.0/24 dev wlp0s20f3 proto kernel scope link src 192.168.43.143 metric 600
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
We can see that after sudo dhclient wlp0s20f3, the default route no longer carries proto dhcp metric 600, and a second 192.168.43.0/24 route without a metric has appeared. After that, the internet works.
NetworkManager or systemd-networkd
A comment suggests there might be different config methods conflicting. I believe I am using NetworkManager, and I believe this output supports that belief:
generic@motorbrot:~$ systemctl list-unit-files | grep networkd
networkd-dispatcher.service enabled
systemd-networkd-wait-online.service disabled
systemd-networkd.service disabled
systemd-networkd.socket disabled
generic@motorbrot:~$ systemctl list-unit-files | grep NetworkManager
NetworkManager-dispatcher.service enabled
NetworkManager-wait-online.service enabled
NetworkManager.service
Observation 4
Right now I had the problem that the GUI thought I was connected, but even dig @8.8.8.8 google.com did not work. So I suspect I have multiple issues at once.
There was no default route at that time. I used the gui to turn wifi off and on again and now the connection worked again, with a default route present:
# before restarting wifi:
generic@motorbrot:~$ ip route
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
# after restarting wifi:
generic@motorbrot:~$ ip route
default via 192.168.0.1 dev wlp0s20f3 proto dhcp metric 600
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.0.0/24 dev wlp0s20f3 proto kernel scope link src 192.168.0.37 metric 600
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
I found some answers [5, 6] mentioning /etc/NetworkManager/NetworkManager.conf when searching again for the problem of a missing default route. On my laptop, it contains managed=false. It seems like this should be true instead, so I changed it for now. However, these answers themselves seem unsure whether this should be managed=true or managed=false...
[main]
plugins=ifupdown,keyfile
# Added 30.07.2020 by LucidBrot to avoid /etc/resolv.conf being overwritten and hence breaking the DNS resolving.
dns=none
[ifupdown]
managed=true
[device]
wifi.scan-rand-mac-address=no
The answers say that this requires a service network-manager restart, which I am doing now. I ran systemctl restart NetworkManager and, fascinatingly, my default route is now gone, but the internet connection still works. An empty line in my routes disappeared.
generic@motorbrot:~$ systemctl status NetworkManager
● NetworkManager.service - Network Manager
Loaded: loaded (/lib/systemd/system/NetworkManager.service; enabled; vendor p
Active: active (running) since Tue 2022-04-05 00:12:28 CEST; 1 weeks 0 days a
Docs: man:NetworkManager(8)
Main PID: 16747 (NetworkManager)
Tasks: 4 (limit: 4915)
CGroup: /system.slice/NetworkManager.service
├─16747 /usr/sbin/NetworkManager --no-daemon
└─32449 /sbin/dhclient -d -q -sf /usr/lib/NetworkManager/nm-dhcp-help
generic@motorbrot:~$ ip route
default via 192.168.0.1 dev wlp0s20f3 proto dhcp metric 600
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.0.0/24 dev wlp0s20f3 proto kernel scope link src 192.168.0.37 metric 600
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
generic@motorbrot:~$ systemctl restart NetworkManager
generic@motorbrot:~$ ip route
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
~~I will report back how that affected behaviour if at all.~~
This did not stop the missing default route issue from happening though. That issue is temporarily fixed by turning the Wi-Fi off and on again in the GUI, but not by sudo dhclient wlp0s20f3.
Since it seemed to have no observable effect, I soon changed this back to managed=false.
Observation 5
I think my suspicion is confirmed. After this change I now had a default route on my hotspot but still some issues.
- websites not loading, domains not resolving with ping
- Telegram worked
- dig @8.8.8.8 google.com resolving correctly
- dig google.com not resolving
So it would have to be an issue with either my local dns resolver or some other networking issue.
The routes looked this way:
generic@motorbrot:~$ ip route
default via 192.168.43.143 dev wlp0s20f3 proto dhcp metric 600
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.43.0/24 dev wlp0s20f3 proto kernel scope link src 192.168.43.144 metric 600
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
generic@motorbrot:~$ ping google.com
^C
generic@motorbrot:~$ dig google.com
; <<>> DiG 9.11.3-1ubuntu1.17-Ubuntu <<>> google.com
;; global options: +cmd
;; connection timed out; no servers could be reached
generic@motorbrot:~$ dig @8.8.8.8 google.com
; <<>> DiG 9.11.3-1ubuntu1.17-Ubuntu <<>> @8.8.8.8 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17464
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 59 IN A 142.250.203.110
;; Query time: 44 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Apr 13 09:01:30 CEST 2022
;; MSG SIZE rcvd: 55
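The two dig runs above differ in exactly two markers: the timeout message and the NOERROR status in the header. A minimal sketch that classifies dig output by those markers, so both resolvers can be compared in a script (the function name dig_outcome and its three labels are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: classifies dig output read from standard input.
# Prints "answered" if the header reports status NOERROR, "timeout" if
# dig says no server could be reached, and "other" for anything else.
dig_outcome() {
    out=$(cat)
    case "$out" in
        *"status: NOERROR"*) echo answered ;;
        *"no servers could be reached"*) echo timeout ;;
        *) echo other ;;
    esac
}
```

Running `dig google.com | dig_outcome` and `dig @8.8.8.8 google.com | dig_outcome` side by side would then print "timeout" and "answered" for the state shown here, which points at the local resolver rather than at general connectivity.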
To get my local DoH temporarily working again, sudo dhclient -r wlp0s20f3 did the trick once again.
Observation 6
systemctl status systemd-resolved revealed that it was loaded, disabled, and active (running).
It should be disabled, that's correct, because I am using dnscrypt-proxy as a local stub and don't need systemd-resolved. But it should not be running... I don't know why it was running, but I have stopped it again now.
I have now also deleted my /etc/network/interfaces file, since this answer indicates that I do not want it. It would be used by ifupdown, but I am using network-manager.
Observation 7
Following this answer, I have set up auditing for the file my /etc/resolv.conf symlink is pointing towards.
sudo apt install auditd
sudo systemctl status auditd
# shows it is running and enabled
# Set up a rule to watch the file
# and use an arbitrary key for later grepping it:
sudo auditctl -w /etc/resolv.conf.localdns -p wa -k lb_dhclient_issue
# list rules
sudo auditctl -l
# to remove the watch, use the same command but with -W instead of -w and match each other field in the rule.
# i.e.
# sudo auditctl -W /etc/resolv.conf.localdns -p wa -k lb_dhclient_issue
Very soon after, I already see activity on that file:
sudo ausearch -f /etc/resolv.conf.localdns --format text
At 13:47:15 25.04.2022 generic, acting as root, successfully renamed /etc/resolv.conf.localdns.dhclient-new.13892 to /etc/resolv.conf.localdns using /bin/mv
At 13:49:39 25.04.2022 generic, acting as root, successfully renamed /etc/resolv.conf.localdns.dhclient-new.15462 to /etc/resolv.conf.localdns using /bin/mv
At 13:53:08 25.04.2022 generic, acting as root, successfully renamed /etc/resolv.conf.localdns.dhclient-new.17715 to /etc/resolv.conf.localdns using /bin/mv
At 13:56:52 25.04.2022 generic, acting as root, successfully renamed /etc/resolv.conf.localdns.dhclient-new.20232 to /etc/resolv.conf.localdns using /bin/mv
At 13:59:51 25.04.2022 generic, acting as root, successfully renamed /etc/resolv.conf.localdns.dhclient-new.22822 to /etc/resolv.conf.localdns using /bin/mv
Roughly every three minutes, some process under my username (generic) acts as root to move a file to /etc/resolv.conf.localdns. The source is /etc/resolv.conf.localdns.dhclient-new.22822, which indicates that dhclient is the culprit.
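To keep an eye on how often this happens, the text output of ausearch can be filtered for the dhclient-new temp file name seen above. A minimal sketch (the function name count_dhclient_rewrites is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: reads `ausearch --format text` output on standard
# input and prints how many lines mention a dhclient temp file, i.e. a
# name containing ".dhclient-new.<pid>". Note that grep -c exits with
# status 1 when the count is 0.
count_dhclient_rewrites() {
    grep -c '\.dhclient-new\.[0-9]'
}
```

Example use: `sudo ausearch -f /etc/resolv.conf.localdns --format text | count_dhclient_rewrites`.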
I guess I could use chattr +i /etc/resolv.conf to make it un-editable, but that seems like a dirty approach. For now, I am doing that and it seems to successfully prevent dhclient from changing the file, but I would like to understand what went wrong and how to avoid the same issue in the future, perhaps even with a cleaner fix.
Also, I don't really understand why manually running dhclient helped me. I guess that was the problem with the missing default route, which has not appeared again for a while now.
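As a possible alternative to chattr +i: dhclient-script(8) describes redefining the shell function make_resolv_conf in an enter hook, so that dhclient-script no longer rewrites resolv.conf. A minimal sketch, untested on this setup, assuming the stock Debian/Ubuntu hook directory /etc/dhcp/dhclient-enter-hooks.d/ and a made-up file name; it only affects dhclient instances that run dhclient-script, and another hook in the same directory that sorts later could redefine the function again:

```shell
# Hypothetical hook file, e.g.
#   /etc/dhcp/dhclient-enter-hooks.d/leave_my_resolv_conf
# dhclient-script sources enter hooks after it has defined
# make_resolv_conf, so redefining the function here turns the
# resolv.conf update into a no-op.
make_resolv_conf() {
    :
}
```

Whether this is preferable to the immutable flag depends on whether anything else on the system is expected to update resolv.conf through dhclient.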
192.168.0.1 but some of the excerpts here are from my phone hotspot or the university network instead. – lucidbrot Apr 10 '22 at 12:00
dnscrypt-proxy setup on purpose (for a local DoH resolver stub). The rest I only started to touch now because things were not working anymore, all of a sudden. – lucidbrot Apr 10 '22 at 12:02