1

I'm running ubuntu on a small mini Z83 PC, that has an embedded wifi adapter. Having moved houses, I am now using this on wifi instead of ethernet (it controls a player piano).

I am finding it unreliable. I do not know if this is a new problem or not, as I also recently set up network monitoring which is how I noticed it. When "failed" the device runs but appears slow, so I may have ignored it in the past.

It is completely stable for hours, then it goes wonky - packet loss and long ping times. This shift always occurs immediately after a wifi rekey. The rekey succeeds, and the wifi remains associated, it just works poorly afterwards.

The device is associating on 2.4ghz, and the router shows a signal of -52 and noise of -80 dbm, i.e. a very good signal (and indeed it is about 20' from the router). Devices (google mini's, LG TV's) run fine several times as far away.

Pings

The above is about 24 hours, the part in the middle is really good, the part at each end is awful, working but barely.

I have upgraded from (I think it was) 17.10 to 19.10, and then updated that. I have turned off power management on the wifi. I have turned off IPv6 (some postings indicated it as a problem, it is back on in the most recent case). netstat -i shows dropped received packets but no errors on tx/rx and no errors on tx.

netplan apply does not fix it. A reboot will always fix it until a few hours later, and everytime it goes wonky it is at a rekey (at least +/- 1 minute which is how often I'm recording ping results).

The router is a fairly new Linksys AC1900ACS running OpenWRT 18.06; no other devices are having issues.

I'm going to, unfortunately, paste in all the logs from the last rekey. It's kind of long but not sure what in it may give someone a clue. I see four re-key statements, and I have four VLAN's, so let me mention that -- this is on a general use VLAN, and all the addresses are coming up correctly. Other VLAN's are for video, guests, automation, etc. I don't THINK there are issues there, but I also do not know why there are four rekey messages but there always are. Each VLAN has a different SSID and this is associating by that SSID, obviously, and only that one SSID.

Below is the log from the last event. I appreciate any advice, other than (which is my next step I think) running a wire up the wall to the router.

Sep  4 19:13:30 notes wpa_supplicant[878]: wlan0: WPA: Group rekeying completed with 26:f5:a2:bf:1f:0c [GTK=CCMP]
Sep  4 19:13:33 notes wpa_supplicant[878]: message repeated 3 times: [ wlan0: WPA: Group rekeying completed with 26:f5:a2:bf:1f:0c [GTK=CCMP]]
Sep  4 19:13:46 notes systemd-networkd[610]: wlan0: Lost carrier
Sep  4 19:13:46 notes avahi-daemon[716]: Withdrawing address record for 192.168.130.172 on wlan0.
Sep  4 19:13:46 notes wpa_supplicant[878]: wlan0: CTRL-EVENT-DISCONNECTED bssid=26:f5:a2:bf:1f:0c reason=7 locally_generated=1
Sep  4 19:13:46 notes wpa_supplicant[878]: wlan0: CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
Sep  4 19:13:46 notes systemd-networkd[610]: wlan0: DHCP lease lost
Sep  4 19:13:46 notes avahi-daemon[716]: Leaving mDNS multicast group on interface wlan0.IPv4 with address 192.168.130.172.
Sep  4 19:13:46 notes systemd-networkd[610]: wlan0: DHCPv6 lease lost
Sep  4 19:13:46 notes dbus-daemon[725]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.1' (uid=101 pid=610 comm="/lib/systemd/systemd-networkd " label="unconfined")
Sep  4 19:13:46 notes systemd-timesyncd[605]: No network connectivity, watching for changes.
Sep  4 19:13:46 notes avahi-daemon[716]: Interface wlan0.IPv4 no longer relevant for mDNS.
Sep  4 19:13:46 notes avahi-daemon[716]: Withdrawing address record for fdf5:7fae:c03f::a51 on wlan0.
Sep  4 19:13:46 notes avahi-daemon[716]: Withdrawing address record for XXXX:XXX:XXXX:XXXX::a51 on wlan0.
Sep  4 19:13:46 notes avahi-daemon[716]: Withdrawing address record for fdf5:7fae:c03f:0:ce4b:73ff:fe43:5b0 on wlan0.
Sep  4 19:13:46 notes avahi-daemon[716]: Withdrawing address record for XXXX:XXX:XXXX:XXXX:ce4b:73ff:fe43:5b0 on wlan0.
Sep  4 19:13:46 notes avahi-daemon[716]: Leaving mDNS multicast group on interface wlan0.IPv6 with address XXXX:XXX:XXXX:XXXX:ce4b:73ff:fe43:5b0.
Sep  4 19:13:46 notes avahi-daemon[716]: Joining mDNS multicast group on interface wlan0.IPv6 with address fe80::ce4b:73ff:fe43:5b0.
Sep  4 19:13:46 notes avahi-daemon[716]: Registering new address record for fe80::ce4b:73ff:fe43:5b0 on wlan0.*.
Sep  4 19:13:46 notes systemd[1]: Starting Hostname Service...
Sep  4 19:13:48 notes wpa_supplicant[878]: wlan0: Trying to associate with SSID 'MYSSID'
Sep  4 19:13:51 notes systemd-networkd[610]: wlan0: Gained carrier
Sep  4 19:13:51 notes systemd-timesyncd[605]: Network configuration changed, trying to establish connection.
Sep  4 19:13:51 notes wpa_supplicant[878]: wlan0: Associated with 26:f5:a2:bf:1f:0d
Sep  4 19:13:51 notes wpa_supplicant[878]: wlan0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0
Sep  4 19:13:51 notes wpa_supplicant[878]: wlan0: CTRL-EVENT-REGDOM-CHANGE init=COUNTRY_IE type=COUNTRY alpha2=US
Sep  4 19:13:51 notes systemd-networkd[610]: wlan0: Lost carrier
Sep  4 19:13:51 notes wpa_supplicant[878]: wlan0: WPA: Key negotiation completed with 26:f5:a2:bf:1f:0d [PTK=CCMP GTK=CCMP]
Sep  4 19:13:51 notes wpa_supplicant[878]: wlan0: CTRL-EVENT-CONNECTED - Connection to 26:f5:a2:bf:1f:0d completed [id=0 id_str=]
Sep  4 19:13:51 notes systemd-networkd[610]: wlan0: Gained carrier
Sep  4 19:13:51 notes systemd-timesyncd[605]: Network configuration changed, trying to establish connection.
Sep  4 19:13:52 notes avahi-daemon[716]: Leaving mDNS multicast group on interface wlan0.IPv6 with address fe80::ce4b:73ff:fe43:5b0.
Sep  4 19:13:52 notes avahi-daemon[716]: Joining mDNS multicast group on interface wlan0.IPv6 with address XXXX:XXX:XXXX:XXXX:ce4b:73ff:fe43:5b0.
Sep  4 19:13:52 notes systemd-timesyncd[605]: Network configuration changed, trying to establish connection.
Sep  4 19:13:52 notes avahi-daemon[716]: Registering new address record for XXXX:XXX:XXXX:XXXX:ce4b:73ff:fe43:5b0 on wlan0.*.
Sep  4 19:13:52 notes avahi-daemon[716]: Withdrawing address record for fe80::ce4b:73ff:fe43:5b0 on wlan0.
Sep  4 19:13:52 notes avahi-daemon[716]: Registering new address record for fdf5:7fae:c03f:0:ce4b:73ff:fe43:5b0 on wlan0.*.
Sep  4 19:13:52 notes whoopsie[982]: [19:13:52] Cannot reach: https://daisy.ubuntu.com
Sep  4 19:13:52 notes whoopsie[982]: [19:13:52] Cannot reach: https://daisy.ubuntu.com
Sep  4 19:13:53 notes systemd-networkd[610]: wlan0: DHCPv6 address fdf5:7fae:c03f::a51/128 timeout preferred -1 valid -1
Sep  4 19:13:53 notes systemd-networkd[610]: wlan0: DHCPv6 address XXXX:XXX:XXXX:XXXX::a51/128 timeout preferred 261560 valid 261560
Sep  4 19:13:53 notes avahi-daemon[716]: Registering new address record for fdf5:7fae:c03f::a51 on wlan0.*.
Sep  4 19:13:53 notes avahi-daemon[716]: Registering new address record for XXXX:XXX:XXXX:XXXX::a51 on wlan0.*.
Sep  4 19:13:53 notes systemd-timesyncd[605]: Network configuration changed, trying to establish connection.
Sep  4 19:13:53 notes whoopsie[982]: [19:13:53] Cannot reach: https://daisy.ubuntu.com
Sep  4 19:13:53 notes whoopsie[982]: [19:13:53] Cannot reach: https://daisy.ubuntu.com
Sep  4 19:13:53 notes systemd-timesyncd[605]: Synchronized to time server [2001:67c:1560:8003::c7]:123 (ntp.ubuntu.com).
Sep  4 19:13:54 notes systemd-networkd[610]: wlan0: DHCPv4 address 192.168.130.172/24 via 192.168.130.1
Sep  4 19:13:54 notes avahi-daemon[716]: Joining mDNS multicast group on interface wlan0.IPv4 with address 192.168.130.172.
Sep  4 19:13:54 notes avahi-daemon[716]: New relevant interface wlan0.IPv4 for mDNS.
Sep  4 19:13:54 notes avahi-daemon[716]: Registering new address record for 192.168.130.172 on wlan0.IPv4.
Sep  4 19:13:54 notes whoopsie[982]: [19:13:54] Cannot reach: https://daisy.ubuntu.com
Sep  4 19:13:54 notes systemd-resolved[679]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Sep  4 19:13:54 notes systemd-resolved[679]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Sep  4 19:13:54 notes systemd-networkd[610]: wlan0: Configured
Sep  4 19:14:00 notes dbus-daemon[725]: [system] Successfully activated service 'org.freedesktop.hostname1'
Sep  4 19:14:00 notes systemd[1]: Started Hostname Service.
Sep  4 19:14:07 notes colord[938]: failed to get session [pid 2003]: No data available
Sep  4 19:14:30 notes systemd[1]: systemd-hostnamed.service: Succeeded.

The device shows up here:

  *-network
       description: Wireless interface
       physical id: 1
       logical name: wlan0
       serial: cc:4b:73:43:05:b0
       capabilities: ethernet physical wireless
       configuration: broadcast=yes driver=brcmfmac driverversion=7.45.18 firmware=01-6a2c8ad4 ip=192.168.130.172 multicast=yes wireless=IEEE 802.11
Linwood
  • 141
  • Have you tried disabling wifi power saving? – Kristopher Ives Sep 05 '19 at 01:06
  • Yes, as mentioned above, I have. Though let me be specific, I set /etc/NetworkManager/conf.d/default-wifi-powersave-ok.conf to 2 instead of 3. Now that you ask, iwconfig wlan0 says "Power Management: On". I really thought that the right way to disable it was the above edit, am I missing something? But... but... would power management only cause an issue on rekey? – Linwood Sep 05 '19 at 01:35
  • So I've done a couple things. One is I disabled NetworkManager (and associated gnome needs), and the good news is this made the issue completely repeatable. It did it immediately on boot. I THINK (emphasis on THINK) that both Networkd and NetworkManager were trying to configure and getting a race condition, and the latter worked, the former was flakey. Now I have only the former. I think the rekey caused it to race again and sometimes Networkd won. Now I need to find the issue in Networkd. – Linwood Sep 05 '19 at 16:02
  • I'm going to give it 24 hours or so to see if it works, but I think @KristopherIves hit it even though I had said it was off. I think that NetworkManager and NetworkD were competing and sometimes one got it, sometimes the other. NetworkManager had power management off, NetworkD on. During rekey I (sometimes) got NetworkD, and it started failing. I've disabled NetworkManager so there's no competition, and did a rc.local file to turn power management off in networkd (is there a better way?). I'll post again if it is stable. Advice still welcomed if this doesn't seem right. – Linwood Sep 05 '19 at 16:56
  • @KristopherIves if that's the solution, you should post an answer... – Fabby Sep 05 '19 at 20:26

1 Answers1

0

I have a solution though I hesitate to call it an answer.

The first problem was that, somehow, NetworkManager and networkd were both competing for the wlan0 (wifi) device. I found I was getting errors that the run file for wpa_supplicant was already present. I believe that when NetworkManager configured the device it worked, and when networkd it did not. Rekeying appeared to allow a race condition and networkd won. This is largely supposition, however I disabled NetworkManager with the four services as shown in this answer.

sudo systemctl stop NetworkManager.service
sudo systemctl disable NetworkManager.service
sudo systemctl stop NetworkManager-wait-online.service
sudo systemctl disable NetworkManager-wait-online.service
sudo systemctl stop NetworkManager-dispatcher.service
sudo systemctl disable NetworkManager-dispatcher.service
sudo systemctl stop network-manager.service
sudo systemctl disable network-manager.service

Afterwards, it rebooted and failed immediately (i.e. became slow and unreliable). My previous attempt to disable power management (@Kristopher Ives was on track) only applied to NetworkManager, apparently. Here's an example of pings:

Pinging noteswifi.leferguson.com [192.168.130.173] with 32 bytes of data:
Reply from 192.168.130.173: bytes=32 time=3ms TTL=64   # with iwconfig wlan0 power off
Reply from 192.168.130.173: bytes=32 time=1ms TTL=64
Reply from 192.168.130.173: bytes=32 time=3ms TTL=64
Reply from 192.168.130.173: bytes=32 time=3ms TTL=64
Reply from 192.168.130.173: bytes=32 time=3ms TTL=64
Reply from 192.168.130.173: bytes=32 time=3ms TTL=64    # iwconfig wlan0 power on
Reply from 192.168.130.173: bytes=32 time=97ms TTL=64
Reply from 192.168.130.173: bytes=32 time=91ms TTL=64
Reply from 192.168.130.173: bytes=32 time=509ms TTL=64
Reply from 192.168.130.173: bytes=32 time=118ms TTL=64
Reply from 192.168.130.173: bytes=32 time=137ms TTL=64
Reply from 192.168.130.173: bytes=32 time=873ms TTL=64
Reply from 192.168.130.173: bytes=32 time=3ms TTL=64
Reply from 192.168.130.173: bytes=32 time=631ms TTL=64
Reply from 192.168.130.173: bytes=32 time=415ms TTL=64
Reply from 192.168.130.173: bytes=32 time=433ms TTL=64
Reply from 192.168.130.173: bytes=32 time=247ms TTL=64
Reply from 192.168.130.173: bytes=32 time=294ms TTL=64   # iwconfig wlan0 power off
Reply from 192.168.130.173: bytes=32 time=3ms TTL=64
Reply from 192.168.130.173: bytes=32 time=3ms TTL=64
Reply from 192.168.130.173: bytes=32 time=5ms TTL=64
Reply from 192.168.130.173: bytes=32 time=3ms TTL=64
Reply from 192.168.130.173: bytes=32 time=4ms TTL=64

Note this device is line powered, not battery.

The change is immediate, on the next ping after such a change. I was unable to find a place to configure that for networkd in a straightforward way. This answer despite not being upvoted appears to work (I confirmed with udo systemctl edit --full rc-local that rc.local is configured to run)

sudo nano /etc/rc.local    # then add the text below: 

    iwconfig wlan0 power off   
    exit 0

At this point, with the above done, it has been stable for 24 hours, and also through a re-key. I can find no documentation to indicate such competition is a thing -- maybe this system, which went through several upgrades, ended up with a bad configuration. Also, as of this writing, 19.10 is pretty new (though my problems pre-dated it -- but that could simply have been powermanagement). This ubuntu install is a desktop version and gnome is present which appears to bring with it network manager overhead. Perhaps I should have used it and not networkd, but I wanted to configure the whole system from the terminal, not the gui.

So hopefully someone more into the internals can give feedback as to how this came to be and in particular, is there a better way to make power management permanently turn off than running a startup script.

Linwood
  • 141