On Ubuntu 16.04.6, when I try to mask a systemd service, I see errors like the one below.

systemctl mask hadoop-hdfs-zkfc.service
Failed to activate service 'org.freedesktop.systemd1': timed out

I am using the default systemd version that comes with Ubuntu 16.04.6.

ubuntu@platform1:~$ systemctl --version
systemd 229
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP
+LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS
+KMOD -IDN

The issue does not occur every time, but once it is hit, the only way to recover systemd is a hard reboot.

Looking at the syslog, it appears that systemd aborted.

May 18 08:49:24 platform3 systemd[1]: Removed slice User Slice of support.
May 18 08:49:27 platform3 systemd[1]: Assertion 's->type == SERVICE_ONESHOT' failed at ../src/core/service.c:1792, function service_enter_start(). Aborting.
May 18 08:49:27 platform3 systemd[1]: Caught <ABRT>, dumped core as pid 15839.
May 18 08:49:27 platform3 systemd[1]: Freezing execution. 

The busctl output looks like this:

ubuntu@platform3:~/logs$ busctl
NAME                               PID PROCESS         USER             CONNECTION    UNIT                      SESSION    DESCRIPTION
:1.1                               976 systemd-logind  root             :1.1          systemd-logind.service    -          -
:1.3                               971 accounts-daemon root             :1.3          accounts-daemon.service   -          -
:1.5434                          49174 systemctl       root             :1.5434       cron.service              -          -
:1.5435                          49223 systemctl       root             :1.5435       ssh.service               -          -
:1.5436                          49408 busctl          ubuntu           :1.5436       ssh.service               -          -
:1.7                              1109 unattended-upgr root             :1.7          unattended-upgrades.se... -          -
com.ubuntu.LanguageSelector          - -               -                (activatable) -                         -
org.debian.AptXapianIndex            - -               -                (activatable) -                         -
org.freedesktop.Accounts           971 accounts-daemon root             :1.3          accounts-daemon.service   -          -
org.freedesktop.DBus               936 dbus-daemon     messagebus       org.freedesktop.DBus dbus.service              -          -
org.freedesktop.hostname1            - -               -                (activatable) -                         -
org.freedesktop.locale1              - -               -                (activatable) -                         -
org.freedesktop.login1             976 systemd-logind  root             :1.1          systemd-logind.service    -          -
org.freedesktop.network1             - -               -                (activatable) -                         -
org.freedesktop.resolve1             - -               -                (activatable) -                         -
org.freedesktop.systemd1             - -               -                (activatable) -                         -
org.freedesktop.thermald             - -               -                (activatable) -                         -
org.freedesktop.timedate1            - -               -                (activatable) -                         - 

Can someone let me know how to debug this?

The issue looks similar to the one discussed here. So is this a known systemd issue on Ubuntu 16.04.6?

tuk
  • 231

2 Answers

As the busctl output you posted shows, org.freedesktop.systemd1 (the systemd manager, init.scope unit) is not actually active:

org.freedesktop.systemd1             - -               -                (activatable) -                         -

However, it might be activated like this:

$ systemctl daemon-reexec

Or this:

$ sudo kill 1

This sends SIGTERM to systemd, which asks it, a little more insistently, to do essentially the same thing as daemon-reexec. I encountered a similar problem after exhausting all free RAM on a system with no swap file. Re-executing the systemd daemon solved it completely for me, without rebooting the machine (though you would first need to kill some user processes to free RAM if the problem has the same cause as in my case). As stated in the systemd man pages, daemon-reexec is safe to use:

   daemon-reexec
       Reexecute the systemd manager. This will serialize the manager
       state, reexecute the process and deserialize the state again. This
       command is of little use except for debugging and package upgrades.
       Sometimes, it might be helpful as a heavy-weight daemon-reload.
       While the daemon is being reexecuted, all sockets systemd listening
       on behalf of user configuration will stay accessible.
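
To check for the state described at the top of this answer, before and after re-executing, something like the following could be used. This is only a sketch: the function name systemd1_bus_state is made up for illustration, and it assumes the column layout of the busctl output shown in the question (name in column 1, owning PID in column 2).

```shell
# Hypothetical helper: classify the org.freedesktop.systemd1 row of busctl output.
# Reads the busctl table on stdin and prints "active", "inactive", or "missing".
systemd1_bus_state() {
    awk '
        $1 == "org.freedesktop.systemd1" {
            found = 1
            # Column 2 is the owning PID; "-" means no process owns the name.
            if ($2 ~ /^[0-9]+$/) print "active"; else print "inactive"
            exit
        }
        # Only reached without output above if the row never appeared.
        END { if (!found) print "missing" }
    '
}
```

It could be called as `busctl --no-pager | systemd1_bus_state`. With the output from the question it prints "inactive", because the row shows "-" in the PID column and "(activatable)" as the connection.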
folivore
  • 131
  • I have had this problem regularly for the last 2 weeks during heavy load (I think). Is it possible to diagnose whether it is caused by RAM depletion? Usually it takes more than 3 hours to recover. – Moberg Dec 15 '20 at 07:56
  • Only solution that worked for me on Ubuntu 18. – Ale Jun 16 '21 at 10:48

This has been answered on the systemd mailing list. Cross-posting the answer:

This bug (https://github.com/systemd/systemd/issues/4444) was fixed in systemd v236.

See if you can use a newer version of systemd.
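
As a rough way to compare the installed version against v236, a sketch like the one below could help. The function name systemd_version_at_least is hypothetical, and it assumes the first line of `systemctl --version` has the form "systemd 229" as in the question. Distribution packages sometimes backport fixes, so the version number is only a hint.

```shell
# Hypothetical helper: succeed if the version on stdin is >= the given minimum.
# Expects the output of `systemctl --version` on stdin.
# Returns 0 if new enough, 1 if older, 2 if the version could not be parsed.
systemd_version_at_least() {
    min="$1"
    # The first line looks like "systemd 229"; the second field is the version.
    ver=$(awk 'NR == 1 { print $2; exit }')
    # Drop any non-numeric suffix so the integer comparison below is safe.
    ver=${ver%%[!0-9]*}
    [ -n "$ver" ] || return 2
    if [ "$ver" -ge "$min" ]; then
        return 0
    fi
    return 1
}
```

For example, `systemctl --version | systemd_version_at_least 236 || echo "older than v236"` would print the message on the systemd 229 system from the question.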

tuk
  • 231
  • This does not help. How to get rid of this bug WITHOUT rebooting the system, because to deterministically reboot the system it is REQUIRED to get rid of this bug first (corporate policy: this bug prevents the authenticated reboot). – Tino Feb 16 '24 at 12:07
  • I think you should ask this question in the github issue. – tuk Feb 16 '24 at 12:32
  • I think GH would be the wrong place, as I am already at SystemD v241 and the problem arose at a completely different layer. What made me angry about your post is the idea that updating SystemD resolves the problem. My experience is exactly the opposite: Usually after updating SystemD problems start to show up. YMMV. FYI I found the solution: https://serverfault.com/a/1153712/59497 but as I do not know if it is related to the question here I do not post an answer here. – Tino Feb 16 '24 at 13:08