Why is Ubuntu 18.04 shutting down system?

Question

Ever since I upgraded my main system from a ridiculously old openSUSE distribution to Ubuntu 18.04, I have had problems with the system crashing on me. Whatever is going wrong takes a while to build up. It always happens when the system is under a heavy CPU-driven load. Normally it would handle that load without difficulty, but now response gets slower and slower until the system becomes completely unusable. And if I just walk away in the hope that it will eventually sort things out, the computer powers itself down after several minutes.

It is possible, even likely, that this is a hardware related issue that just happened to show up when I upgraded to Ubuntu 18.04. But until I have a better idea what is happening, I am afraid to upgrade any of my other systems.

I checked all of the logs under /var/log/ after the last crash, looking for oddities, and found indications that the CPU was overheating but resumed normal operation almost immediately. I did check that all of the CPU fans are running normally. I also installed psensor, but I do not really know what temperature levels I should be alarmed about. Is there anything else I can do about possible overheating issues?

/var/log/syslog:

Dec 10 02:37:27 corbin-goul systemd[1]: Started Message of the Day.
Dec 10 03:02:21 corbin-goul systemd[1]: Started Run anacron jobs.
Dec 10 03:02:21 corbin-goul anacron[15057]: Anacron 2.3 started on 2018-12-10
Dec 10 03:02:21 corbin-goul anacron[15057]: Normal exit (0 jobs run)
Dec 10 03:17:01 corbin-goul CRON[19073]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Dec 10 03:53:49 corbin-goul gnome-software[3017]: no app for changed [email protected]
Dec 10 03:53:49 corbin-goul gnome-software[3017]: no app for changed [email protected]
Dec 10 03:53:49 corbin-goul gvfsd-metadata[2373]: g_udev_device_has_property: assertion 'G_UDEV_IS_DEVICE (device)' failed
Dec 10 03:53:49 corbin-goul gvfsd-metadata[2373]: g_udev_device_has_property: assertion 'G_UDEV_IS_DEVICE (device)' failed
Dec 10 03:53:49 corbin-goul gnome-shell[2195]: [AppIndicatorSupport-DEBUG] Registering StatusNotifierItem :1.79/org/ayatana/NotificationItem/software_update_available
Dec 10 04:00:01 corbin-goul CRON[30688]: (root) CMD (/root/bin/sysBackup >/root/sysbackup.log 2>&1)
Dec 10 04:00:01 corbin-goul kernel: [51285.652208] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: acl,user_xattr
Dec 10 04:00:15 corbin-goul systemd[1]: Started Run anacron jobs.
Dec 10 04:00:15 corbin-goul anacron[30810]: Anacron 2.3 started on 2018-12-10
Dec 10 04:00:15 corbin-goul anacron[30810]: Normal exit (0 jobs run)
Dec 10 04:02:13 corbin-goul kernel: [51417.101352] CPU1: Core temperature above threshold, cpu clock throttled (total events = 37241)
Dec 10 04:02:13 corbin-goul kernel: [51417.101364] CPU0: Package temperature above threshold, cpu clock throttled (total events = 43282)
Dec 10 04:02:13 corbin-goul kernel: [51417.101367] CPU3: Package temperature above threshold, cpu clock throttled (total events = 42497)
Dec 10 04:02:13 corbin-goul kernel: [51417.101368] CPU2: Package temperature above threshold, cpu clock throttled (total events = 42919)
Dec 10 04:02:13 corbin-goul kernel: [51417.101371] CPU1: Package temperature above threshold, cpu clock throttled (total events = 43144)
Dec 10 04:02:13 corbin-goul kernel: [51417.102358] CPU1: Core temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102359] CPU0: Package temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102373] CPU1: Package temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102379] CPU3: Package temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102379] CPU2: Package temperature/speed normal
Dec 10 04:03:57 corbin-goul dbus-daemon[1319]: [system] Activating via systemd: service name='org.bluez' unit='dbus-org.bluez.service' requested by ':1.784' (uid=1000 pid=12967 comm="/opt/google/chrome/chrome     " label="unconfined")
Dec 10 04:03:58 corbin-goul org.gnome.Shell.desktop[2195]: Created new window in existing browser session.
Dec 10 04:04:03 corbin-goul org.gnome.Shell.desktop[2195]: [5590:5590:1210/040403.152180:ERROR:input_method_base.cc(146)] Not implemented reached in virtual ui::InputMethodKeyboardController *ui::InputMethodBase::GetInputMethodKeyboardController()Using InputMethodKeyboardControllerStub
Dec 10 04:07:13 corbin-goul kernel: [51717.104549] CPU1: Core temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104582] CPU3: Package temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104605] CPU2: Package temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104612] CPU0: Package temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104669] CPU1: Package temperature/speed normal
Dec 10 04:08:41 corbin-goul colord[1590]: failed to get session [pid 31199]: No data available
Dec 10 04:09:47 corbin-goul kernel: [51870.886383] perf: interrupt took too long (6307 > 6247), lowering kernel.perf_event_max_sample_rate to 31500
Dec 10 04:08:52 corbin-goul colord[1590]: message repeated 4 times: [ failed to get session [pid 31199]: No data available]

/var/log/kern

Dec 10 04:00:01 corbin-goul kernel: [51285.652208] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: acl,user_xattr
Dec 10 04:02:13 corbin-goul kernel: [51417.101352] CPU1: Core temperature above threshold, cpu clock throttled (total events = 37241)
Dec 10 04:02:13 corbin-goul kernel: [51417.101364] CPU0: Package temperature above threshold, cpu clock throttled (total events = 43282)
Dec 10 04:02:13 corbin-goul kernel: [51417.101367] CPU3: Package temperature above threshold, cpu clock throttled (total events = 42497)
Dec 10 04:02:13 corbin-goul kernel: [51417.101368] CPU2: Package temperature above threshold, cpu clock throttled (total events = 42919)
Dec 10 04:02:13 corbin-goul kernel: [51417.101371] CPU1: Package temperature above threshold, cpu clock throttled (total events = 43144)
Dec 10 04:02:13 corbin-goul kernel: [51417.102358] CPU1: Core temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102359] CPU0: Package temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102373] CPU1: Package temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102379] CPU3: Package temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102379] CPU2: Package temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104549] CPU1: Core temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104582] CPU3: Package temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104605] CPU2: Package temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104612] CPU0: Package temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104669] CPU1: Package temperature/speed normal
Dec 10 04:09:47 corbin-goul kernel: [51870.886383] perf: interrupt took too long (6307 > 6247), lowering kernel.perf_event_max_sample_rate to 31500
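
One way to get a feel for how often the CPU is being throttled is to pull the "total events" counters out of the kernel log. The following is only a rough Python sketch: the function name and the regular expression are my own, written against the message format shown above, and it assumes the counter is cumulative since boot. The bracketed number in each kernel line is seconds since boot, so 51417 s is a bit over 14 hours of uptime.

```python
import re

# Matches kernel lines such as:
# "CPU1: Core temperature above threshold, cpu clock throttled (total events = 37241)"
# The pattern is an assumption based on the log excerpt in this question.
THROTTLE_RE = re.compile(
    r"CPU(?P<cpu>\d+): (?P<scope>Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<total>\d+)\)"
)


def throttle_totals(lines):
    """Return {(cpu, scope): highest 'total events' value seen in the lines}."""
    totals = {}
    for line in lines:
        m = THROTTLE_RE.search(line)
        if m:
            key = (int(m["cpu"]), m["scope"])
            totals[key] = max(totals.get(key, 0), int(m["total"]))
    return totals


# Sample lines copied from the log excerpt above.
sample = [
    "Dec 10 04:02:13 corbin-goul kernel: [51417.101352] CPU1: Core temperature above threshold, cpu clock throttled (total events = 37241)",
    "Dec 10 04:02:13 corbin-goul kernel: [51417.101364] CPU0: Package temperature above threshold, cpu clock throttled (total events = 43282)",
    "Dec 10 04:02:13 corbin-goul kernel: [51417.102358] CPU1: Core temperature/speed normal",
]

for (cpu, scope), total in sorted(throttle_totals(sample).items()):
    print(f"CPU{cpu} {scope}: {total} throttle events reported")
```

To run it against a real log, pass an open file instead of the sample list, for example `throttle_totals(open("/var/log/syslog"))`.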

ken@corbin-goul:~$ sensors

acpitz-virtual-0
Adapter: Virtual device
temp1:        +27.8°C  (crit = +105.0°C)
temp2:        +29.8°C  (crit = +105.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +52.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +51.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +52.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:        +50.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:        +51.0°C  (high = +80.0°C, crit = +100.0°C)
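
To make the thresholds in this output easier to read, here is a small Python sketch that computes how far each coretemp reading is below its "high" and "crit" limits. The function name and the regular expression are hypothetical and only written against the output format shown above; lines without a "high" value (such as the acpitz ones) are skipped.

```python
import re

# Matches lines such as:
# "Core 0:        +51.0°C  (high = +80.0°C, crit = +100.0°C)"
# The pattern is an assumption based on the sensors output in this question.
SENSOR_RE = re.compile(
    r"^(?P<label>[^:]+):\s+\+?(?P<temp>-?\d+(?:\.\d+)?)°C\s+"
    r"\(high = \+?(?P<high>-?\d+(?:\.\d+)?)°C, "
    r"crit = \+?(?P<crit>-?\d+(?:\.\d+)?)°C\)"
)


def headroom(sensors_text):
    """Return a list of (label, temp, degrees_below_high, degrees_below_crit)."""
    rows = []
    for line in sensors_text.splitlines():
        m = SENSOR_RE.match(line.strip())
        if m:
            temp, high, crit = (float(m[k]) for k in ("temp", "high", "crit"))
            rows.append((m["label"], temp, high - temp, crit - temp))
    return rows


# Sample text copied from the sensors output above.
sample_output = """\
temp1:        +27.8°C  (crit = +105.0°C)
Package id 0:  +52.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +51.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +52.0°C  (high = +80.0°C, crit = +100.0°C)
"""

for label, temp, to_high, to_crit in headroom(sample_output):
    print(f"{label}: {temp:.1f}°C, {to_high:.1f}°C below high, {to_crit:.1f}°C below crit")
```

The readings above were taken at idle, so the more useful comparison is the same calculation while the machine is under the load that triggers the slowdown.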

On boot you should be able to choose the "memtest" option. It might be worthwhile to run it and see whether the problem recurs or it reports errors. — , Dec 10 '18 at 18:13
That doesn't look like a problem, actually. CPU throttling is normal when the system starts to heat up. Can you show us a log from one of the times when the machine shut down? Also, is this a laptop or a desktop? When did you last change the thermal paste on the CPU? Finally, what temperatures are you seeing? Can you show us the temperatures when the machine is idle, and also the output of sensors so we can see the relevant thresholds? — terdon, Dec 10 '18 at 18:17
Good point about the temps. How do you know that you are having an issue with temps? These messages are pretty common, and all modern CPUs support stepping based on temperature, load, and wattage. — , Dec 10 '18 at 18:40
The included logs end at the point where it crashed. In this case I did not discover the system was down until 4 hours later, so it was pretty obvious just when the crash occurred.
This is a desktop system that doubles as the office server. It is a pretty old system, probably due for a CPU upgrade. But it serves my purposes. I did vacuum out the dust in the case. Will try to find some canned air to blow out the fans.

If I am reading this right, the ambient temperature is 28°C and the 4 cores currently read 50-52°C. The reported past range is 26-100°C. — kencorbin, Dec 12 '18 at 00:30
The fact that the system powers down instead of just locking up suggests some kind of hardware issue. And the temperature warnings in the logs hinted that it might be overheating. But I really do not know for certain what is going on. — kencorbin, Dec 12 '18 at 00:50
There isn't anything in the logs suggesting a problem, as far as I can see. The "CPU1: Package temperature above threshold, cpu clock throttled" messages are normal. They just mean that the CPU was getting too warm, so its clock speed was throttled. Good. If that hadn't happened you would have had a problem. Have a look at (and post, if possible) the output of journalctl -b -1 -n250. That should show the log messages up to the last time the machine was powered down. If that was when it crashed, there might be something useful there. — terdon, Dec 13 '18 at 14:01
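
If the tail of the previous boot's journal is long, a short filter can help pick out the lines worth posting. This is only a sketch in Python: the keyword list and both function names are my own guesses at what might be relevant, not anything prescribed, and `journalctl -b -1` only returns something if the journal from the previous boot was kept.

```python
import subprocess

# Hypothetical keyword list; adjust to whatever looks relevant on your system.
KEYWORDS = ("temperature", "thermal", "throttled", "critical", "shutdown")


def interesting(lines, keywords=KEYWORDS):
    """Keep only the lines that mention one of the keywords (case-insensitive)."""
    return [ln for ln in lines if any(k in ln.lower() for k in keywords)]


def previous_boot_tail(n=250):
    """Return the last n journal lines of the previous boot, or [] if unavailable."""
    try:
        result = subprocess.run(
            ["journalctl", "-b", "-1", "-n", str(n), "--no-pager"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        # journalctl is not installed on this machine.
        return []
    return result.stdout.splitlines()


# Sample lines copied from the syslog excerpt in the question.
sample = [
    "Dec 10 04:02:13 corbin-goul kernel: [51417.101371] CPU1: Package temperature above threshold, cpu clock throttled (total events = 43144)",
    "Dec 10 04:02:13 corbin-goul kernel: [51417.102358] CPU1: Core temperature/speed normal",
    "Dec 10 04:08:41 corbin-goul colord[1590]: failed to get session [pid 31199]: No data available",
]

for line in interesting(sample):
    print(line)
```

On the affected machine, `interesting(previous_boot_tail())` would apply the same filter to the real journal. The unfiltered output is still the better thing to post, since the cause may not match any of these keywords.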
