Ever since I upgraded my main system from a ridiculously old openSUSE distribution to Ubuntu 18.04, I have had problems with the system crashing on me. Whatever is going wrong takes a while to build up, and it always happens when the system is under a heavy CPU-driven load. Normally it would handle that load with no difficulty, but now response gets slower and slower until the system becomes completely unusable. If I just walk away in the hope that it will eventually sort things out, the computer powers itself down after several minutes.
It is possible, even likely, that this is a hardware-related issue that just happened to show up when I upgraded to Ubuntu 18.04. But until I have a better idea of what is happening, I am afraid to upgrade any of my other systems.
I checked all of the /var/log/ logs after the last crash looking for oddities, and found indications that the CPU was overheating but that it resumed normal operation almost immediately. I did check that all of the CPU fans are running normally. I also installed psensors, but do not really know what temperature levels I should be alarmed about. Is there anything else I can do about possible overheating issues?
/var/log/syslog:
Dec 10 02:37:27 corbin-goul systemd[1]: Started Message of the Day.
Dec 10 03:02:21 corbin-goul systemd[1]: Started Run anacron jobs.
Dec 10 03:02:21 corbin-goul anacron[15057]: Anacron 2.3 started on 2018-12-10
Dec 10 03:02:21 corbin-goul anacron[15057]: Normal exit (0 jobs run)
Dec 10 03:17:01 corbin-goul CRON[19073]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 10 03:53:49 corbin-goul gnome-software[3017]: no app for changed ubuntu-dock@ubuntu.com
Dec 10 03:53:49 corbin-goul gnome-software[3017]: no app for changed ubuntu-appindicators@ubuntu.com
Dec 10 03:53:49 corbin-goul gvfsd-metadata[2373]: g_udev_device_has_property: assertion 'G_UDEV_IS_DEVICE (device)' failed
Dec 10 03:53:49 corbin-goul gvfsd-metadata[2373]: g_udev_device_has_property: assertion 'G_UDEV_IS_DEVICE (device)' failed
Dec 10 03:53:49 corbin-goul gnome-shell[2195]: [AppIndicatorSupport-DEBUG] Registering StatusNotifierItem :1.79/org/ayatana/NotificationItem/software_update_available
Dec 10 04:00:01 corbin-goul CRON[30688]: (root) CMD (/root/bin/sysBackup >/root/sysbackup.log 2>&1)
Dec 10 04:00:01 corbin-goul kernel: [51285.652208] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: acl,user_xattr
Dec 10 04:00:15 corbin-goul systemd[1]: Started Run anacron jobs.
Dec 10 04:00:15 corbin-goul anacron[30810]: Anacron 2.3 started on 2018-12-10
Dec 10 04:00:15 corbin-goul anacron[30810]: Normal exit (0 jobs run)
Dec 10 04:02:13 corbin-goul kernel: [51417.101352] CPU1: Core temperature above threshold, cpu clock throttled (total events = 37241)
Dec 10 04:02:13 corbin-goul kernel: [51417.101364] CPU0: Package temperature above threshold, cpu clock throttled (total events = 43282)
Dec 10 04:02:13 corbin-goul kernel: [51417.101367] CPU3: Package temperature above threshold, cpu clock throttled (total events = 42497)
Dec 10 04:02:13 corbin-goul kernel: [51417.101368] CPU2: Package temperature above threshold, cpu clock throttled (total events = 42919)
Dec 10 04:02:13 corbin-goul kernel: [51417.101371] CPU1: Package temperature above threshold, cpu clock throttled (total events = 43144)
Dec 10 04:02:13 corbin-goul kernel: [51417.102358] CPU1: Core temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102359] CPU0: Package temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102373] CPU1: Package temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102379] CPU3: Package temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102379] CPU2: Package temperature/speed normal
Dec 10 04:03:57 corbin-goul dbus-daemon[1319]: [system] Activating via systemd: service name='org.bluez' unit='dbus-org.bluez.service' requested by ':1.784' (uid=1000 pid=12967 comm="/opt/google/chrome/chrome " label="unconfined")
Dec 10 04:03:58 corbin-goul org.gnome.Shell.desktop[2195]: Created new window in existing browser session.
Dec 10 04:04:03 corbin-goul org.gnome.Shell.desktop[2195]: [5590:5590:1210/040403.152180:ERROR:input_method_base.cc(146)] Not implemented reached in virtual ui::InputMethodKeyboardController *ui::InputMethodBase::GetInputMethodKeyboardController()Using InputMethodKeyboardControllerStub
Dec 10 04:07:13 corbin-goul kernel: [51717.104549] CPU1: Core temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104582] CPU3: Package temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104605] CPU2: Package temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104612] CPU0: Package temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104669] CPU1: Package temperature/speed normal
Dec 10 04:08:41 corbin-goul colord[1590]: failed to get session [pid 31199]: No data available
Dec 10 04:09:47 corbin-goul kernel: [51870.886383] perf: interrupt took too long (6307 > 6247), lowering kernel.perf_event_max_sample_rate to 31500
Dec 10 04:08:52 corbin-goul colord[1590]: message repeated 4 times: [ failed to get session [pid 31199]: No data available]
/var/log/kern
Dec 10 04:00:01 corbin-goul kernel: [51285.652208] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: acl,user_xattr
Dec 10 04:02:13 corbin-goul kernel: [51417.101352] CPU1: Core temperature above threshold, cpu clock throttled (total events = 37241)
Dec 10 04:02:13 corbin-goul kernel: [51417.101364] CPU0: Package temperature above threshold, cpu clock throttled (total events = 43282)
Dec 10 04:02:13 corbin-goul kernel: [51417.101367] CPU3: Package temperature above threshold, cpu clock throttled (total events = 42497)
Dec 10 04:02:13 corbin-goul kernel: [51417.101368] CPU2: Package temperature above threshold, cpu clock throttled (total events = 42919)
Dec 10 04:02:13 corbin-goul kernel: [51417.101371] CPU1: Package temperature above threshold, cpu clock throttled (total events = 43144)
Dec 10 04:02:13 corbin-goul kernel: [51417.102358] CPU1: Core temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102359] CPU0: Package temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102373] CPU1: Package temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102379] CPU3: Package temperature/speed normal
Dec 10 04:02:13 corbin-goul kernel: [51417.102379] CPU2: Package temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104549] CPU1: Core temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104582] CPU3: Package temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104605] CPU2: Package temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104612] CPU0: Package temperature/speed normal
Dec 10 04:07:13 corbin-goul kernel: [51717.104669] CPU1: Package temperature/speed normal
Dec 10 04:09:47 corbin-goul kernel: [51870.886383] perf: interrupt took too long (6307 > 6247), lowering kernel.perf_event_max_sample_rate to 31500
ken@corbin-goul:~$ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +27.8°C (crit = +105.0°C)
temp2: +29.8°C (crit = +105.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +52.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +51.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +52.0°C (high = +80.0°C, crit = +100.0°C)
Core 2: +50.0°C (high = +80.0°C, crit = +100.0°C)
Core 3: +51.0°C (high = +80.0°C, crit = +100.0°C)
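For anyone unsure how to read those numbers, here is a minimal Python sketch that parses text in the shape of the sensors output above and reports how far each reading is from its "high" and "crit" limits. The function names (parse_sensors, headroom) and the regular expressions are my own, written against the lines shown in this post only; other hardware may print the output in a different layout.

```python
import re

# Matches lines such as "Core 0: +51.0°C (high = +80.0°C, crit = +100.0°C)".
# The layout is assumed from the output pasted above, not from any formal spec.
READING = re.compile(
    r"^\s*(?P<label>[^:]+):\s*\+?(?P<temp>-?\d+(?:\.\d+)?)°C\s*\((?P<limits>[^)]*)\)"
)
# Matches each "name = +value°C" pair inside the parentheses.
LIMIT = re.compile(r"(\w+)\s*=\s*\+?(-?\d+(?:\.\d+)?)°C")


def parse_sensors(text):
    """Return one dict per temperature line: label, temp, and any limits found."""
    readings = []
    for line in text.splitlines():
        match = READING.match(line)
        if not match:
            continue  # chip names and "Adapter:" lines carry no reading
        limits = {name: float(value) for name, value in LIMIT.findall(match.group("limits"))}
        readings.append(
            {"label": match.group("label").strip(), "temp": float(match.group("temp")), **limits}
        )
    return readings


def headroom(reading, limit="high"):
    """Degrees left before the named limit, or None if that limit is not reported."""
    if limit not in reading:
        return None
    return reading[limit] - reading["temp"]


SAMPLE = """\
acpitz-virtual-0
Adapter: Virtual device
temp1: +27.8°C (crit = +105.0°C)
temp2: +29.8°C (crit = +105.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +52.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +51.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +52.0°C (high = +80.0°C, crit = +100.0°C)
Core 2: +50.0°C (high = +80.0°C, crit = +100.0°C)
Core 3: +51.0°C (high = +80.0°C, crit = +100.0°C)
"""

if __name__ == "__main__":
    for r in parse_sensors(SAMPLE):
        print(r["label"], r["temp"], "high headroom:", headroom(r, "high"),
              "crit headroom:", headroom(r, "crit"))
```

With the idle readings above, every core sits 28 to 30 degrees below its "high" limit, so the interesting measurement would be the same output captured while the machine is under load.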
Please post the output of sensors so we can see the relevant thresholds. – terdon Dec 10 '18 at 18:17
This is a desktop system that doubles as the office server. It is a pretty old system, probably due for a CPU upgrade, but it serves my purposes. I did vacuum out the dust in the case. Will try to find some canned air to blow out the fans.
If I am reading this right, ambient temperature is 28. Have 4 cores that currently read 50-52. Reported past range is 26-100. – kencorbin Dec 12 '18 at 00:30
Messages like "CPU1: Package temperature above threshold, cpu clock throttled" are normal. That just means that the CPU was getting too warm, so its clock speed was throttled. Good. If that hadn't happened you would have had a problem. Have a look at (and post, if possible) the output of journalctl -b -1 -n250. That should show the log messages up to the last time the machine was powered down. If that was when it crashed, there might be something useful there. – terdon Dec 13 '18 at 14:01
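To put the throttling messages in perspective, the following is a rough Python sketch that pulls the "total events" figure out of each throttle line and keeps the largest value seen per CPU and scope (Core or Package). The pattern is based only on the kernel lines quoted in the question, the function name throttle_counts is made up for illustration, and treating "total events" as a running counter is an assumption drawn from the wording of the message.

```python
import re

# Pattern taken from the kernel lines quoted in the question; other kernel
# versions may word the message differently.
THROTTLE = re.compile(
    r"(CPU\d+): (Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (\d+)\)"
)


def throttle_counts(log_text):
    """Map (cpu, scope) to the highest 'total events' value found in the text."""
    counts = {}
    for cpu, scope, total in THROTTLE.findall(log_text):
        key = (cpu, scope)
        counts[key] = max(counts.get(key, 0), int(total))
    return counts


SAMPLE_LOG = """\
Dec 10 04:02:13 corbin-goul kernel: [51417.101352] CPU1: Core temperature above threshold, cpu clock throttled (total events = 37241)
Dec 10 04:02:13 corbin-goul kernel: [51417.101364] CPU0: Package temperature above threshold, cpu clock throttled (total events = 43282)
Dec 10 04:02:13 corbin-goul kernel: [51417.101371] CPU1: Package temperature above threshold, cpu clock throttled (total events = 43144)
Dec 10 04:02:13 corbin-goul kernel: [51417.102358] CPU1: Core temperature/speed normal
"""

if __name__ == "__main__":
    for (cpu, scope), total in sorted(throttle_counts(SAMPLE_LOG).items()):
        print(cpu, scope, total)
```

The same function could be fed saved kernel log text from a previous boot. Tens of thousands of events within about 14 hours of uptime (the kernel timestamp is roughly 51417 seconds) would suggest the CPU is hitting its threshold very often, which fits the cooling theory.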