
For about a week, my laptop running Ubuntu 16.04 has been crashing at random moments under high CPU load. This happens either during audio conversion via TAudioConverter under Wine or during chess analysis with Stockfish or Komodo in SCIDvsPC, so it is not program-specific.

It shuts down immediately, without any warning.

Where can I find a log file to post, so as to give you more information?


gratis@Aurora:~$ lsblk

NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 238,5G  0 disk 
├─sda1   8:1    0 222,6G  0 part /
├─sda2   8:2    0     1K  0 part 
└─sda5   8:5    0  15,9G  0 part [SWAP]

gratis@Aurora:~$ df

Filesystem     1K-blocks      Used Available Use% Mounted on
udev             8153360         0   8153360   0% /dev
tmpfs            1634696      9688   1625008   1% /run
/dev/sda1      229572820 204352820  13535292  94% /
tmpfs            8173468     87072   8086396   2% /dev/shm
tmpfs               5120         4      5116   1% /run/lock
tmpfs            8173468         0   8173468   0% /sys/fs/cgroup
tmpfs            1634696       104   1634592   1% /run/user/1000

gratis@Aurora:~$ free

              total        used        free      shared  buff/cache   available
Mem:       16346940     3572364     5575488      753016     7199088    11623860
Swap:      16690172           0    16690172

gratis@Aurora:~$ grep -i sensor /var/log/syslog*

gratis@Aurora:~$ 

heynnema
  • Give us the results from the terminal app, of df and free. – heynnema Nov 21 '16 at 08:12
  • Is your CPU overheating? – Ceda EI Nov 21 '16 at 13:05
  • I don't think so. I can run stable for hours performing either of the tasks mentioned above (even on surfaces that don't conduct heat well), and sometimes it just crashes after a few minutes. @heynnema: Could you please elaborate on exactly what I should do? I googled the terms "ubuntu crash df" and "ubuntu crash free", but didn't find anything useful. – Markus Gratis Nov 22 '16 at 12:50
  • Start the terminal app from the Unity dashboard, and type lsblk and df and free and grep -i sensor /var/log/syslog* (4 separate commands), and then copy all of the output to the clipboard, then edit your question here, and paste the output into your question (put <pre> in front of the beginning of the pasted text, and </pre> at the end). Also, do you run your laptop on battery, or AC power? – heynnema Nov 22 '16 at 15:51
  • If I remember correctly the last crashes happened while it was charging, but I haven't removed the battery if I understand correctly. – Markus Gratis Nov 22 '16 at 17:53
  • Remember to start comments with @heynnema if you want to make sure that I see them. See my answer, below. – heynnema Nov 23 '16 at 16:47

1 Answer


You possibly have two different problems here.

First:

It appears that you're nearly out of disk space: df shows / at 94% used, with roughly 13 GB available. Some of the programs that you're running may generate huge data files. Open the Disk Usage Analyzer from the Unity dashboard and try to figure out where your disk space has gone.

You may need to re-partition your hard drive to obtain more working space. If you edit your question to include a current-window-only screenshot of gparted, we can explore that if we need to.
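If you prefer the terminal to Disk Usage Analyzer, the same search can be sketched roughly as follows. The helper name largest_dirs and its arguments are made up for illustration; du, sort, and tail are the standard GNU tools, and this assumes GNU du (as shipped with Ubuntu) for the --max-depth option.

```shell
#!/bin/sh
# largest_dirs DIR [N]: print the last N lines of a size-sorted listing of
# DIR's immediate subdirectories (hypothetical helper, not a standard command).
# Sizes are in KiB, largest last; the final line is the total for DIR itself.
largest_dirs() {
    dir=${1:-.}
    count=${2:-10}
    # -x: stay on one filesystem, -k: report KiB, --max-depth=1: one level deep.
    # Permission errors are discarded so the listing stays readable.
    du -xk --max-depth=1 "$dir" 2>/dev/null | sort -n | tail -n "$count"
}
```

For example, largest_dirs "$HOME" 15 lists the biggest directories in your home folder; repeating the call on the largest entry narrows the search down step by step.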

Second:

We can't tell if overheating is an issue for you, until we install tools to monitor this.

Open the terminal app from the Unity dash and type the following commands, one at a time:

  • sudo apt-get update
  • sudo apt-get install lm-sensors
  • sudo apt-get install sensord
  • sudo apt-get install sensors-applet
  • sudo apt-get install indicator-sensors

Then detect the hardware sensors by typing:

  • man sensors-detect # to read the man pages
  • sudo sensors-detect
  • sensors -f # to see it in action

Then start Hardware Sensors Indicator from the Unity dashboard, and set the preferences to auto-start at login, and to default to Fahrenheit. Monitor this indicator in your top panel and check that the temperature stays around 120-150°F (roughly 49-66°C). Higher than that and you've possibly got a problem.
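Since sensors reports Celsius by default while the range above is in Fahrenheit, a small conversion helper may make the comparison easier. This is only a sketch: the function names c_to_f, f_to_c, zone_temp_c, and log_temps are made up here, and it assumes the kernel exposes readings in millidegrees Celsius under /sys/class/thermal/thermal_zone*/temp, which is typical on Linux but varies by machine.

```shell
#!/bin/sh
# c_to_f CELSIUS: print the Fahrenheit equivalent with one decimal place.
# LC_ALL=C forces a decimal point even on locales that use a comma.
c_to_f() {
    LC_ALL=C awk -v c="$1" 'BEGIN { printf "%.1f\n", c * 9 / 5 + 32 }'
}

# f_to_c FAHRENHEIT: print the Celsius equivalent with one decimal place.
f_to_c() {
    LC_ALL=C awk -v f="$1" 'BEGIN { printf "%.1f\n", (f - 32) * 5 / 9 }'
}

# zone_temp_c FILE: read a thermal file holding millidegrees Celsius
# (assumed sysfs format) and print degrees Celsius.
zone_temp_c() {
    LC_ALL=C awk '{ printf "%.1f\n", $1 / 1000 }' "$1"
}

# log_temps LOGFILE: append one timestamped line per readable thermal zone.
log_temps() {
    for zone in /sys/class/thermal/thermal_zone*/temp; do
        [ -r "$zone" ] || continue
        printf '%s %s %s C\n' "$(date '+%F %T')" "$zone" "$(zone_temp_c "$zone")" >> "$1"
    done
}
```

For example, c_to_f 82 prints 179.6 and f_to_c 150 prints 65.6. A loop such as while sleep 10; do log_temps "$HOME/temps.log"; done, left running during a heavy workload, may leave the last reading before a sudden shutdown in the log, provided the write reached the disk in time.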

heynnema
  • The indicator-sensors repo isn't properly maintained anymore, but simply running sensors revealed that the CPU temperature peaks at about 82°C (about 180°F, which is about 30°F higher than your 150°F limit). Right next to the value it says: (84°C high, +100°C critical).

    It is true that I am low on disk space, but it just crashed with about 15GB of free space. Still, I know that large .wav files are temporarily created for audio conversion, and it could be that the chess engine doesn't exclusively use RAM. I would think, though, that the program would report an error rather than crash the whole OS...

    – Markus Gratis Dec 01 '16 at 12:09
  • Indicator-sensors is in his trusty dist (deb http://ppa.launchpad.net/alexmurray/indicator-sensors/ubuntu trusty main), and he's made code changes as recently as July 2015. You can download the source at https://launchpad.net/indicator-sensors and https://github.com/alexmurray/indicator-sensors. You can also install psensor as a substitute. At 180 degrees your fans should be running, yes? You might also look at installing thermald. Its .conf file should work for most computers, but you may need to adjust it for yours. Mine is set for 140-145 degrees max. – heynnema Dec 01 '16 at 14:12
  • If you've got Intel processors, make sure that intel-microcode is installed. This will make sure that the processors are up to the latest revision. After installation and reboot, in terminal, you can type dmesg|grep microcode to see its operation in updating the processors. – heynnema Dec 01 '16 at 14:28
  • I did as you said and the output for dmesg|grep microcode was:

    [ 0.000000] microcode: CPU0 microcode updated early to revision 0x1e, date = 2015-08-13
    [ 0.080703] microcode: CPU2 microcode updated early to revision 0x1e, date = 2015-08-13
    [ 0.088470] microcode: CPU4 microcode updated early to revision 0x1e, date = 2015-08-13
    [ 0.096284] microcode: CPU6 microcode updated early to revision 0x1e, date = 2015-08-13
    [ 4.268992] microcode: CPU0 sig=0x306c3, pf=0x20, revision=0x1e
    [ 4.269013] microcode: CPU1 sig=0x306c3, pf=0x20, revision=0x1e
    [ 4.269061] microcode: CPU2 sig=0x306c3, pf=0x20, revision=0x1e
    [ 4.269084] microcode: CPU3 sig=0x306c3, pf=0x20, revision=0x1e
    [ 4.269128] microcode: CPU4 sig=0x306c3, pf=0x20, revision=0x1e
    [ 4.269151] microcode: CPU5 sig=0x306c3, pf=0x20, revision=0x1e
    [ 4.269196] microcode: CPU6 sig=0x306c3, pf=0x20, revision=0x1e
    [ 4.269221] microcode: CPU7 sig=0x306c3, pf=0x20, revision=0x1e
    [ 4.269336] microcode: Microcode Update Driver: v2.01 tigran@aivazian.fsnet.co.uk, Peter Oruba

    – Markus Gratis Dec 01 '16 at 19:28
  • Good. Let's see if the microcode updates help at all. Are you going to try to install psensor and/or indicator-sensors? How are your fans doing when you're at high temp? – heynnema Dec 01 '16 at 22:04
  • I installed psensor and now have that start with the system. Unfortunately neither adding the indicator-sensors ppa nor compiling it from source worked for me, but I guess it doesn't matter now. It is looking for some config file.

    I tried running thermald: "sudo thermald", but it just goes to the next line starting with user@...:~$. It does not report any error.

    If I just go "thermald" it says: You must be root to run thermald!

    – Markus Gratis Dec 02 '16 at 06:56
  • Hmm, as I recall, the prefs file had to be created manually. Just create an empty file and start indicator-sensors. Thermald runs as a background process (check with ps auxc|grep thermald), not as an app you run directly. However, if you stop the process, you can run thermald -d and watch it in real time. Check out man thermald and man thermal-conf.xml. – heynnema Dec 02 '16 at 15:13
  • Thanks! But it turns out that wasn't the problem. It may also be that the program is running properly and I'm just misinterpreting the terminal output.

    gratis@Aurora:~$ indicator-sensors

    [main] ERROR: Failed to open icon cache dir /home/gratis/.cache/indicator-sensors/icons

    [aticonfig] DEBUG: Checking for hybrid system with integrated GPU active

    [aticonfig] WARNING: Error calling aticonfig to detect if running on a hybrid system with integrated GPU active: Failed to execute child process "aticonfig" (No such file or directory)

    [dbus-plugin] DEBUG: Acquired a message bus connection

    [dbus-plugin] DEBUG: Acquired the name com.github.alexmurray.IndicatorSensors

    – Markus Gratis Dec 05 '16 at 06:01
  • Are you running indicator-sensors from the terminal, or from the Unity dashboard? It should be the dashboard. Search for "hardware" to find it. Also, please start new comments with @heynnema if you want me to see the comment. It'll flag me that way. – heynnema Dec 05 '16 at 14:31
  • OK, thanks. If I start it from the dashboard, its icon blinks in the applications bar on the left for a couple of seconds and then disappears. This always happens when a program fails to load. When I try to run it via the terminal I get the output mentioned above.

    Are there any config presets I can download for thermald? If yes, which would you recommend?

    – Markus Gratis Dec 05 '16 at 17:54
  • What you do is kill the thermald process, then run sudo thermald -d to watch it in real time. That'll give you the info you need to modify a thermal-conf.xml. The default file may work for you, or require only a few mods. If you read the man pages I suggested earlier, you'll also see some examples. – heynnema Dec 05 '16 at 20:30