1

The laptop often shuts down suddenly after 20-30 minutes of heavy use (almost always in a game). After rebooting, the BIOS shows a message that the CPU got too hot and had to be turned off.

The setup:

  • Dell XPS (with i7-7700 2.8 GHz CPU)
  • Ubuntu 17.10 (basic setup)
  • NVIDIA GTX 1050 mobile with NVIDIA 384.9 driver

This did not occur on the same machine with the same game under Win10. I suspect a bad system setting or a missing driver, but I couldn't find how to fix it. Is there anything in Ubuntu that can prevent this, perhaps by throttling the CPU instead of shutting it down hard?

itarato
  • 155

2 Answers

2

Anything you do in software to mitigate the problem is only going to restrict your ability to fully use and enjoy your CPU (except for increasing fan speed, if this is possible). This is a hardware flaw, in the thermal design of the laptop.

While out of scope at a site like this, I'm talking about making sure fans are turning, air can flow around and under the laptop (it's not on a cushion or bed), the CPU is correctly bound to its heatsink or whatever thermal conductor is used with proper thermal paste, there isn't dust build-up that would prevent air movement, and so on.

But it is a sad fact that some laptops are just inadequately designed to cool a CPU if it ever sees heavy use, relying on throttling to make up for poor design.

thomasrutter
  • 36,774
  • I agree completely with your general assessment of laptop thermal design. However, the OP did indicate that "This did not occur with the same machine, same game and Win10." That said, this comment tends to suggest that laptop design may not be at fault, and that there may be an issue with how Ubuntu is managing thermal issues. – richbl Jan 04 '18 at 06:14
  • It is definitely possible that the game runs more poorly, taxing the system harder on Linux than it does on Windows due to general inefficiencies. While that is a disappointing flaw in the game (or Wine if that's what's being used), just fixing that one game would not solve the more general problem that any time the CPU is really taxed it can't cope with it. – thomasrutter Jan 04 '18 at 22:19
0

I used to have similar problems on multiple laptops. It seems that laptop CPUs tend to overheat and shut down more easily as the machine ages. Replacing the CPU fan and applying quality thermal paste never helped me in these situations. So far I have limited the max frequency on Ubuntu, but it can still happen that you leave the laptop in the sun for a moment while it is doing some processing, the whole body heats up, and it eventually shuts down.

I learned that the newest laptops with Intel chips don't work properly with cpufreq-set, only with the likwid tools.

Install this package:

sudo apt install likwid

I wrote the following Python script (manipulate_cpu_freq.py) to decrease/increase the max CPU frequency under Ubuntu 18.04 (requires Python 3.7):

#!/usr/bin/python3.7

import argparse
import subprocess
import sys


def str_to_bool(value):
  # type=bool would turn any non-empty string, even "0", into True
  return value.lower() in ("1", "true", "yes")


parser = argparse.ArgumentParser(description = "Manipulate CPU frequencies", prefix_chars = '-')
parser.add_argument("-d", "--decrease", help = "decrease the max frequency", type = str_to_bool, default = False)
parser.add_argument("-i", "--increase", help = "increase the max frequency", type = str_to_bool, default = False)
parser.add_argument("-s", "--silent", help = "silent mode", type = str_to_bool, default = False)
args = parser.parse_args()

# The second output line of "likwid-setFrequencies -l" lists the available frequencies
query_freqs_output = subprocess.run(["likwid-setFrequencies", "-l"], capture_output = True)
query_freqs_output = query_freqs_output.stdout.decode('utf-8').split('\n')[1]
# split() without arguments ignores repeated or trailing spaces
available_freqs = list(map(float, query_freqs_output.split()))

# Take the number after the last '/' on the second output line of "likwid-setFrequencies -p"
query_curr_freq_output = subprocess.run(["likwid-setFrequencies", "-p"], capture_output = True)
query_curr_freq_output = query_curr_freq_output.stdout.decode('utf-8').split('\n')[1]
query_curr_freq_output = query_curr_freq_output.split('/')[-1]
current_freq = float(query_curr_freq_output.split()[0])
# Index of the available frequency closest to the current one
curr_freq_index = min(range(len(available_freqs)), key = lambda i: abs(available_freqs[i]-current_freq))

if not args.silent:
  print("Available frequencies:", available_freqs)
  print("Current frequency:", current_freq)

if args.decrease:
  print("Decrease the frequency")
  if curr_freq_index == 0:
    print("Warning: Can't decrease the frequency because it is already at min")
    sys.exit(1)

  print("Set to frequency", available_freqs[curr_freq_index-1], "GHz")
  subprocess.run(["likwid-setFrequencies", "-y", str(available_freqs[curr_freq_index-1])])
  sys.exit(0)

if args.increase:
  print("Increase the frequency")
  if curr_freq_index == len(available_freqs)-1:
    print("Warning: Can't increase the frequency because it is already at max")
    sys.exit(1)

  print("Set to frequency", available_freqs[curr_freq_index+1], "GHz")
  subprocess.run(["likwid-setFrequencies", "-y", str(available_freqs[curr_freq_index+1])])
  sys.exit(0)
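
The stepping logic in the script above can be tried without likwid installed. The following is only a sketch of that index arithmetic as a pure function; the names nearest_index and step_frequency and the sample frequencies are made up for illustration, and the list is assumed to be sorted in ascending order, as the script also assumes.

```python
from typing import List, Optional


def nearest_index(available: List[float], current: float) -> int:
    """Return the index of the available frequency closest to `current`."""
    return min(range(len(available)), key=lambda i: abs(available[i] - current))


def step_frequency(available: List[float], current: float, direction: int) -> Optional[float]:
    """Return the neighbouring frequency one step down (-1) or up (+1).

    Returns None when the current frequency is already at the end of the
    list. Assumes `available` is sorted in ascending order.
    """
    target = nearest_index(available, current) + direction
    if target < 0 or target >= len(available):
        return None
    return available[target]


if __name__ == "__main__":
    freqs = [1.2, 1.6, 2.0, 2.4, 2.8]  # hypothetical values in GHz
    print(step_frequency(freqs, 2.0, -1))  # 1.6
    print(step_frequency(freqs, 2.8, +1))  # None, already at max
```

Keeping this part free of subprocess calls makes the edge cases (already at min, already at max) easy to check by hand.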

And I use a script running in the background to monitor the CPU temperature (run_cpu_policy.sh):

#!/bin/bash

# Poll the CPU temperature every 10 seconds and adjust the max frequency
while true
do
  # sysfs reports the temperature in millidegrees Celsius
  CPU_TEMP=$(cat /sys/devices/virtual/thermal/thermal_zone0/temp)
  echo "CPU Temperature: $((CPU_TEMP / 1000))°C"
  if [ "$CPU_TEMP" -gt 76000 ]; then
    echo "Decrease the max CPU frequency"
    sudo manipulate_cpu_freq.py -s 1 -d 1
  fi
  if [ "$CPU_TEMP" -le 68000 ]; then
    echo "Increase the max CPU frequency"
    sudo manipulate_cpu_freq.py -s 1 -i 1
  fi
  sleep 10
done
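
The two thresholds in the loop above leave a band between 68°C and 76°C in which nothing is changed, so the frequency does not flip back and forth on every poll. As a rough sketch of that decision only (the function name decide and the constant names are made up for illustration, the numbers are the ones from the shell script):

```python
# Thresholds in millidegrees Celsius, copied from the shell script above
DECREASE_ABOVE = 76000
INCREASE_AT_OR_BELOW = 68000


def decide(temp_millidegrees: int) -> str:
    """Map one temperature reading to 'decrease', 'increase' or 'hold'."""
    if temp_millidegrees > DECREASE_ABOVE:
        return "decrease"
    if temp_millidegrees <= INCREASE_AT_OR_BELOW:
        return "increase"
    # Between the two thresholds the max frequency is left untouched
    return "hold"


if __name__ == "__main__":
    for reading in (65000, 70000, 80000):
        print(reading // 1000, decide(reading))
```

A wider band reacts less often, a narrower one tracks the temperature more closely; which works better depends on the machine.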

Of course, you must check which sysfs node (e.g. /sys/devices/virtual/thermal/thermal_zone0/temp) contains your CPU temperature and adapt the script above. I increase the CPU max frequency when the temperature is at or below 68°C and decrease it when it is above 76°C. This is a very conservative policy, but if the temperature sits above 80°C permanently it can quickly climb past 100°C (around the thermal shutdown threshold), so I try to always keep it below 80°C, just to be sure.
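
To find out which thermal zone belongs to the CPU, it can help to list every zone with its type and current reading. This is a minimal sketch, assuming the standard Linux layout where each /sys/class/thermal/thermal_zoneN directory has a type file and a temp file in millidegrees Celsius; the function name list_thermal_zones is made up for illustration.

```python
from pathlib import Path
from typing import Dict, Tuple


def list_thermal_zones(base: str = "/sys/class/thermal") -> Dict[str, Tuple[str, float]]:
    """Return {zone name: (zone type, temperature in degrees Celsius)}."""
    zones = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            temp_c = int((zone / "temp").read_text().strip()) / 1000.0
        except (OSError, ValueError):
            # Skip zones that cannot be read or report no valid number
            continue
        zones[zone.name] = (kind, temp_c)
    return zones


if __name__ == "__main__":
    for name, (kind, temp_c) in list_thermal_zones().items():
        print(f"{name}: {kind} {temp_c:.1f} C")
```

On many Intel laptops the CPU package shows up with the type x86_pkg_temp, but this varies between machines, so compare the readings under load before picking a zone.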

I had to develop the above solution yesterday after two thermal shutdowns on a sunny, hot day while running intensive computations continuously on my laptop CPU (Intel i7-6600U).

You can run the script at every startup by adding it to the cron jobs (/etc/crontab):

@reboot root systemd-run --scope sudo -u YOUR_USER screen -dmS cpu_policy /home/YOUR_USER/run_cpu_policy.sh

Be sure to have screen installed:

sudo apt install screen

You can attach to the session while it is running:

screen -r cpu_policy
kecsap
  • 101