
My Asus Vivobook K571GT, dual-booting Ubuntu 20.04, has recently started shutting down due to high temperatures (reaching 99°C+). These temperatures are reached only when the laptop is plugged in.

The BIOS is updated to the latest version and Ubuntu is on the latest kernel. I've read that this might be caused by the NVIDIA driver not being installed properly, so I tried several different NVIDIA drivers (460, 470 and 495). I also tried disabling the NVIDIA GPU altogether and running only on the integrated GPU. The results were the same every time: when plugged in, the temperature spikes from a respectable 40-45°C to 95°C in a second, even without much CPU load (for example, running apt update makes the CPU temperature rise to 90°C+). If I don't stop what I'm doing, or a command is running and I can't stop it in time, the CPU hits the 100°C mark, which triggers the shutdown. Interestingly, if I unplug the laptop when I get a high-temperature warning, the temperature drops back to 45-50°C in a second.

Has anyone experienced something similar? The only explanation I can think of for the rapid CPU temperature spike when plugged in but not on battery is that the CPU is somehow getting "overclocked". I'm not sure how to verify this or, if that is the case, how to prevent it. Could it be a hardware issue, such as the AC adapter providing too much power?

Edit

grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver

/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu10/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu11/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu1/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu2/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu3/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu4/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu5/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu6/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu7/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu8/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu9/cpufreq/scaling_driver:intel_pstate

grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu10/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu11/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu8/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu9/cpufreq/scaling_governor:powersave

grep "model name" /proc/cpuinfo

model name  : Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz

cat /sys/devices/system/cpu/intel_pstate/no_turbo

0

Edit

ps auxc | grep -i therm

root         167  0.0  0.0      0     0 ?        I<   10:18   0:00 acpi_thermal_pm
root        1049  0.0  0.0 128808  9456 ?        Ssl  10:18   0:00 thermald

sudo dmidecode -s bios-version

X571GT.311

ls -al /etc/thermald

total 28
drwxr-xr-x   2 root root  4096 Sep  8 13:48 .
drwxr-xr-x 148 root root 12288 Nov  2 12:01 ..
-rw-r--r--   1 root root  4605 Jan 14  2019 thermal-conf.xml
-rw-r--r--   1 root root   508 Jan 14  2019 thermal-cpu-cdev-order.xml

The laptop is just a year or two old. The latest BIOS update was released just a couple of weeks ago.

cat /etc/thermald/thermal-conf.xml

<?xml version="1.0"?>

<!-- use "man thermal-conf.xml" for details -->

<!-- BEGIN -->
<ThermalConfiguration>
<Platform>
<Name>Generic X86 Laptop Device</Name>
<ProductName>EXAMPLE_SYSTEM</ProductName>
<Preference>QUIET</Preference>
<ThermalSensors>
    <ThermalSensor>
        <Type>TSKN</Type>
        <AsyncCapable>1</AsyncCapable>
    </ThermalSensor>
</ThermalSensors>
<ThermalZones>
    <ThermalZone>
        <Type>SKIN</Type>
        <TripPoints>
            <TripPoint>
                <SensorType>TSKN</SensorType>
                <Temperature>55000</Temperature>
                <type>passive</type>
                <ControlType>SEQUENTIAL</ControlType>
                <CoolingDevice>
                    <index>1</index>
                    <type>rapl_controller</type>
                    <influence> 100 </influence>
                    <SamplingPeriod> 16 </SamplingPeriod>
                </CoolingDevice>
                <CoolingDevice>
                    <index>2</index>
                    <type>intel_powerclamp</type>
                    <influence> 100 </influence>
                    <SamplingPeriod> 12 </SamplingPeriod>
                </CoolingDevice>
            </TripPoint>
        </TripPoints>
    </ThermalZone>
</ThermalZones>
</Platform>

<!-- Thermal configuration example only -->
<Platform>
<Name>Example Platform Name</Name>
<!--UUID is optional, if present this will be matched -->
<!-- Both product name and UUID can contain
    wild card "*", which matches any platform
-->
<UUID>Example UUID</UUID>
<ProductName>Example Product Name</ProductName>
<Preference>QUIET</Preference>
<ThermalSensors>
    <ThermalSensor>
        <!-- New Sensor with a type and path -->
        <Type>example_sensor_1</Type>
        <Path>/some_path</Path>
        <AsyncCapable>0</AsyncCapable>
    </ThermalSensor>
    <ThermalSensor>
        <!-- Already present in thermal sysfs,
            enable this or add/change config
            For example, here we are indicating that
            sensor can do async events to avoid polling
        -->
        <Type>example_thermal_sysfs_sensor</Type>
        <!-- If async capable, then we don't need to poll -->
        <AsyncCapable>1</AsyncCapable>
    </ThermalSensor>
    <ThermalSensor>
        <!-- Examle of a virtual sensor. This sensor
            depends on other real sensor or
            virtual sensor.
            E.g. here the temp will be
            temp of example_sensor_1 * 0.5 + 10
        -->
        <Type>example_virtual_sensor</Type>
        <Virtual>1</Virtual>
        <SensorLink>
            <SensorType>example_sensor_1</SensorType>
            <Multiplier> 0.5 </Multiplier>
            <Offset> 10 </Offset>
        </SensorLink>
    </ThermalSensor>

</ThermalSensors>
<ThermalZones>
    <ThermalZone>
        <Type>Example Zone type</Type>
        <TripPoints>
            <TripPoint>
                <SensorType>example_sensor_1</SensorType>
                <!-- Temperature at which to take action -->
                <Temperature> 75000 </Temperature>
                <!-- max/passive/active
                    If a MAX type is specified, then
                    daemon will use PID control
                    to aggresively throttle to avoid
                    reaching this temp.
                 -->
                <type>max</type>
                <!-- SEQUENTIAL | PARALLEL
                When a trip point temp is violated, then
                number of cooling device can be activated.
                If control type is SEQUENTIAL then
                It will exhaust first cooling device before trying
                next.
                -->
                <ControlType>SEQUENTIAL</ControlType>
                <CoolingDevice>
                    <index>1</index>
                    <type>example_cooling_device</type>
                    <!-- Influence will be used order cooling devices.
                        First cooling device will be used, which has
                        highest influence.
                    -->
                    <influence> 100 </influence>
                    <!-- Delay in using this cdev, this takes some time
                    too actually cool a zone
                    -->
                    <SamplingPeriod> 12 </SamplingPeriod>
                </CoolingDevice>
            </TripPoint>

        </TripPoints>
    </ThermalZone>
</ThermalZones>
<CoolingDevices>
    <CoolingDevice>
        <!--
            Cooling device can be specified
            by a type and optionally a sysfs path
            If the type already present in thermal sysfs
            no need of a path.
            Compensation can use min/max and step size
            to increasing cool the system.
            Debounce period can be used to force
            a waiting period for action
        -->
        <Type>example_cooling_device</Type>
        <MinState>0</MinState>
        <IncDecStep>10</IncDecStep>
        <ReadBack> 0 </ReadBack>
        <MaxState>50</MaxState>
        <DebouncePeriod>5000</DebouncePeriod>
        <!--
            If there are no PID parameter
            compensation increase step wise and exponentaially
            if single step is not able to change trend.
            Alternatively a PID parameters can be specified
            then next step will use PID calculation using
            provided PID constants.
        -->>
        <PidControl>
            <kp>0.001</kp>
            <kd>0.0001</kd>
            <ki>0.0001</ki>
        </PidControl>
    </CoolingDevice>
</CoolingDevices>

</Platform> </ThermalConfiguration> <!-- END -->

top

top - 13:16:27 up  1:37,  1 user,  load average: 0.85, 1.32, 1.11
Tasks: 487 total,   2 running, 484 sleeping,   1 stopped,   0 zombie
%Cpu(s):  5.1 us,  2.0 sy,  1.5 ni, 90.6 id,  0.1 wa,  0.0 hi,  0.7 si,  0.0 st
GiB Mem :     15.5 total,      4.5 free,      5.0 used,      5.9 buff/cache
GiB Swap:      2.0 total,      2.0 free,      0.0 used.     10.1 avail Mem
PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                       

35883 root 39 19 84636 68132 12616 R 19.8 0.4 0:00.60 apt-check
4842 haleks 20 0 4487900 483220 120988 S 2.6 3.0 1:49.49 gnome-shell
7291 haleks 20 0 923372 60172 45804 S 2.3 0.4 1:34.25 psensor
32705 haleks 20 0 24.5g 130676 77652 S 2.3 0.8 0:14.20 brave
975 message+ 20 0 40380 34872 4068 S 1.0 0.2 0:31.14 dbus-daemon
1002 root 20 0 2332860 32620 16456 S 1.0 0.2 0:05.98 snapd
4555 haleks 20 0 24.7g 147872 79744 S 1.0 0.9 1:10.25 Xorg
5229 haleks 20 0 2258744 131912 45796 S 1.0 0.8 1:16.97 keybase
35782 root 20 0 287276 16044 14104 S 1.0 0.1 0:00.03 packagekitd
663 root -51 0 0 0 0 S 0.7 0.0 0:38.09 irq/152-nvidia
21473 haleks 20 0 819496 53768 39012 S 0.7 0.3 0:07.86 gnome-terminal-
32564 haleks 20 0 16.6g 410380 190120 S 0.7 2.5 0:42.65 brave
32596 haleks 20 0 16.6g 182632 87372 S 0.7 1.1 0:47.20 brave
34076 root 20 0 25368 13280 7900 S 0.7 0.1 0:00.16 apt
357 root 19 -1 68944 30764 29000 S 0.3 0.2 0:01.12 systemd-journal
387 root 20 0 24164 7796 4236 S 0.3 0.0 0:02.20 systemd-udevd
517 root -51 0 0 0 0 S 0.3 0.0 0:00.73 irq/148-iwlwifi
992 root 20 0 235188 10276 6928 S 0.3 0.1 0:02.17 polkitd
1065 root 20 0 716580 12360 9072 S 0.3 0.1 0:01.60 canonical-livep
1349 gdm 20 0 317300 9004 7968 S 0.3 0.1 0:00.28 goa-identity-se
1864 root 20 0 2432052 150584 31964 S 0.3 0.9 0:07.40 lxd
4545 haleks 20 0 8748 5860 4012 S 0.3 0.0 0:01.37 dbus-daemon
5448 haleks 20 0 2370936 172572 33964 S 0.3 1.1 0:27.26 kbfsfuse
7473 haleks 20 0 503408 143448 66476 S 0.3 0.9 0:35.84 Keybase
7575 haleks 20 0 463344 40076 32528 S 0.3 0.2 0:00.39 update-notifier
10111 haleks 20 0 582224 166968 80480 S 0.3 1.0 0:37.21 gitkraken
32662 haleks 20 0 24.4g 121680 81520 S 0.3 0.7 0:03.68 brave
35783 root 20 0 24164 5228 1652 S 0.3 0.0 0:00.01 systemd-udevd
35784 root 20 0 24164 5228 1652 S 0.3 0.0 0:00.01 systemd-udevd
35786 root 20 0 24164 5228 1652 S 0.3 0.0 0:00.01 systemd-udevd
1 root 20 0 168176 12092 8296 S 0.0 0.1 0:08.88 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-kblockd
9 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
10 root 20 0 0 0 0 S 0.0 0.0 0:00.11 ksoftirqd/0
11 root 20 0 0 0 0 I 0.0 0.0 0:09.66 rcu_sched
12 root rt 0 0 0 0 S 0.0 0.0 0:00.02 migration/0
13 root -51 0 0 0 0 S 0.0 0.0 0:00.00 idle_inject/0
14 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0
15 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/1
16 root -51 0 0 0 0 S 0.0 0.0 0:00.00 idle_inject/1
17 root rt 0 0 0 0 S 0.0 0.0 0:00.18 migration/1
18 root 20 0 0 0 0 S 0.0 0.0 0:00.06 ksoftirqd/1

  • What CPU frequency scaling driver? grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver. What governor, plugged in and unplugged? grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor. What CPU make and model? grep "model name" /proc/cpuinfo. Is turbo enabled, plugged in and unplugged? (The method is driver dependent; intel_pstate shown): cat /sys/devices/system/cpu/intel_pstate/no_turbo. – Doug Smythies Nov 03 '21 at 14:41
  • Edit your question and show me ps auxc | grep -i therm and sudo dmidecode -s bios-version. How old is this laptop? Is it very dusty? Start comments to me with @heynnema or I'll miss them. – heynnema Nov 03 '21 at 14:55
  • Reset the Power Manager by shutting down the laptop, then holding down the POWER button for ~20 seconds, then reboot and retest. – heynnema Nov 03 '21 at 15:02
  • BIOS is current. Show me ls -al /etc/thermald. – heynnema Nov 03 '21 at 15:04
  • And/or set a lower trip point temperature for thermald. Is turbo disabled when you are unplugged, or no change? – Doug Smythies Nov 03 '21 at 15:04
  • Show me cat /etc/thermald/thermal-conf.xml and top. – heynnema Nov 03 '21 at 15:06
  • @DougSmythies I'm not sure about turbo & the thermald trip point. How can I verify this? – haleksandre Nov 03 '21 at 15:09
  • I'll try the Power Manager reset @heynnema suggested & post if there are any changes. – haleksandre Nov 03 '21 at 15:10
  • Rename /etc/thermald/thermal-conf.xml to thermal-conf.xml.HOLD and restart thermald and retest. – heynnema Nov 03 '21 at 15:11
  • I agree with @heynnema on thermald. – Doug Smythies Nov 03 '21 at 15:20
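
The turbo check asked about in the comments above can be wrapped in a small helper. This is only a sketch: the function names `turbo_state` and `ac_online` are made up for illustration, the optional directory arguments exist only so the functions can be pointed at test data, the `no_turbo` file is specific to the `intel_pstate` driver shown in the question, and it assumes the AC adapter shows up in sysfs with type `Mains` (some USB-C chargers report a different type).

```shell
# Hypothetical helpers; the names are illustrative, not part of any tool.

# Print "enabled" or "disabled" depending on intel_pstate's no_turbo flag.
# $1: directory holding no_turbo (defaults to the real sysfs location).
turbo_state() {
    local dir="${1:-/sys/devices/system/cpu/intel_pstate}"
    local flag
    flag="$(cat "$dir/no_turbo")" || return 1
    if [ "$flag" = "0" ]; then
        echo "enabled"      # no_turbo=0 means turbo boost is allowed
    else
        echo "disabled"
    fi
}

# Print "yes" if any mains adapter reports online, otherwise "no".
# $1: power_supply directory (defaults to /sys/class/power_supply).
ac_online() {
    local dir="${1:-/sys/class/power_supply}"
    local supply
    for supply in "$dir"/*; do
        [ -r "$supply/type" ] || continue
        if [ "$(cat "$supply/type")" = "Mains" ] &&
           [ "$(cat "$supply/online" 2>/dev/null)" = "1" ]; then
            echo "yes"
            return 0
        fi
    done
    echo "no"
}
```

Running `echo "AC: $(ac_online)  turbo: $(turbo_state)"` once plugged in and once on battery shows whether the turbo setting differs between the two. To test whether turbo is behind the spikes, `echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo` turns it off until it is set back to 0 or the machine reboots.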

1 Answer


Your /etc/thermald/thermal-conf.xml is incorrect. It's two example files tacked together.

Try this somewhat generic .xml file shown below.

Note: You may end up customizing the following line...

<Temperature>60000</Temperature>

Then restart thermald with:

sudo systemctl restart thermald

<?xml version="1.0"?>
<ThermalConfiguration>
  <Platform>
    <Name>Override CPU default passive</Name>
    <ProductName>*</ProductName>
    <Preference>QUIET</Preference>
    <ThermalZones>
      <ThermalZone>
        <Type>cpu</Type>
        <TripPoints>
          <TripPoint>
            <Temperature>60000</Temperature>
            <type>passive</type>
          </TripPoint>
        </TripPoints>
      </ThermalZone>
    </ThermalZones>
  </Platform>
</ThermalConfiguration>
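
Since the trip point is the one value likely to need tuning, a small generator can save hand-editing. The sketch below is not part of thermald; `make_thermal_conf` is a made-up name, and it simply prints the same generic file as above with the temperature filled in. It assumes a whole number of degrees Celsius without leading zeros.

```shell
# Hypothetical helper: print the generic thermald config from this answer
# with the passive trip point set to $1 (whole degrees Celsius).
make_thermal_conf() {
    local celsius="$1"
    case "$celsius" in
        ''|*[!0-9]*)
            echo "usage: make_thermal_conf <whole degrees C>" >&2
            return 1
            ;;
    esac
    # thermald expects millidegrees Celsius, e.g. 60 -> 60000
    local milli=$((celsius * 1000))
    cat <<EOF
<?xml version="1.0"?>
<ThermalConfiguration>
  <Platform>
    <Name>Override CPU default passive</Name>
    <ProductName>*</ProductName>
    <Preference>QUIET</Preference>
    <ThermalZones>
      <ThermalZone>
        <Type>cpu</Type>
        <TripPoints>
          <TripPoint>
            <Temperature>$milli</Temperature>
            <type>passive</type>
          </TripPoint>
        </TripPoints>
      </ThermalZone>
    </ThermalZones>
  </Platform>
</ThermalConfiguration>
EOF
}
```

For example, `make_thermal_conf 55 | sudo tee /etc/thermald/thermal-conf.xml >/dev/null` followed by `sudo systemctl restart thermald` would apply a 55°C trip point. Keep a copy of the old file first.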
heynnema
  • I've updated the configuration file. So far it seems to have helped. I'll keep testing with the laptop plugged in throughout the day & report if I've had CPU spike 90c+. Thanks for your help really appreciate it! – haleksandre Nov 03 '21 at 15:31
  • @haleksandre Good! You didn't show me the top command yet. – heynnema Nov 03 '21 at 15:34
  • @haleksandre Do a sudo apt update while running top and look for cpu throttling processes at the same time, and monitor the temps. – heynnema Nov 03 '21 at 15:42
  • After a few hours, I'm still experiencing CPU temperature spikes in the 90s (°C), but so far it hasn't hit the 100°C threshold that causes a shutdown. Should I lower the temperature trip point? – haleksandre Nov 03 '21 at 17:09
  • @haleksandre Yes. Try 55000, or 50000. Monitor with my previous comment. You don't want to see throttling at normal usage. Note the minor edit in my .xml text. – heynnema Nov 03 '21 at 17:35
  • It looks like it helped make the laptop usable again when plugged in. I still get the occasional CPU temperature spike, but they've become manageable. Thanks again! – haleksandre Nov 03 '21 at 20:10
  • @haleksandre I have similar CPU temp spikes. I think it's the Nvidia. – heynnema Nov 03 '21 at 22:06
  • I think it is the slow response time of thermald relative to the incredibly fast processor temperature rate of increase under step function load. The temperature overshoots before thermald has time to respond. – Doug Smythies Nov 04 '21 at 15:26
  • @DougSmythies Yes. The thermald response time can be configured in the thermal-conf.xml file. See my answer at https://askubuntu.com/questions/1400361/optimize-thermal-daemon/1400418?noredirect=1#comment2429746_1400418 – heynnema Apr 03 '22 at 19:14
  • @heynnema : Yes, thanks for the comment. The point is that the temperature can ramp up very fast. I am liking the new TCC offset method, because it carries no kernel code overhead. – Doug Smythies Apr 04 '22 at 18:57
  • @DougSmythies Where can I read about the TCC offset method? – heynnema Apr 04 '22 at 23:12
  • @heynnema : Sorry, I thought you had seen my post, method 2. – Doug Smythies Apr 04 '22 at 23:31
  • @DougSmythies Complicated. My computer does have TCC offset. I don't have TurboStat. What does TCC offset have to do with thermald, if anything? – heynnema Apr 05 '22 at 00:12
  • Nothing to do with thermald, but rather instead of thermald. @heynnema. – Doug Smythies Apr 05 '22 at 00:22
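
To watch for the overshoot discussed in these comments while tuning the trip point, the kernel's thermal zones can be polled directly. This is a rough sketch: `hot_zones` is a made-up name, the optional second argument exists only so the function can be pointed at test data, and which zone corresponds to the CPU package varies by machine.

```shell
# Hypothetical helper: list thermal zones at or above a threshold.
# $1: threshold in whole degrees Celsius
# $2: thermal sysfs directory (defaults to /sys/class/thermal)
hot_zones() {
    local limit="$1"
    local dir="${2:-/sys/class/thermal}"
    local zone milli
    for zone in "$dir"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        milli="$(cat "$zone/temp")"
        # sysfs reports millidegrees Celsius, e.g. 95000 for 95 C
        if [ "$milli" -ge $((limit * 1000)) ]; then
            echo "$(cat "$zone/type" 2>/dev/null) $((milli / 1000))"
        fi
    done
}
```

Leaving `while sleep 1; do hot_zones 90; done` running in one terminal while `sudo apt update` runs in another prints a line each second for every zone at 90°C or above. A one-second poll can still miss very short spikes.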