Since updating to Ubuntu 18.10, CPU power management seems to be having problems. I think (not sure) it occurs when the laptop starts running a little hot, and the intel_pstate switches the CPUs to a lower speed. This happened in the past, but once things cooled down, the speed would switch higher again.
Now, since the update, it switches to a VERY low speed. For example, from cpufreq-info:
analyzing CPU 7:
driver: intel_pstate
CPUs which run at the same hardware frequency: 7
CPUs which need to have their frequency coordinated by software: 7
maximum transition latency: 4294.55 ms.
hardware limits: 1.20 GHz - 3.30 GHz
available cpufreq governors: performance, powersave
current policy: frequency should be within 1.32 GHz and 1.32 GHz.
The governor "powersave" may decide which speed to use
within this range.
current CPU frequency is 353 MHz.
All 8 CPUs had speeds in the range 344 -- 373 MHz. These are "Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz" CPUs. The highest temperature I can find right now is 78°C, well below the point where speeds would normally pick back up.
Something seems to have changed between 18.04 and 18.10 that causes this over-enthusiastic power management. Any thoughts on where to look to configure this properly?
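To watch the per-CPU speeds without re-running cpufreq-info each time, the kernel's cpufreq sysfs files can be read directly. This is only a minimal sketch: the function name read_cur_freqs_mhz is made up for illustration, and it assumes each CPU exposes cpufreq/scaling_cur_freq (in kHz) under /sys/devices/system/cpu.

```python
import re
from pathlib import Path


def read_cur_freqs_mhz(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_index: current frequency in MHz} read from sysfs.

    Hypothetical helper; cpu_root is a parameter so it can be pointed
    at a test directory instead of the real sysfs tree.
    """
    freqs = {}
    for cpu_dir in Path(cpu_root).iterdir():
        # Only cpu0, cpu1, ...; skip entries such as cpufreq or cpuidle.
        match = re.fullmatch(r"cpu(\d+)", cpu_dir.name)
        if not match:
            continue
        freq_file = cpu_dir / "cpufreq" / "scaling_cur_freq"
        if freq_file.is_file():
            # The kernel reports this value in kHz.
            freqs[int(match.group(1))] = int(freq_file.read_text()) / 1000.0
    return freqs


if __name__ == "__main__":
    for cpu, mhz in sorted(read_cur_freqs_mhz().items()):
        print(f"cpu{cpu}: {mhz:.0f} MHz")
```

Running it before and after the slowdown should show whether all cores drop together, as they did here.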
Edited to add: rdmsr -a 0x19a gives an output of
8
8
8
8
8
8
8
8
Note that this is after rebooting, which seems to be the only way to recover a normal CPU speed. If the slowdown happens again, I'll see if it changes.
This is an HP laptop, if that matters. The links seem to refer to Dell laptops, and slow speeds after resuming from sleep. There was no sleep/suspend that caused the slow speeds in my case. It has also never happened before updating to 18.10.
Update: The slowdown has recurred, and I re-ran the rdmsr command. The output is the same as when the laptop was running at a normal speed: the number 8 repeated 8 times.
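For anyone trying to interpret that output: MSR 0x19a is IA32_CLOCK_MODULATION, and rdmsr prints it in hex by default. The sketch below decodes one line of output, assuming the basic layout from Intel's documentation (bit 4 is the on-demand modulation enable, bits 3:1 are the duty-cycle field in 12.5% steps). It ignores bit 0, which some CPUs use for finer steps, and the function name is made up for illustration.

```python
def decode_clock_modulation(raw_hex):
    """Decode one line of `rdmsr -a 0x19a` output (hex, no 0x prefix).

    Assumes the non-extended IA32_CLOCK_MODULATION layout:
      bit 4    = on-demand clock modulation enable
      bits 3:1 = duty-cycle field (nominally field * 12.5 %)
    The duty cycle only takes effect when the enable bit is set.
    """
    value = int(raw_hex, 16)
    enabled = bool((value >> 4) & 1)
    duty_field = (value >> 1) & 0b111
    return {
        "enabled": enabled,
        "duty_field": duty_field,
        "duty_percent": duty_field * 12.5,
    }


if __name__ == "__main__":
    print(decode_clock_modulation("8"))
```

If that layout applies to this CPU, a value of 8 has the enable bit clear, which would suggest on-demand clock modulation was not active in either the normal or the slow state.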
If there's no clear cause for this slowing, is there at least some way to set the speeds back to normal short of rebooting each time? cpufreq-info shows all 8 CPUs were set to "powersave". I can set the governor to "performance" with cpufreq-set -c 0 -g performance for each CPU in turn, but the speeds stay locked in the 316 -- 357 MHz range.
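Setting the governor on each CPU in turn can be scripted. This is a rough sketch that writes to the scaling_governor sysfs file rather than calling cpufreq-set; the helper name set_governor_all is hypothetical, and it needs root when pointed at the real /sys tree. As noted above, changing the governor did not unlock the speeds in this case, so it is a convenience and not a fix.

```python
import re
from pathlib import Path


def set_governor_all(governor, cpu_root="/sys/devices/system/cpu"):
    """Write `governor` to every cpuN/cpufreq/scaling_governor file.

    Hypothetical helper. Returns the names of the CPUs that were changed.
    """
    changed = []
    for cpu_dir in sorted(Path(cpu_root).iterdir()):
        if not re.fullmatch(r"cpu\d+", cpu_dir.name):
            continue
        gov_file = cpu_dir / "cpufreq" / "scaling_governor"
        if gov_file.is_file():
            gov_file.write_text(governor + "\n")
            changed.append(cpu_dir.name)
    return changed


# Example (as root): set_governor_all("performance")
```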
Another update: I'm not sure if this has anything to do with the problem, but around the time the slowdown happened, there was a change in the behaviour of thermald. In my syslog, I see lines similar to the following:
thermald[1401]: >>thd_cdev_set_state index:1 state:1 :3:0:0:2147483647 force:0
thermald[1401]: cdev index:1 consecutive call, increment exponentially state 11
thermald[1401]: Set : threshold:96000, temperature:96000, cdev:1(Processor), curr_state:10, max_state:10
thermald[1401]: >>thd_cdev_set_state index:1 state:0 :3:0:0:2147483647 force:0
thermald[1401]: match zone 3 trip 0 clamp_valid 0 clamp 2147483647
thermald[1401]: Erased [3: 0 2147483647
thermald[1401]: Set : threshold:96000, temperature:95000, cdev:1(Processor), curr_state:9, max_state:10
Prior to this, only cdev:9(intel_pstate) was mentioned. In fact, the last line where it appears seems to indicate that it maxed out and thermald switched to cdev:1(Processor):
thermald[1401]: >>thd_cdev_set_state index:9 state:1 :3:0:0:2147483647 force:0
thermald[1401]: cdev index:9 consecutive call, increment exponentially state 11
thermald[1401]: Set : threshold:96000, temperature:96000, cdev:9(intel_pstate), curr_state:10, max_state:10
thermald[1401]: >>thd_cdev_set_state index:1 state:1 :3:0:0:2147483647 force:0
thermald[1401]: Added zone 3 trip 0 clamp_valid 0 clamp 2147483647
thermald[1401]: Set : threshold:96000, temperature:97000, cdev:1(Processor), curr_state:1, max_state:10
Is there a chance the intel_pstate governor is not to blame, but thermald instead? Did it switch to a different cooling method and then fail to switch back or undo whatever throttling it applied? How would I manually undo it to check? And what changed between bionic and cosmic so I can fix this?
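One way to check this by hand is through the kernel's thermal sysfs interface, where each cooling device has type, cur_state and max_state files under /sys/class/thermal/cooling_device*. The sketch below lists them and resets any "Processor" device that is stuck above state 0. The function names are made up for illustration, and two things are assumptions: that thermald's cdev:1(Processor) corresponds to the sysfs devices of type "Processor", and that thermald will not immediately set the state again while it is running.

```python
from pathlib import Path


def list_cooling_devices(thermal_root="/sys/class/thermal"):
    """Return type, cur_state and max_state for each cooling device."""
    devices = []
    for dev in sorted(Path(thermal_root).glob("cooling_device*")):
        devices.append({
            "path": dev,
            "type": (dev / "type").read_text().strip(),
            "cur_state": int((dev / "cur_state").read_text()),
            "max_state": int((dev / "max_state").read_text()),
        })
    return devices


def reset_cooling_devices(cdev_type="Processor",
                          thermal_root="/sys/class/thermal"):
    """Write 0 to cur_state of every matching device not already at 0.

    Hypothetical helper; needs root on the real sysfs tree.
    Returns the names of the devices that were reset.
    """
    reset = []
    for dev in list_cooling_devices(thermal_root):
        if dev["type"] == cdev_type and dev["cur_state"] != 0:
            (dev["path"] / "cur_state").write_text("0\n")
            reset.append(dev["path"].name)
    return reset


if __name__ == "__main__":
    # Read-only listing; call reset_cooling_devices() explicitly to change state.
    for d in list_cooling_devices():
        print(d["path"].name, d["type"], d["cur_state"], "/", d["max_state"])
```

If the speeds recover after the reset, that would point at the Processor cooling device being left in a throttled state.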
One more update: the culprit is definitely thermald. I added intel_pstate=disable to the Linux kernel command line in GRUB. I had to put more load on my machine to get cdev:9(cpufreq) to its maximum state of 12 (as opposed to 10 for intel_pstate), but as soon as it did, thermald switched to cdev:1(Processor) again. The CPUs were again locked at a slow speed, though about 3x faster than with pstate. Nothing I tried restored normal speeds, even after the temperatures had long since gone back down.
Current work-around: I've removed the line
<CoolingDevice>Processor</CoolingDevice>
from /etc/thermald/thermal-cpu-cdev-order.xml. Still no clue why this is now an issue since the update.
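The same edit can be done programmatically, which helps if the file gets restored by a package update. This is a minimal sketch using Python's standard XML parser; the function name is made up, and it only assumes that the file contains CoolingDevice elements like the one shown above. Note that this parser drops XML comments when rewriting, so it is worth keeping a backup of the original file, and thermald presumably needs a restart to pick up the change.

```python
import xml.etree.ElementTree as ET


def remove_cooling_device(xml_text, name="Processor"):
    """Return xml_text with every <CoolingDevice>name</CoolingDevice> removed.

    Hypothetical helper; works on a string so the caller decides
    how the file is read and written back.
    """
    root = ET.fromstring(xml_text)
    # Snapshot the element list first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "CoolingDevice" and (child.text or "").strip() == name:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Example (as root, after backing up the file):
#   path = "/etc/thermald/thermal-cpu-cdev-order.xml"
#   with open(path) as fh:
#       cleaned = remove_cooling_device(fh.read())
#   with open(path, "w") as fh:
#       fh.write(cleaned)
```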
sudo rdmsr -a 0x19a when the issue occurs? (Do sudo modprobe msr first.) (Note: you need the msr-tools package for rdmsr.) – Doug Smythies Oct 30 '18 at 14:50