
OK, first: English is not my native language, so I apologize for any poor phrasing.

Second, I'm still learning Linux (Ubuntu 18.04, to be more specific). This is the first time I've used this OS, and my knowledge of the terminal, commands and packages is still very basic. Even the things I CAN do, I'm not sure I understand entirely. Patience, please.

So, to the problem. I decided to try some gaming recently and, to my surprise, some games I could play before on Windows 10 are unplayable due to a performance issue. These games start at 35-45 fps and all of a sudden drop to 1-15 fps (yes, ONE fps!).

At first I thought it could be something related to the GPU, maybe the drivers, but no: I installed the latest drivers from a PPA. I also played with the games' graphics quality settings, and started to notice that only the more CPU-demanding games were having this trouble.

Then I started looking for ways to keep track of CPU usage. The System Monitor wasn't enough, so I found some watch commands that track CPU speed and CPU temperature. I had to install lm-sensors for the temperature.
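The exact watch commands aren't shown, so here is a minimal sketch of the same idea that reads the values straight from sysfs. It assumes the standard Linux sysfs paths (scaling_cur_freq in kHz, thermal_zone*/temp in millidegrees Celsius); the function names and the SYSFS override are hypothetical, added only so the sketch can be tested.

```shell
#!/bin/sh
# Minimal sketch: print each CPU's current frequency and each thermal zone's
# temperature from sysfs. SYSFS defaults to /sys and can be overridden for testing.
SYSFS="${SYSFS:-/sys}"

show_cpu_mhz() {
  for f in "$SYSFS"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq; do
    [ -r "$f" ] || continue            # skip if the glob matched nothing
    khz=$(cat "$f")                    # sysfs reports this value in kHz
    cpu=$(basename "$(dirname "$(dirname "$f")")")
    echo "$cpu: $((khz / 1000)) MHz"
  done
}

show_temps() {
  for z in "$SYSFS"/class/thermal/thermal_zone*; do
    [ -r "$z/temp" ] || continue       # skip zones without a readable temp
    milli=$(cat "$z/temp")             # sysfs reports millidegrees Celsius
    echo "$(cat "$z/type"): $((milli / 1000)) °C"
  done
}
```

After sourcing the file (e.g. . ./cpu-watch.sh, a hypothetical name), a simple refresh loop is: while true; do clear; show_cpu_mhz; show_temps; sleep 1; done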

Finally, it appears the thinness of my laptop is making it overheat, and then the CPU throttles, causing these abysmal drops in fps. I concluded this from the sudden drops in CPU speed as the temperatures got high. But I can't actually say whether the temperatures I got were THAT high; the maximum I saw was around 80°C on the CPU. Also, the fan appears to be working properly; it reached around 5000 RPM.

To improve this, I tried changing the cpufreq governor from powersave to performance. Although it didn't fix the performance drop, I noticed some improvement: the CPU speed dropped to 1600 MHz instead of 600 MHz. That got me wondering whether I should set a minimum CPU frequency or disable frequency scaling altogether. But I fear that could lead to overheating and end up melting the CPU. I could also try a laptop cooling pad, but I'm not sure how effective those are.
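For reference, the governor change described above is a write to each CPU's scaling_governor file. A minimal sketch, assuming the intel_pstate driver in active mode (which offers only powersave and performance); the function name and the SYSFS override are hypothetical, and on the real /sys it must run as root.

```shell
#!/bin/sh
# Minimal sketch: write one governor name to every CPU's scaling_governor file.
# Must run as root for the real /sys; SYSFS can be overridden for testing.
SYSFS="${SYSFS:-/sys}"

set_governor() {
  gov="$1"
  case "$gov" in
    powersave|performance) ;;          # the two governors intel_pstate (active mode) offers
    *) echo "unsupported governor: $gov" >&2; return 1 ;;
  esac
  for f in "$SYSFS"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -e "$f" ] || continue            # skip if the glob matched nothing
    echo "$gov" > "$f" || return 1     # fails without root on the real /sys
  done
}
```

The setting is not persistent; it reverts at the next reboot. The cpupower tool from linux-tools offers the same thing as sudo cpupower frequency-set -g performance.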

So, can anyone shed some light on this?

sudo lshw -c cpu

   description: CPU
   product: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
   vendor: Intel Corp.
   physical id: 36
   bus info: cpu@0
   version: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
   serial: To Be Filled By O.E.M.
   slot: U3E1
   size: 2228MHz
   capacity: 3100MHz
   width: 64 bits
   clock: 100MHz

lspci | grep -i VGA

00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)

lspci | grep -i 3D

01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)

xrandr | grep connected

eDP-1-1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 309mm x 174mm

tomaz@tomaz-Inspiron-7460:~$ stress-ng -t 5m -v --tz -c 4

stress-ng: debug: [15960] 4 processors online, 4 processors configured
stress-ng: info:  [15960] dispatching hogs: 4 cpu
stress-ng: debug: [15960] cache allocate: default cache size: 3072K
stress-ng: debug: [15960] starting stressors
stress-ng: debug: [15961] stress-ng-cpu: started [15961] (instance 0)
stress-ng: debug: [15962] stress-ng-cpu: started [15962] (instance 1)
stress-ng: debug: [15960] 4 stressors spawned
stress-ng: debug: [15961] stress-ng-cpu using method 'all'
stress-ng: debug: [15964] stress-ng-cpu: started [15964] (instance 3)
stress-ng: debug: [15963] stress-ng-cpu: started [15963] (instance 2)
stress-ng: debug: [15964] stress-ng-cpu using method 'all'
stress-ng: debug: [15963] stress-ng-cpu using method 'all'
stress-ng: debug: [15962] stress-ng-cpu using method 'all'
stress-ng: debug: [15961] stress-ng-cpu: exited [15961] (instance 0)
stress-ng: debug: [15960] process [15961] terminated
stress-ng: debug: [15963] stress-ng-cpu: exited [15963] (instance 2)
stress-ng: debug: [15962] stress-ng-cpu: exited [15962] (instance 1)
stress-ng: debug: [15960] process [15962] terminated
stress-ng: debug: [15960] process [15963] terminated
stress-ng: debug: [15964] stress-ng-cpu: exited [15964] (instance 3)
stress-ng: debug: [15960] process [15964] terminated
stress-ng: info:  [15960] successful run completed in 300.05s (5 mins, 0.05 secs)
stress-ng: info:  [15960] cpu:
stress-ng: info:  [15960]          pch_skylake   59.25 °C
stress-ng: info:  [15960]                 B0D4   62.12 °C
stress-ng: info:  [15960]      INT3400 Thermal   48.08 °C
stress-ng: info:  [15960]                 SEN2   50.81 °C
stress-ng: info:  [15960]                 TMEM   50.15 °C
stress-ng: info:  [15960]         x86_pkg_temp   54.88 °C
stress-ng: info:  [15960]               acpitz   50.61 °C
stress-ng: info:  [15960]                 SEN1   50.78 °C
  • What is the output of lspci | grep -i VGA (shows the vendor of the graphics chip) and xrandr | grep connected (shows info about the displays in use)? – knb Nov 04 '18 at 10:03
  • @knb I'll add it to the post. – Tomaz Pablo Nov 04 '18 at 14:03
  • Remember that water boils at 100°C, so when the CPU hits 80°C or before, it will automatically throttle back to avoid a meltdown. You probably knew this, but I mention it to help others. – Scott Stensland Nov 04 '18 at 21:42
  • Automatic processor throttling should only start above 80°C. turbostat, run without --quiet, should tell you whether the proc_hot bit has been set. – Doug Smythies Nov 04 '18 at 21:47
  • @DougSmythies Couldn't find proc_hot, but I found this: cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x02640000 (100 C) and this: cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x00000003 (100 C, 100 C) – Tomaz Pablo Nov 04 '18 at 22:05
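The "(100 C)" that turbostat prints next to MSR_IA32_TEMPERATURE_TARGET is decoded from bits 23:16 of the raw value. A small sketch of that decoding, assuming Intel's documented layout for this register; the function names are hypothetical.

```shell
#!/bin/sh
# Bits 23:16 of MSR_IA32_TEMPERATURE_TARGET hold the thermal target (TjMax) in degrees C,
# which is the "(100 C)" turbostat prints next to the raw value.
tjmax_from_msr() {
  echo $(( ($1 >> 16) & 0xFF ))
}

# Headroom in degrees C between a measured temperature ($2) and that target.
headroom() {
  echo $(( $(tjmax_from_msr "$1") - $2 ))
}
```

With the value reported above, tjmax_from_msr 0x02640000 gives 100, and headroom 0x02640000 80 gives 20, i.e. an 80°C reading is still 20 degrees below the target.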

1 Answer


You should be able to limit the upper CPU frequency, for example to 65%, using this command:

echo 65 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct

The above assumes you are using the intel_pstate CPU frequency scaling driver, which you should be by default. To check:

doug@s15:~/temp$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
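The driver check and the write can be combined so that the limit is only applied when intel_pstate is actually in use. A minimal sketch; the function name set_max_perf_pct and the SYSFS override are hypothetical, and on the real /sys the write needs root.

```shell
#!/bin/sh
# Minimal sketch: write max_perf_pct only if every CPU reports the intel_pstate driver.
SYSFS="${SYSFS:-/sys}"

set_max_perf_pct() {
  pct="$1"
  case "$pct" in
    ''|*[!0-9]*) echo "not a number: $pct" >&2; return 1 ;;
  esac
  if [ "$pct" -lt 1 ] || [ "$pct" -gt 100 ]; then
    echo "out of range (1-100): $pct" >&2; return 1
  fi
  # Distinct driver names across all CPUs; must be exactly "intel_pstate".
  drivers=$(cat "$SYSFS"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_driver 2>/dev/null | sort -u)
  if [ "$drivers" != "intel_pstate" ]; then
    echo "scaling driver is '${drivers:-unknown}', not intel_pstate" >&2; return 1
  fi
  echo "$pct" | tee "$SYSFS/devices/system/cpu/intel_pstate/max_perf_pct" > /dev/null
}
```

Usage, assuming the functions are saved in a hypothetical pstate.sh: sudo sh -c '. ./pstate.sh && set_max_perf_pct 65'. Writing 100 restores the default limit; like the plain echo/tee command, the value does not survive a reboot.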

A very good tool to use to monitor things is turbostat, which is included in the linux-tools-common package.

Below is an example where my computer is heavily loaded and the processor package temperature has reached its highest point. In another terminal, I then limit the CPU frequency (shown further down), and you can observe the frequency, temperature and wattage drop:

doug@s15:~/temp$ sudo turbostat --quiet --Summary --show Busy%,Bzy_MHz,PkgTmp,PkgWatt --interval 15
Busy%   Bzy_MHz PkgTmp  PkgWatt
100.00  3500    79      63.91
100.00  3500    78      63.91
100.00  3500    78      63.91
100.00  3500    78      63.88
100.00  3500    79      63.89
100.00  3500    79      63.90
100.00  2755    70      45.86
100.00  2500    68      39.42
100.00  2500    67      39.26
100.00  2500    66      39.10
100.00  2500    65      39.07
100.00  2500    65      38.94
100.00  2500    64      38.93
100.00  2500    65      38.92

What was done in another terminal:

doug@s15:~$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100
doug@s15:~$ echo 65 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
65
doug@s15:~$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
65
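To compare runs like the one above without reading every line, the summary output can be reduced to a peak temperature and an average package power. A minimal sketch, assuming turbostat was run with --quiet --Summary and exactly the column order Busy%, Bzy_MHz, PkgTmp, PkgWatt shown above; the function name is hypothetical.

```shell
#!/bin/sh
# Minimal sketch: summarize turbostat summary lines read on stdin.
# Assumes columns Busy% Bzy_MHz PkgTmp PkgWatt; header lines are skipped
# because their first field is not a number.
summarize_turbostat() {
  awk 'BEGIN { max_t = -1 }
       $1 ~ /^[0-9.]+$/ && NF >= 4 {
         if ($3 > max_t) max_t = $3    # PkgTmp, degrees C
         watt += $4                    # PkgWatt, watts
         n++
       }
       END {
         if (n) printf "samples=%d max_PkgTmp=%d avg_PkgWatt=%.2f\n", n, max_t, watt / n
       }'
}
```

For example, save a run with sudo turbostat --quiet --Summary --show Busy%,Bzy_MHz,PkgTmp,PkgWatt --interval 15 | tee pkg.log, stop it with Ctrl+C, then run summarize_turbostat < pkg.log.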

Note: the 2500 MHz CPU frequency is 3800 * 0.65 = 2470 MHz rounded to the nearest 100 MHz (the nearest pstate, 25). But hey, turbostat showed 3500 MHz before, not 3800 MHz. Why? Because all cores were busy, so the processor itself limited the maximum CPU frequency to 3500 MHz. This information is also available from turbostat if you omit the --quiet option. Example:

doug@s15:~/temp$ sudo turbostat --Summary --show Busy%,Bzy_MHz,PkgTmp,PkgWatt --interval 15
... [snip]...
cpu4: MSR_PLATFORM_INFO: 0x100070012200
16 * 100.0 = 1600.0 MHz max efficiency frequency
34 * 100.0 = 3400.0 MHz base frequency
cpu4: MSR_IA32_POWER_CTL: 0x0004005d (C1E auto-promotion: DISabled)
cpu4: MSR_TURBO_RATIO_LIMIT: 0x23242526
35 * 100.0 = 3500.0 MHz max turbo 4 active cores
36 * 100.0 = 3600.0 MHz max turbo 3 active cores
37 * 100.0 = 3700.0 MHz max turbo 2 active cores
38 * 100.0 = 3800.0 MHz max turbo 1 active cores
...[snip]...
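The register values in that output can be decoded by hand, and the same arithmetic gives the 2500 MHz figure from the note above. A sketch, assuming the Intel layouts turbostat itself reports (one byte per active-core count in MSR_TURBO_RATIO_LIMIT, lowest byte first; base ratio in bits 15:8 and max-efficiency ratio in bits 47:40 of MSR_PLATFORM_INFO; ratios in units of 100 MHz). All function names are hypothetical.

```shell
#!/bin/sh
# MSR_TURBO_RATIO_LIMIT: byte 0 = 1 active core, byte 1 = 2 active cores, and so on.
turbo_limits() {
  msr="$1"
  i=0
  while [ "$i" -lt 4 ]; do
    ratio=$(( ($msr >> (8 * $i)) & 0xFF ))
    echo "$((ratio * 100)) MHz max turbo $((i + 1)) active cores"
    i=$((i + 1))
  done
}

# MSR_PLATFORM_INFO: bits 15:8 = base ratio, bits 47:40 = max efficiency ratio.
platform_ratios() {
  echo "base=$(( ($1 >> 8) & 0xFF )) efficiency=$(( ($1 >> 40) & 0xFF ))"
}

# Nearest pstate for a max_perf_pct limit relative to the 1-core turbo ratio
# (integer arithmetic; adding 50 rounds to the nearest whole ratio).
pstate_for_pct() {
  max_ratio="$1" pct="$2"
  echo $(( (max_ratio * pct + 50) / 100 ))
}
```

For the values above, turbo_limits 0x23242526 reproduces the 3800/3700/3600/3500 MHz lines, platform_ratios 0x100070012200 gives base=34 efficiency=16, and pstate_for_pct 38 65 gives 25 (38 * 65 = 2470, i.e. 24.7 rounded up), which is the 2500 MHz seen after the limit was applied.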

There are many ways to automatically keep CPU temperatures below extreme values. One is thermald, and I have an example configuration script in another answer.

Doug Smythies
  • I tried 80% and 70%, but the CPU still throttles. turbostat showed a maximum Busy% of 50. I'm monitoring the GPU through NVIDIA X Server Settings, and its clock speed was stable. Maybe this laptop just has cooling problems; turbostat showed temperatures around 95°C. – Tomaz Pablo Nov 04 '18 at 17:31
  • As a test, keep lowering the maximum CPU frequency percentage until your temperature definitely doesn't approach critical. It might be that your root issue is graphics power, and if so, I do not know anything about it. – Doug Smythies Nov 04 '18 at 17:43
  • I do not know if it will give more insight but you could also try sudo turbostat --quiet --Summary --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXMHz,GFXWatt --interval 15 – Doug Smythies Nov 04 '18 at 17:51
  • I went as low as 50% and the CPU is still throttling. The temperature always stayed below 80°C, which makes me think overheating isn't the problem. GFXMHz and GFXWatt showed 300 and 0 and never changed; maybe this command doesn't support my GPU. It could simply be a poor game port to Linux, but I noticed the same thing in more than a few games. – Tomaz Pablo Nov 04 '18 at 19:26
  • How could I stress test the CPU? That could determine whether the problem is the CPU or not, right? – Tomaz Pablo Nov 04 '18 at 19:28
  • O.K., you are using discrete graphics (the NVIDIA GPU mentioned above), which turbostat doesn't know about. If the temperature was always below 80 degrees, then it is not clear to me why the CPU is throttling. Do you have TLP installed and running? If so, maybe it is kicking in. There are many ways to stress the CPUs. One way is to start a bunch of yes > /dev/null & processes. When you want to end it all, do killall yes. – Doug Smythies Nov 04 '18 at 19:58
  • Yes, it's a GeForce 940MX. And yes, I installed TLP, but I had this problem even before. Actually, I installed it believing it could help, but I couldn't figure out how to use it. I just installed stress-ng and ran a test. I'll add it to the question. – Tomaz Pablo Nov 04 '18 at 20:17
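The yes-based stress test from the comments can be wrapped so that it runs for a fixed time and only stops the processes it started (killall yes would also stop any unrelated yes processes). A minimal sketch; the function name and its arguments are hypothetical.

```shell
#!/bin/sh
# Start N busy-looping "yes" processes for S seconds, then stop only those processes.
# Defaults: one process per online CPU (nproc), 60 seconds.
stress_yes() {
  n="${1:-$(nproc)}" secs="${2:-60}"
  pids=""
  i=0
  while [ "$i" -lt "$n" ]; do
    yes > /dev/null &
    pids="$pids $!"                    # remember each background PID
    i=$((i + 1))
  done
  sleep "$secs"
  kill $pids 2>/dev/null               # unquoted on purpose: one argument per PID
  wait $pids 2>/dev/null               # reap them; their exit status is ignored
  return 0
}
```

For example, run stress_yes 4 300 in one terminal while turbostat runs in another, to see whether the frequency drops under a pure CPU load.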