
I have quite a modern laptop, an HP OMEN with a GTX 1070 8GB GPU and a Core i7-7700HQ. I am using the latest drivers from the graphics-drivers PPA. All games start fine, but after playing for a few minutes the frame rate drops to 15 fps. After some time it returns to full speed, which is more than 75 fps. This cycle then repeats, and I can't play. This behavior is consistent across the 18.04 series and any distribution based on 18.04. I have a G-SYNC enabled laptop screen, which is only 1080p. I do not see this issue on any distribution that is not Ubuntu-based. I hear people saying that it could be Nvidia, but if it were Nvidia it should happen on 16.04 too, which it does not.

Can anyone help me please?


75FPS:-

affected_cpus                             0
cpuinfo_max_freq                          3800000
cpuinfo_min_freq                          800000
cpuinfo_transition_latency                0
energy_performance_available_preferences  default performance balance_performance balance_power power 
energy_performance_preference             balance_performance
related_cpus                              0
scaling_available_governors               performance powersave
scaling_cur_freq                          899381
scaling_driver                            intel_pstate
scaling_governor                          powersave
scaling_max_freq                          3800000
scaling_min_freq                          800000
scaling_setspeed                          <unsupported>

15FPS:-

affected_cpus                             0
cpuinfo_max_freq                          3800000
cpuinfo_min_freq                          800000
cpuinfo_transition_latency                0
energy_performance_available_preferences  default performance balance_performance balance_power power 
energy_performance_preference             balance_performance
related_cpus                              0
scaling_available_governors               performance powersave
scaling_cur_freq                          800030
scaling_driver                            intel_pstate
scaling_governor                          powersave
scaling_max_freq                          2800000
scaling_min_freq                          800000
scaling_setspeed                          <unsupported>

Again.

75FPS:-

affected_cpus                             0
cpuinfo_max_freq                          3800000
cpuinfo_min_freq                          800000
cpuinfo_transition_latency                0
energy_performance_available_preferences  default performance balance_performance balance_power power 
energy_performance_preference             balance_performance
related_cpus                              0
scaling_available_governors               performance powersave
scaling_cur_freq                          900004
scaling_driver                            intel_pstate
scaling_governor                          powersave
scaling_max_freq                          3800000
scaling_min_freq                          800000
scaling_setspeed                          <unsupported>
  • can you list the nvidia driver versions you are using in 16.04 & 18.04 along with the kernel versions you are using on 16.04 & 18.04? – WinEunuuchs2Unix Sep 03 '18 at 15:27
  • 16.04 was the 4.13 kernel series, 18.04 is the 4.15 series. Drivers used were 390 from the repositories and 396 from the graphics-drivers PPA. I see this issue even with 18.10 daily builds. 16.04 had no problems; 18.04 has been a disaster so far. – Rajat Pandita Sep 03 '18 at 16:17
  • 4.15.0-33-generic causes me all kinds of grief. Can't suspend laptop. Extraordinary keyboard lag when typing in Firefox, like a 5 to 10 second input delay every 10 to 15 minutes (haven't timed it with a watch). So I switched to the 4.13.0-36-generic kernel for infinite improvement. Also, for my GTX 970M I'm using the nVidia proprietary driver 384.130. Quite happy overall now. – WinEunuuchs2Unix Sep 03 '18 at 16:20
  • So you are saying that you are using the specific kernel 4.13.0-36-generic together with nvidia 384.130. Did you face the same issues as I did with 18.04, frame rate drops etc.? I am curious to understand what led you to try this combination. Did you arrive at it by trial and error, or are there any bug reports that can be referenced? BTW, the same issue happens with 4.17 on Cosmic 18.10 too. I don't believe it is kernel specific. – Rajat Pandita Sep 04 '18 at 07:37
  • I'll repeat the upgrade from 16.04 to 18.04 again soon and pay closer attention. The last time, it converted nVidia 384.130 to 390.xxx, I believe. I think there were problems that led me to revert to 384.130. Mind you, I'm on a GTX 970M, which is a couple of years old. I don't think 4.13.0-36 exactly matters as much as the kernel chain 4.13.0-xx. – WinEunuuchs2Unix Sep 04 '18 at 10:07
  • Awesome! Will give it a go tonight and see if it works. If it works, the bounty is all yours. This is driving me mad. I can't stay away from Ubuntu and I can't use it because my games don't work. I will owe you one if this works. – Rajat Pandita Sep 04 '18 at 12:52
  • I have the same problem, I don't think that the kernel version have influence on this issue, because I tested many kernel versions and recently I'm using 4.15.x-xx. – Marcos Silveira Sep 04 '18 at 14:14
  • Thanks for confirming my observation Marcos, I believe the same. I installed Ubuntu 18.04 tried different kernels and the issue stays. I am going to try low latency kernel now and see if that makes any difference. If not I guess opening a bug report may be on the cards then!. – Rajat Pandita Sep 04 '18 at 15:44
  • Lowlatency or not, The issue is still there. Can't get rid of it and it is driving me nuts! – Rajat Pandita Sep 04 '18 at 16:05
  • When you're operating at 75 FPS and at 15 FPS can you run a quick test in terminal? Use these two commands: CPU0_DIR=/sys/devices/system/cpu/cpu0/cpufreq to set variable. Then use: paste <(ls $CPU0_DIR) <(cat $CPU0_DIR/*) | column -s $'\t' -t to get CPU and governor status. Then copy and paste the output into your question with appropriate 75 / 15 FPS headings. I'm not sure low latency will make things better. It could complicate the diagnosis. – WinEunuuchs2Unix Sep 06 '18 at 02:58
  • I have measured the Kernel Frequency and there are no drops at all. Governor is ondemand. I changed governor to performance and still no go. I also used nvidia-smi to obtain GPU Clock speeds and Memory usage. I see no drops or anything. I will do these tests again and upload all observations here. I will include the results for the commands you have asked too. – Rajat Pandita Sep 07 '18 at 06:04
  • Updated kernel frequency details as requested – Rajat Pandita Sep 07 '18 at 14:14
  • With some more research:- Disabled Intel_pstate with no luck, I can clearly see now there is throttling, However I Think it may be the version of xorg server in Ubuntu, I played game in windowed mode, CPU Frequency Drops and then Game frame rates drop too, I minimize the Game window and then maximize and boom.. Frames are back. What is this crazy behaviour! – Rajat Pandita Sep 07 '18 at 14:38
  • And CPU goes back to full frequency Not able to understand where the throttling is coming from – Rajat Pandita Sep 07 '18 at 14:40
  • It is the intel_pstate which is crapping out for some reason. It drops the CPU Frequency to Minimum. I tried setting up governor forcefully to performance and it still does it. Tried compiling kernel directly from kernel.org and nvidia drivers from nvidia website. Same issue. Someone needs to fix Intel_pstate driver – Rajat Pandita Sep 07 '18 at 16:52
  • From all of the outputs above, they say the cpu governor is on powersave mode. How did you set the governor to performance, it looks like it didn't work. Make sure you're already in performance mode. – aasril Sep 08 '18 at 01:30
  • Yes I am in performance mode.. cpupower was used to confirm. I used i7z to monitor frequency and it drops to 800Mhz for 10-15 seconds and that is the time where frames drop too. Then it again runs at full speed and frame rates build up again. – Rajat Pandita Sep 08 '18 at 03:40
  • looks like a Thermald/Overheating issue. However the threshold for Thermald to consider clamping the CPU cycles is way lower than other distros. – Rajat Pandita Sep 08 '18 at 12:11
  • Yes, it is thermald. I can confirm. I have managed to resolve this with sudo systemctl disable thermald && sudo reboot – Rajat Pandita Sep 08 '18 at 12:56

1 Answer


After all this troubleshooting, I concluded that it is indeed overheating, as suggested by Alan and Martin in Ubuntu Podcast Season 11 Episode 26. In Ubuntu 18.04 the thermald daemon is enabled by default to manage the CPU temperature. This should not generally have any impact, as the danger thresholds are quite high. However, in my particular case it was throttling the CPU to its lowest frequency based on CPU temperature, which is expected behavior in itself. Even so, I think it is a bug in thermald: I have since tried thermald in other distributions on the same hardware and could not reproduce the problem. So I decided to come back to Ubuntu and this time disable thermald. It is now fixed. Thanks a lot to the Ask Ubuntu community for their support, and a special shout-out to Alan Pope and Martin Wimpress.

Solution:-

sudo systemctl disable thermald && sudo reboot

P.S. Note: disabling thermald leaves temperatures unchecked, so please use this with caution. The last thing we want is to fry someone's hardware. thermald is there for a reason. Unless you are affected by this issue, I do not recommend this solution.
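Before disabling anything, it may help to confirm that the frequency ceiling really is being lowered, as in the 15 FPS sample in the question, where scaling_max_freq fell to 2800000 while cpuinfo_max_freq stayed at 3800000. The following is only a rough sketch: cpufreq_ceiling is a hypothetical helper name, and the default path assumes the standard cpufreq sysfs layout for cpu0.

```shell
#!/bin/sh
# Sketch: report whether the cpufreq ceiling for one CPU has been lowered.
# cpufreq_ceiling is a hypothetical helper name, not part of any package.
cpufreq_ceiling() {
    # Directory to inspect; defaults to cpu0 (assumed standard sysfs path).
    dir=${1:-/sys/devices/system/cpu/cpu0/cpufreq}

    # Each of these files holds a single frequency in kHz.
    cur=$(cat "$dir/scaling_cur_freq") || return 1
    max=$(cat "$dir/scaling_max_freq") || return 1
    hw=$(cat "$dir/cpuinfo_max_freq") || return 1

    # A software ceiling below the hardware maximum means something
    # has lowered the limit for this CPU.
    if [ "$max" -lt "$hw" ]; then
        state=capped
    else
        state=ok
    fi
    echo "cur=$cur max=$max hw_max=$hw $state"
}
```

Running something like `while sleep 1; do cpufreq_ceiling; done` in a terminal while the game is open should show the line switch to "capped" whenever the ceiling drops. It only reports that a cap exists, not which process set it.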

Ever since Ubuntu Podcast Season 11 Episode 28, where Alan and Martin again read my answer and suggested that I file a bug report against thermald, I have been investigating the reasons for the CPU throttling in greater detail. I have seen occasional throttling even after I disabled thermald. Courtesy of the GNOME Shell extension cpufreq, I was able to see that irqbalance was throttling my CPU and causing the frame rate drops.

To prove this I enabled thermald and removed irqbalance package. Now, I get even higher FPS and there is no throttling at all.
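When comparing combinations like this, it is easy to lose track of which of the two services is actually running. A minimal sketch for printing their state, assuming both run as systemd units named thermald and irqbalance (unit_state is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch: print the current state of a systemd unit.
# unit_state is a hypothetical helper name.
unit_state() {
    unit=$1
    state=
    if command -v systemctl >/dev/null 2>&1; then
        # is-active prints the state (active, inactive, failed, ...) and
        # exits non-zero when the unit is not active, so ignore the status.
        state=$(systemctl is-active "$unit" 2>/dev/null) || true
    fi
    # Fall back to "unknown" when systemctl is missing or prints nothing.
    echo "$unit: ${state:-unknown}"
}

for u in thermald irqbalance; do
    unit_state "$u"
done
```

Noting this output next to each FPS measurement makes it clearer which service was in play during a given test run.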

I have been looking at the package irqbalance

Helpful Resource for irqbalance :- http://konkor.github.io/cpufreq/faq/

It states :-

irqbalance is not a part of the Linux kernel

It designed for special server configurations with many RAID/HDD/SDD controllers.

Any user-space application (like games, compilation…) can not get 100% of CPU resources on any thread because it’s always sharing this resources with IO tasks.

I think I can now consider this resolved

sudo apt remove irqbalance    

This is the actual solution!

  • Glad you found the answer. I was actually thinking of overheating causing slowdown yesterday. – WinEunuuchs2Unix Sep 08 '18 at 21:46
  • Well, Not sure what changed, However with the latest nvidia-430 Drivers issue is completely gone. Did a fresh install of KDE NEON which is based on 18.04.2 and all is working as expected. Not sure what to say. – Rajat Pandita Jun 10 '19 at 11:40