
My Ubuntu 16.04 machine has 4 CPU cores, and one of them (which one exactly varies) always has a load of 90% to 100%.

This is true no matter what I'm doing, and even when I'm not doing anything at all, with no window open. It happens right after boot as well as hours into work.

I've read the two popular related questions on this site, but unfortunately they didn't help.

For most of my working time, my productivity is not affected by this problem. The only signs of it then are that the fans always run at maximum speed and that Ubuntu's System Monitor shows one of the cores under heavy load.


For perhaps 10% of my working time, however, the problem also shows up as extremely poor responsiveness.

The UI in particular (during animations and when reacting to clicks) is very slow. That led me to suspect that the CPU might be doing the GPU's work as well. But that was probably an unfounded guess, and the data below seems to contradict it.

My concern is whether this problem, if I can't fix it, will have a (significant) impact on my computer's lifetime or not. I don't know what a constant load of >90% does to a CPU over months or years.

Anyway, here's the data that I could collect from my machine, which may be related or helpful:

top:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND  
 415 root      20   0       0      0      0 R  97,3  0,0   1:39.30 kworker/2:2                                                                                       
2442 john      20   0  663828  38704  29852 S   3,3  0,5   0:00.90 gnome-terminal-                                                                                   
1194 root      20   0  335728  69900  48392 S   2,3  0,9   0:08.36 Xorg                                                                                              
1821 john      20   0 1423440 114660  77600 S   1,3  1,5   0:03.77 compiz                                                                                            
   6 root      20   0       0      0      0 D   0,3  0,0   0:00.84 kworker/u8:0
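
To watch for this without staring at top, the rows above can be filtered by a small script. The following is only a minimal sketch: the function names `parse_top_line` and `busy_kworkers` and the 50% threshold are made up for illustration, and it assumes the default top column order shown above, with a decimal comma in %CPU as printed by this locale.

```python
# Sketch: find kworker threads with high CPU usage in `top` output.
# Assumes the default column order:
# PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

SAMPLE = """\
 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 415 root      20   0       0      0      0 R  97,3  0,0   1:39.30 kworker/2:2
2442 john      20   0  663828  38704  29852 S   3,3  0,5   0:00.90 gnome-terminal-
1194 root      20   0  335728  69900  48392 S   2,3  0,9   0:08.36 Xorg
1821 john      20   0 1423440 114660  77600 S   1,3  1,5   0:03.77 compiz
   6 root      20   0       0      0      0 D   0,3  0,0   0:00.84 kworker/u8:0
"""


def parse_top_line(line):
    """Return (pid, cpu_percent, command) for a process row, else None."""
    fields = line.split()
    # Skip the header row and anything that is not a full process row.
    if len(fields) < 12 or not fields[0].isdigit():
        return None
    # Some locales print "97,3" instead of "97.3"; normalise before parsing.
    cpu = float(fields[8].replace(",", "."))
    return int(fields[0]), cpu, " ".join(fields[11:])


def busy_kworkers(top_output, threshold=50.0):
    """List kworker rows whose %CPU is at or above the threshold."""
    rows = (parse_top_line(line) for line in top_output.splitlines())
    return [
        row for row in rows
        if row is not None and row[2].startswith("kworker") and row[1] >= threshold
    ]


if __name__ == "__main__":
    for pid, cpu, command in busy_kworkers(SAMPLE):
        print(f"{command} (PID {pid}) is using {cpu}% CPU")
```

On a live system the same functions could be fed the text of a batch-mode run such as `top -b -n 1` instead of `SAMPLE`.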

grep . -r /sys/firmware/acpi/interrupts/:

/sys/firmware/acpi/interrupts/sci:        36
/sys/firmware/acpi/interrupts/error:       0
/sys/firmware/acpi/interrupts/gpe00:       0   invalid
/sys/firmware/acpi/interrupts/gpe01:       0   invalid
/sys/firmware/acpi/interrupts/gpe02:       0   invalid
/sys/firmware/acpi/interrupts/gpe03:      36   enabled
/sys/firmware/acpi/interrupts/gpe04:       0   invalid
(...)
/sys/firmware/acpi/interrupts/gpe1F:       0   disabled
/sys/firmware/acpi/interrupts/sci_not:     0
/sys/firmware/acpi/interrupts/ff_pmtimer:  0   invalid
/sys/firmware/acpi/interrupts/ff_rt_clk:   0   disabled
/sys/firmware/acpi/interrupts/gpe_all:    36
/sys/firmware/acpi/interrupts/ff_gbl_lock: 0   enabled
/sys/firmware/acpi/interrupts/ff_pwr_btn:  0   enabled
/sys/firmware/acpi/interrupts/ff_slp_btn:  0   invalid
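
For reference, the counters above can be reduced to "which GPEs have fired at all" with a few lines of Python. This is a rough sketch, not a diagnostic tool: `parse_acpi_interrupts` and `active_gpes` are names invented here, and the input is assumed to have the `path: count status` shape that the grep command printed above.

```python
import re

# Sketch: summarise the ACPI interrupt counters printed by
#   grep . -r /sys/firmware/acpi/interrupts/
# Each line is assumed to look like "<path>: <count> [status]".

SAMPLE = """\
/sys/firmware/acpi/interrupts/sci:        36
/sys/firmware/acpi/interrupts/error:       0
/sys/firmware/acpi/interrupts/gpe00:       0   invalid
/sys/firmware/acpi/interrupts/gpe03:      36   enabled
/sys/firmware/acpi/interrupts/gpe1F:       0   disabled
/sys/firmware/acpi/interrupts/gpe_all:    36
/sys/firmware/acpi/interrupts/ff_pwr_btn:  0   enabled
"""

LINE_RE = re.compile(r"^(?P<path>\S+):\s*(?P<count>\d+)(?:\s+(?P<status>.+?))?\s*$")
GPE_RE = re.compile(r"gpe[0-9A-Fa-f]{2}")


def parse_acpi_interrupts(text):
    """Map counter name -> (count, status); lines that do not parse are skipped."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        name = match.group("path").rsplit("/", 1)[-1]
        counters[name] = (int(match.group("count")), match.group("status") or "")
    return counters


def active_gpes(counters):
    """Return (name, count) for individual GPEs that fired, busiest first."""
    fired = [
        (name, count)
        for name, (count, _status) in counters.items()
        if GPE_RE.fullmatch(name) and count > 0
    ]
    return sorted(fired, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(active_gpes(parse_acpi_interrupts(SAMPLE)))
```

With the output above, only gpe03 has fired, 36 times in total. Running the script twice a few seconds apart and comparing the counts would show whether any counter is climbing quickly.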

uname -a:

Linux my-host-name 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

cat /proc/cmdline:

BOOT_IMAGE=/vmlinuz-4.8.0-28-generic.efi.signed root=/dev/mapper/ubuntu--vg-root ro quiet splash vt.handoff=7

lspci -v:

00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1576
    Subsystem: Hewlett-Packard Company Device 81f9
    Flags: bus master, fast devsel, latency 0

00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 1577
    Subsystem: Hewlett-Packard Company Device 81f9
    Flags: bus master, fast devsel, latency 0, IRQ 24
    Capabilities: <access denied>

00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Carrizo (rev ca) (prog-if 00 [VGA controller])
    DeviceName: ATI EG BROADWAY
    Subsystem: Hewlett-Packard Company Carrizo
    Flags: bus master, fast devsel, latency 0, IRQ 227
    Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Memory at f0800000 (64-bit, prefetchable) [size=8M]
    I/O ports at 4000 [size=256]
    Memory at f0500000 (32-bit, non-prefetchable) [size=256K]
    Expansion ROM at f0580000 [disabled] [size=128K]
    Capabilities: <access denied>
    Kernel driver in use: amdgpu
    Kernel modules: amdgpu

...

00:08.0 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 1578
    Subsystem: Hewlett-Packard Company Device 81f9
    Flags: bus master, fast devsel, latency 0, IRQ 255
    Memory at f0540000 (64-bit, prefetchable) [size=128K]
    Memory at f0300000 (32-bit, non-prefetchable) [size=1M]
    Memory at f0570000 (32-bit, non-prefetchable) [size=4K]
    Memory at f056a000 (32-bit, non-prefetchable) [size=8K]
    Capabilities: <access denied>

...

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller (rev 07)
    Subsystem: Hewlett-Packard Company RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller
    Flags: bus master, fast devsel, latency 0, IRQ 225
    I/O ports at 3000 [size=256]
    Memory at f0400000 (64-bit, non-prefetchable) [size=4K]
    Memory at f0100000 (64-bit, prefetchable) [size=16K]
    Capabilities: <access denied>
    Kernel driver in use: r8169
    Kernel modules: r8169

02:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8723BE PCIe Wireless Network Adapter
    DeviceName: Sanji2
    Subsystem: Hewlett-Packard Company RTL8723BE PCIe Wireless Network Adapter
    Flags: bus master, fast devsel, latency 0, IRQ 231
    I/O ports at 2000 [size=256]
    Memory at f1000000 (64-bit, non-prefetchable) [size=16K]
    Capabilities: <access denied>
    Kernel driver in use: rtl8723be
    Kernel modules: rtl8723be

Can anybody help?

caw
  • So, what process or processes use that CPU core? It's very easy to check in the System monitor. – mikewhatever Nov 25 '16 at 10:30
  • Possible duplicate: http://askubuntu.com/questions/33640/kworker-what-is-it-and-why-is-it-hogging-so-much-cpu – AnotherKiwiGuy Nov 25 '16 at 10:33
  • @mikewhatever There's nothing to see. If I view the processes in the system monitor, every process has "0%" next to it, one or two have "1%", and gnome-system-monitor (which I'm just viewing) accounts for 4%. That's all. Apart from that, I did include the output from top in the question. – caw Nov 26 '16 at 02:18
  • @ThatGuy I specifically mentioned that question, along with one other question, in my description above. As I said, nothing from that discussion helped, unfortunately. Please see the third paragraph of my question for that part. – caw Nov 26 '16 at 02:20
  • To sum up, I think I've done my research and also collected as much information as I could. If there's anything else that could be helpful, I'd be happy to include that as well. – caw Nov 26 '16 at 02:21
  • I know. I'm only posting it here as part of the process. I can't offer a fix, but this way, when another person sees the post, they can see at a glance if they can help. Good luck in finding a fix though. :) – AnotherKiwiGuy Nov 26 '16 at 02:48
  • Can you add the output of cat /proc/cmdline to your question? – WinEunuuchs2Unix Dec 02 '16 at 23:15
  • @WinEunuuchs2Unix Sure, thanks! It's BOOT_IMAGE=/vmlinuz-4.8.0-28-generic.efi.signed root=/dev/mapper/ubuntu--vg-root ro quiet splash vt.handoff=7. Please note that I've since switched from Ubuntu 16.04 to 16.10, where I could still re-produce the problem. – caw Dec 03 '16 at 08:33
  • @caw thanks for the update on 16.10 to explain kernel 4.8. Could you edit your question and include the output from free -t a couple of minutes after a fresh boot (without loading apps), once cron jobs, etc. have completed? Then open some browsers, videos, etc. and run free -t a second time. Could you post both these scenarios please? Thanks. – WinEunuuchs2Unix Dec 05 '16 at 00:05
  • @WinEunuuchs2Unix I cannot verify this right now, but I know for sure that, at the multiple times when I triggered the bug, "Memory" and "Swap" were absolutely fine (in the low ranges) in Ubuntu's "System monitor". Looking at all those values in the monitor was the first thing I did, of course. The bug was exclusively CPU-related. – caw Dec 05 '16 at 01:59
  • @caw understood.... How much RAM and how much SWAP? – WinEunuuchs2Unix Dec 05 '16 at 02:21
  • @WinEunuuchs2Unix About 25% RAM (of 7,3 GiB) and 0% swap (of 7,5 GiB). – caw Dec 05 '16 at 17:35
  • @caw I know you said you tried some of the suggestions from the other questions, but specifically did you try the suggested backtrace method in http://askubuntu.com/a/421916/30304 ? – FCTW Dec 05 '16 at 21:12
  • It looks like it could be this bug: https://bugs.freedesktop.org/show_bug.cgi?id=97471 – Kim Phillips Dec 06 '16 at 07:42
  • @KimPhillips Thank you! This looks very similar, indeed. Running lshw -c video tells me that I have product: Carrizo by vendor: Advanced Micro Devices, Inc. [AMD/ATI] with configuration: driver=amdgpu latency=0. So that seems to be another similarity to the issue that you linked. There's also a duplicate there that mentions problems with HDMI-only (and built-in display switched off) explicitly, which is what I have. Strangely, I cannot really re-produce this on kernel version 4.8.0-28 right now, although I could definitely re-produce this on a fresh Ubuntu 16.10 (consistently). – caw Dec 06 '16 at 10:42
  • @caw I read that one user with this problem discovered it was caused by a USB device (a webcam) being plugged in, and unplugging it made the problem go away. There are actually lots of bug reports about kworker taking 100% CPU since kernel 3.18. – WinEunuuchs2Unix Dec 06 '16 at 12:45
  • Rather than edit your solution into your question, why not write it up as an answer so that others with the same issue can benefit from your experience? Self-answering is encouraged here. There's even a guide. – Elder Geek Feb 09 '17 at 18:15
  • @ElderGeek Sorry, didn't know that this really qualified as an "answer", since it doesn't solve the problem but is merely the description of how to re-produce it consistently. When answering one's own question, the site even discourages this by saying: "Are you sure you want to answer your question?" And further: "Edit your question if you need to add more details." So that's what I did. But as per your request, I moved those details into an answer now. – caw Feb 10 '17 at 16:35
  • +1 I may be in the minority but I consider a workaround that preserves my hardware a useful answer. – Elder Geek Feb 10 '17 at 16:41

1 Answer


I've been able to track down the precise cause of this problem, although that's not really a solution:

After doing a fresh reinstall and then changing settings and installing packages one at a time, I found that I could consistently reproduce (and even "toggle") the problem by setting "Built-in Display" to "Off" in the system settings.

I had an external monitor connected via HDMI, and in order to save power, I wanted to turn off the built-in display completely (which worked but caused high CPU load and slowed-down UI) instead of just mirroring it (which worked without any downsides).

In addition to the problem of high CPU load, there was a continuous, high-pitched but quiet, cheeping sound coming from the computer (laptop) when "Built-in Display" was turned off.

caw