
I'm quite new to the beautiful world of Ubuntu. This is my first question, so please excuse me if I don't provide the right details; I don't know the right commands yet.

The problem:

Boot: no problems

First suspend: no problems

Second suspend: the PC no longer goes into sleep mode. The screen stays black and nothing seems to work; after about 20 seconds the PC becomes usable again, but from then until the next shutdown, sleep mode no longer works. Sometimes the mouse also freezes.

I've tried different kernel versions and nothing changed. I think the problem is related to the dual-GPU setup. Thanks in advance to everyone, and sorry for the lack of details.

Notebook: HP ZBook 15u G5, dual GPU (Intel and AMD Radeon Pro WX 3100).

Output of lspci -nn | grep -E 'VGA|Display':

00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 620 [8086:5917] (rev 07)
01:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Lexa XT [Radeon PRO WX 3100] [1002:6985]

Ubuntu: 18.04

kernel: 4.18.10-041810-generic

Isacco

1 Answer


Same here! Same machine as you (HP ZBook G5). Output of lspci:

00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 620 [8086:5917] (rev 07)
01:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Polaris12 [1002:6985]

Kernel: 4.15.0-36-generic. I'm running Mint 19, based on Ubuntu 18.04 LTS (Bionic).

I've also noticed that, after the first suspend, lspci hangs and cannot be killed.

Resuming leaves a kernel oops in syslog, inside amdgpu:

Oct 13 10:57:36 TIX-02 kernel: [  114.169456] [drm] PCIE GART of 256M enabled (table at 0x000000F400040000).
Oct 13 10:57:36 TIX-02 kernel: [  114.211766] e1000e: enp0s31f6 NIC Link is Down
Oct 13 10:57:36 TIX-02 kernel: [  114.214529] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready
Oct 13 10:57:36 TIX-02 kernel: [  114.225226] BUG: unable to handle kernel paging request at ffffb57f01b08fec
Oct 13 10:57:36 TIX-02 kernel: [  114.225269] IP: smu7_populate_single_firmware_entry.isra.6+0x5b/0xe0 [amdgpu]
Oct 13 10:57:36 TIX-02 kernel: [  114.225271] PGD 45ed48067 P4D 45ed48067 PUD 0 
Oct 13 10:57:36 TIX-02 kernel: [  114.225275] Oops: 0002 [#1] SMP PTI
Oct 13 10:57:36 TIX-02 kernel: [  114.225276] Modules linked in: rfcomm pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep binfmt_misc nls_iso8859_1 arc4 snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic snd_soc_skl snd_soc_skl_ipc snd_hda_ext_core snd_soc_sst_dsp snd_soc_sst_ipc snd_soc_acpi snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine hid_multitouch intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass snd_hda_intel crct10dif_pclmul snd_hda_codec snd_hda_core crc32_pclmul snd_hwdep ghash_clmulni_intel pcbc snd_pcm snd_seq_midi snd_seq_midi_event iwlmvm mac80211 snd_rawmidi aesni_intel btusb btrtl aes_x86_64 btbcm crypto_simd glue_helper btintel cryptd intel_cstate bluetooth idma64 virt_dma snd_seq snd_seq_device snd_timer ecdh_generic intel_rapl_perf iwlwifi uvcvideo
Oct 13 10:57:36 TIX-02 kernel: [  114.225346] CPU: 2 PID: 61 Comm: kworker/2:1 Tainted: G           O     4.15.0-36-generic #39-Ubuntu
Oct 13 10:57:36 TIX-02 kernel: [  114.225347] Hardware name: HP HP ZBook 15u G5/83B2, BIOS Q78 Ver. 01.03.00 07/18/2018
Oct 13 10:57:36 TIX-02 kernel: [  114.225351] Workqueue: pm pm_runtime_work
Oct 13 10:57:36 TIX-02 kernel: [  114.225383] RIP: 0010:smu7_populate_single_firmware_entry.isra.6+0x5b/0xe0 [amdgpu]
Oct 13 10:57:36 TIX-02 kernel: [  114.225384] RSP: 0018:ffffb56301b6fb98 EFLAGS: 00010246
Oct 13 10:57:36 TIX-02 kernel: [  114.225386] RAX: 0000000000000089 RBX: ffffb57f01b08fec RCX: 0000000000534000
Oct 13 10:57:36 TIX-02 kernel: [  114.225387] RDX: ffffffffc08be38d RSI: 0000000000000000 RDI: ffff9ece9c4a4cc0
Oct 13 10:57:36 TIX-02 kernel: [  114.225389] RBP: ffffb56301b6fbe8 R08: 000000000003fa80 R09: ffffb56301b6fbcc
Oct 13 10:57:36 TIX-02 kernel: [  114.225390] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
Oct 13 10:57:36 TIX-02 kernel: [  114.225391] R13: ffff9ece9ed44010 R14: ffff9ece8fd02000 R15: 000000000000047e
Oct 13 10:57:36 TIX-02 kernel: [  114.225393] FS:  0000000000000000(0000) GS:ffff9eceaf480000(0000) knlGS:0000000000000000
Oct 13 10:57:36 TIX-02 kernel: [  114.225394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 10:57:36 TIX-02 kernel: [  114.225396] CR2: ffffb57f01b08fec CR3: 0000000166e0a005 CR4: 00000000003606e0
Oct 13 10:57:36 TIX-02 kernel: [  114.225397] Call Trace:
Oct 13 10:57:36 TIX-02 kernel: [  114.225427]  smu7_request_smu_load_fw+0xb7/0x340 [amdgpu]
Oct 13 10:57:36 TIX-02 kernel: [  114.225454]  polaris10_start_smu+0xdd/0x220 [amdgpu]
Oct 13 10:57:36 TIX-02 kernel: [  114.225484]  pp_resume+0x49/0xb0 [amdgpu]
Oct 13 10:57:36 TIX-02 kernel: [  114.225510]  amdgpu_pp_resume+0x25/0x30 [amdgpu]
Oct 13 10:57:36 TIX-02 kernel: [  114.225529]  amdgpu_resume_phase2+0x4b/0xc0 [amdgpu]
Oct 13 10:57:36 TIX-02 kernel: [  114.225549]  amdgpu_device_resume+0x15f/0x3d0 [amdgpu]
Oct 13 10:57:36 TIX-02 kernel: [  114.225551]  ? __pci_set_master+0x34/0xe0
Oct 13 10:57:36 TIX-02 kernel: [  114.225555]  ? vga_switcheroo_set_dynamic_switch+0x80/0x80
Oct 13 10:57:36 TIX-02 kernel: [  114.225573]  amdgpu_pmops_runtime_resume+0x76/0xc0 [amdgpu]
Oct 13 10:57:36 TIX-02 kernel: [  114.225576]  pci_pm_runtime_resume+0x7b/0xb0
Oct 13 10:57:36 TIX-02 kernel: [  114.225579]  vga_switcheroo_runtime_resume+0x53/0x60
Oct 13 10:57:36 TIX-02 kernel: [  114.225580]  __rpm_callback+0xca/0x210
Oct 13 10:57:36 TIX-02 kernel: [  114.225583]  ? vga_switcheroo_set_dynamic_switch+0x80/0x80
Oct 13 10:57:36 TIX-02 kernel: [  114.225585]  rpm_callback+0x24/0x80
Oct 13 10:57:36 TIX-02 kernel: [  114.225587]  ? vga_switcheroo_set_dynamic_switch+0x80/0x80
Oct 13 10:57:36 TIX-02 kernel: [  114.225589]  rpm_resume+0x4e1/0x7d0
Oct 13 10:57:36 TIX-02 kernel: [  114.225591]  pm_runtime_work+0x55/0xa0
Oct 13 10:57:36 TIX-02 kernel: [  114.225593]  process_one_work+0x1de/0x410
Oct 13 10:57:36 TIX-02 kernel: [  114.225595]  worker_thread+0x32/0x410
Oct 13 10:57:36 TIX-02 kernel: [  114.225598]  kthread+0x121/0x140
Oct 13 10:57:36 TIX-02 kernel: [  114.225600]  ? process_one_work+0x410/0x410
Oct 13 10:57:36 TIX-02 kernel: [  114.225602]  ? kthread_create_worker_on_cpu+0x70/0x70
Oct 13 10:57:36 TIX-02 kernel: [  114.225605]  ret_from_fork+0x35/0x40
Oct 13 10:57:36 TIX-02 kernel: [  114.225607] Code: 00 48 89 45 e0 31 c0 f3 48 ab 49 8b 7d 00 89 f0 0f b6 b0 a0 55 9b c0 48 8b 07 48 8b 40 70 e8 2d dd 72 ca 85 c0 75 48 0f b7 45 b2 <66> 44 89 23 48 c7 43 0c 00 00 00 00 66 89 43 02 48 8b 45 c0 48 
Oct 13 10:57:36 TIX-02 kernel: [  114.225659] RIP: smu7_populate_single_firmware_entry.isra.6+0x5b/0xe0 [amdgpu] RSP: ffffb56301b6fb98
Oct 13 10:57:36 TIX-02 kernel: [  114.225661] CR2: ffffb57f01b08fec
Oct 13 10:57:36 TIX-02 kernel: [  114.225662] ---[ end trace 13979f48dea591d0 ]---
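
To pull only the identifying lines out of a trace like the one above, a small filter can help. This is a minimal sketch and not part of the original report: `oops_summary` is a hypothetical helper name, and it assumes the syslog format shown above (a `kernel: [timestamp]` prefix followed by `BUG:`, `IP:`/`RIP:`, `Workqueue:` and `end trace` lines).

```shell
# oops_summary: hypothetical helper (not an existing tool).
# Reads a kernel log on stdin and prints only the lines that identify an
# oops: the BUG line, the Oops code, the instruction pointer, the workqueue
# and the end-of-trace marker. The "date host kernel: [time]" prefix is
# stripped so the result is easy to paste into a bug report.
oops_summary() {
    grep -E 'BUG: |Oops: | IP: | RIP: |Workqueue: |end trace' \
        | sed -E 's/^.*kernel: \[ *[0-9.]+\] //'
}
```

Typical use would be `oops_summary < /var/log/syslog` or `journalctl -k | oops_summary`, depending on where the kernel messages end up on the system.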

The subsequent suspend fails because of a busy workqueue (pm_runtime_work is still in flight):

Oct 13 11:00:26 TIX-02 kernel: [  283.318936] Freezing of tasks failed after 20.010 seconds (0 tasks refusing to freeze, wq_busy=1):
Oct 13 11:00:26 TIX-02 kernel: [  283.318944] Showing busy workqueues and worker pools:
Oct 13 11:00:26 TIX-02 kernel: [  283.318958] workqueue pm: flags=0x4
Oct 13 11:00:26 TIX-02 kernel: [  283.318967]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=0/0
Oct 13 11:00:26 TIX-02 kernel: [  283.318990]     delayed: pm_runtime_work
Oct 13 11:00:26 TIX-02 kernel: [  283.319061]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=2/0
Oct 13 11:00:26 TIX-02 kernel: [  283.319081]     in-flight: 2481:pm_runtime_work pm_runtime_work
Oct 13 11:00:26 TIX-02 kernel: [  283.319109] workqueue writeback: flags=0x4e
Oct 13 11:00:26 TIX-02 kernel: [  283.319115]   pwq 16: cpus=0-7 flags=0x4 nice=0 active=0/0
Oct 13 11:00:26 TIX-02 kernel: [  283.319133]     delayed: wb_workfn
Oct 13 11:00:26 TIX-02 kernel: [  283.319177] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=12s workers=5 idle: 676 2036 23 481
Oct 13 11:00:26 TIX-02 kernel: [  283.319334] Restarting kernel threads ... done.
Oct 13 11:00:26 TIX-02 kernel: [  283.319837] OOM killer enabled.
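
A failed suspend of this kind can be detected from the log with a short filter. This is a sketch under assumptions, not something from the original post: `freeze_failures` is a hypothetical helper name, and it relies on the message wording shown above ("Freezing of tasks failed" and "in-flight:").

```shell
# freeze_failures: hypothetical helper (not an existing tool).
# Reads a kernel log on stdin, prints every "Freezing of tasks failed"
# message plus the in-flight work items reported with it, and returns
# exit status 0 if at least one failure was found, 1 otherwise.
freeze_failures() {
    awk '
        /Freezing of tasks failed/ {
            n++
            sub(/^.*kernel: \[ *[0-9.]+\] /, "")   # drop the syslog prefix
            print
        }
        /in-flight:/ {
            sub(/^.*in-flight: */, "  in-flight: ")
            print
        }
        END { exit (n > 0 ? 0 : 1) }
    '
}
```

Because of the exit status, it can be used in a condition, for example `if journalctl -k | freeze_failures; then echo "suspend was blocked"; fi`.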

So overall, amdgpu and resume do not work well together. It is also mentioned here: https://www.linuxquestions.org/questions/linux-laptop-and-netbook-25/suspend-resume-dosen%27t-work-on-hybrid-graphic-laptop-4175627735/ but the solution there was a workaround (hibernate instead of suspend).

I removed the Polaris10 and Polaris11 firmware files from /lib/firmware/amdgpu: same behavior.

My amdgpu driver is the X.org one, version 18.0.1-1, so I'm not on the latest version, which is 18.1.0.

I first tried to install the latest AMD driver, tricking the install script (os-release Mint/Ubuntu issue). But after restarting, Cinnamon started in fallback mode and the suspend behaviour was the same, so I uninstalled the AMD driver.

Then I tried the solution described here. But this is worse: when resuming the system, the display is frozen, there is no mouse or keyboard response, and even Magic SysRq does nothing.

Then I got the sources of the latest Ubuntu xserver-xorg-video-amdgpu package (18.1.0-1), compiled and installed it: no change!

I got the latest Ubuntu linux-firmware package and installed it: no change.

I ran git clone git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git and put the Polaris12 firmware in /lib/firmware: the problem is still there.
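
The firmware copy in the previous step could be scripted roughly as follows. This is an illustration under assumptions, not the exact commands that were used: `install_polaris12_fw` is a hypothetical helper, it assumes the Polaris12 blobs are the `amdgpu/polaris12_*.bin` files of the linux-firmware checkout, and it takes the destination as a parameter so it can be tried on a scratch directory before touching /lib/firmware.

```shell
# install_polaris12_fw SRC DEST
# Hypothetical helper: copies SRC/amdgpu/polaris12_*.bin into DEST/amdgpu/.
# SRC is a linux-firmware checkout, DEST a firmware root such as /lib/firmware.
install_polaris12_fw() {
    src=$1
    dest=$2
    # Expand the glob into the positional parameters; if nothing matches,
    # "$1" is the unexpanded pattern and the -e test fails.
    set -- "$src"/amdgpu/polaris12_*.bin
    [ -e "$1" ] || {
        echo "no polaris12 firmware found under $src/amdgpu" >&2
        return 1
    }
    mkdir -p "$dest/amdgpu" && cp -v "$@" "$dest/amdgpu/"
}
```

For example, `install_polaris12_fw linux-firmware /tmp/fw-test` is a harmless dry run; the real copy needs a root shell and /lib/firmware as the destination. On Ubuntu-based systems the initramfs may carry its own copy of the firmware, so running `sudo update-initramfs -u` afterwards is probably worthwhile (an assumption, not something verified in this thread).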

This seems to be related to the kernel itself, the amdgpu kernel driver, or DRM.

I've installed some kernels through UKUU:

  1. 4.18.14 works fine! Try it! (Not OK for me, though: I have VirtualBox 5.2.10 and the vboxdrv module does not compile on 4.18.14.)
  2. 4.14.76 works fine too (and VirtualBox 5.2.10 works on it).
  3. 4.4.161 does not boot.