Problem
Normally, when only xorg and compiz are running on my GPU, I can suspend without trouble. However, if I run an intensive PyTorch training job (about 90% GPU utilization) from jupyter and then suspend after the processes have finished, the laptop fails to sleep or wake up properly.
I am positive that the GPU being full, or not being emptied afterwards, is causing the issue. I don't know why "some process", possibly related to the GPU, is not suspending. If I start jupyter, run 1+1 (or some other simple process) and then suspend, there are no issues either.
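One way to test the suspicion that something is still holding the GPU is to list the processes that have the NVIDIA device nodes open just before suspending. This is only a sketch: gpu_holders is a made-up helper name, and it assumes a Linux /proc filesystem and that the driver exposes its devices under /dev/nvidia*. The nvidia-smi query at the end only runs if that tool is installed.

```shell
# gpu_holders is a hypothetical helper, not part of any NVIDIA tool.
# It prints "PID COMMAND PATH" for every process that has a file open
# whose path starts with the given prefix (default: /dev/nvidia).
gpu_holders() {
    prefix="${1:-/dev/nvidia}"
    for fd in /proc/[0-9]*/fd/*; do
        # Unreadable or vanished descriptors are skipped silently.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            "$prefix"*)
                pid=${fd#/proc/}
                pid=${pid%%/*}
                comm=$(cat "/proc/$pid/comm" 2>/dev/null) || comm="?"
                echo "$pid $comm $target"
                ;;
        esac
    done | sort -u
}

# Run as root to also see processes owned by other users.
gpu_holders

# If the driver tools are installed, nvidia-smi lists compute processes too.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv \
        || echo "nvidia-smi query failed" >&2
fi
```

If a python or jupyter kernel still shows up here after training has finished, shutting that kernel down before suspending would be the first thing to try.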
Question
The kernel log shows me nothing "fishy". I have tried a bunch of online remedies and am now at a dead end.
How do I identify what is happening? Any ideas?
Other symptoms
It sort of goes to sleep, but when I hit a key I still hear some sound from the laptop (it sounds as if it is booting up), followed by a blank screen. Sometimes I get to a TTY but can't type anything.
My system
- Ubuntu 16.04
- Nvidia 1050 GeForce
- Acer Nitro 5, 8 GB RAM
What have I tried to rectify this issue?
Spent a good 5 full days understanding, searching, re-installing, etc. Now at a dead end.
Checked the kernel logs (pastebin link) but didn't see anything "fishy" (at 02:08 I start sleeping and at 10:21 I hit hard reset). Here is a tiny excerpt:
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.6443] manager: sleep requested (sleeping: no enabled: yes)
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.6443] manager: sleeping...
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.6447] manager: NetworkManager state is now ASLEEP
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.6453] device (wlp2s0): state change: activated -> deactivating (reason 'sleeping') [100 110 37]
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.8169] device (wlp2s0): state change: deactivating -> disconnected (reason 'sleeping') [110 30 37]
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.8356] dhcp4 (wlp2s0): canceled DHCP transaction, DHCP client pid 8328
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.8356] dhcp4 (wlp2s0): state changed bound -> done
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.8363] dns-mgr: Writing DNS information to /sbin/resolvconf
Oct 2 02:08:06 eghx-nitro kernel: [24100.153393] wlp2s0: deauthenticating from e8:cc:18:41:3c:15 by local choice (Reason: 3=DEAUTH_LEAVING)
Oct 2 02:08:07 eghx-nitro NetworkManager[8152]: <warn> [1601597287.0509] sup-iface[0xb4a6f0,wlp2s0]: connection disconnected (reason -3)
Oct 2 02:08:07 eghx-nitro NetworkManager[8152]: <info> [1601597287.0511] device (wlp2s0): supplicant interface state: completed -> disconnected
Oct 2 02:08:07 eghx-nitro NetworkManager[8152]: <info> [1601597287.0525] device (wlp2s0): state change: disconnected -> unmanaged (reason 'sleeping') [30 10 37]
Oct 2 02:08:08 eghx-nitro kernel: [24101.983885] PM: suspend entry (deep)
Oct 2 02:08:09 eghx-nitro kernel: [24101.983888] PM: Syncing filesystems ... done.
Oct 2 10:21:32 eghx-nitro kernel: [24103.953554] Freezing user space processes ... (elapsed 0.002 seconds) done.
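To make an excerpt like this easier to scan, the kernel log can be filtered down to power-management and NVIDIA driver messages. A minimal sketch: pm_lines is a made-up helper name, and both the default path /var/log/kern.log and the NVRM/nvidia patterns are assumptions about where and how the driver logs on this system.

```shell
# pm_lines (hypothetical helper): print only kernel lines that mention
# suspend/resume stages or the NVIDIA driver from the given log file.
pm_lines() {
    log="${1:-/var/log/kern.log}"
    grep -E 'kernel: .*(PM: |Freezing |Restarting tasks|NVRM|nvidia)' "$log"
}

# Show the most recent matching lines if the default log is readable.
if [ -r /var/log/kern.log ]; then
    pm_lines | tail -n 40 || true
fi
```

The last line printed before the hard reset shows how far the suspend sequence got. In the excerpt above it stops right after user space processes are frozen.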
Based on a thread on the Nvidia forum, added the following to the grub config and updated grub.
GRUB_CMDLINE_LINUX_DEFAULT="quiet acpi_rev_override=1 acpi_osi=Linux scsi_mod.use_blk_mq=1 nouveau.modeset=0 nouveau.runpm=0 mem_sleep_default=deep"
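A change to this line only takes effect after update-grub has been run and the machine rebooted, so it is worth confirming that the parameters actually reached the running kernel. A quick sketch: has_param is a made-up helper name, and /sys/power/mem_sleep is only present on kernels that allow selecting the suspend mode.

```shell
# has_param (hypothetical helper): succeed if the exact parameter appears
# as a whole word in the kernel command line file (default /proc/cmdline).
has_param() {
    param="$1"
    file="${2:-/proc/cmdline}"
    [ -r "$file" ] || return 1
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

for p in acpi_rev_override=1 acpi_osi=Linux nouveau.modeset=0 mem_sleep_default=deep; do
    if has_param "$p"; then
        echo "active:  $p"
    else
        echo "missing: $p"
    fi
done

# The bracketed entry is the suspend mode currently selected, e.g. [deep].
if [ -r /sys/power/mem_sleep ]; then
    cat /sys/power/mem_sleep
fi
```

The "PM: suspend entry (deep)" line in the kernel log above suggests that at least mem_sleep_default=deep was picked up.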
Added the following to initramfs-tools/modules and updated.
nvidia nvidia_modeset nvidia_uvm nvidia_drm
Didn't change the kernel, as there was no evidence pointing to it. Some people changed to 4.17; mine is currently 4.15.
Blind try: tried different ways of suspending, namely systemctl suspend and pm-suspend.
Tried downgrading the drivers from 430 to 384 by changing the selection in Additional Drivers. This was not useful, as 384 was not capable of co-existing with pytorch=1.6.0.
Completely removed and re-installed nvidia-430 as per here: purge, add-apt-repository ppa:graphics-drivers/ppa, update and autoinstall. This ended in the black screen of death. Recovered from it with nouveau.modeset=0, but somehow the GPU was not working anymore. At this point did a complete re-install of xserver, unity, lightdm and nvidia-430 from a TTY before the login screen. This recovered the system to its previous state, i.e., suspending after the GPU has been full hangs the system.
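For reference, the purge and re-install sequence probably looked roughly like the following. This is an assumed reconstruction, since only the step names are given above: the package glob 'nvidia*' and the use of apt-get and ubuntu-drivers are guesses, and run and reinstall_nvidia are made-up helper names. By default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
# Assumed reconstruction; only the step names are known, not the commands.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# run (hypothetical helper): echo the command, execute it only if DRY_RUN != 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

reinstall_nvidia() {
    run sudo apt-get purge 'nvidia*'
    run sudo add-apt-repository ppa:graphics-drivers/ppa
    run sudo apt-get update
    run sudo ubuntu-drivers autoinstall
}

reinstall_nvidia
```

Running this from a TTY rather than from the desktop session avoids removing the driver from under a running X server.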