Problem

Normally, when only Xorg and compiz are running on my GPU, I can suspend without problems. However, if I run an intensive PyTorch training job (about 90% GPU in use) via Jupyter and then suspend after the processes have finished, the laptop fails to suspend or wake up properly.

I am positive that the GPU being full (or at least not empty) is what causes the issue. I don't know why some process, possibly related to the GPU, is not suspending. If I start Jupyter, run 1+1 (or some other trivial computation), and then suspend, there are no issues either.
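
One way to check whether a leftover process still holds the GPU before suspending is sketched below. This is only a minimal sketch, assuming `nvidia-smi` (shipped with the proprietary driver) and `fuser` (from psmisc) are installed; the function name `gpu_holders` is made up for illustration.

```shell
#!/bin/sh
# gpu_holders: hypothetical helper that lists processes still using the NVIDIA GPU.
gpu_holders() {
    # Compute clients (e.g. a Jupyter/PyTorch kernel), if nvidia-smi is available.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv \
            || echo "nvidia-smi failed (driver not loaded?)" >&2
    else
        echo "nvidia-smi not found" >&2
    fi
    # Any process with an open handle on the NVIDIA device nodes.
    if command -v fuser >/dev/null 2>&1; then
        for dev in /dev/nvidia*; do
            if [ -e "$dev" ]; then
                # fuser exits non-zero when nothing holds the node; that is not an error here.
                fuser -v "$dev" || true
            fi
        done
    fi
    return 0
}
```

Calling `gpu_holders` right before suspending prints the current holders. Xorg and compiz are expected to show up in the `fuser` output; a Python/Jupyter kernel that is still listed after training has finished would suggest it is still holding the GPU.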

Question

The kernel log shows me nothing "fishy". I have tried a bunch of online remedies and am now at a dead end.

How do I identify what is happening? Any ideas?

Other symptoms

It sort of sleeps, but when I hit a key I still hear some sound from the laptop (it sounds as if it is booting up), followed by a blank screen. Sometimes I get to a TTY but can't type anything.


My system

  • Ubuntu 16.04
  • Nvidia GeForce 1050
  • Acer Nitro 5, 8 GB RAM

What have I tried to rectify this issue?

I spent a good five full days reading, searching, reinstalling, etc. Now I'm at a dead end.

  1. Checked the kernel logs (pastebin link) but didn't see anything "fishy" (at 02:08 I suspended and at 10:21 I did a hard reset).

    Here is a tiny excerpt:

Oct  2 02:08:06 eghx-nitro NetworkManager[8152]: <info>  [1601597286.6443] manager: sleep requested (sleeping: no  enabled: yes)
Oct  2 02:08:06 eghx-nitro NetworkManager[8152]: <info>  [1601597286.6443] manager: sleeping...
Oct  2 02:08:06 eghx-nitro NetworkManager[8152]: <info>  [1601597286.6447] manager: NetworkManager state is now ASLEEP
Oct  2 02:08:06 eghx-nitro NetworkManager[8152]: <info>  [1601597286.6453] device (wlp2s0): state change: activated -> deactivating (reason 'sleeping') [100 110 37]
Oct  2 02:08:06 eghx-nitro NetworkManager[8152]: <info>  [1601597286.8169] device (wlp2s0): state change: deactivating -> disconnected (reason 'sleeping') [110 30 37]
Oct  2 02:08:06 eghx-nitro NetworkManager[8152]: <info>  [1601597286.8356] dhcp4 (wlp2s0): canceled DHCP transaction, DHCP client pid 8328
Oct  2 02:08:06 eghx-nitro NetworkManager[8152]: <info>  [1601597286.8356] dhcp4 (wlp2s0): state changed bound -> done
Oct  2 02:08:06 eghx-nitro NetworkManager[8152]: <info>  [1601597286.8363] dns-mgr: Writing DNS information to /sbin/resolvconf
Oct  2 02:08:06 eghx-nitro kernel: [24100.153393] wlp2s0: deauthenticating from e8:cc:18:41:3c:15 by local choice (Reason: 3=DEAUTH_LEAVING)
Oct  2 02:08:07 eghx-nitro NetworkManager[8152]: <warn>  [1601597287.0509] sup-iface[0xb4a6f0,wlp2s0]: connection disconnected (reason -3)
Oct  2 02:08:07 eghx-nitro NetworkManager[8152]: <info>  [1601597287.0511] device (wlp2s0): supplicant interface state: completed -> disconnected
Oct  2 02:08:07 eghx-nitro NetworkManager[8152]: <info>  [1601597287.0525] device (wlp2s0): state change: disconnected -> unmanaged (reason 'sleeping') [30 10 37]
Oct  2 02:08:08 eghx-nitro kernel: [24101.983885] PM: suspend entry (deep)
Oct  2 02:08:09 eghx-nitro kernel: [24101.983888] PM: Syncing filesystems ... done.
Oct  2 10:21:32 eghx-nitro kernel: [24103.953554] Freezing user space processes ... (elapsed 0.002 seconds) done.
  1. Based on an Nvidia forum post, added the following to the grub config and updated.

     GRUB_CMDLINE_LINUX_DEFAULT="quiet acpi_rev_override=1 acpi_osi=Linux scsi_mod.use_blk_mq=1 nouveau.modeset=0 nouveau.runpm=0 mem_sleep_default=deep"
    

    Added the following to initramfs-tools/modules and updated.

     nvidia
     nvidia_modeset
     nvidia_uvm
     nvidia_drm
    
  2. Didn't change the kernel, as there was no evidence pointing to it. Some people switched to 4.17; mine is currently 4.15.

  3. Blind try: tried different suspend commands.

     systemctl suspend

     pm-suspend

  4. Tried downgrading the driver from 430 to 384 via Additional Drivers. This did not help, because 384 cannot coexist with pytorch=1.6.0.

  5. Completely removed and reinstalled nvidia-430 as per here:
    purge, add-apt-repository ppa:graphics-drivers/ppa, update and autoinstall.

    This ended in a black screen of death. I recovered with nouveau.modeset=0, but somehow the GPU was not working anymore.

  6. At this point, did a complete reinstall of xserver, unity, lightdm and nvidia-430 from a TTY, before the login screen.

    This restored the system to its previous state, i.e., suspending while the GPU is full hangs the system.
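
The kernel log check from the first step can be narrowed down to the suspend sequence itself, to see how far it got before the hang. The following is a minimal sketch, assuming the log is at /var/log/kern.log (the Ubuntu default) and that the interesting lines are kernel messages containing `PM: `, `Freezing `, `NVRM` or `nvidia`. `pm_lines` is a hypothetical helper name, and the pattern is a guess that may need widening.

```shell
#!/bin/sh
# pm_lines: hypothetical helper that prints only suspend-related kernel lines.
# Usage: pm_lines /var/log/kern.log
pm_lines() {
    # Keep kernel messages only (drops NetworkManager etc.), then match
    # power-management stages and NVIDIA driver messages.
    grep -E 'kernel: .*(PM: |Freezing |NVRM|nvidia)' "$1"
}
```

For example, `pm_lines /var/log/kern.log | tail -n 20` shows the last stages that were logged before the hard reset.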
