System Information

  • MSI Creator 15 Laptop
  • NVIDIA GeForce RTX 2070 SUPER Mobile / Max-Q
  • External LG Ultrawide monitor
  • Windows 10 / Ubuntu 20.04 dual boot

The Problem

I have been using the nvidia 455 drivers on my Ubuntu 20.04 machine successfully now for about six months. I rarely use the Windows partition, but I was using it yesterday. After shutting down Windows 10 and returning to Ubuntu, my external display stopped working entirely.

(Note: it's possible Windows has nothing to do with the issue -- restarting may have given Ubuntu the chance to update packages and break itself)

Apparently, the NVIDIA drivers no longer work. Running nvidia-smi and other commands produced the following error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.  Make sure that the latest NVIDIA driver is installed and running.

Googling for answers, most of the solutions recommended reinstalling the NVIDIA drivers when this happens. Note that I need the graphics drivers as well as the CUDA toolkit, including nvcc.

Purge Nvidia

I have tried many different solutions, and I run these commands whenever I get stuck and need to start fresh.

sudo apt purge 'nvidia*'
sudo apt purge 'libnvidia*'
sudo apt autoremove

Normally I'm running these in recovery mode after freshly-installed drivers cause Ubuntu to get stuck in the startup process after rebooting.

I also check dpkg -l | grep nvidia and remove any of the packages left over by the installation process. This was necessary when I wanted to install older versions of the drivers.
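For what it's worth, the leftover check above can be sketched as a small filter. This is only a rough sketch: the function name list_nvidia_leftovers is made up for illustration, and it assumes the usual `dpkg -l` layout where the first column is the two-letter status (for example `ii` for installed, `rc` for removed with config files left behind).

```shell
# Print the status and name of every package from `dpkg -l` output whose
# name contains "nvidia". Header lines are skipped because their first
# field is not a dpkg status code. Reads stdin, so saved output works too.
# (list_nvidia_leftovers is a hypothetical helper name.)
list_nvidia_leftovers() {
    awk '$1 ~ /^[uirph][nicUFHWt]/ && $2 ~ /nvidia/ { print $1, $2 }'
}

# Typical use on a live system:
#   dpkg -l | list_nvidia_leftovers
# Lines starting with "rc" are removed packages that still have config
# files around; those can be purged by name with `sudo apt purge <name>`.
```

Anything this prints after a purge is a package that dpkg still knows about in some form.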

Attempted Solutions

Here's a list of everything I've tried:

  • restarting my machine countless times (including full power off and unplugging for a while)

  • Following the official NVIDIA CUDA Installation Guide to reinstall drivers and manage conflicts. For example,

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.4.1/local_installers/cuda-repo-ubuntu2004-11-4-local_11.4.1-470.57.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-4-local_11.4.1-470.57.02-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-4-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
  • Tried to blacklist nouveau and nvidiafb:
blacklist nvidiafb
blacklist nouveau
options nouveau modeset=0
  • When reinstalling nvidia drivers, I tried multiple driver versions (470, 465, 460, 455) using multiple installation methods (first deb, then ubuntu-distributed, then runfile). All of them failed in different ways. Most commonly, when I reboot after installing the drivers, Ubuntu hangs indefinitely on startup (I see a black screen with an MSI logo and an "ubuntu" logo, sometimes with a spinning circle).

  • The NVIDIA drivers seem to still be working fine in Windows, so I don't think my graphics card is fried or anything like that.

  • booting into Ubuntu recovery mode from grub and selecting the dpkg repair option -- didn't seem to help anything

  • sudo ubuntu-drivers autoinstall -- this installed the nvidia 470 drivers, unsuccessfully

  • I noticed that uname -r indicated my kernel version was 5.11, when the support table for the Nvidia drivers shows that only 5.4 is supported for Ubuntu 20.04. So, I downgraded to 5.4 and re-installed the nvidia drivers, again with no success.
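For the blacklist attempt in the list above, here is a minimal sketch of how those three lines might be put in place. The helper name nouveau_blacklist_conf and the file name blacklist-nouveau.conf are assumptions for illustration; any .conf file under /etc/modprobe.d/ should be read by modprobe.

```shell
# Emit the blacklist entries quoted in the list above.
# (nouveau_blacklist_conf is a hypothetical helper name.)
nouveau_blacklist_conf() {
    cat <<'EOF'
blacklist nvidiafb
blacklist nouveau
options nouveau modeset=0
EOF
}

# Typical use (file name is an assumption):
#   nouveau_blacklist_conf | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u   # rebuild the initramfs so the blacklist applies at early boot
```

The initramfs rebuild matters because nouveau can otherwise still be loaded from the old initramfs before the blacklist file is consulted.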

Observations

nvidia-smi does produce output (instead of an error) in the following situations:

  • after reinstalling drivers but before restarting the system
  • in recovery mode after reinstalling drivers
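Since nvidia-smi works before a reboot but not after, one thing that might be worth checking is whether the kernel module was actually built for the kernel that gets booted. The sketch below assumes the driver was installed through a DKMS package (so that `dkms status` lists an nvidia entry); the helper name nvidia_dkms_built_for is made up, and the exact `dkms status` format differs a little between DKMS versions.

```shell
# Succeed (exit 0) if `dkms status` output on stdin contains an nvidia
# entry marked "installed" for the given kernel release.
# (nvidia_dkms_built_for is a hypothetical helper name.)
nvidia_dkms_built_for() {
    awk -v k="$1" '
        index($0, "nvidia") == 1 && index($0, ", " k ",") > 0 && /installed/ { found = 1 }
        END { exit !found }
    '
}

# Typical use on a live system:
#   if dkms status | nvidia_dkms_built_for "$(uname -r)"; then
#       echo "nvidia module is built for the running kernel"
#   else
#       echo "no nvidia DKMS build for $(uname -r)"
#   fi
# Related checks:
#   lsmod | grep nvidia     # is the module loaded right now?
#   journalctl -k -b -1     # kernel messages from the previous boot
```

If the module is only built for a different kernel than the one grub boots by default, that would be consistent with the driver disappearing after a restart.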

What now?

I am at a complete loss for what to do. The only thing I can think of is to completely re-install Ubuntu, which seems crazy when everything was working just fine yesterday.

References

AskUbuntu.SE, "Blank screen after installing nvidia restricted driver"

AskUbuntu.SE, Ubuntu 18.04 and nVidia. Stuck after boot

AskUbuntu.SE, Boot hangs after installing the latest driver from PPA and Ctrl+Alt+F1 keyboard shortcut doesn't work

AskUbuntu.SE, Stuck at boot screen, Nvidia graphics driver issues

AskUbuntu.SE, Changing NVIDIA Drivers makes Ubuntu freeze on startup

AskUbuntu.SE, graphics driver stopped working

AskUbuntu.SE, Ubuntu 20.04 Nvidia graphics unusable (recommends switching to kernel 5.4)

System Info

Before writing this question, I again purged everything from my system using the method described above. In this state, here is some system information:

Kernel Version

$ uname -r
5.4.0-80-generic

Secure Boot

$ sudo mokutil --sb-state
SecureBoot disabled

lshw

$ sudo lshw -C display
  *-display UNCLAIMED       
       description: VGA compatible controller
       product: TU104M [GeForce RTX 2070 SUPER Mobile / Max-Q]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller cap_list
       configuration: latency=0
       resources: memory:ac000000-acffffff memory:80000000-8fffffff memory:90000000-91ffffff ioport:3000(size=128) memory:ad000000-ad07ffff
  *-display
       description: VGA compatible controller
       product: UHD Graphics
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 05
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
       configuration: driver=i915 latency=0
       resources: irq:191 memory:ab000000-abffffff memory:40000000-4fffffff ioport:4000(size=64) memory:c0000-dffff

hwinfo

$ hwinfo --gfxcard
16: PCI 100.0: 0300 VGA compatible controller (VGA)             
  [Created at pci.386]
  Unique ID: VCu0.pBgP2fGEzV8
  Parent ID: vSkL.sXdMPV6yXb4
  SysFS ID: /devices/pci0000:00/0000:00:01.0/0000:01:00.0
  SysFS BusID: 0000:01:00.0
  Hardware Class: graphics card
  Model: "nVidia VGA compatible controller"
  Vendor: pci 0x10de "nVidia Corporation"
  Device: pci 0x1e91 
  SubVendor: pci 0x1462 "Micro-Star International Co., Ltd. [MSI]"
  SubDevice: pci 0x12c6 
  Revision: 0xa1
  Memory Range: 0xac000000-0xacffffff (rw,non-prefetchable,disabled)
  Memory Range: 0x80000000-0x8fffffff (ro,non-prefetchable,disabled)
  Memory Range: 0x90000000-0x91ffffff (ro,non-prefetchable,disabled)
  I/O Ports: 0x3000-0x307f (rw,disabled)
  Memory Range: 0xad000000-0xad07ffff (ro,non-prefetchable,disabled)
  IRQ: 255 (no events)
  Module Alias: "pci:v000010DEd00001E91sv00001462sd000012C6bc03sc00i00"
  Driver Info #0:
    Driver Status: nvidiafb is not active
    Driver Activation Cmd: "modprobe nvidiafb"
  Driver Info #1:
    Driver Status: nouveau is not active
    Driver Activation Cmd: "modprobe nouveau"
  Driver Info #2:
    Driver Status: nvidia_drm is not active
    Driver Activation Cmd: "modprobe nvidia_drm"
  Driver Info #3:
    Driver Status: nvidia is not active
    Driver Activation Cmd: "modprobe nvidia"
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #11 (PCI bridge)

34: PCI 02.0: 0300 VGA compatible controller (VGA)
  [Created at pci.386]
  Unique ID: _Znp.7YEiQ6GHkFE
  SysFS ID: /devices/pci0000:00/0000:00:02.0
  SysFS BusID: 0000:00:02.0
  Hardware Class: graphics card
  Device Name: "Onboard - Video"
  Model: "Intel VGA compatible controller"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x9bc4
  SubVendor: pci 0x1462 "Micro-Star International Co., Ltd. [MSI]"
  SubDevice: pci 0x12c6
  Revision: 0x05
  Driver: "i915"
  Driver Modules: "i915"
  Memory Range: 0xab000000-0xabffffff (rw,non-prefetchable)
  Memory Range: 0x40000000-0x4fffffff (ro,non-prefetchable)
  I/O Ports: 0x4000-0x403f (rw)
  Memory Range: 0x000c0000-0x000dffff (rw,non-prefetchable,disabled)
  IRQ: 192 (55080 events)
  Module Alias: "pci:v00008086d00009BC4sv00001462sd000012C6bc03sc00i00"
  Driver Info #0:
    Driver Status: i915 is active
    Driver Activation Cmd: "modprobe i915"
  Config Status: cfg=new, avail=yes, need=no, active=unknown

Primary display adapter: #16

ubuntu-drivers

$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001E91sv00001462sd000012C6bc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-460 - distro non-free recommended
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-470 - third-party non-free
driver   : nvidia-driver-460-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
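As a small aside, the recommended entry can be pulled out of that output with a one-line filter. This is just a sketch; the function name recommended_driver is made up, and it assumes the `driver : <package> - ... recommended` line layout shown above.

```shell
# Print the package name from the line that ubuntu-drivers marks as
# "recommended". Reads `ubuntu-drivers devices` output on stdin.
# (recommended_driver is a hypothetical helper name.)
recommended_driver() {
    awk '/^driver/ && /recommended/ { print $3 }'
}

# Typical use on a live system:
#   ubuntu-drivers devices | recommended_driver
```

For the output above this would pick nvidia-driver-460, the only line carrying the recommended tag.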

Thank You!

Please let me know if any further information is needed and I'll do my best to provide it! Thanks for any help you can provide!

  • See the nice writeup https://askubuntu.com/questions/1077061/how-do-i-install-nvidia-and-cuda-drivers-into-ubuntu/1077063#1077063 for using the runfile to install CUDA. Basically, install the (470 for your card) Nvidia driver from the standard repos, then (optionally) override the runfile default (system) locations to your local cuda setup. Treat CUDA like an app, it doesn't dictate the system video driver or compiler. You can install all the CUDA files locally, then add overrides as needed for gcc, etc. to that CUDA/bin, which gets put early in the PATH. – ubfan1 Aug 07 '21 at 17:24
  • Since trying different drivers, have you totally purged before attempting install of new driver? If not purged you get conflicts and then nothing works. nVidia install, purge if needed. https://ubuntuforums.org/showthread.php?t=2383560&p=13735336#post13735336 Purge then install the recommended driver. – oldfred Aug 07 '21 at 22:37
  • @oldfred, yes I purge between each attempted reinstall using the steps listed in my question. Is there any diagnostic tool for discovering improperly installed/uninstalled graphics drivers? – Benjamin Bray Aug 08 '21 at 04:27
  • @ubfan1 Thanks -- but every other source that I've seen has said that installing from a runfile is a big no-no unless you really know what you're doing (which I really don't!). I worry that it might leave my system in a state that's even more difficult to diagnose / upgrade later. – Benjamin Bray Aug 08 '21 at 04:28

3 Answers

2

I ran the following today (after purging as described above) and it seems to be working again after a reboot:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-driver-460

Don't ask me why it works -- I tried literally the same thing yesterday with no success.

  • I have limited experience and migrated to Mint for my desktop OS, but these Nvidia driver issues are persistent across Ubuntu and as best I can tell other variants. It is likely a sub-optimal suggestion for you, but consider at least not running dual boot and consider migrating to an AMD GPU (or Intel, if that can do everything you require), and I know how hard that is for laptops. – Paul Aug 08 '21 at 13:55
0

I solved the problem by reinstalling the driver and enabling all GPUs through the Nvidia driver:

  1. Run the command:

    sudo nvidia-xconfig --enable-all-gpus
    
  2. Shut down and power up (not reboot).

ThunderBird
0

This recently happened to me on Fedora as well, though I noticed that Windows chose to update the system firmware (UEFI/BIOS) just before Linux NVIDIA drivers stopped working. I understand that this happened almost a year ago now, but is it possible that this was the case for you as well?

In my case, I'd previously installed the NVIDIA kernel drivers and signed them using a signing key I'd manually generated. I did this using the process outlined in the following link, though not all of the steps in the process will apply to Ubuntu users:

https://blog.monosoul.dev/2021/12/29/automatically-sign-nvidia-kernel-module-in-fedora/

As it turns out, keys enrolled on your system with mokutil can be lost in firmware updates: this is what happened to me. To fix my issue, all I needed to do was track down my signing key and re-enroll the key I'd signed the NVIDIA drivers with:

sudo mokutil --import /etc/pki/akmods/certs/public_key.der

Of course, the location of your public key will likely be different, as Ubuntu does not use akmods.
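Before hunting for a key, it may be worth confirming that Secure Boot is actually on, since module signing should only matter in that case. A rough sketch follows; the helper name secure_boot_enabled is made up, and it assumes `mokutil --sb-state` prints a line beginning with "SecureBoot enabled" or "SecureBoot disabled" as in the question.

```shell
# Succeed (exit 0) if `mokutil --sb-state` output on stdin reports that
# Secure Boot is enabled.
# (secure_boot_enabled is a hypothetical helper name.)
secure_boot_enabled() {
    grep -q '^SecureBoot enabled'
}

# Typical use on a live system:
#   if mokutil --sb-state | secure_boot_enabled; then
#       mokutil --list-enrolled      # is your signing certificate still enrolled?
#       modinfo -F signer nvidia     # who signed the installed nvidia module?
#   else
#       echo "Secure Boot is off, so a lost MOK is unlikely to be the cause"
#   fi
```

If the signer reported by modinfo does not appear among the enrolled keys, re-importing the key as described above would be the next thing to try.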

JustALawnGnome7