7

I run Ubuntu 20.04 and after the last reboot I had trouble with my graphics driver - the system is in low res, only one monitor is working.

Debug Output

$ sudo lshw -C display
  *-display UNCLAIMED       
       description: VGA compatible controller
       product: TU104 [GeForce RTX 2070 SUPER]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:31:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller cap_list
       configuration: latency=0
       resources: memory:f5000000-f5ffffff memory:e0000000-efffffff memory:f0000000-f1ffffff ioport:f000(size=128) memory:f6000000-f607ffff
$ sudo dkms status
nvidia, 510.47.03: added

That status seems a bit exotic, at least I did not find many similar cases while googling.

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
$ modinfo nvidia
modinfo: ERROR: Module nvidia not found.

In the system info I see "llvmpipe (LLVM 12.0.0, 256 bits)" as my graphics.

What I Tried

I have tried multiple ways of installing the Nvidia drivers: apt (sudo apt autoremove --purge nvidia* && sudo apt install nvidia-driver-510), the "Additional Drivers" UI, and ubuntu-drivers. I tried both the currently latest version, 510, and the older one that worked before, 470. I also tried selecting Nvidia with sudo prime-select nvidia, as well as selecting intel and switching back to nvidia, with the same result.

Background

I used Nvidia driver 470 and kernel 5.13.0-26. After a reboot I got kernel 5.13.0-27 and no wifi. I had hit that problem recently because of the Nvidia driver (I needed to install linux-modules-extra for the new kernel), so I decided to upgrade the drivers, hoping everything would be resolved. That led to the current situation: installing linux-modules-extra-5.13.0-27-generic, and then, after the switch to 510, the same package for 5.13.0-28, fixed the wifi issue, but the video driver is broken. While using 5.13.0-27 I was able to boot 5.13.0-26 and had working video there; that is no longer possible because 5.13.0-27 is now the oldest of the recent kernels in the GRUB menu.

I feel like I am missing some step that would fix that, would appreciate any help.

UPD

$ sudo dkms install -m nvidia -v 510.47.03 -k 5.13.0-28-generic --force
Error! Your kernel headers for kernel 5.13.0-28-generic cannot be found.
Please install the linux-headers-5.13.0-28-generic package,
or use the --kernelsourcedir option to tell DKMS where it's located
$ sudo dkms build -m nvidia -v 510.47.03
Error! Your kernel headers for kernel 5.13.0-28-generic cannot be found.
Please install the linux-headers-5.13.0-28-generic package,
or use the --kernelsourcedir option to tell DKMS where it's located

So it seems dkms is somehow unaware of my kernel. I followed the recommendation in the error message above and installed the headers with sudo apt install linux-headers-5.13.0-28-generic. After that the output looks better:

$ sudo dkms build -m nvidia -v 510.47.03
Module nvidia/510.47.03 already built for kernel 5.13.0-28-generic/4
$ sudo dkms status
nvidia, 510.47.03, 5.13.0-28-generic, x86_64: installed

I'll try rebooting now and then install the driver as per recommendation in comments.

UPD2

That's it, everything seems to work now. There is no need to do anything about the drivers, it seems the problem was with missing headers.
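
The fix above comes down to having kernel headers for the running kernel. Below is a minimal sketch of a check, assuming DKMS looks for the build tree under /lib/modules/<release>/build (the usual Ubuntu layout). The names headers_pkg, have_headers, and MODULES_ROOT are hypothetical and exist only for this illustration. The script only reports what it finds and installs nothing.

```shell
#!/bin/sh
# Print the name of the headers package that matches a kernel release.
# Defaults to the running kernel when no argument is given.
headers_pkg() {
    printf 'linux-headers-%s\n' "${1:-$(uname -r)}"
}

# Succeed if the kernel build tree that DKMS needs is present.
# MODULES_ROOT is a hypothetical override so the check can be tried
# against another directory; it defaults to /lib/modules.
have_headers() {
    [ -e "${MODULES_ROOT:-/lib/modules}/${1:-$(uname -r)}/build" ]
}

kernel=$(uname -r)
if have_headers "$kernel"; then
    echo "headers present for $kernel"
else
    echo "headers missing for $kernel; try: sudo apt install $(headers_pkg "$kernel")"
fi
```

If the headers turn out to be missing, installing the printed package and re-running sudo dkms build / sudo dkms install (as in the update above) is the path that worked here.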

heik
  • 1
    sudo dkms install -m nvidia -v 510.47.03 -k 5.13.0-28-generic --force should be able to install the driver into that kernel. – Terrance Feb 12 '22 at 14:49
  • 1
    @Terrance The dkms build/install probably failed during the install of Nvidia 510 because secure boot is enabled... or a kernel lib/extras is missing. – heynnema Feb 12 '22 at 15:13
  • 1
    @heynnema I guess it is possible that Secure Boot is enabled. Usually, when you see the dkms driver showing as added, 2 of the 3 steps are done: dkms add and dkms build were performed. Just the dkms install step wasn't run or failed. Or maybe the dkms build failed too. – Terrance Feb 12 '22 at 18:24
  • 1
    What output do you get when you run sudo dkms build -m nvidia -v 510.47.03? – Terrance Feb 12 '22 at 18:29
  • 1
    @Terrance Yes, I suspect that the dkms build failed, either because Secure Boot was enabled, or because some libs are missing. dkms status didn't show prior builds against older kernels, which probably meant that OP never had Nvidia installed before. We'll see if your dkms build command works, or errors out. Then a dkms install would be next. – heynnema Feb 12 '22 at 19:35
  • @Terrance @heynnema sorry, I forgot to mention I checked secure boot, it is disabled. But it is possible that some libs are missing - some time ago I used aptitude but then I remembered it can mess up dependencies, so I suspect that was the root cause.
    $ sudo dkms build -m nvidia -v 510.47.03
    Error! Your kernel headers for kernel 5.13.0-28-generic cannot be found.
    Please install the linux-headers-5.13.0-28-generic package,
    or use the --kernelsourcedir option to tell DKMS where it's located
    
    – heik Feb 12 '22 at 20:03
  • The comments already helped a lot, at least it is clear that something was missing and what it was, I have updated the post. – heik Feb 12 '22 at 20:12
  • @Terrance please post your comments and the recommendation to install linux-headers-5.13.0-28-generic (see updated post) so I can accept it as the answer. Your comments led me to the solution. Many thanks! – heik Feb 12 '22 at 20:18
  • 1
    @heik If you want to you can go ahead and write up the answer and I will upvote it. I have no problem stopping in and helping where I can, and I am glad that you were able to solve the issue. ;) – Terrance Feb 12 '22 at 23:03
  • 1
    Hi! I'm sorry for being unclear... While writing my answer I wrongly assumed that you had all the prerequisites installed. Of course, Linux headers are needed! I've updated my answer and improved its clarity. Also, you've installed the kernel headers for your current kernel only... You'll have to repeat this process each time your kernel gets an upgrade. Consider installing the linux-headers-generic package (sudo apt install linux-headers-generic) so that you won't have to repeat this process. As I have clarified my answer, you may accept it or post a new answer... – Error404 Feb 13 '22 at 02:19
  • 1
    @Terrance I wish the world had more people like you :) – heik Feb 13 '22 at 09:54
  • Thank you for this question and the answer. It turns out the problem on my system was exactly the same. Installing the missing headers fixed everything. This was so frustrating... – Marcin Zalewski Jul 14 '23 at 05:51

2 Answers

6

Assuming that you have all the prerequisites installed (sudo apt install linux-headers-generic), you can follow these steps to fix the issue:

  1. (Optional) Boot into a root shell to safely run the commands.

  2. Remove your dkms file for NVIDIA drivers:

    sudo rm -r /var/lib/dkms/nvidia
    
  3. Purge NVIDIA drivers:

    sudo dpkg -P --force-all $(dpkg -l | grep "nvidia-" | grep -v lib | awk '{print $2}')
    
  4. Reinstall NVIDIA drivers:

    sudo ubuntu-drivers autoinstall
    
  5. Reboot!

Now, your NVIDIA drivers should work properly!
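
To confirm the result after the reboot, the state reported by dkms status can be summarized. This is a rough sketch that assumes the comma-separated output format shown in the question (for example "nvidia, 510.47.03: added"); other DKMS versions may print a different layout. The function name dkms_states is hypothetical.

```shell
#!/bin/sh
# Read `dkms status` lines on stdin and print "<module>/<version> <state>"
# for each entry. Assumes lines shaped like:
#   nvidia, 510.47.03: added
#   nvidia, 510.47.03, 5.13.0-28-generic, x86_64: installed
dkms_states() {
    awk -F': *' 'NF >= 2 {
        split($1, f, /, */)          # f[1] = module name, f[2] = version
        print f[1] "/" f[2], $NF     # $NF = state after the colon
    }'
}

# Only query DKMS when it is actually available on this machine.
if command -v dkms >/dev/null 2>&1; then
    dkms status | dkms_states
fi
```

A state of "added" means the module still has to be built and installed for the running kernel, whereas "installed" is what the question's final update shows once the headers were in place. Running nvidia-smi afterwards is a further check that the driver is loaded.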

Error404
  • I'd do this slightly differently. Wrong way to remove dkms/nvidia with rm. No need to reinstall dkms. I'd first check that secure boot is disabled. The nvidia dkms is added, but not built or installed. dkms build and dkms install. Reboot. Check dkms status and nvidia-smi. – heynnema Feb 12 '22 at 14:22
  • I tried that, but in the end I get the same result. – heik Feb 12 '22 at 20:05
  • 1
    @heik Hi! I'm sorry for being unclear... While writing my answer I wrongly assumed that you had all the prerequisites installed. Of course, Linux headers are needed! I've updated my answer and improved its clarity. Also, you've installed the kernel headers for your current kernel only... You'll have to repeat this process each time your kernel gets an upgrade. Consider installing the linux-headers-generic package (sudo apt install linux-headers-generic) so that you won't have to repeat this process. As I have clarified my answer, you may accept it or post a new answer... – Error404 Feb 13 '22 at 02:20
  • 1
    @Someone thank you for your feedback! I honestly also assumed I had all the prerequisites installed; I guess an attempt to use aptitude in the past messed up my dependencies more than I thought. I do have the latest version of linux-headers-generic, despite having to install the specific one as mentioned in my update: "linux-headers-generic is already the newest version (5.4.0.99.103)." I've accepted your answer because, together with my updates, it should cover everything a googling person might need in a similar situation. – heik Feb 13 '22 at 09:51
  • 3
    It is simply unbelievable: this is the second time an automatic update breaks the nVidia installation on 20.04. Why on earth do they keep updating the drivers if it does not work at all? – Antonio Sesto Jul 07 '22 at 11:42
  • @AntonioSesto it's December 2023 and this just screwed me for the 3rd time this year. – Xavier Hubbard Anderson Dec 22 '23 at 18:00
  • @XavierHubbardAnderson Yes, I know: the same happened here... it is unbelievable. – Antonio Sesto Jan 11 '24 at 10:30
0

I have the same graphics card and Ubuntu version. I was getting similar errors, but my solution was to do:

sudo init 3
# Then after logging in again:
sudo apt install nvidia-driver-525 nvidia-dkms-525
sudo reboot