
On my new laptop with dual boot (Ubuntu 22.04 with kernel 6.7 + Windows 11), I tried installing the most recent production-branch NVIDIA driver (version 535) available for my GPU (GeForce 4070). Although no error appears during installation, when I run nvidia-smi I get the infamous error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

To solve this, I have gone through dozens of questions and answers posted here. I tried:

  1. disabling Secure Boot as suggested in this question.

  2. purging + reinstalling as suggested in this prior question:

    sudo apt-get remove --purge '^nvidia-.*'
    sudo apt-get remove --purge '^libnvidia-.*'
    sudo apt-get remove --purge '^cuda-.*'
    sudo apt install nvidia-driver-535

(I also tried purging and then reinstalling via Software & Updates, and purging and then reinstalling as root from the Ubuntu recovery-mode terminal.)

  3. running prime-select nvidia, as suggested in this question.

  4. making sure Wayland is disabled, as suggested in multiple places.

Nothing helped. What else could I try?

karel
YzSun
  • After installing the NVIDIA driver, did you reboot the system so that the driver is activated? Also, you didn't say whether there were any errors during the install with the 6.7 kernel, which is a mainline kernel and may not be ready for normal use yet. The HWE kernel version for 22.04.3 LTS is 6.5.0-15-generic, which I know nvidia-driver-535 installs into just fine. – Terrance Feb 01 '24 at 02:59
  • @Terrance thanks for your comment. Yes, I rebooted every time I purged or installed. There were no errors at all with the kernel installation, and it's working perfectly (I need it because it fixes SSD and power-management issues on newer laptops). – YzSun Feb 01 '24 at 03:29
  • Check the output of dkms status. If the driver was able to install into the Kernel correctly then it should be listed. – Terrance Feb 01 '24 at 03:35
  • Has the gcc-12 package been installed? If not, install it and try the NVIDIA driver selection again so that gcc-12 can be used to build the module. This is an ongoing issue here, on forums, and in bug reports. – ubfan1 Feb 01 '24 at 03:42
  • @Terrance dkms status returns: nvidia/535.154.05: added – YzSun Feb 01 '24 at 04:17
  • @ubfan1 thanks for your comment. Yes, gcc-12 is available on the system. – YzSun Feb 01 '24 at 04:18
  • Yep, it didn't install the driver into the Kernel. From that 6.7 Kernel try running sudo dkms build nvidia/535.154.05 and then run sudo dkms install nvidia/535.154.05 --force. Then rerun the dkms status. – Terrance Feb 01 '24 at 04:20
  • @Terrance hm, interesting. That throws an error saying that the kernel headers cannot be found. To solve that I tried the usual sudo apt-get install linux-headers-$(uname -r), but then I get E: Package 'linux-headers-6.7.2-060702-generic' has no installation candidate. – YzSun Feb 01 '24 at 05:43
  • Download the headers from https://kernel.ubuntu.com/mainline/v6.7.2/ and install them, then try again. If you haven't, install the -image and the -modules as well. – Terrance Feb 01 '24 at 05:49
  • @Terrance fantastic - it worked. If you want to put the steps in an answer I would like to upvote and accept it. – YzSun Feb 01 '24 at 07:35
  • Added an answer for you. – Terrance Feb 01 '24 at 14:20

1 Answer


The kernel version you currently have installed seems to be missing its headers. Since you are using mainline kernel version 6.7.2-060702, you should download both the headers and the modules from https://kernel.ubuntu.com/mainline/v6.7.2/

Note that mainline kernels are still in development and can be prone to errors and missing components. Any bugs and errors should be reported to Launchpad for tracking.

To install them, run sudo dpkg -i *.deb from the directory containing the downloaded files.

After they are installed I recommend a reboot.
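DKMS compiles modules against the kernel build tree that the headers package provides, normally reachable as /lib/modules/<release>/build. A minimal sketch for confirming that tree exists for the running kernel before retrying DKMS; headers_installed is a hypothetical helper name, and the path layout is the standard Ubuntu one, assumed here:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a kernel build tree (normally supplied by
# the linux-headers package) exists for the given kernel release.
# Assumes the standard /lib/modules/<release>/build layout; on Ubuntu this is
# usually a symlink into /usr/src, and [ -d ] follows symlinks.
headers_installed() {
  local kver="${1:-$(uname -r)}"          # kernel release; defaults to the running kernel
  local modules_root="${2:-/lib/modules}" # overridable for testing
  [ -d "$modules_root/$kver/build" ]
}

kver="$(uname -r)"
if headers_installed "$kver"; then
  echo "Headers found for $kver"
else
  echo "No headers for $kver; install the matching linux-headers .deb files" >&2
fi
```

If this reports no headers, the dkms build step will fail in the same way described in the comments.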

As dkms status shows, the driver installation only added the driver to DKMS; it did not build or install it into the kernel, so it still needs to be built and installed.

$ dkms status
nvidia/535.154.05: added
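In dkms status output, the word after the colon is the module's state: added means DKMS has registered the source but not compiled it, built means compiled but not yet installed, and installed means the module is in place for that kernel. A minimal sketch that extracts this state for one kernel, assuming the "module/version[, kernel, arch]: state" line format; dkms_state is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `dkms status` output on stdin and print the state
# (added, built, installed) reported for MODULE on KERNEL.
# Assumes lines look like "nvidia/535.154.05, <kernel>, <arch>: <state>" or,
# for a module not yet built for any kernel, "nvidia/535.154.05: added".
dkms_state() {
  local module="$1" kernel="$2" line head state fallback=""
  while IFS= read -r line; do
    head="${line%%: *}"    # e.g. "nvidia/535.154.05, 6.7.2-060702-generic, x86_64"
    state="${line#*: }"    # e.g. "installed" (possibly followed by notes)
    state="${state%% *}"   # keep only the first word
    case "$head" in
      "$module"/*", $kernel, "*) echo "$state"; return 0 ;;  # exact kernel match
      "$module"/*", "*) ;;                                    # other kernel: ignore
      "$module"/*) fallback="$state" ;;                       # no kernel field
    esac
  done
  # No kernel-specific line: report the kernel-independent state, if any.
  if [ -n "$fallback" ]; then echo "$fallback"; return 0; fi
  return 1
}
```

For example, dkms status | dkms_state nvidia "$(uname -r)" prints added in the situation above and should print installed once the build and install steps succeed.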

Now that the headers are installed, you can build and install it. Running the following from the current 6.7 kernel should complete the installation:

sudo dkms build nvidia/535.154.05
sudo dkms install nvidia/535.154.05 --force
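Without -k, dkms build and dkms install target the running kernel; DKMS also accepts -k <release> to name a kernel explicitly, which helps if you run these commands while booted into a different kernel. A minimal sketch wrapping both steps; rebuild_dkms_module and the DKMS_CMD override are hypothetical, the latter added only so the commands can be previewed with echo:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: build and install a DKMS module for one kernel.
# DKMS_CMD (an assumption of this sketch) defaults to "sudo dkms"; set it to
# "echo dkms" to print the commands instead of running them.
rebuild_dkms_module() {
  local module_ver="$1"               # e.g. nvidia/535.154.05
  local kver="${2:-$(uname -r)}"      # target kernel; defaults to the running one
  local dkms_cmd="${DKMS_CMD:-sudo dkms}"
  # $dkms_cmd is left unquoted on purpose so "sudo dkms" splits into two words.
  $dkms_cmd build "$module_ver" -k "$kver" || return 1
  $dkms_cmd install "$module_ver" -k "$kver" --force || return 1
}
```

For this question, that would be rebuild_dkms_module nvidia/535.154.05 6.7.2-060702-generic, followed by a reboot.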

Then reboot again so that the driver loads.
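After that reboot, one way to confirm the driver actually loaded is to look for it in the kernel's loaded-module list; a missing nvidia module is a common cause of the "couldn't communicate with the NVIDIA driver" error. A minimal sketch, assuming the standard /proc/modules format (the same data lsmod prints); nvidia_module_loaded is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a module named exactly "nvidia" is loaded.
# Assumes the /proc/modules format, where the module name is the first
# space-separated field of each line. The trailing space in the pattern keeps
# nvidia_drm or nvidia_modeset from counting as a match.
nvidia_module_loaded() {
  local modules_file="${1:-/proc/modules}"
  grep -qs '^nvidia ' "$modules_file"   # -s: stay quiet if the file is missing
}

if nvidia_module_loaded; then
  echo "nvidia kernel module is loaded"
else
  echo "nvidia kernel module is NOT loaded" >&2
fi
```

If the module shows as loaded, nvidia-smi should be able to reach the driver.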

Hope this helps!

Terrance
  • Thanks, this solved the original issue and actually a couple more that turned out to be related to the missing headers. I accepted the answer and upvoted (although my upvote does not register since I have fewer than 15 rep). – YzSun Feb 01 '24 at 16:29
  • @YzSun Feel free to edit questions here to earn Rep. That is actually how I started here. :) I also just gave you a +1. :) – Terrance Feb 01 '24 at 16:47
  • Thanks for the +1 and for the advice. It makes sense; I'll do it! – YzSun Feb 01 '24 at 17:01