
After a system update on my Ubuntu 22.04.3 LTS machine last week, my nvidia-515 driver with CUDA 11.7 broke, and reinstalling it fails. I tried to debug and inspected the logs, which point to a compiler version mismatch, but the versions are in fact the same: 11.4.0.

Is it the name that needs to change? Forcing cc and gcc to point to the x86_64 binary hasn't changed anything. I tried multiple iterations with a reboot after each attempt, to no avail. Sadly, "apt install nvidia-driver-515" is not an option because it installs 525 due to the package transition. I have a hard need for driver-515 with CUDA 11.7.

$ sudo ./cuda_11.7.0_515.43.04_linux.run
[Error - listed in nvidia-installer.log below]

$ vim /var/log/nvidia-installer.log
...
The kernel was built by: x86_64-linux-gnu-gcc-11 (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
You are using: cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

$ ll /usr/bin/gcc
lrwxrwxrwx 1 root root 32 Aug 20 09:53 /usr/bin/gcc -> /usr/bin/x86_64-linux-gnu-gcc-11*

$ ll /usr/bin/cc
lrwxrwxrwx 1 root root 20 Aug 20 09:16 /usr/bin/cc -> /etc/alternatives/cc*

$ ll /etc/alternatives/cc
lrwxrwxrwx 1 root root 12 Aug 20 09:16 /etc/alternatives/cc -> /usr/bin/gcc*

$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

$ x86_64-linux-gnu-gcc-11 --version
x86_64-linux-gnu-gcc-11 (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
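To make the comparison from the log concrete, here is a minimal sketch that pulls the program name and the trailing version token out of the two banner lines quoted above. The helper names banner_version and banner_name are hypothetical, and this only illustrates the observation that the versions match while the names differ; it is not how nvidia-installer performs its own check.

```shell
# Banner lines copied from /var/log/nvidia-installer.log above.
kernel_cc='x86_64-linux-gnu-gcc-11 (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0'
user_cc='cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0'

# Hypothetical helper: last whitespace-separated token, e.g. "11.4.0".
banner_version() {
    printf '%s\n' "$1" | awk '{print $NF}'
}

# Hypothetical helper: first whitespace-separated token, e.g. "cc".
banner_name() {
    printf '%s\n' "$1" | awk '{print $1}'
}

if [ "$(banner_version "$kernel_cc")" = "$(banner_version "$user_cc")" ]; then
    echo "version: same ($(banner_version "$user_cc"))"
else
    echo "version: differs"
fi

if [ "$(banner_name "$kernel_cc")" = "$(banner_name "$user_cc")" ]; then
    echo "name: same"
else
    echo "name: differs ($(banner_name "$kernel_cc") vs $(banner_name "$user_cc"))"
fi
```

On a live system the two strings could be taken from the installer log itself rather than pasted in by hand.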

  • Which kernel are you using? What video card? 5.15 is the only valid Ubuntu 22.04 kernel that takes gcc-11. The 6.5 kernel takes gcc-12 to build the Nvidia 535/545 module, but just having gcc-12 installed is enough, no need to mess with /bin/gcc links. Your video card may have max Nvidia drivers/CUDA versions, but CUDA installs I've made since 8 don't care about the Nvidia driver. My CUDA11.8 samples still run fine with the Nvidia 545 driver. – ubfan1 Feb 21 '24 at 01:56

1 Answer


My CUDA install also broke recently after a system update. I'm on 22.04 and had a hard need for nvidia-driver-535 and CUDA toolkit 12.1. Perhaps this will work for you with driver-515 and CUDA 11.7.

I was getting the same compiler mismatch errors you saw, so I manually updated the symlink in /usr/bin/gcc from gcc-11 to gcc-12.
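The same switch can probably be made through update-alternatives rather than by editing the symlink by hand. The following is a rough sketch only: it assumes gcc-11 and gcc-12 are both installed as /usr/bin/gcc-11 and /usr/bin/gcc-12, the priorities 110 and 120 are arbitrary illustrative values, and the run and register_gcc helpers are hypothetical. It defaults to a dry run that only prints the commands.

```shell
# Dry run by default: print commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: register /usr/bin/gcc-<ver> as an alternative for gcc.
# Assumes that binary exists; the priority value is illustrative.
register_gcc() {
    ver="$1"
    prio="$2"
    run sudo update-alternatives --install /usr/bin/gcc gcc "/usr/bin/gcc-$ver" "$prio"
}

register_gcc 11 110
register_gcc 12 120

# Select gcc-12 explicitly (use --config gcc instead for an interactive menu).
run sudo update-alternatives --set gcc /usr/bin/gcc-12
```

Running it with DRY_RUN=0 would execute the commands for real; switching back to gcc-11 is then a single --set call instead of another manual symlink edit.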

I removed and purged all nvidia drivers with

sudo apt remove --purge '^nvidia-.*'

and restarted to confirm there were no nvidia drivers in place.

Then I followed the instructions here:

How do I install NVIDIA and CUDA drivers into Ubuntu?

essentially:

  • apt install nvidia-driver-535
  • restart to confirm driver-535 was in place
  • ran the local runfile, ignoring the runfile installer's warning and de-selecting the driver:

sudo ./cuda_12.1.1_530.30.02_linux.run
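I de-selected the driver in the interactive menu. A non-interactive equivalent might look like the sketch below, which relies on the runfile's --silent and --toolkit options to install only the toolkit. I have not verified this exact invocation on this setup, and the install_toolkit_only wrapper is a hypothetical helper, not part of the CUDA installer.

```shell
# Hypothetical wrapper: install only the CUDA toolkit from a local runfile,
# leaving the apt-installed driver untouched.
install_toolkit_only() {
    runfile="$1"
    if [ ! -f "$runfile" ]; then
        echo "runfile not found: $runfile" >&2
        return 1
    fi
    # --silent is needed for non-interactive use; --toolkit skips the bundled driver.
    sudo sh "$runfile" --silent --toolkit
}

# Example call (commented out so the sketch has no side effects):
# install_toolkit_only ./cuda_12.1.1_530.30.02_linux.run
```

For the question's case the argument would be the cuda_11.7.0_515.43.04_linux.run file instead.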

Currently, nvidia-smi, nvcc --version, and all the cuda-samples are working as designed. I hope this helps you, too.
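A quick way to repeat that check is sketched below. The check_tool helper is hypothetical, and the sketch assumes both tools are on PATH; with a runfile install, nvcc may live under /usr/local/cuda/bin and need to be added to PATH first.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# Run each tool only if it is present.
check_tool nvidia-smi && nvidia-smi
check_tool nvcc && nvcc --version
```

If either line reports "missing", the corresponding install step did not take effect or the tool is not on PATH.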

smcnally
  • I'm not certain the compiler updates from gcc-11 to gcc-12 were required for this fix. I am certain I'd prefer to handle this through update-alternatives vs manually re-creating the symlink. – smcnally Feb 20 '24 at 22:35