
I use a 2080 Ti. Here is what I did:

1: I installed a clean Xubuntu 20.04 from scratch.

2: I noticed the default driver was the X.Org driver, so I installed nvidia-driver-470 from

Software & Updates > Additional Drivers > Using NVIDIA driver metapackage from nvidia-driver-470 (proprietary, tested)

The installation succeeded, and I saw the nicely formatted table printed by the command nvidia -smi.

3: Then I realized I also needed CUDA 11.3, so I ran the following commands from the official CUDA website:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.3.0/local_installers/cuda-repo-ubuntu2004-11-3-local_11.3.0-465.19.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-3-local_11.3.0-465.19.01-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-3-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

From there, everything broke: after rebooting, the display resolution dropped to 800x600, and nvidia -smi no longer printed the table. The error message was a two-line warning, which I forgot to save.

4: I realized I had not purged the NVIDIA driver before installing CUDA, and the CUDA package includes its own NVIDIA driver. So I did the following:

sudo apt-get --purge -y remove '*nvidia*'
sudo apt-get --purge -y remove '*cuda*'
sudo apt-get update
sudo apt-get upgrade
sudo apt autoremove -y

After rebooting, the output of nvidia -smi was:

nvidia: command not found

5: It looked like the purge had succeeded, so I repeated step 3. It still did not work.

6: Then I repeated the purge from step 4 and tried installing CUDA from the repository (ppa) instead of the local dpkg package:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda

It did not work. Both nvidia -smi and nvcc -V reported command not found.

7: Then, while rebooting, I noticed that the boot menu entry had become Ubuntu Linux 5.14 -oem. It used to be generic, and now it is oem. I don't know exactly when this change happened. I suspected the kernel image might be damaged, so I booted the older 5.10 generic kernel and repeated steps 4 to 6 to install CUDA. Result: not working.

8: Then I purged again (step 4), upgraded from 20.04 to 20.10, and repeated the purge and install. It still did not work.
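
For reference, the state left behind by the steps above could be captured with a short check like the one below. This is only a sketch under some assumptions: `report` is a hypothetical helper name (not part of any NVIDIA tooling), and the `/usr/local/cuda*` path assumes the default location used by the CUDA deb packages, where nvcc is usually not on PATH until it is added manually. Note that it looks for nvidia-smi written as one word.

```shell
# report: prints "<name>: <path>" or "<name>: not found" for each command name
report() {
  for name in "$@"; do
    if path=$(command -v "$name" 2>/dev/null); then
      echo "$name: $path"
    else
      echo "$name: not found"
    fi
  done
}

# Are the driver tool and the CUDA compiler visible on PATH?
report nvidia-smi nvcc

# nvcc may be installed but not on PATH (assumed default install prefix)
ls /usr/local/cuda*/bin/nvcc 2>/dev/null || echo "no nvcc under /usr/local/cuda*"

# Kernel that is actually running (generic vs oem)
uname -r

# Driver/CUDA packages still known to dpkg, if dpkg is available
if command -v dpkg >/dev/null 2>&1; then
  dpkg -l | grep -Ei 'nvidia|cuda' || echo "no nvidia/cuda packages listed"
fi
```

Saving this output after each purge or install attempt would make it easier to see which step removed or replaced the driver.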

I have run out of ideas. So my questions are:

1: If I reinstall Xubuntu 20.04 and use step 3 for a clean CUDA installation, would that work? I guess yes.

2: If I reinstall the system, should I install CUDA via the ppa (step 6) or via dpkg (step 3)? Which is better, considering that the dpkg method pins the version to 11.3 without any worry about unwanted updates? I have heard that sudo apt-mark hold <package-name> can prevent a package from updating, but I have never tried it.

3: I really don't want to reinstall the system. How can I get CUDA and the driver running from the current state?

4: Thinking further: since CUDA is only an HPC tool that I need for development, should I install only the NVIDIA driver on my machine and run CUDA inside Docker? Can that work without CUDA installed on the host?

5: Also, how could the boot menu entry have become Linux 5.14 -oem? What did NVIDIA irreversibly do to the kernel?
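
To make questions 2 and 4 concrete, here is a rough dry-run sketch of the commands I have in mind. Several things in it are assumptions rather than tested facts on my machine: `run`, `pin_cuda` and `cuda_in_docker` are hypothetical helper names, `cuda-toolkit-11-3` is assumed to be the versioned toolkit package in the NVIDIA repository, the image tag `nvidia/cuda:11.3.1-base-ubuntu20.04` is an assumption that may no longer be published, and the Docker variant assumes the NVIDIA Container Toolkit is installed on the host. By default the script only prints the commands; setting RUN=1 would execute them.

```shell
# run: execute the command only when RUN=1, otherwise just print it (dry run)
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Question 2: install a versioned toolkit package, then hold it so that
# apt-get upgrade does not move it to a newer release
pin_cuda() {
  run sudo apt-get install -y cuda-toolkit-11-3   # assumed package name
  run sudo apt-mark hold cuda-toolkit-11-3
  run apt-mark showhold                           # lists held packages
}

# Question 4: keep only the driver on the host and run CUDA in a container
# (--gpus needs Docker 19.03+ and the NVIDIA Container Toolkit)
cuda_in_docker() {
  run docker run --rm --gpus all \
    "${CUDA_IMAGE:-nvidia/cuda:11.3.1-base-ubuntu20.04}" nvidia-smi
}

pin_cuda
cuda_in_docker
```

The dry-run default is deliberate: on a system that is already in a broken state, it seems safer to read the exact commands first and only then run them with RUN=1.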

Thanks a lot!

Westack
  • No one has answered my question? It looks like there is only one way: reinstall the entire system. Er~ – Westack Jan 06 '22 at 02:11
  • That's nvidia-smi (no space). See https://askubuntu.com/questions/1077061/how-do-i-install-nvidia-and-cuda-drivers-into-ubuntu/1077063#1077063 https://askubuntu.com/questions/1219761/cuda-10-2-different-installation-paths/1244010#1244010 and use Nvidia drivers from the standard repos, use the .run file and put cuda libs and exes where you want them in your own directory, not a system area. – ubfan1 Jan 06 '22 at 04:11
