This is a question on a topic that has, in different variations, been asked already. However, since none of the answers I found was applicable to my problem, I will first outline the problem and then, in case anyone else finds themselves in the same spot, outline the answers I tried. Perhaps they work for you. In any case, I would be grateful for any new information on this issue.
Ubuntu version: 16.04
Kernel: 4.15.0-133-generic
Since I wanted to use CUDA 11, I uninstalled my previous NVIDIA driver with
sudo apt --purge remove "*nvidia*"
and tried to remove everything from the previous CUDA versions via
sudo apt --purge remove "*cuda*" "*cublas*" "*cufft*" "*curand*" "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*"
followed by
sudo apt-get autoremove
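A quick way to check whether the purge really removed everything is to look for NVIDIA/CUDA packages that dpkg still lists as installed. This is a minimal sketch, not part of the steps I actually ran. It assumes the usual dpkg -l layout, where fully installed packages start with "ii", and the function name leftover_gpu_packages is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `dpkg -l` output on stdin and print the names of
# packages that are still fully installed ("ii" state) and look NVIDIA/CUDA-related.
# Packages in "rc" state (removed, config files left) are ignored.
leftover_gpu_packages() {
    awk '$1 == "ii" && tolower($2) ~ /nvidia|cuda|cublas|cufft|curand|cusolver|cusparse|npp|nvjpeg|nsight/ { print $2 }'
}

# Usage on a real system (prints nothing if the purge was complete):
#   dpkg -l | leftover_gpu_packages
```

The name patterns mirror the globs in the purge commands above, so anything printed is a package those commands were meant to catch.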
I then installed the graphics driver and CUDA from the command line as described on the NVIDIA page, as well as here. For the installation to succeed, this step had to be performed in the virtual terminal reached with Ctrl+Alt+F1, and the X server had to be stopped via
sudo service lightdm stop
(at least I think that is what this command does). After installing both the driver and the CUDA toolkit and rebooting the system, I successfully ran the deviceQuery program as well as a CUDA simulation I wrote. However, the graphical interface was stuck in a login loop (references to similar posts below).
Since none of the remedies listed below worked, I tried to install CUDA and the NVIDIA driver from the graphics-drivers PPA via
sudo add-apt-repository ppa:graphics-drivers/ppa
After installing the appropriate driver via
sudo apt-get install nvidia-460
and rebooting, I could access the graphical interface again. nvidia-smi shows a running NVIDIA driver:
$ nvidia-smi
Tue Feb 23 14:50:14 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P3000        Off  | 00000000:01:00.0  On |                  N/A |
| N/A   50C    P0    23W /  N/A |    405MiB /  6078MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1322      G   /usr/lib/xorg/Xorg                260MiB |
|    0   N/A  N/A      2502      G   compiz                             49MiB |
|    0   N/A  N/A     32082      G   ...gAAAAAAAAA --shared-files       91MiB |
+-----------------------------------------------------------------------------+
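Regarding the "CUDA Version: N/A" field: as far as I know, nvidia-smi reports there the CUDA version supported by the driver's user-space library (libcuda.so.1), not the installed toolkit, so N/A may mean that this library is missing or cannot be loaded. A minimal sketch for checking whether the dynamic loader cache knows about it, assuming the ldconfig -p listing reflects what programs will actually load; the function name has_libcuda is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldconfig -p` output on stdin and succeed (exit 0)
# only if the CUDA driver library libcuda.so.1 appears in the loader cache.
# Note: the toolkit's runtime library (libcudart) does NOT match this pattern.
has_libcuda() {
    grep -q 'libcuda\.so\.1'
}

# Usage on a real system:
#   if ldconfig -p | has_libcuda; then echo "libcuda.so.1 found"; else echo "libcuda.so.1 missing"; fi
```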
On the other hand, no method of installing CUDA (via the runfile without reinstalling the driver, via sudo apt install nvidia-cuda-toolkit, or via sudo apt install cuda-toolkit-11-2) leads to a working CUDA installation. Programs compile with nvcc without problems, but ./deviceQuery returns
$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
and other programs terminate as soon as they reach the CUDA parts. Note that the stated reason for the failure (insufficient driver version) is not correct, since the installed driver is 460.32.03, which is sufficient according to the NVIDIA manual. In addition, nvidia-smi does not seem to notice that CUDA is installed (it reports "CUDA Version: N/A"). Currently, with the driver installed from the PPA and CUDA installed from the runfile, I have
$ lspci -k | grep -EA3 'VGA|3D|Display'
00:02.0 VGA compatible controller: Intel Corporation Device 591b (rev 04)
Subsystem: Lenovo Device 224c
Kernel driver in use: i915
Kernel modules: i915
--
01:00.0 3D controller: NVIDIA Corporation GP104GLM [Quadro P3000 Mobile] (rev a1)
Subsystem: Lenovo Device 224c
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_460_drm, nvidia_460
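Whether the installed driver meets the minimum required by a given CUDA toolkit can be checked mechanically. A minimal sketch, assuming GNU sort -V for version ordering; driver_at_least and MIN_DRIVER are hypothetical names, and the minimum driver version for the exact toolkit release has to be looked up in the CUDA release notes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if driver version $1 is >= minimum version $2.
# Relies on GNU `sort -V`, which orders dotted version strings numerically
# (so 460.9 sorts before 460.10).
driver_at_least() {
    local installed=$1 minimum=$2
    # The smaller of the two versions sorts first; if that is the minimum,
    # the installed version is at least the minimum.
    [ "$(printf '%s\n%s\n' "$minimum" "$installed" | sort -V | head -n 1)" = "$minimum" ]
}

# Usage on a real system (MIN_DRIVER is a placeholder taken from the CUDA release notes):
#   installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_at_least "$installed" "$MIN_DRIVER" && echo "driver is new enough"
```

If this check passes but deviceQuery still reports error 35 (cudaErrorInsufficientDriver), the version number itself is probably not the problem; it points more towards which driver library the CUDA runtime actually loads.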
I would be very grateful for any ideas on how to either make the driver installed via the runfile work together with the X server or make the driver from the PPA work together with CUDA.
Thank you and best,
David
Now for some solutions I tried that did not work.
With the driver installed from the runfile:
- installing gdm instead of lightdm, as suggested here by WindowsEscapist
- making sure that ~/.Xauthority is owned by my user.
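For the .Xauthority item: a login loop is commonly attributed to ~/.Xauthority being owned by root (for example after starting a graphical program with sudo). A minimal sketch of the ownership check, assuming GNU coreutils stat; the function name owned_by_current_user is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if file $1 exists and is owned by the current user.
# Uses GNU coreutils `stat -c '%U'` to print the owner's user name.
owned_by_current_user() {
    local file=$1
    [ -e "$file" ] && [ "$(stat -c '%U' "$file")" = "$(id -un)" ]
}

# Usage on a real system:
#   owned_by_current_user "$HOME/.Xauthority" || echo "~/.Xauthority is not owned by $(id -un)"
# A commonly suggested remedy when the check fails (assumes a group named after the user):
#   sudo chown "$USER:$USER" ~/.Xauthority
```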
With the driver installed from ppa:graphics-drivers/ppa: