nvidia-smi stops working after reboot on Ubuntu 18.04

Question

I have Ubuntu 18.04 installed on an ASUS laptop with a GeForce 940MX GPU. I tried both the proprietary drivers and the NVIDIA run file to install the CUDA drivers. Finally, I was able to install the NVIDIA and CUDA drivers using the CUDA run file. OpenGL and nvidia-xconfig were not installed during this installation. Also, Secure Boot was disabled prior to installation.

Now nvidia-smi works right after this installation, but whenever I reboot the system it gives the error: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."

It would be really helpful if experts could comment on how to overcome this issue. Thanks in advance.

Serge Rogatch · Answer 1 · 2021-09-29T08:49:25.630

I solved this problem by installing DKMS first and then installing the NVIDIA driver with DKMS support, so that the NVIDIA kernel module is rebuilt when a Linux kernel update takes effect on reboot. More specifically:

sudo apt-get install -y dkms libglvnd-dev
# Assume you want the driver from CUDA 11.4.2
wget https://developer.download.nvidia.com/compute/cuda/11.4.2/local_installers/cuda_11.4.2_470.57.02_linux.run
sudo sh cuda_11.4.2_470.57.02_linux.run --extract="$(pwd)/cuda_11.4.2"
cd cuda_11.4.2
sudo ./NVIDIA-Linux-x86_64-470.57.02.run --dkms

In the text GUI that the driver installer shows, you will need to select YES for DKMS again.
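
To confirm that the driver really was registered with DKMS for the running kernel, something like the following sketch may help. It assumes that dkms status prints lines such as "nvidia, 470.57.02, 5.4.0-84-generic, x86_64: installed" (older DKMS) or "nvidia/470.57.02, 5.4.0-84-generic, x86_64: installed" (newer DKMS); the exact format can differ between DKMS versions, and the function name nvidia_dkms_installed is only illustrative.

```shell
#!/bin/sh
# Reads `dkms status` output on stdin and succeeds if an nvidia module
# is reported as installed for the kernel release given as $1.
# Assumption: status lines start with "nvidia," or "nvidia/".
nvidia_dkms_installed() {
  kernel="$1"
  grep -E '^nvidia[,/]' | grep -F -- "$kernel" | grep -q 'installed'
}

# Only query DKMS if it is actually present on this machine.
if command -v dkms >/dev/null 2>&1; then
  if dkms status | nvidia_dkms_installed "$(uname -r)"; then
    echo "nvidia DKMS module is installed for $(uname -r)"
  else
    echo "no installed nvidia DKMS module found for $(uname -r)"
  fi
fi
```

If the second message appears after a kernel update, the module was probably not rebuilt for the new kernel, which would match the symptom from the question.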

There is a caveat, though: the NVIDIA kernel module sources must be compiled with the same gcc/g++ version that was used to compile the kernel of your OS. For Ubuntu 20.04 that is gcc-9. If your default gcc is a different version, running CC=gcc-9 CXX=g++-9 sudo ./NVIDIA-Linux-x86_64-470.57.02.run --dkms doesn't work, because the driver build seems to use a different environment for compiling the sources. So I just switched the gcc and g++ symlinks on my system from gcc-7/g++-7 to gcc-9/g++-9:

sudo apt-get install gcc-9 g++-9
cd /usr/bin
sudo unlink g++
sudo ln -s x86_64-linux-gnu-g++-9 g++
sudo unlink gcc
sudo ln -s x86_64-linux-gnu-gcc-9 gcc
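
To see whether this caveat applies to your machine, you can compare the compiler recorded in /proc/version with your default gcc. The sketch below is only a rough check: it assumes /proc/version mentions the compiler as "gcc version 9.3.0" or "gcc (Ubuntu 11.3.0-...)", and the helper name gcc_major_from_version_line is made up for this example.

```shell
#!/bin/sh
# Print the major gcc version mentioned in a /proc/version-style line on stdin.
# Takes the first number that follows the word "gcc" (uses grep -o, as in GNU grep).
gcc_major_from_version_line() {
  grep -oE 'gcc[^0-9]*[0-9]+' | head -n 1 | grep -oE '[0-9]+$'
}

# Compare the kernel's compiler with the default gcc, if both can be determined.
if [ -r /proc/version ] && command -v gcc >/dev/null 2>&1; then
  kernel_gcc=$(gcc_major_from_version_line < /proc/version) || kernel_gcc=unknown
  default_gcc=$(gcc -dumpversion | cut -d. -f1)
  echo "kernel built with gcc ${kernel_gcc}, default gcc is ${default_gcc}"
  if [ "$kernel_gcc" != "$default_gcc" ]; then
    echo "versions differ: the DKMS build of the NVIDIA module may fail"
  fi
fi
```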

score -1 · Answer 2 · answered Jun 16 '19 at 04:57

This is most probably an NVIDIA graphics compatibility problem, so you shouldn't use the .run installer. Use the driver from the Ubuntu graphics drivers PPA, then download the CUDA .deb and install cuda-toolkit.
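
A minimal sketch of the driver step, assuming that "Ubuntu graphics PPA" means ppa:graphics-drivers/ppa and that letting ubuntu-drivers pick the recommended driver is acceptable. The helper names run and install_driver_from_ppa are only illustrative, and the script prints the commands by default instead of executing them. The CUDA .deb step is left out because the exact file depends on the CUDA version you want.

```shell
#!/bin/sh
# Dry-run by default: prints each command instead of running it.
# Set DRY_RUN=0 to actually execute (needs sudo and network access).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Assumption: the PPA meant in the answer is ppa:graphics-drivers/ppa.
install_driver_from_ppa() {
  run sudo add-apt-repository -y ppa:graphics-drivers/ppa
  run sudo apt-get update
  # Installs the driver that ubuntu-drivers recommends for the detected GPU.
  run sudo ubuntu-drivers autoinstall
}

install_driver_from_ppa
```

Review the printed commands first, then rerun with DRY_RUN=0 and reboot before trying nvidia-smi.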

Mike Chen

This answer is suboptimal. I had to use the run installer to prevent the installation of "drm" (Direct Rendering Manager), which apparently allows the Intel onboard graphics to be used by the graphical user interface (gdm? lightdm? xorg?). The GPU memory should remain untouched by the GUI, since the NVIDIA GPU is used for deep learning and is needed at its full capacity. Therefore I had to install it like this: sudo ./NVIDIA-Linux-x86_64-430.40.run --no-opengl-files --no-drm – mcExchange Sep 06 '19 at 16:17
