I'm trying to follow https://www.tensorflow.org/install/gpu#ubuntu_1804_cuda_101 to get Docker working with a GPU on Ubuntu 18.04.4 LTS.

I'll copy the instructions here for reference:

# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-430
# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-10-1 \
    libcudnn7=7.6.4.38-1+cuda10.1  \
    libcudnn7-dev=7.6.4.38-1+cuda10.1


# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
    libnvinfer-dev=6.0.1-1+cuda10.1 \
    libnvinfer-plugin6=6.0.1-1+cuda10.1

I made it halfway through the steps above and got an error:

$ sudo apt-get install --no-install-recommends nvidia-driver-430
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-driver-430 : Depends: libnvidia-gl-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
                     Depends: nvidia-dkms-430 (= 430.50-0ubuntu0.18.04.2)
                     Depends: nvidia-kernel-source-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
                     Depends: libnvidia-decode-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
                     Depends: libnvidia-encode-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
                     Depends: nvidia-utils-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
                     Depends: xserver-xorg-video-nvidia-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
                     Depends: libnvidia-cfg1-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
                     Depends: libnvidia-ifr1-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

I noticed that I already have an NVIDIA driver installed, but it is not version 430. My apt list --installed output includes:

nvidia-384/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed,upgradable to: 418.87.01-0ubuntu1]
nvidia-384-dev/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed,upgradable to: 418.87.01-0ubuntu1]
nvidia-common/now 1:0.5.3~ppa3 amd64 [installed,local]
nvidia-compute-utils-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
nvidia-container-toolkit/bionic,now 1.0.5-1 amd64 [installed]
nvidia-dkms-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
nvidia-driver-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
nvidia-headless-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
nvidia-headless-no-dkms-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
nvidia-kernel-common-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
nvidia-kernel-source-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
nvidia-libopencl1-384/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed,upgradable to: 418.87.01-0ubuntu1]
nvidia-machine-learning-repo-ubuntu1804/unknown,now 1.0.0-1 amd64 [installed]
nvidia-opencl-icd-384/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed,upgradable to: 418.87.01-0ubuntu1]
nvidia-prime/now 0.8.9~ppa3 all [installed,local]
nvidia-settings/unknown,now 440.64.00-0ubuntu1 amd64 [installed]
nvidia-utils-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
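
The list above mixes several driver generations (packages named 384, packages at version 390.116, and nvidia-settings at 440.64.00), which may be part of why apt cannot resolve the 430 dependencies. A minimal sketch for summarizing the distinct versions, assuming input in the apt list --installed format shown above; nvidia_versions is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper: read `apt list --installed` style lines on stdin and
# print the distinct versions of packages whose name contains "nvidia".
nvidia_versions() {
    # Keep lines whose package name (the part before the first "/") contains
    # "nvidia", take the version column, and de-duplicate.
    grep '^[^/]*nvidia' | awk '{print $2}' | sort -u
}
```

On the host this could be run as apt list --installed 2>/dev/null | nvidia_versions. More than one line of output suggests leftover packages from another driver branch.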

Here is what currently works:

  • I'm able to run nvidia-smi. It reports Driver Version: 390.116
  • I have Docker version 19.03.8, build afacb8b7f0
  • My apt list --installed includes nvidia-container-toolkit/bionic,now 1.0.5-1 amd64 [installed], which I installed following some of the instructions at https://github.com/NVIDIA/nvidia-docker
  • My apt list --installed includes cuda-repo-ubuntu1804/unknown,now 10.2.89-1 amd64 [installed]

Here is what does not work:

$ sudo docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled 

The error says I need cuda>=10.0, which is why I was trying to follow https://www.tensorflow.org/install/gpu#ubuntu_1804_cuda_101.
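
For context, the cuda>=10.0 condition is checked against the host driver (the message itself says to update the driver), not against a CUDA toolkit installed on the host. A minimal sketch of that comparison, assuming the minimum Linux driver versions I recall from NVIDIA's compatibility table (410.48 for CUDA 10.0, 418.39 for CUDA 10.1; worth double-checking there); driver_at_least is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if driver version $1 is at least version $2.
# Relies on GNU sort's version ordering (sort -V).
driver_at_least() {
    # If the smaller of the two versions is the required one, the installed
    # driver is new enough (equal versions also pass).
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Assumed minimum driver for CUDA 10.0 on Linux; verify against NVIDIA's docs.
MIN_DRIVER_CUDA_10_0="410.48"
```

With my current driver, driver_at_least 390.116 "$MIN_DRIVER_CUDA_10_0" fails, which matches the error. On a live host the first argument could come from nvidia-smi --query-gpu=driver_version --format=csv,noheader.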

What should I do to get sudo docker run --gpus all nvidia/cuda:10.0-base nvidia-smi working?


Edit: I noticed that https://github.com/NVIDIA/nvidia-docker/wiki/Frequently-Asked-Questions#how-do-i-install-the-nvidia-driver says to install the cuda-drivers package. I got this error when attempting to install it:

$ sudo apt install cuda-drivers
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 cuda-drivers : Depends: libnvidia-encode-440 (>= 440.64.00) but it is not going to be installed
                Depends: libnvidia-fbc1-440 (>= 440.64.00) but it is not going to be installed
                Depends: libnvidia-ifr1-440 (>= 440.64.00) but it is not going to be installed
                Depends: nvidia-compute-utils-440 (>= 440.64.00) but it is not going to be installed
                Depends: nvidia-dkms-440 (>= 440.64.00)
                Depends: nvidia-driver-440 (>= 440.64.00) but it is not going to be installed
                Depends: xserver-xorg-video-nvidia-440 (>= 440.64.00) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

Are my apt install errors (E: Unable to correct problems, you have held broken packages.) related to the sources I have in my /etc/apt config?

$ rg "cuda" /etc/apt/sources.list.d
/etc/apt/sources.list.d/cuda.list.save
1:deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /

/etc/apt/sources.list.d/cuda.list
1:deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /

$ rg "nvidia" /etc/apt/sources.list.d
/etc/apt/sources.list.d/cuda.list.save
1:deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /

/etc/apt/sources.list.d/nvidia-machine-learning.list
1:deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /

/etc/apt/sources.list.d/cuda.list
1:deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /

/etc/apt/sources.list.d/nvidia-docker.list.save
1:deb https://nvidia.github.io/libnvidia-container/ubuntu18.04/$(ARCH) /
2:deb https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/$(ARCH) /
3:deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /

/etc/apt/sources.list.d/nvidia-docker.list
1:deb https://nvidia.github.io/libnvidia-container/ubuntu18.04/$(ARCH) /
2:deb https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/$(ARCH) /
3:deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
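
One note on the listing above: as far as I know, apt only reads files in /etc/apt/sources.list.d whose names end in .list (or .sources), so the .save copies are backups rather than duplicate sources. A minimal sketch that prints only the entries apt would actually use from .list files; active_sources is a hypothetical helper name:

```shell
# Hypothetical helper: print "file:line" for every deb entry in the *.list
# files of a sources directory, ignoring backups such as *.list.save.
active_sources() {
    src_dir="${1:-/etc/apt/sources.list.d}"
    for src_file in "$src_dir"/*.list; do
        # Skip the unexpanded glob when the directory has no .list files.
        [ -e "$src_file" ] || continue
        grep -H '^deb ' "$src_file"
    done
}
```

Running active_sources with no argument inspects /etc/apt/sources.list.d; a different directory can be passed as the first argument.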
Adrian
  • I'm having a look at https://askubuntu.com/questions/140246/how-do-i-resolve-unmet-dependencies-after-adding-a-ppa – Adrian Apr 23 '20 at 01:24

1 Answer

I was able to get sudo docker run --gpus all nvidia/cuda:10.0-base nvidia-smi working.

I had to first run

$ sudo apt-get install libnvidia-compute-430
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be REMOVED:
  libnvidia-compute-390 libnvidia-decode-390 libnvidia-encode-390 nvidia-384 nvidia-384-dev nvidia-compute-utils-390 nvidia-driver-390 nvidia-headless-390
  nvidia-headless-no-dkms-390 nvidia-libopencl1-384 nvidia-opencl-icd-384 nvidia-utils-390
The following NEW packages will be installed:
  libnvidia-compute-430
0 upgraded, 1 newly installed, 12 to remove and 15 not upgraded.
Need to get 20.2 MB of archives.
After this operation, 13.0 MB of additional disk space will be used.

After that, I was able to run sudo apt-get install nvidia-driver-430 (the https://www.tensorflow.org/install/gpu#ubuntu_1804_cuda_101 step on which I was originally blocked).

My nvidia-smi now says NVIDIA-SMI 430.50 Driver Version: 430.50 CUDA Version: 10.1.
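
Since installing libnvidia-compute-430 removed twelve packages, it may be worth previewing the removals first. A minimal sketch, assuming the Inst/Conf/Remv line format that apt-get prints in simulation mode; parse_removals and preview_removals are hypothetical helper names:

```shell
# Hypothetical helper: read `apt-get --simulate` output on stdin and print
# the names of packages that would be removed (lines starting with "Remv").
parse_removals() {
    awk '$1 == "Remv" {print $2}'
}

# Hypothetical wrapper: simulate installing a package and list what would be
# removed. Nothing is changed on the system in simulation mode.
preview_removals() {
    apt-get install --simulate "$1" | parse_removals
}
```

For example, preview_removals libnvidia-compute-430 should list the old 384/390 packages before anything is actually uninstalled.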

Adrian