I'm trying to follow https://www.tensorflow.org/install/gpu#ubuntu_1804_cuda_101 to get Docker working with a GPU on Ubuntu 18.04.4 LTS.
I'll copy the instructions here for reference:
# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-430
# Reboot. Check that GPUs are visible using the command: nvidia-smi
# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
cuda-10-1 \
libcudnn7=7.6.4.38-1+cuda10.1 \
libcudnn7-dev=7.6.4.38-1+cuda10.1
# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
libnvinfer-dev=6.0.1-1+cuda10.1 \
libnvinfer-plugin6=6.0.1-1+cuda10.1
I made it halfway through the steps above and got an error:
$ sudo apt-get install --no-install-recommends nvidia-driver-430
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
nvidia-driver-430 : Depends: libnvidia-gl-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
Depends: nvidia-dkms-430 (= 430.50-0ubuntu0.18.04.2)
Depends: nvidia-kernel-source-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
Depends: libnvidia-decode-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
Depends: libnvidia-encode-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
Depends: nvidia-utils-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
Depends: xserver-xorg-video-nvidia-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
Depends: libnvidia-cfg1-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
Depends: libnvidia-ifr1-430 (= 430.50-0ubuntu0.18.04.2) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
I noticed that I already have an NVIDIA driver installed, but it is version 390, not 430. My apt list --installed output includes:
nvidia-384/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed,upgradable to: 418.87.01-0ubuntu1]
nvidia-384-dev/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed,upgradable to: 418.87.01-0ubuntu1]
nvidia-common/now 1:0.5.3~ppa3 amd64 [installed,local]
nvidia-compute-utils-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
nvidia-container-toolkit/bionic,now 1.0.5-1 amd64 [installed]
nvidia-dkms-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
nvidia-driver-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
nvidia-headless-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
nvidia-headless-no-dkms-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
nvidia-kernel-common-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
nvidia-kernel-source-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
nvidia-libopencl1-384/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed,upgradable to: 418.87.01-0ubuntu1]
nvidia-machine-learning-repo-ubuntu1804/unknown,now 1.0.0-1 amd64 [installed]
nvidia-opencl-icd-384/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed,upgradable to: 418.87.01-0ubuntu1]
nvidia-prime/now 0.8.9~ppa3 all [installed,local]
nvidia-settings/unknown,now 440.64.00-0ubuntu1 amd64 [installed]
nvidia-utils-390/bionic-updates,now 390.116-0ubuntu0.18.04.3 amd64 [installed]
Here is what does currently work:
- I'm able to run nvidia-smi. It says I have Driver Version: 390.116
- I have Docker version 19.03.8, build afacb8b7f0
- My apt list --installed includes nvidia-container-toolkit/bionic,now 1.0.5-1 amd64 [installed], which I installed following some of the instructions at https://github.com/NVIDIA/nvidia-docker
- My apt list --installed includes cuda-repo-ubuntu1804/unknown,now 10.2.89-1 amd64 [installed]
Here is what does not work:
$ sudo docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled
The error says I need cuda>=10.0, which is why I was trying to follow https://www.tensorflow.org/install/gpu#ubuntu_1804_cuda_101.
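As far as I can tell, the cuda>=10.0 requirement comes down to a driver version check. Here is a minimal sketch of that comparison, not the actual nvidia-container-cli logic. The function name version_at_least is my own, the 410.48 minimum is my assumption about what CUDA 10.0 needs on Linux (worth confirming against NVIDIA's release notes), and it relies on GNU sort's -V option:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when version $1 >= version $2.
# Uses GNU sort -V (version sort); the smaller version sorts first.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed minimum Linux driver for CUDA 10.0 images; verify before relying on it.
required="410.48"

# Driver version that nvidia-smi reports in this question. On a live system it
# could be read with:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
installed="390.116"

if version_at_least "$installed" "$required"; then
    echo "driver $installed satisfies cuda>=10.0"
else
    echo "driver $installed is older than $required"
fi
```

With my 390.116 driver the check takes the second branch, which would be consistent with the error above.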
What should I do to get sudo docker run --gpus all nvidia/cuda:10.0-base nvidia-smi working?
Edit: I noticed that https://github.com/NVIDIA/nvidia-docker/wiki/Frequently-Asked-Questions#how-do-i-install-the-nvidia-driver says to install the cuda-drivers package. I got this error when attempting to install it:
$ sudo apt install cuda-drivers
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
cuda-drivers : Depends: libnvidia-encode-440 (>= 440.64.00) but it is not going to be installed
Depends: libnvidia-fbc1-440 (>= 440.64.00) but it is not going to be installed
Depends: libnvidia-ifr1-440 (>= 440.64.00) but it is not going to be installed
Depends: nvidia-compute-utils-440 (>= 440.64.00) but it is not going to be installed
Depends: nvidia-dkms-440 (>= 440.64.00)
Depends: nvidia-driver-440 (>= 440.64.00) but it is not going to be installed
Depends: xserver-xorg-video-nvidia-440 (>= 440.64.00) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
Are my apt install errors (E: Unable to correct problems, you have held broken packages.) related to the sources I have in my /etc/apt config?
$ rg "cuda" /etc/apt/sources.list.d
/etc/apt/sources.list.d/cuda.list.save
1:deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /
/etc/apt/sources.list.d/cuda.list
1:deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /
$ rg "nvidia" /etc/apt/sources.list.d
/etc/apt/sources.list.d/cuda.list.save
1:deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /
/etc/apt/sources.list.d/nvidia-machine-learning.list
1:deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /
/etc/apt/sources.list.d/cuda.list
1:deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /
/etc/apt/sources.list.d/nvidia-docker.list.save
1:deb https://nvidia.github.io/libnvidia-container/ubuntu18.04/$(ARCH) /
2:deb https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/$(ARCH) /
3:deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
/etc/apt/sources.list.d/nvidia-docker.list
1:deb https://nvidia.github.io/libnvidia-container/ubuntu18.04/$(ARCH) /
2:deb https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/$(ARCH) /
3:deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /