
I have downloaded and installed CUDA several times, and every time it fails the deviceQuery and bandwidthTest samples. TensorFlow also never lists the GPU among the accessible devices, only the CPU.

My current NVIDIA driver is 384.111, whereas the upgraded version 384.130 always produces a library mismatch in nvidia-smi and makes Ubuntu unbootable.

Every time I try to install CUDA 9.0 with the .run file, which is the only way to install it without upgrading the NVIDIA drivers, it finishes with an "incomplete install" message. The sample tests always fail, with the following output:

[screenshot of the failing test output]

Installing the CUDA 9.0 .deb with dpkg from the NVIDIA website (https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1710&target_type=deblocal) also upgrades the NVIDIA driver.

How can I install CUDA 9.0 on Ubuntu 17.10 with NVIDIA driver 384.111, without upgrading to 384.130, so that it passes the sample tests and allows tensorflow-gpu to access the graphics card?

PS: Whenever I say "it fails", the error message is always "UNKNOWN ERROR"

The graphics card in my system is an NVIDIA GeForce GTX 1080.
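For reference, one way to check what TensorFlow can actually see is sketched below. It assumes a TensorFlow build that ships `tensorflow.python.client.device_lib`; the helper names `gpu_device_names` and `list_tensorflow_gpus` are made up for this illustration.

```python
def gpu_device_names(devices):
    """Return the names of all devices whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_tensorflow_gpus():
    """Ask TensorFlow for its local devices and keep only the GPUs."""
    # Imported lazily so gpu_device_names() works without TensorFlow installed.
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())


if __name__ == "__main__":
    try:
        gpus = list_tensorflow_gpus()
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        print("GPUs visible to TensorFlow:", gpus if gpus else "none (CPU only)")
```

An empty list here matches the symptom described above: only the CPU is listed.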

hirschme
  • CUDA 9.2, which is the most recent, installs the 396.26 drivers automatically. You might need to test and see if those drivers will work for you. See: http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1710/x86_64/ for all the files. – Terrance May 31 '18 at 22:49
  • After installing CUDA 9.2 it installs 396.26, but nvidia-smi outputs Failed to initialize NVML: Driver/library version mismatch. Do I really need to reboot the computer? Because whenever I do and it fails, it's a nightmare to get back to the old drivers... – hirschme May 31 '18 at 22:56
  • Have you tried setting the nomodeset for the kernel line in grub? See https://askubuntu.com/a/747429/231142 as it is only needed for the installation of the NVIDIA drivers and looks like it can be removed after the installation is completed. – Terrance May 31 '18 at 23:00
  • I also just updated this answer for the installation of CUDA 9.2 in Ubuntu 17.10 https://askubuntu.com/a/1025949/231142 – Terrance May 31 '18 at 23:01
  • @Terrance I appreciate the help, but the only driver that does not destroy my system is currently 384.111. I installed CUDA 9.2 with NVIDIA 396.26 but had to spend the past 20 minutes fixing it from an error-spammed tty. The nomodeset solution is incomprehensible to me, and I can't boot Ubuntu from disc/USB. This has been a stable system for years until last night, when an automatic upgrade changed my NVIDIA driver – hirschme May 31 '18 at 23:34
  • That was meant for you to edit the line during boot up of your system since the OS is already installed. There is no need to boot to the USB as that is pointed at the first time install. If you can't get the newer drivers to work the new CUDA will not work. Unfortunately you can get stuck with installing the nvidia-cuda-toolkit that installs version 7.5 I believe but as far as I know that works with all NVIDIA drivers in that version of Ubuntu. – Terrance May 31 '18 at 23:53
  • Which nvidia graphics card is there ? (or which chipset on your notebook?) – dschinn1001 Jun 01 '18 at 00:09
  • @Terrance as long as it works, I do not mind using an older version. I'll try with CUDA 7.5 then? @dschinn1001 I updated the question to include that information. It's a GeForce GTX 1080 – hirschme Jun 01 '18 at 00:14
  • @Terrance versions of CUDA < 9.0 are only for ubuntu versions <17.10 (CUDA 8 --> ubuntu 16.04 ; CUDA 7.5 --> ubuntu 15, 14...) Can I even install / run these versions on 17.10? – hirschme Jun 01 '18 at 00:16
  • Cuda 7.5 gets installed with the nvidia-cuda-toolkit in 16.04. The toolkit is what they have tested with the stable release. Have you even tried to install it? Maybe it will install a newer version in 17.10. I wouldn't know since I won't run anything but a LTS version of Ubuntu. – Terrance Jun 01 '18 at 00:19
  • @Terrance installing cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb ALSO upgrades the nvidia drivers to 396.26 ... – hirschme Jun 01 '18 at 00:26
  • Ignore any 3rd party repositories and just run sudo apt install nvidia-cuda-toolkit. It will not install the NVIDIA driver with it. That package installs from the main Ubuntu repositories. When I ran it on my 16.04 installation it installed CUDA 7.5 and did not touch my NVIDIA video drivers at all. – Terrance Jun 01 '18 at 00:56
  • @Terrance yes that did not install anything else, but not even CUDA. It only installed the compilation toolkit right? There is no cuda in /usr/local/ and no way of verifying that CUDA works. How should I continue? – hirschme Jun 01 '18 at 01:07
  • nvcc -V is how you verify – Terrance Jun 01 '18 at 01:08
  • nvcc -V has always been working, showing Cuda compilation tools release 7.5, or 8.0, or 9.0 or 9.2, depending on what I installed. Sadly the CUDA sample tests always failed, and I was never able to run TensorFlow with the GPU – hirschme Jun 01 '18 at 01:11
  • I am out of ideas. You have not done the nomodeset for your kernel for installing the NVIDIA drivers. So, without doing that we cannot test if any new driver will work with your system. I don't run non-LTS releases of Ubuntu so I have never had to run a nomodeset on my kernel to install the driver. But, that could also be because I don't run UEFI either on my system. Maybe try this installation. https://ubuntuforums.org/showthread.php?t=1613132 – Terrance Jun 01 '18 at 01:27
  • @Terrance I added nomodeset to the grub settings (so it does not use the graphic card while booting I suppose? Not sure what this helps, if I am installing the drivers from the X terminal, as many tutorials say). I am purging the drivers and installing new ones, but nothing changes. Thank you for the time nevertheless – hirschme Jun 01 '18 at 02:05
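
On the "Driver/library version mismatch" message from the comments: as far as I know, it appears when the NVIDIA kernel module that is still loaded is older than the user-space libraries that were just installed, which is why a reboot (or a module reload) is normally needed after a driver change. The sketch below reads the loaded module's version from `/proc/driver/nvidia/version`. The regular expression assumes the line format used by 384-series drivers, and both function names are hypothetical.

```python
import re

PROC_VERSION_PATH = "/proc/driver/nvidia/version"


def loaded_driver_version(proc_text):
    """Extract the version of the loaded NVIDIA kernel module, or None."""
    # Assumed format: "NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.111  ..."
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def needs_reload(proc_text, installed_version):
    """True if the loaded module differs from the newly installed driver."""
    loaded = loaded_driver_version(proc_text)
    return loaded is not None and loaded != installed_version


if __name__ == "__main__":
    try:
        with open(PROC_VERSION_PATH) as handle:
            text = handle.read()
    except OSError:
        print("No NVIDIA kernel module appears to be loaded")
    else:
        print("Loaded kernel module version:", loaded_driver_version(text))
```

If the loaded version is still 384.111 right after installing 396.26, the mismatch reported by nvidia-smi is expected until the new module is loaded.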

1 Answer


I too have gone through similar struggles. After trying to install CUDA 9.0, 9.1, and 9.2, I found that each toolkit requires a specific NVIDIA driver version.
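
A rough sketch of that dependency is below. The minimum driver versions in the table are the ones I recall from NVIDIA's CUDA release notes for Linux, so treat them as assumptions and check the release notes for your toolkit. The names `MIN_DRIVER` and `driver_supports` are only for this example.

```python
# Minimum Linux driver per CUDA toolkit, as recalled from NVIDIA's release
# notes. These numbers are assumptions; verify them before relying on them.
MIN_DRIVER = {
    "8.0": "375.26",
    "9.0": "384.81",
    "9.2": "396.26",
}


def version_tuple(version):
    """Turn "384.111" into (384, 111) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(cuda_version, driver_version):
    """True if driver_version meets the assumed minimum for cuda_version."""
    required = MIN_DRIVER[cuda_version]
    return version_tuple(driver_version) >= version_tuple(required)


if __name__ == "__main__":
    for cuda in sorted(MIN_DRIVER):
        ok = driver_supports(cuda, "384.111")
        print("CUDA", cuda, "with driver 384.111:", "ok" if ok else "driver too old")
```

Under these assumed numbers, driver 384.111 is new enough for CUDA 9.0 but not for 9.2, which would explain why the 9.2 packages pull in 396.26.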

The official NVIDIA CUDA installation guide calls for you to uninstall your NVIDIA drivers. I think that's unavoidable for an install directly on the local machine, unless you use Docker + Nvidia Docker.

This lets your local machine keep the same NVIDIA drivers while you install each specific CUDA toolkit in a different container image.

This is the approach I went with.
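
A minimal sketch of what launching such a container could look like is below. It assumes the nvidia-docker2 style `--runtime=nvidia` flag and an image tag such as `nvidia/cuda:9.0-base`; both are assumptions that depend on your Docker setup and on which tags are currently published. The helper `nvidia_docker_command` is hypothetical and only builds the command line without running it.

```python
import shlex


def nvidia_docker_command(cuda_tag, command):
    """Build a `docker run` argv that runs `command` in a CUDA container."""
    return [
        "docker", "run", "--rm",
        "--runtime=nvidia",          # nvidia-docker2 style flag (assumption)
        "nvidia/cuda:" + cuda_tag,   # image name and tag are assumptions
    ] + list(command)


if __name__ == "__main__":
    argv = nvidia_docker_command("9.0-base", ["nvidia-smi"])
    # Print the command instead of running it, so this works without Docker.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Running nvidia-smi inside the container is a quick way to confirm that the host driver is visible before trying the CUDA samples or tensorflow-gpu there.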

Kinman