
I have upgraded from Nvidia driver 375.39 to the latest (beta) driver, nvidia-381 (381.09), because of an issue with the previous driver: I had a problem with window edges after waking from suspend - see here.

Since upgrading, I have had to reinstall Nvidia's Cuda Toolkit 8.0 (and CUDNN v5.1). However, a driver file, libcuda.so.1, seems to be missing. Both Tensorflow and the gputools package in R build upon the Cuda Toolkit, which in turn needs libcuda.so.1. Neither of them is able to locate the file, so I cannot install either one. With the previous driver I was able to install the Cuda Toolkit without issues.

Here is a similar issue, but with older drivers involved: https://github.com/tensorflow/tensorflow/issues/4078

I have read that I could possibly create this file as it is a symlink, however I would prefer not to, as I do not know what other dependencies exist. Example of possible workaround: https://stackoverflow.com/questions/41890549/tensorflow-cannot-open-libcuda-so-1
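
Before creating any symlink by hand, it may help to check whether the dynamic linker already knows about a libcuda.so.1 elsewhere on the system. The sketch below filters the output of `ldconfig -p`. The helper name `filter_ldcache` is made up for illustration, and it assumes the usual `name (arch) => path` layout of that listing.

```shell
# Hypothetical helper: read an `ldconfig -p` listing on stdin and print
# the path of every entry whose library name equals the first argument.
filter_ldcache() {
    awk -v name="$1" '$1 == name { print $NF }'
}

# Query the real linker cache only if ldconfig is available on this machine.
if command -v ldconfig >/dev/null 2>&1; then
    found=$(ldconfig -p | filter_ldcache libcuda.so.1)
    if [ -n "$found" ]; then
        echo "libcuda.so.1 is in the linker cache:"
        echo "$found"
    else
        echo "libcuda.so.1 is not in the linker cache"
    fi
fi
```

An empty result here only says that the linker cache has no such entry; it does not say which package should have provided the file.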

I am running Ubuntu 16.04. I have also posted this question on the Ubuntu Launchpad.

Questions:

  1. Can somebody see why this file is missing or propose a stable solution?
  2. If I am going to have to change my driver, which is the best way to downgrade, and how do I decide which version to downgrade to?

Extra info:

  1. If I search for the missing file on my system, I find the following similar files, but not the one I need:

    user@user $ tree / -fiC | grep libcuda.so
        /usr/local/cuda-8.0/doc/man/man7/libcuda.so.7
        /usr/local/cuda-8.0/lib64/stubs/libcuda.so
        /usr/share/man/man7/libcuda.so.7
    
  2. If I look at the uninstall manifest in the Cuda Toolkit's bin directory, which lists the files the given uninstallation script would remove, libcuda.so.1 is not in it, so the installer wasn't aware of that file at installation:

    user@user $ /usr/local/cuda-8.0/bin$ cat .uninstall_manifest_do_not_delete.txt | grep libcuda.so
    file:/usr/share/man/man7/libcuda.so.7:5708adf9bb3c591eb4f1d0d50e78f3df
    file:/usr/local/cuda-8.0/lib64/stubs/libcuda.so:8347cb2f5500934b1942ba42f3979fac
    file:/usr/local/cuda-8.0/doc/man/man7/libcuda.so.7:5708adf9bb3c591eb4f1d0d50e78f3df
    
  3. As there is a stub of the libcuda.so file (seen in the output above), I created the missing symlink, pointing it at that stub:

    user@user $ sudo ln -s /usr/local/cuda-8.0/lib64/stubs/libcuda.so /usr/local/cuda-8.0/lib64/libcuda.so.1
    

    This did allow the gputools package in R to install successfully, but the functions that call upon the GPU failed:

    R> gpuMatMult(matA, matB)
    Error in gpuMatMult(matA, matB) : device memory allocation failed
    Calls: gpuMatMult -> .Call
    
  4. Using the deviceQuery utility that is bundled with the samples of the Cuda Toolkit (you first have to build it with sudo make), I see there is definitely something wrong, which Cuda itself notices:

    user@user $ /usr/local/cuda/samples/1_Utilities/deviceQuery$ ./deviceQuery
    
        ./deviceQuery Starting...
    
        CUDA Device Query (Runtime API) version (CUDART static linking)
    
        cudaGetDeviceCount returned 35
        -> CUDA driver version is insufficient for CUDA runtime version
        Result = FAIL
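
The failures in items 3 and 4 are consistent with the symlink pointing at the toolkit's stub. The file under lib64/stubs is meant for linking at build time and does not contain a working driver, which would likely explain why installation succeeds while GPU calls fail. Below is a minimal sketch for spotting such a link, assuming GNU `readlink -f` is available. `is_cuda_stub` is a hypothetical helper name, and the path checked at the end is the symlink created in item 3.

```shell
# Hypothetical helper: succeed (exit status 0) if the given path resolves,
# after following symlinks, to a file inside a "stubs" directory.
is_cuda_stub() {
    stub_target=$(readlink -f -- "$1") || return 2
    case "$stub_target" in
        */stubs/*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Check the symlink created in item 3, if it exists on this machine.
lib=/usr/local/cuda-8.0/lib64/libcuda.so.1
if [ -e "$lib" ]; then
    if is_cuda_stub "$lib"; then
        echo "$lib resolves to a stub, not to a real driver library"
    else
        echo "$lib resolves to $(readlink -f -- "$lib")"
    fi
fi
```

If the link does resolve to the stub, removing it again (rather than leaving it in lib64) avoids hiding the real library should a driver package install one later.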
    

Current status:

I have downgraded to the previous driver I knew to work with the Cuda Toolkit 8.0, CUDNN 5.1 - nvidia-378.13. The tools that use Cuda and CUDNN are also now working fine as before, e.g. tensorflow, gputools (in R), etc.

Everything is working just as expected, including the bug showing pixelated window edges after waking from suspend.
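
To confirm which driver version is actually loaded after a downgrade like this, one option is to read /proc/driver/nvidia/version. The sketch below assumes that file contains a line of the form `NVRM version: ... Kernel Module  <version> ...`; `nvrm_version` is a hypothetical helper name.

```shell
# Hypothetical helper: read the contents of /proc/driver/nvidia/version on
# stdin and print only the version number from the "NVRM version" line.
nvrm_version() {
    sed -n 's/^NVRM version:.*Kernel Module[[:space:]]*\([0-9][0-9.]*\).*/\1/p'
}

# Report the loaded kernel module version, if the Nvidia module is loaded.
if [ -r /proc/driver/nvidia/version ]; then
    echo "Loaded Nvidia kernel module: $(nvrm_version < /proc/driver/nvidia/version)"
else
    echo "No Nvidia kernel module appears to be loaded"
fi
```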

n1k31t4
  • For anybody who finds this, the open source nvidia-381 driver doesn't include the libcuda.so.1 file that is needed by both tensorflow and theano. I spent hours trying to figure it out and sudo apt install nvidia-378 along with a reboot got libcuda.so.1 installed and still worked with my 1080 ti. – Chris Apr 24 '17 at 04:06
  • Thanks for confirming the problem, @Chris - your solution (workaround) is just the same as mine though. There has not yet been a response on the Ubuntu Launchpad problem tracker. – n1k31t4 Apr 25 '17 at 17:12
