
I've tried different versions of the CUDA drivers to get the GPU working with TensorFlow 2.13. It doesn't work with the latest NVIDIA driver, which reports CUDA version 12.2.

I was able to get CUDA 11.8 with Python 3.10 to recognize the GPU, but the installed toolkit is CUDA 11.8 while nvidia-smi reports CUDA 12.2 for the NVIDIA driver, so the Python script gave an error.

Googling, I found that CUDA 11.8 goes with nvidia-driver 520. I had installed the default, which was 535. When I installed 520 it gave an error, so I uninstalled the driver and rebooted. Ubuntu didn't boot afterwards.

I'm only able to boot by choosing an older kernel. So I have two questions:

  1. Which nvidia-driver can I install with CUDA 11.8 that will work on Ubuntu 22.04?
  2. How can I recover my kernel? I think the latest kernel is 6.32-generic. I have previously recovered the kernel by uninstalling the NVIDIA drivers, but that didn't work this time. I suspect the error I got while installing 520 has corrupted something else in the kernel.

Edit: Answer to question 2: I recovered the kernel by running

sudo ubuntu-drivers autoinstall

after uninstalling previous drivers (even though that failed)

More info on question 1: nvidia-smi reports NVIDIA-SMI 535.104.05 and CUDA Version 12.2, while nvcc --version reports release v11.8.

But this gives an error when running a Python script with TensorFlow 2.13:

Could not load library libcublasLt.so.12. Error: libcublasLt.so.12: cannot open shared object file: No such file or directory

So it seems CUDA 11.8 cannot run with the latest nvidia-driver 535, for which nvidia-smi reports CUDA 12.2. It seems to me that the driver needs to be downgraded, but 520 will crash Ubuntu 22.04. Any idea what can work with TensorFlow 2.13?
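For what it's worth, the two version numbers above come from different places: the "CUDA Version" shown by nvidia-smi is the highest CUDA runtime the driver supports, while nvcc --version reports the installed toolkit. Below is a minimal Python sketch that pulls both out of the tool output and compares them. The sample strings are illustrative stand-ins for the real output, and the function names are made up for this example.

```python
import re


def parse_versions(nvidia_smi_text: str, nvcc_text: str) -> dict:
    """Extract driver, driver-supported CUDA, and toolkit versions from tool output."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", nvidia_smi_text)
    driver_cuda = re.search(r"CUDA Version:\s*([\d.]+)", nvidia_smi_text)
    toolkit = re.search(r"release\s+([\d.]+)", nvcc_text)
    return {
        "driver": driver.group(1) if driver else None,
        "driver_cuda": driver_cuda.group(1) if driver_cuda else None,
        "toolkit": toolkit.group(1) if toolkit else None,
    }


def as_tuple(version: str) -> tuple:
    """Turn '11.8' into (11, 8) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_supports_toolkit(driver_cuda: str, toolkit: str) -> bool:
    """True when the toolkit version does not exceed what the driver reports."""
    return as_tuple(toolkit) <= as_tuple(driver_cuda)


# Illustrative samples shaped like the real tool output (assumption, not a capture).
SAMPLE_SMI = "| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2 |"
SAMPLE_NVCC = "Cuda compilation tools, release 11.8, V11.8.89"

versions = parse_versions(SAMPLE_SMI, SAMPLE_NVCC)
print(versions)
print(driver_supports_toolkit(versions["driver_cuda"], versions["toolkit"]))
```

With the values from the question the comparison prints True, which suggests that a 535 driver reporting CUDA 12.2 does not by itself rule out an 11.8 toolkit.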

Edit 2: "driver version 520.61.05 should be compatible with CUDA 11.8. Also according to this documentation driver version 525 is not compatible with CUDA 11.8. Package: cuda-runtime-11-8 Version: 11.8." - https://forums.developer.nvidia.com/t/ubuntu-cuda-11-8-package-wrong-dependency-on-cuda-drivers/238891

So it seems to me that TensorFlow 2.13 doesn't work with the GPU on Ubuntu 22.04: CUDA 12.2 doesn't work with TensorFlow, and CUDA 11.8 works with TensorFlow (and the GPU) but requires nvidia-520, which crashes Ubuntu 22.04.

PyTorch works. It would be good if GPU acceleration could be fixed for TensorFlow too.
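One way to see which CUDA and cuDNN versions the installed TensorFlow wheel was built against, and whether it sees a GPU at all, is to ask TensorFlow itself. This is a small sketch, assuming TensorFlow is importable in the active environment. `tensorflow_cuda_report` is a hypothetical helper name, and it returns None when TensorFlow is not installed.

```python
def tensorflow_cuda_report():
    """Summarize TensorFlow's build-time CUDA/cuDNN versions and visible GPUs."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return None

    # Build info is a dict-like object; CPU-only builds may lack the CUDA keys.
    info = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "built_for_cuda": info.get("cuda_version"),
        "built_for_cudnn": info.get("cudnn_version"),
        "visible_gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(tensorflow_cuda_report())
```

If built_for_cuda turns out to be an 11.x version while the error asks for a .so.12 library, that would suggest something other than TensorFlow itself is requesting it.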

  • You mean kernel 6.2.0-31? Set your 22.04 system up with the latest Nvidia drivers from the standard repos (535.86.05), then reject any offer of video drivers from the CUDA install. CUDA and the video driver are pretty independent; the Nvidia hardware limits the driver and CUDA release. See answers 1077061, 1219761 for various methods for CUDA installs that don't mess up a running system. – ubfan1 Sep 02 '23 at 21:30
  • @ubfan1 see the updated question. Using the latest nvidia-driver with CUDA 11.8 and TensorFlow 2.13 gives an error in the Python script: "Could not load libcublasLt.so.12... No such file or directory" – Endre Moen Sep 03 '23 at 13:38
  • libcublasLt.so.11, not ...12, is the version supplied by CUDA 11.8; ...12 is in CUDA 12.1+. What is asking for ...12? Ubuntu 22.04 standard repos still have 535.86.05 as their latest release, so you must have added the graphics-drivers PPA to get ...104.05. Unless there is a need for a specific Nvidia release, I use the tested one from the standard repo (not the ...open one, which causes some script problems determining a version). My CUDA 11.3 demos run fine with Nvidia 535.86.05; I haven't run TensorFlow in a long time though. – ubfan1 Sep 03 '23 at 15:44
  • I don't understand... running nvcc --version gives 11.8. I ran "sudo apt-get remove --purge nvidia-" prior to sudo apt-get install cuda-11.8 – Endre Moen Sep 03 '23 at 16:48
  • @ubfan1 - Looking in /usr/local/cuda/lib64, there is no libcublasLt.so.12, only libcublasLt.so, libcublasLt.so.11, libcublasLt.so.11.11.3.6 and libcublasLt_static.a. So why is TensorFlow 2.13 giving the error "Could not load libcublasLt.so.12..."? – Endre Moen Sep 04 '23 at 20:02
  • Carefully check the version requirements on TensorFlow, cuDNN, and CUDA for compatibility. See https://stackoverflow.com/questions/50622525/which-tensorflow-and-cuda-version-combinations-are-compatible for some (mostly old) info. I don't use the Ubuntu CUDA packages since I cannot pick a specific version, I do not let any CUDA installer touch my working Nvidia driver, and I use the cuda/bin and cuda/lib instead of system areas for installer overrides. My oldest CUDA has always worked with my newest Nvidia driver. – ubfan1 Sep 05 '23 at 00:40
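
The directory listing mentioned in the comments can be cross-checked by asking the dynamic loader directly which cuBLASLt sonames it can open. This is a small Python sketch using ctypes from the standard library. `can_load` and `probe_cublaslt` are hypothetical helper names, and the result depends on LD_LIBRARY_PATH and the ldconfig cache of the shell it runs in.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library cannot be found or cannot be loaded.
        return False


def probe_cublaslt(majors=(11, 12)) -> dict:
    """Check which libcublasLt major versions are loadable in this environment."""
    return {f"libcublasLt.so.{m}": can_load(f"libcublasLt.so.{m}") for m in majors}


if __name__ == "__main__":
    for name, ok in probe_cublaslt().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Running it in the same shell and virtual environment as the failing script should show whether only the .11 library is visible, which would match the listing of /usr/local/cuda/lib64.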
