I've tried different versions of the CUDA drivers to get the GPU to work with TensorFlow 2.13. It doesn't work with the latest NVIDIA driver, which reports CUDA version 12.2.
I was able to get CUDA 11.8 with Python 3.10 to recognize the GPU, but the toolkit reports CUDA 11.8 while nvidia-smi reports CUDA 12.2 for the driver, so the Python script gave an error.
Googling, I found that CUDA 11.8 goes with nvidia-driver 520. I had installed the default, which was 535. When I installed 520, it gave an error, so I uninstalled the driver and rebooted. Ubuntu didn't restart.
I'm only able to boot by choosing an older kernel. So I have two questions:
- Which nvidia-driver can I install with CUDA 11.8 that will work on Ubuntu 22.04?
- How can I recover my kernel? I think the latest kernel is 6.32-generic. I have previously recovered the kernel by uninstalling the NVIDIA drivers, but that didn't work this time. I suspect the error I got installing 520 has corrupted something else in the kernel.
Edit: Answer to question 2: I recovered the kernel by running
sudo ubuntu-drivers autoinstall
after uninstalling the previous drivers (even though that failed).
More info on question 1: nvidia-smi reports NVIDIA-SMI 535.104.05 and CUDA version 12.2, while nvcc --version reports release 11.8.
But this gives an error when running a Python script with TensorFlow 2.13:
Could not load library libcublasLt.so.12. Error: libcublasLt.so.12: cannot open shared object file: No such file or directory
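To see which cuBLASLt library the dynamic loader can actually find, something like the following might help. It is only a minimal sketch: the helper name can_load is made up for illustration, and the name libcublasLt.so.11 is an assumption about what a CUDA 11.x install provides (only libcublasLt.so.12 appears in the error above).

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library is not on the loader's search path.
        return False


if __name__ == "__main__":
    # libcublasLt.so.11 is an assumed name for the CUDA 11.x library;
    # libcublasLt.so.12 is the one named in the error message.
    for soname in ("libcublasLt.so.11", "libcublasLt.so.12"):
        print(soname, "loadable" if can_load(soname) else "not found")
```

If only the .so.11 name loads, that would suggest some component is asking for a CUDA 12 library while only CUDA 11 libraries are installed.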
So it seems CUDA 11.8 cannot run with the latest nvidia-driver 535, which nvidia-smi shows as CUDA 12.2. It seems to me that the nvidia-driver needs to be downgraded, but 520 will crash Ubuntu 22.04. Any idea what can work with TensorFlow 2.13?
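The two version numbers can be pulled out of the tool output and compared side by side. This is a rough sketch, not a definitive check: the sample strings are abbreviated stand-ins for the real nvidia-smi and nvcc output, the function names are hypothetical, and it assumes (as NVIDIA's compatibility documentation describes) that the CUDA version printed by nvidia-smi is the highest runtime version the driver supports. If that assumption holds, a toolkit at 11.8 under a driver at 12.2 would not by itself be a mismatch.

```python
import re


def driver_cuda_version(smi_text):
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi output."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def toolkit_cuda_version(nvcc_text):
    """Extract (major, minor) from the 'release X.Y' field of nvcc --version output."""
    m = re.search(r"release\s+(\d+)\.(\d+)", nvcc_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


# Abbreviated sample strings using the values reported in this post.
smi_sample = "NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2"
nvcc_sample = "Cuda compilation tools, release 11.8"

driver_max = driver_cuda_version(smi_sample)
toolkit = toolkit_cuda_version(nvcc_sample)

# True when the toolkit is not newer than what the driver reports.
print(driver_max, toolkit, toolkit <= driver_max)  # (12, 2) (11, 8) True
```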
Edit 2: "driver version 520.61.05 should be compatible with CUDA 11.8. Also according to this documentation driver version 525 is not compatible with CUDA 11.8. Package: cuda-runtime-11-8 Version: 11.8." - https://forums.developer.nvidia.com/t/ubuntu-cuda-11-8-package-wrong-dependency-on-cuda-drivers/238891
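The quote does not settle whether 520.61.05 is an exact requirement or a minimum. Assuming it is a minimum (an assumption on my part, not something the quote states), the installed driver can be compared against it with a simple tuple comparison. The function names and the older driver number below are hypothetical, used only for illustration.

```python
def version_tuple(version: str):
    """Turn a dotted version such as '520.61.05' into (520, 61, 5)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


# 535.104.05 is the driver reported by nvidia-smi in this post;
# 520.61.05 is the figure from the quoted forum thread.
print(meets_minimum("535.104.05", "520.61.05"))  # True

# A made-up older driver number, for contrast.
print(meets_minimum("515.43.04", "520.61.05"))   # False
```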
So it seems to me that TensorFlow 2.13 doesn't work with the GPU on Ubuntu 22.04: CUDA 12.2 doesn't work with TensorFlow, and CUDA 11.8 works with TensorFlow (and the GPU) but requires nvidia-520, which crashes Ubuntu 22.04.
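To narrow this down, it may help to ask TensorFlow itself which CUDA version it was built for and whether it sees a GPU. This is a minimal sketch: the function name tensorflow_cuda_report and the dictionary keys it returns are made up for illustration, while tf.sysconfig.get_build_info() and tf.config.list_physical_devices() are TensorFlow's own calls. On a CPU-only build the CUDA fields are expected to come back as None.

```python
def tensorflow_cuda_report():
    """Summarize the CUDA build info and visible GPUs of the installed TensorFlow.

    Returns None if TensorFlow is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        # CUDA/cuDNN versions the wheel was compiled against (None on CPU-only builds).
        "built_for_cuda": build.get("cuda_version"),
        "built_for_cudnn": build.get("cudnn_version"),
        # Names of the GPUs TensorFlow can see at runtime; empty if none.
        "visible_gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(tensorflow_cuda_report())
```

Comparing built_for_cuda with the nvcc release number shows whether the toolkit matches what the TensorFlow build expects.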
PyTorch works. It would be good if GPU acceleration could be fixed for TensorFlow too.