
What I need

I need to use TensorFlow and train my networks on the GPU. I installed the GPU build of TensorFlow with Anaconda in a new environment, using conda create --name tf_gpu tensorflow-gpu. That should install CUDA correctly. However, although TensorFlow reports that it was built with CUDA, it cannot see the GPU:

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.test.is_built_with_cuda())  # True
print(tf.test.is_gpu_available())    # False
print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8659538338150116047
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 3957650727733291855
physical_device_desc: "device: XLA_CPU device"
]
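
One thing worth checking at this point: the conda package brings the CUDA libraries, but the NVIDIA kernel driver comes from the operating system, and TensorFlow cannot see a GPU without it. Below is a minimal sketch of such a check, assuming a Linux system that exposes /proc/modules. The function names loaded_gpu_modules and diagnose are hypothetical helpers written for this post, not part of any library.

```python
import shutil
from pathlib import Path


def loaded_gpu_modules(proc_modules_text):
    """Return the GPU-related kernel modules named in /proc/modules content."""
    wanted = {"nvidia", "nvidia_drm", "nvidia_modeset", "nvidia_uvm", "nouveau"}
    # The first whitespace-separated field of each line is the module name.
    names = {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}
    return sorted(names & wanted)


def diagnose(modules):
    """Turn the list of loaded GPU modules into a one-line hint."""
    if "nvidia" in modules:
        return "NVIDIA kernel driver is loaded"
    if "nouveau" in modules:
        return "nouveau is loaded; CUDA needs the proprietary NVIDIA driver"
    return "no GPU kernel driver is loaded"


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    text = proc_modules.read_text() if proc_modules.exists() else ""
    print(diagnose(loaded_gpu_modules(text)))
    # nvidia-smi ships with the proprietary driver packages.
    print("nvidia-smi on PATH:", shutil.which("nvidia-smi") is not None)
```

If this reports nouveau or no driver at all, the missing GPU in TensorFlow is a driver problem (or something below it), not a conda problem.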

Another symptom is that under "display" the screen appears as "unknown device" and my maximum resolution is 1024x768.


What I do

The driver for my NVIDIA card is the generic X.Org one, so I try to install NVIDIA's proprietary driver. I open the driver selection dialog (screenshot not available).

So I select the button for using nvidia-driver-430.

After rebooting, however, I run into the problem described here: clean, n/n files, n/n blocks

I fix this by running sudo apt-get purge nvidia* in recovery mode. However, that leaves me without the proprietary drivers again. Running sudo ubuntu-drivers autoinstall brings the problem back.

How can I install them without a problem?


Hardware and OS

This is a new work PC on which I will train TensorFlow networks on the GPU. I am using Ubuntu 18.04.3 LTS with an MSI GeForce RTX 2080 Ti GAMING X TRIO (NVIDIA).


Debugging

If I try ubuntu-drivers devices I get:

== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001E07sv00001462sd00003715bc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-430 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin

However, according to this, the output should also include the model (GeForce RTX 2080 Ti), but it doesn't.
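
Even without a model line, the modalias string still identifies the card. Here is a small sketch that splits it into its PCI IDs, assuming the standard Linux PCI modalias layout (vendor, device, subsystem vendor, subsystem device, class fields). The helper name parse_pci_modalias is hypothetical.

```python
import re

# Layout assumed: pci:v<8 hex>d<8 hex>sv<8 hex>sd<8 hex>bc<2 hex>sc<2 hex>i<2 hex>
MODALIAS_RE = re.compile(
    r"pci:v(?P<vendor>[0-9A-F]{8})d(?P<device>[0-9A-F]{8})"
    r"sv(?P<subvendor>[0-9A-F]{8})sd(?P<subdevice>[0-9A-F]{8})"
    r"bc(?P<base_class>[0-9A-F]{2})sc(?P<sub_class>[0-9A-F]{2})"
    r"i(?P<interface>[0-9A-F]{2})"
)


def parse_pci_modalias(modalias):
    """Return the numeric fields of a PCI modalias string as a dict of ints."""
    match = MODALIAS_RE.fullmatch(modalias.strip())
    if match is None:
        raise ValueError(f"not a PCI modalias: {modalias!r}")
    return {name: int(value, 16) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    ids = parse_pci_modalias("pci:v000010DEd00001E07sv00001462sd00003715bc03sc00i00")
    for name, value in ids.items():
        print(f"{name:10s} 0x{value:04X}")
```

For the string above, vendor 0x10DE is NVIDIA, subsystem vendor 0x1462 is MSI, and base class 0x03 is a display controller. The device ID 0x1E07 can be looked up in the pci.ids database, for example via lspci -nn.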


I installed Windows to see whether the problem was specific to Ubuntu. However, the card does not work properly in Windows either. I have the following in "Device Manager" (screenshot not available).

I followed the official Windows support instructions: if I try to update the driver, it says the correct driver is already installed. If I uninstall it and then "scan for hardware changes", the OS crashes and reboots.


I found an NVIDIA NVS 315 in my office, so I plugged it in to see if it worked. I booted into Windows and it now works like a charm.

  • Disable Secure Boot. – Pilot6 Sep 05 '19 at 12:05
  • How do I do that? Isn't that supposed to be a Windows thing? I guessed it was in the BIOS. I changed the option Windows 8/10 Features to Other OS. However, it didn't work. Stuck again as usual. I have a GIGABYTE Z390 AORUS PRO WIFI motherboard. – Agustin Barrachina Sep 05 '19 at 12:22
  • Yes, it is in BIOS, boot section. – Pilot6 Sep 05 '19 at 12:23
  • Ok, so I just found that as my "CSM Support" was enabled, that means automatically that Secure Boot is disabled. Any other ideas? – Agustin Barrachina Sep 05 '19 at 12:40
  • But then did you install in UEFI boot mode? Should work whether UEFI or BIOS/CSM/Legacy, but most work now is done on UEFI systems. Did you totally purge old nVidia driver before installing a new one? Otherwise conflicts. Did you add ppa to get newest nVidia driver? https://askubuntu.com/questions/813676/installing-ubuntu-mate-with-dual-boot-option-on-windows-10-usb-booting-not-hap/814413#814413 – oldfred Sep 05 '19 at 14:42
  • I ran the command sudo apt-get purge nvidia* to be able to boot again, so I believe I have. I believe my boot mode is UEFI. I didn't add the ppa because I did it from the GUI. Do you think I should try to install them via the command line? – Agustin Barrachina Sep 09 '19 at 09:14
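
The two open questions in the comments above (UEFI boot mode and Secure Boot state) can be checked from a running Ubuntu session rather than guessed. This is a rough sketch, assuming the mokutil tool is installed and that its --sb-state output contains "SecureBoot enabled" or "SecureBoot disabled". The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def booted_in_uefi(efi_dir="/sys/firmware/efi"):
    """Linux exposes this directory only when the kernel was booted via UEFI."""
    return Path(efi_dir).is_dir()


def parse_sb_state(mokutil_output):
    """Map `mokutil --sb-state` output to True/False, or None if unrecognised."""
    text = mokutil_output.lower()
    if "secureboot enabled" in text:
        return True
    if "secureboot disabled" in text:
        return False
    return None


def secure_boot_enabled():
    """Run mokutil if it is installed; return None when the state is unknown."""
    if shutil.which("mokutil") is None:
        return None
    result = subprocess.run(
        ["mokutil", "--sb-state"], capture_output=True, text=True, check=False
    )
    return parse_sb_state(result.stdout + result.stderr)


if __name__ == "__main__":
    print("UEFI boot:", booted_in_uefi())
    print("Secure Boot enabled:", secure_boot_enabled())
```

A result of None for Secure Boot means the state could not be determined, which is also what to expect on a legacy/CSM boot.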

1 Answer


The problem had to be either a driver/OS issue or a hardware fault.

To tell them apart, I tried the card on Windows. As it didn't work on Windows either, I concluded it was a hardware problem.

The fault could then be in either the motherboard or the GPU, so I tried another NVIDIA GPU, and it worked. The most likely conclusion is that my GPU (GeForce RTX 2080 Ti) was faulty.