I've been trying all day to get this GPU (a V100) working on a new Ubuntu VM. I tried installing the drivers and rebooting, and also purging/uninstalling everything related to NVIDIA, but none of this seems to work.

Specifically, I ran:

apt update;
apt install build-essential;

sudo add-apt-repository ppa:graphics-drivers
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices
sudo apt-get install nvidia-driver-460
sudo reboot now

Sometimes nvidia-smi seems to work (at the time of writing it wasn't working, so I couldn't copy what it prints when it does), but when it doesn't work it says this:

(synthesis) miranda9@miranda9:~$ nvidia-smi
Unable to determine the device handle for GPU 0000:00:06.0: Unknown Error
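
Because the failure is intermittent, one way to capture both the working and the failing output is to wrap nvidia-smi in a small script and record each result. The following is only a minimal sketch: the helper names `run_nvidia_smi` and `classify` are hypothetical, and the only error text it recognises is the message quoted above.

```python
import shutil
import subprocess


def run_nvidia_smi(timeout=30):
    """Run nvidia-smi; return (returncode, output), or (None, reason) if it could not run."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None, "nvidia-smi not found on PATH"
    try:
        proc = subprocess.run([exe], capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None, "nvidia-smi timed out"
    return proc.returncode, proc.stdout + proc.stderr


def classify(returncode, output):
    """Map a nvidia-smi result to a short label (labels are illustrative, not official)."""
    if returncode is None:
        return "not-run"
    if "Unable to determine the device handle" in output:
        return "device-handle-error"
    if returncode != 0:
        return "other-error"
    return "ok"


if __name__ == "__main__":
    code, text = run_nvidia_smi()
    print(classify(code, text))
    print(text)
```

Running this after each reboot and keeping the printed output would give a record of what nvidia-smi reports when it works, which is the piece missing from this question.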

Any help is appreciated.

Note that I do not have access to the VM's vmx file, so this question and its answers are of no use to me: https://forums.developer.nvidia.com/t/nvidia-smi-reports-unable-to-determine-the-device-handle-for-gpu/46835

In addition, I have tried to uninstall everything from NVIDIA and reinstall it with:

sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall

then

apt update;
apt install build-essential;

sudo add-apt-repository ppa:graphics-drivers
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices
sudo apt-get install nvidia-driver-460
sudo reboot now

but that doesn't seem to work.


More info in case it helps:

(synthesis) miranda9@miranda9:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal

also:

(synthesis) miranda9@miranda9:~$ python
Python 3.9.5 (default, Jun  4 2021, 12:28:51) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/home/miranda9/miniconda3/envs/synthesis/lib/python3.9/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 101: invalid device ordinal (Triggered internally at  /opt/conda/conda-bld/pytorch_1623448238472/work/c10/cuda/CUDAFunctions.cpp:115.)
  return torch._C._cuda_getDeviceCount() > 0
False
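
The interactive check above can also be collected into a small report, which is easier to paste into a question. This is a rough sketch only: the function name `cuda_report` and the dictionary keys are made up for illustration, and it assumes nothing beyond what PyTorch itself exposes (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`).

```python
def cuda_report():
    """Collect basic PyTorch/CUDA facts without raising if torch is missing."""
    report = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
    report["built_with_cuda"] = torch.version.cuda
    # False when the driver cannot be initialised, as in the transcript above.
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is set but `cuda_available` is False, the problem is more likely on the driver or device side than in the PyTorch install, although this sketch cannot confirm that by itself.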

As requested by comment:

# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 SCSI storage controller: XenSource, Inc. Xen Platform Device (rev 01)
00:05.0 System peripheral: XenSource, Inc. Citrix XenServer PCI Device for Windows Update (rev 01)
00:06.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)

Another VM:

$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 SCSI storage controller: XenSource, Inc. Xen Platform Device (rev 01)
00:05.0 System peripheral: XenSource, Inc. Citrix XenServer PCI Device for Windows Update (rev 01)
00:06.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
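
Both listings show the V100 as a PCI device inside the guest, at slot 00:06.0, which is the same address that appears (with the 0000 domain prefix) in the nvidia-smi error. As a rough sketch, lspci output like the above can be filtered for NVIDIA devices as follows; the function name `find_nvidia_devices` is hypothetical and the sample text is copied from the listing above.

```python
def find_nvidia_devices(lspci_text):
    """Return (slot, device_class, description) for each NVIDIA line of lspci output."""
    devices = []
    for line in lspci_text.splitlines():
        # Each line looks like "<slot> <class>: <description>".
        slot, _, rest = line.strip().partition(" ")
        device_class, sep, description = rest.partition(": ")
        if sep and "NVIDIA" in description:
            devices.append((slot, device_class, description))
    return devices


# Two lines taken from the lspci listing in this question.
SAMPLE = """\
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:06.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
"""

if __name__ == "__main__":
    for slot, device_class, description in find_nvidia_devices(SAMPLE):
        print(slot, device_class, description, sep=" | ")
```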

Resources I've searched for help:

Charlie Parker
  • In a VM the hardware is virtualized. You aren't using the real Nvidia GPU, the host OS is. – ChanganAuto Jul 19 '21 at 22:29
  • Take a look at Google results of nvidia virtual machine gpu passthru – ubfan1 Jul 19 '21 at 22:33
  • @ubfan1 just to make sure I look in the right place. I need to google passthru not passthrough? e.g. google nvidia virtual machine gpu passthru - right? – Charlie Parker Jul 19 '21 at 22:36
  • "passthru" came up as an early choice as I started typing, so I selected that. My GPU's too old for that to work for me, so I didn't check much further. – ubfan1 Jul 20 '21 at 00:38
  • Care to elaborate on the downvotes? – Charlie Parker Jul 20 '21 at 13:44
  • @NateT yes I am happy to. See update to question. However, my suspicion is that just removing everything from nvidia and then re-installing it with a reboot should work but my attempts to do that fail. – Charlie Parker Jul 28 '21 at 20:21
  • You need to use PCIe passthrough and 2 physical GPUs in your computer to make this work. You also need a second monitor connected to the second GPU. For the practical commands, try https://pve.proxmox.com/wiki/PCI(e)_Passthrough – Irsu85 Jul 31 '21 at 11:22
  • What image did you use for the VM, as in the full image name? The downvotes are probably because a VM doesn't have a GPU. I assume that you mean "how to get the VM to use the host GPU"? Btw it wasn't me, I only dv in extreme situations. I'm too poor. XD – Nate T Aug 01 '21 at 03:56

1 Answer

A virtual machine emulates a graphics card, so which native card the host system has should be transparent to the guest system. VMs are for "sharing" resources, as opposed to a real system that has direct access to its hardware. So it does not make sense to install Nvidia drivers on the guest system. You can verify this by checking the current drivers in your VM:

inxi -G

(executed in a terminal) will show you a VM/oracle driver, not your native card.

High-performance graphics output may be achievable with tweaks and tricks, but VMs are not meant for work like this.

kanehekili
  • Hi, thanks for the response, it was informative! I do not have access to the host system. I request a VM and I get a VM to use. I can use sudo in it, but I am inside the VM of course. Why do you think the way I am installing the drivers is not working? What exactly is going wrong in your opinion? – Charlie Parker Aug 03 '21 at 22:17
  • OK, so the VM is on a remote host. What does inxi -G say on your "remote VM"? If inxi is not installed, install it with sudo apt install inxi – kanehekili Aug 03 '21 at 22:31