
I've tried to install CUDA on three different VMs but have been unsuccessful in getting it to recognize my GPU.

I am using an Azure VM (Standard NV6) with an M60 GPU.

With a fresh VM I run the following commands taken from this guide:

wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1404-8-0-local-ga2_8.0.61-1_amd64-deb

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
sudo apt-get update
sudo apt-get install -y cuda

It appears to run successfully and doesn't report any problems. But when I run

nvidia-smi

I receive the following:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

I have also tried Ubuntu 16.04 LTS and various other GPU instances. Google tells me others are using these Azure GPU instances with TensorFlow, so it doesn't appear to be an issue with the graphics card.

Finally, I have reviewed what seems to be the canonical guide to installing CUDA on Ubuntu but it fails when running

sudo ./NVIDIA-Linux-x86_64-331.62.run 


The message in the log file:

ERROR: Unable to load the 'nvidia-drm' kernel module.

My Question

What is the most reliable method for installing CUDA 8 on Ubuntu 14.04 LTS?

Are there any special precautions that I need to take when running CUDA on a VM?

Edit: Additional Info

uname -a returns

Linux 2017-02-21-josh-gpu 4.4.0-64-generic #85~14.04.1-Ubuntu SMP Mon Feb 20 12:10:54 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

lsmod returns

Module                  Size  Used by
drm_kms_helper        151552  0
drm                   360448  1 drm_kms_helper
syscopyarea            16384  1 drm_kms_helper
sysfillrect            16384  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
fb_sys_fops            16384  1 drm_kms_helper
udf                    90112  0
crc_itu_t              16384  1 udf
dm_crypt               28672  0
joydev                 20480  0
hid_generic            16384  0
hid_hyperv             16384  0
hid                   118784  2 hid_hyperv,hid_generic
hyperv_keyboard        16384  0
hv_balloon             24576  0
input_leds             16384  0
serio_raw              16384  0
hv_netvsc              40960  0
hv_storvsc             20480  2
hv_utils               28672  2
scsi_transport_fc      65536  1 hv_storvsc
crct10dif_pclmul       16384  0
crc32_pclmul           16384  0
ghash_clmulni_intel    16384  0
hyperv_fb              20480  1
aesni_intel           167936  0
aes_x86_64             20480  1 aesni_intel
lrw                    16384  1 aesni_intel
gf128mul               16384  1 lrw
glue_helper            16384  1 aesni_intel
ablk_helper            16384  1 aesni_intel
cryptd                 20480  3 ghash_clmulni_intel,aesni_intel,ablk_helper
psmouse               126976  0
hv_vmbus               90112  7 hv_balloon,hyperv_keyboard,hv_netvsc,hid_hyperv,hv_utils,hyperv_fb,hv_storvsc
floppy                 73728  0
JoshVarty
  • 161

1 Answer


The official Azure documentation points out:

Currently, Linux GPU support is only available on Azure NC VMs running Ubuntu Server 16.04 LTS.

I'm not sure why they even let you create GPU instances with 14.04 installed, but hopefully this will help spread the word.

After creating a fresh 16.04 instance I did the following:

First, I had to uninstall/blacklist the Nouveau drivers that come pre-installed on Ubuntu 16.04. They're not compatible with the NVIDIA drivers we're trying to install and will cause errors later on if we don't remove them.

 sudo nano /etc/modprobe.d/blacklist.conf

At the bottom of the file add the following entries:

 blacklist amd76x_edac #this might not be required for x86 32 bit users.
 blacklist vga16fb
 blacklist nouveau
 blacklist rivafb
 blacklist nvidiafb
 blacklist rivatv
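
If you would rather not edit the file by hand, the same entries can be appended from a script. This is only a sketch: add_blacklist_entries is a helper name made up for illustration, and it assumes you have permission to write the target file (for the real /etc/modprobe.d/blacklist.conf that means running as root). The example call writes to a scratch file so nothing on the system is changed.

```shell
#!/usr/bin/env bash
# Sketch: append "blacklist <module>" lines to a modprobe config file,
# skipping any module that already has a blacklist entry there.
# add_blacklist_entries is a hypothetical helper, not a standard tool.
add_blacklist_entries() {
    local conf_file="$1"
    shift
    local module
    touch "$conf_file" || return 1
    for module in "$@"; do
        # Match an existing entry even if it carries a trailing comment.
        if ! grep -Eq "^[[:space:]]*blacklist[[:space:]]+${module}([[:space:]#]|\$)" "$conf_file"; then
            printf 'blacklist %s\n' "$module" >> "$conf_file"
        fi
    done
}

# Example against a scratch file; point it at /etc/modprobe.d/blacklist.conf
# (as root) to apply the entries from this answer.
scratch_conf="$(mktemp)"
add_blacklist_entries "$scratch_conf" amd76x_edac vga16fb nouveau rivafb nvidiafb rivatv
cat "$scratch_conf"
```

Running it twice against the same file should leave the file unchanged the second time, which makes it safe to reuse in a provisioning script.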

Reboot VM with sudo reboot
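
After the reboot it may be worth confirming that nouveau really is gone before running the installer. A minimal sketch, assuming a Linux system that exposes /proc/modules (the same data lsmod prints); module_loaded is a hypothetical helper name, not an existing command:

```shell
#!/usr/bin/env bash
# Sketch: report whether a kernel module appears in a /proc/modules-style
# listing (module name in the first column). module_loaded is a
# hypothetical helper name.
module_loaded() {
    local name="$1"
    local modules_file="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Only run the check where /proc/modules is available.
if [ -r /proc/modules ]; then
    if module_loaded nouveau; then
        echo "nouveau is still loaded"
    else
        echo "nouveau is not loaded"
    fi
fi
```

The comparison is on the exact module name, so a check for nvidia will not be satisfied by a module such as nvidiafb.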

I downloaded the drivers directly from Microsoft, but you can substitute with your preferred source:

wget -O NVIDIA-Linux-x86_64-384.73-grid.run "https://go.microsoft.com/fwlink/?linkid=849941"

chmod +x NVIDIA-Linux-x86_64-384.73-grid.run

sudo ./NVIDIA-Linux-x86_64-384.73-grid.run

I just clicked through the default selected options in the runfile.

Verify the driver installation by running nvidia-smi.
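
If you want this check in a setup script, something along these lines might do. It is a sketch rather than a definitive test: check_nvidia_driver is a made-up helper name, and it relies on nvidia-smi exiting with a non-zero status when it cannot reach the driver, as in the error from the question.

```shell
#!/usr/bin/env bash
# Sketch: fail with a readable message if the driver is not usable.
# check_nvidia_driver is a hypothetical helper name.
check_nvidia_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found on PATH; the driver is probably not installed" >&2
        return 1
    fi
    # nvidia-smi exits non-zero when it cannot talk to the kernel driver.
    if ! nvidia-smi --query-gpu=name,driver_version --format=csv,noheader; then
        echo "nvidia-smi could not communicate with the driver" >&2
        return 1
    fi
}

check_nvidia_driver || echo "driver check failed"
```

On a working install the query prints one line per GPU with its name and driver version.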

Install CUDA Toolkit 8

CUDA_REPO_PKG=cuda-repo-ubuntu1604_8.0.44-1_amd64.deb

wget -O /tmp/${CUDA_REPO_PKG} http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/${CUDA_REPO_PKG} 

sudo dpkg -i /tmp/${CUDA_REPO_PKG}

rm -f /tmp/${CUDA_REPO_PKG}

sudo apt-get update

sudo apt-get install cuda-drivers
JoshVarty
  • 161