I'm running Ubuntu 16.04 with a GTX 1070. I use this machine for TensorFlow, with GPU support enabled. I rebooted my system the other day, and now I can't log in. I can get to the login screen and enter my password, but then I am sent straight back to the login screen. I can, however, get to a command line through Ctrl+Alt+F1.
When I try to install any driver from source (I don't think the driver version matters because I've tried several different ones), I get an error:
ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log for details
followed by:
The NVIDIA kernel module was not created
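Since the error points at /var/log/nvidia-installer.log, here is a rough sketch of how that log could be narrowed down to the lines that mention an error or a compiler. The function name scan_build_log and the search pattern are my own assumptions for illustration, not something the installer provides.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a build log that mention an
# error or a compiler name, prefixed with their line numbers.
scan_build_log() {
    log=${1:-/var/log/nvidia-installer.log}
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # -n: show line numbers, -i: ignore case, -E: extended regex
    grep -n -i -E 'error|clang|gcc' "$log"
}

# Scan the log named in the error message, if it exists and is readable.
if [ -r /var/log/nvidia-installer.log ]; then
    scan_build_log /var/log/nvidia-installer.log || echo "no matching lines"
fi
```

If the matching lines mention clang, that would suggest the kernel module build is picking up the wrong compiler.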
I've tried uninstalling from source:
sudo ./NVIDIA-Linux-x86_64-367.57-no-compat32.run --uninstall
and then reinstalling from source, but I get the same error. I've also tried updating from source:
sudo ./NVIDIA-Linux-x86_64-367.57-no-compat32.run --update
but the same thing happens.
I've tried installing from PPA:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-367
This doesn't fully fail, but it again prints an error related to the kernel:
Error! Bad return status for module build on kernel: 4.4.0-53-generic
Here's what I get when I check for the driver after the PPA install:
$ nvidia-smi
modprobe: ERROR ../libkmod/libkmod-module.c:832 kmod_module_insert_module() could not find module by name='nvidia_367'
modprobe: ERROR could not insert 'nvidia_367': unknown symbol in module, or unknown parameter (see dmesg)
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Is this maybe a CUDA issue? How would I go about fixing it if it is?
Should I just reinstall the OS (a fresh install without losing data)?
UPDATE
I have an idea of what caused the issue, but I'm not sure how to fix it.
I changed my default compiler to be clang about a week ago, and I think the NVIDIA driver needs gcc or g++. I'm not sure how to change it back (a co-worker changed it). I tried this:
ln -s /usr/bin/gcc-4.9 ~/.local/bin/gcc
but that didn't help.
This bug talks about a config file pointer to clang, but doesn't exactly tell me how to point it back. How can I point the config file back to gcc?
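To check which compiler the names cc, gcc and clang actually resolve to, something like the sketch below might help. It only follows symlinks on the current PATH; the function name resolve_cmd is made up for illustration, and it assumes the compiler was switched through symlinks (for example via update-alternatives) and not through some other config file.

```shell
#!/bin/sh
# Hypothetical helper: show the real file behind a command name by
# looking it up on PATH and following all symlinks.
resolve_cmd() {
    path=$(command -v "$1") || {
        echo "$1: not found on PATH" >&2
        return 1
    }
    echo "$1 -> $(readlink -f "$path")"
}

# Check the usual compiler names; a missing one is reported, not fatal.
for name in cc gcc clang; do
    resolve_cmd "$name" || true
done
```

If cc turns out to resolve to clang, then on Ubuntu `update-alternatives --display cc` should show which alternative is selected, and `sudo update-alternatives --config cc` should let me pick gcc again, assuming the change was made through the alternatives system. Also, a symlink in ~/.local/bin only affects my own PATH, and sudo normally resets PATH, which may be why the ln -s attempt made no difference to the installer.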
sudo rm .Xauthority from the command line. – You'reAGitForNotUsingGit Dec 12 '16 at 16:05
~/.Xauthority however I think I might know what caused this, I changed my default compiler to clang I believe. Do you think this is a compiler issue? What should the default compiler be? Also, I updated the post with the error message I get when I run nvidia-smi – Kendall Weihe Dec 12 '16 at 16:12