Odyssey of Ubuntu 20.04 on a Dell XPS13 laptop / Razer Core X eGPU / NVIDIA GTX 1660 SUPER / external Dell display

Question

Obviously I am a noob Ubuntu user, so please forgive any terrible mistakes I might make or any required knowledge I might lack :D

The expected behavior

I have a Dell XPS13 laptop running a freshly installed Ubuntu 20.04 (Focal), on which I want to improve the video experience with an NVIDIA GTX 1660 SUPER in a Razer Core X eGPU, ideally using its output on an external display.

The actual behavior

I have never gotten anything but a black screen on the external display.

What I have tried

I enabled Thunderbolt support in the BIOS and set it to require no security, so the enclosure is recognized as soon as I plug it in. I installed the drivers listed by ubuntu-drivers devices, which is mainly version 440 of nvidia-driver:

ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:1c.4/0000:03:00.0/0000:04:01.0/0000:06:00.0/0000:07:01.0/0000:08:00.0 ==
modalias : pci:v000010DEd000021C4sv00001462sd0000C758bc03sc00i00
vendor   : NVIDIA Corporation
model    : TU116 [GeForce GTX 1660 SUPER]
manual_install: True
driver   : nvidia-driver-440 - distro non-free recommended
driver   : nvidia-driver-440-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
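ubuntu-drivers autoinstall picks the package marked "recommended" in a listing like the one above. As a small illustration, here is a sketch of a hypothetical helper that extracts that package name from the output, assuming the "driver : <package> - ..." line layout shown above.

```shell
# Print the package that `ubuntu-drivers devices` marks as "recommended".
# Reads the command's output on stdin. Assumes lines shaped like:
#   driver   : nvidia-driver-440 - distro non-free recommended
recommended_driver() {
  awk -F': ' '/^driver[ \t]*:/ && /recommended[ \t]*$/ {
    split($2, parts, " ")   # first word after ": " is the package name
    print parts[1]
  }'
}
```

Usage: ubuntu-drivers devices | recommended_driver (for the listing above this prints nvidia-driver-440).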

I ran sudo ubuntu-drivers autoinstall and rebooted. The login screen keeps coming back even though I am entering my password correctly.

If I unplug the eGPU I can pass the login screen.

If I reconnect it and run nvidia-smi I get this:

sudo nvidia-smi
[sudo] password for andrei: 
Tue Sep  8 17:55:42 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:3C:00.0 Off |                  N/A |
|  0%   40C    P0    12W / 130W |      0MiB /  5944MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Other notes:

I have no /etc/X11/xorg.conf.
The /usr/lib/modprobe.d/nvidia-graphics-drivers.conf file looks like this:

blacklist nouveau
blacklist lbm-nouveau
alias nouveau off
alias lbm-nouveau off
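The file above only blacklists nouveau, so it can be worth confirming that the module is really not loaded. A minimal sketch with a hypothetical function name, reading lsmod output on stdin:

```shell
# Succeed (exit 0) if a module named "nouveau" appears in `lsmod` output
# read from stdin, i.e. the blacklist above did not take effect.
nouveau_loaded() {
  awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

Usage: lsmod | nouveau_loaded && echo "nouveau is still loaded"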

I also ran nvidia-settings and got this:

ERROR: Unable to load info from any available system
(nvidia-settings:4382): GLib-GObject-CRITICAL **: 18:09:30.505: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
** Message: 18:09:30.507: PRIME: Requires offloading
** Message: 18:09:30.507: PRIME: is it supported? yes
** Message: 18:09:30.534: PRIME: Usage: /usr/bin/prime-select nvidia|intel|on-demand|query
** Message: 18:09:30.534: PRIME: on-demand mode: "1"
** Message: 18:09:30.534: PRIME: is "on-demand" mode supported? yes

There is no output on the external display.

I also tried the Ubuntu graphics drivers PPA (sudo apt-add-repository ppa:graphics-drivers/ppa) and repeated the process above with the newly recommended nvidia-driver-450. The results were similar: I could not log in or the login screen flickered, and sadly there was no output through the external video card.

I also tried egpu-switcher and nvidia-xconfig, which I guess mainly create or modify /etc/X11/xorg.conf, but there was never any output on my external display (the display itself works: the same cable gives a picture when connected directly to my laptop).

If anybody can suggest something that brings this suffering to a happy ending for me and for all the hardware and software involved, it will be highly appreciated :)

Thank you!

Update

Running nvidia-settings logs this:

ERROR: Unable to load info from any available system
(nvidia-settings:20812): GLib-GObject-CRITICAL **: 01:58:56.002: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
** Message: 01:58:56.005: PRIME: Requires offloading
** Message: 01:58:56.005: PRIME: is it supported? yes
** Message: 01:58:56.039: PRIME: Usage: /usr/bin/prime-select nvidia|intel|on-demand|query
** Message: 01:58:56.039: PRIME: on-demand mode: "1"
** Message: 01:58:56.039: PRIME: is "on-demand" mode supported? yes

Excellent question, well documented. I've used the nvidia settings app to create an xorg file, saving it at /etc/X11/xorg.conf.d/20-nvidia.conf. After that I could use my external monitors. — kanehekili, Sep 08 '20 at 21:05
Thanks @kanehekili. When I run nvidia-settings, a dialog with 3 options appears, and in the terminal I see some logs; I'll update my post — bluehipy, Sep 09 '20 at 00:04

bluehipy · Accepted Answer · 2020-09-15T10:33:12.970

After long battles I was finally able to solve my issue, based mostly on this comment: https://forums.developer.nvidia.com/t/nvidia-xconfig-doesnt-do-what-i-want-it-to-nor-does-nvidia-settings/107883/7

So I think it is vital to understand that an xorg.conf cannot help you in this context. No matter what I did, I got no results while an xorg.conf was present.

What worked for me was:

Remove all nvidia things you might have tried: sudo apt --purge remove 'nvidia-*'
Download the latest NVIDIA driver (the .run installer) from the NVIDIA website and make it executable (chmod +x).
Reboot into recovery mode (or otherwise without an X server running) and run the driver installer, even if it says that no GPU was found on your system.
Delete any /etc/X11/xorg.conf you may have.
Reboot normally.
Install nvidia-prime if it is not installed yet.
sudo prime-select nvidia
In /usr/share/X11/xorg.conf.d/10-amdgpu.conf, replace the driver with modesetting:

Section "OutputClass"
        Identifier "AMDgpu"
        MatchDriver "amdgpu"
        Driver "modesetting"
EndSection
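The edit above changes only the Driver line; MatchDriver "amdgpu" stays as it is. A sketch of doing this non-interactively, assuming GNU sed and the stock file layout shown above. The function name is hypothetical, and the path is a parameter so it can be tried on a copy first; run it as root for the real file.

```shell
# Change the AMDgpu OutputClass "Driver" line from "amdgpu" to "modesetting",
# keeping a .bak copy. MatchDriver "amdgpu" is left untouched because the
# pattern requires "Driver" to be the first word on the line.
use_modesetting_for_amdgpu() {
  local conf="${1:-/usr/share/X11/xorg.conf.d/10-amdgpu.conf}"
  sed -i.bak -E 's/^([[:space:]]*)Driver([[:space:]]+)"amdgpu"/\1Driver\2"modesetting"/' "$conf"
}
```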

Then update the NVIDIA OutputClass section (for example in /usr/share/X11/xorg.conf.d/10-nvidia.conf) to something like:

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    ModulePath "/usr/lib/x86_64-linux-gnu/nvidia/xorg"
    Option "PrimaryGPU" "Yes"
    Option "AllowExternalGpus" "True"
EndSection
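Writing the NVIDIA section above to disk can be scripted the same way. This sketch assumes the target /usr/share/X11/xorg.conf.d/10-nvidia.conf (the file name used in the second answer) and overwrites whatever is there; the function name is hypothetical.

```shell
# Write the NVIDIA OutputClass section to a file (overwrites it).
# Default path is an assumption; pass another path to try it out first.
write_nvidia_outputclass() {
  local target="${1:-/usr/share/X11/xorg.conf.d/10-nvidia.conf}"
  cat > "$target" <<'EOF'
Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    ModulePath "/usr/lib/x86_64-linux-gnu/nvidia/xorg"
    Option "PrimaryGPU" "Yes"
    Option "AllowExternalGpus" "True"
EndSection
EOF
}
```

Run it from a root shell (for example after sudo -i) for the real path.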

Create a file named optimus.desktop in both /etc/xdg/autostart/ and /usr/share/gdm/greeter/autostart/, containing:

[Desktop Entry]
Type=Application
Name=Optimus
Exec=sh -c "xrandr --setprovideroutputsource modesetting 0; xrandr --auto"
NoDisplay=true
X-GNOME-Autostart-Phase=DisplayServer
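Since the same entry has to exist in two places, a small sketch that writes the identical optimus.desktop into each directory may help. The function name is hypothetical; the directories are arguments (defaulting to the two above) so it can be tested against temporary folders, and it must run as root for the real paths.

```shell
# Write optimus.desktop into every autostart directory given as an argument.
# With no arguments, use the two locations from the answer.
write_optimus_autostart() {
  [ "$#" -gt 0 ] || set -- /etc/xdg/autostart /usr/share/gdm/greeter/autostart
  local dir
  for dir in "$@"; do
    mkdir -p "$dir"   # the greeter directory may not exist yet
    cat > "$dir/optimus.desktop" <<'EOF'
[Desktop Entry]
Type=Application
Name=Optimus
Exec=sh -c "xrandr --setprovideroutputsource modesetting 0; xrandr --auto"
NoDisplay=true
X-GNOME-Autostart-Phase=DisplayServer
EOF
  done
}
```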

(In that thread @generix uses modesetting NVIDIA-0, but that never worked for me; it does work with modesetting 0.)

reboot
Test that everything is good by running __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep vendor. If it doesn't return lines mentioning NVIDIA, something is wrong. In my case, I get:

server glx vendor string: NVIDIA Corporation
client glx vendor string: NVIDIA Corporation
OpenGL vendor string: NVIDIA Corporation

Another check: nvidia-smi should list at least some processes.
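The glxinfo test from the steps above can be turned into a pass/fail check. This sketch (hypothetical function name) reads glxinfo output and succeeds only if at least one "vendor string" line exists and every such line mentions NVIDIA.

```shell
# Succeed only if glxinfo output on stdin has at least one
# "vendor string:" line and all of them mention NVIDIA.
all_vendors_nvidia() {
  awk '/vendor string:/ { seen++; if ($0 !~ /NVIDIA/) bad++ }
       END { exit !(seen > 0 && bad == 0) }'
}
```

Usage: __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | all_vendors_nvidia && echo "rendering on NVIDIA"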

And I get a signal out of the NVIDIA GPU on an external display, as I wanted :)

Thanks ;)

Yup, xorg.conf is long dead. I managed to create that file with the help of the NVIDIA app, but your approach is better, since you understood what's going on. Thanks for sharing! Btw, the name of the conf file is not relevant, except for the 20 at the start and the .conf extension — kanehekili, Sep 14 '20 at 18:03

petteri · Answer 2 · 2021-10-19T18:50:50.997

I also found @bluehipy's solution very helpful for getting my Acer Predator Helios 300 running Ubuntu 20.04 to work with an external monitor, and for installing the NVIDIA/CUDA stack properly for deep learning work, which had been causing issues.

I only found this thread while considering returning the Acer Predator Helios 300 and checking whether a Dell XPS 13 with an eGPU could work for a "thin client" workflow: debugging machine learning / data science models locally and doing the actual training in the cloud.

So I might as well share my small tweaks to the original instructions, in case someone else is struggling to make their laptop work.

Prerequisites:

sudo apt install gcc make mesa-utils mpich

Install NVIDIA Driver

What worked for me was:

Remove all nvidia things you might have tried: sudo apt --purge remove 'nvidia-*'
The original instructions say to download the latest driver, but you may want the driver version bundled with the latest CUDA toolkit instead, so check which version that is before installing. The latest NVIDIA driver might also work, but most likely you will need to look through the older drivers for one that matches the CUDA toolkit's driver version; at the time of writing, the matching version was 470.57.02 (NVIDIA-Linux-x86_64-470.57.02.run).
Reboot into recovery mode (or otherwise without an X server running) and run the driver installer, even if it says that no GPU was found on your system (drop to a root shell, then e.g. cd /home/username/Downloads and ./NVIDIA-Linux-x86_64-470.74.run)
delete any /etc/X11/xorg.conf you may have
Reboot (press e on the Ubuntu entry in the GRUB menu and add nomodeset to the end of the line starting with linux)
Install nvidia-prime if it is not installed yet
sudo prime-select nvidia
In /usr/share/X11/xorg.conf.d/10-amdgpu.conf, replace the driver with modesetting:

Section "OutputClass" 
    Identifier "AMDgpu" 
    MatchDriver "amdgpu" 
    Driver "modesetting"
EndSection

Create the nvidia config file (sudo gedit /usr/share/X11/xorg.conf.d/10-nvidia.conf) with something like:

Section "OutputClass" 
    Identifier "nvidia" 
    MatchDriver "nvidia-drm" 
    Driver "nvidia" 
    Option "AllowEmptyInitialConfiguration" 
    ModulePath "/usr/lib/x86_64-linux-gnu/nvidia/xorg" 
    Option "PrimaryGPU" "Yes" 
    Option "AllowExternalGpus" "True"
EndSection

Create a file named optimus.desktop in both /etc/xdg/autostart/ and /usr/share/gdm/greeter/autostart/, containing:

[Desktop Entry]
Type=Application
Name=Optimus
Exec=sh -c "xrandr --setprovideroutputsource modesetting 0; xrandr --auto"
NoDisplay=true
X-GNOME-Autostart-Phase=DisplayServer

Modify the GRUB configuration so that nomodeset is always there: sudo gedit /etc/default/grub (add it to GRUB_CMDLINE_LINUX_DEFAULT, then run sudo update-grub)
reboot
Test that everything is good by running: __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep vendor
Check that nvidia-smi lists at least some processes:

| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   42C    P8    14W /  N/A |    264MiB /  5946MiB |      1%      Default |
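For the nomodeset step above, the edit to /etc/default/grub can be made idempotent so that running it twice does not add the option twice. A sketch, assuming GNU sed and a standard GRUB_CMDLINE_LINUX_DEFAULT="..." line; the function name is hypothetical. On Ubuntu the change only takes effect after sudo update-grub.

```shell
# Append "nomodeset" to GRUB_CMDLINE_LINUX_DEFAULT unless it is already
# present. Keeps a .bak copy. Run `sudo update-grub` afterwards.
add_nomodeset() {
  local grub="${1:-/etc/default/grub}"
  if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*nomodeset' "$grub"; then
    return 0   # already there, nothing to do
  fi
  # Insert before the closing quote, e.g. "quiet splash" -> "quiet splash nomodeset"
  sed -i.bak -E 's/^(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*)"/\1 nomodeset"/' "$grub"
}
```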

Install CUDA Toolkit

The latest CUDA toolkit at the time of writing was cuda_11.4.2_470.57.02_linux.run, so I installed that without reinstalling the NVIDIA driver:

wget https://developer.download.nvidia.com/compute/cuda/11.4.2/local_installers/cuda_11.4.2_470.57.02_linux.run
sudo sh cuda_11.4.2_470.57.02_linux.run
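Because the advice above is to match the driver to the one bundled with the CUDA toolkit, a quick check of the two file names can help. The sketch assumes the naming patterns of the files quoted in this answer (cuda_<toolkit>_<driver>_linux.run and NVIDIA-Linux-x86_64-<driver>.run), which may differ for other releases; the function names are hypothetical.

```shell
# Driver version embedded in a CUDA runfile name, e.g. cuda_11.4.2_470.57.02_linux.run
cuda_bundled_driver() {
  basename "$1" | sed -nE 's/^cuda_[0-9.]+_([0-9.]+)_linux\.run$/\1/p'
}

# Driver version of a standalone driver runfile, e.g. NVIDIA-Linux-x86_64-470.57.02.run
runfile_driver() {
  basename "$1" | sed -nE 's/^NVIDIA-Linux-x86_64-([0-9.]+)\.run$/\1/p'
}

# Succeed if both names carry the same (non-empty) driver version.
driver_matches_cuda() {
  local want have
  want="$(cuda_bundled_driver "$1")"
  have="$(runfile_driver "$2")"
  [ -n "$want" ] && [ "$want" = "$have" ]
}
```

For example, driver_matches_cuda cuda_11.4.2_470.57.02_linux.run NVIDIA-Linux-x86_64-470.57.02.run succeeds, while pairing the same CUDA file with NVIDIA-Linux-x86_64-470.74.run (the name used in the installer step above) fails.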

(Screenshot: CUDA toolkit installation)

Verify CUDA installation

See https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#install-samples

Prerequisites (if you want all the samples to compile properly):

sudo apt-get install g++ freeglut3-dev build-essential libx11-dev \
    libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev libfreeimage-dev

For example, ./deviceQuery returns:

 CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 3060 Laptop GPU"
  CUDA Driver Version / Runtime Version          11.4 / 11.4
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 5947 MBytes (6235422720 bytes)
  (030) Multiprocessors, (128) CUDA Cores/MP:    3840 CUDA Cores
  GPU Max Clock rate:                            1425 MHz (1.42 GHz)
...
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS
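The deviceQuery output above can be checked automatically. A sketch (hypothetical function name) that succeeds only if at least one CUDA device was detected and the last result line reads PASS:

```shell
# Read deviceQuery output on stdin; succeed only if at least one CUDA
# device was detected and the (last) "Result = ..." line says PASS.
devicequery_passed() {
  awk '/^Detected [0-9]+ CUDA Capable device/ { if ($2 > 0) devices = 1 }
       /^Result = / { result = $3 }
       END { exit !(devices && result == "PASS") }'
}
```

Usage: ./deviceQuery | devicequery_passed && echo "CUDA OK"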

cuDNN installation

See the guide at https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html

Download cuDNN v8.2.4 (September 2nd, 2021), for CUDA 11.4

-> cuDNN Library for Linux (x86_64), e.g. cudnn-11.4-linux-x64-v8.2.4.15.tgz
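The tar-file route in the cuDNN 8.x install guide boils down to unpacking the archive and copying the headers and libraries into the CUDA installation. A minimal sketch, assuming the archive unpacks into a top-level cuda/ directory (as the 8.2-era .tgz files did) and that CUDA lives in /usr/local/cuda; the function name is hypothetical.

```shell
# Unpack a cuDNN .tgz and copy headers/libraries into a CUDA install.
# $1 = archive, $2 = CUDA root (default /usr/local/cuda). Run as root for /usr/local.
install_cudnn_tgz() {
  local archive="$1" cuda_home="${2:-/usr/local/cuda}" work
  work="$(mktemp -d)" || return 1
  tar -xzf "$archive" -C "$work" || { rm -rf "$work"; return 1; }
  mkdir -p "$cuda_home/include" "$cuda_home/lib64"
  cp "$work"/cuda/include/cudnn*.h "$cuda_home/include/"
  cp -P "$work"/cuda/lib64/libcudnn* "$cuda_home/lib64/"   # -P keeps .so symlinks as links
  chmod a+r "$cuda_home"/include/cudnn*.h "$cuda_home"/lib64/libcudnn*
  rm -rf "$work"
}
```

Usage, from a root shell: install_cudnn_tgz cudnn-11.4-linux-x64-v8.2.4.15.tgz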

score 0 · Answer 3 · answered Sep 13 '20 at 10:59

I have a similar setup: a NUC running Ubuntu MATE 20.04 and a Razer Core X with an NVIDIA RTX 2060 Super.

Basically, I was at the same point you are, and nothing worked. Then I messed something up and had to reinstall Ubuntu. This time, however, I did it with the eGPU enclosure plugged in, and during installation the NVIDIA 440 drivers were installed automatically.

At this point I found the following post:

https://egpu.io/forums/thunderbolt-linux-setup/ubuntu-19-04-easy-to-use-setup-script-for-your-egpu/

With the script provided in the repository I could finally make the GPU work! I can access CUDA and also use two external monitors with the eGPU.

I hope this script can help you as well. Good luck.

Thanks for the comment. egpu-switcher does not work for me in the AMD / NVIDIA case. — bluehipy, Sep 14 '20 at 13:19