3

I know similar questions have been asked countless times, but none of the solutions worked for me so far. Let's start from the beginning.

I have a workstation with an NVIDIA RTX A5000 running Ubuntu 20.04. Before the events of today, I was able to connect to my workstation using SSH and render OpenGL windows using XQuartz on a MacBook Pro.

Today I was at my workstation trying to run a windowed program, and got a "Failed to establish dbus connection" error. After Googling, it seemed that this was caused by a bug in the driver I had (495.44, mentioned here). I decided to update the driver. All the options in the "Additional drivers" tab were greyed out, and at the bottom it said I was using a manually installed driver. I came across this question and ran

sudo ubuntu-drivers autoinstall

It ran without hiccups. I then rebooted the machine, and instead of getting the login screen I was greeted with the black screen and blinking white cursor at the top. Fortunately, I was still able to SSH into it. After a while I figured out that ubuntu-drivers had installed nvidia-driver-470. I followed the instructions on this answer to remove it

dpkg -P $(dpkg -l | grep nvidia-driver | awk '{print $2}')
apt autoremove

I did not install nouveau (the Nvidia driver README said it was a bad idea). I rebooted the machine and got the login screen this time. Opening the "Additional drivers" tab showed that I was back to the manually installed (and buggy) driver (I also confirmed this by running nvidia-smi).

This time around I decided to install the driver manually, and ran

sudo apt install nvidia-driver-510

I rebooted the computer and everything worked this time. I got to the login screen, the bug was gone, and I managed to run the application I wanted (I also discovered I was calling it with incorrect arguments, which is now making me wonder whether all of this was in vain...). Everything was fine when I used the display attached to the workstation.

The problem I have now is with rendering over SSH. For reference, xclock works, so I did not lose remote X rendering completely. For other applications (I think the common thread is that they use GL, but I'm not sure) I get the following errors

libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast

I saw some answers saying to run apt-get install -y mesa-utils libgl1-mesa-glx. I was missing libgl1-mesa-glx, but installing it did not fix the problem. Running a GL application such as glxgears or glxinfo with debug information, I get the following

$ LIBGL_DEBUG=verbose glxinfo
name of display: localhost:10.0
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  149 (GLX)
  Minor opcode of failed request:  24 (X_GLXCreateNewContext)
  Value in failed request:  0x0
  Serial number of failed request:  16
  Current serial number in output stream:  17

$ LIBGL_DEBUG=verbose glxgears
libGL: MESA-LOADER: dlopen(/usr/lib/x86_64-linux-gnu/dri/swrast_dri.so)
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/[my username]/.drirc: No such file or directory.
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/[my username]/.drirc: No such file or directory.
libGL: Disabling server's aux buffer support
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
Error: glXCreateContext failed

I tried several things, including selecting the nouveau driver from the "Additional drivers" tab (which stopped being greyed out after installing the Nvidia driver). Nothing worked. I read somewhere that this could be due to the Nvidia drivers, so I decided to uninstall nvidia-driver-510 and go back to the manually installed driver. However, after rebooting, my screen resolution was down to 640x480, everything was out of scale, and the nouveau driver was selected. I uninstalled xserver-xorg-video-nouveau and rebooted, hoping to get the manually installed driver, but once I logged in I had nouveau again (this despite xserver-xorg-video-nouveau not being installed; I checked). I installed nvidia-driver-510 again to get my screen back to normal.

I saw that other users with the same problem were able to solve it by installing distribution-specific packages (mesa-libGLw-devel.x86_64 in CentOS 7, mesa-dri-drivers in Red Hat), but I don't know how to find the equivalent package for my distribution. Other answers recommend uninstalling the Nvidia drivers, but that alone is not enough to revert to the (buggy but working) manually installed driver. Is there any package that provides the nouveau driver in addition to xserver-xorg-video-nouveau that I might have missed? If so, how can I find what package that is?

In case it might be helpful, here is additional information I see people asking for in other questions

$ sudo ldconfig -p | grep -i gl.so
libwayland-egl.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libwayland-egl.so.1
libcogl.so.20 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcogl.so.20
libOpenGL.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libOpenGL.so.0
libOpenGL.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libOpenGL.so
libGL.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libGL.so.1
libGL.so.1 (libc6) => /lib/i386-linux-gnu/libGL.so.1
libGL.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libGL.so
libEGL.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL.so.1
libEGL.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL.so

where libGL.so and libGL.so.1 both point to /lib/x86_64-linux-gnu/libGL.so.1.7.0.

$ lspci -k | grep -EA3 'VGA|3D|Display'
08:00.0 VGA compatible controller: NVIDIA Corporation Device 2231 (rev a1)
    Subsystem: NVIDIA Corporation Device 147e
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

This was a long question... Thank you for reading all the way down here.

2 Answers

1

What are you trying to do?

"The problem I have now is with rendering over SSH" is a very broad phrase. It can mean:

  1. Rendering in the server, not displaying anything (i.e. offscreen rendering)
  2. Rendering in the server, displaying the output on the server's display
  3. Rendering in the server, forwarding the results to your computer
  4. Forwarding the X11 commands from server to your client computer, rendering in your client computer

Normally if you SSH into the server and run a command that uses the GPU, only offscreen rendering will work (option 1) but most programs aren't coded to handle this; so they will fail with cryptic error messages like the one you're having (because they have to use the EGL DRM interface to talk to the GPU, while most OpenGL programs will try to use GLX X11 or EGL X11).

This can be worked around by setting the DISPLAY environment variable, e.g. DISPLAY=:0.0 glxinfo, which turns it into option 2 (rendering in the server and displaying the output on the server's display).
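A rough way to tell which case applies is to look at the value of DISPLAY in the session. The sketch below is only a heuristic (display_kind is a hypothetical helper, not a standard tool): it assumes that a value starting with a colon refers to an X server on the same machine, and that localhost:N is the proxy display that sshd sets up for X11 forwarding.

```shell
# Hypothetical helper: classify a DISPLAY value (heuristic only).
display_kind() {
    case "$1" in
        "")          echo "unset" ;;      # no X display: only offscreen rendering can work
        :*)          echo "local" ;;      # X server on this machine (option 2)
        localhost:*) echo "forwarded" ;;  # typically an SSH X11-forwarding proxy (option 4)
        *)           echo "remote" ;;     # some other host's X server
    esac
}

# Example: display_kind "$DISPLAY"
```

For the display name shown in the question's glxinfo output (localhost:10.0) this prints forwarded, which would point to option 4 rather than rendering on the server's GPU.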

If you want option 3, this is hard to get right, so I'd suggest you use alternative software that works well for this task, such as NoMachine, TeamViewer or AnyDesk (warning: none of them are FOSS). You can try other tools like X2Go or VNC, but in my experience they unfortunately come nowhere near the latency or quality of the tools I mentioned.

You didn't paste what command you are using to open xclock or glxinfo over SSH, and that has a profound effect.

Also comparing the output of env in SSH vs a terminal on the host (started from GNOME/Xfce/etc) can help you identify anything wrong or missing.
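To keep that comparison readable, it may help to filter the env output down to the variables that usually matter for X11 and GL. This is a minimal sketch: gl_env is a hypothetical helper, and the list of variables is my assumption of what is relevant, so extend it as needed.

```shell
# Hypothetical helper: keep only variables that commonly affect X11/GL setup.
# Reads `env` output on stdin; the variable list is an assumption.
gl_env() {
    grep -E '^(DISPLAY|XAUTHORITY|LD_LIBRARY_PATH|LIBGL_[A-Z_]+|__GLX_VENDOR_LIBRARY_NAME)=' | sort
}

# In the SSH session:     env | gl_env > /tmp/env_ssh.txt
# In a desktop terminal:  env | gl_env > /tmp/env_local.txt
# Then compare the two:   diff /tmp/env_local.txt /tmp/env_ssh.txt
```

Any line that appears in only one of the two files is a candidate for explaining why the same program behaves differently in the two sessions.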

swrast

swrast is provided by libgl1-mesa-dri
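On Debian-based systems this kind of mapping can be checked with dpkg -S, which reports the installed package that owns a file. A small sketch, assuming the swrast path printed in the question's glxgears output; owning_package is a hypothetical helper name:

```shell
# Hypothetical helper: reduce a `dpkg -S` result line to the bare package name.
# `dpkg -S <path>` prints lines of the form "package:arch: /path/to/file".
owning_package() {
    cut -d: -f1
}

# Example (only meaningful on a system where dpkg is available):
#   dpkg -S /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so | owning_package
```

dpkg -S only knows about packages that are already installed. For a file that is missing from the system, apt-file search (from the separate apt-file package) is the usual way to look up which package would provide it.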

Try:

  1. sudo apt install --reinstall libgl1-mesa-dri
  2. Uninstalling NVIDIA drivers provided by Ubuntu
  3. Installing NVIDIA drivers from its website (warning: in rare cases the installer leaves your system in a state where it won't boot to a GUI; make sure you have SSH access so you can restore it to a working state, e.g. by running ./installer --uninstall).

By default Ubuntu's package enables NVIDIA drivers and disables or breaks the Mesa ones; while the official installer keeps Mesa working alongside NVIDIA drivers. This can lead to very different results in use cases like yours.
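To see which implementation ends up serving a GL program in a given session, the vendor string printed by glxinfo is a quick indicator. The sketch below assumes glxinfo gets far enough to create a context (it does not in the failing case from the question); gl_vendor is a hypothetical helper.

```shell
# Hypothetical helper: extract the OpenGL vendor string from glxinfo output.
gl_vendor() {
    sed -n 's/^OpenGL vendor string: *//p'
}

# Example, run once over SSH and once on the local desktop:
#   glxinfo -B | gl_vendor
```

A value of NVIDIA Corporation indicates the proprietary driver is in use, while Mesa-based drivers report other vendor strings.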

0

I ran into similar problems.

This question helped me understand what's going on: NVIDIA 440.64 32-bit libraries package breaks 64-bit driver package

Steps:

  1. Remove everything related to NVIDIA drivers and CUDA: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#removing-cuda-toolkit-and-driver

sudo apt-get --purge remove "*cuda*" "*cublas*" "*cufft*" "*cufile*" "*curand*" \
 "*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*"

sudo apt-get --purge remove "*nvidia*" "libxnvctrl*"

sudo apt-get autoremove

  2. Enable support for 32-bit libraries
sudo dpkg --add-architecture i386
sudo apt update
  3. Install the driver via the package manager, along with 32-bit OpenGL

sudo apt-get install nvidia-driver-515

This will also install libnvidia-gl-515 and libnvidia-gl-515:i386

You may want to check your PPA sources before installing: apt-cache policy nvidia-driver-515

NVIDIA had a GPG update in April 2022: https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key/

  4. sudo reboot
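Before running the purge in the first step, it can be worth previewing which installed packages the "*nvidia*" pattern would touch. A minimal sketch, assuming the usual dpkg -l layout (status in the first column, package name in the second); installed_nvidia is a hypothetical helper:

```shell
# Hypothetical helper: list installed packages whose name contains "nvidia".
# Reads `dpkg -l` output on stdin; "ii" marks a fully installed package.
installed_nvidia() {
    awk '$1 == "ii" && $2 ~ /nvidia/ { print $2 }'
}

# Example:
#   dpkg -l | installed_nvidia
```

apt-get also accepts -s (--simulate), which prints what a remove or install command would do without changing the system.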

In the end, I got swrast and Nvidia 515 driver working together.
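The source check mentioned in the install step can be sketched as follows. apt-cache policy prints the installed version, the candidate version, and a version table listing the repositories each version comes from; candidate_version is a hypothetical helper that pulls out just the candidate.

```shell
# Hypothetical helper: extract the candidate version from `apt-cache policy` output.
candidate_version() {
    sed -n 's/^ *Candidate: *//p'
}

# Example:
#   apt-cache policy nvidia-driver-515 | candidate_version
```

If the candidate is not the version you expect, the repository lines under the version table in the full output show which source is supplying it.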

Note that I didn't have to install CUDA on my machine. If you need CUDA, you will have to apply some workarounds, because the CUDA driver packages do not include the 32-bit libraries:

https://forums.developer.nvidia.com/t/latest-cuda-driver-from-deb-repository-breaks-steam/70950

I’d like to upvote this issue on the NVIDIA side. The issue is that the latest CUDA drivers no longer bundle any 32 bit compatibility libraries. This breaks software that requires 32 bit compatibility, such as steam. It would be great if the next CUDA driver could re-include these so that I can have a single Ubuntu machine that runs both CUDA and steam.

That’s not really an issue, you just have to install driver and cuda the right way:

  • add the ubuntu graphics ppa: https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa
  • install the driver from that (sudo apt install nvidia-driver-430)
  • download the cuda .deb, add the repo to your system (first three steps from install instructions on download page), don’t install cuda
  • instead, run sudo apt install cuda-toolkit-10-1
meferne