Possible to install distinct drivers for two dissimilar nvidia GPUs on Ubuntu 20.04?

Question

I have had a GeForce GTX 470 in my Ubuntu rig for some time. I was unable to get it working with CUDA because NVIDIA apparently dropped support for old Fermi cards. I tried uninstalling and reinstalling various driver options, and with nvidia-driver-460 I could not get a display better than 1024x768 on the GTX 470. The only NVIDIA driver that worked properly with it was nvidia-driver-390, but even with this driver there was apparently no CUDA. This was the output of nvidia-smi before I put the GTX 1050 Ti in the machine:

$ nvidia-smi
Mon Apr 12 10:15:04 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.141                Driver Version: 390.141                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 470     Off  | 00000000:02:00.0 N/A |                  N/A |
| 40%   58C    P0    N/A /  N/A |    506MiB /  1216MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

So I bought and installed a GeForce GTX 1050 Ti. After some effort installing drivers and nvidia-cuda-toolkit, I am now using the GTX 1050 Ti for my display, and it also works properly with Blender, significantly speeding up render operations. Unfortunately, the GTX 470, which is still in the computer, seems to be unusable: nvidia-smi no longer acknowledges it at all and lists only the GTX 1050 Ti:

$ nvidia-smi -L
GPU 0: GeForce GTX 1050 Ti (UUID: GPU-89930378-de2a-cf96-b00a-693446ccda2c)

I cannot change the driver in use for the GTX 470 using the Additional Drivers tab of the Software & Updates window. The radio buttons are stuck on "Continue using a manually installed driver" and cannot be changed.

The GTX 1050, however, seems to do just fine with nvidia-driver-460:

$ nvidia-smi
Fri Apr 16 15:16:52 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
| 30%   34C    P5    N/A /  75W |    536MiB /  4032MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1597      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A      2416      G   /usr/lib/xorg/Xorg                259MiB |
|    0   N/A  N/A      2543      G   /usr/bin/gnome-shell              140MiB |
|    0   N/A  N/A      3960      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      4126      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      4133      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      4150      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A     26149      G   blender                            79MiB |
|    0   N/A  N/A     26833      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A     32571      G   /usr/lib/firefox/firefox            1MiB |
+-----------------------------------------------------------------------------+

Curiously, the CUDA version reported there (11.2) doesn't match the CUDA version reported for nvcc (10.1.243):

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
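
To put the two numbers side by side, a rough sketch like this can pull them out of the two outputs. The function names smi_cuda_version and nvcc_cuda_version are made up for this example, and the patterns assume the output formats shown above.

```shell
# Extract the "CUDA Version" field from nvidia-smi output read on stdin.
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Extract the toolkit release from `nvcc --version` output read on stdin.
nvcc_cuda_version() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}
```

Usage would be something like `nvidia-smi | smi_cuda_version` and `nvcc --version | nvcc_cuda_version`.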

Despite the CUDA version mismatch, I'm pretty pleased with the GTX 1050's behavior. It appears to be using the full 8GT/s speed of PCI-E 3 and helps my Blender rendering speed a LOT.

I've got a few questions:

1. Is it possible to install different drivers for the respective video cards? The GTX 470 won't work with nvidia-driver-460, which is always installed with nvidia-cuda-toolkit and seems to be working just great with the GTX 1050 and Blender. Would there be any point in using the X.org X server for the GTX 470 if I can manage such a thing?
2. Should I try to update/upgrade nvcc to CUDA 11.2? I'm worried I might break something and wonder if there's even any advantage in doing so. I'm most worried that Blender won't recognize the card any more.
3. Is it even worth keeping the GTX 470 in the machine? While I doubt I can get the GTX 470 to help with Blender rendering, it might be able to handle display-related duties for Ubuntu, allowing the GTX 1050 to focus all its efforts on my Blender renderings.
4. If I were to install a second GTX 1050 Ti, would that provide further speed increases in Blender?

EDIT: some additional information. Some output from sudo dmesg:

[    3.032920] nvidia: loading out-of-tree module taints kernel.
[    3.032926] nvidia: module license 'NVIDIA' taints kernel.
[    3.032926] Disabling lock debugging due to kernel taint
[    3.033029] RAPL PMU: API unit is 2^-32 Joules, 2 fixed counters, 163840 ms ovfl timer
[    3.033030] RAPL PMU: hw unit of domain pp0-core 2^-16 Joules
[    3.033030] RAPL PMU: hw unit of domain package 2^-16 Joules
[    3.050459] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
[    3.050773] NVRM: The NVIDIA GeForce GTX 470 GPU installed in this system is
               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please
               NVRM:  visit http://www.nvidia.com/object/unix.html for more
               NVRM:  information.  The 460.67 NVIDIA driver will ignore
               NVRM:  this GPU.  Continuing probe...
[    3.050895] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    3.154354] cryptd: max_cpu_qlen set to 1000
[    3.166652] NVRM: ignoring the legacy GPU 0000:02:00.0
[    3.166677] nvidia: probe of 0000:02:00.0 failed with error -1
[    3.166697] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  460.67  Thu Mar 11 00:11:45 UTC 2021
[    3.324880] AVX version of gcm_enc/dec engaged.
[    3.324880] AES CTR mode by8 optimization enabled
[    3.352802] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  460.67  Thu Mar 11 00:03:18 UTC 2021
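
The relevant line is easy to miss in a long kernel log. As a minimal sketch, assuming the NVRM message keeps the wording shown above, a filter like this prints the PCI address of every GPU the loaded NVIDIA module skipped. The function name ignored_legacy_gpus is made up for this example.

```shell
# Read kernel log text on stdin and print the PCI address of every GPU
# that the loaded NVIDIA module reported as an ignored legacy device.
ignored_legacy_gpus() {
    sed -n 's/.*NVRM: ignoring the legacy GPU \([0-9a-fA-F:.]*\).*/\1/p'
}
```

It would be used as `sudo dmesg | ignored_legacy_gpus`.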

Output of sudo lshw -C video :

$ sudo lshw -C video
[sudo] password for sneakyimp: 
  *-display                 
       description: VGA compatible controller
       product: GP107 [GeForce GTX 1050 Ti]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:59 memory:f9000000-f9ffffff memory:b0000000-bfffffff memory:c0000000-c1ffffff ioport:e000(size=128) memory:c0000-dffff
  *-display UNCLAIMED
       description: VGA compatible controller
       product: GF100 [GeForce GTX 470]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:02:00.0
       version: a3
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller cap_list
       configuration: latency=0
       resources: memory:f6000000-f7ffffff memory:c8000000-cfffffff memory:d0000000-d3ffffff ioport:d000(size=128) memory:f8000000-f807ffff
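
The UNCLAIMED marker means no kernel driver is bound to that card. A small sketch, assuming the lshw layout shown above (product line before bus info line), lists only the unclaimed display devices. The function name unclaimed_displays is made up for this example.

```shell
# Read `lshw -C video` output on stdin and print "<bus info> <product>"
# for each display section whose header line is marked UNCLAIMED.
unclaimed_displays() {
    awk '
        /^ *[*]-/ { unclaimed = ($0 ~ /UNCLAIMED/); product = "" }
        unclaimed && /product:/  { sub(/^.*product: */, ""); product = $0 }
        unclaimed && /bus info:/ { sub(/^.*bus info: */, ""); print $0 " " product }
    '
}
```

It would be used as `sudo lshw -C video | unclaimed_displays`.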

And ubuntu-drivers devices suggests that the old 470 is trying to use nvidia-driver-340, which is super old:

$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0 ==
modalias : pci:v000010DEd000006CDsv00001043sd00008342bc03sc00i00
vendor   : NVIDIA Corporation
model    : GF100 [GeForce GTX 470]
manual_install: True
driver   : nvidia-340 - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
== /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001C82sv00001043sd0000862Abc03sc00i00
vendor   : NVIDIA Corporation
model    : GP107 [GeForce GTX 1050 Ti]
driver   : nvidia-driver-390 - distro non-free
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-460-server - distro non-free
driver   : nvidia-driver-450 - distro non-free
driver   : nvidia-driver-460 - third-party non-free recommended
driver   : nvidia-driver-450-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

A. Genchev · Answer 1 · 2022-09-29T00:08:49.873

I did some experiments with slightly different hardware: a Tesla M40 24GB and a Quadro FX380. The "new" driver supports the Tesla with CUDA, nvidia-smi, etc., but it does not support the Quadro (like your GTX 470, it is only supported by the old driver). You cannot mix different versions of NVIDIA's proprietary driver stack, which is why I use the open-source nouveau driver for the legacy GPU; it supports both your legacy card and mine. I use the newer GPU to accelerate ML workloads with CUDA and the old GPU to drive the display, play video, etc., so the video memory of the Tesla stays untouched.

For the "new" driver to load (for your GTX 1050), nouveau must be blacklisted. On my Ubuntu 20.04, this is done in the file /lib/modprobe.d/nvidia-graphics-drivers.conf. Initially it contained:

blacklist nouveau
blacklist lbm-nouveau
alias nouveau off
alias lbm-nouveau off

I commented out everything except this line:

blacklist nouveau

The alias lines must not stay active: with "alias nouveau off" in place, nouveau cannot be loaded at all, and we need it for the legacy GPU.
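
To double-check the edit, a small sketch like the following prints only the lines of a modprobe configuration file that are still active. The function name active_modprobe_lines is made up for this example, and the path is the one on my system.

```shell
# Print only the active lines (not blank, not commented out) of a
# modprobe configuration file read on stdin.
active_modprobe_lines() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$'
}
```

Run it as `active_modprobe_lines < /lib/modprobe.d/nvidia-graphics-drivers.conf`. If the file was edited as described, the only line left should be blacklist nouveau.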

You can then get correct modesetting on the legacy GPU by loading nouveau for it. I let the cron daemon do this for me: in /etc/crontab, I added the following line:

@reboot     root   /sbin/modprobe nouveau

This loads the "blacklisted" nouveau driver after the nvidia driver has been loaded. This enables acceleration on the old card and the new card is already set up. In my case this makes sense, because the TESLA doesn't have video output. In your case this might not matter so much.
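
After a reboot, one way to see which kernel driver ended up bound to each card is to look at the driver symlink in sysfs. This is only a sketch: pci_driver is a hypothetical helper name, and the PCI addresses in the usage line come from the lshw output in the question and will differ on other machines.

```shell
# Print the name of the kernel driver bound to a PCI device, or "none".
# $1: PCI address such as 0000:02:00.0
# $2: sysfs root (optional, defaults to /sys; overridable for testing)
pci_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver's directory name.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

For example, `pci_driver 0000:01:00.0` and `pci_driver 0000:02:00.0` would be expected to report nvidia and nouveau respectively if the setup worked.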

But you will not have CUDA working on your legacy GPU: nouveau is an open-source effort by volunteers, while CUDA is proprietary technology that comes only with the proprietary driver. The vendor may also drop support for your hardware at any time, which is why proprietary, closed software is generally undesirable.

Ubuntu 20 and Ubuntu 20.04 are different products. You may want to edit your answer. — David, Sep 15 '22 at 09:14
