I have an Ubuntu 22.04 computer with an old Tesla C2070 GPU (Fermi architecture, compute capability 2.0) that I need to use.

The latest CUDA Toolkit that supports this GPU is 8.0, which in turn supports host compilers only up to gcc 5.

I am wondering which of these two options is the better way to achieve this installation:

  • Install gcc-5 and g++-5 by temporarily adding the xenial APT repository to sources.list (and disabling it again right away), set that version as the default via update-alternatives, and then install CUDA 8 from the runfile?
  • Somehow set up a Docker image containing Ubuntu 16.04 (maybe using debootstrap and systemd-nspawn?) that communicates directly with the GPU and avoids interfering with APT and the default compilers? Or would running inside an image slow down the nvcc-compiled programs later on?

Thanks in advance for the advice.

  • Use the run script, decline the Nvidia driver, and override any system-area locations (lib, icons, ...). You can ensure this by running the script as a regular user and temporarily taking ownership of only what you need. Never change gcc via update-alternatives: a kernel update with a failed video driver recompile will leave you with a blank screen. See https://askubuntu.com/questions/1219761/cuda-10-2-different-installation-paths/1244010#1244010 and https://askubuntu.com/questions/1077061/how-do-i-install-nvidia-and-cuda-drivers-into-ubuntu/1077063#1077063 Add gcc 5 links in your ~/bin or in the cuda bin directory, early in the PATH. – ubfan1 Nov 22 '22 at 17:43
  • I cannot do what you suggested, because that run script installs CUDA version 11, which does not support sm_20 as a compute architecture. I really need to run the CUDA 8 installation with gcc-5 active. I tried export CXX and export CC rather than update-alternatives, but at some point it breaks because it tries to find include files in the libstdc11 parts. – ferdymercury Nov 22 '22 at 17:49
  • If I try the export CXX trick, the compilation of my program starts well and gcc5 is found, but it then fails with the following message, because GNU_C is checked in the wrong include path: In file included from /opt/cuda-8.0/bin/..//include/cuda_runtime.h:78, from : /opt/cuda-8.0/bin/..//include/host_config.h:119:2: error: #error -- unsupported GNU version! gcc versions later than 5 are not supported! 119 | #error -- unsupported GNU version! gcc versions later than 5 are not supported! | ^~~~~ – ferdymercury Nov 22 '22 at 18:04
  • Please add to your original post instead of replying in comments; comments may be deleted at some point. Use the CUDA 8 .run script, not the 11 one. Look around for it; it may be archived somewhere. You still need to alter your PATH and LD_LIBRARY_PATH to put your cuda/bin and cuda/lib first so that your gcc version 5 gets picked up. Do this in a script if you compile other things needing 11, or in your .profile if you want it done automatically and don't do any gcc 11 work. – ubfan1 Nov 22 '22 at 18:21
  • I did the PATH alterations as you suggested, and have posted an answer with all the steps one-by-one. Unfortunately, I am left with two errors that prevent loading the NVIDIA driver. One about an incompatible pointer type, the other about a missing header file kmap_types.h – ferdymercury Dec 04 '22 at 22:56

1 Answer

These are the steps I came up with. (See comments and links posted in the original question for more details on the inspiration.)

  • First, verify that your GPU card is recognized

    lspci | grep -i nvidia

    05:00.0 VGA compatible controller: NVIDIA Corporation GF100GL [Tesla C2050 / C2070] (rev a3)

  • Then, open your apt sources

    sudo nano /etc/apt/sources.list

  • Add the xenial repository at the end, save and close.

    deb http://us.archive.ubuntu.com/ubuntu/ xenial main

    deb http://us.archive.ubuntu.com/ubuntu/ xenial universe

  • Update the apt lists (you might first need to add the repository keys manually)

    sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 40976EAF437D05B5 3B4FE6ACC0B21F32

    sudo apt update

  • Install gcc-5 and g++-5

    sudo apt install gcc-5 g++-5

  • Remove xenial from your apt sources, save and close.

    sudo nano /etc/apt/sources.list

  • Update apt lists

    sudo apt update

  • Optionally remove any remnants of old NVIDIA drivers, maybe even rebooting in between. (Careful: this may leave you with only ssh or failsafe terminal access.) This step is not needed if you only want to install the toolkit from the runfile and not the matching driver (v375).

    sudo nvidia-uninstall

    sudo nvidia-installer --uninstall

    sudo apt-get remove --purge '^nvidia-.*'

    sudo reboot

    sudo apt-get autoremove

  • Create a directory with links to gcc-5 and g++-5 and put it first on the PATH

    cd /opt/

    sudo mkdir gcc5

    cd gcc5

    sudo ln -s /usr/bin/gcc-5 gcc

    sudo ln -s /usr/bin/g++-5 g++

    export PATH=/opt/gcc5:$PATH
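As a variation on the step above, the links can also live in a directory you own, so no sudo is needed and the system compilers stay untouched. This is only a sketch: the function names and the $HOME/gcc5 location are made up for illustration, and the version check assumes that CUDA 8's host_config.h rejects only gcc major versions above 5, as the error message quoted in the question suggests.

```shell
# Create gcc/g++ shims in a directory of your choice (hypothetical helper).
# Arguments: shim directory, path to gcc-5, path to g++-5.
make_gcc5_shims() {
    shim_dir=$1
    gcc_bin=$2
    gxx_bin=$3
    mkdir -p "$shim_dir" || return 1
    ln -sf "$gcc_bin" "$shim_dir/gcc" || return 1
    ln -sf "$gxx_bin" "$shim_dir/g++" || return 1
}

# Succeed only if the given gcc version string has a major version of 5 or lower
# (assumption: this mirrors the check in CUDA 8's host_config.h).
gcc_ok_for_cuda8() {
    major=${1%%.*}
    [ "$major" -le 5 ]
}

# Typical use (the shim location is an assumption, pick any directory you own):
#   make_gcc5_shims "$HOME/gcc5" /usr/bin/gcc-5 /usr/bin/g++-5
#   export PATH="$HOME/gcc5:$PATH"
#   gcc_ok_for_cuda8 "$(gcc -dumpversion)" && echo "host compiler OK for CUDA 8"
```

Putting the export line in a small script that you source only when working with CUDA 8 keeps gcc 11 as the default everywhere else.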

  • Download the CUDA 8 runfile from https://developer.nvidia.com/cuda-80-ga2-download-archive

    cd /tmp/

    wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run

    wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/patches/2/cuda_8.0.61.2_linux-run

  • Extract it in order to copy the InstallUtils to the perl path

    sh cuda_8.0.61_375.26_linux-run --tar mxvf

    sudo cp InstallUtils.pm /usr/lib/x86_64-linux-gnu/perl-base/

  • A cleaner step for the previous statement is described here

  • Kill your X-server (this is only needed if you want to install the driver v375 from the runfile, otherwise install the v390 via sudo ubuntu-drivers install)

    sudo service lightdm stop

    sudo killall Xorg

  • Run the installer and answer yes to everything (answer no to the driver question if you do not want to install the driver)

    sh cuda_8.0.61_375.26_linux-run

  • I got

    Driver:   Installed

    Toolkit:  Installed in /usr/local/cuda-8.0

    Samples:  Installed in /home/user

  • Apply the cuBLAS patch

    sudo sh cuda_8.0.61.2_linux-run

  • Verify that the nouveau driver is correctly blacklisted (the command should print nothing)

    lsmod | grep nouveau

  • Add nvcc to the PATH

    export PATH=$PATH:/usr/local/cuda-8.0/bin

  • And you will end up with:

    nvcc --version

    nvcc: NVIDIA (R) Cuda compiler driver

    Copyright (c) 2005-2016 NVIDIA Corporation

    Built on Tue_Jan_10_13:22:03_CST_2017

    Cuda compilation tools, release 8.0, V8.0.61

  • and, if you chose to install the runfile driver, nvidia-smi fails at first:

    nvidia-smi

    NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

  • This is because DKMS has only added the module, not built it, and building it fails on this kernel:

    dkms status

    nvidia/375.26: added

    sudo dkms remove nvidia/375.26 --all

    sudo dkms install nvidia/375.26 -k $(uname -r)

    Error! Bad return status for module build on kernel: 5.15.0-56-generic (x86_64). Consult /var/lib/dkms/nvidia/375.26/build/make.log for more information.

  • The log file make.log shows:

    /var/lib/dkms/nvidia/375.26/build/common/inc/nv-mm.h:86:42: error: passing argument 1 of ‘get_user_pages_remote’ from incompatible pointer type [-Werror=incompatible-pointer-types]

    /var/lib/dkms/nvidia/375.26/build/common/inc/nv-linux.h:98:10: fatal error: asm/kmap_types.h: No such file or directory
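To tell quickly whether the module is merely "added" or actually "installed" (for instance after retrying the build), the output of dkms status can be parsed. The helper below is a hypothetical sketch; it assumes the line layouts "nvidia/375.26: added" (as shown above) and the comma-separated layout that I believe older dkms versions print.

```shell
# Read `dkms status` text on stdin and print the state of one module/version.
# Arguments: module name, module version. (Hypothetical helper, not part of dkms.)
dkms_state() {
    module=$1
    version=$2
    awk -v m="$module" -v v="$version" '
        {
            line = $0
            gsub("[/,]", " ", line)            # normalise "/" and "," separators
            split(line, f, " ")
            if (f[1] == m && (f[2] == v || f[2] == v ":")) {
                sub(/.*: */, "", $0)           # keep only the text after the last ": "
                print $0
                exit
            }
        }'
}

# Typical use:
#   dkms status | dkms_state nvidia 375.26
```

If this prints "added", the driver has not been built for the running kernel, which matches the nvidia-smi failure above.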

  • Then go to the samples:

    cd ~/NVIDIA_CUDA-8.0_Samples

  • A system header file needs a manual edit: open it and insert the lines below at line 37, after the definition of __HAVE_FLOAT128

    sudo nano /usr/include/x86_64-linux-gnu/bits/floatn.h

    #if CUDART_VERSION

    #undef __HAVE_FLOAT128

    #define __HAVE_FLOAT128 0

    #endif

  • Note that you might need to redo the previous step whenever you update your system packages.
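Since a package update can silently undo the header edit, a small check can tell you whether the guard is still there before you start a build. This is a minimal sketch with a made-up function name; it only looks for the CUDART_VERSION marker used in the edit above and does not validate the rest of the file.

```shell
# Succeed if the header given as $1 still contains the CUDART_VERSION guard.
# (Hypothetical helper; it checks for the marker text only.)
floatn_is_patched() {
    grep -q 'CUDART_VERSION' "$1"
}

# Typical use:
#   floatn_is_patched /usr/include/x86_64-linux-gnu/bits/floatn.h \
#       || echo "floatn.h lost the CUDA guard, re-apply the edit"
```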

  • Finally compile the samples

    make

  • Run any desired example

    ./deviceQuery/deviceQuery
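For your own programs, an alternative to relying on PATH order is to name the host compiler explicitly with nvcc's -ccbin option and to target Fermi with -arch=sm_20. The helper below only prints such a command line so you can inspect it before running it; the function name is hypothetical, and the default paths are the ones used in the steps above.

```shell
# Print an nvcc command line for a Fermi (sm_20) target with an explicit host compiler.
# Arguments: source file, output file, optional host compiler (default /usr/bin/g++-5).
# (Hypothetical helper; it prints the command and does not run it.)
fermi_nvcc_cmd() {
    src=$1
    out=$2
    host=${3:-/usr/bin/g++-5}
    printf '%s\n' "/usr/local/cuda-8.0/bin/nvcc -ccbin $host -arch=sm_20 -o $out $src"
}

# Typical use:
#   fermi_nvcc_cmd vector_add.cu vector_add
#   eval "$(fermi_nvcc_cmd vector_add.cu vector_add)"
```

With -ccbin, nvcc does not depend on which gcc comes first on the PATH, although the floatn.h edit described above is presumably still needed.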

EDIT: With the following (quick and hacky) patch on /usr/src/nvidia-375.26, inspired by the NVIDIA open GPU kernel modules as well as forum posts elsewhere, I was able to modprobe the driver and run nvidia-smi. I disabled some things just to get it to compile; I still need to check whether this has side effects and needs more careful handling.

Mon Dec  5 12:54:10 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla C2070         Off  | 0000:05:00.0     Off |                  Off |
| 30%   58C    P0    N/A /  N/A |      0MiB /  6066MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

1_Utilities/deviceQuery/deviceQuery 
1_Utilities/deviceQuery/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla C2070"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 6066 MBytes (6361120768 bytes)
  (14) Multiprocessors, ( 32) CUDA Cores/MP:     448 CUDA Cores
  GPU Max Clock rate:                            1147 MHz (1.15 GHz)
  Memory Clock rate:                             1494 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 5 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = Tesla C2070
Result = PASS

EDIT2: Of course, this whole process interfered with my display graphics card (a second NVIDIA GPU) that was installed alongside the Tesla. If I installed the recommended NVIDIA drivers (as listed by ubuntu-drivers devices), that removed my manually installed driver. If I did nothing, the display GPU was left unclaimed, with this dmesg error:

NVRM: The NVIDIA GeForce 9400 GT GPU installed in this system is
NVRM:  supported through the NVIDIA 340.xx Legacy drivers. Please
NVRM:  visit http://www.nvidia.com/object/unix.html for more
NVRM:  information.  The 375.26 NVIDIA driver will ignore
NVRM:  this GPU.  Continuing probe...

If I whitelisted nouveau, the computing GPU no longer worked. In the end, the solution was to leave nouveau blacklisted but without the nomodeset option, and to modprobe nouveau at startup via crontab so that it picks up the unclaimed GPU. I also had to delete the xorg.conf file first. Now the display GPU runs on nouveau and the computing GPU on the NVIDIA driver. Phew.

  • Disable the nomodeset option in the conf file by commenting out its second line

    sudo nano /etc/modprobe.d/nvidia-installer-disable-nouveau.conf

  • Load nouveau after booting, once the NVIDIA driver is loaded, by adding the modprobe line below at the end of the crontab

    sudo nano /etc/crontab

    @reboot root /sbin/modprobe nouveau
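To confirm that each GPU ended up with the intended driver, the "Kernel driver in use" lines of lspci -k can be summarised per device. The sketch below uses a made-up function name and reads the lspci text from stdin; it reports "unclaimed" for an NVIDIA device that has no driver line.

```shell
# Read `lspci -k` text on stdin and print "<bus address> <driver>" for each NVIDIA
# device, or "<bus address> unclaimed" if no kernel driver is bound to it.
# (Hypothetical helper, not part of pciutils.)
gpu_drivers() {
    awk '
        /^[0-9a-f]/ {                              # unindented line: a new device starts
            if (addr != "" && is_nvidia) print addr, (drv == "" ? "unclaimed" : drv)
            addr = $1
            drv = ""
            is_nvidia = ($0 ~ /NVIDIA/)
        }
        /Kernel driver in use:/ { drv = $NF }      # indented detail line of the device
        END {
            if (addr != "" && is_nvidia) print addr, (drv == "" ? "unclaimed" : drv)
        }'
}

# Typical use:
#   lspci -k | gpu_drivers
```

In the setup described above, the expectation would be one line ending in nvidia for the Tesla and one ending in nouveau for the display card.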

EDIT3: If you are OK with having a CUDA runtime version (8.0) that differs from the driver's CUDA version (9.1), then things get much easier and you do not need any kernel patching. Just run sudo ubuntu-drivers install (first uninstall your custom driver if you followed the steps above, and uncomment the nomodeset option in the modprobe conf file), and when installing CUDA via the runfile, install only the toolkit, not the driver. deviceQuery will then report:

Device 0: "Tesla C2070"
  CUDA Driver Version / Runtime Version          9.1 / 8.0
  CUDA Capability Major/Minor version number:    2.0

and nvidia-smi:

nvidia-smi 
Wed Dec  7 11:35:21 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.157                Driver Version: 390.157                   |
|-------------------------------+----------------------+----------------------+
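The mixed setup works because the driver's CUDA version (9.1) is newer than the runtime (8.0); as far as I know, the reverse combination is what fails. The check can be automated by parsing the deviceQuery line shown above. The function name below is hypothetical, and the parsing assumes the exact "CUDA Driver Version / Runtime Version" line format printed by the CUDA 8 sample.

```shell
# Read deviceQuery output on stdin; succeed if the driver's CUDA version is at
# least the runtime version (for example "9.1 / 8.0"). (Hypothetical helper.)
driver_covers_runtime() {
    awk '
        /CUDA Driver Version \/ Runtime Version/ {
            split($(NF - 2), d, ".")               # driver version: major, minor
            split($NF, r, ".")                     # runtime version: major, minor
            ok = (d[1] + 0 > r[1] + 0) || (d[1] + 0 == r[1] + 0 && d[2] + 0 >= r[2] + 0)
            exit
        }
        END { exit !ok }'
}

# Typical use:
#   ./deviceQuery/deviceQuery | driver_covers_runtime && echo "driver is new enough"
```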
  • Maybe a later Nvidia driver, like the 515, would work: https://www.nvidia.com/Download/driverResults.aspx/194649/en-us/ – ubfan1 Dec 05 '22 at 05:52
  • https://forums.developer.nvidia.com/t/460-driver-installation-on-tesla-2070/170382 – ferdymercury Dec 05 '22 at 09:34
  • I wrote a quick patch and now it compiles at least. I need to check because I disabled some things in the code that might be important, but at least it's a start! Thanks for the hints. – ferdymercury Dec 05 '22 at 12:17
  • https://stackoverflow.com/questions/76531467/nvcc-cuda8-gcc-5-3-no-longer-compiles-with-o1-on-ubuntu-22-04 – ferdymercury Jun 28 '23 at 17:11
  • If after a system update your display GPU is left unclaimed again, and modprobe nouveau returns modprobe: ERROR: ../libkmod/libkmod-module.c:838 kmod_module_insert_module() could not find module by name='off' modprobe: ERROR: could not insert 'off': Unknown symbol in module, or unknown parameter (see dmesg), then the line alias nouveau off must be commented out via sudo nano /usr/lib/modprobe.d/nvidia-graphics-drivers.conf. – ferdymercury Feb 14 '24 at 17:47
  • More issues with string_fortified in stpncpy_check with gcc5 and -O1, see https://stackoverflow.com/questions/76531467/nvcc-cuda8-gcc-5-3-no-longer-compiles-with-o1-on-ubuntu-22-04 – ferdymercury Mar 15 '24 at 19:08