
On Ubuntu MATE 22.04, recent updates have caused DaVinci Resolve to report the "Your GPU memory is full" error, even though no other processes are running.

So far, I have tried purging and reinstalling the Nvidia drivers, downgrading to the previous version of DaVinci and to previous Nvidia drivers, and manually updating 5 packages that had been kept back (gjs libgjs0g libnetplan0 libsgutils2-2 netplan.io), all in an effort to get my workstation back up and running.

The computer is running smoothly and nvidia-smi returns the usual information, so I think the GPU is working as it should and there is just some miscommunication with DaVinci.

Is there a way I can downgrade the drivers, CUDA or something else to the previous version until this is (hopefully) resolved?

Secure Boot in UEFI is disabled, and as far as I can tell there are no broken or missing packages or dependencies. However, I do get a generic error message from Ubuntu on boot.

$ nvidia-smi
Thu Oct 12 10:50:14 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.113.01             Driver Version: 535.113.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P0              N/A /  35W |      9MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1519      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+

Cypress
  • 1
    You don't provide details as to which kernel stack you're using (with Ubuntu MATE, the install media decides your default), and I wonder if just switching your default kernel stack (from HWE to GA, or vice versa) would help. Do note: using only open source kernel modules (drivers) both stacks can co-exist on an install, however many Nvidia kernel modules prevent both stacks co-existing... I don't know if it'll help, this is just a thought (and what I'd try, using live media probably first) – guiverc Oct 12 '23 at 09:21
  • @guiverc $ uname -r returns 6.2.0-34-generic.

    Is it very risky to noodle about with the kernels?

    – Cypress Oct 12 '23 at 09:40
  • 1
    You're using the HWE kernel stack currently. The HWE stack gets kernels from later releases: 22.04.2 got 5.19 (from 22.10), 22.04.3 gives the current 6.2 (from 23.04), and 6.5 (from 23.10) comes next with 22.04.4. If your uname returned 5.15 you'd have been using the GA kernel stack, and this is your alternative (22.04 & 22.04.1 media for Ubuntu MATE defaulted to the GA stack; 22.04.2 & later default to HWE) https://wiki.ubuntu.com/Kernel/LTSEnablementStack – guiverc Oct 12 '23 at 09:43
  • @guiverc Thank you for the suggestion. I have tried a handful of different 5.15 kernels, including the recommended GA kernel, but unfortunately, the problem still persists. – Cypress Oct 12 '23 at 14:15
  • @guiverc I take it back! Came back to it with fresh eyes this morning and did a couple rounds of nvidia/cuda purge/reinstalls to the newest 5.15 kernel and I can't believe it but it works again! Thank you so much for putting me on the right track! (... now to figure out how to safely remove the other kernel flavors without breaking anything again) – Cypress Oct 13 '23 at 06:52
  • If you're using the Ubuntu supported GA kernel stack (5.15), then refer to the link I provided in a prior comment, and search for "If everything is good, you may remove the other kernel flavours:". Both kernel stacks can co-exist if you're only using open source kernel modules (though you mention nvidia/cuda so it's possible not all yours are) – guiverc Oct 13 '23 at 07:04
  • Yeah, Nvidia has been mercurial from the start and needed lots of finagling, so I'm not sure all my workarounds are still open source. I am still elated you showed me another tool for the toolbox, so thank you again. If you'd like, you could copy your comment into an actual answer that I can mark as the solution in case anyone else is having issues? – Cypress Oct 13 '23 at 07:17
  • Feel free to write your own answer; credit me with providing a clue in what you worked out if you wish (you'll gain some rep that way). I'll suggest using the wiki link I provided in it if helpful. I'm glad you have it solved. :) – guiverc Oct 13 '23 at 07:28

1 Answer


Big thank you again to @guiverc for all their help!

It ended up being a mismatch between kernel stacks (a recent update switched to an HWE kernel stack, while the previous version had used the GA kernel stack).

I fixed it by swapping back to the GA kernel using the link provided by @guiverc (run uname -r to check that the right kernel was booted from GRUB):

https://wiki.ubuntu.com/Kernel/LTSEnablementStack
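For anyone unsure which stack they booted, here is a minimal sketch of the uname -r check. It assumes Ubuntu 22.04, where 5.15 is the GA kernel series; the function name kernel_stack is just an illustrative helper, and anything that is not 5.15 is simply reported as not GA (HWE, OEM, or a custom kernel).

```shell
#!/bin/sh
# Classify a kernel release string on Ubuntu 22.04.
# Assumption: 5.15.x is the GA series; everything else is treated as
# "HWE-or-other" (this sketch does not distinguish HWE from OEM kernels).
kernel_stack() {
    case "$1" in
        5.15.*) echo "GA" ;;
        *)      echo "HWE-or-other" ;;
    esac
}

# Report the stack of the kernel that is currently running.
kernel_stack "$(uname -r)"
```

With the 6.2.0-34-generic kernel mentioned in the comments, this would report the non-GA case.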

I then purged the old drivers and reinstalled them while running the GA kernel (the newest being nvidia-driver-535 as I write this), along with the appropriate headers for the 5.15/GA kernel, following the answer in this link (and rebooted):

https://forums.developer.nvidia.com/t/nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-driver-make-sure-that-the-latest-nvidia-driver-is-installed-and-running/197141
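As a rough sketch of what that purge and reinstall can look like, the script below only prints the commands (a dry run) so they can be reviewed before being run by hand. It is not copied from the linked NVIDIA forum answer: the command sequence is a commonly used one, the reinstall_plan function is a hypothetical helper, and the driver package name is taken from this post and may differ on your system.

```shell
#!/bin/sh
# Dry run: prints the commands instead of executing them.
# Assumption: DRIVER is the package named in this post; adjust as needed.
DRIVER="nvidia-driver-535"

reinstall_plan() {
    kernel="$1"   # kernel release to install headers for, e.g. from uname -r
    echo "sudo apt-get purge '^nvidia-.*'"
    echo "sudo apt-get autoremove"
    echo "sudo apt-get install linux-headers-${kernel} ${DRIVER}"
    echo "sudo reboot"
}

# Print the plan for the kernel that is currently booted (should be the GA one).
reinstall_plan "$(uname -r)"
```

Run it only after booting into the GA kernel, so that the headers it names match the kernel the driver module will be built for.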

And DaVinci worked again! Video replay and all!

PS: Being a novice Ubuntu user, I was quite nervous about unwittingly breaking something again. So rather than purging the kernel flavors that weren't working, I set grub-customizer to boot from the previously booted entry:

Set "older" kernel as default grub entry
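For those who prefer not to use grub-customizer, the same behavior can usually be had by editing /etc/default/grub directly. This is a sketch under the assumption that the "previously booted entry" option corresponds to GRUB's saved-default mechanism; I have not confirmed exactly what grub-customizer writes.

```shell
# /etc/default/grub (fragment)
# Assumption: this mirrors the "previously booted entry" option.
GRUB_DEFAULT=saved        # boot the entry that was saved last
GRUB_SAVEDEFAULT=true     # save whichever entry is chosen at each boot
```

After editing the file, run sudo update-grub, then pick the 5.15 kernel once from the "Advanced options for Ubuntu" menu so that it becomes the saved entry.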

Not the cleanest solution, but I am just happy to have my workstation back to editing videos.

I am also looking into setting up a system backup like TimeShift, to hopefully avoid this issue in the future, since Nvidia/DaVinci have been causing hiccups in Ubuntu on at least a quarterly basis.

Cypress