I just upgraded my OS from Ubuntu 16.04 LTS to Ubuntu 18.04 LTS and then to Ubuntu 20.04 LTS, because I am trying to use the GPU to run neural networks and needed a newer OS to install the latest NVIDIA drivers. I have an NVIDIA GeForce GTX 1650 GPU. In Ubuntu 18 I installed the NVIDIA 430 drivers, and when the OS was upgraded to Ubuntu 20.04 the drivers were automatically updated to version 525, which is the version recommended for my card on the official NVIDIA drivers website: https://www.nvidia.com/download/driverResults.aspx/199656/en-us/

When processes use a lot of memory (playing videos, loading a lot of data in Firefox, or running the neural networks), my computer starts slowing down and the mouse pointer gets choppy. In the nvidia-smi output captured just before the freeze, the GPU temperature reaches 95°C and GPU-Util goes to 100%. Then the whole system goes into a deep freeze: the mouse and keyboard stop responding and the audio gets stuck in a loop. The only way out of the freeze is a hard reset with the power button.
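To keep a record of what the GPU was doing right before a freeze, the readings could be logged to disk. This is a minimal sketch, assuming nvidia-smi from the installed driver is on the PATH; the gpu_flag helper, the 90°C threshold and the log path are hypothetical and only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: takes one "temperature, utilization" CSV line as printed by
#   nvidia-smi --query-gpu=temperature.gpu,utilization.gpu --format=csv,noheader,nounits
# and labels it WARN when the temperature is at or above the limit (default 90, an assumption).
gpu_flag() {
    limit=${2:-90}
    temp=$(printf '%s' "$1" | cut -d, -f1 | tr -d ' ')
    util=$(printf '%s' "$1" | cut -d, -f2 | tr -d ' ')
    if [ "$temp" -ge "$limit" ]; then
        echo "WARN temp=${temp}C util=${util}%"
    else
        echo "ok temp=${temp}C util=${util}%"
    fi
}

# Example with a made-up reading matching the values described above:
gpu_flag "95, 100"

# Logging loop (left commented out; it needs the GPU and runs until interrupted):
# nvidia-smi --query-gpu=temperature.gpu,utilization.gpu \
#     --format=csv,noheader,nounits -l 5 |
# while read -r line; do
#     echo "$(date +%T) $(gpu_flag "$line")" >> "$HOME/gpu.log"
#     sync   # flush to disk so the last lines survive a hard reset
# done
```

After a hard reset, the tail of the log shows the last readings that made it to disk.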

I see there are many similar questions related to this problem in this version of Ubuntu:

  • How to find out why Ubuntu 20.04 freezes?
  • Ubuntu 20.04 LTS freezes randomly - Suspecting Nvidia
  • Ubuntu 20.04 random freezes
  • Ubuntu 20.04 random freeze ups
  • Complete freezing - Ubuntu 20.04, probable problem with AMD driver

In most of those questions the problem was related to the BIOS version, but in some of the posts the swap size was 2 GB or 4 GB, whereas mine is only 976 MiB. I have no idea whether my problem is related to swap. My knowledge of Ubuntu and drivers is pretty limited. If anyone can help I would be really grateful; this is getting super frustrating and long.

Here is some useful info:

free -h result:

                 total        used        free      shared  buff/cache   available
  Mem:            15Gi       2,8Gi        10Gi        36Mi       2,2Gi        12Gi
  Swap:          976Mi          0B       976Mi

sysctl vm.swappiness result:

vm.swappiness = 60

sudo lshw -C memory result:

PCI (sysfs)  
  *-firmware                
       description: BIOS
       vendor: American Megatrends Inc.
       physical id: 1
       version: E16S3IMS.108
       date: 11/18/2019
       size: 64KiB
       capacity: 16MiB
       capabilities: pci upgrade shadowing cdboot bootselect edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
  *-memory
       description: System Memory
       physical id: 3b
       slot: System board or motherboard
       size: 16GiB
     *-bank:0
          description: SODIMM DDR4 Synchronous 2667 MHz (0,4 ns)
          product: M471A2K43CB1-CTD
          vendor: Samsung
          physical id: 0
          serial: 36BD8D3D
          slot: ChannelA-DIMM0
          size: 16GiB
          width: 64 bits
          clock: 2667MHz (0.4ns)
     *-bank:1
          description: [empty]
          physical id: 1
          slot: ChannelB-DIMM0
  *-cache:0
       description: L1 cache
       physical id: 45
       slot: L1 Cache
       size: 384KiB
       capacity: 384KiB
       capabilities: synchronous internal write-back unified
       configuration: level=1
  *-cache:1
       description: L2 cache
       physical id: 46
       slot: L2 Cache
       size: 1536KiB
       capacity: 1536KiB
       capabilities: synchronous internal write-back unified
       configuration: level=2
  *-cache:2
       description: L3 cache
       physical id: 47
       slot: L3 Cache
       size: 12MiB
       capacity: 12MiB
       capabilities: synchronous internal write-back unified
       configuration: level=3
  *-memory UNCLAIMED
       description: RAM memory
       product: Intel Corporation
       vendor: Intel Corporation
       physical id: 14.2
       bus info: pci@0000:00:14.2
       version: 00
       width: 64 bits
       clock: 33MHz (30.3ns)
       capabilities: pm cap_list
       configuration: latency=0
       resources: memory:d5418000-d5419fff memory:d541d000-d541dfff

htop output just before freezing, taken after increasing the swap memory: (htop screenshot)
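Since a screenshot only catches one moment, the same numbers could also be logged as text. A minimal sketch, assuming a Linux /proc/meminfo; mem_snapshot and the log path are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints "MemAvailable_kB SwapFree_kB" from a meminfo-style
# file (defaults to /proc/meminfo).
mem_snapshot() {
    awk '/^MemAvailable:/ {a=$2} /^SwapFree:/ {s=$2} END {print a, s}' \
        "${1:-/proc/meminfo}"
}

# One reading from the running system, if /proc/meminfo is readable:
if [ -r /proc/meminfo ]; then
    mem_snapshot
fi

# Logging loop (left commented out; it runs until interrupted):
# while true; do
#     echo "$(date +%T) $(mem_snapshot)" >> "$HOME/mem.log"
#     sync   # flush to disk so the last lines survive a hard reset
#     sleep 5
# done
```

If the last lines before a freeze still show plenty of MemAvailable and SwapFree, that would suggest memory exhaustion is not the trigger.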

  • Ubuntu releases using the year format are specialist flavors of Ubuntu Server and not intended for desktop operation; 20 & 20.04 are thus different products. When you upgrade 16 to 18 (or 20) no user apps change, which differs from the upgrades of 16.04->18.04->20.04, so were you using 16.04? 18.04? as they differ to 16 & 18 (being specialist flavors of 16.04 & 18.04) – guiverc Mar 17 '23 at 23:45
  • Yes my original OS was Ubuntu 16.04, then I upgraded to Ubuntu 18.04 and now it is Ubuntu 20.04.6 LTS – Ferroeg92 Mar 18 '23 at 12:38
  • Please be precise with details; your question mentions 18 (Ubuntu Core 18 is a flavor of 18.04 Server and a different product). You can upgrade from 16.04 to 18.04, but not from 16.04 to 18, and Nvidia drivers won't install on 18 (only 18.04) – guiverc Mar 18 '23 at 21:59

1 Answer

According to the output of free -h, you only have 976 MiB of swap (none of it was in use in that snapshot). With so little swap, the kernel has almost nowhere to page things out to once physical memory fills up under a heavy load, and that can lead to your system's death.

Quick, free fix: increase your swap size. On my system I have 8GB of RAM and use 24GB of swap, but anything above 25GB should be good for you (the general rule of thumb is that swap should be 2x the physical RAM size, but you can add more if you run into problems in the future).
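For reference, a rough sketch of how the resize could look. It assumes swap lives in a file at /swapfile on a filesystem where fallocate is suitable for swap files (e.g. ext4); if swapon --show lists a partition instead, these commands do not apply. suggest_swap_gib and print_swap_commands are hypothetical helpers, and the script only prints the commands so they can be reviewed before running them.

```shell
#!/bin/sh
# Hypothetical helper: swap size in GiB from the 2x-RAM rule of thumb above.
suggest_swap_gib() {
    ram_gib=$1
    echo $((ram_gib * 2))
}

# Hypothetical helper: prints (does NOT run) the commands to recreate a swap file.
# ASSUMPTION: swap is a file (default /swapfile), not a partition.
print_swap_commands() {
    size_gib=$1
    swapfile=${2:-/swapfile}
    cat <<EOF
sudo swapoff $swapfile
sudo fallocate -l ${size_gib}G $swapfile
sudo chmod 600 $swapfile
sudo mkswap $swapfile
sudo swapon $swapfile
EOF
}

# 16 GiB of RAM, as reported by lshw in the question:
print_swap_commands "$(suggest_swap_gib 16)"
```

Afterwards, free -h or swapon --show should report the new size. If the swap file is not already listed in /etc/fstab, it will not be enabled again after a reboot.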

Way more expensive, but results in a better experience: when Linux starts using your swap file, things typically start getting slow. I mean, the slow kind of slow. Cursor lag, waiting on browser tabs for more than 15 seconds; it's a nightmare unless you're keeping your swap on some kind of multi-thousand-dollar NVMe drive. So I recommend just buying more RAM to get the best performance. You have a free RAM slot according to the output of sudo lshw -C memory, so that's not going to be a problem.

  • I increased the swap memory to 4GB to test whether this had any impact, and the system is still freezing... I took a picture of htop and I see that the swap memory is not being used, and even the RAM is not saturated... but in the list of processes Xorg is demanding 24GB of virtual memory... I guess the problem is related to this Xorg process... I edited my question with the output of htop – Ferroeg92 Mar 18 '23 at 19:35
  • Does the issue happen when you are using Wayland? Google "how to switch to wayland ubuntu" and follow the instructions, then see if it happens again. – HackerDaGreat57 Mar 18 '23 at 20:26
  • @FernandaRodriguez With the extra swap space, run htop and watch it when starting the neural networks. If your RAM is still maxing out, that's probably your problem, and it's possible that the AI you're trying to run is too demanding for your hardware. You might need more swap space yet still. I'd recommend looking into zswap (which lets you combine compressed RAM and swap space in an efficient manner) if you can't buy more RAM. Of course, the ideal solution is to purchase more RAM or possibly a more powerful computer. – ArrayBolt3 Mar 19 '23 at 04:07
  • @HackerDaGreat57 thanks for pointing this out. I tried using Wayland: I modified the /etc/gdm3/custom.conf file and then ran sudo systemctl restart gdm3 to switch to Wayland on the lock screen, but even after doing this I don't see Wayland listed. My only options are "Ubuntu", "Unity", or "Gnome"... so I think there is something going on here – Ferroeg92 Mar 20 '23 at 02:40
  • @Ferroeg92 I've messed around with Wayland/Xorg switching while installing NVIDIA drivers and it really isn't pretty. I had to reinstall my system so make sure you take backups of EVERYTHING before modifying ANYTHING.

    I just launched the built in updater that comes with Ubuntu a few months ago and somehow it just switched over to Wayland. I have no idea how or why, the OS just decided "alright let's use wayland" on the next reboot AFAIK; I have no explanation and I wish you luck.

    But just in case it does work you might want to restart your entire computer after editing gdm3/custom.conf

    – HackerDaGreat57 Mar 21 '23 at 02:18
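A quick way to check which session is actually running, and whether Wayland is switched off in the custom.conf file mentioned in the comments, might look like this. It is only a sketch: wayland_disabled_in is a hypothetical helper, and it only looks for an uncommented WaylandEnable=false line in the file; other mechanisms that can disable Wayland are not checked.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given GDM config file contains an
# uncommented "WaylandEnable=false" line.
wayland_disabled_in() {
    grep -Eq '^[[:space:]]*WaylandEnable[[:space:]]*=[[:space:]]*false' "$1"
}

# XDG_SESSION_TYPE is normally "x11" or "wayland" inside a graphical session.
echo "Session type: ${XDG_SESSION_TYPE:-unknown}"

conf=/etc/gdm3/custom.conf
if [ -r "$conf" ]; then
    if wayland_disabled_in "$conf"; then
        echo "Wayland is disabled in $conf"
    else
        echo "No uncommented WaylandEnable=false in $conf"
    fi
fi
```

Run it from a terminal inside the desktop session, since XDG_SESSION_TYPE is usually not set in other contexts such as an SSH login.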