
Built a new PC: i7-10700K, MSI Z490 board, 32 GB RAM, etc. It runs Ubuntu 20.04, and I'm getting an MCE (Machine Check Exception) message: kernel: [ 0.185888] mce: CPU0: Thermal monitoring enabled (TM1).

The PC only uses the Game Boost setting from the MSI BIOS and is not manually overclocked. It is cooled by a Corsair H115 2×140 mm cooler, and idle temperatures are at most 32 °C per core.

Apart from these MCE errors, the PC sometimes freezes completely when left idle, and a forced restart is the only way out. What could be causing this?

  • Go to https://www.memtest86.com/ and download/run their free memtest to test your memory. Get at least one complete pass of all the 4/4 tests to confirm good memory. This may take many hours to complete. – heynnema Dec 02 '20 at 23:13
  • I have run memtest for hours as suggested, and the RAM passed. What else could it be? – Matthew Vella Dec 08 '20 at 18:36
  • What processor? Show me sudo dmidecode -s bios-version and tell me the EXACT model # of your motherboard. MSI Z490-what? Show me free -h and sudo lshw -C memory and sysctl vm.swappiness. If you turn off "Game Boost" in the BIOS, any difference? Start comments to me with @heynnema or I'll miss them. – heynnema Dec 08 '20 at 18:41
  • @heynnema sudo dmidecode -s bios-version: 1.00. Motherboard: MSI Z490 Gaming Edge WiFi – Matthew Vella Dec 09 '20 at 18:22
  • free -h:
                   total    used     free     shared   buff/cache   available
    Mem:           31Gi     2.5Gi    26Gi     419Mi    2.1Gi        27Gi
    – Matthew Vella Dec 09 '20 at 18:23
  • *-firmware
         description: BIOS
         vendor: American Megatrends Inc.
         physical id: 0
         version: 1.00
         date: 03/24/2020
         size: 64KiB
         capacity: 32MiB
         capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
    – Matthew Vella Dec 09 '20 at 18:26
  • *-memory
         description: System Memory
         physical id: 39
         slot: System board or motherboard
         size: 32GiB
    – Matthew Vella Dec 09 '20 at 18:26
  • vm.swappiness = 60 – Matthew Vella Dec 09 '20 at 18:26
  • @heynnema I turned off both Game Boost and XMP, but the problem persists. The memory was also tested, and I don't think this is a swap problem, since I have 32 GB of RAM. – Matthew Vella Dec 09 '20 at 18:28
  • @heynnema Thanks i will! – Matthew Vella Dec 09 '20 at 18:47



memtest

Go to https://www.memtest86.com/ and download/run their free memtest to test your memory. Get at least one complete pass of all the 4/4 tests to confirm good memory. This may take many hours to complete.

BIOS

MSI Z490 Gaming Edge WiFi

You have BIOS version 1.00.

There's a newer BIOS, version 1.30 dated 10/14/2020, that might help solve your problem. It can be downloaded here.

Note: Confirm that I have the correct web page for your motherboard model #.

Note: Have good backups before updating the BIOS.

/swapfile

Even though you have a lot of memory, a swap partition or /swapfile is still recommended. Let's create a fresh /swapfile...

Note: Incorrect use of the dd command can cause data loss. Copy/paste these commands rather than retyping them.

In the terminal...

sudo swapoff -a           # turn off swap
sudo rm -i /swapfile      # remove old /swapfile

sudo dd if=/dev/zero of=/swapfile bs=1M count=4096

sudo chmod 600 /swapfile  # set proper file protections
sudo mkswap /swapfile     # init /swapfile
sudo swapon /swapfile     # turn on swap
free -h                   # confirm 32G RAM and 4G swap

Confirm this /swapfile line at the end of /etc/fstab... and confirm no other “swap” lines...

To edit, use sudo -H gedit /etc/fstab or sudo pico /etc/fstab

/swapfile  none  swap  sw  0  0

reboot                    # reboot and verify operation
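If you'd rather not eyeball /etc/fstab, the "confirm this line and no other swap lines" check can be scripted. This is a minimal sketch, assuming bash and the standard fstab layout where the third field is the filesystem type; the function name check_swap_fstab is hypothetical and not part of any package.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only): check that an fstab-style file has
# exactly one active swap entry and that this entry is /swapfile.
check_swap_fstab() {
    local fstab="${1:-/etc/fstab}"
    local swap_lines count
    # Non-comment lines whose third field (filesystem type) is "swap".
    swap_lines=$(awk '$1 !~ /^#/ && $3 == "swap"' "$fstab")
    # grep -c . counts non-empty lines; it prints 0 when there are none.
    count=$(printf '%s' "$swap_lines" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected 1 swap entry in $fstab, found $count" >&2
        return 1
    fi
    # The single entry must mount /swapfile (first field).
    if [ "$(printf '%s\n' "$swap_lines" | awk '{print $1}')" != "/swapfile" ]; then
        echo "the only swap entry in $fstab is not /swapfile" >&2
        return 1
    fi
    echo "OK: $fstab has exactly one swap entry, /swapfile"
}
```

Run it as check_swap_fstab /etc/fstab (no sudo needed, since /etc/fstab is world-readable). After the reboot, swapon --show should list /swapfile with a size of 4G.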

vm.swappiness

Since you have 32G RAM, to minimize swapping, let's change this value...

Set vm.swappiness=10 (based on 32G RAM and 4G SWAP), this way...

sudo -H gedit /etc/sysctl.conf # edit this file

Search for an existing vm.swappiness= entry...

Press Ctrl+F and search for vm.swappiness

  • If found, edit it to say vm.swappiness=10

  • If not found, add vm.swappiness=10 at the end of the file

Save your edits and quit gedit

sudo sysctl -p
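The "if found, edit it; if not found, add it" steps can also be done without an editor. Below is a minimal sketch, assuming bash and GNU sed (the default on Ubuntu); set_swappiness is a hypothetical function name. Commented-out lines such as #vm.swappiness=60 are left alone, matching the manual steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only): set vm.swappiness in a sysctl-style
# config file, editing an existing active entry or appending a new one.
set_swappiness() {
    local conf="${1:-/etc/sysctl.conf}"
    local value="${2:-10}"
    local pattern='^[[:space:]]*vm\.swappiness[[:space:]]*='
    if grep -Eq "$pattern" "$conf"; then
        # Found: rewrite every active (uncommented) vm.swappiness line.
        sed -Ei "s/${pattern}.*/vm.swappiness=${value}/" "$conf"
    else
        # Not found: make sure the file ends with a newline, then append.
        if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
            echo >> "$conf"
        fi
        printf 'vm.swappiness=%s\n' "$value" >> "$conf"
    fi
}
```

To apply it for real, paste the function into a root shell (sudo -i), run set_swappiness /etc/sysctl.conf 10, and then sysctl -p. Afterwards, cat /proc/sys/vm/swappiness should print 10.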

Update #1:

Setting intel_idle.max_cstate=1 in /etc/default/grub, then running sudo update-grub and rebooting, solved the problem.
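On Ubuntu, kernel parameters like this one normally go on the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub, for example GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_idle.max_cstate=1". Here is a minimal sketch of that edit, assuming bash, GNU sed, and a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line; add_kernel_param is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only): append a kernel parameter to
# GRUB_CMDLINE_LINUX_DEFAULT in a grub defaults file, unless already present.
add_kernel_param() {
    local grub_file="${1:-/etc/default/grub}"
    local param="${2:-intel_idle.max_cstate=1}"
    local key='^GRUB_CMDLINE_LINUX_DEFAULT='
    if ! grep -q "$key" "$grub_file"; then
        echo "no GRUB_CMDLINE_LINUX_DEFAULT line in $grub_file" >&2
        return 1
    fi
    # Already present? -w keeps ...max_cstate=1 from matching ...max_cstate=10.
    if grep "$key" "$grub_file" | grep -qwF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "/${key}/ s/\"\$/ ${param}\"/" "$grub_file"
    # Succeed only if the parameter is now on the line.
    grep "$key" "$grub_file" | grep -qwF -- "$param"
}
```

After editing /etc/default/grub as root, run sudo update-grub and reboot. Then cat /proc/cmdline should include intel_idle.max_cstate=1, and cat /sys/module/intel_idle/parameters/max_cstate should print 1.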

heynnema
  • I will update the BIOS. By any chance, does this motherboard have dual UEFI/BIOS chips or a single chip? Just in case the BIOS gets corrupted. – Matthew Vella Dec 09 '20 at 18:44
  • @MatthewVella I don't know. Some manufacturers build in some redundancy to safeguard the process. – heynnema Dec 09 '20 at 18:45
  • @MatthewVella Status please... – heynnema Dec 10 '20 at 15:55
  • I have updated the BIOS to the latest version. I will report back this week on whether the problem is solved or persists. Thanks – Matthew Vella Dec 11 '20 at 12:19
  • I have also updated the swappiness value to 10, worked great! – Matthew Vella Dec 11 '20 at 12:19
  • @MatthewVella Good! Did you make the /swapfile? – heynnema Dec 11 '20 at 15:03
  • Yes, thanks for the guide! If I have no issues for the next 3 days, I will mark it as the correct solution, as I usually have at least one issue every 3 days. – Matthew Vella Dec 11 '20 at 23:30
  • Unfortunately, the problem still persists: more MCE errors, and the PC froze again, turning off all devices, including the USB ports and the Wi-Fi and Bluetooth chips. – Matthew Vella Dec 13 '20 at 13:57
  • @MatthewVella If I had to wager, I'd say a bad CPU. There's an outside chance it's the power supply, so if you have another one, give it a try. – heynnema Dec 13 '20 at 14:47
  • @MatthewVella Status please... – heynnema Dec 24 '20 at 19:28
  • The PC is much more stable, so I marked your answer as correct. I do still believe there are motherboard issues: the PSU is working well and the CPU is ploughing through benchmarks, but as soon as the PC is idle, it locks up and shuts down all the hardware. This happens less frequently since the BIOS update. – Matthew Vella Dec 27 '20 at 11:30
  • @MatthewVella You may have a problem with a cstate parameter that needs tweaking. Search for cstate or c-state here on AU, and look for descriptions of a GRUB mod similar to intel_idle.max_cstate=1. Examples: https://askubuntu.com/questions/749349/how-to-set-intel-idle-max-cstate-1 and https://askubuntu.com/questions/896149/can-i-simulate-the-intel-bay-trail-processor-cstate-crash/896162#896162 – heynnema Dec 27 '20 at 15:02
  • Limiting the C-state to 1 completely solved the issue! Thanks! – Matthew Vella Feb 15 '21 at 13:12