Ubuntu crashing - memtest86 shows failed tests - can tweaking UEFI help or Grub BadRam?

Question

I was asked to fix an out-of-warranty laptop where Windows 10 S would hang for hours on boot and never launch. After trying everything I could think of, I gave up and installed Ubuntu.

Ubuntu launches and works for a few minutes until it freezes (usually after launching Chrome).

I have run memtest86 and it shows RAM errors (I can't replace the RAM as it is soldered in). I want to mark the bad RAM with BadRAM, but I can't identify all of it.

My question: memtest86 passes tests 0-5 and tests 10 and 13. It fails tests 6, 7, 8, and 9. Any ideas why this may be happening, and whether GRUB_BADRAM or tweaking RAM settings in UEFI would help?

Test                                        Passed  (%)     Errors
Test 0 [Address test, walking ones, 1 CPU]  4/4     100%    0
Test 1 [Address test, own address, 1 CPU]   4/4     100%    0
Test 2 [Address test, own address]          4/4     100%    0
Test 3 [Moving inversions, ones & zeroes]   4/4     100%    0
Test 4 [Moving inversions, 8-bit pattern]   4/4     100%    0
Test 5 [Moving inversions, random pattern]  4/4     100%    0
Test 6 [Block move, 64-byte blocks]         0/4     0%      94
Test 7 [Moving inversions, 32-bit pattern]  0/4     0%      29
Test 8 [Random number sequence]             0/4     0%      41
Test 9 [Modulo 20, ones & zeros]            0/4     0%      177
Test 10 [Bit fade test, 2 patterns, 1 CPU]  4/4     100%    0
Test 13 [Hammer test]                       4/4     100%    0

The RAM is 4 GB, reported by memtest86 as 4 DIMM slots, each with:

1GB LPDDR4 PC4-17000
SK Hynix / H9HCNNN8KUMLHR / 00000000
2133 MHz

The last 10 errors (in memtest86) were:

2020-07-09 16:17:43 - [Data Error] Test: 9, CPU: 1, Address: 178A4D6B4, Expected: 90BD5162, Actual: 90BD5160
2020-07-09 16:17:24 - [Data Error] Test: 9, CPU: 1, Address: 17052573C, Expected: 4230F0F6, Actual: 4230F0F4
2020-07-09 16:17:08 - [Data Error] Test: 9, CPU: 0, Address: 17795431C, Expected: 0146A628, Actual: 0146A62A
2020-07-09 16:16:52 - [Data Error] Test: 9, CPU: 1, Address: 168B0F3B4, Expected: 9E286A4A, Actual: 9E286A48
2020-07-09 16:15:57 - [Data Error] Test: 9, CPU: 0, Address: 15F787234, Expected: 43135D5C, Actual: 43135D5E
2020-07-09 16:15:54 - [Data Error] Test: 9, CPU: 1, Address: 1501AE294, Expected: A0EE6E32, Actual: A0EE6E30
2020-07-09 16:15:13 - [Data Error] Test: 9, CPU: 0, Address: 14E3C71F8, Expected: 0AC0FD2E, Actual: 0AC0FD2C
2020-07-09 16:14:59 - [Data Error] Test: 9, CPU: 1, Address: 1424E7234, Expected: C0BE5E14, Actual: C0BE5E16
2020-07-09 16:14:29 - [Data Error] Test: 9, CPU: 1, Address: 13B3A53BC, Expected: 6368E450, Actual: 6368E452
2020-07-09 16:14:04 - [Data Error] Test: 9, CPU: 1, Address: 128174E74, Expected: AAD22C33, Actual: AAD22C31

Thanks

Bit 1 is stuck low on some memory. Cut your losses and toss the computer. — Doug Smythies, Jul 09 '20 at 23:53
Thanks very much for your detailed response. I'm sorry I have been away since posting this question, but I have tried the steps below and I think I am on my way. The system is more stable, but I still need to block more RAM (there are a lot of addresses and they are not in one contiguous range). I will continue to work on this. — fgadev, Aug 03 '20 at 19:21
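
The observation about bit 1 can be checked against the log lines in the question by XORing each expected value with the actual value. The following is a minimal sketch, not part of the original thread; the helper name `flipped_bits` is made up for illustration, and the value pairs are copied from the ten logged errors.

```python
# (expected, actual) pairs copied from the memtest86 log in the question.
errors = [
    (0x90BD5162, 0x90BD5160),
    (0x4230F0F6, 0x4230F0F4),
    (0x0146A628, 0x0146A62A),
    (0x9E286A4A, 0x9E286A48),
    (0x43135D5C, 0x43135D5E),
    (0xA0EE6E32, 0xA0EE6E30),
    (0x0AC0FD2E, 0x0AC0FD2C),
    (0xC0BE5E14, 0xC0BE5E16),
    (0x6368E450, 0x6368E452),
    (0xAAD22C33, 0xAAD22C31),
]


def flipped_bits(expected, actual):
    """Return the bit positions (0 = least significant) where the words differ."""
    diff = expected ^ actual
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


for expected, actual in errors:
    diff = expected ^ actual
    # If the differing bit was set in the expected word, it was read back as 0.
    direction = "1->0" if expected & diff else "0->1"
    print(f"{expected:08X} vs {actual:08X}: bits {flipped_bits(expected, actual)} {direction}")
```

For these ten entries every pair differs only in bit 1. In this small sample the bit is wrong in both directions (sometimes read as 0 when 1 was written, sometimes the reverse), so the sample alone may not be enough to say which way it fails.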

heynnema · Accepted Answer · 2020-07-10T23:04:41.310

If you look in /etc/default/grub, you'll find a GRUB_BADRAM= parameter where you can list the bad memory locations that should be excluded.

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

Also see How to install Ubuntu on a laptop with soldered RAM module that has damaged cells

Source: https://help.ubuntu.com/community/BadRAM#BADRAM_setting_in_Grub2

BADRAM setting in Grub2

The GRUB2 config file ~~in Natty~~ has a line for configuring kernel bad ram exclusions. So, I will assume that is the preferred way of mapping out a section of memory that is showing errors. The line I set was

GRUB_BADRAM="0x7DDF0000,0xffffc000"

The way suggested on every web site I could find to set this was to run memtest86 and let it show you the BadRAM settings. memtest86 gave me a page of stuff I would have had to enter. I could see that all the addresses were in one 16K block, so I just wanted to map that 16K block out of action. Here is how I generated the correct entry.

The first parameter is easy. That is the base address of the bad memory. In my case, I could see that all the bad addresses were greater than 0x7DDF0000 and less than 0x7DDF4000. So, I took the beginning of the 16K block as my starting address.

The second parameter is a mask. You put 1s where the address range you want shares the same values and 0s where it will vary. This means you need to pick your address range such that only the low order bits vary. Looking at my address, the first part of the mask is easy. You want to start with 0xffff. For the next nibble, I will explain with bit maps. I want to range from 0000 to 0011. So, the mask for badram would be 1100 or a hex c. The last 3 nibbles need to be all 0s in the mask, since we want the entire range mapped out. So, we get a total result of 0xffffc000.
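
The base and mask derivation described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `badram_pair` and the sample addresses are hypothetical, and the 32-bit width matches the example in the text (it is a parameter because addresses above 4 GB would need a wider mask).

```python
def badram_pair(addresses, width=32):
    """Return (base, mask) for the smallest aligned power-of-two block
    that contains every address in `addresses`.

    `width` is the assumed address width in bits (32 matches the example).
    """
    lo, hi = min(addresses), max(addresses)
    # Bits at or below the highest bit that differs between lo and hi may vary.
    varying = (lo ^ hi).bit_length()
    full = (1 << width) - 1
    mask = full & ~((1 << varying) - 1)  # 1s where all addresses agree
    base = lo & mask                     # start of the block
    return base, mask


# Hypothetical bad addresses inside the 16K block from the text.
sample = [0x7DDF0010, 0x7DDF2A44, 0x7DDF3FF0]
base, mask = badram_pair(sample)
print(f'GRUB_BADRAM="0x{base:08x},0x{mask:08x}"')
```

With these sample addresses the result is base 0x7ddf0000 and mask 0xffffc000, the same pair as in the text. If the bad addresses straddle an alignment boundary, the block found this way can be much larger than the span between the lowest and highest address, so it may be worth splitting the list into several pairs instead.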

After setting this line in /etc/default/grub, I ran sudo update-grub and rebooted and my bad memory was no longer being used. No kernel patches are needed to map out bad memory using this method.
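
Before rebooting, it can help to sanity-check that the chosen pairs cover the addresses memtest86 reported. The sketch below assumes the matching rule described above (an address is excluded when it agrees with the base on every bit where the mask is 1); the helper names `parse_badram` and `is_filtered` are hypothetical and not part of GRUB.

```python
def parse_badram(value):
    """Split a GRUB_BADRAM-style string into a list of (base, mask) pairs."""
    numbers = [int(token, 16) for token in value.split(",")]
    if len(numbers) % 2:
        raise ValueError("expected an even number of values (base,mask pairs)")
    return list(zip(numbers[0::2], numbers[1::2]))


def is_filtered(address, pairs):
    """True if the address matches any pair on all bits set in that pair's mask."""
    return any((address & mask) == (base & mask) for base, mask in pairs)


pairs = parse_badram("0x7DDF0000,0xffffc000")
for addr in (0x7DDF0000, 0x7DDF3FFF, 0x7DDF4000):
    print(hex(addr), "excluded" if is_filtered(addr, pairs) else "still in use")
```

Here the first two addresses fall inside the 16K block and the third lies just past its end, so only the first two are reported as excluded.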

Also see https://askubuntu.com/questions/1222971/how-to-install-ubuntu-on-a-laptop-with-soldered-ram-module-that-has-damaged-cell/1222980#1222980 — heynnema, Jul 10 '20 at 22:33
