
Recently, almost every time I run Ubuntu, the operating system reports an internal error. I believe my current version of xorg is partially responsible, but I've also received many kerneloops errors, none of which occurred while 4.4.0-31 was the kernel in use. I therefore want to downgrade my kernel from 4.4.0-83 to 4.4.0-31.

I've changed my grub file according to the instructions in

Set "older" kernel as default grub entry

but after rebooting, 4.4.0-83 is still the kernel in use. The instructions in

Grub does not autoboot the default option after upgrade to 12.10

did not fix the issue (though I'm using 14.04). Now, when choosing "advanced options" in grub, the 4.4.0-31 kernel is the default selection. But if I boot using the advanced options, I am taken to a tty1 screen, which I can't exit. I tried the commands in

How can I leave tty?

but received either no response or an error message. Below is my grub file (minus commented out lines):

GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 4.4.0-31-generic"
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX=""
GRUB_RECORDFAIL_TIMEOUT=0

Let me know if there are any commands I should run that may help identify the problem.
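One way to check that the GRUB_DEFAULT string above matches the menu titles exactly is to print the titles GRUB generated. This is only a minimal sketch: the function name list_grub_titles is made up for illustration, and it assumes the generated file is at /boot/grub/grub.cfg and quotes titles with single quotes, as the stock Ubuntu file normally does.

```shell
# Hypothetical helper: print submenu and menuentry titles from a grub.cfg-style file.
# Assumes each title is the first single-quoted string on its line.
list_grub_titles() {
    awk -F"'" '/^[ \t]*(menuentry|submenu) / { print $2 }' "$1"
}

# Example call (path is an assumption, adjust if your grub.cfg lives elsewhere):
# list_grub_titles /boot/grub/grub.cfg
```

The GRUB_DEFAULT value should be the submenu title and the entry title joined with ">". Edits to /etc/default/grub normally only take effect after running sudo update-grub, and uname -r after a reboot shows which kernel actually booted.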

EDIT 1

Here is the output from entering ls -alt /var/crash

total 71060
-rw-r-----  1 root     whoopsie  1512336 Jul 24 19:47 _usr_bin_Xorg.0.crash
drwxrwsrwt  2 root     whoopsie     4096 Jul 24 19:47 .
-rw-------  1 whoopsie whoopsie        0 Jul 24 16:36 _usr_bin_Xorg.0.uploaded
-rw-r--r--  1 root     whoopsie        0 Jul 24 16:36 _usr_bin_Xorg.0.upload
-rw-rw----  1 root     whoopsie        0 Jul 24 01:55 .lock
-rw-r-----  1 kernoops whoopsie     8445 Jul 24 00:55 linux-image-4.4.0-83-generic.233306.crash
-rw-------  1 whoopsie whoopsie        0 Jul 23 23:37 _opt_google_chrome_chrome.1000.uploaded
-rw-rw-r--  1 zachary  whoopsie        0 Jul 23 23:37 _opt_google_chrome_chrome.1000.upload
-rw-r-----  1 zachary  whoopsie 58735028 Jul 23 23:37 _opt_google_chrome_chrome.1000.crash
-rw-------  1 whoopsie whoopsie        0 Jul 23 21:59 linux-image-4.4.0-83-generic.285645.uploaded
-rw-r--r--  1 root     whoopsie        0 Jul 23 21:59 linux-image-4.4.0-83-generic.285645.upload
-rw-r-----  1 kernoops whoopsie     8789 Jul 23 21:55 linux-image-4.4.0-83-generic.285645.crash
-rw-r-----  1 kernoops whoopsie     7976 Jul 23 15:07 linux-image-4.4.0-83-generic.220593.crash
-rw-r-----  1 kernoops whoopsie     8746 Jul 23 15:06 linux-image-4.4.0-83-generic.255332.crash
-rw-------  1 whoopsie whoopsie        0 Jul 23 15:06 ttf-mscorefonts-installer.0.uploaded
-rw-r--r--  1 root     whoopsie        0 Jul 23 15:06 ttf-mscorefonts-installer.0.upload
-rw-r-----  1 root     whoopsie   153662 Jul 23 15:06 ttf-mscorefonts-installer.0.crash
-rw-r--r--  1 kernoops whoopsie     3484 Jul 23 03:10 linux-image-4.4.0-83-generic.245092.crash
-rw-r-----  1 zachary  whoopsie 12051671 Jul 19 01:52 _usr_bin_compiz.1000.crash
-rw-r-----  1 zachary  whoopsie   238085 Jul 18 10:44 _usr_lib_dconf_dconf-service.1000.crash
-rw-r--r--  1 kernoops whoopsie     2823 Jul 16 14:03 linux-image-4.4.0-83-generic.215830.crash
drwxr-xr-x 14 root     root         4096 May 21 23:22 ..

of free -h

             total       used       free     shared    buffers     cached
Mem:           62G       1.8G        61G        16M        40M       626M
-/+ buffers/cache:       1.1G        61G
Swap:          29G         0B        29G

and of swapon -s

Filename                Type        Size    Used    Priority
/dev/sda6                               partition   31250428    0   -1

Also, setting GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset" completely broke my installation, but I hadn't rebooted at the time of writing my original post. I fixed it by changing the line back to GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" in recovery mode. I had made this change after reading a post that I can no longer find.

EDIT 2

An image of the MemTest run

MemTest

EDIT 3

In response to:

(heynnema) Looks like you've got a hardware problem, as I suspected. It's picking up a high bit in the data bus. First thing to do is reseat your memory sticks in their current slots. Power off the computer, unplug it from the AC, hold down the power button for 5 seconds, release and reinsert each memory stick, then rerun memtest. What is your current RAM config? How many sticks of what sizes? Report back. ps: do you have intel-microcode installed?

I was only able to reseat two of my memory sticks because the CPU and water cooler cords completely covered the other two, and I wasn't comfortable removing those components. I reran MemTest, trying both individual cores and all in parallel, and it freezes on test 2 like before.

My desktop memory is DDR4 Corsair Vengeance: four sticks of 16GB each, for a total of 64GB.

Here is the output of entering dmesg | grep microcode

[    8.808196] microcode: CPU0 sig=0x406f1, pf=0x4, revision=0xb00001c
[    8.808205] microcode: CPU1 sig=0x406f1, pf=0x4, revision=0xb00001c
[    8.808217] microcode: CPU2 sig=0x406f1, pf=0x4, revision=0xb00001c
[    8.808252] microcode: CPU3 sig=0x406f1, pf=0x4, revision=0xb00001c
[    8.808289] microcode: CPU4 sig=0x406f1, pf=0x4, revision=0xb00001c
[    8.808326] microcode: CPU5 sig=0x406f1, pf=0x4, revision=0xb00001c
[    8.808338] microcode: CPU6 sig=0x406f1, pf=0x4, revision=0xb00001c
[    8.808350] microcode: CPU7 sig=0x406f1, pf=0x4, revision=0xb00001c
[    8.808363] microcode: CPU8 sig=0x406f1, pf=0x4, revision=0xb00001c
[    8.808375] microcode: CPU9 sig=0x406f1, pf=0x4, revision=0xb00001c
[    8.808388] microcode: CPU10 sig=0x406f1, pf=0x4, revision=0xb00001c
[    8.808399] microcode: CPU11 sig=0x406f1, pf=0x4, revision=0xb00001c
[    8.808445] microcode: Microcode Update Driver: v2.01 <tigran@aivazian.fsnet.co.uk>, Peter Oruba

I believe that means intel microcode is installed, according to Step F on Easy Linux Tips Project (I can't yet include more than two links).
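To see at a glance whether every core reports the same microcode revision, the dmesg lines above can be reduced to the distinct revision values. This is a rough sketch, not a definitive check: microcode_revisions is a hypothetical helper name, and it assumes the "revision=0x..." format shown in the output above.

```shell
# Hypothetical helper: read dmesg-style text on stdin and print each
# distinct microcode revision once.
microcode_revisions() {
    sed -n 's/.*microcode:.*revision=\(0x[0-9a-fA-F]*\).*/\1/p' | sort -u
}

# Example call:
# dmesg | microcode_revisions
```

A single line of output would suggest all cores agree. Note that these dmesg lines show the kernel's microcode driver is loaded; whether the intel-microcode package itself is installed can be checked separately with dpkg -s intel-microcode.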

EDIT 4

In response to heynnema:

ok, some progress. no way to reach the other two simms, eh? so try this next. remove the two simms that you can reach, and see if you can still boot, and/or run memtest. if it runs, it'll tell us that one of the two pulled simms may be defective

ps: another test that we can do is to run different single CPUs during memtest. So... if it fails with CPU #0, but runs with CPUs 1-11, we may have a defective CPU.

I first ran MemTest on each distinct, individual CPU. All resulted in a freeze on the second test. I then removed the two memory sticks that are easily accessible, booted up, and was able to run MemTest. I did not try to boot into any installation.

However, after putting the two memory sticks back in, I am unable to boot Windows or Ubuntu. Windows shows my desktop background but with a blue filter, and Ubuntu shows only the default Unity background. In Ubuntu, though, the computer was not completely frozen, as I could still enter tty1 through keyboard commands.

I ran MemTest, hoping it would give an indication of what went wrong, and it now fails on the first test. It says [CPU Error] Could not start CPU 0. I tried reseating the memory sticks again and it's still completely broken.

The Could not start CPU 0 error now occurs if I run MemTest with the two accessible memory sticks removed.

EDIT 5

I reseated the memory sticks again, and I can (sometimes) boot my Ubuntu installation, but Windows is even more broken. It simply leads to the blue screen with options to repair your computer. When I do successfully boot Ubuntu, the system will usually freeze upon any attempt to open an application.

EDIT 6

In response to heynnema:

You may have actually found the problem, but missed the clue. With the 2 accessible SIMMS removed, memtest ran, but right there you should have tried to boot Ubuntu and Windows to see how they ran. But instead, you put both SIMMS back in, memtest failed, and both OS's had trouble. Remove those same two SIMMS again, retest with memtest to confirm that it still works, then boot the OS's and see how they run! More steps coming after that test. Good luck! ps: with 2 SIMMS removed, confirm that the OS's think you have 32G RAM.

I removed the accessible SIMMS and booted the PC. I entered the terminal at the login screen and used the free -m command to check available RAM. It reported 32GB. The first attempt at logging in succeeded, but upon opening Google Chrome the system froze. The second attempt led to a black screen that said the graphics card could not be found. The third attempt led to a freeze after selecting Ubuntu in grub, just before the login screen appeared.
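For the RAM check mentioned above, the total can also be read straight from /proc/meminfo and converted to GiB. This is a small sketch under the assumption that MemTotal is reported in kB, as it is on Linux; mem_total_gib is a hypothetical helper name.

```shell
# Hypothetical helper: print total RAM in GiB from a meminfo-style file.
# MemTotal is given in kB, and 1 GiB = 1048576 kB.
mem_total_gib() {
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Example call (reads /proc/meminfo by default):
# mem_total_gib
```

The figure will usually come out a little below the installed amount, since the kernel and firmware reserve some memory.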

I found that tty1, entered from the login screen, was rather stable, and I could run many basic commands there without freezing, unlike when I actually log in. I'm not sure whether that's of any relevance, though.

EDIT 7

In response to heynnema:

You may very well have more than one problem. Power off the computer and reseat the video card. You may have to loosen a screw that holds its bracket down, and you may have to release a catch at the lower/front of the card, in order to be able to remove/reseat it. As far as the memory is concerned, what would it take for you to get to the other two? Do you need a technician to help you? Can you see the color of the four memory slots? Sometimes they're white, or black. And beside each socket, etched on the motherboard, is a designation like J0/J1/J2/J4... can you see those?

ps2: show me sudo dmidecode -t memory.

ps3: have you overclocked the CPU or memory?

I will be having someone take a look at the PC tomorrow. Still, I checked the colors of the memory slots, and all four are grey. The four other possible memory slots are all black. I didn't have time to open up my PC and look at the socket designations.

I ran sudo dmidecode -t memory and it displayed information on all my memory devices. I couldn't copy the text, and it took several screens so I didn't take a photo, but of note was that only two devices had identified sizes or manufacturers. Both were SIMMS, since they were Corsair brand and 16GB, but I had all four SIMMS in memory slots at the time. Otherwise, unknown and NA were all the details given for other devices.
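Since the dmidecode output ran over several screens, it may be easier to boil it down to one line per slot. The following is only a sketch: slot_summary is a hypothetical helper name, and it assumes the usual "Memory Device" layout where a "Size:" line comes before the "Locator:" line in each block.

```shell
# Hypothetical helper: read `dmidecode -t memory` text on stdin and print
# "<slot locator>: <size>" for every memory device block.
slot_summary() {
    awk '
        /^Memory Device/  { size = ""; next }
        /^[ \t]*Size:/    { sub(/^[ \t]*Size:[ \t]*/, ""); size = $0 }
        /^[ \t]*Locator:/ { sub(/^[ \t]*Locator:[ \t]*/, ""); print $0 ": " size }
    '
}

# Example call:
# sudo dmidecode -t memory | slot_summary
```

Redirecting the full output to a file (sudo dmidecode -t memory > memory.txt, where memory.txt is just an example name) also makes it possible to copy the text later instead of photographing the screen.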

I have not overclocked my CPU or memory.

EDIT 8

I had someone take a look at my computer. Two issues were found with the hardware:

1) Only two memory slots worked. The memory sticks themselves all worked but the motherboard was faulty. Strangely, MemTest initially picked up on 64GB of RAM, but that's no longer the case regardless of how the SIMMS are configured on the motherboard.

2) My GPUs were slightly too long for the motherboard and couldn't lock into their slots completely. There's a "sweet spot" where they work, but at some point while reseating my memory sticks, I must have jostled them.

While putting the GPUs back in better alignment and using only the two working memory slots has stopped the error messages (so far), it's not a permanent solution. I still have no answer for why the issues started when I upgraded to 4.4.0-83.

1 Answer


From the comments...

Let's gather some data first...

In terminal...

ls -alt /var/crash
free -h
swapon -s
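To get a quick count of which programs are producing the reports in /var/crash, something like the following could help. It is only a sketch: summarize_crashes is a hypothetical helper name, and it assumes report files are named <program>.<id>.crash, as in the listing in the question.

```shell
# Hypothetical helper: count *.crash files per program in a directory
# (defaults to /var/crash), most frequent first.
summarize_crashes() {
    dir="${1:-/var/crash}"
    for f in "$dir"/*.crash; do
        [ -e "$f" ] || continue            # skip if the glob matched nothing
        name=$(basename "$f" .crash)       # drop the .crash suffix
        printf '%s\n' "${name%.*}"         # drop the trailing .<id>
    done | sort | uniq -c | sort -rn | awk '{ print $1 " " $2 }'
}

# Example call:
# summarize_crashes /var/crash
```

When crashes are spread across the kernel, Xorg, and unrelated applications, that pattern tends to point toward hardware rather than one faulty package.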

The system is very unstable. Suggest we run memtest. Go to http://www.memtest86.com and download the free memory diagnostic and run it.

Sure enough, memtest fails... as I suspected... it's picking up a high bit on the data bus... troubleshooting further... reseating SIMMS... removing suspect SIMMS...

Update #1:

We removed the only 2 accessible SIMMS and now memtest runs. There may well be more than one hardware problem with this computer, because after we booted Ubuntu, it complained that it couldn't find the video card. We'll try to reseat it. We need to get access to the other 2 SIMMS that are difficult to reach. It may require a technician's help.

Take the 2 removed SIMMS, wrap them in foil, and take the computer to your favorite computer repair place, along with the memtest disc. Let them sort out the hardware problems, and report back. If we need to, we can continue testing any remaining software issues.

Update #2:

Final result... as I thought... there were 2 defective memory SIMM slots, and the GPUs needed to be reseated. Recommend logging a warranty ticket with the motherboard manufacturer to get a replacement.

heynnema