System:
- Ryzen 5, no integrated graphics
- B450 Tomahawk Max motherboard
- ADATA SX8100 512 GB SSD
- Nvidia GeForce 1660 main GPU
- Dual boot Ubuntu 20.04 and Windows 10
- UEFI firmware
- No overclocking or other tweaks
I have had occasional problems in the past where the system would enter a kernel panic on boot, complaining first that initramfs decoding failed followed by being unable to mount root. The recovery mode option for the same kernel version would also panic, although with far more messages displayed.
I would usually deal with this by selecting an older kernel, which would boot fine, and then run Boot-Repair. I would then be good for a random number of boots until it all started over again.
I was never able to find the cause and just dealt with the occasional inconvenience; however, now none of my kernels boot. All I can do is boot from a live USB. I updated the GRUB config from inside a chroot, so now my Windows menu option is also gone.
The recovery mode messages ask me to specify my root partition with the root= boot option, then say "here are the available partitions", followed by a kernel panic message. It seems that it is not detecting any partitions at all. This seems confirmed by the message that it can't mount root fs on unknown-block(0,0), indicating it can't identify what block device to use.
I've checked that the root UUID shown in the boot messages matches the UUID of my actual boot partition. I have not made any partition table modifications recently.
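For reference, the UUID check can be sketched roughly as follows, run from the live USB session. This is only a sketch: the helper names root_uuid_from_cmdline and compare_root_uuid are hypothetical, and the mount point /mnt and the partition /dev/nvme0n1p2 in the usage comment are assumptions that need to be replaced with the real ones.

```shell
#!/usr/bin/env bash
# Extract the UUID from a kernel command line such as
# "linux /boot/vmlinuz-... root=UUID=xxxx ro quiet".
root_uuid_from_cmdline() {
    # Split on whitespace, keep only the root=UUID=... token, strip the prefix.
    printf '%s\n' "$1" | tr -s '[:space:]' '\n' | sed -n 's/^root=UUID=//p' | head -n 1
}

# Compare the UUID the kernel is told to use with a partition's actual UUID.
# Prints "match" or "mismatch"; an empty UUID on either side is a mismatch.
compare_root_uuid() {
    local wanted actual_uuid="$2"
    wanted="$(root_uuid_from_cmdline "$1")"
    if [ -n "$wanted" ] && [ "$wanted" = "$actual_uuid" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Usage (mount point and device path are assumptions):
#   compare_root_uuid "$(grep -m1 'root=UUID=' /mnt/boot/grub/grub.cfg)" \
#       "$(sudo blkid -s UUID -o value /dev/nvme0n1p2)"
```

A "match" here only says the GRUB entry points at the right filesystem; it does not explain why the kernel lists no partitions at all.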
I've tried removing and re-seating the SSD.
How do I troubleshoot this? How do I get the kernel to detect my SSD?
Normal boot error messages:
Recovery mode boot messages
Per comments, I found the SMART status of the SSD.
Results of sudo smartctl:
kubuntu@kubuntu:~$ sudo smartctl -a /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-42-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: ADATA SX8100NP
Serial Number: 2J4620042048
Firmware Version: VB411D43
PCI Vendor/Subsystem ID: 0x10ec
IEEE OUI Identifier: 0x00e04c
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Thu Dec 3 00:57:15 2020 UTC
Firmware Updates (0x0e): 7 Slots
Optional Admin Commands (0x0007): Security Format Frmw_DL
Optional NVM Commands (0x0014): DS_Mngmt Sav/Sel_Feat
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 118 Celsius
Critical Comp. Temp. Threshold: 150 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 8.00W - - 0 0 0 0 0 0
1 + 4.00W - - 1 1 1 1 0 0
2 + 3.00W - - 2 2 2 2 0 0
3 - 0.0128W - - 3 3 3 3 4000 8000
4 - 0.0080W - - 4 4 4 4 8000 30000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 31 Celsius
Available Spare: 100%
Available Spare Threshold: 32%
Percentage Used: 0%
Data Units Read: 11,670,921 [5.97 TB]
Data Units Written: 7,734,266 [3.95 TB]
Host Read Commands: 0
Host Write Commands: 0
Controller Busy Time: 0
Power Cycles: 451
Power On Hours: 3,897
Unsafe Shutdowns: 319
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, max 8 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 5490475142593210059 45613 0xa607 0x2024 0x35f8 5071432998301508804 1351944024 0xc5
1 10305710759900180890 9804 0x6c00 0xa1d6 0xcb61 27252774468141376 380130432 0xc3
2 11549487431983370324 16455 0x58c9 0xd23e 0x8147 6957061290267430970 3258320200 0xf5
3 7018321358667646096 37313 0x0e1f 0x8670 0x6242 459368911713436868 1166902044 0x10
4 11390238159922049047 38421 0xd002 0x1890 0x7d29 17438238972143084540 884054193 0x01
5 156936697365045345 26140 0x5041 0xac10 0x4265 11916595043416224210 405107254 0xd4
6 6790662844906997140 16528 0x5fc1 0x2ed1 0x77c 5801270468783952621 39946248 0xb0
7 3460708732253516421 2072 0xa101 0x610c 0xc852 13889879911473169861 2147786536 0x68
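As a rough aid for reading the counters above, the share of power cycles that ended in an unsafe shutdown can be pulled out of the smartctl text with a small filter. This is a sketch, not a diagnosis: the function name unsafe_shutdown_ratio is hypothetical, and it assumes the "Power Cycles" and "Unsafe Shutdowns" labels appear exactly as in the output above.

```shell
#!/usr/bin/env bash
# Read `smartctl -a` text on stdin and print the share of power cycles
# that ended in an unsafe shutdown, in the form "unsafe/cycles (percent%)".
unsafe_shutdown_ratio() {
    awk -F': *' '
        {
            value = $2
            gsub(/,/, "", value)              # drop thousands separators
        }
        $1 == "Power Cycles"     { cycles = value + 0 }
        $1 == "Unsafe Shutdowns" { unsafe = value + 0 }
        END {
            if (cycles > 0)
                printf "%d/%d (%.1f%%)\n", unsafe, cycles, 100 * unsafe / cycles
        }'
}

# Usage:
#   sudo smartctl -a /dev/nvme0n1 | unsafe_shutdown_ratio
```

With the values reported above (319 unsafe shutdowns out of 451 power cycles) this prints 319/451 (70.7%).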
Re: motherboard BIOS version. I updated my BIOS both soon after building my PC to version 7C02v36 (dated 04/24/2020) and before asking this question to version 7C02v39 (dated 11/30/2020). It had no effect. The next most recent BIOS listed is dated 12/10/2020, but it is a beta version so I'm uncertain if trying it is a good idea.
FWIW, my Ubuntu boot partition is 30GB, and has 3GB free.
GRUB can see my boot partition, as shown in this screenshot. Immediately after snapping this picture, I typed "normal" to return to the GRUB menu. I booted with debug messages enabled, and it complained that it couldn't mount that exact partition.
update-initramfs -c -k all followed by update-grub. My boot partition is not full (3GB free). This covers all of the answers, except for this one: https://askubuntu.com/a/1048477/751380. That one just demos that not detecting the disk at all causes the error message seen, so my question is: why is Linux not seeing my disk when Windows can? – rothloup Dec 21 '20 at 19:15
"It fixes things for a little while." How long does it work? Is it possible to narrow down the problem from the logs, i.e., what causes the next break? – RedEyed Dec 24 '20 at 18:03
journalctl -b0 -p4 > logs.txt: -b0 means logs of the current boot, -b -1 means the boot before the current one, and so on. -p4 means: show only warnings and errors. So, by changing the number after -b you can view logs of previous boots. – RedEyed Dec 25 '20 at 09:18
dd a copy of that Ubuntu partition onto it, I'd be interested to see if the same thing occurs. Of course you'd have to create a new UUID for that new partition, edit its /etc/fstab, and then update-grub... Along those lines, I'd also check the power supply... but again those are hardware/corruption type troubleshooting – WU-TANG Dec 25 '20 at 16:20
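The journalctl suggestion in the comments can be sketched as a small loop that writes the warnings and errors of the last few boots to one file per boot. This is only a sketch: the function names, the boot-N.txt file naming, and the default of three boots are assumptions. journalctl addresses earlier boots with negative offsets (-b -1 is the boot before the current one), earlier boots are only available if the journal is persistent, and a boot that panics before the root filesystem is mounted will likely leave no journal entries at all.

```shell
#!/usr/bin/env bash
# Print the journalctl command for the boot N boots before the current one.
# Offset 0 = current boot, 1 = previous boot, and so on.
boot_log_cmd() {
    local offset="$1"
    # -b -N selects the Nth boot back; -p 4 keeps priority "warning" and above.
    printf 'journalctl -b -%s -p 4 --no-pager\n' "$offset"
}

# Save warnings/errors of the last COUNT boots (default 3) to boot-<N>.txt.
dump_boot_logs() {
    local count="${1:-3}" n
    for ((n = 0; n < count; n++)); do
        # Word splitting of the generated command line is intentional here.
        $(boot_log_cmd "$n") > "boot-${n}.txt" 2>&1 || true
    done
}

# Usage:
#   dump_boot_logs 5    # writes boot-0.txt (current boot) ... boot-4.txt
```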