I have a dual-boot setup running Ubuntu MATE 17.04, fully patched and up to date. (The other OS, Windows 10, was installed first; Ubuntu was then installed using the default options.)
However, recently, every time sudo apt-get dist-upgrade runs non-trivially (i.e., trying to install something), a curious sequence of events happens:
- The first couple of commands will print out, appearing to succeed.
- The rest will fail with weird errors about a read-only filesystem, and the operation will abort itself.
- Boot now fails. Sometimes, I can get it to boot into Windows, but booting into Linux either fails or drops into a recovery shell.
- I boot using a USB recovery drive.
- gparted gives weird warnings about a bad superblock. I go about fixing it (only the last command is necessary, but the first two tell me what to do):
    sudo fdisk -l | grep Linux | grep -Ev 'swap'
    sudo dumpe2fs /dev/nvme0n1p6 | grep superblock
    sudo fsck -b 32768 /dev/nvme0n1p6 -y
- It prints out a lot of garbage and negative numbers and says it changes stuff.
- Reboots into the real Linux just fine.
- sudo apt-get dist-upgrade now runs to completion with no weird errors.
- Reboot to either OS works.
- sudo apt-get autoremove to remove old Linux kernels.
- Reboot fails. Fix it again using steps 4–7.
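For reference, the repair step above (the dumpe2fs and fsck commands) could be scripted roughly as follows. This is only a sketch: the function names `backup_superblocks` and `repair_with_backup` are made up for illustration, and it assumes dumpe2fs can still read the filesystem well enough to list lines of the form "Backup superblock at N, ...", as it did in my case.

```shell
#!/bin/sh
# Sketch only: these function names are hypothetical, not part of e2fsprogs.

# Read `dumpe2fs` output on stdin and print the block number of each
# backup superblock, one per line.
backup_superblocks() {
    sed -n 's/^ *Backup superblock at \([0-9][0-9]*\),.*/\1/p'
}

# Repair an unmounted ext2/3/4 partition using its first backup superblock.
# Assumption: run as root from the live USB, with the partition not mounted.
repair_with_backup() {
    dev=$1
    sb=$(dumpe2fs "$dev" 2>/dev/null | backup_superblocks | head -n 1)
    if [ -z "$sb" ]; then
        echo "no backup superblock found on $dev" >&2
        return 1
    fi
    # Same command as above, with the block number looked up instead of typed in.
    fsck -b "$sb" "$dev" -y
}
```

From the recovery environment this would be invoked as `repair_with_backup /dev/nvme0n1p6`. It does the same thing as the three manual commands, just without copying the block number by hand.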
What's weird is that this is repeatable (this sequence has happened the last 3 times I've tried dist-upgrade), and it doesn't happen with other operations.
My suspicion is that Ubuntu is updating the superblock incorrectly when trying to update the bootloader for each updated Linux kernel. Things I've tentatively ruled out:
- It's not Windows, per se. The corruption happens without ever booting into Windows. Linux might be getting confused by the boot schema, however.
- It's not the hard drive. The drive is a nearly new SSD, and the first time this happened, I scanned the drive surface for bad blocks. In any case, the drive hardware would remap a bad block once discovered, so it wouldn't corrupt the superblock again.
My question: What can I do to fix this? Failing that, what should I look at, log-wise and config-wise, the next time this inevitably happens, so that it can be debugged?
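One thing that could be captured next time is the kernel log at the moment the read-only errors appear, before rebooting. Once the root filesystem is read-only, nothing further gets written to the logs on disk, so the output would have to be saved elsewhere. Below is a minimal sketch of a filter for that. The function name `fs_errors` is hypothetical, and the patterns are an assumption about what the relevant messages look like (ext4 errors, read-only remounts, I/O errors, NVMe messages), not a complete list.

```shell
#!/bin/sh
# Sketch only: `fs_errors` is a hypothetical helper name.

# Read kernel log text on stdin and keep only lines that look related to
# filesystem or disk trouble (case-insensitive match).
fs_errors() {
    grep -i -E 'EXT4-fs (error|warning)|read-only|I/O error|nvme'
}

# Example use, right after the failure and before rebooting
# (the output path is an assumption: any writable location, e.g. a USB stick):
#   dmesg | fs_errors > /media/usb/fs-errors.txt
```

The filter is kept deliberately broad, since it is easier to discard irrelevant lines afterwards than to recover ones that were never saved.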
sudo apt-get update? – ADDB Jun 13 '17 at 10:16
apt-get update, upgrade, dist-upgrade, autoremove. – geometrian Jun 13 '17 at 20:56
autoremove is one of the commands causing the issues. @ADDB: All non-nop commands used were listed fully. Point of fact, from an apparently fine system, if dist-upgrade or autoremove do anything, it breaks it. – geometrian Jun 13 '17 at 21:03