My Ubuntu server recently crashed, and I have been struggling to get it back ever since.
The server became unresponsive: ping got only sporadic replies and none of the services (SSH or Webmin) would connect. A clean shutdown wasn't possible either, so I eventually had to switch it off.
The hard reset seems to have destroyed the root file system: the boot folder and many others were empty, so I ended up in grub rescue mode after the reboot.
So I decided to reinstall the OS, which is where my journey begins.
First, what's working:
- New installation works without a problem
- All drives are found, including the raid
- When opening a shell in USB drive rescue mode I can mount all drives without a problem (raid and backup drive)
Setup is:
- SSD for the OS, home and swap (3 separate partitions)
- 3 x 4 TB drives for a software raid 10 (one spare)
- a separate 2 TB swappable drive for offline backups
And here's where I am stuck:
The server boots, shows the grub window and loads the kernel (lots of the usual status messages...)
The last successful messages seem to be
Begin: Loading essential drivers ... done
Begin: Running scripts/init-premount ... done
[19.000] random: fast init done
Begin waiting for root file system
From there on there are lots of the below
Begin: Running scripts/local-block ... mdadm: no devices listed in config file were found
done
Until it finally gives up with
Gave up waiting for root device. Common problems...
...
ALERT! UUID=.... does not exist. Dropping to shell
After which the system freezes.
The UUID listed is correct and represents the boot partition of my SSD.
This looks as if none of the drives are accessible all of a sudden: neither the boot drive (UUID error) nor the raid array (mdadm error message).
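One way to narrow down the mdadm message from the rescue shell is to compare the arrays that mdadm actually detects with the ARRAY lines in the installed config, since the initramfs carries its own copy of that config. A minimal sketch, assuming the config lives at /etc/mdadm/mdadm.conf (the usual Ubuntu location) on the mounted root; the helper names array_uuids and config_matches_scan are made up for illustration:

```shell
#!/bin/sh
# Print the UUID of every ARRAY line in an mdadm.conf-style file, sorted.
array_uuids() {
    # $1: mdadm.conf, or a file holding the output of `mdadm --detail --scan`
    grep '^ARRAY' "$1" | tr -s '[:space:]' '\n' | sed -n 's/^UUID=//p' | sort
}

# Succeed only if both files list exactly the same array UUIDs.
config_matches_scan() {
    # $1: config file, $2: saved scan output
    [ "$(array_uuids "$1")" = "$(array_uuids "$2")" ]
}

# Possible usage as root in the rescue shell (paths are assumptions):
#   mdadm --detail --scan > /tmp/scan.txt
#   config_matches_scan /mnt/etc/mdadm/mdadm.conf /tmp/scan.txt || echo "ARRAY lines differ"
```

If the two differ, the commonly suggested fix is to update the ARRAY lines and rebuild the initramfs with update-initramfs -u. Whether that is the cause here is not confirmed.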
I tried grub updates and reinstalls, which all give me strange errors. But whenever I boot from my USB stick, select the rescue option and open a shell on the SSD boot partition, I can happily see and mount all partitions.
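For anyone wanting to reproduce this kind of repair attempt, a typical sequence from a rescue shell might look like the sketch below. It is not a record of the exact commands used here. The device names in the usage comment are placeholders, the function names run and rebuild_boot are hypothetical, and the sketch assumes there is no separate /boot partition. By default it only prints the commands; setting DRY_RUN=0 would actually execute them as root.

```shell
#!/bin/sh
# Print a command, and execute it only when DRY_RUN=0 (default: print only).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Mount the installed system, then rebuild the initramfs and grub from inside it.
rebuild_boot() {
    root_part=$1        # root partition on the SSD (placeholder, e.g. /dev/sdX1)
    disk=$2             # the SSD itself (placeholder, e.g. /dev/sdX)
    target=${3:-/mnt}   # where to mount the installed system

    run mount "$root_part" "$target" || return 1
    for fs in dev proc sys run; do
        run mount --bind "/$fs" "$target/$fs" || return 1
    done
    run chroot "$target" update-initramfs -u -k all || return 1
    run chroot "$target" grub-install "$disk" || return 1
    run chroot "$target" update-grub
}

# Possible usage (prints the eight commands without running them):
#   rebuild_boot /dev/sdX1 /dev/sdX
```

Running the tools inside the chroot matters because update-initramfs and update-grub read the configuration of the installed system, not that of the USB rescue environment.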
Some of the grub messages I am getting:
grub-update
Found linux image....
Found initrd image....
WARNING: Failed to connect to lvmetad. Falling back to device scanning
grub-probe: error: cannot find a GRUB drive for /dev/sdb1. Check your device.map
I checked /etc/fstab, and all entries look good to me. The UUIDs match what I would expect, and /, swap and /home are available.
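The comparison can also be scripted rather than done by eye. A minimal sketch, assuming the output of blkid has been saved to a file first; the function names fstab_uuids and missing_uuids are hypothetical helpers:

```shell
#!/bin/sh
# List every UUID referenced in an fstab file, skipping comment lines.
fstab_uuids() {
    # $1: path to an fstab file
    awk '!/^[[:space:]]*#/ && $1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$1"
}

# Print the fstab UUIDs that do not appear as a filesystem UUID in a blkid listing.
missing_uuids() {
    # $1: fstab file, $2: file holding the output of `blkid`
    for uuid in $(fstab_uuids "$1"); do
        # The leading space keeps PARTUUID="..." from counting as a match.
        grep -qF " UUID=\"$uuid\"" "$2" || echo "$uuid"
    done
}

# Possible usage as root (paths are assumptions):
#   blkid > /tmp/blkid.txt
#   missing_uuids /etc/fstab /tmp/blkid.txt
```

No output from missing_uuids would mean every UUID in fstab belongs to a filesystem that blkid can currently see.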
Has anyone got an idea of where to look next? My next step would be to completely repartition the SSD, which I would like to avoid...
Thanks, Thomas