
I'm running (or rather, was running) Ubuntu

  • The machine seemed fine yesterday; this morning, it was unresponsive.
  • It's a tower, so there were no mechanical disturbances in the meantime (the only plausible disturbances would be atmospheric temperature/humidity cycles or faults/irregularities in the building's electrical supply). This is relevant because several similar questions are about laptops, where these errors are related to mechanical failures. I suspect this is simply an SSD failure or file-system corruption, but I don't know how to diagnose, repair, or recover from it.

Reboot eventually ends up in busybox, after printing errors (abbreviated, since I'm copying these manually from the screen):

/dev/sda1 recovering journal
... failed command: WRITE FPDMA QUEUED (several times)
... COMRESET failed errno=-16 (a few times)
... drop into busybox
  • Various error messages point to the boot drive, sda, being corrupted in some way
  • I can still get the GRUB boot menu, which tells me the drive is technically bootable?
  • From busybox (or the Ubuntu Studio install USB stick), fsck.ext4 /dev/sda1 results in a long message that, among other things, says superblock could not be read or does not describe a valid ext2/ext3/ext4 filesystem
  • From the Ubuntu Studio install USB stick, fdisk /dev/sda results in cannot open ...: Input/Output error
  • The drive passed the HP bios checks
  • The drive passed the Dell disk tool checks
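
For reference, the fsck and fdisk checks above can be reproduced roughly as follows. This is a sketch rather than an exact transcript of what I typed: the `-n` flag (open read-only, answer "no" to every prompt) and `fdisk -l` (list only) are substitutions so that nothing is written to a possibly failing drive, and `DISK`, `PART`, `RUN`, and `check_cmds` are illustrative names of my own. By default the script only prints the commands; setting `RUN=1` (as root) executes them.

```shell
#!/bin/sh
# Sketch of the checks described above, in read-only form.
# DISK, PART, RUN and check_cmds are illustrative names, not part of any tool.
DISK="${DISK:-/dev/sda}"
PART="${PART:-${DISK}1}"

# Print the read-only checks, one command per line.
check_cmds() {
    echo "fdisk -l $DISK"        # -l: list the partition table and exit
    echo "fsck.ext4 -n $PART"    # -n: open read-only, answer "no" to every prompt
}

# Execute only when explicitly asked to (RUN=1); otherwise just show the commands.
if [ "${RUN:-0}" = "1" ]; then
    check_cmds | while read -r cmd; do
        echo "+ $cmd"
        # Word-splitting of $cmd is intentional; stdin is detached from the pipe.
        $cmd </dev/null || echo "exit status $? from: $cmd"
    done
else
    check_cmds
fi
```

Printing by default is deliberate: if the drive really is failing, I would rather not run anything against it by accident.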

I would need to open the machine up again to determine, but I think it's an ADATA-brand SSD, if that matters.

The only Ubuntu version information I can read is what GRUB provides. It says I have access to Ubuntu, Linux 5.4.0-128-generic, and 122 generic. Neither works, in recovery mode or not.

Question:

  1. Can you point me to a clear, simple-to-understand set of diagnostic procedures/commands to run when simply running fsck/fdisk doesn't automatically fix the problem?
  2. Can you help me understand why the Dell/HP disk check tools think the drive is fine, and why I can boot into GRUB and run Memtest, but am otherwise unable to boot any installed version of Ubuntu, or successfully inspect/repair the drive/partition from busybox or the Ubuntu Studio install stick?
  3. Assuming the drive has failed but is partially working, can you point me to a clear, simple-to-understand procedure for retrieving its data?

I've done a bit of research and these questions didn't help:

  • WRITE FPDMA QUEUED questions:

    • ata7.00: failed command: SEND FPDMA QUEUED has no accepted answer, and the answer that is given makes no sense to me whatsoever (and almost certainly doesn't apply).

    • Failed command: WRITE FPDMA QUEUED. Brand New Samsung SSD Drive errors has no accepted answers. The answer that is given relates to specific issues with new Samsung drives. The drive I have has been working fine for years in this hardware configuration, so firmware issues are clearly irrelevant. Random READ FPDMA QUEUED at boot is probably the same issue (new Samsung SSDs), and I don't see how it could apply to a system that had been running stably for years.

    • Kernel reporting READ FPDMA QUEUED has no answers.

    • failed command: READ FPDMA QUEUED, only on Tuesdays has no accepted answers, and the given answer is related to a periodic fstrim job, which is very much not applicable to this situation.

    • Disk errors during boot, failed command: READ FPDMA QUEUED has no answers, and was erroneously marked as a duplicate of a much less clearly written question that is not obviously related to the posted issue. The comments suggest a drive failure or a cable/controller failure. The steps taken so far suggest that there is no drive failure in this case, and I haven't been able to find tools/commands to progress past this, so my question remains unresolved.

      To be fair, the "duplicate" question does contain many non-specific answers that may or may not be relevant, but I can't find a clear, simply-stated diagnostic procedure to determine which. It's suggested it could also be a power cable, or a power-supply issue, in addition to many other suspects. But, what specific steps/commands should I run to identify which of these issues (if any) it is?

    • How does one deal with "read FPDMA QUEUED" error? mentions READ FPDMA QUEUED, as opposed to the "WRITE FPDMA QUEUED" errors I'm seeing. It has no answer, but comments suggest any of several possible hardware faults. Given that I've taken steps to determine that it is specifically this drive (not the controller, the cable, or the port) experiencing problems, I think we can provisionally rule this out. The SSD has evidently failed, yes, in some way that makes it impossible to run fsck/fdisk on it, and yet it still gets as far as the GRUB bootloader. Surely there must be some way to at least recover whatever data can be read? And we still need to explain why the HP and Dell diagnostic tools think the drive is fine, but Linux can't touch it.

  • COMRESET questions:

  • Bad superblock questions

muru
MRule