I believe my 256GB solid-state drive needs to be replaced, but I'd like to have some definitive indication of that before making the investment.

I'm running Ubuntu 18.04 on a laptop with a solid-state drive and full-disk encryption. Recently, during normal operation, the system remounted my root partition read-only. After I shut down, the machine failed to boot and reported a disk error.

I ran Ubuntu 20.04 from a flash drive, but the Disks utility did not report the machine's internal drive. I connected the drive using USB via an adapter, and then Disks did report the drive.

I was able to run a SMART test, and it passed (as far as I can tell; there's a lot of technical information--see the excerpt below). After this, I connected the drive via the machine's internal SATA interface, and it booted successfully.

The system has since exhibited this behavior a few times: it enters "read-only" mode, I reboot with the drive connected via USB, and it then becomes able to boot via the SATA connection.

What steps can I take to verify that the hard drive is to blame?

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   096   096   010    Pre-fail  Always       -       318
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       27276
 12 Power_Cycle_Count       0x0032   093   093   000    Old_age   Always       -       6972
177 Wear_Leveling_Count     0x0013   089   089   000    Pre-fail  Always       -       373
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   096   096   010    Pre-fail  Always       -       318
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   096   096   010    Pre-fail  Always       -       318
187 Uncorrectable_Error_Cnt 0x0032   095   095   000    Old_age   Always       -       42737
190 Airflow_Temperature_Cel 0x0032   077   051   000    Old_age   Always       -       23
195 ECC_Error_Rate          0x001a   001   001   000    Old_age   Always       -       42737
199 CRC_Error_Count         0x003e   099   099   000    Old_age   Always       -       281
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       206
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       52974713881

SMART Error Log Version: 1
No Errors Logged

$ uname -a
Linux bruce 4.15.0-99-generic #100-Ubuntu SMP Wed Apr 22 20:32:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:        18.04
Codename:       bionic
Mike

1 Answer

These SMART attributes in particular indicate a problem:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   096   096   010    Pre-fail  Always       -       318
183 Runtime_Bad_Block       0x0013   096   096   010    Pre-fail  Always       -       318
187 Uncorrectable_Error_Cnt 0x0032   095   095   000    Old_age   Always       -       42737
195 ECC_Error_Rate          0x001a   001   001   000    Old_age   Always       -       42737
199 CRC_Error_Count         0x003e   099   099   000    Old_age   Always       -       281

If this is a Samsung SSD, download Samsung Magician and check for a firmware update.

You can also check for NCQ errors with this command:

grep -i FPDMA /var/log/syslog*

If you find these, I'll give you a patch.
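
To tell whether the drive is still degrading, it may help to record the raw values of the attributes listed above now and compare them again after some use. The following is only a minimal sketch: `smart_raw_values` is a hypothetical helper name, and the usage lines assume that smartmontools is installed and that the drive appears as /dev/sda, which may not match your system.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A`-style output on stdin and print
# "ID NAME RAW_VALUE" for the selected attribute IDs.
# The default ID list is the set of attributes highlighted in this answer.
smart_raw_values() {
    awk -v ids="${1:-5 183 187 195 199}" '
        BEGIN {
            # Build a lookup table of the wanted attribute IDs.
            n = split(ids, a, " ")
            for (i = 1; i <= n; i++) want[a[i]] = 1
        }
        # Attribute rows have at least 10 columns; the 10th is RAW_VALUE.
        ($1 in want) && NF >= 10 { print $1, $2, $10 }
    '
}

# Usage (assumed device path; needs root):
#   sudo smartctl -A /dev/sda | smart_raw_values > smart-$(date +%F).txt
# Later, take a second snapshot and compare the two files:
#   diff smart-OLD.txt smart-NEW.txt
```

If the raw counts in the later snapshot are higher, the errors are ongoing rather than left over from an earlier event.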

heynnema
  • Thanks! The Magician application is only available for Windows, so I'm afraid I can't use it. The answer to this question describes the "DC Toolkit" which is available for GNU/Linux, but not for my hard drive model (840 PRO). No ncq errors reported. I'll update the firmware, though I don't know what to expect from that. If this fixes the problem, should a new SMART test report a value of "0" for the attributes you've highlighted? – Mike May 16 '20 at 17:19
  • @Mike Good find! Some values may reset to zero, some may not. For the ones that don't reset to zero, you'll just have to watch that the displayed counts don't continue to increase. To run Samsung Magician, you may have to either: 1) install Windows, yuck, or 2) temporarily attach the drive to a Windows machine. Report back. – heynnema May 16 '20 at 17:26
  • Getting a Windows machine would take some doing even during normal times, so it may be a while before I can go any further on this. Thanks again for your help so far :) – Mike May 16 '20 at 18:02