
I've noticed that one drive on my home Ubuntu server has become read-only for some reason. After some digging, I found that this can happen when a hard drive has errors. I used badblocks to check for errors, and indeed I have some damaged sectors.

In most cases the only rational course of action is to try to back up the data, remove the HDD, and buy a new one. However, this server holds nothing that I don't already have backed up in multiple places, and I'd like to use it until it dies. I use it for streaming music and running some simple scripts. In any case, reinstalling everything would be a big fuss.

Is there a way to mark these bad blocks without formatting the HDD?

heynnema
enedene

2 Answers


I assume you are talking about physical bad blocks on a disk and not about corrupted file systems.

To check the physical condition of your disk it's best to install smartmontools

sudo apt-get install smartmontools

This works because all modern disks log their health status using a system called S.M.A.R.T.

Use the smartctl command to read out this status. For example, to read all attributes from the first disk, run

sudo smartctl --all /dev/sda

Watch for the line reporting the overall health status. Once this indicates an error, it's very likely that the disk will fail soon.

SMART overall-health self-assessment test result: PASSED
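As a rough sketch, the health line can also be checked from a script. The helper name smart_health_passed is made up for this example, and it only looks for the exact ATA-style line shown above; other drive types may word the result differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads smartctl output on stdin and succeeds
# only if the overall-health line shown above reports PASSED.
smart_health_passed() {
    grep -q 'SMART overall-health self-assessment test result: PASSED'
}

# Possible use on a real disk (needs root and smartmontools):
#   sudo smartctl -H /dev/sda | smart_health_passed || echo "check this disk"
```

The function only inspects the text it is given, so it can be tried on saved smartctl output without touching the disk.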

Other lines you want to check for are the Pending Sector Count and the Reallocated Sectors.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       48
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2

The raw value of Reallocated_Sector_Ct is usually the number of bad sectors the disk has already swapped for working spare ones. Current_Pending_Sector counts suspect sectors that will be reallocated if the next write to them fails.
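To watch these two counters over time, the raw values can be pulled out of the attribute table. This is a minimal sketch: smart_raw_value is a hypothetical helper, and it assumes the ten-column layout shown above with the raw value as a single number in the last column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the RAW_VALUE column for one SMART attribute ID.
# Reads an attribute table like the one above on stdin.
smart_raw_value() {
    # Column 1 is the attribute ID, column 10 is RAW_VALUE.
    awk -v id="$1" '$1 == id { print $10 }'
}

# Possible use on a real disk (needs root and smartmontools):
#   sudo smartctl -A /dev/sda | smart_raw_value 5     # reallocated sectors
#   sudo smartctl -A /dev/sda | smart_raw_value 197   # pending sectors
```

A reallocated count that keeps growing between runs is a stronger warning sign than a stable one.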

You can even trigger self tests of the disk when supported by your model

sudo smartctl -t long /dev/sda

To force checking of all sectors, use badblocks in a mode in which data is written. Beware that even though in general it is safe to run, it will put extra load on your disks, which might cause them to fail. Always have a backup of your data.

sudo badblocks -svvn -c 262144 /dev/sda

The output from the badblocks command will show you many lines with

hh:mm:ss elapsed. (x/y/z errors)

where

x = num_read_errors
y = num_write_errors
z = num_corruption_errors
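If you save the output to a log, the three counters can be split out of a progress line. This is only a sketch: parse_badblocks_errors is a made-up name, and it assumes a line containing the "(x/y/z errors)" pattern described above. Real progress output may contain terminal control characters that need stripping first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines on stdin and, for each one containing
# "(x/y/z errors)", print the three counters with labels.
parse_badblocks_errors() {
    sed -n 's|.*(\([0-9]*\)/\([0-9]*\)/\([0-9]*\) errors).*|read=\1 write=\2 corruption=\3|p'
}

# Example:
#   echo "01:23:45 elapsed. (3/0/1 errors)" | parse_badblocks_errors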

Once the whole disk has been processed this way, the disk controller should have replaced all bad blocks with working ones, and the reallocated count in the SMART log will have increased.

user228505
  • 4
    What is the influence of -c, i.e. the number of blocks tested at a time? In other words, why do you use such a large value compared to the default of 64? – greole Sep 05 '14 at 15:26
  • 6
    default block-size of 1024 bytes multiplied by the default of 64 blocks at a time leads to only 64k processed on each request. With modern disks having a throughput of up to 150 MiB/s this introduces more overhead than I feel comfortable with. I want to give the drive the chance to process the data in the most efficient way without waiting for the data on the bus. – user228505 Dec 08 '14 at 12:20
  • Says: /dev/sda is apparently in use by the system; it's not safe to run badblocks! – Dims Mar 24 '16 at 20:09
  • 1
    @Dims It tells you to not run this on an actively mounted disk. Unmount the disk first. In your case it might be your system drive. So boot to a rescue system first. – user228505 Mar 26 '16 at 07:09
  • 2
    @Dims There is the -f (force) option if you want to run it anyway, i.e. sudo badblocks -f -svvn -c 262144 /dev/sda – ADR May 13 '16 at 17:35
  • Is it -svn or -svvn? Why the -v twice? – Rodrigo Mar 04 '20 at 19:27
  • 1
    @Rodrigo two -v increase the verbosity, so you see what is going on – user228505 Mar 06 '20 at 10:00
  • From man badblocks "it is strongly recommended that users not run badblocks directly, but rather use the -c option of the e2fsck and mke2fs programs". – heynnema Apr 26 '21 at 15:57
  • Thank you for pointing this out. Yes. Badblocks works on a very low level. It is certainly preferred to run it from within the file-system layer to keep the accounting details there in sync. Also if using ext2/3/4 filesystem, start with these tools. 7 years later many block devices are no longer HDD, but SSD. I still believe the re-mapping of sectors on SSD vs HDD is transparent to the OS as before, due to handling by drive firmware and spare blocks. For a SSD a read instruction should be sufficient for the controller to refresh a weak block. – user228505 Apr 28 '21 at 06:59

Although this is an old question, I'm posting this because badblocks should not be used to bad block a disk. See man badblocks for more info.

"It is strongly recommended that users not run badblocks directly, but rather use the -c option of the e2fsck and mke2fs programs".

Notes

  • Do NOT abort a bad block scan!
  • Do NOT bad block an SSD
  • Backup your important files FIRST!
  • This will take many hours
  • You may have a pending HDD failure

Boot to an Ubuntu Live DVD/USB in “Try Ubuntu” mode.

In Terminal:

sudo fdisk -l  # identify all "Linux Filesystem" partitions

sudo e2fsck -fccky /dev/sdXX # non-destructive read/write test (recommended)

or sudo e2fsck -fcky /dev/sdXX # read-only test

The -k is important, because it saves the previous bad block table and adds any new bad blocks to that table. Without -k, you lose all of the prior bad block information.

The -fccky parameter:

   -f    Force checking even if the file system seems clean.

   -c    This option causes e2fsck to use badblocks(8) program to do a read-only scan of the device in order to find any bad blocks. If any bad blocks are found, they are added to the bad block inode to prevent them from being allocated to a file or directory. If this option is specified twice, then the bad block scan will be done using a non-destructive read-write test.

   -k    When combined with the -c option, any existing bad blocks in the bad blocks list are preserved, and any new bad blocks found by running badblocks(8) will be added to the existing bad blocks list.

   -y    Assume an answer of `yes' to all questions; allows e2fsck to be used non-interactively. This option may not be specified at the same time as the -n or -p options.
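After the scan, one way to see what ended up in the file system's bad block list is dumpe2fs with its -b option, which prints the blocks reserved as bad. The sketch below counts them. count_bad_blocks is a hypothetical helper, and it assumes the list arrives as one block number per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count bad-block entries given on stdin,
# assuming one block number per line.
count_bad_blocks() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c '^[0-9][0-9]*$'
}

# Possible use (needs root; run on the unmounted partition):
#   sudo dumpe2fs -b /dev/sdXX | count_bad_blocks
```

Comparing the count before and after a later scan shows whether the disk keeps developing new bad blocks.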

Pablo Bianchi
heynnema
  • That's right. For a non-SSD hard disk of 1 TB in size, the badblocks -svn /dev/sda command took almost two days! – Megidd Dec 21 '21 at 10:33
  • 1
    @user3405291 You didn't follow the instructions in my answer. – heynnema Dec 21 '21 at 14:57
  • That's right! I used badblocks before studying your post. From now on, I'm going to follow your instructions :) – Megidd Dec 22 '21 at 06:29
  • I was getting /dev/sdb is in use on a device with LUKS. It wasn't mounted but already mapped (this helped). Worked using /dev/mapper/luks-...instead. – Pablo Bianchi Oct 18 '22 at 22:57
  • Seems like this will only work on extfs devices/partitions. I tried it with fsck.fat and it doesn't support the same command options. Any idea how to do this for FAT32 drives? – Don Joe Jul 31 '23 at 00:01
  • @DonJoe What specific problem are you trying to solve? Is this a partition on your hard disk, or something else? Exactly what command did you try? Do "man fsck.fat" for more command-line options. Do you have access to Windows, where you could use "chkdsk"? – heynnema Jul 31 '23 at 01:50
  • Checking my phone's add-on SD card for signs of aging and data corruption so I can replace it if it's no longer reliable. Ran the same command as above but with fsck.fat. Honestly yeah, I should've booted to Windows to do a chkdsk :)) but it would still be good to know if any solution like yours (not calling badblocks directly) exists on *buntu for non-extfs drives. – Don Joe Jul 31 '23 at 07:20
  • @DonJoe SD cards can go bad. Best way is to copy off whatever data you can reliably get, then re-format the card (using the long format choice if it's available). If it fails to format, toss it. – heynnema Jul 31 '23 at 12:56
  • @heynnema That's not an issue, I have better copies on magnetic HDDs that I can restore from. I'm looking to find out if this card is at the age where it can start corrupting data, in which case I will immediately stop using it and replace it with a newer one. As far as I understand no formatting or zeroing out can bring back the lost reliability past this point where automatic sector reallocation (if any has ever been performed by my Android?) has started failing to prevent data corruption. Going forward I know I should periodically zero out and re-write for max. data longevity on SD cards. – Don Joe Jul 31 '23 at 14:04
  • @DonJoe The long format will give you a good indicator about the condition of the SD card. – heynnema Jul 31 '23 at 14:07