
I had a problem with my server running Ubuntu 14.04 with three 1 TB drives in a software RAID array. I forced mdadm to assemble the array with only two disks, then added the missing disk back to the array; the system rebuilt the RAID and everything looked fine.

And now here is the problem: every time the server starts, I see these messages

[    2.440341] md0: detected capacity change from 0 to 482848079872
[    2.460418]  md0: unknown partition table

The boot waits for a couple of seconds, after which it mounts the partitions as it should, and everything is fine.
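
If a filesystem was created directly on the array (with no partition table on top of md0), that second message is expected and harmless. As a sanity check, standard tools can show what /dev/md0 actually holds:

    sudo blkid /dev/md0      # prints the filesystem type and UUID, if any
    sudo file -s /dev/md0    # reads the first bytes of the device and names what it finds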

Here is some more info:

mdadm -D /dev/md0

    /dev/md0:
            Version : 0.90
      Creation Time : Sat Feb 26 10:39:28 2011
         Raid Level : raid5
         Array Size : 1921873792 (1832.84 GiB 1968.00 GB)
      Used Dev Size : 960936896 (916.42 GiB 984.00 GB)
       Raid Devices : 3
      Total Devices : 3
    Preferred Minor : 0
        Persistence : Superblock is persistent

        Update Time : Fri Jan 30 19:40:00 2015
              State : clean
     Active Devices : 3
    Working Devices : 3
     Failed Devices : 0
      Spare Devices : 0

             Layout : left-symmetric
         Chunk Size : 64K

               UUID : 91c9bf9f:53a9ecfd:80cbc40e:2f20054f
             Events : 0.602824

        Number   Major   Minor   RaidDevice State
           0       8        1        0      active sync   /dev/sda1
           1       8       17        1      active sync   /dev/sdb1
           2       8       33        2      active sync   /dev/sdc1

fdisk -l

    Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
    255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x00072f13

       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1   *        2048  1921875967   960936960   fd  Linux raid autodetect
    /dev/sda2      1921875968  1953523711    15823872   82  Linux swap / Solaris

    Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
    255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x000d8a37

       Device Boot      Start         End      Blocks   Id  System
    /dev/sdb1   *        2048  1921875967   960936960   fd  Linux raid autodetect
    /dev/sdb2      1921875968  1953523711    15823872   82  Linux swap / Solaris

    Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
    255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x000e4fef

       Device Boot      Start         End      Blocks   Id  System
    /dev/sdc1   *        2048  1921875967   960936960   fd  Linux raid autodetect
    /dev/sdc2      1921875968  1953523711    15823872   82  Linux swap / Solaris

    Disk /dev/md0: 1968.0 GB, 1967998763008 bytes
    2 heads, 4 sectors/track, 480468448 cylinders, total 3843747584 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 65536 bytes / 131072 bytes
    Disk identifier: 0x00000000

**Disk /dev/md0 doesn't contain a valid partition table**

Why is it doing this and what can I do to fix it?

  • http://unix.stackexchange.com/questions/28636/how-to-check-mdadm-raids-while-running – Elder Geek Jan 31 '15 at 15:16
  • OK, I ran mdadm from a LiveCD. I assembled the RAID; nothing was wrong there, contrary to what someone around here assumed. /dev/md0 came out clean. It fixed the free inode and free block counts, and that was all. – MiniMe Jan 31 '15 at 17:35
  • I was a little unclear above: I ran fsck to check whether the superblock was OK, and fsck said that /dev/md0 is clean. – MiniMe Jan 31 '15 at 17:45
  • Please feel free to edit your question to provide clarity, or if the problem seemingly went away on its own, indicate that and we will close the question. Thank you. – Elder Geek Jan 31 '15 at 20:31
  • Yes, this problem went away by itself (kind of). I also have a related problem that probably triggered fsck at the next reboot: apparently there is a bug affecting the shutdown process where you receive a message saying "mount / is busy" and the computer shuts down with / still mounted, which causes an fsck check at the next reboot. That is how the problem I initially posted disappeared. Fabby insisted that I had a bad superblock, which I checked (see my comments above), and now everything seems to be all right except that "mount / is busy", which continues to cause problems. – MiniMe Jan 31 '15 at 21:18
  • Bugs should be reported on Launchpad so that the devs can be aware of the frequency and details of these failures and resolve them. – Elder Geek Jan 31 '15 at 21:21
  • I already did that https://bugs.launchpad.net/ubuntu/+source/upstart/+bug/1019347?comments=all – MiniMe Jan 31 '15 at 21:28
  • Read tuhu's post in the bug you linked. It appears to be related to Plymouth rather than Upstart, as you surmised. – Elder Geek Jan 31 '15 at 21:36
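
As a side note to the checkarray link in the first comment: for a quick live view of array health and rebuild progress, /proc/mdstat is the standard check:

    cat /proc/mdstat                 # array state, member disks, and any rebuild progress
    sudo mdadm --detail /dev/md0     # full per-member status, as shown in the question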

1 Answer


Why? The individual disks are (maybe) fine, but the RAID super-block is very probably damaged due to a software/hardware combination fault.

What to do?

  1. Back up everything!
  2. Install smartmontools and do a full diagnostic of all drives

    sudo apt-get install smartmontools
    sudo smartctl --test=long /dev/sda
    sudo smartctl --test=long /dev/sdb
    sudo smartctl --test=long /dev/sdc
    

    wait until the tests have finished (the first sketch after this list shows how to check progress), then:

    sudo smartctl --all /dev/sda
    sudo smartctl --all /dev/sdb
    sudo smartctl --all /dev/sdc
    
  3. interpret the results and see whether any drive(s) need(s) replacing; a sketch of the key attributes to check follows this list (leave a comment if not clear, and did I mention to back up?)

  4. look for bad blocks:

    badblocks -nsv -o /dev/USB-Stick/BadBlocks.sda /dev/sda
    badblocks -nsv -o /dev/USB-Stick/BadBlocks.sdb /dev/sdb
    badblocks -nsv -o /dev/USB-Stick/BadBlocks.sdc /dev/sdc
    
  5. If you find bad blocks, these must be combined into one file (BadBlocks.all; did I mention to back up?) and passed to all drives (a merge sketch follows this list):

    mkfs.ext4 -l /dev/USB-Stick/BadBlocks.all /dev/sda1
    mkfs.ext4 -l /dev/USB-Stick/BadBlocks.all /dev/sdb1
    mkfs.ext4 -l /dev/USB-Stick/BadBlocks.all /dev/sdc1
    
  6. and recreate your device (the original members were the sdX1 partitions, per the question's fdisk output):

    mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1
    
  7. Restore back-up

Notes:

  • I would definitely not run an mdadm --detail --scan beforehand, as you would copy the error.
  • If this is really time-critical, you can skip steps 4 and 5 if the results from step 3 are fantastic, but I would not!
  • If this is time-critical, you can skip step 5 if the results from steps 3 and 4 are fantastic, but I would not!
  • If you have the budget, get rid of software RAID5 and get a hardware RAID5 ($300-500)
  • If you have the budget, add 2 more disks and go to RAID6 (a reshape sketch follows these notes)
  • If you have the entire weekend to do this, do a -wsv instead of an -nsv (-w is a destructive write test, so only after that backup!)
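
For the RAID6 note, a minimal sketch of the reshape with mdadm; /dev/sdd1 and /dev/sde1 are hypothetical new member partitions, and --backup-file protects the critical section of the reshape:

    sudo mdadm --add /dev/md0 /dev/sdd1 /dev/sde1
    sudo mdadm --grow /dev/md0 --level=6 --raid-devices=5 --backup-file=/root/md0-reshape.bak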

Oh, and I wasn't joking about the backup!

  • Could this go away by itself? Now it is not stopping at that point anymore. – MiniMe Jan 31 '15 at 05:41
  • Did I mention that the OS is installed on the RAID and that I am booting from the RAID? Can I really do the above if that is the case? – MiniMe Jan 31 '15 at 05:49
  • How about doing just this: https://linuxexpresso.wordpress.com/2010/03/31/repair-a-broken-ext4-superblock-in-ubuntu/ – MiniMe Jan 31 '15 at 05:52
  • 1. Do you have an mdadm --detail --scan from before the problem started? That is the only way I can conclusively tell you that the problem went away all by itself... (though problems seldom go away by themselves in IT; they only get worse). 2. If you do a system image backup as described for user type 4, yes, you can... – Fabby Jan 31 '15 at 13:16
  • The linuxexpresso solution is for a partition on a disk. If all you have is a hammer, all problems look like nails... I wouldn't touch my RAID array with a 10-foot pole with that solution... BTW, have you made a backup yet??? RAID is not an excuse for not backing up... :P :| – Fabby Jan 31 '15 at 13:20
  • The RAID is the backup :)) so let's take it easy with that... Question for you: if I have the original data and a program like BestSync to compare the files (backup vs. originals), and the program says there are no changes, will that confirm that everything is OK? – MiniMe Jan 31 '15 at 14:47
  • I've had a look through the tutorials, forum and Q&A and cannot find anything on the technology behind BestSync, so no, I can't guarantee that. They seem to back up files, not systems... I see nothing about MBR or superblock backup. :( And RAID is a warranty against single-drive failure... >:) It is not a backup! :P – Fabby Jan 31 '15 at 15:40
  • The point is that I have all the files on the RAID on other computers in my home. How do I run fsck against my md0 drive? Do I need to unmount it, or do I need to boot from a LiveCD, assemble the RAID, and scan it from there? – MiniMe Jan 31 '15 at 15:56
  • You've given me data and I've arrived at a conclusion, and you already know what to do. If you don't like my conclusion, down-vote the answer. If you like my conclusion, upvote. If you think it's a valid answer, accept... The only extra work I'm willing to do for you is step 3: just post the results to http://paste.ubuntu.com – Fabby Jan 31 '15 at 16:02
  • "but the RAID super-block is very probably damaged due to a software/hardware combination fault" sounds more like a guessing. Before I go out and spend 100$ for a 2TB disk and spend half day to do the above work I need to check that indeed the superblock is damaged. FSCK will do just that so please be so kind and help me with that if you will. Only when your answer makes sense to me and I understand what is going on I can vote... otherwise it will be blind voting. – MiniMe Jan 31 '15 at 16:07