Repairing a degraded RAID array

One of the team servers, which was set up with a RAID1 array, had a disk go bad. Here’s a rough outline of how we fixed it.

First, figure out which hard drive is bad. To do this:

Look at /proc/mdstat to find out which hard drives are assigned to the RAID array. You should be able to identify the drive that is still good; for us the good disk was /dev/sda.

Once we knew the good disk was /dev/sda, we ran ls -l /dev/disk/by-id. It shows an identifier for each disk (usually you can figure out the manufacturer name from it), along with the device name it is linked to (/dev/sda, /dev/sdb, etc.). If your drives are from the same manufacturer, you might be able to figure out which one is bad with ls -l /dev/disk/by-path; from there, try to figure out which path corresponds to which SATA connector. If that still fails, you might need to just pull one of the two disks and see if your machine boots. If it doesn’t, you know that you disconnected the good disk.
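For reference, here is roughly what those commands look like; the array name, sizes, and disk identifier below are made up for illustration, not taken from our machine:

cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0]
      488384000 blocks [2/1] [U_]

ls -l /dev/disk/by-id
lrwxrwxrwx 1 root root 9 ... ata-WDC_WD5000AAKS-00V1A0_WD-WCAWF1234567 -> ../../sda

The [2/1] [U_] part of the mdstat output means the array has two slots but only one active member, i.e. it is degraded. The by-id name embeds the manufacturer, model, and serial number, so you can match it against the label printed on the physical drive.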

Find out the size(s) of the partition(s) that are RAIDed. We ran fdisk /dev/sda and issued the p command to find that out.
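For what it’s worth, that session looks roughly like this; the partition table shown is illustrative only:

fdisk /dev/sda
Command (m for help): p
Device     Boot  Start  End    Blocks     Id  System
/dev/sda1  *     1      60801  488384001  fd  Linux raid autodetect

Record the Blocks value (and the start/end) for each RAID partition so you can match it on the replacement disk.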

Once the bad drive is identified, replace it with a new disk.

Once rebooted, fdisk the new disk and create the partition that you want to RAID. Make sure its size is the same as the one you recorded earlier, and set its partition type to fd (Linux raid autodetect).
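A minimal fdisk session for the new disk might go like this, assuming the replacement shows up as /dev/sdb and gets a single RAID partition (prompts abbreviated):

fdisk /dev/sdb
Command (m for help): n   (new primary partition, number 1, sized to match the old one)
Command (m for help): t   (change the partition type)
Hex code (type L to list codes): fd   (Linux raid autodetect)
Command (m for help): w   (write the table to disk and exit)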

Once the partition is created, issue the following command as the super user:
mdadm --add /dev/mdx /dev/sdyz, where you replace x and z with the appropriate digits and y with the appropriate letter.
For example, if the RAID array is /dev/md0 and you want to add /dev/sdb1 to the array, do this: mdadm --add /dev/md0 /dev/sdb1

You should then see the RAID array rebuilding itself by looking at /proc/mdstat.
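A handy way to watch the rebuild is:

watch cat /proc/mdstat

While the rebuild runs, the array’s line should show a recovery progress indicator, something along these lines (the numbers are illustrative):

[=>...................]  recovery =  9.2% (45000000/488384001) finish=72.3min speed=101892K/sec

When recovery finishes, the status should go back to [2/2] [UU].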
