In a Linux software RAID array, when a failed hard disk is replaced (or you remove a failed disk and want to add a new one without losing data), the array is left with one functioning disk and one empty disk.
The same situation applies if you have accidentally deleted the partition table, or have altered the partition settings and corrupted the data.
In such a case, you need to copy the intact partition table from the functioning disk to the new empty disk and then rebuild the software RAID array with the “mdadm” command. Additional steps may be needed if the data itself has been corrupted.
Advisory: Double-check which drive/partition is the healthy one and which is the replacement at every step; mixing them up can wipe both drives and cause a serious loss of data, and I cannot be held accountable.
Step 1:
Login to your server via SSH.
$ ssh root@yourserver.domain
Step 2:
Safety first! Your data is important, so back it up before you continue - you have been warned.
These operations are dangerous, and one small mistake can cost you your data.
Step 3:
If you don’t have “mdadm”, install it with your distribution’s package manager:
$ yum install mdadm
$ apt-get install mdadm
$ aptitude install mdadm
Or use whichever package installation method you are comfortable with on your Linux system (such as RPM).
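To confirm that mdadm is installed and check its version (an optional sanity check), you can run:
$ mdadm --version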
Step 4:
Check the status of the RAID array using the command:
$ cat /proc/mdstat
This shows the current status of the multi-disk (md) arrays.
Below is an example of two functioning healthy RAID arrays.
[root@local~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[2] sda2[0]
20478912 blocks [2/2] [UU]
md3 : active raid1 sdb3[2] sda3[0]
1902310336 blocks [2/2] [UU]
unused devices: <none>
In the above example, it shows two healthy RAID 1 arrays.
Each array shows a status of [2/2] and [UU], which means that both of the two partitions in the array are functional.
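If you are managing several arrays, an optional way to flag only the degraded ones is to filter /proc/mdstat; the one-liner below is only an illustrative sketch:
$ awk '/blocks/ && /_/ {print "degraded:", $0}' /proc/mdstat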
To put it simply, sda is disk one and sdb is disk two. Each disk then has its partition numbers, e.g. (1, 2, 3, 4):
$ fdisk -l
(or)
$ parted -l
Number Start End Size File system Name Flags
1 20.5kB 1049kB 1029kB primary bios_grub
2 2097kB 21.0GB 21.0GB ext3 primary raid
3 21.0GB 1969GB 1948GB ext3 primary raid
4 1969GB 2000GB 31.5GB linux-swap(v1) primary
As you can see in the above example, we are going to pair sdb2 (primary raid partition) with the matching sda2 (primary raid partition) in one array, and sdb3 (primary raid partition) with the matching sda3 (primary raid partition) in the other, ignoring the number 1 bios_grub and number 4 swap partitions. This recovers your data onto the new disk, sdb.
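Before making any changes, you can optionally double-check how the partitions and disks relate with lsblk (where available), which prints a compact tree of both disks:
$ lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/sda /dev/sdb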
Step 5:
In our case, /dev/sdb has already been replaced and /dev/sda is functioning well. Otherwise, you would first need to remove the failed drive's partitions from the arrays, then reboot the server and replace the drive:
$ mdadm --manage /dev/md2 --remove /dev/sdb2
$ mdadm --manage /dev/md3 --remove /dev/sdb3
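Note that mdadm will usually refuse to remove a device that it still considers active; in that case mark it as failed first and then remove it in the same command, for example:
$ mdadm --manage /dev/md2 --fail /dev/sdb2 --remove /dev/sdb2
$ mdadm --manage /dev/md3 --fail /dev/sdb3 --remove /dev/sdb3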
In the output below from a degraded array, the failed partitions are no longer listed at all, so in that case you would not need to remove them from the arrays.
[root@local~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda2[0]
20478912 blocks [2/1] [U_]
md3 : active raid1 sda3[0]
1902310336 blocks [2/1] [U_]
unused devices: <none>
The output above shows that, instead of both partitions in each array being marked as healthy (as in the earlier example), one partition is not listed at all - sdb2 and sdb3 are missing from md2 and md3.
Step 6:
Now, issue the “fdisk -l” command to list all the partitions of both disks.
Important:
If “fdisk -l ” returns the following message:
WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.
your disk’s partition table is not MBR.
Instead, it’s GPT (GUID Partition Table).
In that case, use the “parted -l” command instead.
If “fdisk -l ” displays “Disk /dev/sdb doesn’t contain a valid partition table ” for the failed disk /dev/sdb, it’s fine.
If “fdisk -l ” or "parted -l" lists partitions on the failed disk /dev/sdb, you need to enter into fdisk / parted and delete the partitions - Google how to use parted, gdisk or fdisk if you are not familiar with the process.
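As an illustration only (triple-check that /dev/sdb really is the failed/new disk), partitions can also be deleted non-interactively with parted’s script mode, one partition number at a time:
$ parted --script /dev/sdb rm 1
(repeat the rm command for each remaining partition number shown)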
Once you have removed the partitions, reboot the server to re-read the partition tables.
Execute either the “reboot” or the “shutdown -r now” command.
You do not always need to reboot the server, and could try to reload the partition table using a general hard-disk utility such as:
partprobe:
# partprobe /dev/sdb
hdparm:
# hdparm -z /dev/sdb
Step 7:
Now replicate the partition by copying the Partition Table of the healthy disk (/dev/sda) to the empty disk (/dev/sdb).
Please be extra careful to provide the right disk names in the right order, otherwise you will wipe out the data on the functioning healthy drive.
For MBR disks (replaced sdb drive) execute:
$ sfdisk -d /dev/sda | sfdisk /dev/sdb
Sometimes you may encounter an error such as “sfdisk: ERROR: sector 0 does not have an msdos signature”.
In that case, execute the command with the “--force” option:
$ sfdisk -d /dev/sda | sfdisk --force /dev/sdb
For GPT disks (replaced sdb drive) execute:
# sgdisk --backup=table /dev/sda
# sgdisk --load-backup=table /dev/sdb
# sgdisk -G /dev/sdb
The partition tables on sda and sdb should now match.
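As an optional check, you can dump both layouts and compare them; on recent util-linux versions sfdisk can dump GPT disks as well, and the two dumps should differ only in disk identifiers and labels:
$ sfdisk -d /dev/sda > /tmp/sda.layout
$ sfdisk -d /dev/sdb > /tmp/sdb.layout
$ diff /tmp/sda.layout /tmp/sdb.layout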
Step 8:
Now we are going to use the “mdadm” command to get detailed information on the status of the RAID arrays. Execute:
$ mdadm --misc --detail /dev/md2
$ mdadm --misc --detail /dev/md3
The output may look like below:
# mdadm --misc --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Tue Dec 30 00:01:43 2014
Raid Level : raid1
Array Size : 20478912 (19.53 GiB 20.97 GB)
Used Dev Size : 20478912 (19.53 GiB 20.97 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Fri Mar 30 18:34:21 2018
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : c1d4fc4e:649242d5:a4d2adc2:26fd5302
Events : 0.714840
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 removed
Notice that “State :” can be “active, degraded” or “clean, degraded” for an array with a failed or missing disk, and “active” or “clean” for a fully functioning array. Also, take note of the last three lines, which show that one device has been removed.
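If you only want to see the state line for both arrays at once, an optional filter like the following works (md2 and md3 are the array names used in this guide):
$ mdadm --detail /dev/md2 /dev/md3 | grep -E '/dev/md|State :'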
Step 9:
To find out which partition should be added to which array, execute:
"cat /etc/mdadm.conf” or "cat /etc/mdadm/mdadm.conf”.
[root@localhost ~]# cat /etc/mdadm.conf
ARRAY /dev/md2 UUID=c1d4fc4e:649242d5:a4d2adc2:26fd5302
ARRAY /dev/md3 UUID=af5d17c4:9a30e18f:a4d2adc2:26fd5302
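If neither configuration file exists on your system, the same array-to-UUID mapping can be printed directly from the running arrays:
$ mdadm --detail --scan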
Step 10:
/dev/sdb has now been replaced and re-partitioned, so you need to add its partitions back to the correct arrays.
The output from Step 9 (together with the array details from Step 8) indicates that /dev/sdb2 should be added to the /dev/md2 array, so execute:
$ mdadm /dev/md2 --manage --add /dev/sdb2
Check the RAID array status by issuing the “cat /proc/mdstat ” command.
The partition should now be part of the array, data should begin copying over to the new drive, and /dev/sdb2 will start rebuilding.
Once the rebuilding process is done, the output of “mdadm --misc --detail /dev/md2” should display:
# mdadm --misc --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Tue Dec 30 00:01:43 2014
Raid Level : raid1
Array Size : 20478912 (19.53 GiB 20.97 GB)
Used Dev Size : 20478912 (19.53 GiB 20.97 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Fri Mar 30 18:34:21 2018
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : c1d4fc4e:649242d5:a4d2adc2:26fd5302
Events : 0.714840
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
You have now activated RAID on partition number 2 of sdb - all the data from partition 2 of the healthy disk (sda2) is being copied over to it.
Do the same for the /dev/sdb3 partition by executing:
$ mdadm /dev/md3 --manage --add /dev/sdb3
The following output will be displayed if you have been successful:
# mdadm --misc --detail /dev/md3
/dev/md3:
Version : 0.90
Creation Time : Tue Dec 30 00:01:44 2014
Raid Level : raid1
Array Size : 1902310336 (1814.18 GiB 1947.97 GB)
Used Dev Size : 1902310336 (1814.18 GiB 1947.97 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Fri Mar 30 19:00:43 2018
State : active, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Rebuild Status : 3% complete
UUID : af5d17c4:9a30e18f:a4d2adc2:26fd5302
Events : 0.3799693
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
2 8 19 1 spare rebuilding /dev/sdb3
cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[1] sda2[0]
20478912 blocks [2/2] [UU]
md3 : active raid1 sdb3[2] sda3[0]
1902310336 blocks [2/1] [U_]
[>....................] recovery = 3.1% (59272512/1902310336) finish=2234.9min speed=13743K/sec
unused devices: <none>
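The rebuild of a large partition can take many hours; if you want to follow the progress, one simple option is to refresh the status every few seconds:
$ watch -n 5 cat /proc/mdstat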
Step 11:
Now that both partitions /dev/sdb2 and /dev/sdb3 have been recovered and added to the correct arrays, and the arrays are rebuilding, you need to enable the swap partition on the new drive.
To verify the swap partitions, execute the command:
$ cat /proc/swaps
Filename Type Size Used Priority
/dev/sda4 partition 30718972 49244 -1
As you can see, the swap area on partition number 4 of sdb (/dev/sdb4) has not yet been created or enabled.
To enable the swap partition for /dev/sdb, issue the commands:
$ mkswap /dev/sdb4
$ swapon -p 1 /dev/sdb4
(or)
$ swapon -a
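After enabling it, you can re-check that both swap partitions are now active:
$ cat /proc/swaps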
Step 12:
Issue a final “fdisk -l” or “parted -l” to verify that the partition layouts match on both disks.
# parted -l
Model: ATA HGST HUS724020AL (scsi)
Disk /dev/sda: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
1 20.5kB 1049kB 1029kB primary bios_grub
2 2097kB 21.0GB 21.0GB ext3 primary raid
3 21.0GB 1969GB 1948GB ext3 primary raid
4 1969GB 2000GB 31.5GB linux-swap(v1) primary
Model: ATA HGST HUS724020AL (scsi)
Disk /dev/sdb: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
1 20.5kB 1049kB 1029kB primary bios_grub
2 2097kB 21.0GB 21.0GB ext3 primary raid
3 21.0GB 1969GB 1948GB ext3 primary raid
4 1969GB 2000GB 31.5GB linux-swap(v1) primary
If you require any additional support, or would like to ask a question - please comment below or open a support ticket at https://webhost44.com/
Regards,