RAID 10 is not equal to RAID 1+0

I just ran into a document stating that RAID 10 is not equal to RAID 1+0. Below is what she said:

There is little documentation on Linux's RAID 10, or how it differs from RAID 1+0. Fortunately, some readers passed on some good information on what RAID 10 is, and what sets Linux's implementation apart. First of all, never mind that everyone says that RAID 10 and 1+0 are the same thing, and that even man md says this: they're not the same. RAID 10 is a new type of array; it is a single array, it can have an odd number of drives, and the minimum required is two. RAID 1+0 is a combination of multiple arrays; it requires at least four disks and must always have an even number.

Below is the comparison:

Linux RAID 10 needs a minimum of two disks, and you don't have to use pairs, but can have odd numbers. You can read all about it in Resources, so I'll sum up the basic differences:
    * RAID10 provides superior data security and can survive multiple disk failures
    * RAID10 is fast
    * RAID10 is considerably faster during recovery: RAID5 performance during a rebuild after replacing a failed disk bogs down by as much as 80%, and the rebuild can take hours. RAID10 recovery is simple copying.
    * RAID5 is susceptible to perpetuating parity and other errors

This came to my attention since I had always thought that RAID 10 requires at least 4 drives, but that's not necessarily true. So I did some more research and found a few documents that discuss it (most documents simply say RAID 10 is the same as RAID 1+0). I will just post what I found below:

This is now implemented in Linux kernel 2.6.9 by Neil Brown, who has reported that a raid10 with 2 devices and a layout of "far=2" will behave exactly as suggested here.

There is more info on setup and performance on raid10,f2 in the linux-raid Howto and FAQ.

Analyses:

RAID 1 does not give performance enhancements for sequential reads, though it gives some enhancement when multiple operations are concurrent, due to its optimizing of head movements. This is because RAID 1 does not allow for striping, that is, having the different disks read data for a file at the same time. The main problem is disk seek time, which with current technology (2004) is about 8 ms on the most common 7200 RPM IDE disks. Another problem is latency time, the average time it takes to reach the right sector once a disk read/write head has been positioned on a track; this is directly related to the rotation speed of the disk, and for a 7200 RPM disk it is about 4 ms. You can on average get maybe 40 MB/s throughput from these types of disk, that is, 320 KB transferred per 8 ms, or 160 KB per 4 ms.
So in the time you need to access a specific sector (seek time plus latency time), you could instead have transferred about half a megabyte of data. Seek and latency times are thus very costly, and much is gained if you can lower them. Cylinder track-to-track seek time can be about 1 ms.
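Those figures are easy to sanity-check. The snippet below is a back-of-the-envelope calculation using the values assumed in the text (7200 RPM, 8 ms average seek, 40 MB/s sustained throughput); the variable names are mine:

```python
# Assumed figures from the text: a 7200 RPM IDE disk, circa 2004.
rpm = 7200
avg_latency_ms = 60_000 / rpm / 2   # half a revolution: ~4.2 ms, the "about 4 ms" above
seek_ms = 8.0                       # average seek time
kb_per_ms = 40.0                    # 40 MB/s sustained == 40 KB per millisecond

per_seek_kb = kb_per_ms * seek_ms             # data you could have streamed during one seek
per_latency_kb = kb_per_ms * 4.0              # ... during one rotational latency
per_access_kb = per_seek_kb + per_latency_kb  # ... per full access: about half a megabyte

print(per_seek_kb, per_latency_kb, per_access_kb)  # 320.0 160.0 480.0
```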

Design: Striping with mirroring:

The idea here is to let reading be simultaneous on the two disks, with as little seek and latency time as possible. This means sequential reads (and writes) of all sectors on a track, and keeping cylinder track-to-track seek times to the minimum, which occurs with adjacent tracks. Buffering in the hard disk may speed this up, smoothing the read or write process.

If we alternate the chunks on each of the two disks, we get a striping effect. The mirroring of the data can then be done by using the second half of the disks and alternating the mirror chunks there as well. The alternation should be offset so that each disk contains a complete copy of the data; the loss of one disk would then still leave all the data available on the other.

You can get even better performance by employing more disks, and if one or more of the disks fail, performance will still be relatively good, as the remaining disks can do some striping.

Using the first half of the disks for the primary chunks and the second half for the mirror backup should also improve performance slightly, especially for reads. Restricting reads to seeks over only half the size of the partitions should cut average seek times by a little less than half, and using the beginning of the disks, where transfer rates are typically somewhat higher than at the end of the partition, should also improve performance a little.

The scheme allows disk performance on normal PC systems to be sped up by a factor of a little less than 2, using the two DMA channels on a normal IDE controller and relatively cheap standard IDE disks.

The scheme could be called RAID-A to signify that it is almost a RAID-1 solution.

Chunk layout for two disks:

    disk 1                        disk 2

  --------------              -----------------
  !            !              !               !
  !            !              !               !
  !            !              !               !
  ! chunk 1    !              !   chunk 2     !
  !            !              !               !
  !            !              !               !
  !            !              !               !
  --------------              -----------------
  !            !              !               !
  !            !              !               !
  !            !              !               !
  ! chunk 3    !              !   chunk 4     !
  !            !              !               !
  !            !              !               !
  !            !              !               !
  !            !              !               !
  !            !              !               !
  --------------              -----------------
  !            !              !               !
  !            !              !               !
  !            !              !               !
  !            !              !               !
  ! chunk 5    !              !   chunk 6     !
  !            !              !               !
  !            !              !               !
  !            !              !               !
  --------------              -----------------
  !            !              !               !
  !            !              !               !


..... half the size of the smaller disk.....

  !            !              !               !
  !            !              !               !
  !            !              !               !
  !            !              !               !
  --------------              -----------------
  !            !              !               !
  !            !              !               !
  !            !              !               !
  ! chunk 2    !              !   chunk 1     !
  ! mirror     !              !   mirror      !
  !            !              !               !
  !            !              !               !
  --------------              -----------------
  !            !              !               !
  !            !              !               !
  !            !              !               !
  ! chunk 4    !              !   chunk 3     !
  ! mirror     !              !   mirror      !
  !            !              !               !

And below are the details of the two different RAID 10 layouts:

There are two distinct layouts that md/raid10 can use, which I call "near" and "far". The layouts can actually be combined if you want 4 or more copies of all data, but that is not likely to be useful.

With the near layout, copies of one block of data are, not surprisingly, near each other. They will often be at the same address on different devices, though possibly some copies will be one chunk further into the device (if, e.g., you have an odd number of devices and 2 copies of each data block).

This yields read and write performance similar to raid0 over half the number of drives.
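As an illustration of that description (my own sketch, not the driver's actual arithmetic), the near layout can be modelled as writing each block `ncopies` times in a row, striped across the devices:

```python
def near(block, copy, ndev, ncopies=2):
    """Sketch of the md/raid10 'near' layout: data blocks in order, each
    repeated ncopies times, striped row by row across ndev devices.
    Returns (device, chunk offset on that device); everything 0-indexed."""
    pos = block * ncopies + copy
    return pos % ndev, pos // ndev

# With 3 devices and 2 copies, block 0's copies share offset 0 on devices
# 0 and 1, but block 1's second copy wraps around and lands one chunk
# further in -- the odd-device case mentioned above:
for b in range(3):
    print(b, [near(b, k, 3) for k in (0, 1)])
```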

The "far" layout lays all the data out in a raid0 like arrangement over the first half of all drives, and then a second copy in a similar layout over the second half of all drives - making sure that all copies of a block are on different drives.

This would be expected to yield read performance which is similar to raid0 over the full number of drives, but write performance that is substantially poorer as there will be more seeking of the drive heads.
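The far layout can be sketched the same way (again my own illustration): a plain raid0 stripe over the first half of every device, repeated over the second half but rotated one device over, so the two copies of a block never share a drive:

```python
def far(block, copy, ndev, half):
    """Sketch of the md/raid10 'far' layout with 2 copies. 'half' is the
    chunk offset where the second half of each device starts. Returns
    (device, chunk offset); everything 0-indexed."""
    device = (block + copy) % ndev   # second copy shifted one device over
    offset = copy * half + block // ndev
    return device, offset

# With 2 devices this reproduces the earlier diagram: each disk's first
# half is a stripe, and its second half mirrors the other disk's stripe.
for b in range(4):
    print(b, [far(b, k, 2, 1000) for k in (0, 1)])
```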

I don't have any useful performance measurements yet.

The main differences between raid10 and raid1+0 configurations are:

    * Management is easier, as it is one complete array rather than a combination of multiple arrays. In particular, hot spare management is more straightforward
    * The "far" data layout is easier to obtain
    * It is simple to have a raid10 with an odd number of drives and still have two copies of each data block

So, I can conclude that there are two types of RAID10:
      - RAID10 (near), which most people think is the same as RAID 1+0 and which requires at least 4 drives.
      - RAID10 (far), which some people call RAID10,f2 and which requires only 2 drives.

I'm not an expert here. That's why I have to quote what people are saying. So, if you think I'm wrong, please correct me.

Sources :

Manage Linux Storage with LVM and Smartctl
Linux RAID Smackdown: Crush RAID 5 with RAID 10
An idea for a RAID 1 like mirror with striping (RAID10)
Linux MD RAID 10
RAID10 in Linux MD driver