RAID (/reɪd/; "redundant array of inexpensive disks" or "redundant array of independent disks") is a data storage virtualization technology that combines multiple physical data storage components into one or more logical units for the purposes of data redundancy, performance improvement, or both. This is in contrast to the previous concept of highly reliable mainframe disk drives referred to as a "single large expensive disk" (SLED).
Although all RAID implementations differ from the specification to some extent, some companies and open-source projects have developed non-standard RAID implementations that differ substantially from the standard. Additionally, there are non-RAID drive architectures, providing configurations of multiple hard drives not referred to by RAID acronyms. Row diagonal parity is a scheme where one dedicated disk of parity
a lockstep) added design considerations that provided no significant advantages over other RAID levels. Both RAID 3 and RAID 4 were quickly replaced by RAID 5. RAID 3 was usually implemented in hardware, and the performance issues were addressed by using large disk caches. RAID 4 consists of block-level striping with a dedicated parity disk. As a result of its layout, RAID 4 provides good performance for random reads, while
a vendor lock-in, and contributing to reliability issues. For example, in FreeBSD, in order to access the configuration of Adaptec RAID controllers, users are required to enable a Linux compatibility layer and use the Linux tooling from Adaptec, potentially compromising the stability, reliability and security of their setup, especially in the long term. Some other operating systems have implemented their own generic frameworks for interfacing with any RAID controller, and provide tools for monitoring RAID volume status, as well as facilitating drive identification through LED blinking, alarm management and hot-spare disk designations from within
a RAID 5 disk drive array, depending upon the sequence of writing across the disks, that is: The figure shows 1) data blocks written left to right, 2) the parity block at the end of the stripe, and 3) the first block of the next stripe not on the same disk as the parity block of the previous stripe. It can be designated as a Left Asynchronous RAID 5 layout, and this is the only layout identified in
a RAID array's virtual disks in the presence of any two concurrent disk failures. Several methods, including dual check data computations (parity and Reed–Solomon), orthogonal dual parity check data and diagonal parity, have been used to implement RAID Level 6." The second block is usually labeled Q, with the first block labeled P. Typically the P block is calculated as the parity (XORing) of
a RAID 6 array will have the same chance of failure as its RAID 5 counterpart had in 2010. Mirroring schemes such as RAID 10 have a bounded recovery time, as they require only the copy of a single failed drive, compared with parity schemes such as RAID 6, which require the copy of all blocks of the drives in an array set. Triple parity schemes, or triple mirroring, have been suggested as one approach to improve resilience to an additional drive failure during this large rebuild time. A system crash or other interruption of
a RAID-Z block, ZFS compares it against its checksum, and if the data disks did not return the right answer, ZFS reads the parity and then figures out which disk returned bad data. Then, it repairs the damaged data and returns good data to the requestor. There are five different RAID-Z modes: RAID-Z0 (similar to RAID 0, offers no redundancy), RAID-Z1 (similar to RAID 5, allows one disk to fail), RAID-Z2 (similar to RAID 6, allows two disks to fail), RAID-Z3 (a RAID 7 configuration, allows three disks to fail), and mirror (similar to RAID 1, allows all but one of
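The following Python sketch is a simplified, single-parity illustration of the self-healing read path described above; it is not ZFS code, and the checksum, block layout, and function names are stand-ins chosen for the example.

```python
import hashlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks))

def checksum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()   # stand-in for the filesystem's block checksum

def self_healing_read(data_blocks, parity, expected_checksums):
    """Return repaired data blocks (single parity, so at most one bad block per stripe)."""
    repaired = list(data_blocks)
    for i, block in enumerate(data_blocks):
        if checksum(block) != expected_checksums[i]:
            # The checksum identifies *which* drive returned bad data, so the block
            # can be rebuilt from the parity plus the other data blocks and rewritten.
            others = [b for j, b in enumerate(data_blocks) if j != i]
            repaired[i] = xor_blocks(others + [parity])
    return repaired

# Three data blocks plus XOR parity; the second drive silently corrupts its block.
good = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
parity = xor_blocks(good)
sums = [checksum(b) for b in good]
read_back = [good[0], b"\xff\xff", good[2]]
assert self_healing_read(read_back, parity, sums) == good
```

A real RAID-Z stripe also varies in width per block and relies on the pool's own checksums, which is what allows reconstruction to be driven from the filesystem metadata rather than from fixed stripe geometry.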
a Reed–Solomon code is used, the second parity calculation is unnecessary. Reed–Solomon has the advantage of allowing all redundancy information to be contained within a given stripe. It is possible to support a far greater number of drives by choosing the parity function more carefully. The issue we face is to ensure that a system of equations over the finite field $\mathbb{Z}_2$ has
909-600: A constant average rate. The probability of two failures in the same 10-hour period was twice as large as predicted by an exponential distribution. Unrecoverable read errors (URE) present as sector read failures, also known as latent sector errors (LSE). The associated media assessment measure, unrecoverable bit error (UBE) rate, is typically guaranteed to be less than one bit in 10 for enterprise-class drives ( SCSI , FC , SAS or SATA), and less than one bit in 10 for desktop-class drives (IDE/ATA/PATA or SATA). Increasing drive capacities and large RAID 5 instances have led to
1010-487: A dedicated RAID controller chip, but simply a standard drive controller chip, or the chipset built-in RAID function, with proprietary firmware and drivers. During early bootup, the RAID is implemented by the firmware and, once the operating system has been more completely loaded, the drivers take over control. Consequently, such controllers may not work when driver support is not available for the host operating system. An example
1111-462: A downside, such schemes suffer from elevated write penalty—the number of times the storage medium must be accessed during a single write operation. Schemes that duplicate (mirror) data in a drive-to-drive manner, such as RAID 1 and RAID 10, have a lower risk from UREs than those using parity computation or mirroring between striped sets. Data scrubbing , as a background process, can be used to detect and recover from UREs, effectively reducing
a factor of three to four times less than a conventional RAID. File system performance becomes less dependent upon the speed of any single rebuilding storage array. Dynamic disk pooling (DDP), also known as D-RAID, maintains performance even when up to two drives fail simultaneously. DDP is a high-performance type of declustered RAID. Data is distributed across the drives in one of several ways, referred to as RAID levels, depending on
a few ways. The write hole is a little-understood and rarely mentioned failure mode for redundant storage systems that do not utilize transactional features. Database researcher Jim Gray wrote "Update in Place is a Poison Apple" during the early days of relational database commercialization. There are concerns about write-cache reliability, specifically regarding devices equipped with a write-back cache, which
1414-425: A field is an element of the field such that g i {\displaystyle g^{i}} is different for each non-negative i < m − 1 {\displaystyle i<m-1} . This means each element of the field, except the value 0 {\displaystyle 0} , can be written as a power of g . {\displaystyle g.} A finite field
1515-459: A full-stripe write. This, when combined with the copy-on-write transactional semantics of ZFS, eliminates the write hole error . RAID-Z is also faster than traditional RAID 5 because it does not need to perform the usual read–modify–write sequence. RAID-Z does not require any special hardware, such as NVRAM for reliability, or write buffering for performance. Given the dynamic nature of RAID-Z's stripe width, RAID-Z reconstruction must traverse
1616-477: A much faster rate than transfer speed, and error rates have only fallen a little in comparison. Therefore, larger-capacity drives may take hours if not days to rebuild, during which time other drives may fail or yet undetected read errors may surface. The rebuild time is also limited if the entire array is still in operation at reduced capacity. Given an array with only one redundant drive (which applies to RAID levels 3, 4 and 5, and to "classic" two-drive RAID 1),
1717-428: A non-RAID setup), but in most situations it will yield a significant improvement in performance". Synthetic benchmarks show different levels of performance improvements when multiple HDDs or SSDs are used in a RAID 0 setup, compared with single-drive performance. However, some synthetic benchmarks also show a drop in performance for the same comparison. RAID 1 consists of an exact copy (or mirror ) of
1818-418: A non-standard RAID level known as RAID 1E . In this layout, data striping is combined with mirroring, by mirroring each written stripe to one of the remaining disks in the array. Usable capacity of a RAID 1E array is 50% of the total capacity of all drives forming the array; if drives of different sizes are used, only the portions equal to the size of smallest member are utilized on each drive. One of
1919-672: A particular Galois field or Reed–Solomon error correction . RAID can also provide data security with solid-state drives (SSDs) without the expense of an all-SSD system. For example, a fast SSD can be mirrored with a mechanical drive. For this configuration to provide a significant speed advantage, an appropriate controller is needed that uses the fast SSD for all read operations. Adaptec calls this "hybrid RAID". Originally, there were five standard levels of RAID, but many variations have evolved, including several nested levels and many non-standard levels (mostly proprietary ). RAID levels and their associated data formats are standardized by
2020-446: A polynomial. The effect of g i {\displaystyle g^{i}} can be thought of as the action of a carefully chosen linear feedback shift register on the data chunk. Unlike the bit shift in the simplified example, which could only be applied k {\displaystyle k} times before the encoding began to repeat, applying the operator g {\displaystyle g} multiple times
2121-542: A read request for B2 could be serviced concurrently by disk 1. RAID 5 consists of block-level striping with distributed parity. Unlike in RAID ;4, parity information is distributed among the drives. It requires that all drives but one be present to operate. Upon failure of a single drive, subsequent reads can be calculated from the distributed parity such that no data is lost. RAID 5 requires at least three disks. There are many layouts of data and parity in
a redundancy mode—the boot drive is protected from failure (due to the firmware) during the boot process even before the operating system's drivers take over. Data scrubbing (referred to in some environments as patrol read) involves periodic reading and checking by the RAID controller of all the blocks in an array, including those not otherwise accessed. This detects bad blocks before use. Data scrubbing checks for bad blocks on each storage device in an array, but also uses
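A minimal sketch of such a scrub pass, in Python with toy in-memory "drives" (the class and function names are hypothetical, and real controllers work at the sector level rather than on whole blocks):

```python
from functools import reduce

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks))

class FakeDrive:
    """Toy drive: a list of blocks; a block stored as None raises on read,
    standing in for a latent sector error."""
    def __init__(self, blocks):
        self.blocks = list(blocks)
    def read(self, n):
        if self.blocks[n] is None:
            raise IOError("unrecoverable read error")
        return self.blocks[n]
    def write(self, n, data):
        self.blocks[n] = data   # a real drive would remap the bad sector here

def scrub(drives, num_stripes):
    """Read every block of every stripe; rebuild a single unreadable block from the rest."""
    for s in range(num_stripes):
        bad, good = [], []
        for d in drives:
            try:
                good.append(d.read(s))
            except IOError:
                bad.append(d)
        if len(bad) == 1:
            bad[0].write(s, xor_blocks(good))   # survivors (data + parity) restore the block
        elif len(bad) > 1:
            raise RuntimeError(f"stripe {s}: more failures than the redundancy can repair")

# Three data drives plus one XOR-parity drive, one stripe, latent error on drive 1.
d0, d1, d2 = b"\x01\x02", b"\x03\x04", b"\x05\x06"
drives = [FakeDrive([d0]), FakeDrive([None]), FakeDrive([d2]),
          FakeDrive([xor_blocks([d0, d1, d2])])]
scrub(drives, num_stripes=1)
assert drives[1].blocks[0] == d1
```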
2323-408: A schedule set within the software. Advantages include lower power consumption than standard RAID levels, the ability to use multiple hard drives with differing sizes to their full capacity and in the event of multiple concurrent hard drive failures (exceeding the redundancy), only losing the data stored on the failed hard drives compared to standard RAID levels which offer striping in which case all of
2424-407: A scrub. The redundant information is used to reconstruct the missing data, rather than to identify the faulted drive. Drives are considered to have faulted if they experience an unrecoverable read error , which occurs after a drive has retried many times to read data and failed. Enterprise drives may also report failure in far fewer tries than consumer drives as part of TLER to ensure a read request
2525-438: A second drive failure would cause complete failure of the array. Even though individual drives' mean time between failure (MTBF) have increased over time, this increase has not kept pace with the increased storage capacity of the drives. The time to rebuild the array after a single drive failure, as well as the chance of a second failure during a rebuild, have increased over time. Some commentators have declared that RAID 6
2626-416: A set of data on two or more disks; a classic RAID 1 mirrored pair contains two disks. This configuration offers no parity, striping, or spanning of disk space across multiple disks, since the data is mirrored on all disks belonging to the array, and the array can only be as big as the smallest member disk. This layout is useful when read performance or reliability is more important than write performance or
2727-401: A single disk. However, if disks with different speeds are used in a RAID 1 array, overall write performance is equal to the speed of the slowest disk. Synthetic benchmarks show varying levels of performance improvements when multiple HDDs or SSDs are used in a RAID 1 setup, compared with single-drive performance. However, some synthetic benchmarks also show a drop in performance for
2828-503: A specific fix. A utility called WDTLER.exe limited a drive's error recovery time. The utility enabled TLER (time limited error recovery) , which limits the error recovery time to seven seconds. Around September 2009, Western Digital disabled this feature in their desktop drives (such as the Caviar Black line), making such drives unsuitable for use in RAID configurations. However, Western Digital enterprise class drives are shipped from
2929-673: A suitable irreducible polynomial p ( x ) {\displaystyle p(x)} of degree k {\displaystyle k} over Z 2 {\displaystyle \mathbb {Z} _{2}} . We will represent the data elements D {\displaystyle D} as polynomials D = d k − 1 x k − 1 + d k − 2 x k − 2 + . . . + d 1 x + d 0 {\displaystyle \mathbf {D} =d_{k-1}x^{k-1}+d_{k-2}x^{k-2}+...+d_{1}x+d_{0}} in
3030-508: A unique solution. To do this, we can use the theory of polynomial equations over finite fields. Consider the Galois field G F ( m ) {\displaystyle GF(m)} with m = 2 k {\displaystyle m=2^{k}} . This field is isomorphic to a polynomial field F 2 [ x ] / ( p ( x ) ) {\displaystyle F_{2}[x]/(p(x))} for
3131-403: A write operation can result in states where the parity is inconsistent with the data due to non-atomicity of the write process, such that the parity cannot be used for recovery in the case of a disk failure. This is commonly termed the write hole which is a known data corruption issue in older and low-end RAIDs, caused by interrupted destaging of writes to disk. The write hole can be addressed in
Is Intel Rapid Storage Technology, implemented on many consumer-level motherboards. Because some minimal hardware support is involved, this implementation is also called "hardware-assisted software RAID", "hybrid model" RAID, or even "fake RAID". If RAID 5 is supported, the hardware may provide a hardware XOR accelerator. An advantage of this model over the pure software RAID is that—if using
Is RAID 0 (such as in RAID 1+0 and RAID 5+0), most vendors omit the "+" (yielding RAID 10 and RAID 50, respectively). Many configurations other than the basic numbered RAID levels are possible, and many companies, organizations, and groups have created their own non-standard configurations, in many cases designed to meet the specialized needs of a small niche group. Such configurations include
Is a caching system that reports the data as written as soon as it is written to cache, as opposed to when it is written to the non-volatile medium. If the system experiences a power loss or other major failure, the data may be irrevocably lost from the cache before reaching the non-volatile storage. For this reason, good write-back cache implementations include mechanisms, such as redundant battery power, to preserve cache contents across system failures (including power failures) and to flush
Is a rewrite of the mdadm module. Disadvantages include closed-source code, high price, slower write performance than a single disk, and bottlenecks when multiple drives are written concurrently. However, Unraid supports a cache pool, which can dramatically speed up write performance. Cache pool data can be temporarily protected using Btrfs RAID 1 until Unraid moves it to the array based on
Is also capable of this. The software RAID subsystem provided by the Linux kernel, called md, supports the creation of both classic (nested) RAID 1+0 arrays, and non-standard RAID arrays that use a single-level RAID layout with some additional features. The standard "near" layout, in which each chunk is repeated n times in a k-way stripe array, is equivalent to the standard RAID 10 arrangement, but it does not require that n evenly divides k. For example, an n2 layout on two, three, and four drives would look like: The four-drive example
Is also vulnerable to controller failure because it is not always possible to migrate it to a new, different controller without data loss. In practice, the drives are often the same age (with similar wear) and subject to the same environment. Since many drive failures are due to mechanical issues (which are more likely on older drives), this violates the assumptions of independent, identical rate of failure amongst drives; failures are in fact statistically correlated. In practice,
Is booted, and after the operating system is booted, proprietary configuration utilities are available from the manufacturer of each controller. Unlike the network interface controllers for Ethernet, which can usually be configured and serviced entirely through the common operating system paradigms like ifconfig in Unix, without a need for any third-party tools, each manufacturer of each RAID controller usually provides their own proprietary software tooling for each operating system that they deem to support, ensuring
Is designed for offering striping performance on a mirrored array; sequential reads can be striped, as in RAID 0 configurations. Random reads are somewhat faster, while sequential and random writes offer about equal speed to other mirrored RAID configurations. "Far" layout performs well for systems in which reads are more frequent than writes, which is a common case. For a comparison, regular RAID 1 as provided by Linux software RAID does not stripe reads, but can perform reads in parallel. The md driver also supports an "offset" layout, in which each stripe
Is fulfilled in a timely manner. In a measurement of the I/O performance of five filesystems with five storage configurations (single SSD, RAID 0, RAID 1, RAID 10, and RAID 5), it was shown that F2FS on RAID 0 and RAID 5 with eight SSDs outperforms EXT4 by 5 times and 50 times, respectively. The measurements also suggest that the RAID controller can be a significant bottleneck in building a RAID system with high-speed SSDs. Combinations of two or more standard RAID levels are known as nested (hybrid) RAID and include RAID 0+1 or RAID 01, RAID 0+3 or RAID 03, RAID 1+0 or RAID 10, RAID 5+0 or RAID 50, RAID 6+0 or RAID 60, and RAID 10+0 or RAID 100. In addition to standard and nested RAID levels, alternatives include non-standard RAID levels and non-RAID drive architectures. Non-RAID drive architectures are referred to by similar terms and acronyms, notably JBOD ("just
Is given as an expression in terms of the number of drives, n; this expression designates a fractional value between zero and one, representing the fraction of the sum of the drives' capacities that is available for use. For example, if three drives are arranged in RAID 3, this gives an array space efficiency of 1 − 1/n = 1 − 1/3 = 2/3 ≈ 67%; thus, if each drive in this example has a capacity of 250 GB, then
Is guaranteed to have at least one generator. Pick one such generator $g$, and define $\mathbf{P}$ and $\mathbf{Q}$ as follows: $\mathbf{P} = \bigoplus_i \mathbf{D}_i = \mathbf{D}_0 \oplus \mathbf{D}_1 \oplus \cdots \oplus \mathbf{D}_{n-1}$ and $\mathbf{Q} = \bigoplus_i g^i \mathbf{D}_i = \mathbf{D}_0 \oplus g\mathbf{D}_1 \oplus \cdots \oplus g^{n-1}\mathbf{D}_{n-1}$. As before, the first checksum $\mathbf{P}$ is just the XOR of each stripe, though interpreted now as
Is guaranteed to produce $m = 2^k - 1$ unique invertible functions, which will allow a chunk length of $k$ to support up to $2^k - 1$ data pieces. If one data chunk is lost, the situation is similar to
Is identical to a standard RAID 1+0 array, while the three-drive example is a software implementation of RAID 1E. The two-drive example is equivalent to RAID 1. The driver also supports a "far" layout, in which all the drives are divided into f sections. All the chunks are repeated in each section but are switched in groups (for example, in pairs). For example, f2 layouts on two-, three-, and four-drive arrays would look like this: "Far" layout
Is implemented in the manufacturer's storage architecture—in software, firmware, or by using firmware and specialized ASICs for intensive parity calculations. RAID 6 can read up to the same speed as RAID 5 with the same number of physical drives. When either diagonal or orthogonal dual parity is used, a second parity calculation is necessary for write operations. This doubles CPU overhead for RAID 6 writes, versus single-parity RAID levels. When
Is in a horizontal "row" like in RAID 4, but the other dedicated parity is calculated from blocks permuted ("diagonal") like in RAID 5 and 6. Alternative terms for "row" and "diagonal" include "dedicated" and "distributed". Invented by NetApp, it is offered as RAID-DP in their ONTAP systems. The technique can be considered RAID 6 in the broad SNIA definition and has the same failure characteristics as RAID 6. The performance penalty of RAID-DP
Is only a "band aid" in this respect, because it only kicks the problem a little further down the road. However, according to the 2006 NetApp study of Berriman et al., the chance of failure decreases by a factor of about 3,800 (relative to RAID 5) for a proper implementation of RAID 6, even when using commodity drives. Nevertheless, if the currently observed technology trends remain unchanged, in 2019
Is only one building block of a larger data loss prevention and recovery scheme – it cannot replace a backup plan. RAID 0 (also known as a stripe set or striped volume) splits ("stripes") data evenly across two or more disks, without parity information, redundancy, or fault tolerance. Since RAID 0 provides no fault tolerance or redundancy, the failure of one drive will cause
Is rebuilt using all the operational disks in the array, the bandwidth of which is greater than that of the fewer disks of a conventional RAID group. Furthermore, if an additional disk fault occurs during a rebuild, the number of impacted tracks requiring repair is markedly less than the previous failure and less than the constant rebuild overhead of a conventional array. The decrease in declustered rebuild impact and client overhead can be
Is relatively CPU intensive, as it involves polynomial multiplication in $F_2[x]/(p(x))$. This can be mitigated with a hardware implementation or by using an FPGA. The above Vandermonde matrix solution can be extended to triple parity, but beyond that a Cauchy matrix construction is required. The following table provides an overview of some considerations for standard RAID levels. In each case, array space efficiency
Is repeated o times and offset by f (far) devices. For example, o2 layouts on two-, three-, and four-drive arrays are laid out as: It is also possible to combine "near" and "offset" layouts (but not "far" and "offset"). In the examples above, k is the number of drives, while n#, f#, and o# are given as parameters to mdadm's --layout option. Linux software RAID (the Linux kernel's md driver) also supports creation of standard RAID 0, 1, 4, 5, and 6 configurations. Some RAID 1 implementations treat arrays with more than two disks differently, creating
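As a simplified model of the md "near" placement described in the text (each chunk written to n consecutive devices of a k-way stripe), the Python sketch below lays chunks out row by row; the "far" and "offset" variants additionally rotate the copies into separate sections of the devices and are omitted here. The function and chunk names are illustrative, not the md driver's own arithmetic.

```python
def near_layout(k: int, n: int, rows: int):
    """Fill a rows x k grid with chunk labels, giving each chunk n consecutive copies."""
    grid = []
    chunk, copies_left = 1, n
    for _ in range(rows):
        row = []
        for _ in range(k):
            row.append(f"A{chunk}")
            copies_left -= 1
            if copies_left == 0:
                chunk, copies_left = chunk + 1, n
        grid.append(row)
    return grid

for row in near_layout(k=3, n=2, rows=2):
    print(" ".join(row))
# A1 A1 A2
# A2 A3 A3
```

With k = 2 this degenerates to plain mirroring (RAID 1), and with k = 3 it reproduces the RAID 1E-style arrangement mentioned in the text.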
Is that it allows any assortment of RAID 0, 1, 5, or 10 volumes in the array, to which a controllable (and identical) portion of each disk is allocated. As such, a Matrix RAID array can improve both performance and data integrity. A practical instance of this would use a small RAID 0 (stripe) volume for the operating system, program, and paging files; a second, larger RAID 1 (mirror) volume would store critical data. Linux MD RAID
Is the only original level of RAID that is not currently used. RAID 3, which is rarely used in practice, consists of byte-level striping with a dedicated parity disk. One of the characteristics of RAID 3 is that it generally cannot service multiple requests simultaneously, which happens because any single block of data will, by definition, be spread across all members of the set and will reside in
Is typically under 2% when compared to a similar RAID 4 configuration. RAID 5E, RAID 5EE, and RAID 6E (with the added E standing for Enhanced) generally refer to variants of RAID 5 or 6 with an integrated hot-spare drive, where the spare drive is an active part of the block rotation scheme. This spreads I/O across all drives, including the spare, thus reducing the load on each drive, increasing performance. It does, however, prevent sharing
the Storage Networking Industry Association (SNIA) in the Common RAID Disk Drive Format (DDF) standard. The numerical values only serve as identifiers and do not signify performance, reliability, generation, hierarchy, or any other metric. While most RAID levels can provide good protection against and recovery from hardware defects or defective sectors/read errors (hard errors), they do not provide any protection against data loss due to catastrophic failures (fire, water) or soft errors such as user error, software malfunction, or malware infection. For valuable data, RAID
the Storage Networking Industry Association (SNIA) in the Common RAID Disk Drive Format (DDF) standard: In what was originally termed hybrid RAID, many storage controllers allow RAID levels to be nested. The elements of a RAID may be either individual drives or arrays themselves. Arrays are rarely nested more than one level deep. The final array is known as the top array. When the top array
the second-stage boot loader from the second drive as a fallback. The second-stage boot loader for FreeBSD is capable of loading a kernel from such an array. Software-implemented RAID is not always compatible with the system's boot process, and it is generally impractical for desktop versions of Windows. However, hardware RAID controllers are expensive and proprietary. To fill this gap, inexpensive "RAID controllers" were introduced that do not contain
the Galois field. Let $\mathbf{D}_0, \ldots, \mathbf{D}_{n-1} \in GF(m)$ correspond to the stripes of data across hard drives, encoded as field elements in this manner. We will use $\oplus$ to denote addition in
5959-496: The Thinking Machines' DataVault where 32 data bits were transmitted simultaneously. The IBM 353 also observed a similar usage of Hamming code and was capable of transmitting 64 data bits simultaneously, along with 8 ECC bits. With all hard disk drives implementing internal error correction, the complexity of an external Hamming code offered little advantage over parity so RAID 2 has been rarely implemented; it
6060-427: The array by each disk is limited to the size of the smallest disk. For example, if a 120 GB disk is striped together with a 320 GB disk, the size of the array will be 120 GB × 2 = 240 GB. However, some RAID implementations would allow the remaining 200 GB to be used for other purposes. The diagram in this section shows how the data is distributed into stripes on two disks, with A1:A2 as
6161-422: The array has a total capacity of 750 GB but the capacity that is usable for data storage is only 500 GB. Different RAID configurations can also detect failure during so called data scrubbing . Historically disks were subject to lower reliability and RAID levels were also used to detect which disk in the array had failed in addition to that a disk had failed. Though as noted by Patterson et al. even at
the benefits of RAID 1E over usual RAID 1 mirrored pairs is that the performance of random read operations remains above the performance of a single drive even in a degraded array. The ZFS filesystem provides RAID-Z, a data/parity distribution scheme similar to RAID 5, but using dynamic stripe width: every block is its own RAID stripe, regardless of blocksize, resulting in every RAID-Z write being
6363-770: The cache at system restart time. RAID 1 In computer storage , the standard RAID levels comprise a basic set of RAID ("redundant array of independent disks" or "redundant array of inexpensive disks") configurations that employ the techniques of striping , mirroring , or parity to create large reliable data stores from multiple general-purpose computer hard disk drives (HDDs). The most common types are RAID 0 (striping), RAID 1 (mirroring) and its variants, RAID 5 (distributed parity), and RAID 6 (dual parity). Multiple RAID levels can also be combined or nested , for instance RAID 10 (striping of mirrors) or RAID 01 (mirroring stripe sets). RAID levels and their associated data formats are standardized by
6464-416: The chances for a second failure before the first has been recovered (causing data loss) are higher than the chances for random failures. In a study of about 100,000 drives, the probability of two drives in the same cluster failing within one hour was four times larger than predicted by the exponential statistical distribution —which characterizes processes in which events occur continuously and independently at
6565-520: The data is still exposed to operator, software, hardware, and virus destruction. Many studies cite operator fault as a common source of malfunction, such as a server operator replacing the incorrect drive in a faulty RAID, and disabling the system (even temporarily) in the process. An array can be overwhelmed by catastrophic failure that exceeds its recovery capacity and the entire array is at risk of physical damage by fire, natural disaster, and human forces, however backups can be stored off site. An array
6666-453: The data on the array is lost when more hard drives fail than the redundancy can handle. In OpenBSD , CRYPTO is an encrypting discipline for the softraid subsystem. It encrypts data on a single chunk to provide for data confidentiality. CRYPTO does not provide redundancy. RAID 1C provides both redundancy and encryption. Some filesystems, such as Btrfs, and ZFS/OpenZFS (with per-dataset copies=1|2|3 property), support creating multiple copies of
6767-438: The data would be distributed in two RAID 5–like arrays and two RAID 1-like sets: BeyondRaid offers a RAID 6–like feature and can perform hash-based compression using 160-bit SHA-1 hashes to maximize storage efficiency. Unraid is a proprietary Linux-based operating system optimized for media file storage. Unfortunately Unraid doesn't provide information about its storage technology, but some say its parity array
6868-465: The data, the same as RAID 5. Different implementations of RAID 6 use different erasure codes to calculate the Q block, often one of Reed Solomon, EVENODD, Row Diagonal Parity (RDP), Mojette, or Liberation codes. RAID 6 does not have a performance penalty for read operations, but it does have a performance penalty on write operations because of the overhead associated with parity calculations. Performance varies greatly depending on how RAID 6
6969-420: The developers of Drive Bender, and StableBit's DrivePool. BeyondRAID is not a true RAID extension, but consolidates up to 12 SATA hard drives into one pool of storage. It has the advantage of supporting multiple disk sizes at once, much like JBOD, while providing redundancy for all disks and allowing a hot-swap upgrade at any time. Internally it uses a mix of techniques similar to RAID 1 and 5. Depending on
7070-584: The direction the data blocks are written, the location of the parity blocks with respect to the data blocks and whether or not the first data block of a subsequent stripe is written to the same drive as the last parity block of the prior stripe. The figure to the right is just one of many such layouts. According to the Storage Networking Industry Association (SNIA), the definition of RAID 6 is: "Any form of RAID that can continue to execute read and write requests to all of
7171-401: The disks of a declustered array. Under traditional RAID, an entire disk storage system of, say, 100 disks would be split into multiple arrays each of, say, 10 disks. By contrast, under declustered RAID, the entire storage system is used to make one array. Every data item is written twice, as in mirroring, but logically adjacent data and copies are spread arbitrarily. When a disk fails, erased data
7272-428: The disks to fail). Windows Home Server Drive Extender is a specialized case of JBOD RAID 1 implemented at the file system level. Microsoft announced in 2011 that Drive Extender would no longer be included as part of Windows Home Server Version 2, Windows Home Server 2011 (codename VAIL). As a result, there has been a third-party vendor move to fill the void left by DE. Included competitors are Division M,
7373-413: The entire array to fail, due to data being striped across all disks. This configuration is typically implemented having speed as the intended goal. RAID 0 is normally used to increase performance, although it can also be used as a way to create a large logical volume out of two or more physical disks. A RAID 0 setup can be created with disks of differing sizes, but the storage space added to
7474-563: The factory with TLER enabled. Similar technologies are used by Seagate, Samsung, and Hitachi. For non-RAID usage, an enterprise class drive with a short error recovery timeout that cannot be changed is therefore less suitable than a desktop drive. In late 2010, the Smartmontools program began supporting the configuration of ATA Error Recovery Control, allowing the tool to configure many desktop class hard drives for use in RAID setups. While RAID may protect against physical drive failure,
7575-417: The field, and concatenation to denote multiplication. The reuse of ⊕ {\displaystyle \oplus } is intentional: this is because addition in the finite field Z 2 {\displaystyle \mathbb {Z} _{2}} represents to the XOR operator, so computing the sum of two elements is equivalent to computing XOR on the polynomial coefficients. A generator of
7676-579: The filesystem metadata to determine the actual RAID-Z geometry. This would be impossible if the filesystem and the RAID array were separate products, whereas it becomes feasible when there is an integrated view of the logical and physical structure of the data. Going through the metadata means that ZFS can validate every block against its 256-bit checksum as it goes, whereas traditional RAID products usually cannot do this. In addition to handling whole-disk failures, RAID-Z can also detect and correct silent data corruption , offering "self-healing data": when reading
7777-439: The first stripe, A3:A4 as the second one, etc. Once the stripe size is defined during the creation of a RAID 0 array, it needs to be maintained at all times. Since the stripes are accessed in parallel, an n -drive RAID 0 array appears as a single large disk with a data rate n times higher than the single-disk rate. A RAID 0 array of n drives provides data read and write transfer rates up to n times as high as
7878-422: The following: Industry manufacturers later redefined the RAID acronym to stand for "redundant array of independent disks". Many RAID levels employ an error protection scheme called " parity ", a widely used method in information technology to provide fault tolerance in a given set of data. Most use simple XOR , but RAID 6 uses two separate parities based respectively on addition and multiplication in
7979-489: The following: The distribution of data across multiple drives can be managed either by dedicated computer hardware or by software . A software solution may be part of the operating system, part of the firmware and drivers supplied with a standard drive controller (so-called "hardware-assisted software RAID"), or it may reside entirely within the hardware RAID controller. Hardware RAID controllers can be configured through card BIOS or Option ROM before an operating system
8080-503: The fraction of data in relation to capacity, it can survive up to three drive failures, if the "array" can be restored onto the remaining good disks before another drive fails. The amount of usable storage can be approximated by summing the capacities of the disks and subtracting the capacity of the largest disk. For example, if a 500, 400, 200, and 100 GB drive were installed, the approximate usable capacity would be 500 + 400 + 200 + 100 − 500 = 700 GB of usable space. Internally
8181-458: The growing personal computer market. Although failures would rise in proportion to the number of drives, by configuring for redundancy, the reliability of an array could far exceed that of any large single drive. Although not yet using that terminology, the technologies of the five levels of RAID named in the June 1988 paper were used in various products prior to the paper's publication, including
8282-469: The help of a third-party logical volume manager: Many operating systems provide RAID implementations, including the following: If a boot drive fails, the system has to be sophisticated enough to be able to boot from the remaining drive or drives. For instance, consider a computer whose disk is configured as RAID 1 (mirrored drives); if the first drive in the array fails, then a first-stage boot loader might not be sophisticated enough to attempt loading
8383-457: The inception of RAID many (though not all) disks were already capable of finding internal errors using error correcting codes. In particular it is/was sufficient to have a mirrored set of disks to detect a failure, but two disks were not sufficient to detect which had failed in a disk array without error correcting features. Modern RAID arrays depend for the most part on a disk's ability to identify itself as faulty which can be detected as part of
8484-526: The individual drive rates, but with no data redundancy. As a result, RAID 0 is primarily used in applications that require high performance and are able to tolerate lower reliability, such as in scientific computing or computer gaming . Some benchmarks of desktop applications show RAID 0 performance to be marginally better than a single drive. Another article examined these claims and concluded that "striping does not always increase performance (in certain situations it will actually be slower than
8585-445: The last edition of The Raid Book published by the defunct Raid Advisory Board. In a Synchronous layout the data first block of the next stripe is written on the same drive as the parity block of the previous stripe. In comparison to RAID 4, RAID 5's distributed parity evens out the stress of a dedicated parity disk among all RAID members. Additionally, write performance is increased since all RAID members participate in
8686-579: The maximum error rates being insufficient to guarantee a successful recovery, due to the high likelihood of such an error occurring on one or more remaining drives during a RAID set rebuild. When rebuilding, parity-based schemes such as RAID 5 are particularly prone to the effects of UREs as they affect not only the sector where they occur, but also reconstructed blocks using that sector for parity computation. Double-protection parity-based schemes, such as RAID 6, attempt to address this issue by providing redundancy that allows double-drive failures; as
8787-652: The one before. In the case of two lost data chunks, we can compute the recovery formulas algebraically. Suppose that D i {\displaystyle \mathbf {D} _{i}} and D j {\displaystyle \mathbf {D} _{j}} are the lost values with i ≠ j {\displaystyle i\neq j} , then, using the other values of D {\displaystyle D} , we find constants A {\displaystyle A} and B {\displaystyle B} : We can solve for D i {\displaystyle D_{i}} in
8888-648: The operating system without having to reboot into card BIOS. For example, this was the approach taken by OpenBSD in 2005 with its bio(4) pseudo-device and the bioctl utility, which provide volume status, and allow LED/alarm/hotspare control, as well as the sensors (including the drive sensor ) for health monitoring; this approach has subsequently been adopted and extended by NetBSD in 2007 as well. Software RAID implementations are provided by many modern operating systems . Software RAID can be implemented as: Some advanced file systems are designed to organize data across multiple storage devices directly, without needing
8989-435: The original data is removed from the parity, the new data calculated into the parity and both the new data sector and the new parity sector are written. RAID 6 extends RAID 5 by adding a second parity block; thus, it uses block -level striping with two parity blocks distributed across all member disks. RAID 6 requires at least four disks. As in RAID 5, there are many layouts of RAID 6 disk arrays depending upon
9090-468: The performance of random writes is low due to the need to write all parity data to a single disk, unless the filesystem is RAID-4-aware and compensates for that. An advantage of RAID 4 is that it can be quickly extended online, without parity recomputation, as long as the newly added disks are completely filled with 0-bytes. In diagram 1, a read request for block A1 would be serviced by disk 0. A simultaneous read request for block B1 would have to wait, but
9191-690: The redundancy of the array to recover bad blocks on a single drive and to reassign the recovered data to spare blocks elsewhere on the drive. Frequently, a RAID controller is configured to "drop" a component drive (that is, to assume a component drive has failed) if the drive has been unresponsive for eight seconds or so; this might cause the array controller to drop a good drive because that drive has not been given enough time to complete its internal error recovery procedure. Consequently, using consumer-marketed drives with RAID can be risky, and so-called "enterprise class" drives limit this error recovery time to reduce risk. Western Digital's desktop drives used to have
9292-507: The required level of redundancy and performance. The different schemes, or data distribution layouts, are named by the word "RAID" followed by a number, for example RAID 0 or RAID 1. Each scheme, or RAID level, provides a different balance among the key goals: reliability , availability , performance , and capacity . RAID levels greater than RAID 0 provide protection against unrecoverable sector read errors, as well as against failures of whole physical drives. The term "RAID"
9393-400: The resulting data storage capacity. The array will continue to operate so long as at least one member drive is operational. Any read request can be serviced and handled by any drive in the array; thus, depending on the nature of I/O load, random read performance of a RAID 1 array may equal up to the sum of each member's performance, while the write performance remains at the level of
9494-427: The risk of them happening during RAID rebuilds and causing double-drive failures. The recovery of UREs involves remapping of affected underlying disk sectors, utilizing the drive's sector remapping pool; in case of UREs detected during background scrubbing, data redundancy provided by a fully operational RAID set allows the missing data to be reconstructed and rewritten to a remapped sector. Drive capacity has grown at
9595-563: The same comparison. RAID 2 , which is rarely used in practice, stripes data at the bit (rather than block) level, and uses a Hamming code for error correction . The disks are synchronized by the controller to spin at the same angular orientation (they reach index at the same time ), so it generally cannot service multiple requests simultaneously. However, depending with a high rate Hamming code , many spindles would operate in parallel to simultaneously transfer data so that "very high data transfer rates" are possible as for example in
9696-498: The same data on a single drive or disks pool, protecting from individual bad sectors, but not from large numbers of bad sectors or complete drive failure. This allows some of the benefits of RAID on computers that can only accept a single drive, such as laptops. Declustered RAID allows for arbitrarily sized disk arrays while reducing the overhead to clients when recovering from disk failures. It uniformly spreads or declusters user data, redundancy information, and spare space across all
9797-487: The same physical location on each disk. Therefore, any I/O operation requires activity on every disk and usually requires synchronized spindles. This makes it suitable for applications that demand the highest transfer rates in long sequential reads and writes, for example uncompressed video editing. Applications that make small reads and writes from random disk locations will get the worst performance out of this level. The requirement that all disks spin synchronously (in
9898-501: The second equation and plug it into the first to find D j = ( g m − i + j ⊕ 1 ) − 1 ( g m − i B ⊕ A ) {\displaystyle D_{j}=(g^{m-i+j}\oplus 1)^{-1}(g^{m-i}B\oplus A)} , and then D i = A ⊕ D j {\displaystyle D_{i}=A\oplus D_{j}} . Unlike P , The computation of Q
9999-430: The serving of write requests. Although it will not be as efficient as a striping (RAID 0) setup, because parity must still be written, this is no longer a bottleneck. Since parity calculation is performed on the full stripe, small changes to the array experience write amplification : in the worst case when a single, logical sector is to be written, the original sector and the according parity sector need to be read,
10100-542: The spare drive among multiple arrays, which is occasionally desirable. Intel Matrix RAID (a feature of Intel Rapid Storage Technology) is a feature (not a RAID level) present in the ICH6R and subsequent Southbridge chipsets from Intel, accessible and configurable via the RAID BIOS setup utility. Matrix RAID supports as few as two physical disks or as many as the controller supports. The distinguishing feature of Matrix RAID
10201-536: Was invented by David Patterson , Garth Gibson , and Randy Katz at the University of California, Berkeley in 1987. In their June 1988 paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)", presented at the SIGMOD Conference, they argued that the top-performing mainframe disk drives of the time could be beaten on performance by an array of the inexpensive drives that had been developed for