In-depth analysis of key disk RAID technologies


Disk fault tolerance means that data integrity and data processing capability are preserved when a hard disk error or failure occurs in the subsystem. The RAID controller card implements this function for RAID 1, 5, 6, 10, 50, and 60 through redundant hard disk groups.

In RAID 1, data is mirrored on a pair of hard drives, so no data is lost if one drive of the pair errors or fails. Similarly, RAID 5 tolerates one hard drive failure, and RAID 6 tolerates two.

For RAID levels composed of multiple subgroups, RAID 10 and 50 tolerate as many failed disks as there are subgroups, provided each subgroup contains at most one failed disk. RAID 60 tolerates twice as many failed disks as there are subgroups, provided each subgroup contains at most two failed disks.
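The per-subgroup rules above can be captured in a small sketch. This is purely illustrative Python (the function name and its argument convention are hypothetical, not a real controller API): each redundant level has a per-subgroup failure limit, and the group survives as long as no subgroup exceeds it.

```python
# Hypothetical sketch of the per-subgroup fault-tolerance rules: one
# failed disk is allowed per RAID 1/5 subgroup, two per RAID 6 subgroup.
# A plain RAID 1/5/6 group is treated as a single subgroup.

def raid_survives(level, failed_per_subgroup):
    """Return True if the RAID group still has data integrity.

    failed_per_subgroup: list with the number of failed disks in each
    subgroup of the RAID group.
    """
    per_subgroup_limit = {1: 1, 5: 1, 6: 2, 10: 1, 50: 1, 60: 2}[level]
    return all(n <= per_subgroup_limit for n in failed_per_subgroup)

# RAID 50 with three subgroups: three failures are survivable only if
# each subgroup holds at most one of them.
print(raid_survives(50, [1, 1, 1]))  # True
print(raid_survives(50, [2, 1, 0]))  # False
# RAID 60 tolerates up to two failures per subgroup.
print(raid_survives(60, [2, 2, 2]))  # True
```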

RAID 0 does not support fault tolerance. When the hard disk in RAID 0 fails, the RAID will fail and data will be lost.

Fault tolerance improves system availability: the system can keep running normally while a hard disk is failed, which makes it an essential feature during fault repair.

1 Consistency Check

For the redundant levels RAID 1, 5, 6, 10, 50, and 60, the RAID controller card can perform a consistency check on the data in the RAID group: it reads and recalculates the hard disk data and compares the result with the corresponding redundant data. If inconsistencies are found, it attempts automatic repair and saves the error information.

Since RAID 0 does not have redundancy, it does not support consistency checking.
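As a minimal illustration of what such a check verifies, here is a sketch assuming the simple XOR parity used by RAID 5 (function names are hypothetical): recompute parity from the data blocks and compare it with the stored parity.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together (RAID 5 style parity)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def consistency_check(data_blocks, parity_block):
    """Recompute parity from the data blocks and compare it with the
    stored redundant data, as a consistency check would."""
    return xor_blocks(data_blocks) == parity_block

data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)
print(consistency_check(data, parity))        # True: data is consistent
print(consistency_check(data, b"\x00\x00"))   # False: would trigger repair
```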

2 Hot Backup

The hot backup feature of the RAID controller card is implemented by the hot spare disk and emergency backup functions.

Hot spare disk

A hot spare disk is an independent hard disk in the disk subsystem. When a member disk of a RAID group fails, the hot spare disk automatically takes its place as a member disk, and the failed disk's data is reconstructed onto the hot spare.

In the management interface or command line tool of the RAID controller card, you can specify an idle disk with a capacity greater than or equal to that of a RAID group member disk and with the same media type and interface as the member disk as the hot spare disk of the RAID group.

The hot spare disks supported by the RAID controller card include the following two types:

  • Global hot spare disks are shared by all configured RAID groups on the RAID controller card. One or more global hot spare disks can be configured in one RAID controller card. When any member disk in the RAID group fails, the global hot spare disk can automatically replace it.

  • Local hot spare disks are dedicated to a designated RAID group on the RAID controller card. Each RAID group can be configured with one or more local hot spare disks. When a member disk in that RAID group fails, the local hot spare disk automatically replaces it.

Hot spare disks have the following features:

  • Hot spare disks are only used in RAID groups with redundancy functions, including RAID 1, 5, 6, 10, 50, and 60.

  • Hot spare disks are only used to replace failed disks on the same RAID controller card.
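The eligibility rule quoted earlier (an idle disk with capacity greater than or equal to the member disk's, and the same media type and interface) can be sketched as follows. The class and field names are illustrative only, not a real management API:

```python
# Illustrative sketch of the hot spare eligibility rule: idle, capacity
# >= member disk's, same media type, same interface.
from dataclasses import dataclass

@dataclass
class Disk:
    capacity_gb: int
    media: str       # e.g. "HDD" or "SSD"
    interface: str   # e.g. "SAS" or "SATA"
    idle: bool = True

def eligible_hot_spare(candidate, member):
    return (candidate.idle
            and candidate.capacity_gb >= member.capacity_gb
            and candidate.media == member.media
            and candidate.interface == member.interface)

member = Disk(1200, "HDD", "SAS", idle=False)
print(eligible_hot_spare(Disk(1200, "HDD", "SAS"), member))   # True
print(eligible_hot_spare(Disk(960, "HDD", "SAS"), member))    # False: too small
print(eligible_hot_spare(Disk(1200, "HDD", "SATA"), member))  # False: interface differs
```

Note that emergency backup (below) relaxes this slightly: it requires only sufficient capacity and matching media type.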

Emergency Backup

The emergency backup function means that when a member disk of any redundant RAID group fails and no hot spare disk has been specified, a free disk attached to the RAID controller card automatically replaces the failed member disk and is rebuilt, avoiding data loss.

Emergency backup requires that the capacity of the free disk used for backup is greater than or equal to that of the member disk, and the media type must be the same as that of the member disk.

3 RAID Reconstruction

When a disk in a RAID group fails, the data on the failed disk can be reconstructed onto a new disk through the data reconstruction function of the RAID controller card. Data reconstruction is available only for the redundant levels RAID 1, 5, 6, 10, 50, and 60.

The RAID controller card supports automatic reconstruction of a failed member disk's data onto a hot spare disk. If the RAID group has an available hot spare disk, the hot spare automatically replaces the failed disk and data reconstruction begins; if no hot spare disk is available, reconstruction can begin only after the failed disk has been replaced with a new one. Once the hot spare disk starts data reconstruction, the failed member disk is marked as removable. If the system is powered off during reconstruction, the RAID controller card resumes the reconstruction task after the system restarts.
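For the XOR-parity levels, reconstruction is conceptually simple: a lost block is the XOR of the corresponding blocks on all surviving disks. A sketch (illustrative, assuming RAID 5 style parity):

```python
# Illustrative sketch: with XOR parity, a failed disk's block equals the
# XOR of the corresponding blocks on all surviving disks. Reconstruction
# onto a hot spare computes this stripe by stripe.

def rebuild_block(surviving_blocks):
    out = bytearray(len(surviving_blocks[0]))
    for block in surviving_blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = b"\x0a\x0b", b"\x01\x01", b"\xf0\x0f"
parity = rebuild_block([d0, d1, d2])          # parity = XOR of data blocks
# Suppose d1 fails: rebuild it from the other data blocks plus parity.
print(rebuild_block([d0, d2, parity]) == d1)  # True
```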

The reconstruction rate, that is, the share of CPU resources the data reconstruction task occupies while the system is running, can be set from 0% to 100%. 0% means the reconstruction task runs only when the system has no other tasks running; 100% means the reconstruction task consumes all CPU resources. It is recommended to choose a value appropriate to the actual load on the system.
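The trade-off can be modeled as splitting each scheduling window between rebuild work and foreground I/O. This is a toy model of the setting's meaning, not the controller's real scheduler:

```python
# Toy model of the rebuild-rate setting: of each scheduling window, the
# rebuild task gets rate% of the CPU and foreground I/O gets the rest.
# 0% is special-cased as "rebuild only when the system is idle".

def split_window(rate_percent, window_ms=100, system_idle=False):
    if rate_percent == 0:
        rebuild_ms = window_ms if system_idle else 0
    else:
        rebuild_ms = window_ms * rate_percent // 100
    return rebuild_ms, window_ms - rebuild_ms

print(split_window(30))                    # (30, 70): moderate rebuild rate
print(split_window(0))                     # (0, 100): yield to foreground work
print(split_window(0, system_idle=True))   # (100, 0): idle system, rebuild freely
print(split_window(100))                   # (100, 0): rebuild takes everything
```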

4 Virtual Disk Read and Write Policies

When creating a virtual disk, you need to define its read and write policies, which govern how data is read and written during subsequent operation of the virtual disk.

Data reading strategy

It is generally reflected as “Read Policy” in the configuration interface. The RAID controller card supports the following two data reading strategies:

Read-ahead mode: configuration interfaces generally offer options such as “Always Read Ahead”, “Read Ahead”, or “Ahead”. With this policy, when the requested data is read from the virtual disk, the data that follows it is read into the cache at the same time. Subsequent accesses to that data can then be served directly from the cache, reducing hard disk seek operations, saving response time, and improving read speed.

This policy requires the RAID controller card to support the data power-off protection function; even so, if the supercapacitor is abnormal, data may be lost.

Non-read-ahead mode: with this policy, the RAID controller card reads data from the virtual disk only when it receives a read command, and performs no read-ahead.
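The read-ahead idea can be sketched in a few lines. The class below is purely illustrative (block size, window, and names are assumptions): a cache miss fetches the requested block plus the next few, so sequential reads hit memory instead of the disk.

```python
# Minimal sketch of read-ahead: serving a miss also prefetches the next
# `window` blocks into the cache, so sequential access avoids disk seeks.

class ReadAheadDisk:
    def __init__(self, blocks, window=2):
        self.blocks = blocks          # simulated on-disk blocks
        self.window = window          # how many extra blocks to prefetch
        self.cache = {}
        self.disk_reads = 0

    def read(self, n):
        if n in self.cache:
            return self.cache[n]      # cache hit: no disk access needed
        # miss: read the requested block plus the read-ahead window
        for i in range(n, min(n + 1 + self.window, len(self.blocks))):
            self.cache[i] = self.blocks[i]
            self.disk_reads += 1
        return self.cache[n]

disk = ReadAheadDisk(["b0", "b1", "b2", "b3"])
disk.read(0)                  # miss: reads b0, b1, b2 from "disk"
disk.read(1); disk.read(2)    # hits: served from cache
print(disk.disk_reads)        # 3
```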

Data writing strategy

It is generally reflected as “Write Policy” in the configuration interface. The RAID controller card supports the following data writing strategies:

Write back: generally shown in the configuration interface as “Write Back”. With this policy, data destined for the virtual disk is first written into the cache; when enough written data has accumulated, the RAID controller card flushes it to the virtual disk. This batches the writes and improves write speed. The controller card returns a transfer-complete signal to the host as soon as its cache has received all the transferred data.

This policy requires the RAID controller card to support the data power-off protection function; even so, if the supercapacitor is abnormal, data may be lost.

Write through: generally shown in the configuration interface as “Write Through”. With this policy, the RAID controller card writes data directly to the virtual disk without going through the cache; the transfer-complete signal is returned to the host only after the hard disk subsystem has received all the transferred data.

This mode does not require the RAID controller card to support the data power-off protection function, and a supercapacitor failure has no impact. Its disadvantage is a lower write speed.

Write back with BBU: generally shown in the configuration interface as “Write Back with BBU”. With this policy, when the RAID controller card's BBU is present and healthy, writes from the controller card to the virtual disk go through the cache (write-back mode); when the BBU is absent or has failed, writes automatically switch to going directly to the virtual disk without the cache (write-through mode).

Forced write back: generally shown in the configuration interface as “Write Back Enforce” or “Always Write Back”. With this policy, “Write Back” mode is used even when the RAID controller card has no capacitor or the capacitor is damaged.

If the server loses power unexpectedly while the capacitor is absent or still charging, the written data held in the RAID controller card's DDR (i.e. cache) is lost. This mode is not recommended.
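The policy selection logic above can be summarized in a short sketch. The function and policy names are illustrative, not a vendor API; the key behavior is that “Write Back with BBU” degrades to write-through when the BBU is absent or faulty, while “Always Write Back” never degrades:

```python
# Sketch of write-policy selection. "write_back_with_bbu" falls back to
# write-through without a healthy BBU; "always_write_back" never does
# (which is why it risks data loss and is not recommended).

def effective_policy(configured, bbu_present, bbu_healthy):
    if configured == "write_through":
        return "write_through"
    if configured == "always_write_back":
        return "write_back"               # cached writes even with no BBU
    if configured == "write_back_with_bbu":
        if bbu_present and bbu_healthy:
            return "write_back"
        return "write_through"            # automatic safe fallback
    raise ValueError(configured)

print(effective_policy("write_back_with_bbu", True, True))    # write_back
print(effective_policy("write_back_with_bbu", False, False))  # write_through
print(effective_policy("always_write_back", False, False))    # write_back
```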

5 Data Power-off Protection

Principle of power failure protection

Data can be written into the RAID controller card's cache faster than it can be written to the hard disks, so when the server performs a large number of write operations, the cache is used to improve system performance.

  • Enabling the RAID controller card cache can improve the write performance of the entire machine. When the server write pressure decreases or the RAID controller card cache becomes full, the data will be written to the hard disk from the RAID controller card cache.

  • While enabling the RAID controller card cache improves write performance, it also increases the risk of data loss. When the entire machine is unexpectedly powered off, the data in the RAID controller card cache will be lost.

To retain the whole machine's high read and write performance while keeping the data in the RAID controller card's cache safe, a supercapacitor can be configured for the controller card. The principle of the supercapacitor protection module is that when the system loses power unexpectedly, the supercapacitor supplies power while the data in the cache is written to the NAND flash in the supercapacitor module for permanent storage.

Supercapacitor power calibration

Data protection depends on the supercapacitor, so to record the supercapacitor's discharge curve (letting the RAID controller card learn its state, such as its maximum and minimum voltage) and to extend its service life, the RAID controller card enables automatic supercapacitor calibration mode by default.

The RAID controller card calibrates the supercapacitor's charge through the three-stage charge and discharge procedure described below, keeping it at a relatively stable value.

1. The RAID control card charges the supercapacitor to the maximum value.

2. Automatically start the calibration process and completely discharge the supercapacitor.

3. Restart charging until maximum power is reached.

During supercapacitor calibration, the RAID controller card's write policy is automatically switched to “Write Through” mode to ensure data integrity, so performance is reduced for the duration. The calibration time depends on the supercapacitor's charge and discharge speed.

6 Hard Drive Striping

When multiple processes access a hard disk at the same time, hard disk conflicts may occur. Most hard drive systems have limits on access times (I/O operations per second) and data transfer rates (amount of data transferred per second). When these limits are reached, subsequent processes that need to access the hard disk need to wait.

Striping is a technology that automatically balances I/O load across multiple physical hard drives. Striping technology divides a continuous piece of data into multiple small parts and stores them on different hard drives. This allows multiple processes to access multiple different parts of the data simultaneously without causing disk conflicts, and maximizes I/O parallelism when sequential access to this data is required.

Hard disk striping divides the hard disk space into multiple strips of a configured size; when data is written, it is likewise divided into blocks according to the strip size.

For example, in a hard disk system consisting of four member disks (such as RAID 0), the first data block is written to the first member disk, the second data block to the second member disk, and so on, as shown in Figure 1-11. Because multiple hard disks are written simultaneously, system performance improves greatly; however, this feature by itself provides no data redundancy.

Hard disk striping includes the following concepts:

  • Stripe width: The number of hard disks used in striping a hard disk group. For example, a hard disk group consisting of four member disks has a stripe width of “4”.

  • The stripe size of the hard disk group: the size of the data blocks written by the RAID controller card on all hard disks in a hard disk group at the same time.

  • Hard drive stripe size: The size of the data blocks written by the RAID control card on each hard drive.

For example, if, when writing data to a hard disk group, a 1MB data stripe places a 64KB data block on each member disk, then the stripe size of the hard disk group is 1MB and the stripe size of each hard disk is 64KB.
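The mapping from a logical address to a (disk, on-disk offset) pair follows directly from these definitions. A sketch using the 64KB per-disk stripe size from the example (constant and function names are illustrative):

```python
# Sketch of striped address mapping: 64KB per-disk stripes laid out
# round-robin across the member disks.

DISK_STRIPE_SIZE = 64 * 1024  # per-disk stripe size from the example

def locate(logical_offset, num_disks):
    stripe_index = logical_offset // DISK_STRIPE_SIZE
    disk = stripe_index % num_disks               # round-robin across disks
    stripe_row = stripe_index // num_disks        # which full stripe
    offset_on_disk = stripe_row * DISK_STRIPE_SIZE + logical_offset % DISK_STRIPE_SIZE
    return disk, offset_on_disk

# With 4 member disks, consecutive 64KB stripes land on disks 0,1,2,3,0,...
print(locate(0, 4))              # (0, 0)
print(locate(64 * 1024, 4))      # (1, 0)
print(locate(5 * 64 * 1024, 4))  # (1, 65536)
```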

7 Disk Mirroring

Disk mirroring, applicable to RAID 1 and RAID 10, means that when data is written, the same data is written to two hard disks simultaneously, achieving 100% data redundancy. Because the two disks hold identical data, no data is lost when one disk fails, and the data flow is not interrupted: the surviving copy continues to serve reads and writes.

Disk mirroring provides 100% redundancy, but it is relatively expensive, since every hard disk requires a mirror disk.
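The mirroring behavior described above can be sketched in a few lines of illustrative Python (class and method names are assumptions): every write is duplicated to both disks, and a read can be served from any healthy copy.

```python
# Sketch of RAID 1 mirroring: writes go to both disks; reads can be
# served from either copy, so one disk failing loses no data.

class Mirror:
    def __init__(self, size):
        self.disks = [bytearray(size), bytearray(size)]
        self.failed = [False, False]

    def write(self, offset, data):
        for disk in self.disks:                      # duplicate every write
            disk[offset:offset + len(data)] = data

    def read(self, offset, length):
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:                   # any healthy copy works
                return bytes(disk[offset:offset + length])
        raise IOError("both mirror copies failed")

m = Mirror(16)
m.write(0, b"hello")
m.failed[0] = True                # one disk fails...
print(m.read(0, 5))               # b'hello'  ...data is still readable
```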

8 External Configuration

An external configuration is RAID configuration information that differs from the current configuration of the RAID controller card; it is generally shown in the configuration interface as “Foreign Configuration” or similar.

External configurations generally appear in the following circumstances:

  • A newly installed physical hard disk on the server contains RAID configuration information, and the RAID control card will recognize this information as external configuration.

  • After the server replaces the RAID control card, the new RAID control card will recognize the currently existing RAID information as an external configuration.

  • After hot-plugging a member disk of a RAID group, the member disk will be marked as carrying external configuration.

A detected external configuration can be handled according to the server's actual situation. For example, if the RAID information carried by a newly inserted hard disk does not match the current usage scenario, you can delete that configuration; after replacing the RAID controller card, if you still want to use the previous configuration, you can import it to make it effective on the new controller card.

9 Hard Drive Energy Saving

The RAID controller card provides a hard disk energy-saving function, which allows hard drives to spin down based on drive configuration and I/O activity. All SAS and SATA mechanical hard drives support this feature.

When the hard disk energy-saving function is enabled, idle hard disks and idle hot spare disks attached to the RAID controller card enter the energy-saving state. Relevant operations (such as creating a RAID group, creating a hot spare disk, dynamic disk expansion, or hot spare reconstruction) wake a disk from the energy-saving state.

10 Hard Drive Passthrough

Hard drive pass-through, also known as the “JBOD” function or command pass-through, is a transmission mode in which the controller passes commands through to the hard disk without additional processing, acting only to guarantee transmission quality.

After the hard disk pass-through function is turned on, the RAID controller card can pass commands through to the attached hard disks. Without configuring a RAID group, user commands go directly to the hard disk, making it easier for upper-layer business or management software to access and control it.

For example, during operating system installation, a hard disk attached to the RAID controller card can be selected directly as the installation disk; on a RAID controller card that does not support pass-through, only virtual disks configured under the controller card can be found and used as the installation disk.

