Redis’s safety net: Master the persistence technology of RDB and AOF [redis Part 4]

  • Foreword
  • First: What are RDB and AOF
    • RDB (Redis Database Backup):
    • AOF (Append-Only File):
    • Similarities and differences:
  • Second: In-depth analysis of RDB mechanism
    • How it works:
    • Configuration and triggering:
    • Advantages:
    • Limitations:
  • Third: In-depth analysis of AOF mechanism
    • How it works:
    • Configuration and management:
    • Rewriting mechanism
      • How the AOF rewriting mechanism works:
      • Advantages of AOF rewriting:
      • Restrictions on AOF rewriting:
    • Advantages:
    • Disadvantages:
  • Fourth: AOF + RDB implements incremental snapshots
  • Fifth: High availability and disaster recovery

Foreword

In the era of big data, data safety and persistence matter more than ever, and Redis, a high-performance in-memory database, is no exception. Redis provides two different persistence mechanisms, RDB and AOF. This article explores how Redis persists data, how RDB and AOF differ, and how each protects your data from loss. Let's get started!

First: What are RDB and AOF

RDB (Redis Database Backup) and AOF (Append-Only File) are two different persistence methods used in Redis to save data to the hard disk for recovery when Redis restarts. They have their own advantages and applicable scenarios.

RDB (Redis Database Backup):

  1. Basic concept: RDB is the snapshot persistence method of Redis. It periodically saves the entire data set to disk as a binary file that contains all the data at a point in time.
  2. Trigger conditions: A snapshot can be triggered automatically when a configured number of write operations has occurred within a configured time window, or manually with a command.
  3. Advantages:
    • Better performance: When generating an RDB snapshot, Redis forks a child process that writes the file. The main process keeps serving requests and pauses only briefly for the fork itself, so the impact on performance is small.
    • Smaller space footprint: RDB files are usually smaller than AOF files because they store only the snapshot data and do not record every write operation.
  4. Applicable scenarios: RDB is suitable for scenarios that require fast data recovery and can tolerate losing the most recent writes, such as caching.

AOF (Append-Only File):

  1. Basic concept: AOF is a persistence method that appends every write command to a log file. When Redis restarts, the entire data set can be restored by re-executing these commands.
  2. Trigger conditions: AOF can be configured to synchronize to disk (fsync) after every write operation, once per second, or to leave the timing to the operating system.
  3. Advantages:
    • Higher data consistency: AOF records every write operation, so the recovered data is closer to the state just before the failure.
    • Less prone to data loss: Because AOF records every write operation, a crash loses only the writes that had not yet been synced to disk (typically about one second of writes with the everysec policy). Everything else is recovered by replaying the AOF file.
  4. Applicable scenarios: AOF is suitable for scenarios that require high data consistency and can withstand a certain performance loss, such as key business data.

Similarities and differences:

  • Performance: RDB usually has lower runtime overhead because it does not log individual writes, while AOF must record every write operation, which adds some overhead.
  • Data recovery speed: RDB recovery is faster because only the snapshot file needs to be loaded, while AOF needs to re-execute all logged write operations, which may be slower.
  • Data consistency: AOF provides higher data consistency because it records each write operation, while RDB only saves a snapshot of the data at a certain moment.

In actual applications, you can choose to use RDB, AOF or a combination of both persistence methods according to your needs. Typically, RDB can be used to create backups, while AOF is used to ensure data consistency.

Second: In-depth analysis of RDB mechanism

RDB (Redis Database Backup) is a persistence mechanism of Redis, which is used to regularly save the entire data set to the hard disk for data recovery when the Redis server is restarted. The following is an in-depth analysis of the RDB mechanism, including working principles, configuration triggers, advantages and limitations:

How it works:

The working principle of RDB is relatively simple. It generates a binary snapshot file that contains all data at a certain moment. The following is the workflow of RDB:

  1. Trigger snapshot: RDB generation can be triggered in several ways:

    • Automatically, by a save rule that combines a time window with a write-count threshold (for example, at least 10,000 write operations within 60 seconds).
    • Manually, by using a Redis command to force an RDB snapshot.
  2. Generate snapshot: When the RDB triggering conditions are met, Redis creates a child process responsible for generating the RDB snapshot. While the snapshot is being generated, the Redis main process continues to process requests without being blocked.

  3. Write to hard disk: The child process writes the snapshot to a new temporary file on the hard disk. Because the operating system's copy-on-write mechanism isolates the child's view of memory, the file reflects the data as it was at the moment of the fork.

  4. Replace old RDB files: Once the new snapshot is complete, Redis atomically renames the temporary file over the old RDB file, so a complete snapshot is always in place.
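
Steps 3 and 4 follow a common "write to a temporary file, then rename" pattern. The following is a minimal Python sketch of that pattern, not Redis's actual implementation: JSON stands in for the binary RDB format, and the function names `write_snapshot` and `load_snapshot` are hypothetical.

```python
import json
import os
import tempfile

def write_snapshot(data: dict, path: str) -> None:
    """Write data to a temporary file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="temp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)          # JSON is only a stand-in for RDB
            tmp.flush()
            os.fsync(tmp.fileno())        # make sure the bytes reach the disk
        os.replace(tmp_path, path)        # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)           # never leave a half-written file behind
        raise

def load_snapshot(path: str) -> dict:
    """Read a snapshot written by write_snapshot."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "dump.json")
        write_snapshot({"testkey": "v1"}, target)
        write_snapshot({"testkey": "v2"}, target)   # replaces the old snapshot
        print(load_snapshot(target))
```

Because the rename is atomic, a reader sees either the old complete file or the new complete file, never a partially written one.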

Configuration and triggering:

You can configure RDB trigger conditions through some parameters in the Redis configuration file, such as the following example:

# Trigger RDB generation if at least 1 write operation occurred within 900 seconds
save 900 1
# Trigger RDB generation if at least 10 write operations occurred within 300 seconds
save 300 10
# Trigger RDB generation if at least 10000 write operations occurred within 60 seconds
save 60 10000
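
To make the rule semantics concrete, here is a small Python sketch of how such rules could be evaluated. It is a simplified illustration rather than Redis source code; the names `SAVE_RULES` and `should_snapshot` are hypothetical. A snapshot is due as soon as any single rule has both its time window elapsed and its write count reached.

```python
from typing import Iterable, Tuple

# Each rule is (seconds, changes), mirroring "save <seconds> <changes>".
SAVE_RULES = [(900, 1), (300, 10), (60, 10000)]

def should_snapshot(seconds_since_last_save: float,
                    changes_since_last_save: int,
                    rules: Iterable[Tuple[int, int]] = SAVE_RULES) -> bool:
    """Return True if any rule has both conditions satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in rules
    )

print(should_snapshot(120, 5))       # False: no rule matches yet
print(should_snapshot(301, 10))      # True: "save 300 10" matches
print(should_snapshot(61, 20000))    # True: "save 60 10000" matches
```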

You can also manually trigger RDB generation using Redis commands:

SAVE
BGSAVE
  • SAVE: executes in the main thread and blocks all client requests until the snapshot is finished.
  • BGSAVE: creates a child process dedicated to writing the RDB file, which avoids blocking the main thread. Automatic snapshots triggered by the save rules also run in this background mode.

Simply put, the bgsave child process is created by a fork of the main process and shares all of its memory data. Once running, the child reads that memory and writes it to the RDB file.

If the main thread only reads this data (such as key-value pair A in the figure), the main thread and the bgsave child process do not affect each other. If the main thread modifies a piece of data (such as key-value pair C in the figure), the operating system first copies the affected memory page. The main thread then changes its own copy, while the bgsave child process still sees the data as it was at the time of the fork and writes that version to the RDB file.

[Figure image-20220709153608415: copy-on-write between the main thread and the bgsave child process]
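
The effect of the fork can be observed with a short Python sketch. It assumes a POSIX system where `os.fork` is available, uses JSON as a stand-in for the RDB format, and the function names are hypothetical. The parent changes key C right after forking, yet the child still writes the value that existed at fork time.

```python
import json
import os
import tempfile

def snapshot_with_fork(data: dict, path: str) -> int:
    """Fork a child that writes `data` as it looked at fork time (POSIX only)."""
    pid = os.fork()
    if pid == 0:
        # Child process: it sees a frozen view of the parent's memory.
        status = 1
        try:
            with open(path, "w", encoding="utf-8") as f:
                json.dump(data, f)
            status = 0
        finally:
            os._exit(status)   # never fall back into the parent's code path
    return pid                 # parent process: returns immediately

def demo() -> dict:
    data = {"A": "1", "C": "old"}
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "dump.json")
        pid = snapshot_with_fork(data, path)
        data["C"] = "new"      # the parent keeps modifying data after the fork
        os.waitpid(pid, 0)     # wait until the child has finished writing
        with open(path, encoding="utf-8") as f:
            return json.load(f)

if __name__ == "__main__":
    print(demo())              # the snapshot still contains C = "old"
```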

Advantages:

  1. Low impact on performance: During RDB generation, the child process handles the disk I/O, so the Redis main process keeps serving reads and writes and pauses only for the fork itself.
  2. Compact data format: An RDB file is a binary file, which is usually more compact than an AOF file and takes up less disk space.
  3. Fast data recovery: When recovering data, RDB snapshots load faster because Redis reads the binary data directly instead of re-executing commands.

Limitations:

  1. Data may be lost: If the Redis server crashes, all writes made after the last successful RDB snapshot are lost.
  2. Not suitable for real-time backups: Snapshots are taken only when a time and write-count rule is met, so RDB cannot capture every change as it happens.
  3. Large file size: Although more compact than AOF, RDB files can still be large when the data set is large, and every snapshot writes the full data set again.
  4. Fork cost: The bgsave child process is created from the main thread through the fork operation. Although the child process does not block the main thread once it is running, the fork call itself does, and the larger the Redis memory footprint, the longer the pause. Frequent bgsave forks therefore block the main thread frequently.

In short, RDB is the snapshot persistence method of Redis. It suits scenarios that need fast data recovery and can tolerate losing the most recent writes, such as caching. Configuring RDB trigger conditions and generating snapshots regularly provides an extra layer of protection for your data. In a production environment, however, it is usually recommended to combine RDB with AOF to improve data consistency and protection.

Third: In-depth analysis of AOF mechanism

AOF (Append-Only File) is a persistence mechanism of Redis, which is used to record each write operation and append it to a file as a command to facilitate data recovery when the Redis server is restarted. The following is an in-depth analysis of the AOF mechanism, including working principles, configuration and management, advantages and disadvantages:

How it works:

The working principle of AOF is relatively intuitive. It appends each write operation to an AOF file in the form of a command, and the file content is a series of Redis commands. The following is the workflow of AOF:

  1. Write operation record: Every time Redis performs a write operation (such as SET, DEL, INCR, etc.), the corresponding command will be appended to the end of the AOF file.

  2. Flush to disk: The appended commands are synced to disk to make them durable. How often this happens depends on the configured fsync policy.

  3. Data recovery: When the Redis server is restarted, the commands in the AOF file will be re-executed in order, thereby restoring the data to the same state as before the restart.
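
The workflow above can be sketched with a toy key-value store in Python. This is an illustration under simplifying assumptions, not Redis code: the log is an in-memory list instead of a file, only SET, DEL and INCR are supported, and the names `TinyStore`, `apply` and `replay` are hypothetical.

```python
from typing import Dict, List

def apply(state: Dict[str, str], command: List[str]) -> None:
    """Execute one write command against the in-memory state."""
    name = command[0].upper()
    if name == "SET":
        state[command[1]] = command[2]
    elif name == "DEL":
        for key in command[1:]:
            state.pop(key, None)
    elif name == "INCR":
        state[command[1]] = str(int(state.get(command[1], "0")) + 1)
    else:
        raise ValueError(f"unsupported command: {name}")

class TinyStore:
    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.aof: List[List[str]] = []     # stands in for the AOF file

    def execute(self, *command: str) -> None:
        apply(self.state, list(command))   # 1. execute the command first
        self.aof.append(list(command))     # 2. then append it to the log

def replay(aof: List[List[str]]) -> Dict[str, str]:
    """Rebuild the state after a restart by re-executing the log in order."""
    state: Dict[str, str] = {}
    for command in aof:
        apply(state, command)
    return state

store = TinyStore()
store.execute("SET", "counter", "10")
store.execute("INCR", "counter")
store.execute("SET", "tmp", "x")
store.execute("DEL", "tmp")
print(replay(store.aof) == store.state)    # True
```

Because a command is logged only after it has executed, a command that fails (here, one that raises `ValueError`) never reaches the log.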

When it comes to logs, most of us are familiar with the write-ahead log (WAL) of a database: before the data is actually written, the modification is recorded in a log file so that it can be recovered after a failure. The AOF log is just the opposite. It is a post-write log: Redis executes the command first, which writes the data to memory, and only then records the log, as shown below:

[Figure image-20220709141919263: AOF records the log after the command has been executed]

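What is recorded in the log?

Traditional database logs record the modified data. The AOF instead records every write command received by Redis, and these commands are saved in text format.

For example, for the command set testkey testvalue:

[Figure image-20220709142146020: AOF log content for set testkey testvalue]

In the log entry, *3 indicates that the current command has three parts. Each part is preceded by $ and a number giving its length in bytes: $3 for set, $7 for testkey, and $9 for testvalue.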
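
Since the figure is not reproduced here, the following Python sketch shows the format in question. It encodes a command as an array of bulk strings in the Redis serialization protocol (RESP), which is how commands appear in the AOF; the helper name `encode_command` is hypothetical.

```python
def encode_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(parts)]            # *<number of parts>
    for part in parts:
        data = part.encode("utf-8")
        out.append(b"$%d\r\n" % len(data))     # $<length of this part in bytes>
        out.append(data + b"\r\n")
    return b"".join(out)

print(encode_command("set", "testkey", "testvalue"))
# b'*3\r\n$3\r\nset\r\n$7\r\ntestkey\r\n$9\r\ntestvalue\r\n'
```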

Note that, to avoid additional overhead, Redis does not check the syntax of a command when recording it in the log. If the log were recorded first and the command executed afterwards, an invalid command could end up in the log. With a post-write log, only commands that have executed successfully are recorded.

In addition, AOF has another advantage. Because it records the log only after the command has executed, logging does not block the current write operation.

Configuration and management:

You can configure how AOF works through some parameters in the Redis configuration file, such as the following example:

# Enable AOF persistence
appendonly yes
# Sync every write operation to disk
appendfsync always
# The name of the AOF file
appendfilename "appendonly.aof"

Common fsync strategies include:

  • always: Every write operation is synced to disk. This is the safest option but has the lowest performance.
  • everysec: Sync once per second. A crash can lose roughly the last second of writes. This option balances performance and data safety and is the Redis default.
  • no: Redis never calls fsync itself and leaves flushing to the operating system. This gives the highest performance but the lowest safety.
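
The difference between the three policies comes down to how often fsync is called. The Python sketch below counts those calls. It is a simplified model under stated assumptions: the class `AofWriter` is hypothetical, and the everysec check runs inline on each append, whereas Redis performs that sync in the background.

```python
import os
import time
from typing import Callable

class AofWriter:
    """Hypothetical append-only writer that counts fsync calls per policy."""

    def __init__(self, path: str, policy: str = "everysec",
                 clock: Callable[[], float] = time.monotonic) -> None:
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.clock = clock
        self.file = open(path, "ab")
        self.last_sync = clock()
        self.fsync_calls = 0

    def append(self, record: bytes) -> None:
        self.file.write(record)
        self.file.flush()                  # hand the bytes to the OS
        if self.policy == "always":
            self._fsync()
        elif self.policy == "everysec" and self.clock() - self.last_sync >= 1.0:
            self._fsync()
        # policy "no": never fsync here; the OS flushes when it chooses

    def _fsync(self) -> None:
        os.fsync(self.file.fileno())       # force the bytes onto the disk
        self.last_sync = self.clock()
        self.fsync_calls += 1

    def close(self) -> None:
        self.file.close()
```

With always, every append pays the cost of a disk sync. With everysec, many appends share one sync. With no, the writer never syncs, and anything still in the operating system's cache is lost if the machine goes down.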

Rewriting mechanism

The rewriting mechanism of AOF (Append-Only File) is a feature used to optimize the size of AOF files. It solves the problem of AOF files growing too large. AOF rewriting not only reduces the size of the AOF file but also shortens recovery time, because fewer commands need to be replayed. The following is a detailed explanation of AOF's rewriting mechanism:

How the AOF rewriting mechanism works:

The main goal of the AOF rewriting mechanism is to generate a new AOF file that contains the same data as the original AOF file, but the file size is smaller and does not contain unnecessary data. This process is implemented by scanning the data in memory and converting it into a set of commands for the AOF file.

Specific steps are as follows:

  1. The Redis server forks a background child process, which is responsible for performing the AOF rewrite.
  2. The child process scans the in-memory data of Redis and, for each key, generates the minimal commands needed to recreate its current value.
  3. These commands are written to a new AOF file. Write operations that the main process receives during the rewrite are also recorded, so they are not lost.
  4. When the new AOF file is complete, it atomically replaces the original file.
  5. After the rewriting is completed, the Redis server uses only the new AOF file for data persistence.

[Figure image-20220709153608415]
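
The core idea of rewriting can be sketched in Python. This toy model supports only SET and DEL, keeps logs as in-memory lists, and uses the hypothetical names `replay` and `rewrite`. The compact log is derived from the current state rather than from the old log, and writes that arrive during the rewrite are appended afterwards.

```python
from typing import Dict, List

Command = List[str]

def replay(log: List[Command]) -> Dict[str, str]:
    """Rebuild the key-value state by executing SET/DEL commands in order."""
    state: Dict[str, str] = {}
    for cmd in log:
        if cmd[0] == "SET":
            state[cmd[1]] = cmd[2]
        elif cmd[0] == "DEL":
            state.pop(cmd[1], None)
        else:
            raise ValueError(f"unsupported command: {cmd[0]}")
    return state

def rewrite(state: Dict[str, str], buffered: List[Command]) -> List[Command]:
    """Build a compact log: one SET per live key, followed by the writes
    that arrived while the rewrite was running."""
    compact = [["SET", key, value] for key, value in sorted(state.items())]
    return compact + buffered

old_log = [
    ["SET", "testkey", "v1"],
    ["SET", "testkey", "v2"],
    ["SET", "tmp", "x"],
    ["DEL", "tmp"],
    ["SET", "testkey", "v3"],
]
state_at_fork = replay(old_log)              # only testkey = v3 is still live
during_rewrite = [["SET", "other", "1"]]     # written while the rewrite runs
new_log = rewrite(state_at_fork, during_rewrite)

print(len(old_log) + len(during_rewrite), "->", len(new_log))   # 6 -> 2
print(replay(new_log) == replay(old_log + during_rewrite))      # True
```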

Advantages of AOF rewriting:

  1. Reducing the size of AOF files: AOF rewriting can remove data that is no longer needed in the AOF file, so the new AOF file is usually smaller than the original file and takes up less disk space.
  2. Faster recovery: The new AOF file contains only the commands needed to rebuild the current data set, so loading it is faster and does not require replaying the entire write history.
  3. Reduce the maintenance cost of AOF files: The original AOF file may become very large, and the rewriting mechanism helps keep the AOF file moderate in size and reduce maintenance costs.

Restrictions on AOF rewriting:

  1. AOF rewriting is a background process, so it may occupy CPU and I/O resources during execution, which may have some impact on server performance. But this is usually controllable, and its execution timing can be limited through reasonable configuration.
  2. The cost of a rewrite grows with the size of the data set: the fork briefly blocks the main thread, and a large data set takes a long time to write out. Write operations that arrive during the rewrite must also be tracked and added to the new file, which adds extra work.

In summary, the AOF rewriting mechanism is a very useful tool that can be used to optimize AOF file size, improve performance, and reduce AOF file maintenance costs. In practical applications, it is recommended to perform AOF rewriting regularly according to the actual situation to ensure that the AOF file maintains a reasonable size.

Advantages:

  1. Data consistency: AOF records each write operation, so the data is more consistent when recovering data and data loss is less likely to occur.
  2. Failure recovery: If Redis crashes, replaying the AOF file restores everything up to the last sync, so data loss is kept to a minimum.
  3. Suitable for important data: AOF is suitable for scenarios that require high data consistency, such as key business data.
  4. Readability: The commands in an AOF file are stored in a plain-text protocol format, so they can be viewed and checked to facilitate debugging and analysis.

Disadvantages:

  1. Performance overhead: AOF records each write operation, which may have a certain performance overhead, especially when the fsync policy is configured as always.
  2. Large file: AOF files are usually larger than RDB files because they contain commands for each write operation and may occupy more disk space.
  3. Slow data recovery: During data recovery, AOF needs to re-execute all commands, which may be slower than loading RDB snapshots.

In actual applications, you can choose to use AOF, RDB, or a combination of both persistence methods based on your needs and the importance of the data. Typically, AOF is used to ensure data consistency, while RDB is used to create backups. At the same time, data recovery speed and performance overhead can be balanced through reasonable configuration.

Fourth: AOF + RDB implements incremental snapshots

In practical applications, if AOF log files continue to accumulate, they may occupy a large amount of disk space. You can consider keeping only the AOF logs between two RDB snapshots, and then deleting the older AOF logs to reduce disk usage.

The following is an incremental snapshot implementation method based on this idea:

  1. Enable AOF persistence: Enable AOF persistence in the Redis configuration.
appendonly yes
  2. Configure AOF rewriting: Enable the AOF rewriting mechanism to control the AOF file size, and configure RDB to generate snapshot files regularly.
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
  3. Set AOF file retention policy: Use external scripts or periodic tasks to monitor AOF files and delete them if necessary.

You could write a script that periodically checks the age of archived AOF files and deletes those that predate the most recent RDB snapshot. Be careful: when AOF is enabled, Redis restores its data from the AOF on restart, so a script must never delete the AOF file that Redis is currently using. A safer way to keep the live file small is the Redis BGREWRITEAOF command, which compacts the AOF in the background. You can use the following steps:

  • Periodically execute the BGREWRITEAOF command to generate a new, compact AOF file.
  • When a new RDB snapshot is generated, check the age of the archived AOF files and delete those that are older than the previous RDB snapshot.

This method keeps the AOF file within a reasonable size range while providing the function of incremental snapshots. When data needs to be restored, the snapshot is loaded first, and then the write operations recorded after it are replayed to bring the data up to date. Since version 4.0, Redis also supports this idea natively through the aof-use-rdb-preamble option: an AOF rewrite then writes an RDB-format snapshot at the start of the AOF file, and later writes are appended to it as ordinary AOF commands. In this way, you not only obtain data protection but also effectively control the size of the AOF file.
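
The recovery order described here (snapshot first, then the commands recorded after it) can be sketched in Python. The file layout is a made-up text format for illustration only, with a JSON snapshot on the first line and one JSON-encoded command per following line; it is not the real RDB or AOF format, and `dump_hybrid` and `load_hybrid` are hypothetical names.

```python
import json
from typing import Dict, List

Command = List[str]

def dump_hybrid(snapshot: Dict[str, str], tail: List[Command]) -> str:
    """Serialize a snapshot line followed by one command per line."""
    lines = [json.dumps(snapshot)] + [json.dumps(cmd) for cmd in tail]
    return "\n".join(lines) + "\n"

def load_hybrid(text: str) -> Dict[str, str]:
    """Load the snapshot first, then replay the commands recorded after it."""
    lines = text.splitlines()
    state: Dict[str, str] = json.loads(lines[0])
    for line in lines[1:]:
        cmd = json.loads(line)
        if cmd[0] == "SET":
            state[cmd[1]] = cmd[2]
        elif cmd[0] == "DEL":
            state.pop(cmd[1], None)
        else:
            raise ValueError(f"unsupported command: {cmd[0]}")
    return state

text = dump_hybrid(
    {"a": "1", "b": "2"},                                  # state at snapshot time
    [["SET", "a", "3"], ["DEL", "b"], ["SET", "c", "4"]],  # writes made afterwards
)
print(load_hybrid(text))   # {'a': '3', 'c': '4'}
```

Loading the snapshot is fast because it is a direct copy of the state, and only the short tail of commands has to be replayed.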

Fifth: High availability and disaster recovery

Redis Sentinel and Redis Cluster are Redis architecture components used to achieve high availability and disaster recovery. They can be used in conjunction with RDB and AOF persistence mechanisms to provide data security. Here’s an in-depth look at how they work together:

Redis Sentinel:

  1. High Availability: Redis Sentinel monitors the servers in a Redis master-slave architecture to ensure the availability of the master server. When the master server fails, Sentinel automatically selects a slave server and promotes it to be the new master server, thereby maintaining high availability of the service.
  2. Persistence: Sentinel itself is not responsible for data persistence, but you can configure each Redis instance to use RDB and AOF. The promoted slave already holds a replicated copy of the data; persistence lets any instance, including the old master, reload its data set from RDB or AOF files when it restarts.
  3. Actual use case: In an application with high availability requirements, Redis Sentinel can be configured to monitor multiple Redis instances. Each instance can be configured with RDB and AOF persistence. When the master server fails, Sentinel automatically promotes a slave server to maintain service availability, while RDB and AOF protect the data across restarts.

Redis Cluster:

  1. High Availability: Redis Cluster is the sharding solution for distributed Redis environments. It distributes data across multiple Redis nodes by hash slot, and each node can be configured with RDB and AOF for data persistence. When a master node fails, the cluster automatically promotes one of its replicas to take over its slots, maintaining high availability of services.
  2. Persistence: Each Redis Cluster node can independently configure RDB and AOF persistence to ensure data security. This means that even if one node fails, data on other nodes is still available.
  3. Actual use cases: Redis Cluster is suitable for applications that require horizontal expansion and high availability. You can configure multiple Redis nodes to create a distributed cluster, and each node can be configured with RDB and AOF. This helps ensure data security and high availability.
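
As a side note on how data is distributed, Redis Cluster assigns each key to one of 16384 hash slots by taking the CRC16 of the key modulo 16384. The Python sketch below reproduces that mapping with the standard library. It is a simplification that ignores hash tags (the `{...}` syntax), and the function name `key_slot` is hypothetical.

```python
import binascii

HASH_SLOTS = 16384

def key_slot(key: str) -> int:
    """Map a key to a hash slot: CRC16 (XMODEM variant) modulo 16384.

    Simplified: hash tags such as {user1000} are not handled here.
    """
    return binascii.crc_hqx(key.encode("utf-8"), 0) % HASH_SLOTS

print(key_slot("foo"))   # 12182
```

Each master node serves a subset of these slots, which is why persistence is configured and restored per node.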

Use RDB and AOF together:

  • In high availability and disaster recovery scenarios, it is often recommended to use both RDB and AOF. RDB is used to create regular snapshot backups, and AOF is used to record real-time write operations.
  • For Redis Sentinel and Redis Cluster, using RDB and AOF provides additional protection: it reduces the risk of data loss and allows data to be restored quickly after the main server fails.
  • For Redis Cluster, independent persistence configuration for each node enables data recovery at the node level.

In summary, Redis Sentinel and Redis Cluster combined with RDB and AOF persistence mechanisms can achieve high availability and disaster recovery Redis environments. This provides multiple levels of data security and recoverability to meet the needs of different applications. When configuring, ensure that persistence settings are coordinated with high-availability configuration for optimal data protection and availability.