Redis principles: cache expiration, consistent hashing, avalanche, penetration, concurrency, Bloom filters, cache update strategies, and cache-database consistency

Redis expiration policy

The expiration policy of Redis can be configured through the configuration file.

1. Periodic deletion

Redis puts all keys that have an expiration time set into a separate dictionary and periodically scans it to delete expired keys:
1) Every 100ms, randomly select 20 keys from the expiration dictionary and delete those that have expired;
2) If more than 1/4 of the sampled keys turn out to be expired, repeat step 1.
To keep this loop from causing latency spikes, the scan is capped at 25ms per run by default. Given this mechanism, avoid having a large number of keys expire at the same moment: add a random offset to each key's expiration time, as in the sketch below.
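A minimal sketch of that advice, assuming the Jedis client and a locally running Redis; the key name and TTL values are only illustrative:

import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

public class RandomTtlExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Base TTL of 30 minutes plus a random offset of up to 5 minutes (illustrative values),
            // so a batch of keys written together does not expire at the same moment.
            int baseSeconds = 30 * 60;
            int jitter = ThreadLocalRandom.current().nextInt(5 * 60);
            jedis.setex("product:1001", baseSeconds + jitter, "cached-value");
        }
    }
}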

2. Lazy deletion

An expired key is not necessarily deleted right away, so it continues to occupy memory. Only when the key is actually accessed does Redis check whether it has an expiration time and whether that time has passed; if it has expired, the key is deleted and nil is returned. This is lazy deletion.

3. Memory eviction mechanism

When Redis's memory usage exceeds the physical memory limit, it starts swapping to disk; performance then degrades drastically, which is generally not acceptable. Instead, limit the maximum memory usage with maxmemory. When the limit is exceeded, Redis frees space for new writes according to one of several eviction policies that the user can choose from:
(1) noeviction: write operations are rejected; reads and deletes still work normally. This is the default policy and is not recommended;
(2) allkeys-lru: evict the least recently used key among all keys; the most commonly used policy;
(3) allkeys-random: evict a random key; not recommended;
(4) volatile-lru: among keys with an expiration time set, evict the least recently used; not recommended;
(5) volatile-random: among keys with an expiration time set, evict a random key; not recommended;
(6) volatile-ttl: among keys with an expiration time set, evict the key that is closest to expiring first.
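These policies correspond to the maxmemory and maxmemory-policy settings in redis.conf. As a hedged sketch, they can also be changed at runtime through the Jedis client, assuming the server permits CONFIG SET; the 100mb limit is only an example:

import redis.clients.jedis.Jedis;

public class EvictionPolicyExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Cap memory usage and evict the least recently used key across all keys
            // once the limit is reached (same settings as maxmemory / maxmemory-policy in redis.conf).
            jedis.configSet("maxmemory", "100mb");
            jedis.configSet("maxmemory-policy", "allkeys-lru");
            System.out.println(jedis.configGet("maxmemory-policy"));
        }
    }
}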

4. A hand-written LRU cache in Java

import java.util.LinkedHashMap;
import java.util.Map;

public class LRUCache<K, V> extends LinkedHashMap<K, V> {
    private final int cacheSize;

    public LRUCache(int cacheSize) {
        // accessOrder = true makes the LinkedHashMap order entries by access, oldest first
        super(cacheSize, 0.75f, true);
        this.cacheSize = cacheSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // When the map holds more entries than the configured cache size,
        // the eldest (least recently used) entry is removed automatically.
        return size() > cacheSize;
    }
}

Redis distributed algorithm (traditional modulo hashing)

Suppose there are three Redis nodes: redis0, redis1, and redis2. For a given resource, hash its key and take the result modulo 3; the remainder (0, 1, or 2) decides which node stores it.
The problem is what happens when the number of nodes changes. For example, going from 4 nodes to 5, only about 20% of resources still hash to their original node (the hit rate), and the remaining 80% have to be redistributed. So with this traditional algorithm, removing or adding a Redis node invalidates a large part of the cache and puts heavy pressure on the backend servers. When the data volume reaches tens of millions, the requests that now miss the cache go straight through to the database and can bring it down. (This is the simple hash-plus-modulo, or "hash chain", method.) A small simulation of the remapping is sketched below.
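A minimal sketch of that remapping effect; for illustration the "hash" is just the integer key itself, which is enough to show the modulo behavior:

public class ModuloShardingDemo {
    public static void main(String[] args) {
        int keys = 100_000;
        int unchanged = 0;
        // Count how many keys still map to the same node when the cluster grows from 4 to 5 nodes.
        for (int key = 0; key < keys; key++) {
            int before = key % 4;
            int after = key % 5;
            if (before == after) {
                unchanged++;
            }
        }
        // Roughly 20% of the keys stay on their original node; the other ~80% must be reloaded.
        System.out.printf("hit rate after resharding: %.1f%%%n", 100.0 * unchanged / keys);
    }
}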

Principle of the consistent hashing algorithm

Consistent hashing uses a hash ring.
The hash chain method performs a single hash: the key is hashed directly to a machine number.
The hash ring involves two hashes:
(1) Hash every machine onto the ring, for example by hashing the machine's IP and taking the result modulo 2^32;
(2) Hash the key onto the same ring, then walk the ring clockwise to find the machine that owns the key.
In this way, each machine is responsible for the data on its own segment of the ring.

The positions of the cache nodes on the ring are fixed, so a stable mapping forms between each key and the cache node that stores its value; because the same hash function is always used, the same key always lands on the same position and therefore on the same node.

If the architecture changes and cache node cacheB is removed, the objects that were on cacheB (say object4) are simply stored on the next node clockwise, cacheC. The affected range is only the segment between cacheA and cacheB, so the impact is small.
Advantage: this avoids the cache avalanche that a traditional scheme suffers, where a change of cache server addresses rehashes the entire cache;
If, instead of removing a node, a new node cacheD is added, an object that used to be stored on cacheC (say object2) is now stored on cacheD. The affected range is again only the segment between cacheB and cacheD. So whether a node is added or removed, the affected range stays small.

Hash skew


However, the hash distribution can be skewed. Ideally the three cache nodes A, B, and C are spread evenly around the ring, but in practice they may end up very close together. Then a large share of the data lands on one node (say A), the distribution is far from uniform, and the load across the three cache nodes is unbalanced.

Virtual nodes


Therefore virtual nodes are introduced: each physical cache node generates a number of virtual nodes, which are hashed and scattered around the ring, making the distribution much more even. Hash skew can still occur, so in practice a ratio of virtual nodes to real nodes is configured; the more virtual nodes each real node has, the more even the distribution and the smaller the impact of skew. A sketch of such a ring follows.
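A minimal sketch of a hash ring with virtual nodes, built on a Java TreeMap; the MD5-based hash and the "#i" virtual-node naming are illustrative choices, not how any particular client implements it:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodesPerServer;

    public ConsistentHashRing(List<String> servers, int virtualNodesPerServer) {
        this.virtualNodesPerServer = virtualNodesPerServer;
        for (String server : servers) {
            addServer(server);
        }
    }

    public void addServer(String server) {
        // Hash each virtual node ("ip#i") onto the ring so one physical node
        // occupies many positions and the data is spread more evenly.
        for (int i = 0; i < virtualNodesPerServer; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    public void removeServer(String server) {
        for (int i = 0; i < virtualNodesPerServer; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    public String getServer(String key) {
        // Walk clockwise: the first virtual node at or after the key's hash owns the key.
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private long hash(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            // Use the first 4 bytes of the MD5 digest as an unsigned 32-bit position on the ring.
            return ((long) (digest[0] & 0xFF) << 24)
                    | ((long) (digest[1] & 0xFF) << 16)
                    | ((long) (digest[2] & 0xFF) << 8)
                    | (digest[3] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

For example, new ConsistentHashRing(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"), 160).getServer("user:42") returns the node responsible for that key; adding or removing a server only remaps the keys on the affected ring segments.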
Consistent hashing hit rate
With n existing servers and m newly added servers, roughly n/(n + m) * 100% of keys still map to their original server, i.e. the hit rate is n/(n + m), and only the remaining m/(n + m) of keys need to be remapped.
For example, going from 4 servers to 5 (n = 4, m = 1), about 80% of keys are unaffected, compared with only 20% under traditional modulo hashing. The larger the existing cluster is relative to the change, the smaller the impact, so as distributed clusters grow, the advantage of consistent hashing becomes more and more obvious.

The ShardedJedis object obtained from the Redis sharded connection pool ultimately extends Sharded. In the Sharded source code you can see that during initialization each shard is placed on the ring as 160 * weight virtual nodes. In typical scenarios, 100-500 virtual nodes are configured per real node. A usage sketch follows.
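A minimal usage sketch of this sharded client; note that ShardedJedis exists in older Jedis versions (it was removed in Jedis 4.x), and the hosts and key are illustrative:

import java.util.Arrays;
import java.util.List;
import redis.clients.jedis.JedisShardInfo;
import redis.clients.jedis.ShardedJedis;

public class ShardedJedisExample {
    public static void main(String[] args) {
        // Each JedisShardInfo becomes a real node; Sharded internally places
        // 160 * weight virtual nodes for it on the hash ring.
        List<JedisShardInfo> shards = Arrays.asList(
                new JedisShardInfo("10.0.0.1", 6379),
                new JedisShardInfo("10.0.0.2", 6379),
                new JedisShardInfo("10.0.0.3", 6379));
        try (ShardedJedis shardedJedis = new ShardedJedis(shards)) {
            shardedJedis.set("user:42", "cached-profile"); // routed by consistent hashing on the key
        }
    }
}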

Five common Redis cache problems: avalanche, penetration, concurrency, warm-up, and downgrade

Cache Avalanche

The data is not loaded into the cache, or a large portion of the cache expires at the same time, so every request goes to the database; the database's CPU and memory load becomes too high, and it may even go down.

Countermeasures

1. Cache high availability

Design the cache layer for high availability so that a large-scale cache failure cannot happen: even if individual nodes, individual machines, or even a whole data center goes down, the cache can still serve requests. Redis Sentinel and Redis Cluster both provide this kind of high availability.

2. Cache downgrade

A local cache such as Ehcache can be used as a temporary fallback, but the main measures are limiting access to the source service, resource isolation (circuit breaking), and degradation.
When traffic surges and the service has problems, it still needs to remain usable. The system can degrade automatically based on some key metrics, or switches can be configured for manual degradation; this usually requires cooperation from operations.
For example, a recommendation service serves many personalized results; if personalization cannot be provided, it can degrade to hot data so the front-end page is not left blank.
Before degrading, the system should be sorted out: which businesses are core (must be guaranteed) and which can temporarily go without service (replaced by static pages), matched against the server's core metrics, and then an overall plan drawn up, for example:
(1) Normal: some services occasionally time out because of network jitter or an ongoing deployment and can be degraded automatically;
(2) Warning: a service's success rate fluctuates over a period of time (e.g. between 95% and 100%); it can be degraded automatically or manually, and an alarm can be sent;
(3) Error: for example, availability drops below 90%, the database connection pool is exhausted, or traffic suddenly rises to the maximum threshold the system can bear; degrade automatically or manually depending on the situation;
(4) Severe error: for example, the data is wrong for some special reason; an emergency manual downgrade is required.

3. Redis backup and quick warm-up

1) Redis data backup and recovery
2) Fast cache warm-up

4. Rehearse in advance

Finally, before the project goes live, it is recommended to rehearse the load on the application and the backend, and the problems that may occur, after the cache layer goes down, and to drill the high-availability setup in advance so that problems are discovered early.

Cache penetration

Cache penetration means querying data that does not exist. For example, a request misses the Redis cache and falls through to MySQL; because the data cannot be found there either, nothing is written back to the cache, so every request for this non-existent data hits the database. That is cache penetration.
Solutions:
If the database query also comes back empty, store a default value in the cache anyway, so the next request is answered from the cache instead of the database; give this placeholder an expiration time, or overwrite it once a real value exists (see the sketch below).
You can also define format rules for keys and filter out keys that do not match the rules before querying.
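A minimal sketch of the "cache a default value" idea; queryFromDatabase is a hypothetical stand-in for a real DAO call, and the placeholder string and TTLs are illustrative:

import redis.clients.jedis.Jedis;

public class NullCachingExample {
    private static final String NULL_PLACEHOLDER = "__NULL__";

    public static String getById(Jedis jedis, String id) {
        String cacheKey = "product:" + id;
        String cached = jedis.get(cacheKey);
        if (cached != null) {
            // A placeholder hit means the record is known not to exist; avoid hitting the DB again.
            return NULL_PLACEHOLDER.equals(cached) ? null : cached;
        }
        String value = queryFromDatabase(id);
        if (value == null) {
            // Cache the "not found" result with a short TTL so it cannot pollute the cache for long.
            jedis.setex(cacheKey, 60, NULL_PLACEHOLDER);
            return null;
        }
        jedis.setex(cacheKey, 30 * 60, value);
        return value;
    }

    private static String queryFromDatabase(String id) {
        return null; // stand-in for a real database lookup
    }
}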

Cache concurrency

Concurrency here refers to problems caused by multiple Redis clients setting the same key at the same time. Redis itself executes commands in a single thread, so concurrent operations are handled in arrival order: the first to arrive is executed and the rest wait. Another option is to push the redis.set operations into a queue and serialize them, so they are executed strictly one by one, as in the sketch below.
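A minimal sketch of the queue idea: a single-threaded executor plays the role of the queue, so all set operations submitted to it run strictly one at a time; the connection details are illustrative:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class SerializedCacheWriter {
    // The single-threaded executor is the "queue": every set operation is
    // submitted to it and executed strictly one by one, in submission order.
    private final ExecutorService writeQueue = Executors.newSingleThreadExecutor();
    private final JedisPool pool = new JedisPool("localhost", 6379);

    public void asyncSet(String key, String value) {
        writeQueue.submit(() -> {
            try (Jedis jedis = pool.getResource()) {
                jedis.set(key, value);
            }
        });
    }

    public void shutdown() {
        writeQueue.shutdown();
        pool.close();
    }
}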

Cache warm-up

Cache warm-up means loading the relevant data into the cache right after the system goes online.
This avoids the pattern where a user's request first queries the database and only then caches the data; users directly hit data that has already been warmed up.
Solutions:
1. Write a cache-refresh page and trigger it manually when going online;
2. If the amount of data is small, load it automatically when the project starts;
The goal is to load the data into the cache before the system goes live.
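A minimal warm-up sketch under stated assumptions: loadHotData is a hypothetical method standing in for a database query that returns the hot records, and warmUp is called once at startup before traffic arrives; the keys and TTL are illustrative:

import java.util.Map;
import redis.clients.jedis.Jedis;

public class CacheWarmer {
    public static void warmUp(Jedis jedis) {
        // Push all hot records into Redis before any user request arrives.
        Map<String, String> hotData = loadHotData();
        for (Map.Entry<String, String> entry : hotData.entrySet()) {
            jedis.setex(entry.getKey(), 60 * 60, entry.getValue());
        }
    }

    private static Map<String, String> loadHotData() {
        // Stand-in for a real database query returning the hot data set.
        return Map.of("category:1", "Books", "category:2", "Electronics");
    }
}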
The above is an introduction to cache avalanche, warm-up, downgrade, etc.

Bloom filter

A Bloom filter is a space-efficient probabilistic data structure that uses a bit array and several hash functions to decide whether an element belongs to a set. It is mainly used for membership tests over large-scale data.

Principle

A Bloom filter consists of a bit array and several hash functions. The bit array holds only 0s and 1s and is initialized to all 0s. When an element is added, each hash function produces a position, and the bits at those positions are set to 1. When querying whether an element is present, the same hash functions are applied and the corresponding bits are checked: if any of those bits is 0, the element is definitely not in the filter; if all of them are 1, the element is probably in the filter. Because different elements can hash to the same positions, false positives are possible, but an element that was actually added will never be missed.
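A minimal, hand-rolled sketch of this principle; the double-hashing trick used below to derive the k positions is one common choice, not the only one:

import java.util.BitSet;

public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    public SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    public void add(String element) {
        // Set every position produced by the k hash functions to 1.
        for (int position : positions(element)) {
            bits.set(position);
        }
    }

    public boolean mightContain(String element) {
        // Any position still 0 means the element was definitely never added;
        // all positions 1 means it is only *probably* present (false positives possible).
        for (int position : positions(element)) {
            if (!bits.get(position)) {
                return false;
            }
        }
        return true;
    }

    private int[] positions(String element) {
        // Derive k hash values from two base hashes (double hashing).
        int h1 = element.hashCode();
        int h2 = Integer.reverse(h1) ^ 0x5bd1e995;
        int[] result = new int[hashCount];
        for (int i = 0; i < hashCount; i++) {
            result[i] = Math.floorMod(h1 + i * h2, size);
        }
        return result;
    }
}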

Advantages of Bloom filters

Bloom filters have the following advantages compared to other data structures:

  • High space efficiency: Bloom filter only requires a bit array and several hash functions, so it is very space efficient.
  • High query efficiency: The query efficiency of the Bloom filter is very high, because it only needs to query the bit array without actually querying the data.
  • Highly scalable: Bloom filters can dynamically adjust the bit array size as needed.

Disadvantages of Bloom filters

Bloom filters have the following disadvantages compared to other data structures:

  • Unable to delete elements: Because the bit corresponding to the element can only be set to 1 in the bit array of the Bloom filter and cannot be set to 0, the element cannot be deleted.
  • There is a false positive rate: because a Bloom filter relies on hash functions, false positives are unavoidable. Increasing the size of the bit array and the number of hash functions lowers the false positive rate, but cannot eliminate it completely.

Redis Bloom filter implementation

Redis provides a Bloom filter implementation through the RedisBloom module, operated with dedicated commands. The following are the common Bloom filter commands:

  • 2.1 BF.ADD adds an element to a Bloom filter.

Syntax:

BF.ADD key element
Parameters:

key: the name of the Bloom filter.
element: the element to add.
Return value:

Returns 0 if the element may already exist in the Bloom filter.
Returns 1 if the element did not exist; the element is added to the Bloom filter.
Example:

BF.ADD myfilter foo
BF.ADD myfilter bar

  • 2.2 BF.EXISTS determines whether the element exists in the Bloom filter.

Syntax:

BF.EXISTS key element
Parameters:

key: the name of the Bloom filter.
element: the element to query.
Return value:

Returns 1 if the element may exist in the Bloom filter.
Returns 0 if the element definitely does not exist in the Bloom filter.
Example:

BF.EXISTS myfilter foo
BF.EXISTS myfilter baz

  • 2.3 BF.MADD adds multiple elements to the Bloom filter.

Syntax:

BF.MADD key element [element …]
Parameters:

key: the name of the Bloom filter.
element: the elements to add.
Return value:

Returns an array indicating whether each element was added: 0 if the element may already exist in the Bloom filter, 1 if it did not exist and was added.
Example:

BF.MADD myfilter foo bar baz

  • 2.4 BF.MEXISTS determines whether multiple elements exist in the Bloom filter.

Syntax:

BF.MEXISTS key element [element …]
Parameters:

key: the name of the Bloom filter.
element: the elements to query.
Return value:

Returns an array indicating whether each element is present: 1 if the element may exist in the Bloom filter, 0 if it definitely does not exist.
Example:

BF.MEXISTS myfilter foo bar baz

  • 2.5 BF.INFO obtains Bloom filter information.

Syntax:

BF.INFO key
Parameters:

key: the name of the Bloom filter.
Return value:

Returns information about the Bloom filter, such as its capacity, size in bytes, number of sub-filters, number of items inserted, and expansion rate.
Example:

BF.INFO myfilter
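These commands require the RedisBloom module to be loaded on the server. From Java they can be issued as raw commands, for example through Jedis's generic sendCommand (assuming a Jedis version that exposes it); the command descriptors below are hand-rolled for illustration, not an official Bloom filter API of the core Jedis client:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.commands.ProtocolCommand;
import redis.clients.jedis.util.SafeEncoder;

public class BloomCommandExample {
    // Hand-rolled command descriptors; BF.* commands are provided by the RedisBloom module.
    private static final ProtocolCommand BF_ADD = () -> SafeEncoder.encode("BF.ADD");
    private static final ProtocolCommand BF_EXISTS = () -> SafeEncoder.encode("BF.EXISTS");

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            Long added = (Long) jedis.sendCommand(BF_ADD, "myfilter", "foo");
            Long exists = (Long) jedis.sendCommand(BF_EXISTS, "myfilter", "foo");
            System.out.println("added=" + added + ", exists=" + exists);
        }
    }
}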

Cache update strategy

Memory eviction (no coding required)

Timeout-based eviction (no coding required)

Active update (requires coding)

We need to manually write business logic and update the cache while modifying the database. There are three active update strategies:

  1. Cache Aside Pattern: The caller of the cache updates the cache while updating the database.

  2. Read/Write Through Pattern: The cache and database are integrated into a service, and the service maintains consistency. The caller calls the service without worrying about consistency issues.

  3. Write Behind Caching Pattern: The caller only operates the cache, and other threads asynchronously persist the cache data to the database, ultimately maintaining consistency.

Generally, the memory eviction mechanism is enough for scenarios with low data-consistency requirements, such as the category information on a mall's home page, which rarely changes. When the consistency requirements are higher, use active update combined with timeout-based eviction.

The most commonly used proactive update strategy in the enterprise is the Cache Aside Pattern. That is, we code it ourselves to ensure data consistency.

There are three issues we need to consider when operating caches and databases

  1. Delete cache or update cache
    1) Update cache: the cache is updated every time the database is updated, which produces many invalid write operations.
    The shortcoming is obvious. For example, if the database is updated 100 times and the cache is updated 100 times along with it, but nobody reads the data in between, then the first 99 cache writes are wasted and only the last one matters. That is what "too many invalid writes" means.

    2) Delete the cache: invalidate the cache when updating the database and rebuild it on the next query (lazy loading). This is the option usually chosen.
    This approach is more reasonable and avoids a large number of invalid writes: after the cache entry is deleted, nothing is written to the cache until someone actually queries that data. A sketch of this cache-aside read/write flow follows.
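A minimal sketch of the cache-aside flow: the read path fills the cache lazily, the write path updates the database first and then deletes the cache key. findById and updateById are hypothetical DAO methods; the key prefix and TTL are illustrative:

import redis.clients.jedis.Jedis;

public class ProductCacheService {
    private static final int TTL_SECONDS = 30 * 60;

    public String read(Jedis jedis, String id) {
        String key = "product:" + id;
        String cached = jedis.get(key);
        if (cached != null) {
            return cached;
        }
        String fromDb = findById(id);              // cache miss: load from the database
        if (fromDb != null) {
            jedis.setex(key, TTL_SECONDS, fromDb); // rebuild the cache on the read path
        }
        return fromDb;
    }

    public void write(Jedis jedis, String id, String newValue) {
        updateById(id, newValue);                  // 1. update the database
        jedis.del("product:" + id);                // 2. invalidate (delete) the cache
    }

    private String findById(String id) { return null; }      // stand-in for a database query
    private void updateById(String id, String value) { }     // stand-in for a database update
}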

Cache and database consistency

1) In a monolithic system, put the cache and database operations in one transaction.
2) In a distributed system, use a distributed transaction solution such as TCC.

Should the cache or the database be operated first

1. Delete the cache first, then operate the database

This approach has an obvious problem. Suppose two operations run concurrently: thread A updates and thread B queries. Thread A deletes the cache, but before it can update the database the CPU is taken by thread B. Thread B queries the cache, misses (thread A has just deleted it), queries the database, and writes the result back into the cache. Only then does thread A get the CPU back and update the database, so the cache now holds stale data and the data becomes inconsistent.

2. Operate the database first, then delete the cache (the most commonly used method)

This approach is used most often because the probability of error is very small; only an extreme situation causes a consistency problem.
Again there are two concurrent requests: thread A queries and thread B updates. Thread A queries just as the cache expires, so it reads the data from the database; before it can write the result into the cache, the CPU is taken by thread B. Thread B updates the database and then deletes the cache (a no-op here, because the cache has already expired). Thread A then gets the CPU back and writes its value into the cache, but that value is the old data from before the update, which causes a consistency problem.
It seems that this is indeed a problem, but let us carefully analyze the conditions that need to be met in this situation:

  1. Concurrent read and write operations
  2. When reading the cache, the cache just expires
  3. Writing to the database is faster than writing to the cache
    Writing to the database touches the disk while writing to the cache touches memory, so it is unlikely that the disk write finishes faster than the memory write. Therefore, the probability of a data inconsistency with this method is very small.
3. Delayed double deletion strategy

The delayed double delete strategy is a common way to keep the database and the cache consistent in a distributed system, but it is not strongly consistent. In fact, no matter which solution is used, dirty data in Redis cannot be avoided entirely; it can only be mitigated. To solve the problem completely, synchronization locks and the corresponding business-level logic are required.

We have analyzed the shortcomings of the first two options. The second one is used more often, but it still has a small flaw; although the probability of hitting it is very low, it is hard to say whether it will happen in production. The delayed double deletion strategy exists to supplement the second method.

Delayed double deletion means: delete the cache first, then perform the database operation, and finally delete the cache again after a delay of N seconds. The delay N should be longer than the time a write operation takes; it covers the flaw of the second strategy by making sure that after thread A's cache write and thread B's database update and cache delete have both finished, the cache is deleted once more, so subsequent queries read the latest data.
ps: The delay is usually set to about 3 seconds; the optimal value should be determined by the actual business scenario.
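A minimal sketch of delayed double deletion; updateById is a hypothetical DAO method, and the 3-second delay is the rough value mentioned above and should be tuned to the actual business:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class DelayedDoubleDelete {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final JedisPool pool = new JedisPool("localhost", 6379);

    public void update(String id, String newValue) {
        String key = "product:" + id;
        deleteCache(key);                       // 1. delete the cache
        updateById(id, newValue);               // 2. update the database
        // 3. delete the cache again after N seconds, longer than one write operation takes
        scheduler.schedule(() -> deleteCache(key), 3, TimeUnit.SECONDS);
    }

    private void deleteCache(String key) {
        try (Jedis jedis = pool.getResource()) {
            jedis.del(key);
        }
    }

    private void updateById(String id, String value) { /* stand-in for the database update */ }
}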