One article to fully understand Redis Bigkey and Hotkey problems

Foreword

Bigkey and hotkey are two relatively common problems in Redis production. This article analyzes these two problems from the perspectives of their concepts, hazards, discovery, and solutions.

bigkey

Concept

In plain terms, a bigkey is a key whose value is very large and takes up a lot of Redis memory; it is essentially a "big value" problem. The key name is usually set by the program itself, but the size of the value is often driven by data the program does not control, so it can grow very large.

Because the value of a bigkey is very large, serializing and deserializing it takes a long time. Operating on a bigkey is therefore usually slow, which can block Redis and degrade its performance.

A few practical examples of what makes a key a bigkey:

  • A String key whose value is 5 MB (the value is too large)
  • A List key with 20,000 elements (too many elements)
  • A ZSet key with 10,000 members (too many members)
  • A Hash key with only 1,000 fields, but whose values add up to 10 MB (the members are too large)

Industry practice (see, for example, the Alibaba and Kuaishou Redis development specifications) generally recommends the following limits for keys:

Keep String values within 10 KB, and keep the number of elements in a hash, list, set, or zset under 5,000.

Hazard

  • Slow queries: because a bigkey holds a large amount of data, a single request can take a very long time to execute, producing slow queries
  • Unbalanced cluster memory: in cluster mode, nodes holding more bigkeys use more memory, which affects cluster stability
  • Blocking on expiration: when a bigkey expires or is deleted, single-threaded Redis blocks while freeing it, delaying other client commands
  • Network card overload: imagine a String key holding 10 MB of data being requested by 10,000 users at once; serving them would take about 100 GB of network traffic and disrupt the server's normal operation

Discover

The basic idea is to scan all keys in Redis and measure the size of each one.
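
A minimal sketch of this idea with Jedis (the thresholds follow the spec quoted above; the method name and the printing are illustrative):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanParams;
import redis.clients.jedis.ScanResult;

public void scanForBigKeys(Jedis jedis) {
    ScanParams scanParams = new ScanParams().count(100);
    String cursor = "0";
    do {
        ScanResult<String> scanResult = jedis.scan(cursor, scanParams);
        for (String key : scanResult.getResult()) {
            String type = jedis.type(key);
            long size;
            switch (type) {
                case "string": size = jedis.strlen(key); break; // value length in bytes
                case "list":   size = jedis.llen(key);   break; // number of elements
                case "hash":   size = jedis.hlen(key);   break; // number of fields
                case "set":    size = jedis.scard(key);  break; // number of members
                case "zset":   size = jedis.zcard(key);  break; // number of members
                default:       size = 0;
            }
            // 10 KB for strings, 5,000 elements for collections (see the spec above)
            boolean isBig = "string".equals(type) ? size > 10 * 1024 : size > 5000;
            if (isBig) {
                System.out.println("bigkey candidate: " + key + " (" + type + ", size=" + size + ")");
            }
        }
        cursor = scanResult.getStringCursor();
    } while (!"0".equals(cursor));
}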

  1. redis-cli provides the --bigkeys option, which finds, for each data type, the key with the longest string value or the largest number of elements.

  2. Alibaba Cloud's Redis bigkey analysis tool

Delete

There are two main approaches: 1. delete by traversal; 2. Redis's asynchronous delete command.

Delete by traversal

  1. Hash delete: hscan + hdel
public void delBigHash(String host, int port, String password, String bigHashKey) {
    Jedis jedis = new Jedis(host, port);
    if (password != null && !"".equals(password)) {
        jedis.auth(password);
    }
    ScanParams scanParams = new ScanParams().count(100);
    String cursor = "0";
    do {
        ScanResult<Entry<String, String>> scanResult = jedis.hscan(bigHashKey, cursor, scanParams);
        List<Entry<String, String>> entryList = scanResult.getResult();
        if (entryList != null && !entryList.isEmpty()) {
            for (Entry<String, String> entry : entryList) {
                jedis.hdel(bigHashKey, entry.getKey());
            }
        }
        cursor = scanResult.getStringCursor();
    } while (!"0".equals(cursor));

    // delete the (now empty) bigkey itself
    jedis.del(bigHashKey);
}
  2. List delete: ltrim
public void delBigList(String host, int port, String password, String bigListKey) {
    Jedis jedis = new Jedis(host, port);
    if (password != null && !"".equals(password)) {
        jedis.auth(password);
    }
    long llen = jedis.llen(bigListKey);
    int counter = 0;
    int left = 100;
    while (counter < llen) {
        // trim 100 elements off the left each time
        jedis.ltrim(bigListKey, left, llen);
        counter += left;
    }
    // finally delete the key
    jedis.del(bigListKey);
}
  3. Set delete: sscan + srem
public void delBigSet(String host, int port, String password, String bigSetKey) {
    Jedis jedis = new Jedis(host, port);
    if (password != null && !"".equals(password)) {
        jedis.auth(password);
    }
    ScanParams scanParams = new ScanParams().count(100);
    String cursor = "0";
    do {
        ScanResult<String> scanResult = jedis.sscan(bigSetKey, cursor, scanParams);
        List<String> memberList = scanResult.getResult();
        if (memberList != null && !memberList.isEmpty()) {
            for (String member : memberList) {
                jedis.srem(bigSetKey, member);
            }
        }
        cursor = scanResult.getStringCursor();
    } while (!"0".equals(cursor));

    // delete the (now empty) bigkey itself
    jedis.del(bigSetKey);
}
  4. SortedSet delete: zscan + zrem
public void delBigZset(String host, int port, String password, String bigZsetKey) {
    Jedis jedis = new Jedis(host, port);
    if (password != null && !"".equals(password)) {
        jedis.auth(password);
    }
    ScanParams scanParams = new ScanParams().count(100);
    String cursor = "0";
    do {
        ScanResult<Tuple> scanResult = jedis.zscan(bigZsetKey, cursor, scanParams);
        List<Tuple> tupleList = scanResult.getResult();
        if (tupleList != null && !tupleList.isEmpty()) {
            for (Tuple tuple : tupleList) {
                jedis.zrem(bigZsetKey, tuple.getElement());
            }
        }
        cursor = scanResult.getStringCursor();
    } while (!"0".equals(cursor));

    // delete the (now empty) bigkey itself
    jedis.del(bigZsetKey);
}

Delete asynchronously

Since Redis 4.0, keys can be deleted asynchronously: simply use the UNLINK command, which relies on Redis's lazyfree mechanism.
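
For example, with Jedis (a minimal sketch; the helper method name is illustrative):

import redis.clients.jedis.Jedis;

// UNLINK removes the key from the keyspace immediately and frees its memory
// in a background thread, so the main thread is not blocked by a bigkey.
public void delBigKeyAsync(Jedis jedis, String bigKey) {
    jedis.unlink(bigKey);
}

// Related redis.conf options (Redis 4.0+) make implicit deletions lazy as well:
//   lazyfree-lazy-expire yes      # free expired keys asynchronously
//   lazyfree-lazy-eviction yes    # free keys evicted under maxmemory asynchronously
//   lazyfree-lazy-server-del yes  # free keys implicitly deleted (e.g. by RENAME) asynchronously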

Resolve

The main way to solve the bigkey problem is to split the big key: break its data into multiple small keys and access them through client-side sharding.

For example:

  • string type
    • Convert the string into a hash or list, and then split that hash or list
    • For a plain string value, you can also
      • 1. use a more space-efficient serialization format
      • 2. compress the value when writing and decompress it when reading
  • list type: split it into sub-keys such as list:0, list:1, list:2, ..., list:N, and route each element to a sub-key by id % N (see the sketch after this list)
  • set type: same as for list
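
A minimal sketch of the splitting idea for a big list, assuming each element can be routed by a numeric id (the class name, key naming, and N are illustrative assumptions):

import redis.clients.jedis.Jedis;

import java.util.List;

public class ShardedList {
    private static final int N = 16;   // number of sub-keys (chosen by the application)
    private final Jedis jedis;
    private final String baseKey;      // e.g. "list", producing list:0 ... list:15

    public ShardedList(Jedis jedis, String baseKey) {
        this.jedis = jedis;
        this.baseKey = baseKey;
    }

    // Route an element to its sub-key by id % N: this is the client-side sharding step.
    private String subKey(long id) {
        return baseKey + ":" + (id % N);
    }

    public void push(long id, String value) {
        jedis.rpush(subKey(id), value);
    }

    public List<String> readShard(long id) {
        return jedis.lrange(subKey(id), 0, -1);
    }
}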

hotkey

Concept

The data of any given key lives on a single Redis instance on one server. If a huge number of requests suddenly target that key, the traffic becomes too concentrated and can hit the processing ceiling of that single instance: the instance's CPU may reach 100% or its network card may saturate, hurting the stability and availability of the system, and in the worst case the server goes down and can no longer serve requests.

For a single Redis instance, the industry generally assumes a theoretical ceiling of around 100,000 OPS; the actual figure depends on the machine configuration.

Hazard

Overly concentrated traffic overloads a single Redis node (roughly 100,000 OPS per node). The Redis service may then crash, large numbers of Redis requests fail, queries fall through to the database, the database collapses, and the entire service becomes unavailable.

Clearly, hotkeys can do great harm to service availability, so we should detect and resolve them promptly.

Discover

As the analysis above shows, hotkeys are quite harmful. We cannot wait until a hotkey has already dragged the service down before acting; by then the business has inevitably suffered. If we can monitor for hotkeys and spot them before they appear, that goes a long way toward keeping the business system stable. So what means do we have to detect hotkeys early?

1. Estimated business traffic

Based on the activities and features the business plans to launch, we can predict hotkeys in advance for some scenarios: for example, when a promotion launches, the related data will all be cached in Redis, which is very likely to create a hotkey problem.

  • Advantages: simple; hotkeys are identified from experience, so they can be spotted and handled early;
  • Disadvantages: not every hotkey can be predicted, e.g. those triggered by breaking news events.

2. Client monitoring

We normally talk to the Redis server through an SDK (for example, the Java clients Jedis and Redisson). We can wrap that client so that it records statistics before each request is sent, and periodically reports the collected data to a central service for aggregation.
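
A minimal sketch of such a wrapper around Jedis (the class and its reporting method are hypothetical, not part of any existing SDK):

import redis.clients.jedis.Jedis;

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class CountingJedisWrapper {
    private final Jedis jedis;
    private final ConcurrentHashMap<String, LongAdder> accessCounter = new ConcurrentHashMap<>();

    public CountingJedisWrapper(Jedis jedis) {
        this.jedis = jedis;
    }

    public String get(String key) {
        // Record the access locally, then forward the request to Redis.
        accessCounter.computeIfAbsent(key, k -> new LongAdder()).increment();
        return jedis.get(key);
    }

    // Called periodically (e.g. by a scheduled thread) to report counters to a
    // central aggregation service and reset them.
    public Map<String, Long> snapshotAndReset() {
        Map<String, Long> snapshot = new HashMap<>();
        accessCounter.forEach((key, counter) -> snapshot.put(key, counter.sumThenReset()));
        return snapshot;
    }
}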

  • Advantages: the solution is simple
  • Disadvantages:
    • It is somewhat intrusive to client code, or requires secondary development of the SDK;
    • It does not adapt well to multi-language architectures: an SDK must be built for every language, so development and maintenance costs are high.

3. Proxy layer monitoring

If all Redis requests pass through a proxy, you can modify the proxy code to collect statistics; the idea is basically the same as on the client side.

  • Advantages: It is completely transparent to the user, and can solve the language heterogeneity and version upgrade problems of the client SDK;
  • Disadvantages:
    • Development cost is higher than the client-side approach;
    • Not every Redis cluster architecture has a proxy layer (this approach requires one to be deployed).

4. Redis built-in commands

hotkeys parameter

Redis added a hotkeys lookup feature in version 4.0.3: run redis-cli --hotkeys to find the hot keys in the current keyspace. It is implemented with SCAN plus OBJECT FREQ, so it requires an LFU maxmemory policy to be configured.

  • Advantages: no need for secondary development, and ready-made tools can be used directly;
  • Disadvantages:
    • Because it has to scan the whole keyspace, it is not real-time;
    • The scan time grows with the number of keys, so it can take a long time on a large keyspace.

monitor command

The MONITOR command captures the commands the Redis server receives in real time. You can collect its output via redis-cli monitor and feed it to ready-made analysis tools such as redis-faina to count hotkeys.

  • Advantages: no need for secondary development, and ready-made tools can be used directly;
  • Disadvantages: under high concurrency this command risks exhausting memory, and it also degrades Redis performance.

5. Rely on the infrastructure capabilities of major vendors

Major cloud vendors, as well as the in-house infrastructure of large companies, provide Redis monitoring tools that can detect both hotkeys and bigkeys.

Resolve

1. Multi-level cache

When a hotkey appears, load it into the application's JVM; subsequent requests for that key are then served directly from the JVM instead of going to Redis. There are many local-cache tools for this, such as Ehcache, the Cache utilities in Google Guava, or even a plain HashMap.
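
As an illustration, a minimal local-cache sketch using Guava Cache in front of Jedis (the class name, cache size, and TTL are assumptions):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import redis.clients.jedis.Jedis;

import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

public class HotKeyLocalCache {
    // Bound the size and expire entries quickly: this limits JVM heap usage and
    // staleness relative to Redis, the two issues discussed below.
    private final Cache<String, String> localCache = CacheBuilder.newBuilder()
            .maximumSize(1000)
            .expireAfterWrite(5, TimeUnit.SECONDS)
            .build();

    private final Jedis jedis;

    public HotKeyLocalCache(Jedis jedis) {
        this.jedis = jedis;
    }

    public String get(String key) throws ExecutionException {
        // Serve from the JVM first; only a local miss falls through to Redis.
        // Note: Guava caches reject null values, so a missing Redis key would
        // need extra handling in real code.
        return localCache.get(key, () -> jedis.get(key));
    }
}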

There are two issues to pay attention to when using local cache:

  • When hotkeys are cached locally, keep the local cache from growing too large and eating into JVM heap space;
  • You must handle read/write consistency between the local cache and the Redis cluster.

2. Load balancing

As the earlier analysis showed, hotkeys arise because a large number of requests for the same key land on the same Redis instance. If we can spread those requests across different instances and avoid the traffic skew, the hotkey problem disappears.

So how do we split requests for one hotkey across different instances? We can use the hotkey backup approach: add a random prefix or suffix to the hotkey so that the single key becomes M keys, where M is a multiple of the number of Redis instances N. Accessing the one key then becomes accessing one of M keys; after sharding, those M keys land on different instances and the access traffic is spread evenly across all of them.

// N is the number of Redis instances, M is 2 times N
func getData() {
    const M = N * 2
    // generate a random number in [0, M)
    random = GenRandom(0, M)
    // construct the backup key name
    bakHotKey = hotKey + "_" + random
    data = redis.GET(bakHotKey)
    if data == NULL {
        data = redis.GET(hotKey)
        if data == NULL {
            // watch out for cache breakdown / cache avalanche here
            data = GetFromDB()
            redis.SET(hotKey, data, expireTime)
            redis.SET(bakHotKey, data, expireTime + GenRandom(0, 5))
        } else {
            redis.SET(bakHotKey, data, expireTime + GenRandom(0, 5))
        }
    }
    return data
}

Issues:

  • It wastes Redis memory; you can add a switch in a configuration center and only access the temporary backup keys while the switch is on
  • data consistency
    • Consistency across the multiple key copies cannot be guaranteed; data may be inconsistent while it is being synchronized
    • When data is updated, every copy must be updated at the same time, which again leaves a window of inconsistency (see the sketch after this list)
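
A hypothetical write path for the backup-key scheme above (the method name and parameters are illustrative): all copies are written together, and a small TTL jitter keeps them from expiring at the same moment.

import redis.clients.jedis.Jedis;

import java.util.concurrent.ThreadLocalRandom;

// Update the original hot key and all M backup copies; consistency between the
// copies is still only best-effort, as noted above.
public void setHotKeyWithBackups(Jedis jedis, String hotKey, String data,
                                 int m, int expireSeconds) {
    jedis.setex(hotKey, expireSeconds, data);
    for (int i = 0; i < m; i++) {
        int jitter = ThreadLocalRandom.current().nextInt(0, 5);
        jedis.setex(hotKey + "_" + i, expireSeconds + jitter, data);
    }
}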

Summary

There is no silver bullet for hotkeys; you have to pick a solution that fits the business scenario. Whatever the approach, some consistency problems remain. But a hotkey by definition implies high concurrency, and in such scenarios we usually only need eventual consistency rather than strong consistency.

This also illustrates a familiar truth: you cannot have both consistency and availability, which is exactly what the CAP theorem tells us.