Technical breakdown | Why does Redis use different encodings internally?

One weekend night, I was suddenly hit by a wave of latency alarms. A closer look showed that slow queries had appeared and driven up overall cluster latency; at the same time, the business team received alarms that upstream services were being affected. While investigating, the ops engineers found that CPU utilization had risen on only some instances of the Redis cluster, and that the slow-query logs were concentrated on those same instances, even though no business release or traffic-pattern change had happened. The root cause was a small number of hot keys whose traffic overloaded a few Redis instances. Rate limiting would have hurt the business, and scaling out could not stop the bleeding quickly enough. Fortunately, the business side had prepared a degradation plan in advance; after a quick discussion it was activated, and the incident was contained.

Case review

Although the failure was contained in time by the business side's degradation plan, as the Redis service provider we still needed a post-mortem to find the root cause and work out a fix, so that the problem would either not recur or could be stopped faster next time. Looking into the business scenario, we found that heavy rain in city A had tripled the queue of orders. Wait, this scene felt familiar: the same problem had occurred in this city during heavy rain before, which is exactly why the business side had a degradation plan ready. With the background clear, the next step was to face some pointed questions from the business team:

  • Why did the cluster handle even higher QPS during the same time window the day before without any similar problem?

  • This cluster stores data for multiple cities. Why does only city A have problems when it rains heavily, while other cities are fine? Is it that other cities never get heavy rain? Or that nobody else uses the queuing feature?

Obviously not. There had to be some technical factor we did not yet understand. Let me walk you through our troubleshooting at the time.

We started from the phenomena observed during the incident:

  1. At the time of the incident there were 5 hot keys, all belonging to the same city A. Because of the heavy rain that night, the number of queued orders in this city was three times the usual;

  2. All five hot keys were hash objects, and each stored more than 400 fields;

  3. Under the same business logic, another city B also had 5 hash keys, with higher QPS than city A and more fields per key, yet city B showed no latency problem;

  4. When the problem occurred, most of the commands hitting these five hot keys were HEXISTS (which checks whether a field exists in a hash), a command whose documented complexity is O(1).

At this point an idea flashed through my mind. Checking the meta information of the hot keys, I found that city A's five keys used the ziplist internal encoding, while city B's five keys used the hashtable encoding. This discovery felt like the first glimpse of daylight: it was the biggest difference found so far between city A and city B, and the subsequent analysis proved the direction was right.

Comparison of two internal encodings of hash objects

Why does Redis' hash type have two different encodings, and what are the consequences of each?

Why does Redis use different encodings internally?

Redis uses objects to represent keys and values in the database. Every time we create a new key-value pair, at least two objects are created: one serving as the key (the key object) and one serving as the value (the value object). For example, the command SET mykey "HelloWorld" creates two Redis objects: a string object "mykey" as the key and a string object "HelloWorld" as the value. The Redis object structure looks like this:

typedef struct redisObject {
    unsigned type:4;      /* object type */
    unsigned encoding:4;  /* internal encoding */
    void *ptr;            /* pointer to the underlying data structure */
    /* ... */
} robj;


A Redis object can be one of several types: string, list, hash, set, sorted set, and so on. Each type, in turn, can use one of several encodings: int, ht, zipmap, ziplist, intset, skiplist, embstr, quicklist, stream. Selecting the encoding through the encoding attribute, rather than binding a fixed encoding to each type, greatly improves Redis' flexibility and efficiency: Redis can pick a different encoding for an object depending on the usage scenario, optimizing the object for that scenario.

Data structure of two encoding methods

Encoding method 1: ziplist

<zlbytes><zltail><zllen><entry>...<entry><zlend>

(figure: ziplist memory layout)

Encoding method 2: hashtable encoding

(figure: hashtable (dict) layout)

Why use ziplist encoding?

  • When the hash object contains relatively few elements, Redis uses a compressed list as the underlying implementation of the list object, because the compressed list saves more memory than the double-ended linked list, and when the number of elements is small, the compressed list is saved in continuous blocks in memory. Lists can be loaded into the CPU cache faster than double-ended linked lists;

  • As the list object contains more and more elements, the performance of using a compressed list to save elements will become worse. Redis will convert the list object from compressed list encoding to hashtable encoding at the bottom. Hashtable encoding is a more suitable way to save a large number of elements. A double-ended linked list of elements.

Conversion logic between the two encodings during command execution

When executing HSET/HMSET, the code checks the hash object's encoding and the related conditions to decide which encoding should be used. The general flow is:

1. Check the value size

First, hashTypeTryConversion() checks: when the encoding is OBJ_ENCODING_ZIPLIST and any written value is longer than server.hash_max_ziplist_value (our online default is 64 bytes), the conversion function hashTypeConvert() is executed.

2. Check the number of fields

Then, in hashTypeSet(), if the encoding is OBJ_ENCODING_ZIPLIST and the number of fields exceeds server.hash_max_ziplist_entries (our online default is 512), hashTypeConvert() is executed.

3. The hashTypeConvert() implementation

hashTypeConvert() calls hashTypeConvertZiplist(). That function first creates a dict, then copies the fields and values out of the ziplist one by one, wrapping each in a Redis object and adding it to the dict. Once all the data has been copied, the original ziplist is freed and the key is pointed at the dict, completing the conversion from ziplist to hashtable encoding. This process is blocking: the HSET/HMSET command does not return until hashTypeConvert() finishes.

The source code is as follows:

void hashTypeConvertZiplist(robj *o, int enc) {
    serverAssert(o->encoding == OBJ_ENCODING_ZIPLIST);
 
    if (enc == OBJ_ENCODING_ZIPLIST) {
        /* Nothing to do... */
 
    } else if (enc == OBJ_ENCODING_HT) {
        hashTypeIterator *hi;
        dict *dict;
        int ret;
 
        hi = hashTypeInitIterator(o);
        dict = dictCreate(&hashDictType, NULL);
 
        while (hashTypeNext(hi) != C_ERR) {
            robj *field, *value;
 
            field = hashTypeCurrentObject(hi, OBJ_HASH_KEY);
            field = tryObjectEncoding(field);
            value = hashTypeCurrentObject(hi, OBJ_HASH_VALUE);
            value = tryObjectEncoding(value);
            ret = dictAdd(dict, field, value);
            if (ret != DICT_OK) {
                serverLogHexDump(LL_WARNING,"ziplist with dup elements dump",
                    o->ptr,ziplistBlobLen(o->ptr));
                serverAssert(ret == DICT_OK);
            }
        }
 
        hashTypeReleaseIterator(hi);
        zfree(o->ptr);
 
        o->encoding = OBJ_ENCODING_HT;
        o->ptr = dict;
 
    } else {
        serverPanic("Unknown hash encoding");
    }
}

Comparison of time complexity of two encodings

With the background covered, it is time for the real question: why did the hot keys in city A hurt the business? First, compare the time complexity of hash commands under the ziplist and hashtable encodings:

| Command | ziplist encoding | hashtable encoding |
| --- | --- | --- |
| HSET | ziplistPush appends the key to the tail of the ziplist, then a second ziplistPush appends the value. Time complexity: O(1) | dictAdd adds the new node to the dictionary. Time complexity: O(1) |
| HGET | ziplistFind locates the node for the given key, ziplistNext steps to the adjacent value node, which is returned. Time complexity: O(N) | dictFind looks up the given key, then dictGetVal returns its value. Time complexity: O(1) |
| HEXISTS | ziplistFind searches for the node for the given key; found means the pair exists, not found means it does not. Time complexity: O(N) | dictFind searches for the given key; found means the pair exists, not found means it does not. Time complexity: O(1) |
| HDEL | ziplistFind locates the key node, which is deleted together with the adjacent value node. Time complexity: O(N) | dictDelete removes the pair for the given key. Time complexity: O(1) |
| HLEN | ziplistLen returns the total node count; dividing it by 2 gives the number of pairs. Time complexity: O(1) | dictSize returns the number of pairs in the dictionary. Time complexity: O(1) |
| HGETALL | Traverse the whole ziplist, using ziplistGet to return every key and value node. Time complexity: O(N) | Traverse the whole dictionary, using dictGetKey and dictGetVal to return each key and value. Time complexity: O(N) |

Note: operations on the dict structure can be treated as O(1). Even though hash collisions put multiple keys on the same bucket's linked list, so the corresponding operations must traverse that list, the chains are generally short relative to the whole hash table, and O(1) is a fair approximation.

From the comparison above it is easy to see that HEXISTS is O(1) under hashtable encoding but O(N) under ziplist encoding. When the problem occurred, city A's hashes held about 400 elements each (ziplist encoding), while city B's hashes exceeded 512 elements (hashtable encoding), which explains why access to city A's keys performed worse.

Testing and verification: putting theory into practice

Test plan

Theory aside, we need an actual test to prove the inference above (only a performance test of the HEXISTS command is described here).

The test method is as follows:

  • Create two hash objects in the same Redis instance, mykey001 and mykey002, each with 500 fields and each value 60 bytes;

  • When stress-testing mykey001, set hash-max-ziplist-entries = 512, so the encoding is ziplist; when stress-testing mykey002, set hash-max-ziplist-entries = 256, so the encoding is hashtable;

  • Apply the same load in each round: 20,000 QPS of HEXISTS commands.

Test Data:

  • QPS is the same for both rounds of testing


  • P99 time-consuming comparison


  • cpu utilization comparison


  • Comparison of memory occupied by hash objects

| Encoding | Memory for 500 elements |
| --- | --- |
| ziplist | 4.3 MB |
| hashtable | 19.6 MB |

Test conclusion

  • With hashtable encoding, P99 latency of HEXISTS drops by 14% (70us -> 60us)

  • With hashtable encoding, Redis CPU utilization during HEXISTS load drops by 72% (from 50% to 14%)

  • hashtable encoding uses 4.6 times the memory of ziplist

Additional notes:

  • Each hash object stores 500 fields; each field is 8 bytes long and each value is an integer timestamp.

  • Memory usage was measured via the used_memory_rss metric in INFO. This metric is collected from /proc/$pid/stat and should be the most accurate memory figure. A single object can be inspected with DEBUG OBJECT, but that is not very accurate.

A small side note: in an earlier test I used DEBUG OBJECT to get a key's memory usage and overlooked a problem. The command reuses the rdbSaveObject function, and when that function computes the serialized length of a Redis object, values are compressed if rdbcompression is set to yes and the length exceeds 20 bytes, so the serializedlength reported by DEBUG OBJECT is distorted. In version 3.2.8, rdbSaveObject compresses ziplists but not hashtables, so the RDB output under ziplist encoding is much smaller; in versions after 3.2.8 hashtables are compressed too, and the size difference shrinks. The test below covers version 3.2.8 only.

# Use ziplist encoding
redis-cli -p 5555 config set hash-max-ziplist-entries 512
OK
redis-cli -p 5555 config set rdbcompression yes
OK
redis-cli -p 5555 debug object mykey002
Value at:0x7fdf72868d40 refcount:1 encoding:ziplist serializedlength:3692 lru:5959713 lru_seconds_idle:35
redis-cli -p 5555 config set rdbcompression no
OK
redis-cli -p 5555 debug object mykey002
Value at:0x7fdf72868d40 refcount:1 encoding:ziplist serializedlength:34585 lru:5959713 lru_seconds_idle:35




# Use ht encoding
redis-cli -p 5555 config set hash-max-ziplist-entries 256
OK
redis-cli -p 5555 hset mykey002 mykey00000000000000000000001500 8c1dfc543d72c0826fc0968276932c88af7ed1deebd74d9c95c129fe6371500
(integer) 1
redis-cli -p 5555 config set rdbcompression yes
OK
redis-cli -p 5555 debug object mykey002
Value at:0x7fdf72868d40 refcount:1 encoding:hashtable serializedlength:33651 lru:5959748 lru_seconds_idle:0
redis-cli -p 5555 config set rdbcompression no
OK
redis-cli -p 5555 debug object mykey002
Value at:0x7fdf72868d40 refcount:1 encoding:hashtable serializedlength:33666 lru:5959748 lru_seconds_idle:

Optimization ideas

The tests above show that ziplist has a cost advantage, but some operations that are nominally O(1) degrade to O(N) under it. If you care about performance over cost, or only a few keys need high performance, you can optimize along the following lines:

1. Redis server-side optimization:

The encoding of hash objects can be controlled by adjusting the hash-max-ziplist-entries or hash-max-ziplist-value configuration items. For example, changing hash-max-ziplist-entries from 512 to 256 makes any hash with more than 256 fields convert to hashtable automatically. Note, however, that if most of your hash objects have more than 256 fields, this change can significantly increase memory usage, so watch your budget.

2. Business side optimization methods:

Although adjusting server-side parameters is simple, the right values depend on the actual workload, and element counts change over time, so parameter tuning is not always practical; when there are many keys, it can also waste memory unnecessarily. If only some hash objects are performance-sensitive, consider optimizing individual keys at the business layer.

  • Meet hash-max-ziplist-value condition

For example, store one special field in the hash object whose value exceeds 64 bytes; the key then automatically switches to hashtable encoding. The business code must be able to recognize this special field to avoid bugs.

  • Meet hash-max-ziplist-entries condition

Many business hashes are already split by taking a modulo of a hash into multiple smaller hash objects. You can instead reduce the number of splits so that each individual hash object ends up with more than 512 fields, triggering hashtable encoding.

3. Business usage suggestions:

  • Try to avoid hot key phenomenon

We run the cluster version of Redis. If most traffic concentrates on one key or a few keys, the distributed cluster cannot be used to full advantage: the load piles onto individual Redis instances and becomes a hotspot bottleneck.

  • Pay attention to performance issues when using data structures such as list/hash/set/zset

Weigh performance against cost based on the number of fields in a single key, the value sizes, and the commands you use. To see the actual encoding of a key, use the OBJECT ENCODING command.

Summary

This analysis shows that the different internal encodings in Redis come with different performance and cost trade-offs. When using Redis, it pays to understand your own access patterns and tune accordingly. It is also a reminder that only by persistently digging for root causes can we keep improving both our technical and our business capabilities.
