Redis source code ZipList compression list

The three data types List (before version 3.2), Hash and Sorted Set can all use compressed lists (ziplist) to save data.

The underlying quickList of the new version of Redis also uses zipList support. The Redis version is updated frequently, and this article does not guarantee timeliness.

1. ziplist structure

ziplist is a special doubly linked list. Unlike ordinary linked lists, which use front and rear pointers to associate them together, it is stored in continuous memory and is designed mainly to save memory.

Redis’ official definition of ziplist is (from the file header comment of ziplist.c):

The ziplist is a specially encoded dually linked list that is designed to be very memory efficient. It stores both strings and integer values, where integers are encoded as actual integers instead of a series of characters. It allows push and pop operations on either side of the list in O(1) time.

Translate: ziplist is a specially encoded doubly linked list, and its design goal is to improve storage efficiency. ziplist can be used to store strings or integers, where the integers are encoded as true binary representations rather than as sequences of strings. It can provide push and pop operations at both ends of the table with O(1) time complexity.

1. Code definition

The function definition and implementation code of compressed list are in ziplist.h and ziplist.c respectively.

However, we actually cannot see the structure definition of the compressed list in the ziplist.h file at all. This is because the compressed list itself is a continuous memory space that uses different encodings to save data.

In order to facilitate the understanding of the design and implementation of the compressed list, let’s first take a look at its creation function ziplistNew, as shown below:

unsigned char *ziplistNew(void) {
    //Initial allocated size
    unsigned int bytes = ZIPLIST_HEADER_SIZE + ZIPLIST_END_SIZE;
    unsigned char *zl = zmalloc(bytes);
    …
   //Set the end of the list to ZIP_END
    zl[bytes-1] = ZIP_END;
    return zl;
}

In fact, the logic of the ziplistNew function is very simple, which is to create a continuous memory space with a size of the sum of ZIPLIST_HEADER_SIZE and ZIPLIST_END_SIZE, and then assign the last byte of the continuous space to ZIP_END, indicating the end of the list.

In addition, you should note that the three macros ZIPLIST_HEADER_SIZE, ZIPLIST_END_SIZE and ZIP_END defined in the above code are also defined in ziplist.c, which respectively represent the list header size, list tail size and list tail byte content of ziplist. As follows.

//The list header size of ziplist includes two 32-bit integers and one 16-bit integer, which respectively represent the total number of bytes of the compressed list, the offset of the last element in the list from the list header, and the elements in the list number
//The list header size of ziplist includes two 32-bit integers and one 16-bit integer, which respectively represent the total number of bytes of the compressed list, the offset of the last element in the list from the list header, and the number of elements in the list
#define ZIPLIST_HEADER_SIZE (sizeof(uint32_t)*2 + sizeof(uint16_t))
//The tail size of the ziplist list, including an 8-bit integer, indicating the end of the list.
#define ZIPLIST_END_SIZE (sizeof(uint8_t))
//ziplist list tail byte content
#define ZIP_END 255

Then, after creating a new ziplist, the memory layout of the list will be as shown below. Note that there is no actual data in the list at this time.

Then, when we insert data into the ziplist, the ziplist encodes it differently depending on whether the data is a string or an integer, and their size. This design idea of encoding accordingly according to data size is exactly what Redis adopts to save memory.

2. Storage structure

The ziplist list item includes three parts, namely the length of the previous item (prevlen), the encoding result of the current item’s length information (encoding), and the actual data of the current item (data). The figure below shows the structure of the list items (the contents in the figure except the list items are the beginning and end of the ziplist memory space respectively).

3. Node structure and coding

Node structure:

, but sometimes the value is very small, and the data can be saved using without , that is, .

4. encoding encoding

The encoding field of entry depends on the content of the specific value and is divided into two types: string and number, which are processed separately.

In fact, the so-called encoding technology refers to using different numbers of bytes to represent the saved information. In ziplist, encoding technology is mainly applied to the two metadata of prevlen and encoding in the list item. The actual data of the current item, data, is normally represented by an integer or a string.

2. Disadvantages of ziplist

The layout of a ziplist data structure in memory is a continuous memory space. The beginning of this space is a fixed-size 10-byte metadata, which records the total number of bytes of the ziplist, the offset of the last element, and the number of list elements. The memory space after these 10 bytes is The actual list data is saved. At the last part of the ziplist, there is a 1-byte identifier (fixed at 255), which is used to indicate the end of the ziplist, as shown in the following figure:

Although ziplist saves memory space by saving data in a compact memory layout, ziplist also faces two consequent shortcomings:

High search complexity
Chain update risk

1. High search complexity

Because the size of the ziplist head and tail metadata is fixed, and the position of the last element is recorded in the ziplist header, when searching for the first or last element in the ziplist, it can be found quickly.

But the problem is that when you want to find an element in the middle of the list, ziplist has to traverse from the head or the end of the list. When ziplist saves too many elements, the complexity of finding intermediate data increases. What’s worse is that if the ziplist stores a string, when the ziplist is looking for an element, it needs to judge each character of the element one by one, which further increases the complexity.

Because of this, when we use ziplist to save Hash or Sorted Set data, we will control the data saved in ziplist through the two parameters hash-max-ziplist-entries and zset-max-ziplist-entries in the redis.conf file. the number of elements in .

Not only that, in addition to the high search complexity, if ziplist runs out of memory space when inserting elements, ziplist will also need to reallocate a continuous memory space, which will further cause chain update problems.

2. Cascading update problem

When the length of the entry before an entry changes, it will be necessary to increase the size of the entry’s prevlen field to store the length of the previous entry. If there are multiple consecutive entries with a capacity close to 254, multiple entries will occur. The size of prevlen needs to be expanded, and a so-called cascade update occurs.

Under what circumstances does a cascading update occur:

Cascading updates may occur when inserting elements or deleting elements in ziplist.

3. When to use zipList 1. When to use ziplist when hashing?

Meet the following conditions at the same time:

The string length of all key values stored in the hash object is less than 64 bytes;
The number of key-value pairs saved by the hash object is less than 512;

Why not just use hastable:

Compared with hashtable, the ziplist structure has fewer pointers, which greatly reduces the use of memory, and memory is very precious for redis.

Why not use linklist?

When ziplist is stored, the memory allocation is continuous, and the query is faster. The speed here is only relative to the double-ended queue.

How does ziplist implement hash storage?

Save two nodes of the same key-value pair next to each other, with the node that holds the key in front and the node that holds the value in the back. The newly added key-value pair is placed at the end of the compression list.

2. When does ZSet use zipList?

In redis.conf, there are the following two parameters:

zset-max-ziplist-entries 128
zset-max-ziplist-value 64

Neither of these two conditions are met, use ziplist compression table to implement sorted set

When one of these two conditions is met, the internal implementation of sorted set will be converted from ziplist to zset

3. When does List use zipList

Before version 3.2, Redis list used two data structures as the underlying implementation:

Quicklist is a data type introduced in Redis 3.2. The early list type used ziplist (compressed list) and a doubly linked list. Redis 3.2 instead uses quicklist to store list elements.

References:

https://redis.com/ebook/part-2-core-concepts/01chapter-9-reducing-memory-use/9-1-short-structures/9-1-1-the-ziplist-representation/

The knowledge points of the article match the official knowledge files, and you can further learn relevant knowledge. MySQL entry-level skills treeDatabase compositionTable 69610 people are learning the system