Redis data structures – dynamic string (SDS), IntSet, Dict, ZipList

1. Redis data structure – dynamic string

We all know that keys stored in Redis are strings, and values are often strings or collections of strings. Clearly, the string is the most commonly used data structure in Redis.

However, Redis does not directly use C language strings, because C strings have several problems:

  • Obtaining the length of the string requires an O(N) traversal.
  • Not binary safe: a '\0' byte terminates the string, so arbitrary binary data cannot be stored.
  • Hard to modify: strings cannot grow in place.

Redis builds a new string structure called Simple Dynamic String (SDS for short).
For example, we execute the command: set name KEKE

Then Redis will create two SDSs under the hood: one containing “name” and the other containing “KEKE”.

Redis is implemented in C language, in which SDS is a structure, the source code is as follows:
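The exact layout varies with string length (Redis defines several header sizes); a sketch of the 8-bit variant, following the sdshdr8 struct in the Redis source (sds.h):

```c
#include <stdint.h>

/* Sketch of the SDS header used for strings shorter than 256 bytes,
 * following the sdshdr8 variant in the Redis source (sds.h).
 * The struct is packed, so the header costs exactly 3 bytes. */
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len;         /* used length of buf, excluding the terminator */
    uint8_t alloc;       /* allocated length, excluding header and terminator */
    unsigned char flags; /* the 3 low bits record the header type */
    char buf[];          /* the character data, still NUL-terminated */
};
```

Because `len` is stored in the header, the length is read in O(1) without a `strlen`-style traversal, and embedded '\0' bytes in `buf` are harmless, which is what makes SDS binary safe.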

For example, an SDS containing the string “name” has len = 4, alloc = 4, flags recording the header type, and buf holding the characters “name” followed by a terminating '\0'.

SDS is called a dynamic string because it can expand dynamically. For example, take an SDS whose content is “hi”:

If we want to append the string “,Amy” to this SDS, a new memory space is requested first:

If the new string length is less than 1 MB, the new space is twice the post-extension string length + 1;

If the new string length is 1 MB or more, the new space is the post-extension string length + 1 MB + 1. This is called memory pre-allocation (the extra 1 byte holds the null terminator).
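The growth rule above can be sketched as a standalone function. This mirrors the doubling / +1 MB logic of sdsMakeRoomFor in sds.c, simplified to ignore header overhead; the trailing +1 stands for the null terminator:

```c
#include <stddef.h>

#define SDS_MAX_PREALLOC (1024*1024)  /* 1 MB, as defined in sds.h */

/* Given the total length the string must grow to, return how many
 * bytes of buffer get reserved (sketch of the pre-allocation rule). */
size_t sds_grow(size_t newlen) {
    if (newlen < SDS_MAX_PREALLOC)
        return newlen * 2 + 1;            /* under 1 MB: double it */
    return newlen + SDS_MAX_PREALLOC + 1; /* 1 MB or more: add 1 MB */
}
```

Appending “,Amy” to “hi” gives a new length of 6, so 13 bytes are reserved; the slack means the next few appends need no reallocation.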

Benefits:

  • The time complexity of getting the string length is O(1).
  • Dynamic expansion is supported.
  • The number of memory allocations is reduced.
  • Binary safety.

Redis data structure – IntSet

IntSet is an implementation of the set collection in Redis, which is implemented based on an integer array and has the characteristics of variable length and order.
The structure is as follows:

The encoding field has three modes, indicating different sizes for the stored integers:

  • INTSET_ENC_INT16: each integer occupies 2 bytes (int16_t)
  • INTSET_ENC_INT32: each integer occupies 4 bytes (int32_t)
  • INTSET_ENC_INT64: each integer occupies 8 bytes (int64_t)

Suppose an intset stores three numbers, each within the range of int16_t; the encoding adopted is then INTSET_ENC_INT16, and the bytes occupied by each part are:
encoding: 4 bytes
length: 4 bytes
contents: 2 bytes * 3 = 6 bytes

Now we add the number 50000. This number is beyond the range of int16_t (-32768 to 32767), so intset automatically upgrades its encoding to a suitable size.
In the current case, the process is as follows:

  • Upgrade the encoding to INTSET_ENC_INT32, where each integer occupies 4 bytes, and expand the array according to the new encoding and the number of elements.
  • Copy the existing elements, in reverse order, to their correct positions in the expanded array.
  • Place the element to be added at the end of the array.
  • Finally, set the encoding property of the intset to INTSET_ENC_INT32 and the length property to 4.

The source code is as follows:

intset *intsetAdd(intset *is, int64_t value, uint8_t *success) {
    uint8_t valenc = _intsetValueEncoding(value); // encoding required by the new value
    uint32_t pos; // position to insert at
    if (success) *success = 1;
    // If the required encoding exceeds the current intset encoding, upgrade
    if (valenc > intrev32ifbe(is->encoding)) {
        return intsetUpgradeAndAdd(is, value);
    } else {
        // Look for an element equal to value; pos receives its index
        if (intsetSearch(is, value, &pos)) {
            // Already present: nothing to insert, report failure and return
            if (success) *success = 0;
            return is;
        }
        // Grow the array by one slot
        is = intsetResize(is, intrev32ifbe(is->length) + 1);
        // Shift the elements after pos to pos + 1 to make room for the new element
        if (pos < intrev32ifbe(is->length)) intsetMoveTail(is, pos, pos + 1);
    }
    // Insert the new element
    _intsetSet(is, pos, value);
    // Update the length
    is->length = intrev32ifbe(intrev32ifbe(is->length) + 1);
    return is;
}

static intset *intsetUpgradeAndAdd(intset *is, int64_t value) {
    // Current encoding
    uint8_t curenc = intrev32ifbe(is->encoding);
    // New (larger) encoding required by value
    uint8_t newenc = _intsetValueEncoding(value);
    // Current number of elements
    int length = intrev32ifbe(is->length);
    // A value that forces an upgrade is out of the old range, so it goes
    // either at the head (negative) or at the tail (positive)
    int prepend = value < 0 ? 1 : 0;
    // Switch to the new encoding
    is->encoding = intrev32ifbe(newenc);
    // Resize the array for the new encoding and the extra element
    is = intsetResize(is, intrev32ifbe(is->length) + 1);
    // Traverse in reverse order, moving elements one by one to their new
    // positions; _intsetGetEncoded reads old elements with the old encoding,
    // _intsetSet writes them back with the new encoding
    while (length--)
        _intsetSet(is, length + prepend, _intsetGetEncoded(is, length, curenc));
    // Insert the new element; prepend decides head or tail
    if (prepend)
        _intsetSet(is, 0, value);
    else
        _intsetSet(is, intrev32ifbe(is->length), value);
    // Update the length
    is->length = intrev32ifbe(intrev32ifbe(is->length) + 1);
    return is;
}

Summary:

Intset can be regarded as a special integer array with some characteristics:

  • Redis will ensure that the elements in Intset are unique and ordered.
  • With a type upgrade mechanism, it can save memory space.
  • The bottom layer uses a binary search method to query.

2. Redis data structure – Dict

We know that Redis is a key-value (Key-Value Pair) database, and we can quickly add, delete, modify and query according to the key. The mapping relationship between key and value is realized through Dict.
Dict consists of three parts, namely: hash table (DictHashTable), hash node (DictEntry), dictionary (Dict).

typedef struct dictht {
    // entry array; the array stores pointers to entries
    dictEntry **table;
    // hash table size
    unsigned long size;
    // mask of the hash table size, always equal to size - 1
    unsigned long sizemask;
    // number of entries
    unsigned long used;
} dictht;


typedef struct dictEntry {
    void *key; // key
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v; // value
    // pointer to the next entry (chaining resolves hash collisions)
    struct dictEntry *next;
} dictEntry;

When we add a key-value pair to a Dict, Redis first computes the hash value h from the key, then uses h & sizemask to decide at which index of the array the element is stored. Suppose we store k1 = v1 in a table of size 4 (sizemask = 3) and k1 hashes to h = 1; then 1 & 3 = 1, so k1 = v1 is stored at position 1 of the array.
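Because size is always a power of two, sizemask = size - 1 turns the modulo into a single bitwise AND. A minimal sketch (Redis itself hashes keys with SipHash; here the hash value is taken as a given integer):

```c
#include <stdint.h>

/* With a power-of-two table size, h % size == h & (size - 1). */
uint64_t dict_index(uint64_t h, unsigned long size) {
    unsigned long sizemask = size - 1; /* e.g. size 4 -> mask 0b11 */
    return h & sizemask;
}
```

With size = 4 the mask is 3, so a key hashing to 1 lands in slot 1, exactly as in the k1 = v1 example above.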


typedef struct dict {
    dictType *type; // dict type, providing different hash functions
    void *privdata; // private data, used by special hash operations
    dictht ht[2]; // two hash tables: ht[0] holds the current data, ht[1] is normally empty and used during rehash
    long rehashidx; // rehash progress; -1 means no rehash in progress
    int16_t pauserehash; // whether rehash is paused; 1 means paused, 0 means continue
} dict;


Dict expansion

The HashTable inside Dict is implemented as an array combined with singly linked lists. When the collection holds many elements, hash collisions become frequent, and if the linked lists grow too long, query efficiency drops sharply.
Dict checks the load factor (LoadFactor = used/size) every time a new key-value pair is added, and expansion of the hash table is triggered when either of the following conditions holds:

  • The LoadFactor of the hash table is >= 1, and the server is not running a background child process such as BGSAVE or BGREWRITEAOF;
  • The LoadFactor of the hash table is > 5.

static int _dictExpandIfNeeded(dict *d) {
    // If a rehash is already in progress, return OK
    if (dictIsRehashing(d)) return DICT_OK;
    // If the hash table is empty, initialize it to the default size: 4
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);
    // When the load factor (used/size) reaches 1 and no child process such as
    // bgsave/bgrewriteaof is running, or the load factor exceeds 5, expand
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used / d->ht[0].size > dict_force_resize_ratio)) {
        // The requested size is used + 1; the lower layer rounds it up to
        // the first 2^n greater than or equal to used + 1
        return dictExpand(d, d->ht[0].used + 1);
    }
    return DICT_OK;
}

// t_hash.c # hashTypeDelete()
// ...
if (dictDelete((dict*)o->ptr, field) == C_OK) {
    deleted = 1;
    // After a successful delete, check whether the Dict needs to shrink,
    // and call dictResize if so
    /* Always check if the dictionary needs a resize after a delete. */
    if (htNeedsResize(o->ptr)) dictResize(o->ptr);
}
// ...

// server.c
int htNeedsResize(dict *dict) {
    long long size, used;
    // hash table size
    size = dictSlots(dict);
    // number of entries
    used = dictSize(dict);
    // shrink when size > 4 (the initial size) and the load factor is below 0.1
    return (size > DICT_HT_INITIAL_SIZE && (used * 100 / size < HASHTABLE_MIN_FILL));
}
        
int dictResize(dict *d) {
    unsigned long minimal;
    // Return an error if bgsave/bgrewriteaof or a rehash is in progress
    if (!dict_can_resize || dictIsRehashing(d))
        return DICT_ERR;
    // Get used, i.e. the number of entries
    minimal = d->ht[0].used;
    // If used is less than 4, reset it to 4
    if (minimal < DICT_HT_INITIAL_SIZE)
        minimal = DICT_HT_INITIAL_SIZE;
    // Resize to minimal, in fact the first 2^n greater than or equal to minimal
    return dictExpand(d, minimal);
}

Dict rehash

Whether it is expansion or contraction, a new hash table must be created, resulting in changes in the size and sizemask of the hash table, and the query of the key is related to the sizemask. Therefore, the index must be recalculated for each key in the hash table and inserted into a new hash table. This process is called rehash. The process is like this:

  • Calculate the new hash table's size (realSize); the value depends on whether this is an expansion or a contraction:

    • For expansion, the new size is the first 2^n greater than or equal to dict.ht[0].used + 1.
    • For contraction, the new size is the first 2^n greater than or equal to dict.ht[0].used (but not less than 4).
  • Allocate memory according to the new size, create a dictht, and assign it to dict.ht[1].

  • Set dict.rehashidx = 0 to start rehash.

  • Rehash each dictEntry in dict.ht[0] to dict.ht[1].

  • Assign dict.ht[1] to dict.ht[0], initialize dict.ht[1] as an empty hash table, and release the memory of the original dict.ht[0].

  • Assign rehashidx to -1, which means rehash ends.

  • During the rehash, new entries are written directly to ht[1], while queries, modifications and deletions are attempted on dict.ht[0] and dict.ht[1] in turn. This guarantees that the data in ht[0] only shrinks and eventually becomes empty as the rehash proceeds.

The whole process is therefore incremental: entries migrate from ht[0] to ht[1] a little at a time, rather than in one blocking step.
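The steps above can be modeled with a minimal, self-contained sketch of one progressive-rehash step, in the spirit of dictRehash(d, 1) in dict.c. The integer keys and identity "hash" are stand-ins for illustration; real entries carry key/value pointers and are hashed with SipHash:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct entry { uint64_t key; struct entry *next; } entry;
typedef struct table { entry **slots; unsigned long size, used; } table;
typedef struct mini_dict { table ht[2]; long rehashidx; } mini_dict;

/* Move one non-empty bucket from ht[0] to ht[1]; when ht[0] is drained,
 * swap ht[1] into ht[0] and mark the rehash finished. */
void rehash_step(mini_dict *d) {
    if (d->rehashidx == -1) return;                 /* no rehash in progress */
    /* Skip empty buckets in ht[0] */
    while ((unsigned long)d->rehashidx < d->ht[0].size &&
           d->ht[0].slots[d->rehashidx] == NULL)
        d->rehashidx++;
    /* All buckets moved: swap ht[1] into ht[0] and finish */
    if ((unsigned long)d->rehashidx >= d->ht[0].size) {
        free(d->ht[0].slots);
        d->ht[0] = d->ht[1];
        memset(&d->ht[1], 0, sizeof(table));
        d->rehashidx = -1;
        return;
    }
    /* Move this bucket's whole chain into ht[1] */
    entry *e = d->ht[0].slots[d->rehashidx];
    while (e) {
        entry *next = e->next;
        uint64_t idx = e->key & (d->ht[1].size - 1); /* identity hash & mask */
        e->next = d->ht[1].slots[idx];
        d->ht[1].slots[idx] = e;
        d->ht[0].used--;
        d->ht[1].used++;
        e = next;
    }
    d->ht[0].slots[d->rehashidx++] = NULL;
}
```

Calling rehash_step once per dict access is exactly what makes the rehash "progressive": no single operation pays the full migration cost.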

Summary:

Dict structure:

  • Similar to Java's HashTable: the underlying structure is an array plus linked lists to resolve hash collisions.
  • Dict contains two hash tables, ht[0] is usually used, and ht[1] is used for rehash.

Stretching of Dict:

  • When LoadFactor is greater than 5 or LoadFactor is greater than 1 and there are no child process tasks, Dict will be expanded.
  • Dict shrinks when LoadFactor is less than 0.1.
  • The expansion size is the first 2^n greater than or equal to used + 1.
  • The shrink size is the first 2^n greater than or equal to used (but not less than 4).
  • Dict uses progressive rehash, and rehash is performed every time Dict is accessed.
  • During rehash, ht[0] only decreases but does not increase, new operations are only performed on ht[1], and other operations are performed on the two hash tables.

3. Redis data structure – ZipList

A ZipList is a special kind of “doubly linked list” consisting of a series of specially encoded contiguous blocks of memory. A push/pop operation can be done on either end, and the time complexity of this operation is O(1).

Attribute | Type | Length | Purpose
zlbytes | uint32_t | 4 bytes | Records the number of memory bytes occupied by the entire compressed list.
zltail | uint32_t | 4 bytes | Records the offset in bytes from the start of the list to its tail node; through this offset the tail node's address can be located directly.
zllen | uint16_t | 2 bytes | Records the number of nodes in the list. Counts up to 65534 are stored exactly; beyond that, zllen is set to UINT16_MAX (65535) and the real count must be obtained by traversing the whole list.
entry | list node | variable | The individual nodes of the list; each node's length is determined by the content it stores.
zlend | uint8_t | 1 byte | The special value 0xFF (255 decimal), marking the end of the compressed list.

ZipListEntry

The Entry in a ZipList does not record pointers to the previous and next nodes as an ordinary linked list does, because two pointers would occupy 16 bytes and waste memory. Instead, each entry uses the following structure:

  • previous_entry_length: The length of the previous node, 1 or 5 bytes.

    • If the length of the previous node is less than 254 bytes, 1 byte stores the length value.
    • If the length of the previous node is 254 bytes or more, 5 bytes store it: the first byte is the marker 0xFE, and the following four bytes hold the real length.
  • encoding: encoding attribute, record the data type of content (string or integer) and length, occupying 1, 2 or 5 bytes.

  • contents: responsible for saving the data of the node, which can be a string or an integer.

All length values stored in a ZipList use little-endian byte order: the low-order byte comes first and the high-order byte follows. For example, the value 0x1234 is actually stored as the byte sequence 0x34 0x12.
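A quick way to observe this in code (the check assumes a little-endian host, which is what virtually all platforms running Redis are):

```c
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value and return its first byte in memory: on a
 * little-endian host this is the low-order byte (0x34 for 0x1234). */
uint8_t first_byte_in_memory(uint16_t v) {
    uint8_t bytes[2];
    memcpy(bytes, &v, sizeof(v)); /* copy the raw representation */
    return bytes[0];
}
```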

Encoding

The encoding in ZipListEntry is divided into two types: string and integer:
String: If the encoding starts with “00”, “01” or “10”, it proves that the content is a string.

encoding                                          encoding length   string size
|00pppppp|                                        1 byte            <= 63 bytes
|01pppppp|qqqqqqqq|                               2 bytes           <= 16383 bytes
|10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt|    5 bytes           <= 4294967295 bytes

For example, to save the string “ab”: its length is 2 (at most 63 bytes), so the encoding is the single byte 00000010 (0x02), followed by the two characters; “bc” is stored the same way.


  • Integer: If the encoding starts with “11”, the content is an integer, and the encoding occupies only 1 byte.

encoding    encoding length   integer type
11000000    1                 int16_t (2 bytes)
11010000    1                 int32_t (4 bytes)
11100000    1                 int64_t (8 bytes)
11110000    1                 24-bit signed integer (3 bytes)
11111110    1                 8-bit signed integer (1 byte)
1111xxxx    1                 the value is stored directly in the xxxx bits; xxxx ranges from 0001 to 1101, and the actual value is xxxx minus 1
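The last row can be checked with a tiny helper. This is a sketch of the rule only, not the actual ziplist.c code; the function name is made up for illustration:

```c
#include <stdint.h>

/* Encode a small integer 0..12 into the 1111xxxx immediate form,
 * where xxxx = value + 1 (so xxxx ranges from 0001 to 1101). */
uint8_t zip_imm_encode(int v) {
    return (uint8_t)(0xF0 | (v + 1)); /* 0 -> 0xF1, 12 -> 0xFD */
}
```

Note that 0xF0 (11110000) and 0xFE (11111110) are excluded from the immediate range, which is why xxxx stops at 1101 and the stored value is offset by 1.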

Redis data structure – chain update problem of ZipList

Each Entry of ZipList contains previous_entry_length to record the size of the previous node, and the length is 1 or 5 bytes:

  • If the length of the previous node is less than 254 bytes, then use 1 byte to save the length value.
  • If the length of the previous node is greater than or equal to 254 bytes, then use 5 bytes to save the length value, the first byte is 0xfe, and the last four bytes are the real length data.

Now, suppose we have N consecutive entries, each between 250 and 253 bytes long, so every entry's previous_entry_length attribute fits in 1 byte:


If we now insert a new node of 254 bytes or more at the head, the first old entry's previous_entry_length must grow from 1 byte to 5 bytes. That pushes its own length past 254 bytes, which in turn forces the next entry's previous_entry_length to grow, and so on down the list. These consecutive space-expansion operations are called a cascade (chain) update; both insertions and deletions can trigger one.

Summary:

ZipList Features:

  • The compressed list can be regarded as a “doubly linked list” of continuous memory space.
  • The nodes of the list are not connected by pointers, but are addressed by recording the length of the previous node and the current node, and the memory usage is low.
  • If there is too much data in the list, the linked list will be too long, which may affect the query performance.
  • Cascade (chain) update problems may occur when nodes are inserted or deleted.