AntDB-M high-performance design of hash index dynamic rehash

AntDB-M supports hash index, btree index and other index types. The hash index is implemented in the form of hash table. A simple hash representation is shown in Figure 1. The element nodes under the hash bucket are one-way or two-way linked lists. One or several fields on the data row form an index. The value of the index field is calculated through the hash function and mapped to a certain hash bucket. The elements under the hash bucket The node stores the row number of the data row.

Figure 1: Schematic diagram of hash table principle

When using select * from table where a = value; to query, first calculate the hash value based on value, calculate which hash bucket it is in, then traverse the elements under the hash bucket, and retrieve column a of each row for storage based on the stored row number. The value is compared with value. If they are completely equal, it is the row you are looking for.

Taking the nodes under the hash bucket as a doubly linked list as an example, the element node structure under the bucket is generally as follows

struct node</code><code>{<!-- --></code><code> uint8 oid; // Data row number or other meaning of value</code><code> node * node_prev; // Previous node</code><code> node * node_next; // Next node</code><code>}

sizeof(node) total 8 + 8 + 8 =24 bytes;

For AntDB-M, memory is a very precious resource. Under the premise of implementing hash index and ensuring performance, the memory footprint must be as small as possible. For a certain table in the database, assuming that the table has a total of m rows and n hash indexes on the table, then the table has n sets of hash structures, and each set of hash structures has m bucket nodes. Take the above doubly linked list as an example. , the memory occupied by the hash index of this table is n* (the memory occupied by the hash index head node + m*24 bytes), and 24 bytes are sometimes even larger than the row of data itself in the data table.

AntDB-M hash index data structure optimization

In order to reduce the memory usage of index data, AntDB-M uses array elements to simulate linked list nodes and no longer allocates additional space to store the values of linked list nodes. Allocate all nodes at once to avoid frequent memory allocation and release.

struct array_node</code><code>{<!-- --></code><code>uint8 prev_oid; // The position of the previous node</code><code>uint8 next_oid; // Next The position of a node</code><code>}

sizeof(array_node) has a total of 8 + 8 =16 bytes. If oid is 0, it means that the previous and next nodes are empty, that is, there are no other nodes in front or behind this node. For a data table with n rows of data, apply to allocate array space array_node[n]. For an element array_node[k] in the array, k has two meanings:

Subscript in the array, used to access array_node[k].prev_oid and array_node[k].next_oid;
array_node[k] points to the k-th row in this table in the database, and you can access the content of the k-th row in this table. This avoids storing uint8 oid (data row number) and saves 1/3 of the memory space; if the node under the hash bucket is a one-way linked list, 2/3 of the memory is saved.

The following is an example:

Assuming that m rows are pre-allocated when a data table is initially created, the number of buckets in the corresponding hash structure is n (n is the smallest prime number greater than m when implemented), and n bucket node arrays corresponding to the hash index are pre-allocated. Element, i.e. uint8 bucket_head[n], bucket_head records which row the head node of each bucket points to (also represents which element points to the array array_node); allocates a continuous array array_node[m], corresponding to the node of the two-way link below the bucket element.

Assume that for a certain hash index, row 3, row 29, and row 36815 of the data table are all mapped to bucket 2, then the head node of bucket 2 points to the third row of data on the table, and also points to the third row of array_node elements.

bucket_head[2]=3;</code><code>array_node [3] ->prev_oid=0; </code><code>array_node [3] -> next_oid=29;</code><code>array_node [29] ->prev_oid=3; </code><code>array_node [29] ->next_oid=36815; </code><code>array_node [36815] ->prev_oid=29; </code><code>array_node [36815] ->next_oid=0;

In this way, the front and back traversal is also achieved (next_oid=0 means that this is the last valid element on the linked list). Compared with the previous method, the linked list is simulated through an array, the memory usage is reduced, and the memory of the array is allocated at once. , the memory is continuous and the access speed is fast.

As the business runs, the size of the data table may continue to expand. For example, a table is pre-allocated with 10 million rows when it is first created, and expands to 50 million rows after running for a period of time. If the number of buckets in the hash structure is still about 10 million, Then there are an average of 5 elements under each bucket, hash conflicts increase, and search efficiency decreases. At this time, we need to rehash the data, dynamically adjust the hash structure, and reduce hash conflicts without blocking additions, deletions, modifications, and searches of the hash table.

AntDB-M dynamic rehash principle

As shown in Figure 2, there is a bucket lock under each hash bucket. When reading, inserting, or deleting elements under the bucket, the bucket needs to be locked. migrate_node indicates the bucket currently being migrated. When migrating a specific bucket, a bucket lock is first added to the bucket. After all elements under the bucket are migrated, the bucket lock is released, and then migrate_node points to the next bucket.

1. Whenever the data rows of the table increase by a certain number, a new hash_table structure is created. At this time, the new and old sets of hash_table coexist;

2. Traverse the old hash_table, start from the first bucket, add bucket lock, traverse each element under the bucket, calculate its position on the new hash_table, and migrate and insert it into the new hash_table structure;

3. After the element migration of all buckets is completed, the old hash_table structure is released, and data addition, deletion, modification and query are completely switched to the new hash_table;

Figure 2: AntDB-M dynamic rehash diagram

Why the process of dynamic rehash does not block the addition, deletion, modification and query of the hash table? This article uses search and insertion as examples and explains it as follows in the form of a flow chart. The process of updating (decomposed into search + deletion + insertion) and deletion is also similar. Regardless of search, insertion, update, or deletion, you must first find which bucket the data is located in the old hash structure.

Figure 3: find during dynamic rehash

Figure 4: insert during dynamic rehash

Performance Advantages

The hash structure is dynamically expanded when the table is expanded, without blocking AntDB-M services and applications, and is transparent to users. AntDB-M internally reduces hash conflicts by increasing the number of hash buckets and migrating elements under buckets, so that hash index performance is not reduced due to the increase in data rows and data can be quickly located.

To sum up, the cleverly designed data structure of the hash index and the parallel algorithm of dynamic rehash enable AntDB-M’s hash index to have sustained high performance to meet the performance requirements of complex business applications.

About AntDB database

AntDB database was started in 2008. It is a distributed relational database brand independently developed by AsiaInfo Technology. AntDB-M is a high-performance memory database and is one of the sub-products of AntDB. On the core system of operators, it serves 24 countries nationwide. Online services are provided to more than 1 billion users in all provinces, with product features such as high performance, elastic expansion, and high reliability. The peak value can process one million core communication transactions per second, ensuring the continuous and stable operation of the system for nearly 15 years, and in communications, finance, etc. , transportation, energy, Internet of Things and other industries have been successfully commercialized.

The knowledge points of the article match the official knowledge files, and you can further learn related knowledge. Algorithm skill tree Home page Overview 57493 people are learning the system