SkipList_DB source code analysis

SkipList_DB

Table of Contents

  • 1. Source code analysis
  • 2. Node node class
    • member attribute
    • member function
  • 3. SkipList jump list class
    • member attribute
    • member method
      • 1. Constructor and destructor
      • 2. get_random_value
      • 3. create_node
      • 4. Insert data insert_element
      • 5. Display data display_list
      • 6. Find data search_element
      • 7. Delete data delete_element
      • 8. Data dump dump_file
      • 9. Load data load_file
      • 10. Return data size size
      • 11. Data preprocessing operations

1. Source code analysis

core code file

SkipList.h // Implementation file of node class and skip list class

core function

insert_element // insert element
delete_element // delete element
search_element // find element

2. Node node class

Member attributes

// private member
K key // key
V value // value
// public members
Node<K, V> **forward; // point to the next node in a different layer in the linear table
int node_level; // which node level the current node is in

Member function

No-argument constructor and destructor

// no parameter construction
Node() {}
// destructor
~Node();

template<typename K, typename V>
Node<K, V>::~Node() {
    // Do some memory release operations
    delete []forward;
};

Constructor with parameters

Node(K k, V v, int);

template<typename K, typename V>
Node<K, V>::Node(const K k, const V v, int level) {
    this->key = k;
    this->value = v;
    this->node_level = level;

    // level + 1, the index is from 0 - level
    this->forward = new Node<K, V>*[level + 1];
    
  // Fill forward array with 0(NULL)
    memset(this->forward, 0, sizeof(Node<K, V>*)*(level + 1));
};

Question 1: Why is there a difference in the parameter list between function declaration and implementation?

When function is declared, the formal parameter may not have a parameter name, but when function is defined, the formal parameter must have a parameter name.

Available in both C and C++, but both are not recommended to omit

Question 2: Should the initialization operation of forward here be clearer?

// Create a memory space of level + 1 size, in which the address of Node<K,V> type is to be placed
this->forward = new Node<K,V>*[level + 1];
// The memset function initializes this memory
// The sizeof function first obtains the size of a Node<K,V>* type memory block, and then multiplies it by level + 1
memset(this->forward, 0, sizeof(Node<K, V>*)*(level + 1));

Get method of key and value

K get_key() const;
 
template<typename K, typename V>
K Node<K, V>::get_key() const {
    return key;
};

V get_value() const;
template<typename K, typename V>
V Node<K, V>::get_value() const {
    return value;
};

value set method

void set_value(V);

template<typename K, typename V>
void Node<K, V>::set_value(V value) {
    this->value=value;
};

3. SkipList jump list class

Member attributes

// all are private attributes
// The maximum number of layers of the skip table
int _max_level;
// The current level of the skip table
int _skip_list_level;
// The head node pointer of the jump list
Node<K, V> *_header;
// file operations
std::ofstream_file_writer;
std::ifstream_file_reader;
// Skip the current number of elements in the list
int _element_count;

Question 5: File operations, ofstream and ifstream

Member method

1. Constructor and destructor

Constructor

SkipList(int);

// Constructor
template<typename K, typename V>
SkipList<K, V>::SkipList(int max_level) {

    this->_max_level = max_level;
    this->_skip_list_level = 0;
    this->_element_count = 0;

    // create header node and initialize key and value to null
    K k;
    V v;
    this->_header = new Node<K, V>(k, v, _max_level);
};

Initialize the jump table, set information such as the maximum height and the number of current nodes, create a head node, the key value of the head node is empty, but the level is set to the maximum level (here because the virtual head node should exist in Each layer).

Destructor

~SkipList();

template<typename K, typename V>
SkipList<K, V>::~SkipList() {

    if (_file_writer. is_open()) {
        _file_writer. close();
    }
    if (_file_reader. is_open()) {
        _file_reader. close();
    }
    delete_header;
}

Perform some resource recovery operations, close file descriptors, and delete virtual head nodes.

2. get_random_value

Function implementation

template<typename K, typename V>
int SkipList<K, V>::get_random_level(){

    int k = 1;
    while (rand() % 2) {
        k++;
    }
    k = (k < _max_level) ? k : _max_level;
    return k;
};

The rand() function in C++ will return an arbitrary integer from 0 to the maximum random number.

In this way, the probability that k ++ can be executed in each cycle is 1/2, then this function can be realized

  • The probability of returning 1 is 1/2
  • The probability of returning 2 is 1/4
  • The probability of returning 3 is 1/8, and so on

3. create_node

Create a new node according to the incoming k,v,level

// create new node
template<typename K, typename V>
Node<K, V>* SkipList<K, V>::create_node(const K k, const V v, int level) {
    Node<K, V> *n = new Node<K, V>(k, v, level);
    return n;
}

4. Insert data insert_element

Return value

  • 1 indicates that the current node already exists
  • 0 indicates successful insertion

Process

  1. Initialize the update array, which is used to store the front node of the node to be inserted
  2. Starting from the highest level of the jump table, find the preceding node corresponding to the node to be inserted in each layer, and store it in the corresponding update[i]
  3. Returns 1 if there is a node equal to the value currently to be inserted
  4. Call get_random_level to get the level of the current node
  5. If random_level > current level, it means that the number of layers needs to be increased, and then the new node should be placed after the _header in the newly added number of layers, and the number of layers in the current skip table should be updated
  6. Create a new node and insert the current node after update[i] in each layer

template<typename K, typename V>
int SkipList<K, V>::insert_element(const K key, const V value) {
    
    mtx.lock(); // Mutex locks are required for operations on shared resources
    Node<K, V> *current = this->_header;

    // Create an update array and initialize it,
    // The update array represents the nodes that need to be updated in each layer during this insertion process
    Node<K, V> *update[_max_level + 1];
    memset(update, 0, sizeof(Node<K, V>*)*(_max_level + 1));

    // Operate from the highest level in the jump list
    for(int i = _skip_list_level; i >= 0; i--) {
        while(current->forward[i] != NULL & amp; & amp; current->forward[i]->get_key() < key) {
            current = current->forward[i];
        }
        update[i] = current;
    }

    // reached level 0 and forward pointer to right node, which is desired to insert key.
    // Reach level 0, the actual linked list level
    current = current->forward[0];

    // If there is a node equal to the current value, it means that it cannot be inserted, and returns 1
    if (current != NULL & amp; & amp; current->get_key() == key) {
        std::cout << "key: " << key << ", exists" << std::endl;
        mtx. unlock();
        return 1;
    }

    // If current == NULL, it means we have reached the end of this layer
    // if current's key is not equal to key means we need to insert data between update[0] and current node
    if (current == NULL || current->get_key() != key ) {
        
        // Randomly obtain the current node, which levels should be inserted
        int random_level = get_random_level();

        // If the randomly returned layer is greater than the current layer number of the jump table, then the number to be inserted should be placed after the _header in the layer larger than the current layer number
        if (random_level > _skip_list_level) {
            for (int i = _skip_list_level + 1; i < random_level + 1; i ++ ) {
                update[i] = _header;
            }
            _skip_list_level = random_level;
        }

        // Create a new node with key, value, random_level
        Node<K, V>* inserted_node = create_node(key, value, random_level);
        
        // Insert nodes at each level
        for (int i = 0; i <= random_level; i ++ ) {
            inserted_node->forward[i] = update[i]->forward[i];
            update[i]->forward[i] = inserted_node;
        }
        std::cout << "Successfully inserted key:" << key << ", value:" << value << std::endl;
        _element_count++;
    }
    mtx. unlock();
    return 0;
}

5. Display data display_list

Print formatted information, each layer of information has

// Display skip list
// traverse layer by layer, print output
template<typename K, typename V>
void SkipList<K, V>::display_list() {

    std::cout << "\
*****Skip List*****"<<"\
";
    // loop through each layer
    for (int i = 0; i <= _skip_list_level; i ++ ) {
        Node<K, V> *node = this->_header->forward[i];
        std::cout << "Level " << i << ": ";
        while (node != NULL) {
            std::cout << node->get_key() << ":" << node->get_value() << ";";
            node = node->forward[i];
        }
        std::cout << std::endl;
    }
}

6. Find data search_element

Process: Relatively simple, because corresponding search operations are also performed when adding and deleting

Finally, it is necessary to judge whether the corresponding key is found

Question: Why is the and operator symbol used here

Answer: and, or, not are keywords in c++, but not c, and, or, not keywords in c++ are & amp; & amp;, || ,! The spare tire, this is because some machines did not support ios646 encoding

template<typename K, typename V>
bool SkipList<K, V>::search_element(K key) {

    std::cout << "search_element-----------------" << std::endl;
    Node<K, V> *current = _header;

    // Start looking from the top level of the jump list
    for (int i = _skip_list_level; i >= 0; i--) {
        while (current->forward[i] & amp; & amp; current->forward[i]->get_key() < key) {
            current = current->forward[i];
        }
    }

    // When reaching level 0, take out current->forward[0]
    current = current->forward[0];

    // If it is not empty and the key is equal to the key we want, it is found
    if (current and current->get_key() == key) {
        std::cout << "Found key: " << key << ", value: " << current->get_value() << std::endl;
        return true;
    }

    std::cout << "Not Found Key:" << key << std::endl;
    return false;
}

7. Delete data delete_element

Process

  • Initialize the update array to store the front node of the node to be deleted
  • Find the node to be deleted, if not found in the code, no operation will be performed by default
  • Starting from level 0, if the post node of update[i] is equal to the node to be deleted, then the post node of update[i] points to the post node of the node to be deleted
  • Delete the layer without nodes, that is, the current number of layers is reduced
// Delete element from skip list
// delete from skip list
template<typename K, typename V>
void SkipList<K, V>::delete_element(K key) {

    mtx. lock();
    Node<K, V> *current = this->_header;
    // Here we still need to find the predecessor node of the node to be deleted
    Node<K, V> *update[_max_level + 1];
    memset(update, 0, sizeof(Node<K, V>*)*(_max_level + 1));

    // start from highest level of skip list
    for (int i = _skip_list_level; i >= 0; i--) {
        while (current->forward[i] !=NULL & amp; & amp; current->forward[i]->get_key() < key) {
            current = current->forward[i];
        }
        update[i] = current;
    }

    current = current->forward[0];
    if (current != NULL & amp; & amp; current->get_key() == key) {
       
        // Starting from level 0, deleting the point to be deleted in each layer is actually changing the relationship of pointers
        for (int i = 0; i <= _skip_list_level; i ++ ) {

            // If starting from level i, the successor of update[i] is not the node to be deleted, just jump out
            if (update[i]->forward[i] != current)
                break;

            update[i]->forward[i] = current->forward[i];
        }

        // Delete the layer without nodes, in fact, the current number of layers can be reduced
        // This can be compared with 0 because the memset function is all initialized to 0 during initialization
        while (_skip_list_level > 0 & amp; & amp; _header->forward[_skip_list_level] == 0) {
            _skip_list_level--;
        }

        std::cout << "Successfully deleted key "<< key << std::endl;
        _element_count --;
    }
    mtx. unlock();
    return;
}

8. Data dump_file

// Dump data in memory to file
template<typename K, typename V>
void SkipList<K, V>::dump_file() {

    std::cout << "dump_file-----------------" << std::endl;
    // open a file
    _file_writer.open(STORE_FILE);
    Node<K, V> *node = this->_header->forward[0];

    while (node != NULL) {
        // Write to the file while printing on the console, only operate level 0
        _file_writer << node->get_key() << ":" << node->get_value() << "\
";
        std::cout << node->get_key() << ":" << node->get_value() << ";\
";
        node = node->forward[0];
    }

    _file_writer. flush();
    _file_writer. close();
    return;
}

9. Load data load_file

// Load data from disk
// load data from disk file
template<typename K, typename V>
void SkipList<K, V>::load_file() {

    _file_reader.open(STORE_FILE);
    std::cout << "load_file-----------------" << std::endl;
    std::string line;
    std::string* key = new std::string();
    std::string* value = new std::string();
    while (getline(_file_reader, line)) {
        // Parse each row of data, each row of data is a key-value pair
        // The line here is the incoming parameter, key and value are the outgoing parameters
        get_key_value_from_string(line, key, value);
        if (key->empty() || value->empty()) {
            continue;
        }
        // insert each data
        insert_element(*key, *value);
        std::cout << "key:" << *key << "value:" << *value << std::endl;
    }
    _file_reader. close();
}

10. Return data size size

// Get current SkipList size
template<typename K, typename V>
int SkipList<K, V>::size() {
    return_element_count;
}

11. Data preprocessing operations

Separate key and value from string

template<typename K, typename V>
void SkipList<K, V>::get_key_value_from_string(const std::string & str, std::string* key, std::string* value) {

    if(!is_valid_string(str)) {
        return;
    }
    *key = str.substr(0, str.find(delimiter));
    *value = str.substr(str.find(delimiter) + 1, str.length());
}

Judging whether the string is legal

template<typename K, typename V>
bool SkipList<K, V>::is_valid_string(const std::string & amp; str) {

    if (str.empty()) {
        return false;
    }
    if (str. find(delimiter) == std::string::npos) {
        return false;
    }
    return true;
}

If the find() function in string does not find a matching character, it will return std::string::nops, std::string::npos represents a constant of type size_type, whose value is equal to the maximum value that type size_type can represent.