Why does MySQL use B+ trees instead of skip tables?

Article directory

B + tree or skip list
B + tree simple code
Jump table simple code

B + tree or skip table

MySQL’s InnoDB storage engine uses B+ trees instead of skip tables because B+ trees have some advantages in relational database systems, especially in handlingrange queries, transaction processing, and data persistence. The following explains in detail the underlying principles of B + trees and skip lists and their respective advantages and disadvantages:

B + tree (B-Tree):
- Principle: A B+ tree is a balanced tree structure that has a root node, internal nodes, and leaf nodes. Each node contains a certain number of key-value pairs, and the key-value pairs are arranged in order according to the size of the key value. Internal nodes contain only keys, leaf nodes contain both keys and pointers to data.
- advantage:
  - High efficiency of range query: B+ tree supports range query, because in B+ tree, adjacent leaf nodes are ordered, so it is very efficient when searching for data within the range.
  - Transaction support: B+ tree is a multi-version concurrency control (MVCC)-friendly data structure, suitable for transaction processing scenarios, and can guarantee the ACID properties of transactions.
  - Data persistence: The leaf nodes of the B+ tree contain all data, which means that the data is easily persisted to disk, supporting high reliability and data recovery.
- shortcoming:
  - High overhead for insertion and deletion: Due to the balanced nature of the B+ tree, insertion and deletion operations may require the splitting and merging of nodes, which will result in large performance overhead.
  - Highly unstable: The height of the B+ tree is usually relatively large. It may require multiple disk I/Os to access the leaf nodes. It may not be efficient for certain specific queries.
Skip List:
- Principle: Jump list is a multi-level data structure. Each level is an ordered linked list. The bottom level contains all data, and the data contained in the upper level is a subset of the lower level. The target data can be quickly located through jump nodes.
- advantage:
  - The average lookup time is low: The query time complexity of the skip table is O(log n), which is similar to the balanced tree structure, but is simpler to implement.
  - Insertion and deletion operations are relatively fast: Since the jump table does not require frequent balancing adjustments of nodes, the performance of insertion and deletion operations is better.
- shortcoming:
  - Not suitable for range queries: The structure of the skip table is not suitable for efficient processing of range queries and requires linear scanning.
  - Difficulty in achieving Transaction and data persistence: The update operation of the skip table may involve multiple levels, and achieving transaction and data persistence requires more complexity.
  - Large space overhead: Jump tables require additional pointers to connect different levels and occupy more memory space.

All things considered, B + tree is more suitable for the needs of relational database systems, especially in handling range queries, transaction processing and data persistence. **Although skip tables perform well in certain scenarios, they are not suitable as the core data structure of database storage engines. Database systems need to ensure a high degree of data consistency, reliability and efficient transaction processing, and B+ trees can better meet these requirements.

B + tree simple code

Implementing a simple B+ tree data structure in Java can be a complex task as operations such as insertion, deletion, lookup, etc. need to be handled and the balanced nature of the B+ tree needs to be maintained. The following is an example of a simple B + tree data structure, including common methods, but please note that this is just a basic example, and the actual implementation will be more complex:

import java.util.ArrayList;
import java.util.List;

class BPlusTree {<!-- -->
    private Node root;
    private int degree; // B + degree of tree

    public BPlusTree(int degree) {<!-- -->
        this.root = new LeafNode();
        this.degree = degree;
    }

    // Find operation
    public String search(int key) {<!-- -->
        return root.search(key);
    }

    //Insert operation
    public void insert(int key, String value) {<!-- -->
        root = root.insert(key, value);
    }

    // Delete operation
    public void delete(int key) {<!-- -->
        root.delete(key);
    }

    // internal node
    private abstract class Node {<!-- -->
        List<Integer> keys;

        abstract String search(int key);

        abstract Node insert(int key, String value);

        abstract void delete(int key);
    }

    //leaf node
    private class LeafNode extends Node {<!-- -->
        List<String> values;
        LeafNode next;

        LeafNode() {<!-- -->
            keys = new ArrayList<>();
            values = new ArrayList<>();
            next = null;
        }

        @Override
        String search(int key) {<!-- -->
            int index = keys.indexOf(key);
            if (index != -1) {<!-- -->
                return values.get(index);
            } else {<!-- -->
                return null;
            }
        }

        @Override
        Node insert(int key, String value) {<!-- -->
            int index = 0;
            while (index < keys.size() & amp; & amp; keys.get(index) < key) {<!-- -->
                index + + ;
            }
            keys.add(index, key);
            values.add(index, value);

            if (keys.size() > degree) {<!-- -->
                return splitLeaf();
            } else {<!-- -->
                return this;
            }
        }

        @Override
        void delete(int key) {<!-- -->
            int index = keys.indexOf(key);
            if (index != -1) {<!-- -->
                keys.remove(index);
                values.remove(index);
            }
        }

        private Node splitLeaf() {<!-- -->
            LeafNode newLeaf = new LeafNode();
            int mid = keys.size() / 2;

            newLeaf.keys.addAll(keys.subList(mid, keys.size()));
            newLeaf.values.addAll(values.subList(mid, values.size()));

            keys.subList(mid, keys.size()).clear();
            values.subList(mid, values.size()).clear();

            newLeaf.next = this.next;
            this.next = newLeaf;

            return newLeaf;
        }
    }
}

This is just a simple B+ tree implementation example, and the actual implementation may require more details and optimizations, especially in terms of node splitting, merging, and balancing. Note that a B+ tree is a complex data structure, and database systems actually used in production environments typically have more complex and efficient implementations. This example is only used to demonstrate the basic concepts and structure of B+ trees.

Simple code for skipping tables

Skip List is an ordered data structure, similar to a balanced tree (such as a red-black tree), used to implement fast insertion, deletion and search operations. Compared to balanced trees, skip lists are simpler to implement and have similar performance in some cases.

The following is an example of a simple Java implementation of a skip table, including some common methods:

import java.util.Random;

public class SkipList {<!-- -->
    private static final int MAX_LEVEL = 32;
    private Node head;
    private int level;

    public SkipList() {<!-- -->
        this.head = new Node(Integer.MIN_VALUE);
        this.level = 1;
    }

    public boolean search(int target) {<!-- -->
        Node current = head;

        for (int i = level - 1; i >= 0; i--) {<!-- -->
            while (current.next[i] != null & amp; & amp; current.next[i].value < target) {<!-- -->
                current = current.next[i];
            }

            if (current.next[i] != null & amp; & amp; current.next[i].value == target) {<!-- -->
                return true;
            }
        }

        return false;
    }

    public void insert(int num) {<!-- -->
        int newLevel = randomLevel();
        Node newNode = new Node(num, newLevel);

        Node current = head;

        for (int i = level - 1; i >= 0; i--) {<!-- -->
            while (current.next[i] != null & amp; & amp; current.next[i].value < num) {<!-- -->
                current = current.next[i];
            }

            if (i < newLevel) {<!-- -->
                newNode.next[i] = current.next[i];
                current.next[i] = newNode;
            }
        }

        if (newLevel > level) {<!-- -->
            for (int i = level; i < newLevel; i + + ) {<!-- -->
                head.next[i] = newNode;
            }
            level = newLevel;
        }
    }

    public boolean erase(int num) {<!-- -->
        boolean deleted = false;
        Node current = head;

        for (int i = level - 1; i >= 0; i--) {<!-- -->
            while (current.next[i] != null & amp; & amp; current.next[i].value < num) {<!-- -->
                current = current.next[i];
            }

            if (current.next[i] != null & amp; & amp; current.next[i].value == num) {<!-- -->
                current.next[i] = current.next[i].next[i];
                deleted = true;
            }
        }

        return deleted;
    }

    private int randomLevel() {<!-- -->
        int level = 1;
        Random rand = new Random();

        while (rand.nextInt(2) == 1 & amp; & amp; level < MAX_LEVEL) {<!-- -->
            level + + ;
        }

        return level;
    }

    private class Node {<!-- -->
        int value;
        Node[] next;

        Node(int value) {<!-- -->
            this.value = value;
            this.next = new Node[MAX_LEVEL];
        }

        Node(int value, int level) {<!-- -->
            this.value = value;
            this.next = new Node[level];
        }
    }
}

This is a simple skip table implementation example, including basic operations such as search, insertion and deletion. The core idea of the skip table is to achieve fast search through multi-layer indexes. Each layer is an ordered linked list. The randomLevel() method in this example is used to generate a random number of levels to control the balance of the jump list.

Please note that the actual skip list implementation may contain more details and optimizations, especially the need to maintain the balance of the skip list during insertion and deletion operations. This example is only used to demonstrate the basic concepts and structure of skip tables.