Insertion and deletion operations of B+Tree in mysql

The query operation of B + Tree is being discussed on the Internet, but I have a question about how the insertion and deletion operations of B + Tree are performed.

Before discussing this issue, let us first talk about a concept, primary key auto-increment, why it is best to set the primary key auto-increment.

I wrote in this article about the difference between primary key auto-increment and UUID, and which one to use in which case. For details, you can read this article. Indexing principles and B + Tree, indexing knowledge that you can understand after reading it, is most suitable for people who are new to indexing – CSDN Blog

Having said that, the advantage of auto-incrementing the primary key is that when the leaf node, that is, the page is full, inserting data again will insert the newly created page.

Similar to the tail insertion of a linked list, there is no need to go to war in the front. A situation similar to the picture below (of course the picture below does not show the primary key being incremented automatically, it just shows its tail insertion).

Speaking of primary key auto-increment, what will happen to these records if they are not in order?

Introduction to inserting and deleting nodes in B + Tree (note that it is not B + Tree in InnoDB):

For example, in a five-order B + Tree, a node can only have four keywords at most. When a fifth node is inserted, it will be split. The left leaf node contains the first m/2 records, and the right node contains the remaining records. , carry the key of the m/2 + 1th record into the parent node (the parent node must be an index type node)

For example, fifth-order B+Tree:

Insert 2, 3, 4, and 5 in sequence.

Insert 6, because there can only be four keywords, so there will be a split:

Steps for page splitting:

(1) The left node contains the first 5/2=2 (division in the computer world) records

(2) The right node contains the remaining records

(3) Carry the key of the 5/2 + 1=3 record to the parent node, and the third record is 4

Thinking: What would happen if the bottom layer of mysql also follows this principle?

Answer: If a page can hold 2,000 pieces of data, and when it is full, the page will be split into 1,000 pieces each. If the primary key is incremented, then the previous page will never be able to insert data (the primary key will be incremented, and the inserted data will not be inserted). to the separated page), and so on, which is a waste of space.

In order not to waste space, the designer optimized InnoDB’s B + Tree insertion operation.

Optimization v1.0: The first version of optimization

When the page is full, it does not split, but creates a new page and inserts new records into the new page (I directly say that the node is the page).

It started with the picture above. Assume it is a 4th order B+Tree. Inserting 15 will cause page splitting, but after optimization, only a new page will be created to store records. As shown below.

Then insert 16 and put it after 15 until the page is full. OK, there is no problem, and the problem of wasted space is solved.

But if you insert 13, it will have this effect, which is still a waste of space.

Optimization v1.0: Optimized second version

experiment:
Create a table and insert data, with id as the primary key.

CREATE TABLE test.page_split_test4
(
  id BIGINT UNSIGNED NOT NULL,
  payload1 CHAR(255) NOT NULL,
  payload2 CHAR(255) NOT NULL,
  payload3 CHAR(255) NOT NULL,
  payload4 CHAR(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=INNODB;



INSERT INTO test.page_split_test4 (id, payload1, payload2, payload3, payload4)
VALUES
(1, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(2, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(3, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(4, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(5, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(6, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(7, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(8, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(9, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(10, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(11, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(12, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(13, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(14, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255))

Check the status of the underlying page.

SELECT page_number, page_type, number_records, data_size
FROM information_schema.innodb_buffer_page
WHERE table_name like "%page_split_test4%" AND index_name = "PRIMARY";

It was found that only one page was created, and there were 14 pieces of data in the page, which was the data inserted above (note that this page is also the root node)

INSERT INTO test.page_split_test4 (id, payload1, payload2, payload3, payload4)
VALUES (17, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255));

A page split occurred on insertion 17. Split into:

Page 4 and page 5 contain 15 pieces of data. The splitting is performed according to the above splitting method, and then page 5 is filled.

INSERT INTO test.page_split_test4 (id, payload1, payload2, payload3, payload4)
VALUES
(20, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(21, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(22, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(23, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(24, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(25, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255));

After page 5 is full, continue to insert. At this time, page split occurs and is split according to v1.0:

INSERT INTO test.page_split_test4 (id, payload1, payload2, payload3, payload4)
VALUES (50, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255));

Note that 50 is inserted, not in order, just to verify how the above v1.0 problem is solved

For verification, insert data less than 50 and find that no new page is created, but primary key values less than 50 are placed on page 6:

INSERT INTO test.page_split_test4 (id, payload1, payload2, payload3, payload4)
VALUES
(45, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(30, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(31, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(32, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(33, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255)),
(34, REPEAT("A", 255), REPEAT("B", 255), REPEAT("C", 255), REPEAT("D", 255));

Look at the picture above and put the six primary key values less than 50 on page 6.

Question: How do the values of non-leaf nodes change in this way? You can see that page 3 still has 3 records. Now, has the primary key of the record pointing to page 6 from page 3 been changed to the smallest one (30) on page 6?

Answer: No

Explanation: When a data page is not full, but the data to be inserted is larger than the maximum data on the previous page, the data will be inserted at the beginning of the page, and the minimum data of the page will be updated at the same time. data item. The record of the directory entry page will not change, because the page number and the value of the minimum data item of this page are still the same. Only when the data page is full and a new data page needs to be split, will the record of the directory entry page be updated.

That is to say, when data is inserted at the beginning of the data page, the minimum value will indeed become the primary key of the initial data, but the directory item page will not immediately update its stored minimum primary key value, because this will bring Comes with a lot of extra expenses. Instead, the directory entry page will wait until the data page is filled and a split operation is performed before updating its stored minimum primary key value.

The directory entry page will wait until the data page is filled and a split operation is performed before updating its stored minimum primary key value. In the split operation, the data page will be divided into two new data pages, and the minimum primary key value of each page will be stored in the new directory entry page, thus avoiding the overhead of frequently updating the directory entry page.

This concludes mysql’s optimization of B + Tree insertion operations.

Is the deletion operation the opposite of the above, that is, page movement or the like?

Big mistake. Regarding the deletion of on-page records, I said in a previous article that the deleted records are only marked for deletion, but are actually still in the storage space. You can read my previous article to explain that these deleted records form a garbage linked list. “How MySQL Runs: Understanding MySQL from the Fundamentals” Reading Notes–5.InnoDB Data Page Structure-CSDN Blog

verify:

delete from test.page_split_test4 where id > 0;

We found that only the records of the root node of page 3 were deleted, while the page storing user records was still retained. As for why 2 of each page were deleted, we will discuss this later. We just have to remember that deleted data is not necessarily deleted from the disk.