[MySQL] MySQL update data mechanism

MySQL update data mechanism

1. Problem description

Suppose we have a table like this and contains one record:

CREATE TABLE mytest (
id int(11) NOT NULL,
c1 int(11) DEFAULT NULL,
c2 int(11) DEFAULT NULL,
c3 int(11) DEFAULT NULL,
PRIMARY KEY (id),
KEY c1 (c1),
KEY c2 (c2)
Contains records:
±-±—–±—–±—– +
| id | c1 | c2 | c3 |
±-±—–±—–±—– +
| 1 | 11 | 12 | 13 |

This table actually contains 3 indexes:

Primary key index (and the value contains a block)

Index c1 (and the value contains a block)

Index c2 (and the value contains a block)

Then we consider the following statement:

A: update mytest set c1=11,c2=12,c3=13 where id=1 (the c1\c2\c3 fields are not changed)

B: update mytest set c1=11,c2=12,c3=14 where id=1 (the c1\c2 fields are not changed)

C: update mytest set c1=12,c2=12,c3=14 where id=1 (the c2 field does not change)

Then the question is as follows:

A. Whether the value of each index changes in scenario A, that is, whether the actual index block changes.

In scenario B, whether the data of index c1 and index c2 changes, that is, whether the actual blocks of index c1 and index c2 change.

In scenario C, whether the data of index c2 changes, that is, whether the block of actual index c2 changes.

2. Approximate half-step method and process

For the update statement, the modification process of the function mysql_update is roughly as follows:

Scan the data, obtain the data (rr_sequential), and store the data in mysql format into record[0]. Its representation is roughly as follows:

field1 | field2 | … | fieldN

Each field contains a pointer to the actual data.

Save the obtained data in mysql format to record[1], and then use the syntax parsed information to fill the obtained data in record[0] (fill_record_n_invoke_before_triggers->fill_record), here we use c1=,c2=,c3=* Filling data, the data and fields that need to be filled are actually stored in two lists, which are linked lists of type Item_feild and Item_int. We call them column_list and values_list here. They are expressed as follows in the bsion rule file:

 $$.column_list->push_back($1.column) ||
            $$.value_list->push_back($1.value))

Next, use the statement update mytest set c1=11,c2=12,c3=13 where id=1 to debug these two lists. We can put the breakpoint on fill_record_n_invoke_before_triggers.

(gdb) p fields
$67 = (List & amp;) @0x7fff30005da8: { = { = {}, first = 0x7fff300067f8, last = 0x7fff30006af8, elements = 3}, }
(gdb) p ((Item_field )(fields->first->info)).field_name
$68 = 0x7fff309316d4 “c1”
(gdb) p ((Item_field )(fields->first->next->info)).field_name
$69 = 0x7fff309316d7 “c2”
(gdb) p ((Item_field )(fields->first->next->next->info)).field_name
$70 = 0x7fff309316da “c3”
(gdb) p values
$73 = (List & amp;) @0x7fff30006e38: { = { = {}, first = 0x7fff30006808, last = 0x7fff30006b08, elements = 3}, }
(gdb) p ((Item_int
)(values->first->info)).value
$74 = 11
(gdb) p ((Item_int
)(values->first->next->info)).value
$75 = 12
(gdb) p ((Item_int
)(values->first->next->next->info)).value
$76 = 13

In this way, after modification, the value of the field that needs to be modified in record[0] becomes the value in this update statement.

Filter point 1, compare the data in record[0] and record[1] to see if there is any difference. If they are exactly the same, update will not be triggered. This corresponds to our scenario A, because the values of the previous and later records are exactly the same, so it will not be done. Any data changes are skipped directly *.

The data must be modified here, so compare the records of record[0] and record[1], and put the value and field number of the field that needs to be modified into the array m_prebuilt->upd_node->update (calc_row_difference), where The main comparison method between the new value that needs to be modified and the field_no that needs to be modified is:

Whether the length has changed (len)

The actual value has changed (memcmp comparison result)

Confirm whether the modified field contains a secondary index. Because the fields that need to be changed have been counted previously (the beginning of row_upd), the comparison method here is as follows:

If it is a delete statement, it will obviously contain all secondary indexes.

If it is an update statement, compare it based on the number of the field in the previous array and whether the fields in the dictionary are sorted, because the fields in the secondary index must be sorted. If neither condition is met,

If both conditions are not met, it means that no secondary index needs to be modified in this modification. Set the mark of this update to UPD_NODE_NO_ORD_CHANGE. UPD_NODE_NO_ORD_CHANGE means that no secondary index fields need to be modified. Note that this will also be converted to the innodb row format (row_mysql_store_col_in_innobase_format).

Filter point 2, first modify the primary key. If it is UPD_NODE_NO_ORD_CHANGE update, the secondary index will not be changed, that is, the row_upd_sec_step function will not be called. This is obvious, because no secondary index fields need to be changed (implemented in the function row_upd_clust_step), which corresponds here In scenario B, although the c3 field has modified the data, the values before and after the c1\c2 fields are the same, so the actual index c1 and index c2 will not change, only the primary key index will be modified.

If you need to change the secondary index, scan each secondary index in the dictionary in turn and start the cycle.

For filtering point 3, it is first necessary to confirm whether the modified secondary index field is in this index. If the modified field is not in this secondary index at all, obviously there is no need to modify the index of this cycle. This judgment is made in the function row_upd_changes_ord_field_binary by looping through each field of the secondary index in the dictionary.

If this field is not in the m_prebuilt->upd_node->update array, proceed directly to the next field, indicating that this field does not need to be modified.

If this field is in the m_prebuilt->upd_node->update array, call the function dfield_datas_are_binary_equal for comparison, that is, compare whether the actual value has changed.

This actually corresponds to our scenario 3, because the value of the c2 field has not changed, so the index c2 will not make actual changes, but the primary key index and index c1 need to change the value.

3. Conclusion

We can see from the code that in fact, in MySQL or innodb, only the indexes with data modifications will actually be changed. So the scenarios mentioned earlier are as follows:

A: update mytest set c1=11,c2=12,c3=13 where id=1 (the c1\c2\c3 fields are not changed) without any data modification.

B: update mytest set c1=11,c2=12,c3=14 where id=1 (c1\c2 field does not change) only changes the primary key index

C: update mytest set c1=12,c2=12,c3=14 where id=1 (the c2 field is not changed) only changes the primary key index and index c1

4. Verification

For verification, we verify scenario 3. Here we mainly verify through the last_modify_lsn of the block, because as long as a block modifies the data, its last_modify_lsn will definitely be modified after the dirty data is flushed. The steps are as follows:

Initialize data. The mytest table here is a test table, and the main function of the mytest2 table is to modify data and advance lsn.

CREATE TABLE mytest (
id int(11) NOT NULL,
c1 int(11) DEFAULT NULL,
c2 int(11) DEFAULT NULL,
c3 int(11) DEFAULT NULL,
PRIMARY KEY (id),
KEY c1 (c1),
KEY c2 (c2)
) ENGINE=InnoDB;
insert into mytest values(1,11,12,13);
insert into mytest values(2,14,15,16);
insert into mytest values(3,17,18,19);
insert into mytest values(4,20,21,22);
insert into mytest values(5,23,24,25);
insert into mytest values(6,26,27,28);
insert into mytest values(7,29,30,31);
insert into mytest values(8,32,33,34);
insert into mytest values(9,35,36,37);
insert into mytest values(10,38,39,40);
CREATE TABLE mytest2 (
id int(11) NOT NULL AUTO_INCREMENT,
c1 int(11) DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB;

INSERT INTO mytest2(c1) values(1);

Record the current lsn

Because it is a test library, the lsn of the show engine is static as follows: Log sequence number 4806780238 Log flushed up to 4806780238 Pages flushed up to 4806780238 and Modified db pages 0 has no dirty pages

It all means that all the dirty data has been wiped off the disk.

Query the corresponding block of each index

mysql> select *from information_schema.INNODB_SYS_TABLES where NAME like testnew/mytest%’;
±———±—————-±—–±——-±——±- ———-±———-±————–±———- +
| TABLE_ID | NAME | FLAG | N_COLS | SPACE | FILE_FORMAT | ROW_FORMAT | ZIP_PAGE_SIZE | SPACE_TYPE |
±———±—————-±—–±——-±——±- ———-±———-±————–±———- +
| 19071 | testnew/mytest | 33 | 7 | 10854 | Barracuda | Dynamic | 0 | Single |
| 19072 | testnew/mytest2 | 33 | 5 | 10855 | Barracuda | Dynamic | 0 | Single |
±———±—————-±—–±——-±——±- ———-±———-±————–±———- +
2 rows in set (0.00 sec)

mysql> select * from information_schema.INNODB_SYS_INDEXES where space=10854;
±———±——–±———±—–±———±—- —-±——±—————- +
| INDEX_ID | NAME | TABLE_ID | TYPE | N_FIELDS | PAGE_NO | SPACE | MERGE_THRESHOLD |
±———±——–±———±—–±———±—- —-±——±—————- +
| 10957 | PRIMARY | 19071 | 3 | 1 | 3 | 10854 | 50 |
| 10958 | c1 | 19071 | 0 | 1 | 4 | 10854 | 50 |
| 10959 | c2 | 19071 | 0 | 1 | 5 | 10854 | 50 |
±———±——–±———±—–±———±—- —-±——±—————- +
3 rows in set (0.01 sec)

Here we find the INDEX_ID 10957 primary key, 10958 c1 index, and 10959 c2 index.

./innblock mytest.ibd scan 16
===INDEX_ID:10957
level0 total block is (1)
block_no: 3,level: 0||
===INDEX_ID:10958
level0 total block is (1)
block_no: 4,level: 0|
|
===INDEX_ID:10959
level0 total block is (1)
block_no: 5,level: 0|*|

Here we find that the block of 10957 is 3, the block of 10958 is 4, and the block of 10959 is 5. Let’s get their information respectively.

Use the blockinfo tool to view the lsn of each block of the current mytest

10957 PRIMARY block 3

./innblock mytest.ibd 3 16

==== Block base info ====
block_no:3 space_id:10854 index_id:10957

last_modify_lsn:4806771220 (note here)
page_type:B + _TREE level:0

10958 c1 block 4

./innblock mytest.ibd 4 16

==== Block base info ====
block_no:4 space_id:10854 index_id:10958

last_modify_lsn:4806771252 (note here)

10959 c2 block 5

./innblock mytest.ibd 5 16

==== Block base info ====
block_no:5 space_id:10854 index_id:10959

last_modify_lsn:4806771284 (note here)

Here we get the last_modify_lsn of three pages, which is around 4806771200.

Make some data modifications to mytest2 table to advance lsn

INSERT INTO mytest2(c1) select c1 from mytest2;
INSERT INTO mytest2(c1) select c1 from mytest2;

INSERT INTO mytest2(c1) select c1 from mytest2;
Query OK, 32768 rows affected (13.27 sec)
Records: 32768 Duplicates: 0 Warnings: 0
mysql> select count() from mytest2;
±——— +
| count(
) |
±——— +
| 65536 |
±——— +
1 row in set (1.46 sec)

Check the system lsn again

Log sequence number 4867604378
Log flushed up to 4867604378
Pages flushed up to 4867604378
Modified db pages 0

At this time, the lsn has changed, but the dirty data has been erased.

Modify the mytest table

Modify this line of record id c1 c2 c3 2 14 15 16

update t1 set c1=14,c2=115,c3=116 where id=2;