[SQL statement optimization]

Article directory

  • SQL statement optimization
    • ①Insert data
      • ☆insert
      • ☆Insert data in bulk
    • ②Primary key optimization
      • ☆Data organization
      • ☆page split
      • ☆page merge
      • ☆Primary key design principles
    • ③order by optimization
    • ④group by optimization
    • ⑤limit optimization
    • ⑥count optimization
    • ⑦update optimization

SQL statement optimization

①Insert data

☆insert

1) Insert data in batches
If you need to insert multiple rows at one time, use a single insert statement with multiple value lists instead of one insert per row, as shown below. For a batch insert, it is not recommended to insert more than 1000 rows in one statement; if you need to insert tens of thousands of rows, split them across multiple insert statements.
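
A minimal sketch of the difference, using the same tb_test demo table as the transaction example below:

-- One statement per row: every row pays its own network round trip and commit
insert into tb_test values(1,'Tom');
insert into tb_test values(2,'Cat');
insert into tb_test values(3,'Jerry');

-- Better: one insert statement with multiple value lists
insert into tb_test values(1,'Tom'),(2,'Cat'),(3,'Jerry');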

2) Manually commit the transaction
Transactions in MySQL are auto-committed by default: every insert statement is committed as soon as it executes, so transactions are opened and committed frequently. To optimize this, commit the transaction manually, for example:

start transaction;
insert into tb_test values(1,'Tom'),(2,'Cat'),(3,'Jerry');
insert into tb_test values(4,'Tom'),(5,'Cat'),(6,'Jerry');
insert into tb_test values(7,'Tom'),(8,'Cat'),(9,'Jerry');
commit;

3) Insert primary keys in order
Sequential primary key insertion performs better than out-of-order insertion, as illustrated below. The reason lies in how MySQL organizes its data, which is covered in the primary key optimization section.
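
For illustration (the id values are made up):

-- Sequential primary key insertion: 1, 2, 3, ...
insert into tb_test values(1,'Tom'),(2,'Cat'),(3,'Jerry');

-- Out-of-order insertion: 8, 1, 9, ... (may trigger the page splits described below)
insert into tb_test values(8,'Tom'),(1,'Cat'),(9,'Jerry');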

☆Insert data in bulk

If you need to insert a large amount of data at one time (for example, several million records), the insert statement performs poorly. In this case you can use the load command provided by MySQL to import the data. The operation is as follows:

You can execute the following commands to load the data in the data script file into the table structure:

-- When the client connects to the server, add the parameter --local-infile
mysql --local-infile -u root -p

-- Set the global parameter local_infile to 1 to enable loading data from a local file
set global local_infile = 1;

-- Execute the load command to load the prepared data file into the table
load data local infile '/root/sql1.log' into table tb_user fields terminated by ',' lines terminated by '\n';

The above commands can be executed from the command line.
Previously I just dragged the SQL file directly into DataGrip, and importing 1,000,000 rows took about 1 minute.

②Primary key optimization

☆Data organization

In the InnoDB storage engine, table data is organized and stored according to primary key order. A table stored this way is called an index-organized table (IOT). As shown below:

Row data is stored on the leaf nodes of the clustered index. The logical structure diagram of InnoDB is as follows:

In the InnoDB engine, data rows are recorded in logical structures called pages, and each page has a fixed size, 16K by default. That also means the number of rows stored in a single page is limited. If a new row does not fit in the current page, it is stored in the next page, and pages are linked by pointers.
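
You can verify the page size on your own instance (not part of the original steps, just a quick check):

-- 16384 bytes (16K) by default
show variables like 'innodb_page_size';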

☆Page Split

Pages can be empty, half full, or 100% full. Each page holds 2 to N rows of data (if a single row is too large, it overflows), arranged by primary key. (Why at least 2 rows: if a page held only one row, the structure would degenerate into a plain linked list.)

i. Effect of sequential primary key insertion
①. Request a page from disk and insert rows in primary key order

②. While the first page is not full, keep inserting into the first page

③. When the first page is full, write to the second page, and the pages are linked by pointers

④. When the second page is full, write to the third page

ii. Effect of random primary key insertion

①. Suppose pages 1# and 2# are already full and store the data as shown in the figure

②. Now a record with id 50 is inserted. Will a new page simply be allocated and the record written into it?

It won't. The leaf nodes of the index structure must stay in order, so 50 should be stored right after 47:

But page 1#, where 47 lives, is already full and cannot hold the record for 50, so a new page 3# is created at this point.

However, 50 is not written directly into page 3#. Instead, the second half of the data on page 1# is moved to page 3#, and then 50 is inserted into page 3#.

After the data is moved and the record with id 50 is inserted, the order of the three pages is wrong: the page after 1# should now be 3#, and the page after 3# should be 2#. The linked-list pointers therefore need to be reset.

The phenomenon described above is called “page splitting”, and it is a relatively performance-intensive operation.

☆Page merge

The index structure (leaf nodes) of the existing data in the table is as follows:

When we delete existing data, the effect is as follows:
When a row is deleted, the record is not physically removed; it is only flagged for deletion, and its space becomes available to be claimed by other records.

When the deleted records on the page reach MERGE_THRESHOLD (the default is 50% of the page), InnoDB will start looking for the closest page (before or after) to see if the two pages can be merged to optimize space usage.

This behavior of merging pages is called “page merging”.

MERGE_THRESHOLD: the page-merge threshold. It can be set explicitly when creating a table or an index.
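
As a sketch of how this is specified (MySQL reads MERGE_THRESHOLD from a comment; the table, index, and value below are made up for illustration):

-- Set the merge threshold to 40% for all indexes of the table
create table tb_demo (
    id int primary key,
    name varchar(50)
) comment='MERGE_THRESHOLD=40';

-- Or set it for a single index
create index idx_demo_name on tb_demo(name) comment 'MERGE_THRESHOLD=40';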

☆Primary key design principles

· When meeting business needs, try to reduce the length of the primary key. (If the length of the primary key is relatively long and there are many secondary indexes, it will take up a lot of disk storage space, and it will consume a lot of disk I/O when searching)
· When inserting data, try to choose sequential insertion, and choose to use the AUTO_INCREMENT auto-increment primary key. (Out-of-order insertion may cause page splitting)
· Try not to use UUIDs or other natural keys (such as ID card numbers) as the primary key. (Each generated value is out of order, which leads to out-of-order primary key insertion.)
· During business operations, avoid modifying the primary key. (Modifying the primary key also needs to move the corresponding index structure, which is relatively expensive)
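
A minimal sketch that follows these principles (table and column names are made up):

-- Short, sequential auto-increment primary key;
-- the long, unordered natural key is kept as a unique secondary column instead
create table tb_member (
    id bigint auto_increment primary key,
    card_no varchar(32) unique,
    name varchar(50)
);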

③order by optimization

MySQL has two ways of sorting:

1) Using filesort: the rows satisfying the conditions are read via an index or a full table scan, and the sort is then completed in the sort buffer. Any sort whose results are not returned directly from an index is called a FileSort sort.

2) Using index: ordered data is returned directly by sequentially scanning an ordered index. In this case Using index appears, no additional sorting is required, and the operation is efficient.

For the above two sorting methods, the performance of using index is high, while the performance of using filesort is low. When optimizing the sorting operation, we should try to optimize it to using index.

Still in my test2 database, delete several indexes of the previous tb_user table:

drop index idx_user_phone on tb_user;
drop index idx_user_phone_name on tb_user;
drop index idx_user_name on tb_user;

Execute the following sort SQL statement and view the execution plan:
explain select id,age,phone from tb_user order by age;

Then execute explain select id,age,phone from tb_user order by age,phone ; to view the execution plan:

This is because there is no index on age or phone, so Using filesort appears when sorting and the sort performance is low.

Then execute create index idx_user_age_phone_aa on tb_user(age,phone); After creating the index, execute explain select id,age,phone from tb_user order by age; to view the execution plan:


It can be seen that after the index is established, the sorting query is performed again, and the original Using filesort is changed to Using index, and the performance is relatively high.

After creating the index, sort in descending order according to age, phone:
Execute the statement explain select id,age,phone from tb_user order by age desc , phone desc ; to view the execution plan:

Using index also appears, but this time Backward index scan appears in Extra, which indicates a reverse index scan. In the index we created, the leaf nodes are sorted from small to large by default, but this query sorts from large to small, so the index is scanned in reverse and Backward index scan appears. MySQL 8 supports descending indexes, so we can also create a descending index.

Sort in ascending order by phone first and then age: explain select id,age,phone from tb_user order by phone, age;

When sorting, the leftmost prefix rule must also be satisfied, otherwise filesort appears. Because the index was created with age as the first field and phone as the second, the sort should follow that order; otherwise Using filesort will appear.

Sort by age ascending and phone descending (one ascending, one descending): explain select id,age,phone from tb_user order by age asc, phone desc;

Because an index created without specifying an order is ascending by default, querying with one column ascending and the other descending produces Using filesort.

A here means ascending order and D means descending order.

Create a joint index with age ascending and phone descending: create index idx_user_age_phone_ad on tb_user(age asc, phone desc); then execute explain select id,age,phone from tb_user order by age asc, phone desc; and check the execution plan:

Ascending/descending joint index structure illustration:

From the above tests, we can summarize the order by optimization principles:
A. Establish a suitable index based on the sorting field. When sorting by multiple fields, it also follows the leftmost prefix rule.
B. Try to use covering indexes.
C. Multi-field sorting, one ascending and one descending. At this time, you need to pay attention to the rules (ASC/DESC) when creating the joint index.
D. If filesort is unavoidable and a large amount of data is sorted, you can appropriately increase the sort buffer size sort_buffer_size (default 256K), as shown in the sketch below.
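
A quick sketch of principle D (the 1M value is just an example):

-- Check the current sort buffer size (default 262144 bytes = 256K)
show variables like 'sort_buffer_size';

-- Raise it for the current session before running a large filesort
set session sort_buffer_size = 1048576;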

④group by optimization

In the absence of an index, execute the following SQL to query the execution plan:
explain select profession , count(*) from tb_user group by profession ;

Temporary tables are used and the performance is very low.

Then, create a joint index for profession, age, status:
create index idx_user_pro_age_sta on tb_user(profession, age, status);

Then, execute the same SQL as before to view the execution plan:
explain select profession , count(*) from tb_user group by profession ;

If you group by age alone, Using temporary appears; but if you group by profession and age together, Using temporary does not appear. The reason is that the grouping operation also follows the leftmost prefix rule on the joint index, as shown below.
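
A sketch of the two cases, against the idx_user_pro_age_sta index created above:

-- Grouping by age alone skips the leftmost column (profession): Using temporary appears
explain select age, count(*) from tb_user group by age;

-- Grouping by profession, age satisfies the leftmost prefix rule: no Using temporary
explain select profession, age, count(*) from tb_user group by profession, age;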

Therefore, in the grouping operation, we need to optimize the following two points to improve performance:

A. When grouping operations, you can use indexes to improve efficiency.

B. During grouping operations, the use of the index must also satisfy the leftmost prefix rule.

⑤limit optimization

When the amount of data is relatively large, limit pagination queries become less and less efficient the further back you page.
Execute the following three statements:

select * from tb_sku limit 0,10;
select * from tb_sku limit 10000,10;
select * from tb_sku limit 30000,10;

It is found that the execution time is getting longer and longer, and the efficiency of paging query is lower as it goes further, which is the problem of paging query.

This is because when executing limit 2000000,10, MySQL needs to sort the first 2,000,010 records, return only records 2,000,001 to 2,000,010, and discard the rest, so the cost of the query and sort is very high.

Optimization idea: for general pagination queries, performance can be improved considerably by creating a covering index and combining it with a subquery, as sketched below.
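
A minimal sketch of that idea, assuming the same tb_sku table and an index (the primary key) on id:

-- The subquery pages through the id index only (covering index),
-- then the outer join fetches the full rows for just those 10 ids
select t.*
from tb_sku t,
     (select id from tb_sku order by id limit 2000000,10) a
where t.id = a.id;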

⑥count optimization

Execute the statement select count(*) from tb_user ; and you will find that if the amount of data is large, it is very time-consuming to execute the count operation.

· The MyISAM engine stores the total number of rows in a table on the disk, so when executing count(*), it will directly return this number, which is very efficient; but if it is a conditional count, MyISAM is also slow.

· The InnoDB engine is troublesome. When it executes count(*), it needs to read the data from the engine line by line, and then accumulate the count.

If you want to greatly improve the count efficiency of the InnoDB table, the main optimization idea: count by yourself (you can use a database like redis to do it, but if it is a conditional count, it is more troublesome).

Usage of count: count() is an aggregate function. For the returned result set, it checks row by row: if the argument of the count function is not NULL, the running total is incremented by 1, otherwise it is not, and the total is returned at the end.

Usage: count(*), count(primary key), count(field), count(number)

In order of efficiency, count(field) < count(primary key id) < count(1) ≈ count( * ), so try to use count( * ).
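
For reference, the four forms on tb_user look like this (phone is just an example column from the earlier sections):

select count(*) from tb_user;      -- counts rows; the server optimizes this form
select count(id) from tb_user;     -- counts primary key values (never NULL)
select count(phone) from tb_user;  -- counts rows where phone is not NULL
select count(1) from tb_user;      -- places the constant 1 in each row and counts it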

⑦update optimization

For update statements, we mainly need to pay attention to the following precautions during execution.

update course set name = 'javaEE' where id = 1 ;
When we execute this update SQL statement, the row with id 1 is locked (a row lock), and the lock is released after the transaction is committed.

But when we execute the following SQL:
update course set name = 'SpringBoot' where name = 'PHP' ;

When we open multiple transactions and execute the above SQL, we find that row locks are upgraded to table locks. As a result, the performance of the update statement is greatly reduced.

Note: InnoDB’s row lock is a lock on the index, not on the record itself, and the index must remain usable; otherwise the row lock is upgraded to a table lock, as in the example above where name has no index. See the sketch below.
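
A sketch of the fix implied by this note: index the column used in the where clause so the update keeps using row locks (the index name is made up):

-- With an index on name, the update locks only the matching rows instead of the whole table
create index idx_course_name on course(name);
update course set name = 'SpringBoot' where name = 'PHP';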