Ideas and steps of SQL optimization

Database optimization

Create an index: create a suitable index to improve query speed

Sub-database sub-table: When a table has more data or some fields of a table have more values and is used, use horizontal sub-table and vertical sub-table to optimize

Read-write separation (cluster): When a service cannot meet the needs, the read-write separation method is used for clustering

Cache: use redis for caching

SQL tuning steps

1. Locate those SQL statements with low execution efficiency through slow query logs, etc. – slow SQL

1. Modify the my.cnf configuration file and add slow query configuration

[mysqld]
# log output to table
log_output=table

#Start slow query log 1-enable, 0-disable
slow_query_log=1

#Slow query definition, usually the query time exceeds 1 second is considered slow query
long_query_time=1

2. Use slow query, query slow query log

select * from mysql.slow_log

# Slow check log
#Time: 150530 15:30:58··-- The query occurred at 2015·5·30 15:30:58# #User@Host: root[root] @ 1oalhost [127.0.0.1]··--Who is it? , on what host the query occurred
# Query_time: 1.134065 Lock-time: 0.000000 Rows_sent: 8 Rows_examined: 4000000
Query_time:--how much time the query took in total, Lock_time: the time to lock the table during the query, Rows_sent: how many rows data to return, Rows_examined: the result obtained after the table scanned 400w rows of data;

2.explain analyze SQL execution plan

After finding the slow SOL, use explain to analyze the SQL execution plan. You need to focus on type, rows, filtered, extra, and index issues to check whether the index is invalid and whether the index is reasonable.

type from top to bottom, the efficiency is getting higher and higher

ALL full table scan

index index full scan

range index range scan, common words =, between, in and other operations

ref uses a non-unique index scan or a unique index prefix scan to return a single record, which often appears in associated queries

eq_ref is similar to ref, the difference is that it uses a unique index and uses the associated query of the primary key

const/system For a single record, the system will treat other columns in the matching row as constants, such as primary key or unique index query

null MySOL does not access any tables or indexes, and returns results directly. Although the efficiency is getting higher and higher from top to bottom, according to the cost horizontal type, suppose there are two indexes idx1(a, b, c), idx2(a, c), SOL For “select * from t where a = 1 and b in (1, 2) order by c”; if idx1 is used, then the type is range; if idx2 is used, then the type is ref; when the number of rows to be scanned, use idx2 When it is about 5 times or more than idx1, idx1 will be used, otherwise idx2 Extra will be used

Using filesort: MySQL requires an extra pass to figure out how to retrieve the rows in sorted order. Sorting is done by going through all rows according to the join type and saving sort keys and row pointers for all rows matching the WHERE clause. The keys are then sorted and the rows are retrieved in sorted order.

Using temporary: Temporary tables are used to save intermediate results, the performance is particularly poor, and key optimization is required

Using index: Indicates that the corresponding select operation uses a covering index (Coveing index) to avoid accessing the data rows of the table, which is very efficient! If using where appears at the same time, it means that the eligible data cannot be queried directly through index search.

Using index condition: ICP is newly added after MySOL5.6. The using index condition uses ICP (index pushdown) to filter data at the storage engine layer instead of filtering at the service layer, and use the existing data of the index to reduce the return table The data.

3.show profile analysis

Understand the status of the thread executed by SQL and the time consumed. The default is off, open the statement set profiling =1

SHOW PROFILES ;
SHOW PROFILE FOR QUERY #{id};

4. Determine the problem and take corresponding measures

optimize index

Optimize SQL statement: Modify SQL, IN query segmentation, time query segmentation, filtering based on the last data

Use other implementation methods: ES, data warehouse, etc.

Data Fragmentation

What are the scenarios of SQL tuning in the project, and how to do database tuning?

1. Leftmost match

Index: KEY 'idx_shopid_orderno' ('shop_id','order_no')
sql: select * from _t where order_no=''

Query matching is matched from left to right. To use order_no to go to the index, the query condition must carry shop_id or index (shop_id, order_no) to switch the order before and after

2. Implicit conversion

Index: KEY 'idx_mobile' ('mobile')
sql: select * from _user where mobile=12345678901

Implicit conversion is equivalent to performing operations on the index, which will invalidate the index. mobile is a character type, using numbers, you should use string matching, otherwise MySQL will use implicit replacement, causing the index to fail

3. Big page

Index: KEY 'idx_a_b_c' ('a', 'b', 'c')
sql: select * from _t where a = 1 and b = 2 order by c desc limit 10000, 10;

For large pagination, you can give priority to product optimization requirements. If there is no optimization, there are two optimization methods as follows. One is to pass the last piece of data from the previous time, that is, the above c, and then do “c< xxx "Processing, but this generally requires changing the interface protocol, which is not necessarily feasible. The other is to use delayed association to reduce the SOL return table, but remember that the index needs to be completely covered to be effective. The SOL changes are as follows

select t1.* from _t t1, (select id from _t where a = 1 and b = 2 order by c desc limit 10000,10) t2 where t1.id = t2.id;

4.in + order by

Index: KEY 'idx_shopid_status_created' ('shop_id', 'order_status', 'created_at')
sql:select * from _order where shop_id = 1 and order_status in (1, 2, 3) order by created_at desc limit 10

The in query is searched in the form of n*m at the bottom of MySOL, which is similar to union, but more efficient than union. When the in query calculates the cost (cost = number of tuples * average value of IO), the number of tuples is obtained by querying the values contained in in one by one, so the calculation process will be relatively slow, so the MVSOL setting A critical value (eg_range_index_dive_limit) is set. After 5.6, if the critical value is exceeded, the cost of the column will not be calculated. Therefore, it will lead to inaccurate execution plan selection. The default is 200, that is, the in condition exceeds 200 data, which will cause problems in the calculation of the cost of in, and may cause the index selected by Mysql to be inaccurate.

Processing method, you can (order_status,created_at) exchange the order before and after, and adjust the SQL to delayed association

5. Range query is blocked, subsequent fields cannot be indexed

Index: KEY 'idx_shopid_created_status' ('shop_id', 'created_at', 'order_status')
sql:select * from _order where shop_id = 1 and created_at > '2021-01-01 00:00:00' and order_status = 10;

Range query also has “IN, between”

6. Does not equal to, does not include quick search that cannot use the index

select * from _order where shop_id=1 and order_status not in (1,2);
select * from _order where shop_id=1 and order_status != 1;

On indexes, avoid NOT, !=, <>, !, NOT EXISTS, NOT IN, NOT LIKE, etc.

7. When the optimizer chooses not to use the index

If the amount of data required to be accessed is small, the optimizer will still choose the auxiliary index, but when the accessed data accounts for a large part of the data in the entire table (generally about 20%), the optimizer will choose to search through the clustered index data.

select * from _order where order_status = 1;

Query all unpaid orders. Generally, there are very few such orders. Even if the index is built, the index cannot be used.

8. Complex query

select sum(amt) from _t where a = 1 and b in (1, 2, 3) and c > '2020-01-01';
select * from _t where a = 1 and b in (1, 2, 3) and c > '2020-01-01' limit 10;

If it is to count certain data, it may be solved by using a data warehouse; if it is such a complicated query in business, it may not be recommended to continue to use SOL, but to use other methods to solve it, such as using ES to solve it

9.asc and desc mixed

select * from _t where a=1 order by b desc, c asc;

When desc and asc are mixed, the index will be invalid