004_Mysql index use and optimization

SQL performance analysis

profile analysis

-- Check whether the function is enabled
SELECT @@have_profiling;
SET profiling = 1;

-- View the time-consuming basic situation of each SQL
show profiles;
-- View the time consumption of each stage of the SQL statement with the specified query_id
show profile for query query_id;
-- View the CPU usage of the SQL statement with the specified query_id
show profile cpu for query query_id;

Explain

--Add the keyword explain / desc directly before the select statement
EXPLAIN SELECT field list FROM table name WHERE condition ;

The meaning of each field in the Explain execution plan

Using the index

1. Leftmost prefix rule
If multiple columns are indexed (joint index), the leftmost prefix rule must be followed. The leftmost prefix rule means that the query starts from the leftmost column of the index and does not skip columns in the index. If a column is skipped, the index will be partially invalidated (the subsequent field index will be invalidated).

Note: The leftmost column referred to in the leftmost prefix rule means that when querying, the leftmost field (that is, the first field) of the joint index must exist, which is the same as when we write SQL, the sequence of condition writing Order does not matter

2. Range query
In the joint index, a range query (>,<) appears, and the column index on the right side of the range query becomes invalid.

explain select * from tb_user where profession = 'Software Engineering' and age > 30 and status = '0';

& height=67 & amp; id=u7b096196 & amp; name=image.png & amp; originHeight=154 & amp; originWidth=1514 & amp; originalType=binary & amp; ratio=2 & amp; rotation=0 & amp; showTitle=false & amp;size=185895 &status=done &style=none &taskId=ud30353b7-bbed-4158-bf94-b6b67282201 &title= &width=655″ alt=” image.png”>

When the range query uses > or <, the joint index is used, but the length of the index is 49, which means that the status field on the right side of the range query is not indexed.

explain select * from tb_user where profession = 'Software Engineering' and age >= 30 and status = '0';

When the range query uses >= or <=, the joint index is used, but the length of the index is 54, which means that all fields are indexed.

**So, if the business allows, use range queries like >= or <= as much as possible, and avoid using > or < . **

3. Covering index
Try to use covering index and reduce select *. So what is a covering index? A covering index means that the query uses an index and all the columns that need to be returned can be found in the index.
Why use a covering index, which can largely avoid querying back to the table.

4. Prefix index
When the field type is a string (varchar, text, longtext, etc.), sometimes a very long string needs to be indexed, which will make the index large and waste a lot of disk IO during query, affecting query efficiency. At this time, only part of the prefix of the string can be indexed, which can greatly save the index space and improve the index efficiency

Index Design Principles

 - Create an index for a table with a large amount of data and frequent queries.
  - Build indexes for fields that are often used as query conditions (where), sorting (order by), and grouping (group by).
  - Try to choose columns with high discrimination as the index, and try to build a unique index. The higher the discrimination, the higher the efficiency of using the index.
  - If it is a field of string type, the length of the field is relatively long, and a prefix index can be established according to the characteristics of the field.
  - Use joint indexes as much as possible to reduce single-column indexes. When querying, the joint index can cover the index in many cases, saving storage space, avoiding returning to the table, and improving query efficiency.
  - To control the number of indexes, the more indexes the better, the more indexes, the greater the cost of maintaining the index structure, which will affect the efficiency of adding, deleting, and modifying.
  - If an indexed column cannot store NULL values, constrain it with NOT NULL when creating the table. When the optimizer knows whether each column contains NULL values, it can better determine which index is most efficient to use for queries.

Index invalidation

1. Index column operations
2. The string is not quoted. If the string is not quoted, it will have no effect on the query results, but there is an implicit type conversion in the database, and the index will be invalid.
3. For fuzzy query, in like fuzzy query, add % after the keyword, and the index will take effect. And if % is added in front of the keyword, the index will be invalid.
4. The impact of data distribution. When querying, MySQL will evaluate whether the index is faster or the full table scan is faster. If the full table scan is faster, the index is abandoned and the full table scan is performed.
5. Or conditional connection, when the condition of or connection, the left and right fields have indexes, the index will take effect

explain select * from tb_user where id = 10 or age = 23;
explain select * from tb_user where phone = '17799990017' or age = 23;

Since age has no index, even if id and phone have indexes, the index will be invalid. So it is necessary to build an index for age.

Optimization of the index

1. Primary key optimization
Index design principles:

 - When meeting business needs, try to reduce the length of the primary key.
        - When inserting data, try to choose sequential insertion, and choose to use AUTO_INCREMENT auto-increment primary key.
        - Try not to use UUID as the primary key or other natural primary keys, such as ID number.
        - During business operations, avoid modifying the primary key.

2. Order by optimization
There are two ways to sort MySQL:
Using filesort : Through the table index or full table scan, read the data rows that meet the conditions, and then complete the sorting operation in the sort buffer sort buffer, all not return the sorting result directly through the index The sorting of is called FileSort sorting.
**Using index**: The ordered data is returned directly through the ordered index sequential scan. This is the using index, no additional sorting is required, and the operation efficiency is high.
For the above two sorting methods, the performance of using index is high, while the performance of using filesort is low. When optimizing the sorting operation, we should try to optimize it to using index.

Special attention points:
1. After creating the index,
1) Sort in descending order according to age, phone

-- create index
create index idx_user_age_phone_aa on tb_user(age,phone);
explain select id,age,phone from tb_user order by age desc , phone desc ;

Using index also appears, but at this time Backward index scan appears in Extra, which represents a reverse scan index, because in the index we created in MySQL, the leaf nodes of the default index are sorted from small to large, and at this time we query the sort When scanning, it is from large to small, so when scanning, it is a reverse scan, and a Backward index scan will appear. In the MySQL8 version, descending indexes are supported, and we can also create descending indexes.

2) Sort in ascending order according to phone and age, with phone first and age last

explain select id,age,phone from tb_user order by phone , age;

When sorting, the leftmost prefix rule also needs to be satisfied, otherwise filesort will also appear. Because when creating an index, age is the first field and phone is the second field, so when sorting, it should be in this order, otherwise Using filesort will appear.

3) According to age, phone in descending order, one ascending order and one descending order
Because when creating an index, if the order is not specified, it is sorted in ascending order by default, and when querying, one ascending order and one descending order, then Using filesort will appear.
In order to solve the above problems, we can create an index in which age is sorted in ascending order and phone is sorted in reverse order

create index idx_user_age_phone_ad on tb_user(age asc ,phone desc);

**order by optimization principle: **
A. Establish a suitable index based on the sorting field. When sorting by multiple fields, it also follows the leftmost prefix rule.
B. Try to use covering indexes.
C. Multi-field sorting, one ascending and one descending. At this time, you need to pay attention to the rules (ASC/DESC) when creating the joint index.
D. If filesort is unavoidable, when sorting a large amount of data, you can appropriately increase the sort buffer size sort_buffer_size (default 256k).

3. group by optimization
In the grouping operation, we need to optimize the following two points to improve performance:
A. During grouping operations, indexes can be used to improve efficiency.
B. During grouping operations, the use of indexes also satisfies the leftmost prefix rule.

4. Limit optimization
Optimization idea: In general pagination query, the performance can be better improved by creating a covering index, which can be optimized by adding a subquery to the covering index.

explain select * from tb_sku t , (select id from tb_sku order by id limit 2000000,10) a where t.id = a.id;

5. count optimization
The MyISAM engine stores the total number of rows of a table on the disk, so when count() is executed, it will directly return this
The number is very efficient; but if it is a conditional count, MyISAM is also slow.
The InnoDB engine is troublesome. When it executes count(), it needs to read the data line by line from the engine.
Come on, and count up.
If you want to greatly improve the count efficiency of the InnoDB table, the main optimization idea: count by yourself (you can use a database like redis to do it, but if it is a conditional count, it is more troublesome).

Usage: count (*), count (primary key), count (field), count (number)

If sorted by efficiency, count(field) < count(primary key id) < count(1) ≈ count(), so try to use count().

6. Update optimization
InnoDB’s row lock is a lock for the index, not a lock for the record, and the index cannot be invalidated, otherwise it will be upgraded from a row lock to a table lock.