21 SQL optimization tips + mind map



Source: blog.csdn.net/guorui_java/article/details/126542005

  • 1. Avoid SELECT * in queries; select specific fields instead

  • 2. Avoid connecting conditions with OR in the WHERE clause

  • 3. Prefer numeric types over strings

  • 4. Use varchar instead of char

  • 5. Technical extension: what is the difference between char and varchar2?

  • 6. Use default values instead of NULL in WHERE conditions

  • 7. Avoid using the != or <> operator in the WHERE clause

  • 8. Among inner join, left join, and right join, prefer inner join

  • 9. Improve the efficiency of the GROUP BY statement

  • 10. Prefer truncate when clearing a table

  • 11. Add a limit to delete or update statements, or delete in batches in a loop

  • 12. The UNION operator

  • 13. Improve batch insert performance

  • 14. Keep table joins and indexes limited, generally no more than 5

  • 15. Avoid using built-in functions on indexed columns

  • 16. Composite indexes

  • 17. The leftmost-prefix property of composite indexes

  • 18. Optimize LIKE statements

  • 19. Use EXPLAIN to analyze your SQL execution plan

  • 20. Other optimization methods


1. Avoid SELECT * in queries; select specific fields instead

1. Counterexample

SELECT * FROM user

2. Positive example

SELECT id,username,tel FROM user

3. Reason

  1. Save resources and reduce network overhead.

  2. A covering index may be used, avoiding lookups back to the table and improving query efficiency.

Note: for brevity, the examples below use * in place of specific field lists.


2. Avoid connecting conditions with OR in the WHERE clause

1. Counterexample

SELECT * FROM user WHERE id=1 OR salary=5000

2. Positive example

(1) Use union all

SELECT * FROM user WHERE id=1
UNION ALL
SELECT * FROM user WHERE salary=5000

(2) Write two separate SQL statements

SELECT * FROM user WHERE id=1

SELECT * FROM user WHERE salary=5000

3. Reason

  1. Using OR may prevent the index from being used, resulting in a full table scan;

  2. If salary has no index, then even if the query can use the index on id, it still has to scan the whole table to evaluate the salary condition;

  3. In other words, the whole process would take three steps (full table scan + index scan + merge); if it just scanned the whole table from the start, a single pass would be enough;

  4. Although MySQL has an optimizer, for efficiency and cost reasons the index may still not be used when an OR condition is involved;


3. Prefer numeric types over strings

1. Positive example

  1. Primary key (id): prefer numeric types such as int or tinyint;

  2. Gender (sex): 0 for female, 1 for male; databases have no dedicated Boolean type, and MySQL recommends tinyint;

2. Reason

  1. When processing queries and joins, the engine compares a string character by character;

  2. For numeric types, a single comparison is enough;

  3. Strings degrade query and join performance and increase storage overhead;
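A minimal DDL sketch of the idea (the table and column definitions are illustrative):

-- Numeric columns instead of strings where possible
CREATE TABLE `user` (
  `id`  INT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'numeric primary key',
  `sex` TINYINT NOT NULL DEFAULT 0 COMMENT '0 = female, 1 = male',
  PRIMARY KEY (`id`)
);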

4. Use varchar instead of char

1. Counterexample

`address` char(100) DEFAULT NULL COMMENT 'address'

2. Positive example

`address` varchar(100) DEFAULT NULL COMMENT 'address'

3. Reason

  1. varchar is a variable-length type stored according to the actual length of the data, so it saves storage space;

  2. char is stored at its declared size and is padded with spaces when the data is shorter;

  3. For queries, searching within a smaller field is also more efficient;

5. Technical extension: what is the difference between char and varchar2?

1. The length of char is fixed, but the length of varchar2 can be changed.

For example, to store the string “101” in a char(10) column, the stored value occupies 10 bytes (including 7 padding characters, which the database fills with spaces), while the same value in varchar2(10) occupies only 3 bytes; 10 is just the maximum length, and values shorter than 10 characters are stored at their actual length.

2. The efficiency of char is slightly higher than that of varchar2.

3. When to use char and when to use varchar2?

char and varchar2 are complementary opposites: varchar2 saves more space than char but is slightly less efficient. To gain efficiency you have to sacrifice some space; this is the classic database-design trade-off of “trading space for efficiency”.

Although varchar2 saves space compared with char, if a varchar2 column is modified frequently and the length of the data differs each time, “row migration” occurs, which causes extra I/O and should be avoided in database design; in that case char is the better choice. Note also that char columns are automatically padded with spaces on insert, and the padding is not removed on select, so remember to use trim when querying a char column; this is one of the reasons for writing this article.

If a developer instead uses the rpad() trick to pad the bind variable so that it can be compared with the char column (padding the bind variable is better practice than applying trim to the column, because applying a function such as trim to a column can easily prevent an existing index on that column from being used), the application still has to account for the column length changing over time: if the field size changes, the application must be modified to use the new width.
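A hedged Oracle-style sketch of the two approaches (the table t, the column code CHAR(10), and the bind variable :code are illustrative):

-- Padding the bind variable keeps an index on the char column usable
SELECT * FROM t WHERE code = RPAD(:code, 10);

-- Applying TRIM to the column instead can prevent that index from being used
SELECT * FROM t WHERE TRIM(code) = :code;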

For precisely these reasons, fixed-width storage can make the table and its indexes much larger than necessary and also brings the bind-variable problem, so the char type should be avoided no matter the situation.

6. Use default values instead of NULL in WHERE conditions

1. Counterexample

SELECT * FROM user WHERE age IS NOT NULL

2. Positive example

SELECT * FROM user WHERE age>0

3. Reason

  1. Using is null or is not null does not automatically mean the index will not be used; whether it is used depends on the MySQL version and the query cost;

  2. If the MySQL optimizer finds that using the index costs more than not using it, it abandons the index; this is why conditions such as !=, <>, is null, and is not null are often said to invalidate the index;

  3. In reality, it is because these queries usually have a high cost that the optimizer automatically abandons the index;

  4. Replacing NULL values with a default value often allows the index to be used, and the condition also reads more clearly;

7. Avoid using the != or <> operator in the WHERE clause

1. Counterexample

SELECT * FROM user WHERE salary!=5000

SELECT * FROM user WHERE salary<>5000

2. Reason

  1. Using != or <> is likely to invalidate the index;

  2. You should try to avoid != or <> in the where clause, otherwise the engine may give up the index and perform a full table scan;

  3. Business needs come first: if there is truly no alternative, these operators can still be used; it is not that they are forbidden.

8. Among inner join, left join, and right join, prefer inner join

If the three joins would return the same result, prefer inner join; if you do use left join, keep the left table as small as possible.

  • inner join keeps only the rows that match in both tables;

  • left join will return all the rows of the left table, even if there is no matching record in the right table;

  • right join returns all rows from the right table, even if there are no matching records in the left table;

Why?

  • With an inner (equi-)join, fewer rows are returned, so performance is usually better;

  • If left join is used, the left table’s result set should be as small as possible, with filter conditions placed on it where possible, so that fewer rows are returned;

  • This is the MySQL optimization principle of letting small tables drive large tables (small data sets drive large data sets), which gives better performance;
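A minimal sketch of the idea, assuming a small department table driving a large user table (the column names are illustrative):

-- Prefer inner join when the results would be identical; the small table drives the join
SELECT u.id, u.username, d.departmentname
FROM department d
INNER JOIN user u ON u.deptid = d.id
WHERE d.id = 1;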

9. Improve the efficiency of the GROUP BY statement

1. Counterexample

Group first, then filter

select job, avg(salary) from employee
group by job
having job = 'develop' or job = 'test';

2. Positive example

filter first, then group

select job, avg(salary) from employee
where job = 'develop' or job = 'test'
group by job;

3. Reason

Unneeded rows are filtered out before the grouping is performed.

10. Prefer truncate when clearing a table

truncate table is functionally identical to the delete statement without the where clause: both delete all rows in the table. But truncate table is faster than delete and uses less system and transaction log resources.

The delete statement deletes one row at a time and records an entry in the transaction log for each row deleted. truncate table deletes data by releasing the data pages used to store table data, and only records the page release in the transaction log.

truncate table deletes all the rows in the table, but the table structure and its columns, constraints, indexes, etc. remain unchanged. The count value used for new row identification is reset to the column’s seed. If you want to preserve the identity count value, use DELETE instead. If you want to drop a table definition and its data, use the drop table statement.

For a table referenced by a foreign key constraint, you cannot use truncate table; instead, use a DELETE statement without a where clause. Because truncate table does not log individual row deletions, it cannot activate triggers.

truncate table cannot be used for tables that participate in indexed views.
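A minimal sketch of the difference, assuming a hypothetical operation_log table:

-- Fast: deallocates the data pages, resets the auto-increment counter, minimal logging
TRUNCATE TABLE operation_log;

-- Slower: deletes row by row and logs every row, but keeps the identity counter
DELETE FROM operation_log;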

11. Add a limit to delete or update statements, or delete in batches in a loop

1. Reduce the cost of writing wrong SQL

Clearing table data is no small matter: one slip of the hand and it is gone (“delete the database and run away”). With a limit added, a wrong delete loses only part of the data, which can be recovered quickly from the binlog.

2. SQL efficiency is likely to be higher

With limit 1 added to the SQL, the statement returns as soon as the first matching row is hit; without a limit, it keeps scanning the table.

3. Avoid long transactions

When the delete executes, if the filter column (say age) is indexed, MySQL adds write locks and gap locks to all related rows; if the affected range is large, related services can become directly unavailable.

4. If the amount of data is large, it is easy to fill up the CPU

When deleting a large amount of data without a limit on the number of rows, it is easy to saturate the CPU, making the deletion slower and slower.

5. Lock table

Deleting too much data in one statement can lock the table and produce “Lock wait timeout exceeded” errors, so operating in batches is recommended.
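A minimal sketch of batched deletion (the table, condition, and batch size are illustrative); run the statement in a loop until it affects 0 rows:

-- Delete at most 1000 rows per statement to keep transactions and locks small
DELETE FROM user_log WHERE create_time < '2022-01-01' LIMIT 1000;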

12. The UNION operator

UNION filters out duplicate records after combining the tables: it sorts the combined result set, removes duplicates, and then returns the result. In most applications no duplicates are actually produced; the most common case is a UNION of an active table and a history table. For example:

select username from user
union
select departmentname from department

When this SQL runs, it first fetches the results from both tables, then sorts them in the sort area to remove duplicates, and finally returns the result set. If the tables are large, the disk may be used for sorting. Recommended solution: use the UNION ALL operator instead of UNION, because UNION ALL simply concatenates the two results and returns them.
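The same query rewritten with UNION ALL (a sketch, assuming duplicate rows are acceptable for the business):

select username from user
union all
select departmentname from department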

13. Improve batch insert performance

1. Multiple commits (one insert per statement)

INSERT INTO user (id, username) VALUES(1,'Nezha Programming');

INSERT INTO user (id, username) VALUES(2,'Daji');

2. One batched insert

INSERT INTO user (id, username) VALUES(1,'Nezha Programming'),(2,'Daji');

3. Reason

By default each new INSERT statement runs under its own transaction, so every row requires opening and committing a transaction, whereas the batched insert opens and commits a single transaction. The efficiency gain is significant; it may not be noticeable with small volumes, but once the data volume reaches a certain level the effect is remarkable.

14. Keep table joins and indexes limited, generally no more than 5

1. Do not join too many tables, generally no more than 5

  1. The more tables that are joined, the longer the compile time and the greater the overhead;

  2. Each join may generate a temporary table in memory;

  3. A large multi-table join should be split into several smaller queries, which are also more readable;

  4. If you must join many tables to get the data, it usually indicates a poor design;

  5. The Alibaba coding guidelines recommend joining no more than three tables.

2. Do not create too many indexes, generally no more than 5

  1. More indexes are not always better: indexes speed up queries but slow down inserts and updates;

  2. An index can be thought of as a separate table that stores data, and that data takes up space;

  3. The data in an index is kept sorted, and sorting also takes time;

  4. On insert or update the index may have to be maintained or rebuilt; with a huge amount of data the rebuild reorders the records, so indexes must be chosen carefully, case by case;

  5. A table should preferably have no more than 5 indexes; if there are more, consider whether some of them are really needed.

15. Avoid using built-in functions on indexed columns

1. Counterexample

SELECT * FROM user WHERE DATE_ADD(birthday, INTERVAL 7 DAY) >=NOW();

2. Positive example

SELECT * FROM user WHERE birthday >= DATE_SUB(NOW(), INTERVAL 7 DAY);

3. Reason

Applying a built-in function to an indexed column prevents the index from being used; move the computation to the constant side of the comparison instead.

16. Composite indexes

When sorting, order by the columns in the same order in which they appear in the composite index, even if only one of them actually needs sorting; otherwise sorting performance will be poor.

create index idx_deptid_position_createtime on user(deptid, position, createtime);
select username, tel from user where deptid = 1 and position = 'java development' order by deptid, position, createtime desc;

In fact the query only needs the rows matching deptid = 1 and position = 'java development' sorted by createtime in descending order, but writing just order by createtime desc performs worse than ordering by the full list of index columns.

17. The leftmost-prefix property of composite indexes

1. Create a composite index

ALTER TABLE employee ADD INDEX idx_name_salary (name, salary)

2. The leftmost-prefix rule is satisfied: even though only part of the index is used, the composite index takes effect

SELECT * FROM employee WHERE NAME='Nezha Programming'

3. The leftmost column does not appear, so the leftmost-prefix rule is not satisfied and the index is not used

SELECT * FROM employee WHERE salary=5000

4. The composite index is fully used: name and salary appear in index order, and the index takes effect

SELECT * FROM employee WHERE NAME='Nezha Programming' AND salary=5000

5. Although the written order violates the leftmost-prefix rule, MySQL reorders the conditions when optimizing the SQL, so the index is still used

SELECT * FROM employee WHERE salary=5000 AND NAME='Nezha Programming'

6. Reason

A composite index is also called a joint index. Creating a joint index such as (k1, k2, k3) is equivalent to creating the three indexes (k1), (k1, k2), and (k1, k2, k3); this is the leftmost-matching principle.

If a query does not satisfy the leftmost-prefix principle, the joint index will generally not be used.

18. Optimize LIKE statements

For fuzzy queries, programmers like to use like, but like is likely to invalidate your index.

1. Counterexample

select * from cities where name like '%Dalian' (index not used)
select * from cities where name like '%Dalian%' (index not used)

2. Positive example

select * from cities where name like 'Dalian%' (index used)

3. Reason

  • First of all, try to avoid fuzzy queries; if you must use one, prefer a right-hand wildcard, i.e. like '…%', which can use the index;

  • A left-hand wildcard like '%…' cannot use the index directly, but it can be turned into a right-hand wildcard with the reverse + function-index approach (see the sketch after this list);

  • A double-sided wildcard cannot be optimized this way; if it is really necessary, a search engine is recommended.
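A sketch of the reverse + index rewrite, assuming MySQL 5.7+ generated columns (the column and index names are illustrative):

-- Store the reversed name in a generated column and index it
ALTER TABLE cities ADD COLUMN name_rev VARCHAR(100)
  GENERATED ALWAYS AS (REVERSE(name)) STORED;
CREATE INDEX idx_cities_name_rev ON cities (name_rev);

-- like '%Dalian' becomes a right-hand wildcard on the reversed column ('nailaD' is REVERSE('Dalian'))
SELECT * FROM cities WHERE name_rev LIKE 'nailaD%';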

19. Use EXPLAIN to analyze your SQL execution plan

1. type

  1. system: the table has only one row; rarely seen in practice;

  2. const: the table has at most one matching row, typically a primary-key or unique-key equality lookup;

  3. eq_ref: For each combination of rows from the previous table, read a row from this table. This is probably the best join type, except for const types;

  4. ref: For each combination of rows from the previous table, all rows with matching index values will be read from this table;

  5. range: retrieves only the given range of rows, using an index to select rows. Range can be used when using =, <>, >, >=, <, <=, IS NULL, <=>, BETWEEN or IN operators to compare key columns with constants;

  6. index: This join type is the same as ALL, except that only the index tree is scanned. This is usually faster than ALL, since index files are usually smaller than data files;

  7. all: full table scan;

  8. Performance ranking: system > const > eq_ref > ref > range > index > all.

  9. In practice, SQL optimization usually aims to reach at least the ref or range level.

2. Common Extra values

  • Using index: Only obtain information from the index tree without querying back to the table;

  • Using where: the WHERE clause is used to restrict which rows are matched against the next table or sent to the client. Unless you specifically intend to fetch or examine all rows of the table, there may be something wrong with the query if Extra is not Using where and the join type is ALL or index. It also means the query has to go back to the table;

  • Using temporary: MySQL needs to build a temporary table to hold the result. This typically happens when the query contains GROUP BY and ORDER BY clauses that list the columns differently;
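A minimal usage sketch (the query and table are illustrative); look at the type, key, rows, and Extra columns of the output:

EXPLAIN SELECT id, username FROM user WHERE deptid = 1 ORDER BY createtime DESC;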

20. Other optimization methods

1. When designing tables, add corresponding comments to all tables and fields.

2. Use a consistent SQL style: keep keyword capitalization consistent and use indentation.

3. Before modifying or deleting important data, make a backup first.

4. In many cases it is a good choice to use exists instead of in, as shown in the sketch below.
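A hedged sketch (the table and column names are illustrative):

-- in: the subquery result is produced first and the outer rows are compared against it
SELECT * FROM user u WHERE u.deptid IN (SELECT id FROM department WHERE status = 1);

-- exists: for each outer row, check whether at least one matching row exists
SELECT * FROM user u WHERE EXISTS (SELECT 1 FROM department d WHERE d.id = u.deptid AND d.status = 1);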

5. For columns in the where clause, watch out for implicit data-type conversion.

Index not used:

SELECT * FROM user WHERE NAME=110

(1) Without single quotes, a string column is being compared with a number, so the types do not match;

(2) MySQL performs an implicit type conversion, converting the strings to numbers before comparing them, so the index on NAME cannot be used; the version that can use the index is sketched below.
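Quoting the value so the types match allows the index to be used (a minimal sketch):

SELECT * FROM user WHERE NAME = '110'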

6. Try to define all columns as NOT NULL

NOT NULL columns save space: a nullable column needs an extra byte as a flag to indicate whether the value is NULL. Nullable columns also require attention to null-pointer issues in application code, and to how NULL behaves in calculations and comparisons.

7. Use a pseudo-delete (soft delete) design.

8. The character sets of the database and its tables should be unified, preferably as UTF8

(1) Can avoid garbled characters;

(2) It can avoid the index invalidation problem caused by the comparison and conversion of different character sets;

9. select count(*) from table;

Such a count without any conditions will cause a full table scan, and has no business meaning, so it must be eliminated.

10. Avoid performing expression operations on fields in where

(1) When the SQL is parsed, if the column is wrapped in an expression, a full table scan is performed;

(2) When the column stands alone without an expression, the index takes effect; see the sketch below.
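A minimal sketch, assuming salary is indexed:

-- Index not used: the expression is applied to the indexed column
SELECT * FROM user WHERE salary * 2 = 10000;

-- Index used: the computation is moved to the constant side
SELECT * FROM user WHERE salary = 10000 / 2;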

11. About temporary tables

(1) Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources;

(2) When creating a new temporary table, if you insert a large amount of data at one time, you can use select into instead of create table to avoid creating a large number of logs;

(3) If the amount of data is small, create table first and then insert, to ease the pressure on system-table resources;

(4) If temporary tables are used, all temporary tables must be explicitly deleted at the end of the stored procedure. First truncate table, then drop table, so as to avoid long-term locking of system tables;

12. Indexes are not suitable for columns with many repeated values (such as gender); columns used for sorting, however, should be indexed

13. Use distinct on as few columns as possible

  1. Statements with distinct consume more CPU time than those without;

  2. When distinct is applied to many columns, the database engine has to compare all of them to filter out duplicate rows;

  3. This comparison and filtering consumes system resources such as CPU time.

14. Avoid large transactions to improve system concurrency

15. All tables must use the Innodb storage engine

InnoDB supports transactions, row-level locks, and better crash recovery, and performs better under high concurrency. Unless there is a special requirement that InnoDB cannot meet (such as column storage or storing spatial data), all tables must use the InnoDB storage engine.

16. Try to avoid using cursors

Cursors are inefficient; if a cursor operates on more than 10,000 rows, the logic should be rewritten.
