8 special “pit” SQL writing methods, the performance is reduced by 100 times

Let me share with you some common SQL “bad problems” and optimization techniques.

The execution sequence of SQL statements:

1. LIMIT statement

Pagination query is one of the most commonly used scenarios, but it is also usually the place where problems are most likely to occur. For example, for the following simple statement, the general DBA thinks of adding a composite index to the type, name, and create_time fields. In this way, conditional sorting can effectively use the index, and the performance is rapidly improved.

SELECT *</code><code>FROM operation</code><code>WHERE type = 'SQLStats'</code><code> AND name = 'SlowLog'</code><code>ORDER BY create_time </code><code>LIMIT 1000, 10;

Well, maybe more than 90% of DBAs solve this problem and stop here. But when the LIMIT clause becomes LIMIT 1000000,10, the programmer will still complain: I only fetch 10 records, why is it still slow?

You must know that the database does not know where the 1000000th record starts, even if there is an index, it needs to be calculated from scratch. This kind of performance problem occurs, in most cases, the programmer is lazy.

In scenarios such as front-end data browsing and page turning, or batch export of big data, the maximum value of the previous page can be used as a parameter as a query condition. The SQL is redesigned as follows: ?

SELECT *</code><code>FROM operation</code><code>WHERE type = 'SQLStats'</code><code>AND name = 'SlowLog'</code><code>AND create_time > '2017-03-16 14:00:00'</code><code>ORDER BY create_time limit 10;

Under the new design, the query time is basically fixed and will not change as the amount of data grows.

2. Implicit conversion

Type mismatch between query variable and field definition in SQL statement is another common mistake. For example, the following statement: ?

mysql> explain extended SELECT *</code><code> > FROM my_balance b</code><code> > WHERE b.bpn = 14000000123</code><code> > AND b.isverified IS NULL ;</code><code>mysql> show warnings;</code><code>| Warning | 1739 | Cannot use ref access on index 'bpn' due to type or collation conversion on field 'bpn'

Among them, the field bpn is defined as varchar(20), MySQL’s strategy is to convert strings to numbers before comparison. The function acts on the table field, and the index is invalid.

The above cases may be auto-filled parameters of the application framework, rather than the original intention of the programmer. Nowadays, there are many application frameworks that are very complicated. While it is convenient to use, be careful that it may dig a hole for yourself.

3. Association update and delete

Although MySQL 5.6 introduces the materialization feature, special attention needs to be paid to its current optimization only for query statements. For updating or deleting, it needs to be manually rewritten as JOIN.

For example, in the following UPDATE statement, MySQL actually executes a loop/nested subquery (DEPENDENT SUBQUERY), and its execution time can be imagined. ?

UPDATE operation o</code><code>SET status = 'applying'</code><code>WHERE o.id IN (SELECT id</code><code> FROM (SELECT o.id,</code><code> o.status</code><code> FROM operation o</code><code> WHERE o.group = 123</code><code> AND o.status NOT IN ( 'done' )</code><code> ORDER BY o.parent,</code><code> o.id</code><code> LIMIT 1) t);

Execution plan:

 + ---- + -------------------- + ------- + ------- + ---- ----------- + --------- + --------- + ------- + ------ + --- -------------------------------------------------- + </code><code>| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |</code><code> + ---- + -------- ------------ + ------- + ------- + --------------- + ----- ---- + --------- + ------- + ------ + ----------------- --------------------------------- + </code><code>| 1 | PRIMARY | o | index | | PRIMARY | 8 | | 24 | Using where; Using temporary |</code><code>| 2 | DEPENDENT SUBQUERY | | | | | | | | Impossible WHERE noticed after reading const tables |</code><code>| 3 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where; |</code><code> + ---- + -------------------- + ------- + ------- + --------------- + --------- + --------- + ------- + ----- - + ------------------------------------------------ ----- + 

After rewriting to JOIN, the selection mode of the subquery is changed from DEPENDENT SUBQUERY to DERIVED, and the execution speed is greatly accelerated, from 7 seconds to 2 milliseconds.

UPDATE operation o</code><code> JOIN (SELECT o.id,</code><code> o.status</code><code> FROM operation o</code><code> WHERE o. group = 123</code><code> AND o.status NOT IN ( 'done' )</code><code> ORDER BY o.parent,</code><code> o.id</code><code> LIMIT 1) t</code><code> ON o.id = t.id</code><code>SET status = 'applying'

The execution plan simplifies to: ?

 + ---- + ------------- + ------- + ------ + ------------ --- + ------- + --------- + ------- + ------ + ---------- ---------------------------------------- + </code><code>|id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |</code><code> + ---- + ------------- + ---- --- + ------ + --------------- + ------- + --------- + ----- -- + ------ + ---------------------------------------- ------------- + </code><code>| 1 | PRIMARY | | | | | | | Impossible WHERE noticed after reading const tables |</code><code>| 2 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where; Using filesort |</code><code> + ---- + ------------ - + ------- + ------ + --------------- + ------- + --------- + ------- + ------ + ---------------------------------- ------------------- + 

4. Mixed sorting

MySQL cannot use indexes for mixed sorting. But in some scenarios, there are still opportunities to use special methods to improve performance. ?

SELECT *</code><code>FROM my_order o</code><code> INNER JOIN my_appraise a ON a.orderid = o.id</code><code>ORDER BY a.is_reply ASC,</code><code> code><code> a.appraise_time DESC</code><code>LIMIT 0, 20

The execution plan shows a full table scan: ?

 + ---- + ------------- + ------- + -------- + ---------- --- + --------- + --------- + --------------- + --------- + - + </code><code>| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra</code><code> + ---- + -------- ----- + ------- + -------- + ------------- + --------- + --- ------ + --------------- + --------- + - + </code><code>| 1 | SIMPLE | a | ALL | idx_orderid | NULL | NULL | NULL | 1967647 | Using filesort |</code><code>| 1 | SIMPLE | o | eq_ref | PRIMARY | PRIMARY | 122 | > + ---- + ------------- + ------- + -------- + --------- + -- ------- + --------- + ----------------- + --------- + - + </ pre>
<p>Since <code>is_reply</code> only has two states of 0 and 1, after we rewrite it according to the following method, the execution time is reduced from 1.58 seconds to 2 milliseconds. ?</p>
<pre>SELECT *</code><code>FROM ((SELECT *</code><code> FROM my_order o</code><code> INNER JOIN my_appraise a</code><code> ON a.orderid = o.id</code><code> AND is_reply = 0</code><code> ORDER BY approve_time DESC</code><code> LIMIT 0, 20)</code><code> UNION ALL</code> <code> (SELECT *</code><code> FROM my_order o</code><code> INNER JOIN my_appraise a</code><code> ON a.orderid = o.id</code><code> AND is_reply = 1</code><code> ORDER BY approve_time DESC</code><code> LIMIT 0, 20)) t</code><code>ORDER BY is_reply ASC,</code><code> approvetime DESC</code><code>LIMIT 20;

5. EXISTS statement

When MySQL treats the EXISTS clause, it still uses the execution method of nested subqueries. Such as the following SQL statement: ?

SELECT *</code><code>FROM my_neighbor n</code><code> LEFT JOIN my_neighbor_apply sra</code><code> ON n.id = sra.neighbor_id</code><code> AND sra .user_id = 'xxx'</code><code>WHERE n.topic_status < 4</code><code> AND EXISTS(SELECT 1</code><code> FROM message_info m</code><code> WHERE n .id = m.neighbor_id</code><code> AND m.inuser = 'xxx')</code><code> AND n.topic_type <> 5

The execution plan is: ?

 + ---- + -------------------- + ------- + ------ + ----- + ------------------------------------------ + ------ --- + ------- + --------- + ----- + </code><code>| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |</code><code> + ---- + -------------------- + ------- + -- ---- + ----- + --------------------------------------- --- + --------- + ------- + --------- + ----- + </code><code>| 1 | PRIMARY | n | ALL | | NULL | NULL | NULL | 1086041 | Using where |</code><code>| 1 | PRIMARY | >| 2 | DEPENDENT SUBQUERY | m | ref | | idx_message_info | 122 | const | 1 | Using index condition; Using where |</code><code> + ---- + ---------- ---------- + ------- + ------ + ----- + ------------------ ------------------------ + --------- + ------- + ------- -- + ----- + 

Removing exists and changing to join can avoid nested subqueries and reduce the execution time from 1.93 seconds to 1 millisecond. ?

SELECT *</code><code>FROM my_neighbor n</code><code> INNER JOIN message_info m</code><code> ON n.id = m.neighbor_id</code><code> AND m .inuser = 'xxx'</code><code> LEFT JOIN my_neighbor_apply sra</code><code> ON n.id = sra.neighbor_id</code><code> AND sra.user_id = 'xxx'</code><code>WHERE n.topic_status < 4</code><code> AND n.topic_type <> 5

New execution plan:?

 + ---- + ------------- + ------- + -------- + ----- + ---- -------------------------------------- + --------- + - ---- + ------ + ----- + </code><code>|id|select_type|table|type|possible_keys|key|key_len|ref|rows|Extra|</code> <code> + ---- + ------------- + ------- + -------- + ----- + ---- -------------------------------------- + --------- + - ---- + ------ + ----- + </code><code>| 1 | SIMPLE | m | ref | | idx_message_info | 122 | const | 1 | Using index condition |</code><code>| 1 | SIMPLE | n | eq_ref | | PRIMARY | 122 | ighbor_id | 1 | Using where |</code><code>| 1 | SIMPLE | Using where |</code><code> + ---- + ------------- + ------- + -------- + --- -- + ------------------------------------------ + ---- ----- + ----- + ------ + ----- + 

6. Conditional push down

Situations where external query conditions cannot be pushed down to complex views or subqueries include:

  • aggregation subquery;

  • A subquery with LIMIT;

  • UNION or UNION ALL subqueries;

  • subquery in the output field;

As in the following statement, it can be seen from the execution plan that its condition is applied after the aggregation subquery: ?

SELECT *</code><code>FROM (SELECT target,</code><code> Count(*)</code><code> FROM operation</code><code> GROUP BY target) t</code><code>WHERE target = 'rm-xxxx'</code><code> + ---- + ------------- + --------- --- + ------- + --------------- + ------------- + -------- - + ------- + ------ + ------------- + </code><code>|id|select_type|table|type|possible_keys|key | key_len | ref | rows | Extra |</code><code> + ---- + ------------- + ------------ + - ------ + --------------- + ------------- + --------- + --- ---- + ------ + ------------- + </code><code>| 1 | PRIMARY | <derived2> | ref | <auto_key0> | > | 514 | const | 2 | Using where |</code><code>| 2 | DERIVED | operation | index | idx_4 | idx_4 | 519 | NULL | 20 | Using index |</code><code> + -- -- + ------------- + ------------ + ------- + ------------ --- + ------------- + --------- + ------- + ------ + ------- ------ + 

After confirming that the semantic query condition can be directly pushed down, rewrite it as follows: ?

SELECT target,</code><code>Count(*)</code><code>FROM operation</code><code>WHERE target = 'rm-xxxx'</code><code>GROUP BY target

The execution plan becomes:?

 + ---- + ------------- + ----------- + ------ + -------- ------- + ------- + --------- + ------- + ------ + --------- ----------- + </code><code>| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |</code><code> + -- -- + ------------- + ----------- + ------ + -------------- - + ------- + --------- + ------- + ------ + --------------- ----- + </code><code>| 1 | SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Using where; Using index |</code><code> + ---- + ------------- + ----------- + ------ + --------------- + ------- + --------- + ------- + ------ + ----------------- --- + 

For a detailed explanation about the fact that MySQL external conditions cannot be pushed down, please refer to:

http://mysql.taobao.org/monthly/2016/07/08

7. Narrow down the scope in advance

First the initial SQL statement: ?

SELECT *</code><code>FROM my_order o</code><code> LEFT JOIN my_userinfo u</code><code> ON o.uid = u.uid</code><code> LEFT JOIN my_productinfo p</code><code> ON o.pid = p.pid</code><code>WHERE ( o.display = 0 )</code><code> AND ( o.ostaus = 1 )</code><code>ORDER BY o.selltime DESC</code><code>LIMIT 0, 15

The original meaning of the SQL statement is: first do a series of left joins, and then sort the first 15 records. It can also be seen from the execution plan that the estimated number of sorted records in the last step is 900,000, and the time consumption is 12 seconds. ?

 + ---- + ------------- + ------- + -------- + ---------- ----- + --------- + --------- + ----------------- + ------ -- + ----------------------------------------------- ----- + </code><code>|id|select_type|table|type|possible_keys|key|key_len|ref|rows|Extra|</code><code> + ---- + --- ---------- + ------- + -------- + --------------- + ------ --- + --------- + ----------------- + -------- + --------- ------------------------------------------- + </code><code>| 1 | SIMPLE | o | ALL | NULL | NULL | NULL | NULL | 909119 | Using where; Using temporary; Using filesort |</code><code>| 1 | SIMPLE | u | eq_ref | PRIMARY | | o.uid | 1 | NULL |</code><code>| 1 | SIMPLE | p | ALL | PRIMARY | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) |</code><code> + ---- + ------------- + ------- + -------- + --------------- + --------- + --------- + ----------- --- + -------- + ------------------------------------- --------------- + 

Since the last WHERE condition and sorting are all for the leftmost main table, you can sort my_order first to reduce the amount of data in advance and then do the left join. After the SQL is rewritten as follows, the execution time is reduced to about 1 millisecond. ?

SELECT *</code><code>FROM (</code><code>SELECT *</code><code>FROM my_order o</code><code>WHERE ( o.display = 0 )</code><code> AND ( o.ostaus = 1 )</code><code>ORDER BY o.selltime DESC</code><code>LIMIT 0, 15</code><code>) o</code> <code> LEFT JOIN my_userinfo u</code><code> ON o.uid = u.uid</code><code> LEFT JOIN my_productinfo p</code><code> ON o.pid = p.pid</code> code><code>ORDER BY o.selltime DESC</code><code>limit 0, 15

Check the execution plan again: After the subquery is materialized (select_type=DERIVED) participate in JOIN. Although the estimated row scan is still 900,000, the actual execution time becomes very small after using the index and the LIMIT clause. ?

 + ---- + ------------- + ------------ + -------- + ----- ---------- + --------- + --------- + ------- + -------- + -- -------------------------------------------------- + </code><code>| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |</code><code> + ---- + -------- ----- + ------------ + -------- + --------------- + ------ --- + --------- + ------- + -------- + ---------------- --------------------------------- + </code><code>| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 15 | Using temporary; Using filesort |</code><code>| 1 | PRIMARY | u | eq_ref | PRIMARY | code><code>| 1 | PRIMARY | p | ALL | PRIMARY | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) |</code><code>| 2 | DERIVED | o | index | NULL | idx_1 | 5 | NULL | 909112 | Using where |</code><code> + ---- + ------- ------ + ------------ + -------- + --------------- + ----- ---- + --------- + ------- + -------- + ------------------ ---------------------------------- + 

8. Push down the intermediate result set

Let’s look at the following example that has been preliminarily optimized (the main table in the left join takes priority as the query condition): ?

SELECT a.*,</code><code> c.allocated</code><code>FROM (</code><code> SELECT resourceid</code><code> FROM my_distribute d</code> <code> WHERE isdelete = 0</code><code> AND cusmanagercode = '1234567'</code><code> ORDER BY salecode limit 20) a</code><code>LEFT JOIN</code><code> (</code><code> SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated</code><code> FROM my_resources</code><code> GROUP BY resourcesid) c</code><code>ON a.resourceid = c.resourcesid

Are there any other problems with this statement? It is not difficult to see that the subquery c is a full-table aggregation query, which will lead to a decrease in the performance of the entire statement when the number of tables is particularly large.

In fact, for subquery c, the final result set of the left join only cares about the data that can match the resourceid of the main table. So we can rewrite the statement as follows, and the execution time drops from 2 seconds to 2 milliseconds. ?

SELECT a.*,</code><code> c.allocated</code><code>FROM (</code><code> SELECT resourceid</code><code> FROM my_distribute d</code> <code> WHERE isdelete = 0</code><code> AND cusmanagercode = '1234567'</code><code> ORDER BY salecode limit 20) a</code><code>LEFT JOIN</code><code> (</code><code> SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated</code><code> FROM my_resources r,</code><code> (</code><code> SELECT resourceid</code><code> FROM my_distribute d</code><code> WHERE isdelete = 0</code><code> AND cusmanagercode = '1234567'</code><code> ORDER BY salecode limit 20) a</code><code> WHERE r.resourcesid = a.resourcesid</code><code> GROUP BY resourcesid) c</code><code>ON a.resourceid = c.resourcesid

But the subquery a occurs multiple times in our SQL statement. This way of writing not only has additional overhead, but also makes the entire statement significantly complicated. Rewrite again using the WITH statement: ?

WITH a AS</code><code>(</code><code> SELECT resourceid</code><code> FROM my_distribute d</code><code> WHERE isdelete = 0</code><code> AND cusmanagercode = '1234567'</code><code> ORDER BY salecode limit 20)</code><code>SELECT a.*,</code><code> c.allocated</code><code>FROM a</code><code>LEFT JOIN</code><code> (</code><code> SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated</code><code> FROM my_resources r ,</code><code> a</code><code> WHERE r.resourcesid = a.resourcesid</code><code> GROUP BY resourcesid) c</code><code>ON a.resourceid = c. resource_id

Summarize

The database compiler generates an execution plan, which determines how SQL is actually executed. But the compiler is just doing its best, and not all database compilers are perfect. Most of the scenarios mentioned above also have performance problems in other databases. Only by understanding the characteristics of the database compiler can we avoid its shortcomings and write high-performance SQL statements. When designing data models and writing SQL statements, programmers should bring in the idea or awareness of algorithms.

To write complex SQL statements, get into the habit of using the WITH statement. Concise and clear SQL statements can also reduce the burden on the database.

Source: developer.aliyun.com/article/72501

The knowledge points of the article match the official knowledge files, and you can further learn related knowledge MySQL entry skill tree SQL advanced skillsCTE and recursive query 44710 people are studying systematically