MySQL index failure cannot escape these scenarios~

Foreword

In daily work, SQL is an indispensable skill, but many people don’t pay much attention to SQL issues. As a result, they can’t write SQL well, and overtime is indispensable. There are two reasons, one is that the amount of data is too small, and the other is that the importance of indexes is not realized. This article mainly sorts out the scenarios where SQL fails. If you already know these details, then you must be a person with strong learning ability, worship! After I finished writing this article, I felt that I only knew the “directory” before, but didn’t really understand the contents. If you can follow the rhythm and read this article, you will definitely gain something. At least after I finished writing, I feel that my thinking has become more transparent, and I can face 90% of SQL index questions and interview questions with ease in the future.

Basic data preparation

Prepare a data table as a data demonstration. There are three indexes created in total.

Union Index sname, s_code, address
Primary key index id
Normal index height

SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- Table structure for students
-- ----------------------------
DROP TABLE IF EXISTS `student`;
CREATE TABLE `student` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `sname` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `s_code` int(100) NULL DEFAULT NULL,
  `address` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `height` double NULL DEFAULT NULL,
  `classid` int(11) NULL DEFAULT NULL,
  `create_time` datetime(0) NOT NULL ON UPDATE CURRENT_TIMESTAMP(0),
  PRIMARY KEY (`id`) USING BTREE,
  INDEX `common index`(`height`) USING BTREE,
  INDEX `joint index`(`sname`, `s_code`, `address`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 5 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;

-- ----------------------------
-- Records of student
-- ----------------------------
INSERT INTO `student` VALUES (1, 'Student 1', 1, 'Shanghai', 170, 1, '2022-11-02 20:44:14');
INSERT INTO `student` VALUES (2, 'Student 2', 2, 'Beijing', 180, 2, '2022-11-02 20:44:16');
INSERT INTO `student` VALUES (3, 'become Patrick', 3, 'Jingdong', 185, 3, '2022-11-02 20:44:19');
INSERT INTO `student` VALUES (4, 'Student 4', 4, 'Unicom', 190, 4, '2022-11-02 20:44:25');

Text

In the above SQL, we have created the basic data. Before the verification, we have a few questions

Let’s verify from the top down first

Leftmost matching principle

Written in front: I heard about the leftmost matching principle of the database a long time ago, and I learned about it through various blog forums, but the limitation of these blogs is that their description of the leftmost matching principle is like some mathematical definitions , 123 points are often listed, and if these 123 points are met, the index can be matched, otherwise it cannot.

The leftmost matching principle means that in the joint index, if your SQL statement uses the leftmost index in the joint index, then this SQL statement can use this joint index for matching. The joint index we built above can Used to test the leftmost matching principle sname, s_code, address

Please look at the following SQL statement and think about whether it will use the index

-- joint index sname,s_code,address

1. select create_time from student where sname = "become Pai Daxing" -- will it go to the index?

2. select create_time from student where s_code = 1 -- will it take the index?

3. select create_time from student where address = "Shanghai" -- Will it go to the index?

4. select create_time from student where address = "Shanghai" and s_code = 1 -- Will it go to the index?

5. select create_time from student where address = "Shanghai" and sname = "become Pai Daxing" -- will it go to the index?

6. select create_time from student where sname = "become Pai Daxing" and address = "Shanghai" -- will it go to the index?

7. select create_time from student where sname = "Become Pai Daxing" and s_code = 1 and address = "Shanghai" -- will it go to the index?

Based on your experience, which ones use indexes? You can think about it first and write down the numbers in your mind

Example of walking index

EXPLAIN select create_time from student where sname = "become Pai Daxing" -- Will it go to the index?

Example without index

EXPLAIN select create_time from student where address = "Shanghai" and s_code = 1 -- Will the index be used?

Full table scan rows = 4

If your inner answer is not all right, then read on

The leftmost matching principle is just as the name implies: the leftmost is first, and any continuous index starting from the leftmost can be matched. When range queries (>, <, between, like) are encountered at the same time, matching will stop.

For example: s_code = 2 If you create an index in the order of (sname, s_code), it will not match (sname, s_code) code>) indexed;


But if the query condition is sname = "become Pai Daxing" and s_code = 2 or a=1 (or s_code = 2 and sname = "become Pai Daxing" ), because the optimizer The order of sname, s_code will be adjusted automatically.
Another example is sname = "become Pai Daxing" and s_code > 1 and address = "Shanghai" address is not indexed, because the s_code field is a range query, after it fields will stop matching.
Query without range Index usage type

Use type with range

According to the explanation in the previous article, it can be understood that the meanings of ref and range are still quite different.

think
Why must the left link follow the most left-fixed principle?
verify
I saw a more interesting answer:

 "
You can think that the joint index is the design of the game. For example, if your joint index is state/city/zipCode, then the state is the first level, the city is the second level, and the zipCode is the third level. You must match the first level to match the second level. Off, match the first level and the second level, to match the third level

This description is not entirely accurate, but it is indeed the idea
To understand the leftmost matching principle of the joint index, first understand the underlying principle of the index. The bottom layer of the index is a B + tree, so the bottom layer of the joint index is also a B + tree, but the B + tree nodes of the joint index store key values. Since building a B + tree can only determine the index relationship based on one value, the database relies on the leftmost field of the joint index to build the text is more abstract Let's take a look
Join us to build A, B joint index, what do they look like in the underlying storage?


Orange represents field A


Light green represents field B


Illustration:

We can see several features


A is in sequence 1, 1, 2, 2, 3, 4


B is in no order 1, 2, 1, 4, 1, 2 this is hashed


If A is equivalent, B is ordered. For example, (1, 1), (1, 2) where B is ordered (2, 1), (2,4) B is also ordered


Here you should be able to see that if there is no support for A, the index of B is hashed and not continuous
To be more detailed, we recreate a table
DROP TABLE IF EXISTS `leftaffix`;

CREATE TABLE `leftaffix` (

  `a` int(11) NOT NULL AUTO_INCREMENT,

  `b` int(11) NULL DEFAULT NULL,

  `c` int(11) NULL DEFAULT NULL,

  `d` int(11) NULL DEFAULT NULL,

  `e` varchar(11) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,

  PRIMARY KEY (`a`) USING BTREE,

  INDEX `joint index`(`b`, `c`, `d`) USING BTREE

) ENGINE = InnoDB AUTO_INCREMENT = 8 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
 
-- ----------------------------
-- Records of leftaffix
-- ----------------------------
INSERT INTO `leftaffix` VALUES (1, 1, 1, 1, '1');

INSERT INTO `leftaffix` VALUES (2, 2, 2, 2, '2');

INSERT INTO `leftaffix` VALUES (3, 3, 2, 2, '3');

INSERT INTO `leftaffix` VALUES (4, 3, 1, 1, '4');

INSERT INTO `leftaffix` VALUES (5, 2, 3, 5, '5');

INSERT INTO `leftaffix` VALUES (6, 6, 4, 4, '6');

INSERT INTO `leftaffix` VALUES (7, 8, 8, 8, '7');
SET FOREIGN_KEY_CHECKS = 1;



 "
When creating the index tree, the data will be sorted according to the leftmost suffix principle. It will be sorted by B first, that is, if the values are the same, it will be sorted according to C. If C is the same, it will be sorted according to D. After sorting the order, it will be as follows:




 "
The index generation will be generated according to the order in Figure 2. Let's take a look at what the generated tree data looks like



 "
Explain some of this dendrogram First, according to the sorting in Figure 2, we know that the order is 1111a 2222b, so in the third layer we can see that 1111a is in the first layer, 2222b is in the second layer, because 111 < 222, so 111 enters the second layer and then we get level one


Simplified like this
But this order is relative. This is because the rule for MySQL to create a joint index is to first sort the leftmost first field of the joint index, based on the sorting of the first field, and then sort the second field . So there is no way to use the index for the query condition of B=2.
Seeing this, we can also understand why we don’t recommend creating fields that change frequently when we build an index, because in this case, our index structure will change with your changes, so it consumes performance
Replenish
Tips from the brother in the comment area The leftmost suffix principle can be broken by skip scanning. Simply sort out the knowledge in this area
This is optimized in 8.0
MySQL version 8.0 began to add the function of index skip scanning. When the unique value of the first column index is less, even if the where condition does not have the first column index, the joint index can also be used in the query .
For example, the joint index we use is bcd, but there are fewer fields in b. We did not use b when using the joint index, but we can still use the joint indexMySQL joint index sometimes follows the leftmost prefix matching principle, and sometimes does not. 
small summary
Prerequisites If the b,c,d joint index surface is created


If the condition behind my where is c = 1 and d = 1 why can’t I use the index? If there is no b, the value you query is equivalent to  *11 We all know * means all means that I can match all the data


If my where is followed by  b = 1 and d =1 why is it indexed? The data you are querying is 1*1, I can use the previous 1 for index matching, so I can go to the index


The most important part of the leftmost prefix matching principle is the first field


Let's move on to the next failure scenario
select *
think
Here is a misunderstanding of my thinking before select * will not cause the index to fail. The previous test found that the failure is because the query range behind where is too large, causing the index to fail. It is not caused by Select *, but why is it not recommended to use select *
explain


Increase query analyzer parsing cost.


Adding or subtracting fields is easy to be inconsistent with the resultMap configuration.


Useless fields increase network consumption, especially text type fields.


In Ali's development manual, the above points are summarized in a big way.
When using Select * index use normal

Although the index is gone, this way of writing is not recommended. Why?
First of all, we created a joint index in the previous verification. We use B=1 to go to the index, but it is different from directly querying the index field. Using SELECT* to obtain unnecessary data, first filter through the auxiliary index Data, and then get all the columns through the clustered index, which adds a b + tree query, and the speed will inevitably be much slower. Reducing the use of select * is to reduce the loss caused by returning to the table.


That is, Select * will use the index in some cases. If it does not use the index, the where query range is too large, causing MySQL to choose the full table scan optimally, and it is not a problem of Select *

The above picture is the case of index failure
The range search does not necessarily cause the index to fail

small summary


select * will walk the index


Range search has a probability of index failure, but it will take effect under certain circumstances. If the range is small, it will be used. It can also be understood that if the returned result set is small, the index will be used.


The principle of connection query in mysql is to query the driving table first, and then use the data obtained from the driving table as a condition to query the driven table one by one.



Every time the driving table loads a piece of data into the memory, then all the data in the driven table needs to be loaded into the memory for comparison. The efficiency is very low, so you can specify the size of a buffer pool in mysql. If the buffer pool is large, you can load multiple pieces of data from the driver table at the same time for comparison. The more data you put, the less performance io operations and the better the performance. . Therefore, if you use select * to put some useless columns at this time, it will only occupy the buffer space in vain. Wasted opportunities to improve performance.


According to the brother in the comment area, select * is not the direct cause of index failure. Most of the reasons are the conditions behind where, but it is better to use select * as little as possible, and it will still have an impact


Using functions
Using the function behind Select can use the index, but the following method cannot


Because the index saves the original value of the index field, not the value calculated by the function, naturally there is no way to use the index.
However, starting from MySQL 8.0, the index feature has added a function index, that is, an index can be created for the value after the function calculation, that is to say, the value of the index is the value after the function calculation, so the data can be queried by scanning the index.
I have never used this kind of writing method. I feel that it is relatively rare and it is easier to notice this kind of writing method.
Calculation operations
This situation is the same as above. The reason why the index is invalid is because the original value of the index has been changed. The corresponding data cannot be found in the tree and can only be scanned in full.

Because the index saves the original value of the index field, not the value calculated by the b - 1 expression, it is impossible to go through the index. It can only be done by taking out the values of the index field, and then calculating the expression in turn. Conditional judgment, so the method of full table scan is adopted.
The following calculation will use the index

Those who are more familiar with Java may have some doubts. This kind of simple expression calculation on the index should be able to perform index scanning under the special treatment of the code. For example, changing b - 1 = 6 to b = 6 - 1. Yes, it can be realized, but MySQL still stole this laziness and did not realize it.
small summary
All in all, all in all, as long as it affects the value of the index column, the index is invalid
Like %
1. This is really uncomfortable because I often use this, so I still have to be careful. Before looking at why it fails, let’s take a look at the explanation of Like %


%% wildcard: means that any character appears any number of times (can be 0 times).


_Underscore wildcard: means that only a single character can be matched, neither more nor less, just one character.


like operator: The role of LIKE is to indicate that the search pattern behind mysql is to use wildcards instead of direct equality matching for comparison.


Note: If the like operator is used, the effect of not using the universal matcher is consistent with =,
SELECT * FROM products WHERE products.prod_name like '1000';

2. Match records containing "Li" (including records "Li"):
SELECT* FROM products WHERE products.prod_name like '%Li%';

3. Match records ending with "Li" (including records "Li", excluding records "Li", that is, records with spaces behind Li, which need to be noted here)
SELECT * FROM products WHERE products.prod_name like '%Li';

don't go left, go right
Right: Although it is going, the index level is relatively low, mainly because the scope of fuzzy query is relatively large, so the index level is relatively low

Left: This range is very large, so there is no need to use indexes. This may not be very optimized. Fortunately, it is not always splicing the above

Summary
When indexing, it has a lot to do with the scope of the query. There are many cases where the index is meaningless and invalid because the scope is too large.
Using Or causes the index to fail
The reason is simpler
In the WHERE clause, if the condition column before the OR is an index column, but the condition column after the OR is not an index column, then the index will be invalid. For example, in the following query statement, b is the primary key and e is an ordinary column , Judging from the results of the execution plan, it is a full table scan.

optimization
The optimization method of this is to add indexes on both sides when Or
Indexes are used to avoid full table scans

in improper use
First of all, using In does not necessarily cause a full table scan. IN will definitely use the index, but when the value range of IN is large, the index will become invalid, and the full table scan will be used


in The index fails when the result set is greater than 30%
The failure scenarios for not in and in are the same
order By

This is mainly a problem of Mysql’s own optimization. We all know that OrderBy is sorting, which means that I need to sort the data. If I go to the index, the index is sorted, but I need to go back to the table to consume time. Another way is to directly scan and sort the entire table. No need to return the form


Walk the index + return to the table


Direct full table scan without indexing


Mysql thinks that the speed of direct full table scan is faster than the speed of returning to the table, so it directly uses the index. In the case of Order By, it is a better choice to use the full table scan
Will the subquery go to the index?
The answer is yes, but if you don’t use it well, you won’t
Big summary