Who said that the MySQL index has no pitfalls, I almost “stepped on” all the data and lost it

When you add an index to a MySQL table, but do not use the index when executing SQL queries, you will be very puzzled, why is this? Here I will tell you about several reasons why the index fails, so as to help you avoid stepping on the pit.

  1. Data preparation

First, let’s look at an example. Suppose we have a user table with three fields: id, name, and age. We add an index on the name field.

CREATE TABLE `user` (
  `id` int NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `name` varchar(255) DEFAULT NULL COMMENT 'name',
  `age` int DEFAULT NULL COMMENT 'age',
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`)
) ENGINE=InnoDB COMMENT='user table';
  • 2. Explain in detail

Want to check whether a SQL uses an index? What type of index is used? You can use the explain keyword to view the SQL execution plan. For example:

EXPLAIN SELECT * FROM user WHERE id=1;

The execution result will display a type field, indicating which type of index is used. The common type types are as follows:

1) system: the table has only one row of records, which is a special case of the const type;

2) const: When using primary key or unique index query, only one row matches, which is the most efficient;

3) eq_ref: When using primary key or unique index query, only one row matches, but multi-table joint query is required;

4) ref: ref or in operation is used for non-unique index query;

5) range: when using range index query, such as between, in, etc.;

6) index: full table scan, but only need to read the index tree;

7) all: full table scan, all data needs to be read.

3. Failure reason

Now, let’s look at why indexes fail.

(1) Data type implicit conversion

First of all, we need to pay attention to data type conversion. Suppose we want to query users by name, but the query condition is a numeric type, then data type conversion will occur. Although no error will be reported, the index cannot be used.

For example:

EXPLAIN SELECT * FROM user WHERE name = 'JoJo';
EXPLAIN SELECT * FROM user WHERE name = 18;

The first SQL can use the index, but the second cannot use the index because the query condition is not a string type.

(2) Fuzzy query like starts with %

The second problem is fuzzy query. If we use the like operation on the name field and start with %, the index cannot be used.

EXPLAIN SELECT * FROM user WHERE name LIKE '张%';
EXPLAIN SELECT * FROM user WHERE name LIKE '%张';

Neither of these two SQLs can use indexes, because the like operation requires a full table scan.

(3) The index is not used at the same time before and after or

The third problem is the or operation. If we use the or operation in the query condition, and the index is not used before and after the or, it will lead to a full table scan.

For example:

EXPLAIN SELECT * FROM user WHERE name = 'JoJo' OR age = 18;

Although the name field is indexed, the age field is not indexed, and the entire table will be scanned when using or.

(4) The joint index does not use the first column index

The fourth problem is the joint index. If we build a joint index on (name, age), but only the age field is used in the query condition, the index cannot be used.

When using a joint index, you must follow the leftmost matching principle, first use the first column field, and then use the second column field.

For example:

CREATE TABLE `user` (
  `id` int NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `name` varchar(255) DEFAULT NULL COMMENT 'name',
  `age` int DEFAULT NULL COMMENT 'age',
  PRIMARY KEY (`id`),
  KEY `idx_name_age` (`name`, `age`)
) ENGINE=InnoDB COMMENT='user table';

If we execute the following SQL:

EXPLAIN SELECT * FROM user WHERE age = 18;

Although there is a joint index on (name, age), MySQL does not use it, but uses a full table scan.

(5) Perform calculation operations on the index field

The fifth problem is to perform calculation operations on the index column, and the index cannot be used.

For example:

EXPLAIN SELECT * FROM user WHERE id + 1 = 2;

Calculations are performed on the primary key index, resulting in a full table scan.

(6) Use functions on indexed fields

The sixth problem is to use functions in index columns, and indexes cannot be used.

For example:

EXPLAIN SELECT * FROM user WHERE SUBSTRING(name, 1, 1) = 'Zhang';

Using a function on an indexed column causes a full table scan.

(7) The optimizer chooses the wrong index

The last problem is that the optimizer chooses the wrong index. Sometimes the same SQL query uses the index, but sometimes it doesn’t. What’s going on?

This may be the result of the choice of the optimizer, which will choose whether to use indexes according to the amount of data in the table. When most of the names in the table are Qiao Qiao, at this time, if name=’Qiao Qiao’ is used to query, will the index be used?

The index optimizer will think that using an index is not as fast as a full table scan, so it doesn’t use the index at all.

Of course, we think that the optimizer is wrong, and we can also use FORCE INDEX to force the use of indexes.

For example:

EXPLAIN SELECT * FROM user FORCE INDEX(idx_name) WHERE name = 'Jojo';

Forces the use of the idx_name index.

To sum up, when we use indexes, we need to pay attention to issues such as data type conversion, fuzzy query, or operation, joint index, calculation operation, function operation and wrong choice of optimizer. In order to avoid stepping on the pit, we need to use the index reasonably according to the actual situation to improve the query efficiency.

Here are some practical examples to help you better understand these issues:

Data type implicit conversion

Suppose we have a table with an id field and a value field, where id is an integer type and value is a string type. We created an index on the value field, and now we want to query the records whose value is equal to 1.

CREATE TABLE `test` (
  `id` int(11) NOT NULL,
  `value` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_value` (`value`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `test` VALUES (1,'1'),(2,'2'),(3,'3');

If we execute the following SQL:

EXPLAIN SELECT * FROM test WHERE value = 1;

MySQL will perform data type conversion, convert 1 to a string type, and then query. Since the index is created on the value field and is of string type, the index cannot be used here, and the result is a full table scan.

In order to avoid this problem, we need to ensure that the data types of the query conditions and index columns are consistent, and avoid data type conversion.

Fuzzy query

Suppose we have a table with a name field, and we have built an index on the name field, and now we want to query all records starting with “Zhang”.

CREATE TABLE `test` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `test` VALUES (1,'Zhang San'),(2,'Li Si'),(3,'Wang Wu');

If we execute the following SQL:

EXPLAIN SELECT * FROM test WHERE name LIKE '张%';

Since the fuzzy query requires a full table scan, the index cannot be used here, and the result is a full table scan.

In order to avoid this problem, we need to avoid using fuzzy queries as much as possible. If you must use fuzzy queries, you can consider putting wildcards at the end of the string.

or operation

Suppose we have a table with a name field and an age field, we have built an index on the name field, and now we want to query all records whose name is “Zhang San” or whose age is 18 years old.

CREATE TABLE `test` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `test` VALUES (1,'Zhang San',18),(2,'Li Si',20),(3,'Wang Wu',25);

If we execute the following SQL:

EXPLAIN SELECT * FROM test WHERE name = 'Zhang San' OR age = 18;

Since there is no index on the age field, the index cannot be used here, and the result is a full table scan.

In order to avoid this problem, we need to avoid using the or operation as much as possible. If the or operation must be used, the query condition can be split into multiple queries, and then the results can be combined.

Joint Index

Suppose we have a table with a name field and an age field, we have built a joint index on (name, age), and now we want to query all records whose age is 18 years old.

CREATE TABLE `test` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name_age` (`name`, `age`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `test` VALUES (1,'Zhang San',18),(2,'Li Si',20),(3,'Wang Wu',25);

If we execute the following SQL:

EXPLAIN SELECT * FROM test WHERE age = 18;

Although there is a joint index on (name, age), MySQL does not use it, but uses a full table scan.

In order to avoid this problem, we need to place the query conditions on the leftmost of the joint index as much as possible.

Calculation operations

Suppose we have a table with an id field, we have built an index on the id field, and now we want to query all records with id equal to 2.

CREATE TABLE `test` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `test` VALUES (1),(2),(3);

If we execute the following SQL:

EXPLAIN SELECT * FROM test WHERE id + 1 = 3;

Calculations are performed on the primary key index, resulting in a full table scan.

To avoid this problem, we need to avoid computational operations on indexed columns.

Function Operation

Suppose we have a table with a name field, and we have built an index on the name field, and now we want to query all records starting with “Zhang”.

CREATE TABLE `test` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `test` VALUES (1,'Zhang San'),(2,'Li Si'),(3,'Wang Wu');

If we execute the following SQL:

EXPLAIN SELECT * FROM test WHERE SUBSTRING(name, 1, 1) = 'Zhang';

Using a function on an indexed column causes a full table scan.

To avoid this problem, we need to avoid using functions on indexed columns.

The optimizer picked the wrong index

Suppose we have a table with a name field, and we have built an index on the name field. Now most of the names in the table are “Jojo”, at this time, use name=’Jojo’ to query, Will the index be used?

If we execute the following SQL:

EXPLAIN SELECT * FROM test WHERE name = 'JoJo';

MySQL will think that using an index is not as fast as a full table scan, so it doesn’t need an index at all.

This problem is more difficult, sometimes the optimizer will choose the wrong index, resulting in slow query. If we think that the optimizer optimizes incorrectly, we can use FORCE INDEX to force the use of indexes.

EXPLAIN SELECT * FROM test FORCE INDEX (idx_name) WHERE name = 'Jojo';

Using FORCE INDEX to specify the idx_name index can prevent the optimizer from choosing the wrong index.

The following are some interesting examples to make you understand the use and precautions of MySQL indexes more easily and happily.

(1) The Lure of Sharks

Suppose we have a fish table with a name field and a length field, we have built an index on the name field, and now we want to query all records of sharks whose length is greater than 1 meter.

CREATE TABLE `fish` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `length` decimal(10,2) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `fish` VALUES (1,'tiger shark',2.5),(2,'cat shark',1.8),(3,'clown shark',0.3);

If we execute the following SQL

EXPLAIN SELECT * FROM fish WHERE length > 1;

Since there is no index on the length field, the index cannot be used here, and the result is a full table scan. To avoid this problem, we need to create an index on the length field.

ALTER TABLE fish ADD KEY idx_length (`length`);

Then execute the following SQL:

EXPLAIN SELECT * FROM fish WHERE length > 1;

Now we can use the idx_length index and avoid a full table scan.

(2) Salesman’s confusion

Suppose we have a sales table with an id field and a name field, where the name field stores the name of the salesman. We set up a primary key index on the id field. Now we want to query the records whose salesman’s name starts with “Zhang”.

CREATE TABLE `sales` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `sales` VALUES (1,'Zhang San'),(2,'Li Si'),(3,'Wang Wu');

If we execute the following SQL:

EXPLAIN SELECT * FROM sales WHERE name LIKE '张%';

Since there is no index on the name field, the index cannot be used here, and the result is a full table scan. To avoid this problem, we need to create an index on the name field.

ALTER TABLE sales ADD KEY idx_name (`name`);

Then execute the following SQL:

EXPLAIN SELECT * FROM sales WHERE name LIKE '张%';

Now we can use the idx_name index and avoid a full table scan.

(3) Santa’s gift

Suppose we have a gift table with an id field and a name field, where the name field stores the name of the gift. We created a primary key index on the id field, and now we want to query all gift records and sort them by the name field.

CREATE TABLE `gift` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
INSERT INTO `gift` VALUES (1,'Christmas hat'),(2,'gift bag'),(3,'Elk cookies');

If we execute the following SQL:

EXPLAIN SELECT * FROM gift ORDER BY name;

Since there is no index on the name field, the index cannot be used here. As a result, file sorting is used, which is less efficient. To avoid this problem, we need to create an index on the name field.

ALTER TABLE gift ADD KEY idx_name (`name`);

Then execute the following SQL:

EXPLAIN SELECT * FROM gift ORDER BY name;

Now we can use the idx_name index and avoid file sorting.

(4) Doubts about the World Cup

Suppose we have a football table with an id field and a name field, where the name field stores the name of the football team. We built an index on the name field, now we want to query all records that are not “England” team.

CREATE TABLE `football` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `football` VALUES (1,'Argentina'),(2,'England'),(3,'Portugal');

If we execute the following SQL:

EXPLAIN SELECT * FROM football WHERE name != 'England';

MySQL will think that using an index is not as fast as a full table scan, so it doesn’t use the index at all, and the result is a full table scan. To avoid this problem, we need to change the query condition to use not like.

EXPLAIN SELECT * FROM football WHERE name NOT LIKE 'England';

This way MySQL will use the idx_name index.

Summarize

Using indexes can improve query efficiency, but if used improperly, it will cause the index to fail and even slow down the query. In order to avoid this problem, we need to avoid using fuzzy queries, or operations, calculation operations and function operations as much as possible. At the same time, we need to pay attention to the implicit conversion of data types, the order of joint indexes, and the problem that the optimizer chooses wrong indexes.

Finally, we need to use the EXPLAIN command to view the SQL execution plan to ensure that the SQL can use the index. If SQL cannot use indexes, we can use FORCE INDEX to force the use of indexes.