MySQL sorting, grouping, and LIMIT optimization strategies

Table of Contents

1. Two sorting methods in MySQL

2. Sorting optimization strategy

2.1 Add an index to the sort field

2.2 Create a composite index with the WHERE field

2.3 Optimize the FileSort method

3. Grouping optimization strategy

3.1 Use WHERE instead of HAVING where possible

3.2 Reduce ORDER BY, GROUP BY, DISTINCT

3.3 Follow the leftmost prefix rule

4. LIMIT optimization strategy

1. Two sorting methods in MySQL

MySQL supports two main ways of producing sorted results: Index and FileSort.

Index: the sort column has an index. Because an index is already stored in order, MySQL can read the rows in index order and return them directly, with no separate sort step, which is very efficient.

FileSort: if no usable index provides the required order, MySQL sorts the rows itself after reading them, in a memory area called the sort buffer, and only then returns the result. The more rows there are, the longer the sort takes. If the rows do not fit in the sort buffer, MySQL sorts them one chunk at a time, writes each sorted chunk to a temporary file on disk, and then merges the chunks; this extra I/O makes the sort much slower.
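A quick way to see which of the two methods MySQL chose is EXPLAIN. The sketch below assumes the article's employees table: when the Extra column contains "Using filesort", MySQL sorts the rows after reading them; when it does not, the rows come out of an index already in order.

```sql
-- Show the execution plan without running the query.
-- Extra contains "Using filesort": rows are sorted after they are read (FileSort).
-- No "Using filesort": rows are read in index order (Index), so no sort step is needed.
EXPLAIN
SELECT employee_id, salary
FROM employees
ORDER BY salary;
```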

2. Sorting optimization strategy

2.1 Add an index to the sort field

From the two methods above, it is clear that Index sorting is more efficient than FileSort, so it is best to add an index to the sort column. The query can then read rows that are already in order and skip the sort step.

As shown below, I query the employees table and sort by the salary field. The salary field has no index yet; the query returns 107 rows and takes 0.024 seconds.

Next, I create a normal (non-unique) index on the salary field and run the query again.
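The index step can be sketched as below; the index name idx_salary is a hypothetical choice. One caveat worth checking: for SELECT * without LIMIT, the optimizer may still choose FileSort, because reading every row through a secondary index costs one table lookup per row. Selecting only columns stored in the index, or adding a LIMIT, makes the index order more attractive. In InnoDB a secondary index also stores the primary key, so the second EXPLAIN is fully served by idx_salary, assuming employee_id is the primary key.

```sql
-- Hypothetical index name; any name works.
CREATE INDEX idx_salary ON employees (salary);

-- Re-check the plan: "Using filesort" disappears if MySQL reads rows in index order.
EXPLAIN SELECT * FROM employees ORDER BY salary;

-- Covering variant: salary is in the index, and InnoDB secondary indexes also
-- store the primary key (assumed to be employee_id), so no table lookups are needed.
EXPLAIN SELECT employee_id, salary FROM employees ORDER BY salary;
```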

After adding the index, the same 107 rows are returned in 0.017 seconds. The difference may look negligible, but only because the table has just over a hundred rows; with many thousands of rows, the gap becomes much larger.

2.2 Create a composite index with the WHERE field

A query that sorts usually also filters with WHERE. In that case, consider an index that covers both the WHERE column and the ORDER BY column. If they are the same column, a single-column index on it is enough. If they are different columns, create a composite (joint) index with the WHERE column first and the ORDER BY column second. Be aware that if the WHERE condition is a range search (for example >, < or BETWEEN), the index columns after it can no longer be used, so even though the sort column is part of the composite index, MySQL cannot use it for ordering and falls back to FileSort. When designing a composite index, follow the leftmost prefix principle so that the index can actually take effect.

As shown below, the department_id and salary fields each have their own index at this point, but there is no composite index, so the query uses only the index on department_id.

I then create a composite index on (department_id, salary) and run the query again.

The query returns the same results; with the composite index it takes 0.017 seconds, 0.02 seconds faster than with the department_id index alone.
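The composite index and the range caveat from this section can be sketched as follows. The index name idx_dept_salary and the department values are hypothetical; the point is the contrast between an equality filter, where rows for one department come out of the index already sorted by salary, and a range filter, where they do not.

```sql
-- Composite index: WHERE column first, ORDER BY column second (leftmost prefix).
CREATE INDEX idx_dept_salary ON employees (department_id, salary);

-- Equality on department_id: within one department the index is ordered by salary,
-- so MySQL can return rows in index order without "Using filesort".
EXPLAIN
SELECT employee_id, department_id, salary
FROM employees
WHERE department_id = 50
ORDER BY salary;

-- Range on department_id: the matching rows span several departments and are not
-- ordered by salary overall, so expect "Using filesort" again
-- (unless the optimizer picks another index that is ordered by salary).
EXPLAIN
SELECT employee_id, department_id, salary
FROM employees
WHERE department_id > 50
ORDER BY salary;
```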

2.3 Optimize the FileSort method

Sometimes FileSort cannot be avoided. MySQL implements it with two algorithms: two-pass sorting (dual-way sorting) and single-pass sorting (single-way sorting).

Two-pass sorting: the data is read twice. MySQL first reads only the sort column and a row pointer for each row into memory and sorts them. After sorting, it uses the pointers to read the complete rows again, in sorted order.

Single-pass sorting: MySQL reads the sort column together with all the other columns the query needs in a single pass, sorts these rows, and returns them directly without going back to the table. This is usually faster than two-pass sorting, because the second read of two-pass sorting fetches rows in sorted order, which is random I/O, and random I/O is much slower than sequential I/O. The cost is memory: each row in the sort buffer is wider, so fewer rows fit, and large results are more likely to spill to temporary files. Since memory is now relatively cheap, single-pass sorting is usually the recommended choice.
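To see which algorithm a particular query used, the MySQL optimizer trace can help. This is a hedged sketch: the trace's filesort_summary section reports a sort_mode, where a value containing rowid indicates the two-pass algorithm and one containing additional_fields (or packed_additional_fields) indicates single-pass; the exact strings vary between MySQL versions. The hire_date column is assumed to exist and be unindexed so that a FileSort actually happens, and the sort_buffer_size value is only an illustration.

```sql
-- Record how the optimizer executes the following statements in this session.
SET optimizer_trace = 'enabled=on';

-- Assumed unindexed column, so this ORDER BY needs a FileSort.
SELECT employee_id, hire_date FROM employees ORDER BY hire_date;

-- Look for "sort_mode" inside "filesort_summary" in the TRACE column.
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';

-- Example value (4 MB): a larger sort buffer lets more single-pass rows
-- fit in memory before the sort spills to temporary files.
SET SESSION sort_buffer_size = 4194304;
```

Selecting only the columns you need, rather than SELECT *, also keeps single-pass rows narrow, so more of them fit in the sort buffer.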

3. Grouping optimization strategy

3.1 Use WHERE instead of HAVING where possible

HAVING is also a filtering keyword, and it can filter on the results of aggregate functions. However, conditions that can be written in WHERE should not be written in HAVING. WHERE removes rows before grouping, so the small number of rows that remain takes very little time to group or sort; HAVING only filters after all the groups have been built. Use HAVING only for conditions that need aggregate results.
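The difference can be sketched with the employees table; the department IDs and the salary threshold below are hypothetical values. The first two queries return the same result, but the second discards rows before grouping instead of after.

```sql
-- Less efficient: every row is grouped first, then most groups are thrown away.
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING department_id IN (50, 80);

-- Preferred: rows outside the two departments are discarded before grouping.
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
WHERE department_id IN (50, 80)
GROUP BY department_id;

-- HAVING is still the right place for a condition on an aggregate result.
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
WHERE department_id IS NOT NULL
GROUP BY department_id
HAVING AVG(salary) > 8000;
```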

3.2 Reduce ORDER BY, GROUP BY, DISTINCT

For a database, sorting, grouping, and deduplication are relatively expensive operations that consume CPU and memory. If all of them are done in the database, slow queries become much more likely, so consider doing them on the application side: after the database returns the data, sort, group, and deduplicate it in the program code.

3.3 Follow the leftmost prefix rule

GROUP BY uses indexes in almost the same way as ORDER BY: the grouping columns should follow the leftmost prefix principle of the index.
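A sketch of the leftmost prefix rule for GROUP BY, assuming the hypothetical composite index idx_dept_salary (department_id, salary) is the only index on these columns. If a separate index on salary alone also exists, as in section 2.1, the optimizer may use it for the second query, so the contrast is clearest without it.

```sql
-- Assumed index: idx_dept_salary (department_id, salary).

-- Grouping columns start from the leftmost index column: groups can be read in index order.
EXPLAIN
SELECT department_id, salary, COUNT(*)
FROM employees
GROUP BY department_id, salary;

-- Grouping skips the leftmost column (department_id): the index order does not match,
-- so Extra typically shows "Using temporary".
EXPLAIN
SELECT salary, COUNT(*)
FROM employees
GROUP BY salary;
```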

4. LIMIT optimization strategy

Some cases are extreme. In the query below, I fetch the ten records after the 10,000th record. MySQL has to read the first 10,010 rows in employee_id order and then discard the first 10,000, returning only ten; most of that work is wasted.

SELECT * FROM employees ORDER BY employee_id LIMIT 10000,10;

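We can therefore rewrite the query to skip the first 10,000 records with a WHERE condition, so that the scan starts directly at the 10,001st record through the employee_id index (assumed here to be the primary key), which is much more efficient. Deep paging like this is rare in practice, but if you need it, use 10000 as a WHERE filter condition, as below. The rewrite returns the same rows as the original only if employee_id values run consecutively from 1 with no gaps; in general, filter on the last employee_id returned by the previous page.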

SELECT * FROM employees WHERE employee_id > 10000 ORDER BY employee_id LIMIT 10;
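In application code, the boundary is usually not a fixed 10000 but the last employee_id of the page the user just saw. A minimal sketch with a MySQL prepared statement, where the statement name next_page and the user variable @last_id are hypothetical:

```sql
-- Keyset pagination: seek past the last row of the previous page through the
-- employee_id index instead of reading and discarding OFFSET rows.
PREPARE next_page FROM
  'SELECT * FROM employees WHERE employee_id > ? ORDER BY employee_id LIMIT 10';

-- Last employee_id returned by the previous page (hypothetical value).
SET @last_id = 10000;
EXECUTE next_page USING @last_id;

DEALLOCATE PREPARE next_page;
```

The cost of each page stays roughly constant however deep the user pages, because MySQL never counts past the skipped rows; the trade-off is that users can only move to the next page, not jump to an arbitrary page number.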