10 Index optimization and query optimization

Article directory

  • Index failure cases
  • Join query optimization
      • Left outer joins
      • Inner joins
      • How JOIN works
        • Simple nested loop join (SNLJ)
        • Index nested loop join (INLJ)
        • Block nested loop join (BNLJ)
        • Hash Join
  • Subquery optimization
  • Sort optimization
      • The filesort algorithms: two-pass and single-pass sorting
  • GROUP BY optimization
  • Pagination optimization
  • Prioritize covering indexes
  • Index condition pushdown (ICP)
      • Conditions of use
  • Other query optimization strategies

The most effective way to improve MySQL performance is to design reasonable indexes for your tables. An index lets the server locate a record quickly, which speeds up queries and improves overall database performance. Without an index, a query must scan every record in the table, and when the data volume is large the query becomes very slow.

In most cases, B+ trees are used by default to build indexes.

Whether an index is used is decided by the cost-based optimizer, and the decision also depends on the database version, the data volume, and the selectivity of the data.

Index failure cases

  1. Full-value matching is preferred
    When the query conditions match every column of a composite index exactly, the index is used most efficiently.

  2. Leftmost-prefix rule
    A composite index in MySQL can include up to 16 columns. When retrieving data, MySQL follows the leftmost-prefix matching principle: matching starts from the leftmost column of the composite index.
    For a multi-column index, the filter conditions must follow the order in which the index columns were defined. Once a column is skipped, the columns to its right in the index can no longer be used. If the first column does not appear in the query conditions at all, the composite index is not used.

  3. Primary key insertion order
    If inserted primary key values jump around (suddenly much larger or smaller than their neighbors) instead of increasing monotonically, inserts may cause page splits and record movement.

  4. Applying calculations or functions to an indexed column causes index failure

  5. Type conversion (implicit or explicit) on an indexed column causes index failure

  6. Columns to the right of a range condition in a composite index are not used

  7. Not-equal comparisons (!= or <>) invalidate the index

  8. IS NULL can use the index; IS NOT NULL cannot

  9. LIKE patterns that start with a wildcard (%) invalidate the index

  10. With OR, if the column on either side of it is not indexed, no index is used

  11. Use one character set (e.g. utf8mb4) uniformly for all databases and tables; converting between different character sets invalidates indexes
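As a sketch of several of the cases above, assume a hypothetical table student with a composite index on (age, classid, name); the table and index names are illustrative, not from the original text:

```sql
-- Hypothetical table; composite index column order is (age, classid, name)
CREATE TABLE student (
  id      INT PRIMARY KEY,
  name    VARCHAR(64),
  age     INT,
  classid INT,
  INDEX idx_age_classid_name (age, classid, name)
);

-- Uses the index fully: conditions follow the leftmost prefix
EXPLAIN SELECT * FROM student WHERE age = 30 AND classid = 4 AND name = 'abc';

-- Skips classid: only the age prefix of the index can be used
EXPLAIN SELECT * FROM student WHERE age = 30 AND name = 'abc';

-- Function applied to the indexed column: index not used
EXPLAIN SELECT * FROM student WHERE LEFT(name, 3) = 'abc';

-- Implicit type conversion (string column compared to a number): index not used
EXPLAIN SELECT * FROM student WHERE name = 123;

-- Leading wildcard: index not used (a trailing wildcard, 'abc%', would be fine)
EXPLAIN SELECT * FROM student WHERE name LIKE '%abc';
```

Running EXPLAIN on each statement and checking the `key` and `type` columns shows which predicates can still use the index.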

Join query optimization

Left outer joins

SELECT * FROM `type` LEFT JOIN book ON type.card = book.card;

Here type is the driving table and book is the driven table.
If you can add an index to only one side of a LEFT JOIN, add it to the driven table. A LEFT JOIN always returns every row of the left table regardless, so the cost lies in how quickly matching rows can be found in the right (driven) table.
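Continuing the example, a plausible way to index the driven table (assuming book.card is not yet indexed) might look like:

```sql
-- Index the join column of the driven table so each probe becomes an index lookup
ALTER TABLE book ADD INDEX idx_card (card);

-- EXPLAIN should now show access type "ref" for book instead of a full scan ("ALL")
EXPLAIN SELECT * FROM `type` LEFT JOIN book ON type.card = book.card;
```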

Inner joins

For an inner join, the query optimizer decides which table acts as the driving table and which as the driven table.
If only one of the joined tables has an index on the join column, the table with the index becomes the driven table.
When both tables are indexed, the smaller table is chosen as the driving table ("small drives big").

How JOIN works

Simple nested loop join (SNLJ)

For every row of the driving table, the driven table is scanned in full to find matches. This is the baseline algorithm and the most expensive.

Index nested loop join (INLJ)

The optimization idea is to reduce the number of comparisons against the inner (driven) table, so the driven table must have an index on the join column: each row of the driving table probes that index instead of scanning the whole table.

Block nested loop join (BNLJ)

If the driven table has no index, SNLJ scans it once per driving-table row, so both the number of scans and the amount of IO are very high.
The optimization idea is to reduce the number of passes over the driven table by fetching the driving table's data in blocks. A join buffer is introduced: the relevant columns of the driving table are loaded into the join buffer, then the driven table is scanned, and each driven-table record is compared against all buffered driving-table records in one pass. This greatly reduces how often the driven table must be accessed.
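The size of the join buffer is controlled by the `join_buffer_size` system variable; a larger buffer means fewer passes over the driven table. A quick way to inspect and adjust it for the current session:

```sql
-- Check the current join buffer size (the default is typically 256 KB)
SHOW VARIABLES LIKE 'join_buffer_size';

-- Enlarge it for this session before running a large unindexed join
SET SESSION join_buffer_size = 4 * 1024 * 1024;

-- In EXPLAIN output, BNLJ appears in the Extra column as
-- "Using join buffer (Block Nested Loop)"
```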

Hash Join

Hash join is a common way to join large data sets. The optimizer builds an in-memory hash table from the smaller table, then scans the larger table and probes the hash table to find the matching rows.

  1. This method works best when the smaller table fits entirely in memory; the total cost is then just the sum of the costs of reading each table once.

  2. If the table is too large to fit in memory, the optimizer splits it into partitions, and the partitions that cannot fit are written to temporary segments on disk.

  3. It performs well on large tables without indexes and in parallel query environments.

  4. It can only be applied to equality joins (equi-joins), which follows from the nature of hashing.
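In MySQL 8.0.18 and later, hash join is chosen automatically for suitable equi-joins, and `EXPLAIN FORMAT=TREE` makes it visible (the table and column names here are illustrative):

```sql
-- A hash join appears in the tree-format plan as "Inner hash join";
-- the smaller input is the build side, the larger one the probe side
EXPLAIN FORMAT=TREE
SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c1;
```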

Subquery optimization

Concept: the result of one SELECT serves as a condition of another SELECT statement; subqueries let you nest one query inside another.

Subqueries often execute inefficiently, for three reasons:
1. MySQL must materialize the inner query's result into a temporary table, have the outer query read records from that temporary table, and then drop it. This consumes significant CPU and IO resources and produces many slow queries.
2. The temporary table holding the subquery's result has no indexes, so query performance suffers.
3. The larger the subquery's result set, the greater the impact on performance.

Use join query instead of subquery

SELECT * FROM student stu
WHERE stu.stuno IN (
  SELECT monitor FROM class c
  WHERE monitor IS NOT NULL
);

-- Equivalent join form:
SELECT stu.* FROM student stu
JOIN class c ON stu.stuno = c.monitor
WHERE c.monitor IS NOT NULL;

SELECT stu.* FROM student stu
WHERE stu.stuno NOT IN (
  SELECT monitor FROM class c
  WHERE monitor IS NOT NULL
);

-- Equivalent join form: rows with no match in class have c.monitor IS NULL
SELECT stu.* FROM student stu
LEFT OUTER JOIN class c ON stu.stuno = c.monitor
WHERE c.monitor IS NULL;

Sort optimization

Why use indexes for sorting: an index keeps the data in order, so sorted results can be read directly and efficiently. Filesort, by contrast, is generally performed in memory and consumes more CPU; if the result set to be sorted is large, temporary files must be written to and read back from disk, which is even less efficient.

  1. The index remains effective when the ORDER BY is combined with a LIMIT filter

  2. The ORDER BY columns must appear in the same order as the index columns for the index to be used

  3. The index is used when all ORDER BY columns sort in the same direction (all ASC or all DESC); mixing ASC and DESC invalidates it (before MySQL 8.0 introduced descending indexes)
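A sketch of these rules, reusing the hypothetical idx_age_classid_name (age, classid, name) index from the index-failure section:

```sql
-- ORDER BY order matches the index column order and a small LIMIT is present:
-- the optimizer can typically read rows in index order with no filesort
EXPLAIN SELECT * FROM student ORDER BY age, classid LIMIT 10;

-- Mixed directions (age ASC, classid DESC): falls back to filesort
-- unless a matching descending index exists (MySQL 8.0+)
EXPLAIN SELECT * FROM student ORDER BY age ASC, classid DESC LIMIT 10;
```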

The filesort algorithms: two-pass and single-pass sorting

Two-pass sorting: scan the disk to read the ORDER BY column (plus a row pointer), sort in the sort buffer, and then read the other needed columns from disk according to the sorted list.

Single-pass sorting: read all needed columns from disk at once, sort them in the buffer by the ORDER BY column, and output the result directly. It is more efficient because it avoids reading the data a second time and turns random IO into sequential IO, but it uses more sort-buffer space.

GROUP BY optimization

The same principles as sort optimization apply: GROUP BY can use an index under essentially the same leftmost-prefix and ordering conditions as ORDER BY.

Pagination optimization

  1. Complete the sort and pagination on the index first, then join back to the original table by primary key to fetch the other columns the query needs.

  2. Convert the LIMIT offset into a query starting at a known position.
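Both ideas as sketches (the student table and the concrete offset are illustrative):

```sql
-- 1. Deferred join: paginate on the index alone, then join back by primary key,
--    so only the 10 final rows require full-row lookups
SELECT s.* FROM student s
JOIN (SELECT id FROM student ORDER BY id LIMIT 2000000, 10) p
  ON s.id = p.id;

-- 2. Position-based paging: valid when id is continuous (no gaps from deletes)
SELECT * FROM student WHERE id > 2000000 ORDER BY id LIMIT 10;
```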

Prioritize covering indexes

Concept: a covering index is one whose columns include all the columns the query involves (both the conditions and the select list). The index leaf nodes already store all the data the query needs, so the query can be answered by reading the index alone, with no need to go back to the table.

Benefits: avoids going back to the table; can turn random IO into sequential IO

Disadvantage: maintaining the extra index columns comes at a cost
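A minimal sketch, again assuming the hypothetical idx_age_classid_name (age, classid, name) index:

```sql
-- All referenced columns (age, classid, name, id) live in the secondary index
-- (InnoDB secondary indexes implicitly contain the primary key), so EXPLAIN
-- shows "Using index" in the Extra column: no table access is needed
EXPLAIN SELECT id, age, classid, name FROM student WHERE age = 20;

-- SELECT * breaks the covering property: columns outside the index force
-- a lookup back into the clustered index for every matching row
EXPLAIN SELECT * FROM student WHERE age = 20;
```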

Index condition pushdown (ICP)

Index condition pushdown was introduced in MySQL 5.6. It is an optimization that filters data using the index at the storage engine layer, reducing both the number of times full rows must be fetched from the table and the number of times the MySQL server has to call into the storage engine.

With ICP enabled, if part of the WHERE condition can be evaluated using only columns of the index, the storage engine filters index entries against that part of the condition first, and only reads full rows from the table for the entries that survive.

Conditions of use

  1. It applies only to secondary indexes

  2. In the EXPLAIN output, the access type must be range, ref, eq_ref, or ref_or_null

  3. Only WHERE conditions on columns contained in the index can be pushed down

  4. ICP can be used with MyISAM and InnoDB storage engines

  5. ICP is not used when the query is already satisfied by a covering index (there are no table-row reads to avoid)

  6. Conditions from correlated subqueries cannot use ICP
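The classic illustration, adapted from the MySQL manual (the people table and its index are hypothetical here):

```sql
-- Assume a composite index on (zipcode, lastname, firstname).
-- zipcode = '95054' uses the index normally; lastname LIKE '%etrunia%' cannot
-- (leading wildcard), but with ICP the engine checks it against the index
-- entry before fetching the row, shown as "Using index condition" in EXPLAIN
EXPLAIN SELECT * FROM people
WHERE zipcode = '95054' AND lastname LIKE '%etrunia%';

-- ICP can be toggled off for comparison:
SET optimizer_switch = 'index_condition_pushdown=off';
```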

Other query optimization strategies

  1. The difference between EXISTS and IN
    For SELECT * FROM A WHERE cc IN (SELECT cc FROM B), the rule of thumb is "small drives big": if table A is smaller, EXISTS is preferable; if table B is smaller, IN is preferable.

  2. Efficiency of COUNT(*), COUNT(1), and COUNT(column)
    · COUNT(*) and COUNT(1) both count all result rows; there is essentially no difference between them. With a WHERE clause they count the rows that satisfy the condition; without one they count all rows in the table.
    · With MyISAM, counting the rows of a table is only O(1): each table stores a row_count value in its metadata, kept consistent by table-level locks. InnoDB, which supports transactions through row-level locks and MVCC, cannot maintain such a row_count variable, so it must scan the whole table, which is O(n).
    · In InnoDB, when using COUNT(column) to count specific rows, try to count via a secondary index, because entries in the clustered index carry far more data. For COUNT(*) and COUNT(1) no specific row data needs to be fetched; the system automatically picks the smallest secondary index to do the counting.

  3. SELECT *
    Try not to use SELECT *: the * must be expanded into all column names by consulting the data dictionary, which costs resources and time, and it prevents the use of covering indexes.

  4. The impact of LIMIT 1 on optimization
    For a query that would otherwise do a full scan, if you know there is only one matching row, adding LIMIT 1 stops the scan as soon as a result is found, speeding up the query. If the field already has a unique index, no full scan happens and there is no need to add LIMIT 1.

  5. Use COMMIT more often
    Committing frequently improves program performance because of the resources COMMIT releases: the rollback-segment information used to restore data, the locks acquired by the program's statements, the space in the redo/undo log buffers, and the internal overhead of managing those three resources.