MySQL series – super detailed index explanation (part 2)

The difference between primary key and unique index:

A table can have only one primary key, but it can have multiple unique indexes.
A primary key can be referenced by foreign keys in other tables. (A foreign key does not have to reference the primary key; a unique key also works. In SQL, primary keys and foreign keys are both constraints: when a non-null value is inserted into the foreign key column, the insert is rejected if that value does not exist in the referenced table. A primary key can be composite, that is, made of several columns, but a table still cannot have more than one primary key.)
Primary key columns cannot be NULL, whereas a unique index allows NULL values.
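
The three points above can be sketched in MySQL as follows. The table and column names (accounts, account_logins, email, phone) are hypothetical and chosen only for illustration; this is a minimal sketch assuming the InnoDB engine.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE accounts (
    id    INT          NOT NULL AUTO_INCREMENT,
    email VARCHAR(100) NULL,
    phone VARCHAR(20)  NULL,
    PRIMARY KEY (id),              -- only one primary key per table
    UNIQUE KEY uk_email (email),   -- several unique indexes are allowed
    UNIQUE KEY uk_phone (phone)
) ENGINE = InnoDB;

-- Both rows are accepted: NULLs in a unique index do not collide with each other.
INSERT INTO accounts (email, phone) VALUES (NULL, NULL);
INSERT INTO accounts (email, phone) VALUES (NULL, NULL);

-- A foreign key may reference a unique key instead of the primary key.
CREATE TABLE account_logins (
    login_id      INT          NOT NULL AUTO_INCREMENT,
    account_email VARCHAR(100) NULL,
    PRIMARY KEY (login_id),
    FOREIGN KEY (account_email) REFERENCES accounts (email)
) ENGINE = InnoDB;

-- Rejected: 'nobody@example.com' does not exist in accounts.email.
INSERT INTO account_logins (account_email) VALUES ('nobody@example.com');
```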

Hash index:

A hash index is implemented on top of a hash table. Among MySQL's common storage engines, the MEMORY engine is the one that supports explicit hash indexes (it resolves hash collisions by chaining). InnoDB has a special feature called the **"adaptive hash index"**: when certain index values are accessed very frequently, it builds a hash index in memory on top of the B-Tree index.

Lookups through a hash index are very fast, but the entries are not stored in index-key order, so a hash index cannot be used for sorting. It supports only equality lookups, not range lookups (such as "greater than some value"). When there are many hash collisions (for example, when the column has low selectivity), finding or deleting a row requires walking every entry of the linked list for that hash value; the more collisions, the higher the cost.
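
As a rough illustration of the points above, the following sketch creates an explicit hash index on a MEMORY table and compares an equality lookup with a range lookup. The table session_cache and its columns are hypothetical; the plans that EXPLAIN reports depend on the MySQL version and the data, so no specific output is claimed here.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE session_cache (
    session_id CHAR(32) NOT NULL,
    user_id    INT      NOT NULL,
    INDEX idx_session (session_id) USING HASH
) ENGINE = MEMORY;

-- Equality lookup: the kind of query a hash index is built for.
EXPLAIN SELECT user_id FROM session_cache WHERE session_id = 'abc';

-- Range lookup: a hash index cannot serve this predicate.
EXPLAIN SELECT user_id FROM session_cache WHERE session_id > 'abc';

-- Check whether InnoDB's adaptive hash index is switched on.
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';
```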

Index creation

Create a simple index on the table. Duplicate values are allowed:

CREATE INDEX index_name
ON table_name (column_name);

Create a unique index on the table. A unique index means that no two rows can have the same index value.

CREATE UNIQUE INDEX index_name
ON table_name (column_name);

If you wish to index more than one column, you can list the column names in parentheses, separated by commas:

CREATE INDEX PersonIndex
ON Person (LastName, FirstName);

An index can also be created with ALTER TABLE.
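
A minimal sketch of the ALTER TABLE form, reusing the Person table and the placeholder names from the statements above:

```sql
-- Same effect as: CREATE INDEX PersonIndex ON Person (LastName, FirstName);
ALTER TABLE Person ADD INDEX PersonIndex (LastName, FirstName);

-- Unique variant, with the placeholder names used earlier.
ALTER TABLE table_name ADD UNIQUE INDEX index_name (column_name);

-- List the indexes on a table, then drop one.
SHOW INDEX FROM Person;
ALTER TABLE Person DROP INDEX PersonIndex;
```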

InnoDB clusters data by the primary key. If no primary key is defined, InnoDB tries to use a unique index whose columns are all NOT NULL instead.

If there is no such index either, InnoDB defines a hidden primary key (a 6-byte row_id) and clusters on that. In MySQL, a clustered index cannot be created manually.

A primary key index is a special kind of unique index that does not allow NULL values. The primary key index is usually created together with the table.
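
The three cases can be sketched with three table definitions. The table names are hypothetical, and the sketch assumes a server where tables without a primary key are allowed (the sql_require_primary_key setting is off, which is the default).

```sql
-- Case 1: explicit primary key. InnoDB clusters the rows on id.
CREATE TABLE t_with_pk (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(50),
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Case 2: no primary key, but a unique index on a NOT NULL column.
-- InnoDB uses uk_code as the clustered index.
CREATE TABLE t_with_unique (
    code CHAR(8) NOT NULL,
    name VARCHAR(50),
    UNIQUE KEY uk_code (code)
) ENGINE = InnoDB;

-- Case 3: neither. InnoDB clusters on a hidden 6-byte row_id.
CREATE TABLE t_with_neither (
    name VARCHAR(50)
) ENGINE = InnoDB;
```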

Implementation of the index (After learning the data structure, you can go into more detail)


Generally speaking, an index is too large to be held entirely in memory, so it is usually stored on disk as an index file. Searching the index therefore incurs disk I/O, which costs several orders of magnitude more than a memory access. Because a B-Tree is a multi-way tree, its height is low, which keeps the number of I/O operations per lookup small.

With InnoDB (clustered index), the data file is itself an index file: the index and the data are stored together in one .ibd file. With MyISAM (non-clustered index), the index file (.MYI) and the data file (.MYD) are separate, and the index file stores only the addresses of the data records.

Every InnoDB table has exactly one clustered index. Every other index on the table (ordinary index, unique index) is a secondary index, also called an auxiliary index.

Differences between MyISAM and InnoDB

The MyISAM engine uses a B+Tree as its index structure, and the data field of each leaf node stores the address of the data record. MyISAM's way of indexing is therefore called “non-clustered”.

In the MyISAM figure, the left side is the primary index and the right side is the auxiliary (secondary) index. The two have the same structure; both are B+Trees.

Although InnoDB also uses B+Tree as the index structure, the specific implementation method is quite different from MyISAM.

In the InnoDB figure, the top part is the primary index and the bottom part is the auxiliary index. The auxiliary index is also a B+Tree.

The first major difference is that InnoDB's data file is itself an index file. In MyISAM, the index file and the data file are separate, and the index file stores only the addresses of the data records. In InnoDB, the table data file is an index structure organized as a B+Tree, and **the data field of each leaf node of this tree stores the complete data record**. The key of this index is the table's primary key, so the InnoDB table data file is itself the primary index.

The second difference from MyISAM is that the data field of an InnoDB auxiliary index stores the primary key value of the corresponding record instead of its address. In other words, every InnoDB secondary index uses the primary key as its data field.

In the figure, the ASCII codes of the English characters are used as the comparison criterion (sort order). The clustered index makes searching by primary key very efficient, but a search through an auxiliary index has to go through two indexes: first the auxiliary index, to obtain the primary key, and then the primary index, to fetch the record with that primary key.

Because the actual data pages can be ordered by only one B+Tree, each table can have only one clustered index.
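
The difference between the two access paths can be sketched as follows. The users table and its idx_name index are hypothetical; the comments describe the lookups conceptually, and the actual plan chosen by the optimizer may differ depending on data and version.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE users (
    id   INT         NOT NULL AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    age  INT,
    PRIMARY KEY (id),       -- clustered index: leaf nodes hold the full rows
    KEY idx_name (name)     -- secondary index: leaf nodes hold (name, id)
) ENGINE = InnoDB;

-- One lookup: search the clustered index directly by primary key.
EXPLAIN SELECT * FROM users WHERE id = 10;

-- Two lookups: search idx_name to find the id,
-- then search the clustered index with that id to fetch the row.
EXPLAIN SELECT * FROM users WHERE name = 'Alice';
```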

Joint index (combined index) and covering index

A joint index is also called a composite (compound) index. MySQL uses the columns of a composite index from left to right; a query can use just part of the index, but only a leftmost part. For example, with the index key index (a,b,c), searches on a, on a,b, and on a,b,c can use it (not counting some additional cases that the optimizer can handle), but a search on b,c alone cannot. The index is very effective when the leftmost column is compared with a constant. This is the leftmost-prefix principle.

Joint index implementation: each node entry contains several key columns, and entries are sorted by those columns in order. That order is the column order given when the index was created.
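
The leftmost-prefix rule can be tried out with a sketch like the one below. The table t_abc is hypothetical; the extra column d is there so that SELECT * cannot be answered from the index alone. The comments state what the rule predicts, and the optimizer's actual choice can vary with data and version.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE t_abc (
    id INT NOT NULL AUTO_INCREMENT,
    a  INT NOT NULL,
    b  INT NOT NULL,
    c  INT NOT NULL,
    d  INT,
    PRIMARY KEY (id),
    KEY idx_abc (a, b, c)
) ENGINE = InnoDB;

-- These three conditions start with the leftmost column, so idx_abc is usable.
EXPLAIN SELECT * FROM t_abc WHERE a = 1;
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND b = 2;
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND b = 2 AND c = 3;

-- No condition on a: idx_abc cannot be used to seek to the matching entries.
EXPLAIN SELECT * FROM t_abc WHERE b = 2 AND c = 3;
```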

If you often run queries with conditions on several columns, consider building a joint index. Creating a joint index on (a,b,c) is roughly equivalent to creating the indexes (a), (a,b) and (a,b,c).

With a joint index on (last_name, first_name), MySQL first narrows the search to the records whose last_name matches, and then, within those, to the records whose first_name matches. If we instead create two single-column indexes, one on last_name and one on first_name, MySQL handles the query differently: it picks the more restrictive of the two indexes, that is, the one it estimates will filter out the most rows, and the other index generally goes unused, so the result is not as good as with a multi-column index. (MySQL can sometimes combine two indexes through its index merge optimization, but that is usually still less efficient than a suitable joint index.) If you often query on a single column alone, use a single-column index. (With two single-column indexes a and b, a query generally uses only a or only b.)
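
A sketch of the two setups, so that the plans can be compared with EXPLAIN. The people table and the index names are hypothetical; which index the optimizer actually picks in the second setup depends on the data and the MySQL version.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE people (
    id         INT         NOT NULL AUTO_INCREMENT,
    last_name  VARCHAR(50) NOT NULL,
    first_name VARCHAR(50) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_last_first (last_name, first_name)
) ENGINE = InnoDB;

-- Setup 1: one joint index handles both conditions.
EXPLAIN SELECT * FROM people
WHERE last_name = 'Smith' AND first_name = 'John';

-- Setup 2: replace the joint index with two single-column indexes.
ALTER TABLE people
    DROP INDEX idx_last_first,
    ADD INDEX idx_last (last_name),
    ADD INDEX idx_first (first_name);

-- Compare the key column of this plan with the one from setup 1.
EXPLAIN SELECT * FROM people
WHERE last_name = 'Smith' AND first_name = 'John';
```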

Building one index on several columns is often better than building a separate index on each column: the more indexes there are, the more disk space they occupy and the slower updates become. The column order of a multi-column index also matters. Put the most selective column first, so that the first step filters out as many rows as possible.
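
One common way to compare the selectivity of candidate columns is to divide the number of distinct values by the number of rows; a result closer to 1 means the column is more selective. The customers table below is hypothetical, and the query is only a rough sketch of that calculation.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE customers (
    id         INT         NOT NULL AUTO_INCREMENT,
    last_name  VARCHAR(50) NOT NULL,
    first_name VARCHAR(50) NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Selectivity = distinct values / total rows (NULL on an empty table).
SELECT
    COUNT(DISTINCT last_name)  / COUNT(*) AS last_name_selectivity,
    COUNT(DISTINCT first_name) / COUNT(*) AS first_name_selectivity
FROM customers;
```

The column with the higher ratio would usually be the better candidate for the first position of the joint index, provided the queries actually filter on it.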

A covering index is an index (typically a joint index) that contains all the data the SELECT needs. That is, the selected columns can be obtained from the index alone, without reading the table rows. In other words, the queried columns must be covered by the index being used (one index contains the values of all the columns the query needs).

Covering indexes are especially useful for InnoDB tables, because InnoDB uses a clustered index. An InnoDB secondary index stores the primary key value in its leaf nodes, so a secondary index that covers the query avoids the extra lookup in the primary key index.

There is basically no difference between a covering index and a joint index. "Covering" only describes a joint index in relation to a specific SELECT statement: if, for that statement, the joint index can return the result directly without going back to the table, we say the joint index covers the statement.

Index coverage here also depends on the condition columns: the columns used in the query conditions must match the index starting from its first column, for example:

Index fields	Condition fields	Covered?
a,b,c	a,b	Covered
a,b,c	b,c	Not covered

Benefits of a covering index: when a secondary index covers the query, the lookup in the clustered index is avoided. A covering index also contains only the indexed columns, while the clustered index contains all the row data, so the covering index is smaller and fits in memory more easily.
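
A small sketch of a covered and a non-covered query. The products table and its index are hypothetical. For a covered query, the Extra column of EXPLAIN typically shows "Using index".

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE products (
    id          INT            NOT NULL AUTO_INCREMENT,
    category    VARCHAR(30)    NOT NULL,
    price       DECIMAL(10, 2) NOT NULL,
    description TEXT,
    PRIMARY KEY (id),
    KEY idx_category_price (category, price)
) ENGINE = InnoDB;

-- Covered: category and price are in the index, and id (the primary key)
-- is stored in every secondary index entry, so no row lookup is needed.
EXPLAIN SELECT id, category, price FROM products WHERE category = 'books';

-- Not covered: description lives only in the clustered index,
-- so each match requires a second lookup by primary key.
EXPLAIN SELECT description FROM products WHERE category = 'books';
```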

Possible interview questions about indexing (supplement)

Question 1: The underlying data structure of MySQL indexes

Both MyISAM and InnoDB use a B+Tree. In MyISAM, the index file and the data file are separate; in InnoDB, the data file is itself an index file (the clustered index).

Question 2: Why must an InnoDB table have a primary key, and why is an auto-incrementing integer primary key recommended?

InnoDB clusters data by the primary key. If there is no primary key, InnoDB chooses a unique non-null index. If there is no unique non-null index either, InnoDB implicitly defines a hidden primary key to use as the clustered index.
An auto-incrementing primary key guarantees that new rows arrive in key order, so each new row is simply appended after the last entry of the last B+Tree leaf node. If keys do not arrive in order, an insert has to locate the target position, may split a page, and has to keep the leaf nodes in order, all of which costs extra work.
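
The contrast can be sketched with two hypothetical tables, one keyed by an auto-incrementing integer and one keyed by a non-sequential string (here a UUID, chosen as an assumption for illustration). The comments describe where new rows land conceptually; the actual page-split behavior is internal to InnoDB and not visible from these statements.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE orders_seq (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    payload VARCHAR(100),
    PRIMARY KEY (id)
) ENGINE = InnoDB;

CREATE TABLE orders_uuid (
    id      CHAR(36) NOT NULL,
    payload VARCHAR(100),
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Keys arrive in increasing order: each row goes to the end of the last leaf page.
INSERT INTO orders_seq (payload) VALUES ('a'), ('b'), ('c');

-- Keys do not arrive in sorted order: rows can land in the middle of
-- existing leaf pages, which may force page splits as the table grows.
INSERT INTO orders_uuid (id, payload)
VALUES (UUID(), 'a'), (UUID(), 'b'), (UUID(), 'c');
```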

Question 3: Why do the leaf nodes of a non-primary-key index store the primary key value?

This design **reduces the maintenance cost of secondary indexes when rows move or pages split**, but it requires two index lookups when a row is accessed through a secondary index.