8 Index 2 in MySQL

1. Types of indexes

1. B+ tree index
1. **Each index is a B+ tree**; a secondary index does not contain all the data of the row records.
2. In addition to the key value, each index row in a leaf node also contains a bookmark.
3. A B+ tree is a balanced search tree. Its leaf nodes store the data, and all leaf nodes are on the same level.

What data should be placed in leaf nodes?

1. The index keys must naturally be stored there, because the purpose of the B+ tree is to retrieve data quickly.
2. The table data in the database is the data we really need; the index is just auxiliary data.
3. The B+ tree is implemented on top of the B-tree plus sequential access pointers between leaf nodes. It keeps the balance of the B-tree and improves the performance of range queries through the sequential access pointers.

2. Clustered index
1. InnoDB uses a clustered index: it builds a B+ tree on the primary key of the table and stores the row data of the entire table in the leaf nodes of that B+ tree. In other words, the index is the data, and the data is the index.
2. Since the clustered index is built on the primary key of the table, each table can have only one clustered index.
3. The leaf nodes of the clustered index are the data pages, and each data page stores the complete record of every row it holds. One advantage of the clustered index is therefore that the complete row can be read directly from it. Another is that sorted lookups and range lookups on the primary key are very fast.
4. What if we do not define a primary key? InnoDB uses the first unique index whose columns are all NOT NULL. If there is no such index, InnoDB generates a hidden row ID column as the primary key and builds the clustered index on it.
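
The three cases for choosing the clustering key can be sketched as follows. The table and index names are hypothetical, and the comments describe the expected InnoDB behavior rather than anything the statements print.

```sql
-- Case 1: explicit primary key; the clustered index is built on id.
CREATE TABLE t_pk (
  id   INT NOT NULL,
  name VARCHAR(20),
  PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Case 2: no primary key, but a UNIQUE index on a NOT NULL column;
-- InnoDB clusters the table on code.
CREATE TABLE t_uk (
  code INT NOT NULL,
  name VARCHAR(20),
  UNIQUE KEY uk_code (code)
) ENGINE = InnoDB;

-- Case 3: neither a primary key nor a usable unique index;
-- InnoDB clusters the table on a hidden, internally generated row ID.
CREATE TABLE t_none (
  name VARCHAR(20)
) ENGINE = InnoDB;
```
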
3. Auxiliary index/secondary index/non-clustered index

Each auxiliary index is its own B+ tree, and its leaf nodes do not contain all the data of the row record.

1. The clustered index only helps when the **search condition is the primary key value**, because the data in its B+ tree is sorted by primary key.
2. What if we want to search by other columns? We generally create additional indexes, which are called auxiliary indexes or secondary indexes.
3. In addition to the key value, each index row in a leaf node contains a bookmark that tells the InnoDB storage engine where to find the row data for that index entry. In InnoDB, the bookmark of an auxiliary index is the clustered index key of the corresponding row.

4. Table return (back-to-table lookup)
1. The existence of auxiliary indexes does not affect how data is organized in the clustered index, so each table can have multiple auxiliary indexes.
2. When looking up data through an auxiliary index, the InnoDB storage engine traverses the auxiliary index, reads the primary key value stored in the leaf-level entry, and then finds the complete row record through the primary key index (clustered index). This process is called table return.
That is, fetching a complete user record by an auxiliary index value requires two B+ trees: one auxiliary index and one clustered index.
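
A minimal sketch of the two access paths, assuming a hypothetical table user_demo with a secondary index idx_name. EXPLAIN output depends on the data and MySQL version, so none is shown here.

```sql
CREATE TABLE user_demo (
  id    INT NOT NULL PRIMARY KEY,
  name  VARCHAR(20),
  email VARCHAR(50),
  KEY idx_name (name)
) ENGINE = InnoDB;

-- Lookup by primary key: only the clustered index B+ tree is searched.
EXPLAIN SELECT * FROM user_demo WHERE id = 10;

-- Lookup by name: idx_name yields the matching id values, then each id is
-- looked up in the clustered index to fetch email (the table return).
EXPLAIN SELECT * FROM user_demo WHERE name = 'alice';
```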


Why do we still need a table return operation?

1. If the complete user records were stored in the leaf nodes, no table return would be needed, but that takes up too much space: every new B+ tree would be a full copy of all user records, which wastes storage. Moreover, every change to the data would have to be applied in every index containing it, which performs poorly.
2. Clearly, the fewer records that need a table return, the better the performance; the more records that need one, the worse secondary indexes perform. For some queries a full table scan is even preferable to using a secondary index.

When to use full table scan, and when to use secondary index + table return to execute queries?

1. This is the job of the query optimizer. It computes statistics on the records in the table in advance and uses them to estimate how many records would need a table return for the given query conditions. The more records that need a table return, the more likely it is to choose a full table scan; the fewer, the more likely it is to choose secondary index + table return.
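
One way to observe this decision is to look at the statistics and compare plans with and without the index. This is a sketch on a hypothetical table order_demo; which plan the optimizer actually picks depends on the data distribution.

```sql
CREATE TABLE order_demo (
  id     INT NOT NULL PRIMARY KEY,
  status TINYINT,
  note   VARCHAR(50),
  KEY idx_status (status)
) ENGINE = InnoDB;

-- Refresh the index statistics the optimizer relies on.
ANALYZE TABLE order_demo;

-- The Cardinality column is the estimated number of distinct values per index.
SHOW INDEX FROM order_demo;

-- In the plan, type = ALL means a full table scan; ref or range means
-- the secondary index is used. The rows column is the optimizer's estimate.
EXPLAIN SELECT * FROM order_demo WHERE status = 1;

-- Index hints override the optimizer so both plans can be compared.
EXPLAIN SELECT * FROM order_demo FORCE INDEX (idx_status) WHERE status = 1;
EXPLAIN SELECT * FROM order_demo IGNORE INDEX (idx_status) WHERE status = 1;
```
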
5. Joint index/composite index

The description above carries a hidden assumption: the index is built on a single column. In real production environments, indexes are often built on multiple columns.
Building one index over multiple columns of a table is called a joint index or composite index.

Note:

1. Creating a joint index creates only one B+ tree.
2. If multiple columns are indexed separately, one B+ tree is created per column, so there are as many B+ trees as indexed columns.

Example: introducing the leftmost prefix rule
An index(note, b) means two things for how the index is built:

1. Records are first sorted by the note column.
2. Records with the same note value are then sorted by column b.
// This ordering is the reason the leftmost prefix rule exists.
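
The consequence of that ordering can be sketched on a hypothetical table t_demo with the index(note, b) from the example. The column c and the index name idx_note_b are assumptions added for illustration.

```sql
CREATE TABLE t_demo (
  id   INT NOT NULL PRIMARY KEY,
  note VARCHAR(20),
  b    INT,
  c    INT,
  KEY idx_note_b (note, b)
) ENGINE = InnoDB;

-- Both conditions start at the leftmost column, so idx_note_b can be used.
EXPLAIN SELECT * FROM t_demo WHERE note = 'x';
EXPLAIN SELECT * FROM t_demo WHERE note = 'x' AND b = 1;

-- b is only sorted within groups of equal note values, so a condition on b
-- alone cannot seek into idx_note_b.
EXPLAIN SELECT * FROM t_demo WHERE b = 1;
```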

6. Covering index/index coverage

That is, all the columns a query needs can be read from the auxiliary index itself, so no table return is required.
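
A small sketch of a covered and a non-covered query, using a hypothetical table product_demo. When an index covers the query, the Extra column of EXPLAIN typically shows "Using index".

```sql
CREATE TABLE product_demo (
  id    INT NOT NULL PRIMARY KEY,
  sku   VARCHAR(20),
  price INT,
  descr VARCHAR(100),
  KEY idx_sku_price (sku, price)
) ENGINE = InnoDB;

-- sku and price are index columns, and the primary key id is stored in every
-- secondary index entry, so idx_sku_price covers this query.
EXPLAIN SELECT id, sku, price FROM product_demo WHERE sku = 'A100';

-- descr is not in the index, so fetching it requires a table return.
EXPLAIN SELECT descr FROM product_demo WHERE sku = 'A100';
```
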

7. Hash index
1. The number of lookups in a B+ tree depends on its height. In a production environment the height is generally 3 or 4 levels, so 3 or 4 I/O operations are required.
2. Therefore, the InnoDB storage engine monitors lookups on the indexes of each table. If it observes that an index is accessed frequently, it treats the data as hot and builds a hash index for it internally. This is called the adaptive hash index.
3. Once it is built, the next lookup on that index can compute the location of the record directly with the hash function and find the data in one step, which is considerably more efficient than visiting three or four nodes in the B+ tree.
// Note that the adaptive hash index is created and used only by the database itself; we cannot create one by hand. The only control available is to switch the feature on or off with the innodb_adaptive_hash_index variable.
4. Hash indexes can only serve equality lookups, such as SELECT * FROM table WHERE index_col = xxx. They cannot be used for other search types, such as range searches.
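
Although individual hash entries are built automatically, the feature as a whole can be inspected and switched on or off. The statements below are a sketch; changing a global variable requires the appropriate privilege.

```sql
-- Check whether the adaptive hash index is currently enabled.
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';

-- Disable, then re-enable, the feature at runtime.
SET GLOBAL innodb_adaptive_hash_index = OFF;
SET GLOBAL innodb_adaptive_hash_index = ON;

-- The "INSERT BUFFER AND ADAPTIVE HASH INDEX" section of this report lists
-- hash searches/s and non-hash searches/s.
SHOW ENGINE INNODB STATUS;
```
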
8. Full-text index (rarely used in practice)
1. Full-text search is the technique of finding any piece of content in an entire book or article stored in the database.
2. It can retrieve chapters, sections, paragraphs, sentences, words, and so on from the full text as needed.
3. It can also perform various statistics and analysis. The well-known Elasticsearch, Solr, etc. are full-text search engines.

Inverted index
All keywords contained in the document are extracted and processed, then the correspondence between the keywords and the document is saved, and finally the keywords themselves are indexed and sorted.
When users search for a certain keyword, they first search the index of the keyword, and then find the document through the corresponding relationship between the keyword and the document.

1. Starting with InnoDB 1.2.x, the InnoDB storage engine supports full-text search; the corresponding MySQL version is the 5.6.x series.
2. However, MySQL was designed as a relational database from the start. Although the storage engine supports full-text search, the overall architecture does not support it well and imposes many restrictions. For example, each table can have only one full-text index, and languages without word delimiters, such as Chinese, Japanese, and Korean, are not supported.
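
For reference, a minimal sketch of an InnoDB full-text index and query. The table article_demo and its rows are hypothetical, and the result depends on the parser, stopword list, and minimum token size in effect.

```sql
CREATE TABLE article_demo (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200),
  body  TEXT,
  FULLTEXT KEY ft_title_body (title, body)
) ENGINE = InnoDB;

INSERT INTO article_demo (title, body) VALUES
  ('MySQL index notes', 'B+ tree and hash index basics'),
  ('Search engines', 'Elasticsearch and Solr use inverted indexes');

-- The column list in MATCH must be the same as in the FULLTEXT index definition.
SELECT id, title
  FROM article_demo
 WHERE MATCH (title, body) AGAINST ('index' IN NATURAL LANGUAGE MODE);
```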

2. Use of index

1. Use of index in query
1. An index is a B+ tree. The index allows our query to quickly locate and scan the data records we need, speeding up the query.
2. A SELECT statement generally uses at most one secondary index per table during execution, even if the WHERE condition references columns from several secondary indexes (index merge is the exception).
2. Index creation strategy -> Keep the column type as small as possible
1. The type size here refers to the range of values the type can represent.
2. The smaller the data type, the faster the comparison operations during a query.
3. The smaller the data type, the less storage space the index takes up and the more records fit in one data page. This reduces the performance loss caused by disk I/O and lets more data pages be cached in memory, which speeds up reads and writes.

This advice is more applicable to the primary key of the table.

3. Index creation strategy -> Index selectivity
1. When creating an index, columns with high selectivity/discreteness should be selected.
2. Index selectivity/discreteness refers to the ratio of unique index values (also called cardinality) to the total number of records in the data table (N), ranging from 1/N to 1. The higher the selectivity of the index, the higher the query efficiency, because a highly selective index allows MySQL to filter out more rows when searching.
3. The selectivity of the unique index is 1, which is the best index selectivity and the best performance.
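
Selectivity as defined above can be measured directly. The sketch below uses a hypothetical table member_demo; on a populated table, a value close to 1 marks a good index candidate, while a value close to 0 (typical for a column like gender) marks a poor one.

```sql
CREATE TABLE member_demo (
  id     INT NOT NULL PRIMARY KEY,
  gender TINYINT,
  email  VARCHAR(50)
) ENGINE = InnoDB;

-- Selectivity = number of distinct values / total number of rows.
SELECT COUNT(DISTINCT gender) / COUNT(*) AS gender_selectivity,
       COUNT(DISTINCT email)  / COUNT(*) AS email_selectivity
  FROM member_demo;
```
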
4. Prefix index
1. For BLOB, TEXT, and very long VARCHAR columns, MySQL does not support indexing the entire length, so a prefix index must be created.
// Shortcomings
2. A prefix index is an effective way to make an index smaller and faster, but it has drawbacks: MySQL cannot use a prefix index for ORDER BY or GROUP BY, nor for a covering scan.
3. Sometimes a suffix index is also useful (for example, to find all email addresses for a given domain name).

MySQL does not natively support reversed indexes, but you can store the reversed string and build a prefix index on it. The reversed column can be maintained by triggers or by the application itself.
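
Both ideas can be sketched on a hypothetical table contact_demo. The prefix lengths (10 and 12) and the email_rev column are assumptions for illustration; in practice the prefix length is chosen so that its selectivity approaches that of the full column.

```sql
CREATE TABLE contact_demo (
  id        INT NOT NULL PRIMARY KEY,
  email     VARCHAR(100),
  email_rev VARCHAR(100)   -- hypothetical column holding REVERSE(email)
) ENGINE = InnoDB;

-- Compare the selectivity of candidate prefix lengths with the full column.
SELECT COUNT(DISTINCT LEFT(email, 5))  / COUNT(*) AS sel_5,
       COUNT(DISTINCT LEFT(email, 10)) / COUNT(*) AS sel_10,
       COUNT(DISTINCT email)           / COUNT(*) AS sel_full
  FROM contact_demo;

-- Prefix index on the first 10 characters of email.
CREATE INDEX idx_email_prefix ON contact_demo (email(10));

-- Emulated suffix index: keep the reversed string in email_rev
-- (here filled once; a trigger or the application would keep it in sync).
UPDATE contact_demo SET email_rev = REVERSE(email);
CREATE INDEX idx_email_rev ON contact_demo (email_rev(12));

-- A suffix search on email becomes a prefix search on email_rev.
SELECT email
  FROM contact_demo
 WHERE email_rev LIKE CONCAT(REVERSE('@example.com'), '%');
```
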

5. Create indexes on columns used for searching, sorting or grouping
1. That is, create indexes only for columns that appear in the WHERE clause, join columns in a join clause, and columns in ORDER BY or GROUP BY clauses.
6. Multi-column index

Index column order:

1. The correct order depends on the query using the index, and also needs to consider how to better meet the needs of sorting and grouping.
2. In a multi-column B-Tree index, the order of the index columns means that the index is sorted by the leftmost column first, then the second column, and so on.
3. Therefore, the index can be scanned in ascending or descending order to meet the query requirements of ORDER BY, GROUP BY, and DISTINCT clauses that accurately comply with the column order.

Rule of thumb for multi-column indexes:

1. Put the column with the highest selectivity at the front of the index.
  When sorting and grouping do not need to be considered, it is often a good idea to put the most selective columns first.
  In that case, the index serves only to optimize lookups for WHERE conditions.
7. Three-star index

An index is rated three-star when it meets the following conditions:

1. An index that places related records next to each other earns the first star (27% weight).
2. If the order of the data in the index matches the order the query needs, it earns the second star (the sort star) (27% weight).
3. If the index contains all the columns the query needs, it earns the third star (the wide-index star) (50% weight).

One star:
If the index rows relevant to a query are adjacent, or at least close together, the index slice that must be scanned is as thin as possible. In other words, the smaller the range of the index we scan, the better.
Two stars (sort star):
With the first star satisfied, when the query needs sorting (GROUP BY, ORDER BY) and the order it requires matches the index order (the index itself is ordered), no separate sort is needed. Sorting is generally a key factor affecting performance.
Three stars (wide-index star):
With the second star satisfied, if the index contains all the columns the query needs (the columns in both the WHERE clause and the SELECT clause, that is, a covering index), the query no longer needs a table return. This cuts query steps and I/O requests, and performance can almost double.

Designing a three-star index in practice

CREATE TABLE customer (
  cno    INT,
  lname  VARCHAR(10),
  fname  VARCHAR(10),
  sex    INT,
  weight INT,
  city   VARCHAR(10)
);
CREATE INDEX idx_cust ON customer (city, lname, fname, cno);
-- For the following query, idx_cust is a three-star index
SELECT cno,
       fname
  FROM customer
 WHERE lname = 'xx'
   AND city = 'yy'
 ORDER BY fname;

Explanation:
First star: the columns of all equality predicates (city, lname) are the leading columns of the composite index, which narrows the index slice. Satisfied.
Second star: the ORDER BY column fname follows them in the composite index, and the index is already sorted. Satisfied.
Third star: the cno and fname columns in the SELECT list are both in the composite index. Satisfied.
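
The three stars can be checked against an execution plan. This sketch assumes the customer table and idx_cust index defined above, populated with data; the exact plan can vary by MySQL version and data distribution.

```sql
EXPLAIN
SELECT cno, fname
  FROM customer
 WHERE lname = 'xx'
   AND city = 'yy'
 ORDER BY fname;
-- What to look for in the plan:
--   key   = idx_cust                   (equality columns lead the index)
--   Extra has no "Using filesort"      (index order satisfies ORDER BY)
--   Extra contains "Using index"       (covering index, no table return)
```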