Thoughts on MySQL optimization [storage engine, index]

For SQL optimization, today we first look at the basic knowledge:

  • ? Storage engines. Understanding the characteristics and applicable scenarios of MySQL’s different storage engines can help you make wise decisions in database design and application development.
  • ? Index, when the table does not have an index, querying the data may be a full table query; when the index is created, the index is first searched, and the data is retrieved based on the index, which improves the efficiency of data acquisition. How you build the index is critical.

1 Storage Engine

1.1 MySQL Architecture Layer

  • ? Connection layer, which is the client and connection service, mainly completes tasks such as connection processing, authorization authentication, etc.
  • ? The service layer is responsible for completing most core service functions, such as SQL interfaces, cached queries, SQL analysis and optimization, and the execution of some built-in functions. All cross-storage engine functions are also implemented in this layer, such as procedures, functions, etc.
  • ? Engine layer, the storage engine is responsible for data storage and retrieval, and the server communicates with the storage engine through API. Different storage engines have different functions. You can choose the appropriate storage engine according to your own needs.
  • ? Storage layer, stores data on the file system and completes interaction with the storage engine

Picture

1.2 Introduction to storage engines

The storage engine is the implementation of technologies such as storing data, creating indexes, and updating/querying data. The storage engine is based on tables and can be specified in the dimensions of the table.

View the storage engine of the current table

--View the table creation statement and storage engine. The default is InnoDB.
show create table emp;

Specify storage engine when creating a new table

create table table name(
  Field field type [comment field comment]
  ...
  
)engine=innodb [comment table comment];

Check the storage engines supported by the current database

show engines;

1.3 Storage engine features

1.3.1 InnoDB

  • introduce

    • InnoDB is a general-purpose storage engine that takes into account high reliability and high performance. After MySQL5.5, it serves as the default storage engine of MySQL.
  • Features

    • DML operations (add, delete, modify) follow the ACID model and support transactions
    • Row-level locks to improve concurrent access performance
    • Support foreign key constraints to ensure data integrity and correctness
  • document

    • xxx.idb: xxx represents the table name. Each table of the InnoDB storage engine will correspond to such a table space file, which stores the table structure (frm, sdi), data and indexes of the table.

Picture

Picture

1.3.2 MyISAM

  • introduce

    • MyISAM is the early default storage engine of MySQL
  • Features

    • Does not support transactions, does not support foreign keys

    • Supports table locks, but does not support row locks

    • Fast access

  • document

    • xxx.sdl: stores table structure information
    • xxx.MYD: store data
    • xxx.MYI: Storage index

1.3.3 Memory

  • introduce
    • The table data of the Memory engine is stored in the memory. Due to hardware problems or power outages, these tables can only be used as temporary tables or cache tables.
  • Features
    -Memory storage
    • hash index
  • document
    • xxx.sdi: stores table structure information

1.4 Storage engine selection

When selecting a storage engine, you need to select an appropriate storage engine based on the characteristics of the application system. For complex application systems, you can also choose a combination of multiple storage engines based on the actual situation.

  • InnoDB: It is the default storage engine of MySQL and supports transactions and foreign keys. If the application has relatively high requirements for transaction integrity and requires data consistency in concurrent scenarios, and data operations include many update and delete operations in addition to insertion and query, all InnoDB storage engines are good choices.
  • MyISAM: If the application is mainly based on read operations and insert operations, there are only a few update and delete operations, and the requirements for transaction integrity and concurrency are not very high. But the actual scenario will use NoSQL
  • Memory: Save all data in memory, with fast access speed. It is usually used for temporary tables and caches. The disadvantage is that there is a limit on the size of the table. Tables that are too large cannot be cached in memory, and data security cannot be guaranteed. Using NoSQL (Redis) in actual scenarios

2 Index

Index is a data structure (ordered) that helps MySQL obtain data efficiently. When the table does not have an index, the query data may be a full table query; after the index is created, the index is first searched and the data is retrieved based on the index, which improves the efficiency of data acquisition.

2.1 Advantages and Disadvantages of Index

td>

Advantages Disadvantages
Improve the efficiency of data retrieval and reduce the cost of database IO cost The index column also takes up space
Sort the data through the index column to reduce the cost of data sorting and reduce CPU consumption The index greatly improves the query efficiency, and also reduces the speed of updating the table. For example, when inserting, updating, and deleting the table, the efficiency is reduced

2.2 Index structure

MySQL’s index is implemented at the storage engine layer, and different storage engines have different structures.

  • B + Tree index is a common index type. Most storage engines support B + tree index, such as the common InnoDB, MyISAM, and Memory storage engines.
  • Hash index, the underlying data structure is implemented using a hash table. Only queries that exactly match the index are valid. Range queries are not supported.
  • R-tree (spatial index), spatial index is a special index type of the MyISAM engine, mainly used for geospatial data types, rarely used
  • Full-tree (full-text index) is a way to quickly match documents by building an inverted index, similar to Lucene, Solr, and ES

Binary tree data structure

The leaf node on the left is smaller than the parent node

Picture

When the binary tree is inserted sequentially, a linked list will be formed, and the query performance will be greatly reduced. In the case of large amounts of data, the hierarchy will be deep and the retrieval speed will be slow. Red-black trees and balanced binary trees can solve the problem of forming a linked list, but there are still problems with deep levels and slow retrieval speed when there is a large amount of data.

B-Tree (multi-path balanced search tree) Take a b-Tree with a maximum degree (max-degree) of 5 (order 5) as an example (each node stores up to 4 keys and 5 pointers), which can be found at https View the B-Tree data structure at ://www.cs.usfca.edu/~galles/visualization/BTree.html

picture

Features of B+Tree

  • All elements will appear in leaf nodes
  • The paging nodes above only serve as indexes, and the leaf nodes are used to store data.
  • All leaf nodes form a one-way linked list

Take a B + Tree with a maximum degree of 4 as an example,

The data structure can be viewed at https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html

Picture

The MySQL index data structure optimizes the classic B + Tree. Based on the original B + Tree, a linked list pointer pointing to adjacent child nodes is added to form a sequential B + Tree, which improves the efficiency of interval access. performance.

Hash structure, hash index uses a certain hash algorithm to replace the key value with a new hash value, map it to the corresponding slot, and then store it in the hash table

  • Hash index can only be used for peer comparison (=, in) and does not support range query (between, >, <,...)
  • Unable to use index to complete sort operation
  • Query efficiency is high, usually only one retrieval is required, and the efficiency is usually higher than B + Tree index

Why InnoDB storage engine uses B+Tree index structure

  • Compared with binary trees, there are fewer levels and high search efficiency.
  • For B-Tree, data will be saved regardless of whether it is a leaf node or a non-leaf node. This will reduce the key values stored in a page and the pointers will also be reduced. To save a large amount of data, you can only increase the height of the tree, resulting in reduced performance.
  • Compared with Hash index, B + Tree supports range matching and sorting operations

2.3 Index classification

  • Primary key index, an index established for the primary key in the table, can only have one, primary key
  • Unique index, there can be multiple, unique
  • Regular indexes, there can be multiple, to quickly locate specific data
  • Full-text index, full-text search searches for keywords in the text, rather than comparing values in the index. There can be multiple, fulltext

Classification in InnoDB

  • Clustered index puts data storage and index together. The leaf nodes of the index structure store row data. There must be one and only one
  • Secondary Index (Second Index) stores data and indexes separately. The leaf nodes of the index structure are associated with the corresponding primary keys, and there can be multiple

Clustered index selection rules

  • If a primary key exists, the primary key index is a clustered index
  • If there is no primary key, the first unique index will be used as the clustered index
  • If there is no primary key and no suitable unique index, InnoDB will automatically generate a rowid as a hidden clustered index

The difference between clustered index and secondary index

  • Clustered index hangs row data
  • The secondary index is linked to the id of the data.

Picture

Return table query

  • Secondary index to query the ID
  • Query the corresponding data through clustered index through ID

Picture

Example:

--In the user table, id is the primary key, and an index is added to name
– This query can be queried directly using the clustered index (id as the primary key)
select * from user where id = 10;

– This query needs to use the secondary index first, then return to the table and use the clustered index query

select * from user where name = Xiao Ming’;

How high is the B+Tree height of InnoDB primary key index?

Assume that the size of a row of data is 1k, and 16 rows of such data can be stored in one page. InnoDB pointers occupy 6 bytes of space, and even if the primary key is bigint, the number of bytes occupied is 8. Since the size of each page is 16K.

  • The height is 2: that is, the stored row data is 117116 = 18736, n*8 + (n + 1)6=161024, and the calculated n is about 1170
    • That is, the stored row data is 1171*16 = 18736
  • The height is 3: 1171117116=21939856
  • If the data in the table exceeds tens of millions, you need to consider optimization methods such as sub-database and sub-table.

2.4 Index syntax

Create index

create [unique|fulltext] index index_name on table_name(Field 1, Field 2,...);

View index

show index from table_name;

Delete index

drop index index_name on table_name;
-- Create database
CREATE TABLE `user` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(20) NOT NULL,
  `age` int NOT NULL,
  `city` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

--Create index
create index index_name on user(name);

-- View index
show index from user;

--Create index
create index index_name_age_city on user(name,age,city);

-- Delete index
drop index index_name_age_city on user;

2.5 Rules for using indexes

2.5.1 Leftmost Prefix Rule

If multiple columns are indexed (joint index), the leftmost prefix rule must be followed. The leftmost prefix rule means: the query starts from the leftmost column of the index and does not skip columns in the index. If a column is skipped, the index will be partially invalid (the subsequent field indexes will be invalid)

For example, in the user table, the index created is id + name + age

  • Situations where indexes can be used
    • The column to be queried is id
    • The column to be queried is id + name
  • When multiple indexes cannot be used
    • The column to be queried is name + age

Note: After testing, it was found that the indexes on the MySQL8.0 version did not fail! ! !

2.5.2 Range query

In the joint index, a range query (<, >) occurs, and the column index on the right side of the range query is invalid.

--That is, the column index of city will be invalid.
explain select * from user where name='Xia Ming' and age > 30 and city ='Beijing';

-- In queries >= and <=, the city column index will not be invalid.
explain select * from user where name='Xia Ming' and age >= 30 and city ='Beijing';

2.5.3 Operations on index columns will invalidate the index

Do not perform operations on index columns, the index will become invalid.

create index index_age on user(age);


explain select * from user where substr(user.age,1,1)='3';

2.5.4 If the string is not quoted, the index will be invalid

When using string type fields without quotation marks, the index will be invalid.

create index index_age on user(name,age,status);

-- Indexes can be used
explain select * from test.user where name='Xia Ming' and age = 30 and status ='1';

-- Some indexes are invalid
explain select * from test.user where name='Xia Ming' and age = 30 and status =1;

2.5.5 Fuzzy query

If there is only a tail fuzzy match, the index will not be invalidated. If it is a header fuzzy match, the index will be invalid.

create index index_name on test.user(name);

explain select * from test.user where user.name like 'Zhang%';

-- index invalid
explain select * from test.user where user.name like '%三';

-- index invalid
explain select * from test.user where user.name like '%三%';

2.5.6 Conditions for or connection

For query conditions separated by or, if there is no index on one side of or (there is one in the front but not in the back; there is no one in the front and one in the back), the index will be invalid.

-- Only id is the index. When querying using or connection, the index will be invalid.
explain select * from test.user where id = 1 or name='Xia Ming';

2.5.7 Impact of data distribution

If MySQL evaluates that using all will be slower than a full table scan, the index will not be used

create index index_age on test.user(age);

-- Since age is greater than 10 in all data,
explain select * from test.user where age >= 10;


-- After modifying the conditions, the index is used
explain select * from test.user where age >= 30;

2.5.8 Prompt the engine

Prompts the storage engine which index to use:

  • use index
  • ignore index
  • force index
-- Create 2 indexes
create index index_name_age_city on user(name,age,city);

create index index_name on user(name);

explain select * from test.user where name='Xia Ming';

-- It is recommended to use index
explain select * from test.user use index (index_name) where name='Xia Ming';
-- Ignore using index
explain select * from test.user ignore index (index_name) where name='Xia Ming';
-- Force the use of indexes
explain select * from test.user force index (index_name) where name='Xia Ming';

2.5.9 Covering index

Try to use covering indexes (the query uses indexes, and the queried fields contain index fields), and reduce the use of information in select(*) Extra

  • using index condition: The index is used in the query, but the data needs to be returned to the table.
  • using where; using index: The index is used in the query, and the fields to be searched are all on the index. There is no need to return the table to query the data.

2.5.10 Single column index and joint index

  • Single column index: that is, an index contains only a single column
  • Joint index: that is, an index contains multiple columns

In actual business scenarios, if there are multiple queries, when considering creating an index for the query field, it is recommended to establish a joint index. When querying with multiple conditions, the MySQL optimizer will evaluate which field has a higher indexing efficiency and select that index to complete the query. Inquire

2.6 Index design principles

  1. Create indexes for tables with large amounts of data and frequent queries
  2. Create indexes for fields that are often used as query conditions (where), sorting (order by), and grouping (group by) operations
  3. Try to choose columns with high differentiation as indexes, and try to build unique indexes. The higher the differentiation, the more efficient the use of indexes.
  4. If it is a string type field and the length of the field is very long, you can create a prefix index based on the characteristics of the field.
  5. Try to use joint indexes and reduce singleton indexes. When querying, joint indexes can often cover the index, save storage space, avoid table backs, and improve query speed.
  6. It is necessary to control the number of indexes. The more indexes, the better. The more indexes, the greater the cost of maintaining the index structure, which will affect the efficiency of additions, deletions and modifications.
  7. If the index column cannot store null values, the not null constraint will be used when creating the table. When the optimizer knows whether each column contains a null value, it can better determine which index is most efficient.