MySQL5: MySQL data storage file; the index and data in MylSAM are two independent files. How does it find the data through the index? Clustered index/clustered index, why does the secondary index in InnoDB not store addresses but key values? How to understand row ID?

MySQL data storage file
How to implement different storage engine indexes in MySQL
- MYlSAM
- - The index and data in MylSAM are two independent files. How does it find the data through the index?
- InnoDB
- - There are multiple indexes in InnoDB. Will each index put a copy of the complete record in the leaf node?
  - Which index leaf node should be placed for complete records in InnoDB?
  - clustered index/clustered index
  - In other indexes (secondary indexes) in InnoDB, how can I retrieve complete data if there is no data on the leaf nodes?
  - return table
  - Why does the secondary index in InnoDB store key values instead of addresses?
  - What should I do if a table does not have a primary key in InnoDB? Which index leaf node is the complete record placed on? Or, does this table have no index at all? Where is the data placed?
  - How to understand row ID?
MySQL Collection

MySQL is a database that supports plug-in storage engines. In MySQL, each table can specify the storage engine it uses when it is created.

Mainly focus on the implementation of the indexes of the two most commonly used storage engines, MylSAM and InnoDB.

MySQL data storage file

MySQL data is stored on the disk in the form of files. The location of the MySQL database file is determined by the datadir parameter. These locations may vary depending on the specific MySQL version and installation method. Users can also use the MySQL configuration file to (my.cnf) to set the location of the MySQL database file.

The following command can view the location where the data files are stored.

show global variables like 'datadir%';

I’m using Windows here

You can see that in the Windows system, the data file is in C:\ProgramData\MySQL\MySQL Server 5.7\Data

In the data directory of MySQL, each database corresponds to a folder, and the table files of the corresponding database exist in the corresponding named folder. Take the test database as an example, click to enter the folder, and you can find the data table files.

Here we can see that each InnoDB table has two files (.frm and .ibd), and the MylSAM table
There are three files (.frm, .MYD, .MYI).
Data file:

.frm file: It is a table structure definition file, which stores the metadata information of the table, including the fields, indexes, etc. of the table. .frm is a file that defines the table structure in MySQL. It will be generated no matter which storage engine you choose when creating the table.
.MYD file: It is a data file that stores the actual table data.
.MYI file: It is an index file, which stores the index information of the table and is used to speed up queries.

The .frm file is a table structure definition file, which stores the metadata information of the table, including the fields, indexes, etc. of the table. .frm is a file that defines the table structure in MySQL. It will be generated no matter which storage engine you choose when creating the table. Let’s mainly look at how the other two files implement the indexes of different MySQL storage engines.

How to implement different storage engine indexes in MySQL

MylSAM

Inside MylSAM, there are two other files:

.MYD file: D stands for Data, which is the data file of MylSAM. Store actual table data records, such as all table data in the stu_myisam.MYD table.
.MYI file: I stands for Index, which is the index file of MylSAM. To store the index, for example, if we create a primary key index on the id field, then the primary key index is in this index file. An index will have a B + Tree, and all B + Trees are in this .MYI file.

The index and data in MylSAM are two independent files. How does it find the data through the index?

In MylSAM’s B + Tree, the leaf nodes store the disk addresses corresponding to the data files. Therefore, after finding the key value from the index file .MYI, the corresponding data record will be obtained from the data file .MYD.

In MylSAM, other indexes are also in this .MYI file. ?
There is no difference in the way non-primary key indexes store and retrieve data from primary key indexes. There is no primary or secondary distinction. Both of them find the disk address in the index file and then obtain the data in the data file.
This is the implementation form of the index in MylSAM. But it’s different in InnoDB.

InnoDB

In InnoDB, apart from the .frm file, there is only one .ibd file, which means that its indexes and data are stored in this file. That is to say, on the leaf node of the index, it directly stores our data, not the storage address. This is why it is said that in InnoDB the index is the data and the data is the index. This is the reason.

With multiple indexes in InnoDB, will each index put a copy of the complete record in the leaf node?

No, because this will bring additional waste of storage space and computational consumption. An InnoDB table may have many multiple indexes, and it is impossible to store a copy of data on the leaf node of each index, because each additional index will cause the disk space occupied to increase exponentially; and when new data is added, Sometimes, you need to operate on multiple B + Trees, which wastes space and time.
There must be only one copy of the complete record.

Which index leaf node should be placed for complete records in InnoDB?

There is a special index, which is called 'Clustered Index (Clustered Index)'. The so-called clustered index means that the logical order of the index key values is consistent with the physical storage order of the table data rows. , complete record rows are stored on its leaf nodes.

Clustered index/clustered index

'Clustered index (clustered index)' means that the logical order of the index key values is consistent with the physical storage order of the table data rows, and its leaf nodes store complete record rows. For example, the dictionary directory is sorted by Pinyin, and the content is also sorted by Pinyin. This directory sorted by Pinyin is called a clustered index.

The way InnoDB organizes data is (clustered index organize table). If a table creates a primary key index, then this primary key index is a clustered index, which determines the physical storage order of data rows.

In other indexes (secondary indexes) in InnoDB, how can I retrieve complete data if there is no data on the leaf nodes?

In InnoDB, there is a primary and secondary distinction between primary key indexes and auxiliary indexes. If there is a primary key index, then the primary key index is a clustered index, and other indexes are collectively called 'secondary index'.

The leaf nodes of the secondary index do not store complete data, so retrieving data must go to the leaf nodes of the primary key index to get all its records. In other words, it needs to store something that can complete it and reach the leaf nodes of the primary key index. Something to get data from.

The secondary index stores the key value of the secondary index. For example, if an index is created on name, the value of name is stored on the node, 'Susan', etc. (Obviously, its key The logical order of values is inconsistent with the order of physical rows). The leaf node of the secondary index stores the value of the primary key corresponding to this record. For example, Susan id = 3.

The secondary index needs to scan its own B + Tree first, and then find the leaf nodes of the primary key index based on the leaf nodes of the secondary index to get the data. From this perspective, because the primary key index scans one less B + Tree than the secondary index (avoiding table backing), it is relatively faster.

Reply to form

Table return: Non-primary key index. We first find the key value of the primary key index through the index, and then use the primary key value to find out the data that is not in the index. It scans one more tree than the query based on the primary key index. Index tree, this process is called table return.

The process of retrieving data through secondary index is as follows:
When we use the name index to query a record, it will find name=Susan in the leaf node of the secondary index, get the primary key value, which is id = 3, and then Then go to the leaf node of the primary key index to get the data.

Why does the secondary index in InnoDB store key values instead of addresses?

Because the address will change. When your data is constantly added and modified, B + Tree is also constantly split and merged, and the address of the data will also change accordingly. If other indexes record addresses, the B + of other indexes will also change accordingly. It is too troublesome to change the Tree all over again, so other indexes store an unchanged value, which is the value of the clustered index.

What should I do if a table does not have a primary key in InnoDB? Which index leaf node is the complete record placed on? Or, does this table have no index at all? Where is the data placed?

MySQL official website introduction: https://dev.mysql.com/doc/refman/5.7/en/innodb-index-types.html

If we define a primary key (PRIMARY KEY), then InnoDB will choose the primary key as the clustered index
If the primary key is not explicitly defined, InnoDB will select the first unique index that does not contain a NULL value as the primary key index.
If there is no such unique index, InnoDB will choose the built-in 6-byte long row ID as a hidden clustered index, which will increment the primary key as row records are written.

How to understand row ID?

When there is a numeric single-column primary key in the table, the row ID refers to this primary key column.
When there is no primary key in the table, but there is a non-null unique column of numeric type, the row ID actually refers to the corresponding non-null unique column.

It can be understood like this:
The row ID is not a real column. The essence of the row ID is an alias of a non-empty, unique column.

Row ID query cannot be used in the following three situations:

The type of the primary key column or the non-null unique column is not a numeric type – the primary key is a clustered index
The primary key is the joint primary key – the primary key is the clustered index
The only column is not non-null, and the row ID does not point to any column. Instead, a 6-byte hidden column is created as a clustered index (cannot be found explicitly, does not mean it does not exist)

You can query the row ID information of a table with the following statement

select _rowid from stu_innodb;

If there is no index on this table, the query _rowid will not be found. Although it cannot be found explicitly, it does not mean that it does not exist.

At this time, add a unique index to the id and do not allow it to be null.

Query again

Here you can see that the query result of row ID is the result set of the unique index id just added, so row ID is essentially an alias of a non-empty and unique column.

MySQL Collection

MySQL1: MySQL development history, popular branches of MySQL and their corresponding storage engines
MySQL2: How is a SQL query executed in MySQL?
MySQL3: How is an update SQL executed in MySQL?
MySQL4: What is an index; index type; index storage model development: 1. Binary search, 2. Binary search tree, 3. Balanced binary tree, 4. Multi-way balanced search tree, 5. B + tree, 6. Why not use indexes Red and black trees? 7.What does InnoDB’s hash index refer to?
MySQL5: MySQL data storage file; the index and data in MylSAM are two independent files. How does it find the data through the index? Clustered index/clustered index, why does the secondary index in InnoDB not store addresses but key values? How to understand row ID?
MySQL6: Index usage principles, joint indexes, joint primary keys/composite primary keys, covering indexes, what is table return? Index condition push down, index creation and use, index creation and use, index failure