Region, segment, fragmented area and table space structure

Area, segment, fragmented area and table space structure

Structural diagram

In addition, in the database, there are also concepts of extent, segment and tablespace. The relationship between rows, pages, areas, segments, and table spaces is as shown in the figure below:

Regions, segments and fragmented areas

Why should there be zones?

The pages in each level of the B+ tree will form a doubly linked list. If storage space is allocated in units of pages, the physical locations between two adjacent pages in the doubly linked list may be very far apart. When we introduced the applicable scenarios of the B + tree index, we specifically mentioned that the range query only needs to locate the leftmost record and the rightmost record, and then scan along the doubly linked list. If the two adjacent records in the linked list Pages whose physical locations are very far apart are so-called random I/O. Once again, the speed of the disk is several orders of magnitude different from the speed of the memory. Random I/O is very slow, so we should try to make the physical locations of adjacent pages in the linked list adjacent when performing range queries. Only then can you use so-called sequential I/O.

Introducing the concept of area, an area is 64 consecutive pages in a physical location. Because the default page size in innoDB is 16KB, the size of an extent is 64*16KB= 1MB. When the amount of data in the table is large, when allocating space for an index, it is no longer allocated in units of pages, but in units of zones. Even when there is a lot of data in the table, it can be allocated in one go. Multiple contiguous zones. Although it may cause a little waste of space (the data is not enough to fill the entire area), from a performance perspective, it can eliminate a lot of random IO, and the work outweighs the disadvantages!

Why do we need paragraphs?

For range queries, the records in the leaf nodes of the B + tree are actually sequentially scanned. If the leaf nodes and non-leaf nodes are not distinguished, and all the pages represented by the nodes are placed in the applied area, the effect of range scanning will be It’s a big discount. Therefore, innoDB treats leaf nodes and leaf nodes of the B + tree differently, that is to say, leaf nodes have their own unique areas, and non-leaf nodes also have their own unique areas. The set of areas that store leaf nodes is considered a segment, and the set of areas that store non-leaf nodes is also considered a segment. In other words, an index will generate 2 segments, a leaf node segment and a non-leaf node segment.

In addition to the leaf node segments and non-leaf node segments of the index, InnoDB also has segments defined for storing some special data, such as rollback segments. Therefore, common segments include data segment, index segment, and rollback segment. The data segment is the leaf node of the B + tree, and the index segment is the non-leaf node of the B + tree.
In the innoDB storage engine, the management of segments is completed by the engine itself, and the DBA cannot and does not need to control it. This simplifies the DBA’s management of segments to a certain extent.
A segment does not actually correspond to a continuous physical area in the table space, but is a logical concept consisting of several scattered pages and some complete areas.

Why is there a fragmentation area?

By default, a table using the InnoDB storage engine has only one clustered index. One index will generate 2 segments, and the segments apply for storage space in units of zones. By default, one zone occupies 1M (64*16Kb = 1024Kb) storage Space, we also have leaf node areas and non-leaf node areas, so by default, does a small table that only stores a few records also need 2M of storage space? In the future, every time you add an index, you will need to apply for 2M of more storage space. What? This is a huge waste for tables that store relatively few records. The crux of this problem is that the areas we have introduced so far are very pure, that is, an entire area is allocated to a certain segment, or all pages in the area exist to store data in the same segment. Even if the data in the segment does not fill all the pages in the zone, the remaining pages cannot be used for other purposes.
In order to consider the situation that allocating a complete area to a certain segment is too wasteful of storage space for tables with small data volumes, lnnoDB proposes the concept of a fragment area. In a fragmented area, not all pages exist to store data of the same segment, but the pages in the fragmented area can be used for different purposes. For example, some pages are used for segment A, and some pages are used for segment A. Segment B, some pages do not even belong to any segment. The fragmented area belongs directly to the table space and does not belong to any segment.
So the strategy for allocating storage space for a certain segment is as follows:

When data is first inserted into the table, the segment is allocated storage space in single page units from a fragmented area.
When a segment has occupied 32 fragment area pages, it will apply to allocate storage space in complete area units.
So now a segment cannot only be defined as a collection of certain areas, but more accurately should be a collection of some scattered pages and some complete areas.

District classification

Districts can generally be divided into 4 types:

Free area (FREE): No pages in this area are currently used.
Fragmentation area with remaining space (FREE_FRAG): Indicates that there are still available pages in the fragmentation area.
Fragmentation area with no remaining space (FULL_FRAG): Indicates that all pages in the fragmentation area are used and there are no free pages.
Extent attached to a segment (FSEG): Each index can be divided into leaf node segments and non-leaf node segments.

Areas in the three states of FREE, FREE_FRAG and FULL_FRAG are independent and belong directly to the table space. The area in FSEG state is attached to a certain segment.

If the table space is compared to a group army, sections are equivalent to divisions and districts are equivalent to regiments. Generally, groups belong to a certain division, just like the areas in FSEG all belong to a certain segment, but the areas in the three states of FREE, FREE_FRAG and FULL_FRAG are directly affiliated with the table space, just like independent groups. It’s like taking orders directly from the military.

Table space

The table space can be regarded as the highest level of the logical structure of the InnoDB storage engine. All data is stored in the table space.

Table space is a logical container. The objects stored in table space are segments. There can be one or more segments in a table space, but a segment can only belong to one table space. Table space database consists of one or more table spaces. Table spaces can be divided into system table space (Systemtablespace), independent table space (File-per-table tablespace), undo table space (Undo Tablespace) and temporary table space from the management point of view. (Temporary Tablespace) etc.

Independent table space

Independent table space, that is, each table has an independent table space, that is, the data and index information will be stored in its own table space. Independent table spaces (ie: single tables) can be migrated between different databases.

The space can be reclaimed (the DROP TABLE operation can automatically reclaim the table space. In other cases, the table space cannot be reclaimed by itself). For statistical analysis or log tables, after deleting a large amount of data, you can use: alter table TableName engine=innodb; to reclaim unused space. For tables that use independent table spaces, no matter how they are deleted, the fragmentation of the table space will not seriously affect the performance, and there is still a chance to deal with it.

Independent table space structure
An independent table space consists of segments, areas, and pages. This has been explained before.

The file size corresponding to the real table space
When we look in the data directory, we will find that the .ibd file corresponding to a newly created table only occupies 96K, which is only 6 pages in size (in MySOL5.7). This is because the space occupied by the table space is very small at the beginning. There is no data in it. But don’t forget that these ibd files are self-expanding. As the data in the table increases, the files corresponding to the table space also gradually increase.

After mysql 8.0, the frm file and the ibd file have been merged, and the new table will be 7 pages in size.

Check the table space type of InnoDB:
Check whether independent table space is used

mysql> show variables like 'innodb_file_per_table';

You can see innodb_file_per_table=ON, which means that each table will be saved as a separate ibd file.

System table space

The structure of the system table space is basically similar to that of the independent table space, except that since the entire MySQL process has only one system table space, some additional pages about the entire system information will be recorded in the system table space, which is not available in the independent table space.

InnoDB Data Dictionary
Whenever we insert a record into a table, the MySQL verification process is as follows
First, you need to check whether the table corresponding to the insert statement exists, and whether the inserted columns match the columns in the table. If there is no problem with the syntax, you also need to know the root page corresponding to the clustered index of the table and all secondary indexes. Which table space and which page, and then insert the record into the B + tree of the corresponding index. Therefore, in addition to saving the user data we inserted, MySQL also needs to save a lot of additional information, for example:

Which table space a certain table belongs to and how many columns are there in the table?
What is the type of each column corresponding to the table?
How many indexes does the table have, which fields do each index correspond to, and which page in which table space is the root page corresponding to the index?
What foreign keys does the table have? Which columns of which table do the foreign keys correspond to?
What is the file path on the file system corresponding to a certain table space?

The above data is not the user data we inserted using the INSERT statement. In fact, it is some additional data that has to be introduced in order to better manage our user data. This data is also called metadata. The lnnoDB storage engine specifically defines a number of internal system tables to record these metadata:

These system tables are also called data dictionaries. They are stored in certain pages of the system table space in the form of B+ trees, among which SYS_TABLES SYS_COLUMNS

SYS_INDEXES SYS_FIELDS These four tables are particularly important and are called basic system tables.

Note: Users cannot directly access these internal system tables of innoDB unless you directly parse the files on the file system corresponding to the system table space. However, considering that viewing the contents of these tables may help you analyze the problem, some tables starting with innodb_sys are provided in the system database information_schema:

The tables starting with INNODB_SYS in the information_schema database are not real internal system tables (the internal system tables are the tables starting with SYS above), but these system tables starting with SYS are read when the storage engine starts, and then Populate into these tables starting with INNODB_SYS. The fields in the table starting with INNODB_SYS and the table starting with SYS are not exactly the same, but it is enough for your reference.

Appendix: Three ways to load data pages

The smallest unit of data InnoDB reads from disk is the data page. The data with id =xxx you want is one of the many rows in this data page. For the data stored in MySQL, we call it a table in a logical concept. On the physical level such as a disk, it is stored in the form of a data page. When it is loaded into MySQL we call it a cached page.

If there is no data for this page in the buffer pool, the buffer pool has the following three ways of reading data, and the reading efficiency of each method is different.

Memory reading

If the data exists in memory, the execution time is basically about 1ms, and the efficiency is still very high.

Random reading

If the data is not in the memory, the page needs to be searched on the disk. The overall time is expected to be about 10ms. Of these 10ms, 6ms is the actual busy time of the disk (including seek and half-turn time), and 3ms is an estimate of the queuing time that may occur, plus the 1ms transfer time to transfer the page from the disk server buffer to the database buffer. This 10ms seems very fast, but in fact it takes a very long time for the database, because it is only the time to read one page.

Sequential reading

Sequential reading is actually a batch reading method, because the data we request are often stored adjacent to each other on the disk. Sequential reading can help us read pages in batches, so that they can be loaded into the buffer pool at once. There is no need to perform separate disk IO operations on other pages. If the throughput of a disk is 40MB/S, then for a 16KB page, 2560 (40MB/16KB) pages can be read sequentially at one time, which is equivalent to a page read time of 0.4ms. Using batch reading, even reading from disk is more efficient than reading a single page from memory.