How do you judge how much data belongs in one table?

Here's what happened:

Interviewer: Tell me about what you did as an intern.

Friend: During my internship, I built a feature that stores user operation records. It pulls the user operation data that an upstream service publishes to MQ and writes it into MySQL for colleagues on the data warehouse side to use.

Friend: Since the data volume is fairly large, on the order of 40 to 50 million records a day, I also split the storage across tables: three new tables are generated on a schedule every day, and records are distributed among them by taking a modulo, so that no single table grows large enough to slow down queries.

Nothing sounds wrong with that answer, right? Hold on, keep reading:

Interviewer: Why split it into three tables? Wouldn't two work? What about four?

Friend: Because a single MySQL table shouldn't exceed 20 million rows, or query speed drops and performance suffers. We get about 50 million rows a day, so splitting into three tables felt safe.

Interviewer: Anything else?

Friend: That's all… hey, what are you doing? Ouch!

Interviewer: You can go home and wait for our call.

Now that you've read the exchange, did you notice anything? Do you see any problems with my friend's answers?

Foreword

Many people say that a single MySQL table should not exceed 20 million rows, or performance will degrade. Alibaba's Java Development Manual likewise recommends splitting databases and tables only when a single table exceeds 5 million rows or 2 GB in size.

In fact, this 20 million (or 5 million) is just a rule of thumb and does not apply to every scenario. If you blindly assume that a table is fine as long as it stays under 20 million rows, you may still see system performance drop significantly.

In practice, every table has different columns occupying different amounts of space, so the number of rows each table can hold at optimal performance also differs.

So how do you calculate the appropriate row count for a given table? Don't worry, read on.

Who this article is for

This article assumes some MySQL fundamentals. Ideally you have some understanding of InnoDB and B+ trees, perhaps a year or so of MySQL experience, and you've heard the rule of thumb that "the height of an InnoDB B+ tree is best kept within three levels."

This article tackles one question: how much data can an InnoDB B+ tree of height 3 actually store? The calculations here are fairly strict (at least stricter than 95% of the related blog posts you'll find online). If you care about these details and aren't yet clear on them, keep reading.

Reading this article takes about 10-20 minutes; if you verify the numbers as you go, expect closer to 30.

Mind map of this article

(mind map image omitted)

Quick review of basic knowledge

As we all know, InnoDB in MySQL stores its data in B+ trees. Everyone is familiar with B+ trees, right? Their key properties are roughly as follows; let's run through them quickly.

Note: what follows is the foundation for everything else. If you can't follow it yet, consider bookmarking this article and coming back once you've built up the background knowledge.

  1. A table generally corresponds to one or more trees on disk. The number of trees matches the number of indexes, since each index gets its own tree.

  2. Clustered index and non-clustered index:

  • The primary key index is the clustered index; non-primary-key indexes are non-clustered (secondary) indexes. Apart from format information, the non-leaf nodes of both kinds store only index values; for example, if the index is on id, the non-leaf nodes store id values. The leaf nodes differ as follows:

    • The leaf nodes of the clustered index store the full row, all of its fields. So when we run select * from table where id = 1, we always end up reading a leaf node to get the data.

    • The leaf nodes of a non-clustered index store the indexed column plus the row's primary key. For example, if the non-clustered index is on username and the table's primary key is id, its leaf nodes store username and id but no other fields. In effect, you first find the primary key value in the non-clustered index, then look up the row content through the primary key index, so a query generally needs two lookups (unless the index covers the query). This second step is the so-called "back to table" lookup; the leaf node works a bit like a pointer to where the row really lives.

  3. A B+ tree is queried layer by layer from top to bottom. As a rule of thumb, the height of the B+ tree is best kept within 3 levels: the upper two levels hold index records and the bottom level holds the data. That way a lookup needs at most three disk IOs (in practice one fewer, since the root node stays resident in memory), while the amount of data that fits is still considerable. If the data grows so much that the B+ tree reaches 4 levels, every query needs 4 disk IOs and performance degrades. That is exactly why we want to calculate the maximum number of rows a 3-layer InnoDB B+ tree can hold.

  4. Each MySQL (InnoDB) node defaults to 16 KB, i.e. a node holds at most 16 KB of data. This is configurable via innodb_page_size, from a minimum of 4 KB to a maximum of 64 KB (you can check it with the statement below).
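A quick way to verify this on your own instance; note that innodb_page_size is fixed when the data directory is initialized and cannot be changed afterwards:

-- Check the InnoDB page size (16384 bytes = 16 KB by default)
SHOW VARIABLES LIKE 'innodb_page_size';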

Aside: what if a single row is especially large and exceeds the node size?

The MySQL 5.7 documentation explains it as follows:

  • For 4KB, 8KB, 16KB, and 32KB page sizes, the maximum row length is slightly less than half a database page. For example, with the default 16KB page size the maximum row length is slightly less than 8KB, and with a 32KB page size it is slightly less than 16KB.

  • For 64KB pages, the maximum row length is slightly less than 16KB.

  • If a row exceeds the maximum row length, variable-length columns are moved to external pages until the row fits. In other words, variable-length types such as varchar and text are pushed out to external pages to shrink the row.


Reference: MySQL :: MySQL 5.7 Reference Manual :: 14.12.2 File Space Management

  5. MySQL query speed depends mainly on disk read/write speed, because a query reads only one node into memory at a time: it uses that node's contents to find the next node to read, then reads that one, and so on until it finds the required data or determines the data does not exist.

You might be asking: don't we also have to search the data inside each node? Why isn't that time counted?

Because once a node has been read in full, it sits in memory, and searching it there is very fast. Given how MySQL searches within a page, the in-memory lookup is roughly O(log N), which is negligible next to the disk IO.

What an InnoDB node stores

In InnoDB's B+ tree, the "nodes" we keep mentioning are called pages. Each page stores user data, and together the pages form the B+ tree (reality is a lot messier, of course, but since all we want is to count how many rows fit, this picture is good enough).

A page is the smallest disk unit the InnoDB storage engine uses to manage the database. When we say each node is 16KB, we really mean each page is 16KB.

That 16KB has to hold the page-format overhead as well as the row records, and each row record carries its own metadata alongside the user data. All of it must go into our calculation.

Page format

The basic layout of every page, i.e. the information each page contains, is summarized below:

  • File Header (38 bytes): the page's header information, including checksum, page number, pointers to the previous and next pages, page type, tablespace, etc.
  • Page Header (56 bytes): the page's status information, including the number of slots in the page directory, the address of free space, the number of records in the page, the bytes occupied by deleted records, etc.
  • Infimum & Supremum (26 bytes): two virtual records bounding the records on the page, one minimum and one maximum.
  • User Records (variable): where the rows we insert are stored.
  • Free Space (variable): unused space; newly inserted user records take space from here.
  • Page Directory (variable): the locations of user records within the page. Each slot holds 4 to 8 records and occupies 1 to 2 bytes; when a slot exceeds 8 records, it automatically splits into two.
  • File Trailer (8 bytes): end-of-page information, used mainly to verify page integrity.

Schematic: (image omitted)

I searched the official documentation for the page format for a long time and couldn't find it. I don't know whether it simply isn't documented or I just missed it; if anyone finds it, please post it in the comments. The page-format summary above is therefore pieced together mainly from various blog posts.

Additionally, when new records are inserted into an InnoDB clustered index, InnoDB tries to leave 1/16 of the page free for future insertions and updates of index records. If index records are inserted in sequential order (ascending or descending), the resulting pages are about 15/16 full. If records are inserted in random order, pages end up between 1/2 and 15/16 full. Reference: MySQL :: MySQL 5.7 Reference Manual :: 14.6.2.2 The Physical Structure of an InnoDB Index

Apart from User Records and Free Space, the fixed structures occupy 38 + 56 + 26 + 8 = 128 bytes, and InnoDB reserves 1/16 of the page (16384 ÷ 16 = 1024 bytes) for future inserts. That leaves 16 × 1024 - 128 - 1024 = 15232 bytes of each page for user data. Note this is an upper bound, because we haven't yet counted the page directory; that depends on the table's fields, so we'll account for it later.
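To double-check that arithmetic, here is the same calculation as a SQL one-liner (pure arithmetic; the 128 bytes of fixed overhead and the 1/16 reservation are the assumptions above):

-- usable bytes per 16 KB page: total - fixed overhead - reserved 1/16
SELECT 16 * 1024 - (38 + 56 + 26 + 8) - (16 * 1024) DIV 16 AS usable_bytes;  -- 15232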

Row format

First, it's worth mentioning that the default row format in MySQL 5.6 is COMPACT, while in 5.7 and later it is DYNAMIC. Different row formats store data differently (there are two others, REDUNDANT and COMPRESSED, which we won't cover here). The rest of this article is based on DYNAMIC.

Official documentation: MySQL :: MySQL 5.7 Reference Manual :: 14.11 InnoDB Row Formats (most of the row-format details below can be found there)
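To see which row format an existing table actually uses, you can query information_schema (replace your_database with your own schema name):

-- Row format of each table in a schema (DYNAMIC by default on 5.7+)
SELECT TABLE_NAME, ROW_FORMAT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database';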


Each row record contains the following pieces, most of which appear in the official documentation. I haven't gone into full detail here, only what we need in order to calculate space; for more, search for "MySQL row format".

  • Record header (5 bytes): flag bits and other record info, such as the delete flag, minimum-record flag, record type, and the position of the next record within the page.
  • Variable-length field list (variable): the byte lengths of variable-length fields such as varchar, text, and blob. A field shorter than 255 bytes takes 1 byte to record; longer takes 2 bytes. There is one entry per variable-length field in the table; if there are none, the list is omitted.
  • Null-value list (variable): a bitmap recording whether each nullable field is null, one bit per nullable column. It grows a whole byte at a time: 9 to 16 nullable columns take 2 bytes, not 1.5.
  • Transaction ID and roll pointer (6 + 7 bytes): anyone familiar with MVCC will recognize these; every row carries a 6-byte transaction ID and a 7-byte roll pointer. If no primary key is defined there is an additional 6-byte row ID, but since we always define a primary key, we won't count it.
  • Actual data (variable): the row's real column data.

Schematic diagram: (image omitted)

There are a few more points to note:

Storage of overflow pages (external pages)

Note: this is a DYNAMIC behavior. In a table created with DYNAMIC, InnoDB strips out the values of long variable-length columns (such as VARCHAR, VARBINARY, BLOB, and TEXT types) and stores them fully on overflow pages, keeping only a 20-byte pointer in the row that points to the overflow page.

The COMPACT row format (the MySQL 5.6 default) instead keeps the first 768 bytes of the value plus the 20-byte pointer in the B+ tree record, and stores the remainder on overflow pages.

Whether a column is stored off-page depends on the page size and the total row size. When a row is too long, the longest columns are chosen for off-page storage one at a time until the clustered index record fits on the B+ tree page (the documentation doesn't give an exact figure). TEXT and BLOB values of 40 bytes or less are stored directly in the row and never go off-page.

Advantages

The DYNAMIC row format avoids the problem of long columns flooding B+ tree nodes with large amounts of data.

The idea behind the DYNAMIC row format is that if part of a long data value is stored off-page, it is usually most efficient to store the entire value off-page.

With the DYNAMIC format, shorter columns are kept in B+ tree nodes whenever possible, minimizing the number of overflow pages required for a given row.

Storage under different character encodings

char, varchar, text and the like all carry a character set, and different character sets occupy different amounts of space per character, which we have to account for when calculating size.

Types such as varchar and text record their lengths in the variable-length field list, but char is a fixed-length type, so it is a special case. Suppose a field name has type char(10); then:

  • Under a fixed-length character encoding (such as ASCII), name is stored at a fixed length: each ASCII character takes one byte, so name occupies 10 bytes.

  • Under a variable-length character encoding (such as utf8mb4), at least 10 bytes are reserved for name. If possible, InnoDB stores the value in 10 bytes by trimming trailing spaces; if it cannot be trimmed down to fit, trailing spaces are trimmed to the minimum byte length of the value (usually 1 byte). The maximum size of the column is the encoding's maximum bytes per character × N; for the name field in utf8mb4, that is 4 × 10 = 40 bytes.

  • A char column that can reach 768 bytes or more is treated like a variable-length field (just like varchar) and may be stored off-page. For example, utf8mb4 uses up to 4 bytes per character, so a char(255) column can exceed 768 bytes and spill to an overflow page.
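To make the cases above concrete, here is a small hypothetical demo table (char_demo is not from the article, just an illustration) putting the same char(10) under a fixed-length and a variable-length character set:

-- Hypothetical demo: the same CHAR(10) under two character sets
CREATE TABLE `char_demo` (
  `name_ascii` char(10) CHARACTER SET ascii,      -- fixed-length encoding: always 10 bytes
  `name_utf8mb4` char(10) CHARACTER SET utf8mb4   -- variable-length encoding: 10 to 40 bytes
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;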

To be honest, I don't fully understand the design of char. I've pored over the official documentation and several blogs for a long time, so I hope someone who understands it can clarify in the comments: under a variable-length encoding, doesn't char behave almost like a variable-length type? With the common utf8mb4 encoding, a character takes 1 to 4 bytes, so char(10) occupies anywhere from 10 to 40 bytes. That's a big range, yet InnoDB neither reserves the full space for it nor records its usage in the variable-length field list the way it does for true variable-length fields. Isn't that odd?

Starting the calculation

OK, we now know what each page stores, so we're equipped to calculate. We already worked out in the page-format section that each page has 15232 bytes available, so let's get straight to the rows.

Non-leaf node calculation

Single node calculation

An index page is a node that stores index records, i.e. a non-leaf node. Each index record holds the index value, 6 bytes of pointer information (pointing to a page in the next level down), and a 5-byte record header.

I couldn't find the size of this pointer in the official documentation; the 6-byte figure comes from other blog posts, which say it is 6 bytes in the source code (though which part of the source, I don't know). If anyone can point to it, please share in the comments.

Assuming our primary key id is a bigint (8 bytes), each index record occupies 8 + 6 + 5 = 19 bytes, so a page can hold 15232 ÷ 19 ≈ 801 index records.

Factoring in the page directory at an average of 6 records per slot, we need at least 801 ÷ 6 ≈ 134 slots, which costs 268 bytes.

After handing that space over to the page directory, I calculate roughly (15232 - 268) ÷ 19 ≈ 787 index records per page.

If the primary key is an int instead, each record is 4 + 6 + 5 = 15 bytes and more fit: about 993 index records per page.

Calculation of non-leaf nodes in the first two layers

In a B+ tree, a node holding N index records has N child nodes. The first two layers of our 3-layer B+ tree hold index records, so if the root node on the first layer holds N records, the second layer has N nodes. Those nodes have the same layout as the root, so each also holds N records, and the third layer therefore has N × N = N² nodes. That gives us:

A table with a bigint primary key can have 787² = 619,369 leaf nodes; a table with an int primary key can have 993² = 986,049 leaf nodes.
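To retrace those numbers, here is the same arithmetic as a SQL sketch (268 bytes is the article's page-directory estimate for the bigint case; 338 bytes is my assumption for the int case, 1015 ÷ 6 ≈ 169 slots at 2 bytes each):

-- fan-out per non-leaf page after subtracting the page directory,
-- then the leaf-node count of a 3-layer tree (fan-out squared)
SELECT (15232 - 268) / 19 AS fanout_bigint,   -- ~787.6; the article uses 787
       (15232 - 338) / 15 AS fanout_int,      -- ~992.9; the article rounds to 993
       787 * 787 AS leaf_nodes_bigint,        -- 619369
       993 * 993 AS leaf_nodes_int;           -- 986049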

And that completes the non-leaf calculation.

Calculation of data items

Minimum number of records to store

We mentioned earlier that the maximum row length is slightly less than half a page, the "slightly less" being the space each page reserves for its own format overhead. So we can assume each page holds at least two rows, each slightly under 8 KB (if a row were longer than that, InnoDB would move some of its data to overflow pages, so we can ignore that case). At 8 KB per row, each leaf node stores just 2 rows, so with a bigint primary key such a table holds only 2 × 619,369 = 1,238,738 rows. Around 1.2 million rows: bet you didn't expect that.

Storing more rows

Suppose our table looks like this:

-- A perfectly ordinary course schedule table. Besides the id, it has only
-- two fields, course id and teacher id, and all fields are int
-- (no production table would be designed like this; it's just an example).

CREATE TABLE `course_schedule` (
  `id` int NOT NULL,
  `teacher_id` int NOT NULL,
  `course_id` int NOT NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Let's analyze a row of this table first. There is no null-value list (no column is nullable) and no variable-length field list (no variable-length columns); we do count the transaction ID and roll pointer plus the record header. Each row therefore occupies 4 + 4 + 4 + 6 + 7 + 5 = 30 bytes, and each leaf node can hold 15232 ÷ 30 ≈ 507 rows, or 502 rows after the page directory slots are counted. The maximum a three-layer B+ tree can store is then 502 × 986,049 = 494,996,598, nearly 500 million rows! Didn't expect that, did you?
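Retracing that estimate in SQL (a sketch; the 170 bytes is the page-directory cost assumed above, 507 ÷ 6 ≈ 85 slots at 2 bytes each):

-- course_schedule: 30-byte rows; subtract the page directory, then
-- multiply rows-per-leaf by the 986049 leaf nodes of an int-keyed tree
SELECT FLOOR((15232 - 170) / 30) AS rows_per_leaf,        -- 502
       FLOOR((15232 - 170) / 30) * 986049 AS max_rows;    -- 494996598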

Number of records stored in regular tables

In most cases our tables don't look like that, so I picked a more conventional table to analyze and see how much data it can hold:

CREATE TABLE `blog` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT 'blog id',
  `author_id` bigint unsigned NOT NULL COMMENT 'author id',
  `title` varchar(50) CHARACTER SET utf8mb4 NOT NULL COMMENT 'title',
  `description` varchar(250) CHARACTER SET utf8mb4 NOT NULL COMMENT 'description',
  `school_code` bigint unsigned DEFAULT NULL COMMENT 'school code',
  `cover_image` char(32) DEFAULT NULL COMMENT 'cover image',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
  `release_time` datetime DEFAULT NULL COMMENT 'First published time',
  `modified_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'modified time',
  `status` tinyint unsigned NOT NULL COMMENT 'Publication status',
  `is_delete` tinyint unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`),
  KEY `author_id` (`author_id`),
  KEY `school_code` (`school_code`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_general_mysql500_ci ROW_FORMAT=DYNAMIC;

Analyze the row records of this table:

  1. Record header: always present, 5 bytes.

  2. Variable-length field list: title needs 1 byte (at most 50 × 4 = 200 bytes, under 255) and description needs 2 bytes (at most 250 × 4 = 1000 bytes, over 255), 3 bytes in total.

  3. Null-value list: only school_code, cover_image, and release_time are nullable, i.e. three bits, so 1 byte suffices.

  4. Transaction ID and roll pointer: both required, 13 bytes.

  5. Field content information:

    1. id, author_id, school_code are all bigint type, each occupying 8 bytes, totaling 24 bytes.

    2. create_time, release_time, modified_time are all datetime types, each occupying 8 bytes, totaling 24 bytes.

    3. status and is_delete are tinyint, 1 byte each, 2 bytes total. cover_image is char(32) with the table's default character set utf8; since this field actually stores only ASCII letters (it holds a URL), per the char storage rules above it occupies just 32 bytes.

    4. title and description are varchar(50) and varchar(250) respectively, in utf8mb4; neither should produce overflow pages (not certain). Assume production content is roughly 70% Chinese (3 bytes per character), 25% English (1 byte), and 5% 4-byte emoji. When both columns are full, they occupy (50 + 250) × (0.7 × 3 + 0.25 × 1 + 0.05 × 4) = 765 bytes.

Adding up all of the above gives 869 bytes per row, so each leaf node can hold 15232 ÷ 869 ≈ 17 rows (still 17 after accounting for the page directory). The maximum a three-layer B+ tree can store is then 17 × 619,369 = 10,529,273, roughly 10 million rows. Again, perhaps not what you expected.
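The same sketch in SQL, plus a practical check: rather than estimating row size by hand, you can ask MySQL for a live table's average row length and estimated row count (replace your_database with your schema name):

-- blog: 869-byte rows on a 15232-byte page, times 619369 leaf nodes (bigint key)
SELECT FLOOR(15232 / 869) AS rows_per_leaf,        -- 17
       FLOOR(15232 / 869) * 619369 AS max_rows;    -- 10529273

-- Sanity check against a real table's statistics
SELECT TABLE_NAME, AVG_ROW_LENGTH, TABLE_ROWS, DATA_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database' AND TABLE_NAME = 'blog';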

Data calculation summary

Judging from the three scenarios above, a three-layer InnoDB B+ tree can hold anywhere from about 1.2 million to nearly 500 million rows, an enormous range, and our fairly typical blog table comes out at about 10 million. So when weighing whether to split a table, look at the actual shape of the table instead of blindly treating 20 million rows as the critical point. And if this question comes up in an interview, I don't think the interviewer cares about the number itself; they want to see how you analyze the problem and how you arrive at the number. If anything in this article is wrong, please correct me in the comments.
