Is 20 million rows still an effective soft limit for MySQL tables in 2023?

Rumors

There is a rumor on the internet that we should avoid having more than 20 million rows in a single MySQL table; otherwise the table's performance degrades, and once the soft limit is exceeded SQL queries become much slower than usual. This judgment was made many years ago, when databases ran on HDD storage. I wonder if it still holds for SSD-based MySQL databases in 2023, and if so, why.

Environment

· Database
MySQL version: 8.0.25
Instance type: AWS db.r5.large (2vCPUs, 16GiB RAM)
EBS Storage Type: General Purpose SSD (gp2)
· Test Client
Linux kernel version: 6.1
Instance type: AWS t2.micro (1 vCPU, 1GiB memory)

Design of experiments

Create tables with the same schema but different sizes: 100,000, 200,000, 500,000, 1 million, 2 million, 5 million, 10 million, 20 million, 30 million, 50 million, and 60 million rows.

1. Create several tables with the same schema:

CREATE TABLE row_test(
    `id` int NOT NULL AUTO_INCREMENT,
    `person_id` int NOT NULL,
    `person_name` VARCHAR(200),
    `insert_time` int,
    `update_time` int,
    PRIMARY KEY (`id`),
    KEY `query_by_update_time` (`update_time`),
    KEY `query_by_insert_time` (`insert_time`)
);
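
To sanity-check the size of each test table after it is created, information_schema can be queried (a rough sketch: TABLE_ROWS is only an estimate for InnoDB, and test_db is a placeholder schema name):

-- Approximate row counts and on-disk sizes of the test tables
-- ('test_db' is a placeholder schema name; TABLE_ROWS is an InnoDB estimate)
SELECT TABLE_NAME,
       TABLE_ROWS,
       ROUND(DATA_LENGTH / 1024 / 1024) AS data_mb,
       ROUND(INDEX_LENGTH / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'test_db' AND TABLE_NAME LIKE 'row_test%';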

2. Populate the tables with different numbers of rows. I used a test client to insert rows and copied existing tables to build the larger ones. The script can be found here.

# test client
INSERT INTO {table} (person_id, person_name, insert_time, update_time) VALUES ({person_id}, {person_name}, {insert_time}, {update_time})

# copy
create table <new-table> like <table>
insert into <new-table> (`person_id`, `person_name`, `insert_time`, `update_time`)
select `person_id`, `person_name`, `insert_time`, `update_time` from <table>

The values for person_id, person_name, insert_time, and update_time are random.
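
Since the actual script is only linked, here is a minimal sketch of the copy-and-grow idea: clone the schema, seed it from an existing table, then repeatedly insert the table into itself until the target row count is reached (the table name row_test_1m and the doubling step are my illustration, not the original script):

-- Clone the schema of an existing table (illustrative name)
create table row_test_1m like row_test;

-- Seed it once from the existing data
insert into row_test_1m (`person_id`, `person_name`, `insert_time`, `update_time`)
select `person_id`, `person_name`, `insert_time`, `update_time` from row_test;

-- Each run doubles the row count (id is AUTO_INCREMENT, so it is omitted);
-- repeat until the target size is reached
insert into row_test_1m (`person_id`, `person_name`, `insert_time`, `update_time`)
select `person_id`, `person_name`, `insert_time`, `update_time` from row_test_1m;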

3. Use the test client to execute the following SQL queries to test performance. The script can be found here.

select count(*) from <table> -- full table scan
select count(*) from <table> where id = 12345 -- query by primary key
select count(*) from <table> where insert_time = 12345 -- query by index
select * from <table> where insert_time = 12345 -- query by index, but causes 2 index tree lookups
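
To confirm which access path each statement actually uses, the same queries can be prefixed with EXPLAIN (a quick check, not part of the original test script):

-- type=ALL means a full table scan; type=const/ref with key=PRIMARY or
-- query_by_insert_time means an index lookup
explain select count(*) from row_test;
explain select count(*) from row_test where id = 12345;
explain select * from row_test where insert_time = 12345;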

4. View the status of the innodb buffer pool

SHOW ENGINE INNODB STATUS
SHOW STATUS LIKE 'innodb_buffer_pool_page%'
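
Besides the page statistics, the read counters can be used to estimate the buffer pool hit ratio mentioned in the conclusion; a sketch for MySQL 8.0, where the same status values are also exposed in performance_schema:

-- Innodb_buffer_pool_read_requests = logical reads (served from memory)
-- Innodb_buffer_pool_reads         = reads that missed the buffer pool and went to disk
SHOW STATUS LIKE 'innodb_buffer_pool_read%';

-- Approximate hit ratio = 1 - reads / read_requests
SELECT 1 - (SELECT VARIABLE_VALUE FROM performance_schema.global_status
            WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads')
         / (SELECT VARIABLE_VALUE FROM performance_schema.global_status
            WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests')
       AS buffer_pool_hit_ratio;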

5. Be sure to restart the database after each test on a table! Restarting flushes the InnoDB buffer pool, so later runs do not read stale cached pages and produce misleading results.

Results

Query 1: select count(*) from <table>

This kind of query causes a full table scan, which is expensive for MySQL.

No-cache round (first round): the query is executed for the first time, when no data is cached in the buffer pool.
Cache round (subsequent rounds): the query is executed when data is already cached in the buffer pool, usually right after the first execution.

A few observations:

1. The first execution of the query takes longer than subsequent ones

The reason is that MySQL uses the innodb_buffer_pool to cache data pages. Before the first execution, the buffer pool is empty, so the query has to do a lot of disk I/O to load the table from the .ibd file. After the first execution, the data sits in the buffer pool and subsequent executions can be answered from memory, avoiding disk I/O, so they are faster. This process is called buffer pool warming.

2. select count(*) from <table> will attempt to load the entire table into the buffer pool

I compared the innodb_buffer_pool statistics before and after the experiment. After running the query, the buffer pool usage grows by roughly the size of the table if the buffer pool is large enough; otherwise only part of the table is cached. The reason is that select count(*) from <table> scans the entire table and counts the rows one by one, which, with nothing cached, means loading the whole table into memory. Why? Because InnoDB supports transactions and cannot guarantee that different transactions see the same set of rows at the same moment, it does not maintain a single authoritative row count; a full table scan is the only safe way to get an exact count.
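
One way to observe this (a sketch; information_schema.INNODB_BUFFER_PAGE is expensive to scan, so it should only be queried on a test instance) is to check how many of the table's pages sit in the buffer pool before and after the count:

-- Pages of the test table currently cached in the buffer pool, per index.
-- Warning: scanning INNODB_BUFFER_PAGE is costly; avoid it on production servers.
SELECT INDEX_NAME,
       COUNT(*) AS cached_pages,
       ROUND(SUM(DATA_SIZE) / 1024 / 1024, 1) AS cached_mb
FROM information_schema.INNODB_BUFFER_PAGE
WHERE TABLE_NAME LIKE '%row_test%'
GROUP BY INDEX_NAME;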

3. If the buffer pool cannot hold the full table, query latency spikes

I noticed that the innodb_buffer_pool size has a big impact on query performance, so I ran the query under different configurations. With an 11G buffer pool, the spike in query latency occurs when the table reaches 50M rows. With the buffer pool reduced to 7G, the spike occurs at 30M rows, and with 3G it occurs at 20M rows. Clearly, once the table no longer fits in the buffer pool, executing select count(*) from <table> requires expensive disk I/O to load the data, and query runtime jumps.
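
For reference, this is how the different buffer pool sizes can be tried. Since MySQL 5.7 the buffer pool can be resized online on a self-managed server (the value is rounded up to a multiple of innodb_buffer_pool_chunk_size); on RDS it is normally changed through the DB parameter group instead:

-- Resize the buffer pool to ~3 GiB without a restart (needs SUPER or
-- SYSTEM_VARIABLES_ADMIN, so this does not work on RDS)
SET GLOBAL innodb_buffer_pool_size = 3 * 1024 * 1024 * 1024;

-- Check the current value and the resize progress
SELECT @@innodb_buffer_pool_size;
SHOW STATUS LIKE 'innodb_buffer_pool_resize_status';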

4. Without caching, query runtime grows linearly with table size and is independent of the buffer pool size

The no-cache round's runtime is determined by disk I/O and has nothing to do with the buffer pool size: for storage with the same IOPS, warming the buffer pool with select count(*) costs the same regardless of how large the pool is.

5. If the table cannot be fully cached in the buffer pool, the difference in query runtime between the no-cache round and the cache rounds is constant

Also note that although query runtime spikes when the table cannot be fully cached in the buffer pool, it remains predictable: the difference between the no-cache round and the cache rounds is constant regardless of table size. The reason is that part of the table's data is cached in the buffer pool, and this difference represents the time saved by reading those pages from the buffer pool instead of disk.

Query 2, 3: select count(*) from <table> where id = 12345 and select count(*) from <table> where insert_time = 12345

These queries use an index. Since they are not range queries, they only need to walk pages from the root down one path of the B+ tree, and cache those pages in the InnoDB buffer pool.

The B+ trees of the tables I created all had a depth of 3, so warming the buffer takes 3-4 I/Os, an average of 4-6 ms. After that, running the same query again reads the result directly from memory in about 0.5 ms, which is roughly the network RTT. If a cached page is not touched for a long time and is evicted from the buffer pool, it has to be loaded from disk again, requiring up to 4 disk I/Os.
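
The depth of an index tree can be checked from the PAGE_LEVEL of its cached pages (a sketch; INNODB_BUFFER_PAGE only shows pages that are currently in the buffer pool, so the index must have been touched first, and the scan is expensive):

-- Leaf pages are level 0 and the root has the highest level, so depth = max level + 1
SELECT INDEX_NAME,
       MAX(PAGE_LEVEL) + 1 AS btree_depth
FROM information_schema.INNODB_BUFFER_PAGE
WHERE TABLE_NAME LIKE '%row_test%' AND PAGE_TYPE = 'INDEX'
GROUP BY INDEX_NAME;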

Query 4: select * from <table> where insert_time = 12345

This query involves 2 index lookups. Since select * needs person_name and person_id, which are not stored in the secondary index, the database engine has to look up 2 B+ trees during query execution: it first searches the insert_time B+ tree to get the primary key of the target row, and then searches the primary key B+ tree to get the complete row, as shown in the following figure:

This is why we should avoid select * in production. The experiments confirm it: this query loads about twice as many pages as queries 2 and 3, up to 8, and its average runtime of 6-10 ms is roughly 1.5 to 2 times that of queries 2 and 3.
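
One way to avoid the second lookup is to make the secondary index cover the columns the query needs: every secondary index entry already stores the primary key, and a wider index can store additional columns. A sketch (the index name query_by_insert_time_covering is my illustration):

-- Covered by the existing query_by_insert_time index, which stores (insert_time, id):
-- no primary-key lookup is needed, and EXPLAIN shows "Using index"
select id from row_test where insert_time = 12345;

-- A wider index that also covers person_name avoids the second lookup for that column
alter table row_test add key `query_by_insert_time_covering` (`insert_time`, `person_name`);
select person_name from row_test where insert_time = 12345;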

How did the rumor come about

First we need to understand the physical structure of an InnoDB index page. The default page size is 16 KB, and a page consists of a header, system records, user records, the page directory, and a footer, leaving only about 14-15 KB to store row data.

Assume an INT primary key (4 bytes) and about 1 KB of payload per row. Each leaf page can then store about 15 rows. An entry in a non-leaf page needs the key plus a page pointer, 4 + 8 = 12 bytes, so each non-leaf page can hold up to 15 KB / 12 bytes = 1280 pointers. With a 4-level B+ tree, that is up to 1280 * 1280 * 15 = 24.6M rows of data.

Back when HDDs dominated the market and SSDs were too expensive for databases, 4 random I/Os was probably the worst we could tolerate, and a query with 2 index tree lookups made it even worse. Engineers at the time wanted to control the depth of index trees and keep them from growing too deep. Now that SSDs are common and random I/O is much cheaper than before, we can revisit the rules of 10 years ago.

By the way, a 5-level B+ tree can hold 1280 * 1280 * 1280 * 15 = 31.4B rows of data, which exceeds the maximum value an INT can hold. Different assumptions about row size give different soft limits, smaller or larger than 20M. For example, at about 816 bytes per row in my experiments (I use the utf8mb4 character set, so each character can take up to 4 bytes), the soft limit for a 4-level B+ tree is 29.5M rows.
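
As a quick sanity check of the arithmetic above, the same numbers can be reproduced from the assumptions (about 15 KB usable per page, 12 bytes per child-page entry, 15 rows per leaf at a 1 KB payload or 18 rows at 816 bytes):

-- Reproducing the capacity estimates used above
SELECT FLOOR(15 * 1024 / 12)                               AS pointers_per_page,      -- 1280
       FLOOR(15 * 1024 / 12) * FLOOR(15 * 1024 / 12) * 15  AS rows_at_1kb_payload,    -- ~24.6M
       FLOOR(15 * 1024 / 12) * FLOOR(15 * 1024 / 12) * 18  AS rows_at_816_bytes,      -- ~29.5M
       POW(FLOOR(15 * 1024 / 12), 3) * 15                  AS rows_one_level_deeper;  -- ~31.4B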

Conclusion

  1. The ratio of innodb_buffer_pool size to table size determines whether performance degradation will occur.

  2. A more meaningful indicator for judging whether a MySQL table needs to be split is the query runtime together with the buffer pool hit ratio. If queries always hit the buffer pool, there is no performance issue; 20 million rows is just a value based on experience.

  3. Besides splitting tables, increasing the InnoDB buffer pool size or the database's memory is also an option.

  4. In a production environment, if possible, avoid select *, because in the worst case it causes two index tree lookups.

  5. (Personal opinion) Given the popularity of SSDs now, 20 million rows is not a very effective soft limit for MySQL tables.

Source: Yisheng's blog
