[MySQL Architecture] SQL execution process and buffer pool

Article directory

    • 1. SQL execution process
    • 2. Database buffer pool (Buffer Pool)
      • 2.1 Buffer pool overview
      • 2.2 How to read data from the buffer pool
      • 2.3 View and set the size of the buffer pool
      • 2.4 Multiple Buffer Pool instances
      • 2.5 Extended questions

1. SQL execution process

  1. Query cache: Because query efficiency is often not high, this feature was abandoned after MySQL 8.0
  2. Parser: Perform syntactic analysis and semantic analysis of SQL statements in the parser
  3. Preprocessor: In the preprocessor, determine whether tables, fields, etc. exist
  4. Optimizer: The optimizer will determine the execution path of the SQL statement, such as whether it is based on full table retrieval or based on index retrieval, etc., to generate Corresponding execution plan;A query can be executed in many ways, and the same result will be returned in the end. The role of the optimizer is to find the best execution plan
  5. Executor: Before execution, it is necessary to determine whether the user has permission. If you have permission, open the table and continue execution. When the table is opened, the executor will call the storage engine API to read and write the table according to the engine definition of the table. The storage engine API is just an abstract interface. There is also a storage engine layer below. The specific implementation still depends on the storage engine selected.

For example: in the table test, the ID field has no index, then the execution process of the executor is as follows:

Call the InnoDB engine interface to get the first row of this table and determine whether the ID value is 1. If not, skip it. If so, store this row in the result set;
Call the engine interface to get the "next row" and repeat the same judgment logic until the last row of the table is fetched.

The executor returns a record set composed of all rows that meet the conditions during the above traversal process to the client as a result set.

At this point, the execution of this statement is completed. For tables with indexes, the execution logic is similar.

The flow of SQL statements in MySQL is: SQL statement->query cache->parser->optimizer->executor

2. Database Buffer Pool

InnoDB The storage engine manages storage space in units of pages. The addition, deletion, modification and query operations we perform are essentially accessing pages (including reading pages, writing pages, creating new pages, etc.) . Disk I/0 takes a lot of time, and operations in memory are much more efficient. In order to allow the data in the data table or index to be used by us at any time, the DBMS will apply for to occupy memory as data Buffer Pool, before actually accessing the page, the page on the disk needs to be cached in the Buffer Pool in the memory before it can be accessed.

The advantage of this is that it can minimize disk activity, thereby reducing the time spent doing I/0 directly to the disk. You know, this strategy is crucial to improving the query performance of SQL statements. If the indexed data is in the buffer pool, the cost of access will be much reduced.

2.1 Buffer pool overview

Buffer pool and query cache are not the same thing

In the Innodb storage engine, some data will be put into the memory, and the buffer pool accounts for most of this part of the memory. It is used to store caches of various data, as shown in the following figure:

The InnoDB buffer pool includes data pages, index pages, insertion cache, lock information, adaptive Hash and data dictionary information, etc.

When the InnoDB storage engine processes the client’s request, when it needs to access the data of a certain page, it will load all the data of the complete page into the memory. That is to say, even if we only need to access one For a record of a page, the data of the entire page needs to be loaded into memory first. After loading the entire page into memory, you can perform read and write access. After completing the read and write access, you are not in a hurry to release the memory space corresponding to the page, but cache it. In this way, when there is a request to access the page again in the future, the overhead of disk IO can be saved.

Caching principles:

  • The principle of “location * frequency” can help us optimize I/0 access efficiency.
  • First of all, location determines efficiency. The buffer pool is provided so that data can be directly accessed in memory.
  • Secondly, frequency determines the order of priority. Because the size of the buffer pool is limited, for example, the disk has 200G, but the memory is only 16G, and the buffer pool size is only 1G, it is impossible to load all the data into the buffer pool. This involves priority order, which will Prioritize loading of frequently used hot data

Read-ahead features of the buffer pool:

The function of the buffer pool is to improve the efficiency of I/0, and there is a “locality principle” when we read data. That is to say, if we use some data, there is a high probability that we will also use the surrounding data. Some data are loaded in advance using the “read ahead” mechanism, which can reduce possible future disk I/O operations.

2.2 How to read data from the buffer pool

The buffer pool manager will try to save frequently used data. When the database reads a page, it will first determine whether the page is in the buffer pool. If it exists, it will read it directly. If it does not exist, it will read it through memory or The disk stores the page in the buffer pool and then reads it.

The structure and function of the cache in the database are shown in the following figure: Note that it is in the storage engine

If we update the data in the cache pool when executing a SQL statement, will the data be synchronized to the disk immediately?

In fact, when we modify the records in the database, we will first modify the record information in the page in the buffer pool, and then the database will be refreshed to the disk at a certain frequency. Note that not every update operation occurs, disk writeback will be performed immediately. The buffer pool will use a mechanism called checkpoint to write data back to the disk. The benefit of this is to improve the overall performance of the database.

For example, when the buffer pool is not enough, some infrequently used pages need to be released. At this time, the checkpoint method can be forcibly used to write back the infrequently used dirty pages to the disk, and then remove them from the disk. These pages are released from the buffer pool. The dirty page here refers to the modified page in the buffer pool, which is inconsistent with the data page on the disk.

2.3 View and set the size of the buffer pool

If you are using the MySQL MyISAM storage engine, it only caches indexes and does not cache data. The corresponding key cache parameter is key_buffer_size, which you can use to view

If you are using the InnoDB storage engine, you can view the buffer pool size by looking at the innodb_buffer_pool_size variable. The command is as follows:

show variables like 'innodb_buffer_pool_size';

That is, the default buffer pool size of innoDB is only 134217728/1024/1024=128MB. We can modify the buffer pool size, for example to 256MB, as follows:

set global innodb_buffer_pool_size = 268435456;

Or modify the configuration file

[server]
innodb_buffer_pool_size = 268435456

2.4 Multiple Buffer Pool instances

The essence of the Buffer Pool is a continuous memory space that InnoDB applies to the operating system. In a multi-threaded environment, accessing data in the Buffer Pool requires locking processing. When the Buffer Pool is particularly large and concurrent access from multiple threads is particularly high, a single Buffer Pool may affect the request processing speed. So when the Buffer Pool is particularly large, we can split them into several small Buffer Pools. Each Buffer Pool is called an instance. They are Independently, apply for memory space independently, and manage various linked lists independently. Therefore, concurrent access by multiple threads will not affect each other, thereby improving concurrent processing capabilities.

You can modify the number of Buffer Pool instances by setting the value of innodb_buffer_pool_instances when the server starts. For example, the following value defaults to 1:

[server]
innodb_buffer_pool_instances=2

To check the number of buffer pools, use the command:

show variables like 'innodb_buffer_pool_instances';

The memory space occupied by each Buffer Pool instance is the total size divided by the number of instances, which is the following formula:

innodb_buffer_pool_size/innodb_buffer_pool_instances

That is, after setting the total buffer pool size to 256MB and then setting the number of buffer pool instances to 2, the calculation result is that each instance occupies 128MB of memory space.

Note: It is not that the more Buffer Pool instances are created, the better. Managing each Buffer Pool separately also requires performance overhead. InnoDB stipulates: Set up multiple instances when the value of innodb_buffer_pool_size is less than 1G is invalid, InnoDB will modify the value of innodb_buffer _pooL instances to 1 by default. And we encourage setting up multiple Buffer Pool instances when the Buffer Pool is greater than or equal to 1G.

2.5 Extended questions

When we query data, we will first query the Buffer Pool. If it does not exist in the Buffer Pool, the storage engine will first load the data from the disk into the Buffer Pool, and then return the data to the client; similarly, when we update a certain data, if the data does not exist in the Buffer Pool, The data will also be loaded first, and then the data in the memory will be modified. The modified data will be flushed to the disk later.

This process seems to have no problem, but in fact it is problematic. What if we successfully modify the data in the Buffer Pool, but MySQL hangs before we can flush the data to the disk? According to the logic in the figure above, the updated data only exists in the Buffer Pool. If MySQL crashes at this time, If the machine is shut down, this part of the data will be permanently lost;
Furthermore, an error suddenly occurred when I was halfway through the update. What should I do if I want to roll back to the version before the update? How can I talk about crash recovery if I can’t even guarantee data persistence and transaction rollback?

answer:

Redo Log: Solve the problem of MySQL downtime due to failure to flush the disk in time

Undo Log: Solve the problem that an error occurs and the transaction needs to be rolled back

The above two Logs will be analyzed in detail in the log section.