MySQL parameter tuning: sync_binlog and innodb_flush_log_at_trx_commit

How MySQL works:

https://blog.csdn.net/inthat/article/details/123244844

Binary log writes are not synchronized to disk on every write. Therefore, if the operating system hosting the database crashes, the last portion of the data may never reach the binary log file, which complicates recovery and replication.

The parameter sync_binlog=N controls how often the binary log buffer is synchronized to disk: an fsync is issued once every N writes. Setting N to 1, i.e. sync_binlog=1, makes the binary log write synchronous with each commit, so durability no longer depends on the operating system buffer. (Note: in older MySQL versions the default is 0, which leaves flushing of the buffered data entirely to the operating system.)

sync_binlog=1 introduces another problem when using the InnoDB storage engine. Because sync_binlog is 1, the binary log is flushed to disk before the transaction's commit action completes inside the engine. If the server crashes after the binary log has been written but before the InnoDB commit has occurred, then on the next startup MySQL will roll the transaction back, since the commit never happened; yet the binary log already records the transaction and cannot be rolled back.

This problem is solved by setting innodb_support_xa to 1, which keeps the binary log and the InnoDB data files consistent. (In MySQL 5.7.10 and later this behavior is always enabled and the variable is deprecated.)

According to the official documentation, innodb_support_xa serves two purposes:

First, it supports distributed transactions across multiple instances (external XA transactions), which are mostly used in distributed database environments.

Second, it supports internal XA transactions; put plainly, it guarantees consistency between the binlog and the InnoDB redo log. (Unlike engines that keep only a single transaction log, InnoDB has the redo log in addition to the server-level binlog.)

The binlog records only committed transactions.

The writing of the redo log is split into two steps, prepare and commit; this is "two-phase commit".

Why must there be a “two-phase commit”?

If two-phase commit were not used, suppose the value of field c in the row with id=2 is currently 0, and suppose a crash occurs while executing the UPDATE statement after the first log has been written but before the second log is written. What happens?

1. Write the redo log first, then the binlog. Suppose MySQL restarts abnormally after the redo log is finished but before the binlog is. As noted earlier, once the redo log is written the data can be recovered even after a crash, so after recovery the value of c in this row is 1.

However, because the crash happened before the binlog finished writing, this statement was never recorded in the binlog. Therefore, when the log is backed up later, the saved binlog does not contain this statement.

If you then use this binlog to rebuild a temporary instance, the statement is missing, the temporary instance applies one update too few, and the restored row has c = 0, which differs from the value in the original database.

2. Write the binlog first, then the redo log. If the crash occurs after the binlog is written, the redo log does not yet exist, so crash recovery invalidates the transaction and the value of c in this row is 0. But the binlog already records "change c from 0 to 1". Restoring from the binlog later therefore replays one extra transaction, and the restored row has c = 1, again differing from the original database.

As you can see, without "two-phase commit" the state of the database can diverge from the state of an instance restored from its logs.

Simply put, both the redo log and the binlog can be used to represent a transaction's commit status, and two-phase commit keeps the two states logically consistent.
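The crash scenarios above can be made concrete. This is only a sketch: table t and its row (id = 2, c = 0) are the hypothetical example from the text, and the comments summarize the conceptual prepare/binlog/commit sequence rather than exact server internals:

```sql
-- The running example: table t currently holds the row (id = 2, c = 0).
UPDATE t SET c = 1 WHERE id = 2;
-- Conceptually, InnoDB commits this as:
--   1. write the change to the redo log in "prepare" state
--   2. write the event to the binlog (and fsync, per sync_binlog)
--   3. mark the redo log entry as "commit"
-- During crash recovery, a redo entry still in "prepare" state is
-- committed only if the transaction is found complete in the binlog;
-- otherwise it is rolled back, so the two logs always agree.
```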

The binlog write mechanism

The binlog records only committed transactions; uncommitted cached data is discarded.

Many transactions committing at the same time also hurt MySQL and IO performance significantly; this can be mitigated by the group commit patch.

The binlog write logic is fairly simple: during transaction execution, the log is first written into the binlog cache, and when the transaction commits, the binlog cache is written into the binlog file.

The binlog of a transaction cannot be split, so no matter how large the transaction is, it must be written in one piece. This is where binlog cache management comes in.

The system allocates one binlog cache per thread, and the parameter binlog_cache_size controls how much memory a single thread's binlog cache may use. If a transaction exceeds this size, the excess is temporarily spilled to disk.
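To see whether transactions are overflowing the per-thread cache, the server's own counters can be consulted. A sketch, run from any MySQL session:

```sql
-- Per-thread binlog cache size (bytes).
SHOW VARIABLES LIKE 'binlog_cache_size';
-- Binlog_cache_use: transactions that used the binlog cache;
-- Binlog_cache_disk_use: transactions that exceeded binlog_cache_size
-- and spilled to a temporary disk file.
SHOW GLOBAL STATUS LIKE 'Binlog_cache%';
```

A steadily growing Binlog_cache_disk_use relative to Binlog_cache_use suggests the cache is undersized for the workload's transactions.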

When the transaction commits, the executor writes the complete transaction from the binlog cache into the binlog file and clears the binlog cache. The state is shown in the figure below.

As you can see, each thread has its own binlog cache, but they all share the same binlog file.

(1) The write in the figure means writing the log into the file system's page cache without persisting the data to disk, so it is relatively fast.

(2) The fsync in the figure is the operation that persists data to disk. In general, it is fsync that accounts for the disk IOPS.

The timing of write and fsync is controlled by the parameter sync_binlog:

When sync_binlog=0, every transaction commit does only a write, never an fsync;

When sync_binlog=1, every transaction commit performs an fsync;

When sync_binlog=N (N>1), every transaction commit does a write, but an fsync is issued only after N transactions have accumulated.
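The three settings above can be inspected and switched at runtime. A sketch; changing the value requires the appropriate privilege, and a runtime change should also be persisted in my.cnf to survive restarts:

```sql
SHOW VARIABLES LIKE 'sync_binlog';
SET GLOBAL sync_binlog = 100;  -- write on every commit, fsync once per 100 commits
SET GLOBAL sync_binlog = 1;    -- fsync on every commit (safest)
```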

Therefore, in IO-bound scenarios, setting sync_binlog to a relatively large value can improve performance. Considering how much log loss is tolerable, setting it to 0 is generally not recommended in real business scenarios; a value between 100 and 1000 is common.

The corresponding risk of sync_binlog=N is that if the host restarts abnormally, the binlog events of up to the most recent N transactions are lost.

innodb_flush_log_at_trx_commit

When a transaction commits, the redo log must be written to disk. The redo log records what modification was made to the data; for example, "the name field of the record with id=10 was changed to xxx" is one log entry. On commit, the redo log is flushed from the redo log buffer to the disk file according to a strategy configured by innodb_flush_log_at_trx_commit, which has the following options.

Value 0: at commit, the data in the redo log buffer is not flushed to the disk file immediately; instead, the InnoDB master thread flushes it once per second. If MySQL crashes right after you commit, everything still in memory is lost.

Value 1: at commit, the redo log must be flushed from memory to the disk file. As long as a commit succeeds, the redo log is guaranteed to be on disk. Note that because of the operating system's "delayed write" behavior, a plain flush would only reach the operating system's buffer; it is the additional sync operation (fsync) that guarantees persistence to the physical disk.

Value 2: at commit, the redo log is written into the OS cache backing the disk file rather than directly into the disk file itself; the data in the OS cache may not reach the disk file for about a second.

As you can see, only value 1 truly guarantees transaction durability, but because the flush operation fsync() blocks until complete and disk writes are slow, MySQL performance declines noticeably. If losing some transactions on a crash is acceptable, 0 and 2 give higher performance.
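The policy can be switched at runtime for testing. A sketch; as above, a runtime change should be mirrored in my.cnf to persist:

```sql
SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- commit writes to OS cache; flushed about once a second
SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- fsync at every commit: full durability
```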

Query the current value: select @@innodb_flush_log_at_trx_commit;

sync_binlog

This parameter controls how the binary log is written to disk.

Valid values for this parameter are 0, 1, N:

0: the default (in older versions). After a transaction commits, the binary log is written out from the buffer, but no flush operation (fsync()) is performed; only the operating system buffer is written. If the operating system crashes, part of the binary log is lost.

1: after a transaction commits, the binary log is written to disk and the flush operation is performed immediately, equivalent to a synchronous write that does not rely on the operating system's cache.

N: a flush operation is performed after every N writes to the operating system buffer.

Setting this parameter to a value above 1 improves database performance, but at the cost of a risk of data loss.

Binary log files are involved in data recovery, and if you want maximum consistency between master and slave, you should set this parameter to 1, accepting some loss of performance.

Optimization case

Applying binlog during MySQL recovery is slow. What should I do?

The Bitterness of a DBA, 2021-07-15

Today a business request required a data recovery operation, restoring to 2021-07-11 15:21:00. As you probably know, this kind of point-in-time recovery first restores a full physical backup of MySQL to a remote machine, then incrementally applies binlogs to reach the target point in time.

The business environment holds about 150 GB of data. Based on timings from a large number of earlier random recovery tests, restoring 150 GB of data can be kept within 30 minutes.

Normally, applying 1 GB of binlog takes about 1 to 3 minutes, but this time the binlog apply had already run for nearly 15 minutes without finishing, pushing the whole recovery past 40 minutes. The business side could not accept that, and as the DBA, neither could I!

Let's do a quick analysis of the slow binlog replay.

Symptom:

Log in to the recovery instance and check the current apply status. The thread is in the state Waiting for GTID to be committed, i.e. it is waiting for a GTID to be committed; the GTID is 548723ca-1f7f-11e9-b3ab-005056b748c5:1655852015.

```
mysql> show processlist;
+----+-----------+-----------------+------+---------+--------+----------------------------------+------------------------------------------------------------+-----------+---------------+
| Id | User      | Host            | db   | Command | Time   | State                            | Info                                                       | Rows_sent | Rows_examined |
+----+-----------+-----------------+------+---------+--------+----------------------------------+------------------------------------------------------------+-----------+---------------+
|  6 | dba_admin | 127.0.0.1:6492  | NULL | Sleep   | 218132 |                                  | NULL                                                       |         0 |             0 |
| 16 | dba_admin | 127.0.0.1:11286 | NULL | Query   | 218078 | Waiting for GTID to be committed | SET @@SESSION.GTID_NEXT= '548723ca-...:1655852015'         |         0 |             0 |
| 17 | dba_admin | 127.0.0.1:11294 | NULL | Query   |      0 | starting                         | show processlist                                           |         0 |             0 |
+----+-----------+-----------------+------+---------+--------+----------------------------------+------------------------------------------------------------+-----------+---------------+
3 rows in set (0.00 sec)
```

Analysis:

1. First check resource usage on the recovery machine, especially IO. It turned out that the machine's load was very low and resources were idle, so busy resources were not the cause.

2. A large transaction might be making the binlog apply slow. Next, analyze whether the binlog contains large transactions:

```
$ mysqlbinlog mysqlbin.002032 | grep "GTID$(printf '\t')last_committed" -B 1 \
    | grep -E '^# at' | awk '{print $3}' \
    | awk 'NR==1 {tmp=$1} NR>1 {print ($1-tmp); tmp=$1}' \
    | sort -n -r | head -n 10
mysqlbinlog: [Warning] unknown variable 'loose-default-character-set=utf8'
203995532
518143
518103
518055
518045
518043
518037
518035
518033
518023
```

Judging from the results, there is a transaction of roughly 200 MB. My goodness, presumably this large transaction is what makes the replay slow. But is it really?

3. Next, using the GTID reported in the long-running thread's status, parse the binlog file to see what the stuck GTID's transaction is doing.

```
# at 156880949
#210711  2:00:05 server id 255  end_log_pos 156881014 CRC32 0x9c02b320  GTID    last_committed=127167   sequence_number=127168
SET @@SESSION.GTID_NEXT= '548723ca-1f7f-11e9-b3ab-005056b748c5:1655852015'/*!*/;
# at 156881014
#210711  2:00:05 server id 255  end_log_pos 156881077 CRC32 0xf27b8fdc  Query   thread_id=55553520      exec_time=54    error_code=0
SET TIMESTAMP=1625940005/*!*/;
BEGIN/*!*/;
# at 156881077
#210711  2:00:05 server id 255  end_log_pos 156881203 CRC32 0xf297d5f2  Rows_query
# delete from tabname where create_time <= date_sub(CURRENT_TIMESTAMP, interval 24 hour)
# at 156881203
#210711  2:00:05 server id 255  end_log_pos 156881301 CRC32 0xa101984d  Table_map: `test_db`.`tabname` mapped to number 110
# at 156881301
#210711  2:00:05 server id 255  end_log_pos 156889457 CRC32 0xbe518068  Delete_rows: table id 110
# at 156889457
#210711  2:00:05 server id 255  end_log_pos 156897608 CRC32 0x54e957f0  Delete_rows: table id 110
# at 156897608
#210711  2:00:05 server id 255  end_log_pos 156905781 CRC32 0x8d2612ad  Delete_rows: table id 110
# at 156905781
#210711  2:00:05 server id 255  end_log_pos 156913928 CRC32 0xb15a94ea  Delete_rows: table id 110
# at 156913928
#210711  2:00:05 server id 255  end_log_pos 156922097 CRC32 0x6393fa7c  Delete_rows: table id 110
# at 156922097
#210711  2:00:05 server id 255  end_log_pos 156930266 CRC32 0x2b3d1fda  Delete_rows: table id 110
...
```

The parsed binlog shows a delete operation: delete from tabname where create_time <= date_sub(CURRENT_TIMESTAMP, interval 24 hour). This SQL deletes all data older than 24 hours, and it is exactly the 200 MB transaction. Now that the specific cause is known, how do we solve the problem?

Solution:

1. Just wait; the apply will finish eventually. You might accept this, but the business side probably will not.

2. Temporarily change the sync_binlog and innodb_flush_log_at_trx_commit parameters from "double 1" to 0; the replay then runs much faster.

Note: this is a restore on a separate, throwaway machine, so durability hardly matters there; for a production environment, think carefully before doing this.
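The temporary adjustment is two statements. A sketch, to be run only on the throwaway recovery instance:

```sql
-- Relax durability while replaying binlogs (recovery instance only).
SET GLOBAL sync_binlog = 0;
SET GLOBAL innodb_flush_log_at_trx_commit = 0;
-- Restore "double 1" afterwards if the instance will ever serve real traffic:
SET GLOBAL sync_binlog = 1;
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
```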

Finally, communicate with the business side to confirm the operation is reasonable, and have them split the large transaction into multiple small transactions.
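One common way for the business side to split such a purge into small transactions is a batched delete loop. This is only a sketch: the batch size of 10000 is a hypothetical choice, and tabname/create_time come from the binlog parsed above:

```sql
-- Run repeatedly (e.g. from a script) until ROW_COUNT() returns 0;
-- each execution is its own small transaction instead of one 200 MB one.
DELETE FROM tabname
 WHERE create_time <= DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 24 HOUR)
 LIMIT 10000;
```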

Well, that's all for now. If you have a better approach, feel free to leave a comment so we can learn from each other.