PostgreSQL Replication Slot Analysis and Research

PostgreSQL 9.4 introduces the new Replication Slot feature. The documentation says little about replication slots, and at first glance it is hard to see what they are for. In fact, replication slots mainly pave the way for eventual logical replication and bidirectional replication in the PG kernel (at present, logical and bidirectional replication still lack many core pieces in the kernel and require the BDR plug-in; see the PG official wiki for an introduction, where the background of replication slots can also be found). However, as a major feature of version 9.4, slots can be used not only for logical replication but also for physical replication (i.e. Streaming Replication). A replication slot used for physical replication is called a Physical Replication Slot.

Since most users currently only use the physical replication that comes with PG, we will focus on analyzing the Physical Replication Slot.

Using Physical Replication Slot, you can achieve two effects:

A) It ensures that WAL still needed by the standby is not recycled or archived prematurely on the primary, which would otherwise break the standby with the following error:

ERROR: requested WAL segment 00000001000000010000002D has already been removed

Based on the standby status recorded in the replication slot, PG guarantees that WAL not yet applied by the standby will not be removed from the primary's WAL directory or archived away. Moreover, the slot's status information is persisted, so it is not lost or invalidated even if the standby disconnects or the primary restarts.
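To see which WAL segment a slot is currently pinning on the primary, a quick check along these lines can be used (a sketch; pg_xlogfile_name converts a WAL location into a segment file name and cannot be called during recovery, so run this on the primary):

postgres=# SELECT slot_name, restart_lsn, pg_xlogfile_name(restart_lsn) FROM pg_replication_slots;

Segments from the reported file name onward are kept in pg_xlog and will not be removed until the slot advances.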

B) When applications are allowed to connect to the standby for read-only queries, a replication slot can be combined with the runtime parameter hot_standby_feedback so that vacuum on the primary does not prematurely remove row versions still needed by standby queries, avoiding the following error (the cause of the error is explained in detail below):

ERROR: canceling statement due to conflict with recovery

Let’s take a look at the usage and kernel implementation of Physical Replication Slot.

Usage

The steps to enable a replication slot are very simple:

1) First, configure the primary and standby for Streaming Replication. The parameters involved are listen_addresses (='*'), hot_standby (=on), wal_level (=hot_standby) and max_wal_senders (=1); in particular, make sure max_replication_slots is greater than or equal to 1. These parameters should be consistent between the primary and the standby.
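A minimal postgresql.conf fragment for this step might look like the following (values are illustrative; adjust them to your environment):

listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 1
max_replication_slots = 1
hot_standby = on          # only takes effect on the standby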

2) Create a replication slot on the primary:

postgres=# SELECT * FROM pg_create_physical_replication_slot('my_rep_slot_1');
   slot_name   | xlog_position
---------------+---------------
 my_rep_slot_1 |

At this point, the replication slot is not yet active.
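If a slot was created by mistake or is no longer needed, it can be removed again (a sketch; dropping fails with an error while a wal_sender is still using the slot):

postgres=# SELECT pg_drop_replication_slot('my_rep_slot_1');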

3) Configure recovery.conf on the standby as follows, then restart the standby:

standby_mode = 'on'
primary_slot_name = 'my_rep_slot_1'
primary_conninfo = 'user=pg001 host=10.x.x.x port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'

4) Observe the slot status change on the primary:

postgres=# SELECT * FROM pg_replication_slots;
   slot_name   | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
---------------+--------+-----------+--------+----------+--------+------+--------------+-------------
 my_rep_slot_1 |        | physical  |        |          | t      | 1812 |              | 0/3011A70

5) Use it together with hot_standby_feedback: set hot_standby_feedback to on in the standby's postgresql.conf and restart the standby.
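To verify that the setting has taken effect, check on the standby (illustrative output):

postgres=# SHOW hot_standby_feedback;
 hot_standby_feedback
----------------------
 on

Once feedback is flowing, the xmin column of the slot in the primary's pg_replication_slots view will be populated.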

Kernel implementation

Replication slots were added to the kernel by the following patch:

author    Robert Haas <[email protected]>
          Sat, 1 Feb 2014 03:45:17 +0000 (22:45 -0500)
committer Robert Haas <[email protected]>
          Sat, 1 Feb 2014 03:45:36 +0000 (22:45 -0500)
Replication slots are a crash-safe data structure which can be created
on either a master or a standby to prevent premature removal of
write-ahead log segments needed by a standby, as well as (with
hot_standby_feedback=on) pruning of tuples whose removal would cause
replication conflicts. Slots have some advantages over existing
techniques, as explained in the documentation.

This patch touches many files. When analyzing the code, we focus on the following questions:

A) How is a replication slot created in the kernel?

Analyzing ReplicationSlotCreate, the function called when a replication slot is created, shows that a replication slot is essentially some data structures in memory plus a binary status file persisted to the pg_replslot/ directory. When PG starts, the memory for these data structures (an array of size max_replication_slots) is allocated in advance in shared memory. The structures come into use when users create replication slots. After a replication slot is created and put to use, its data structure and status file are updated by the WAL (Write-Ahead Log) sender process (wal_sender).

In addition, judging only by the name, it is easy to assume that a replication slot creates a new connection to the standby for sending logs. In fact, no new connection is created; the slot uses the existing wal_sender connection (since there is one wal_sender connection per standby, there is only one active replication slot between a standby and the primary).
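This can be confirmed from the system views: each row in pg_stat_replication is one wal_sender connection, and an active slot merely marks itself as in use by that connection (a sketch; column sets differ slightly across versions):

postgres=# SELECT pid, application_name, state FROM pg_stat_replication;
postgres=# SELECT slot_name, active FROM pg_replication_slots;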

B) How is the status of a replication slot updated?

It is easy to find that the status of a replication slot is updated in two situations.

The first is in ProcessStandbyHSFeedbackMessage. This function is called when wal_sender processes a feedback reply message sent back by the standby. From it we can see that each wal_sender process keeps its replication slot (the one created by the user) in the global variable MyReplicationSlot. When a reply from the standby is processed, the xmin in the reply is extracted and stored in the slot's data.xmin and effective_xmin fields, and finally propagated through ProcArraySetReplicationSlotXmin into the system-wide procArray->replication_slot_xmin structure (making it visible to all processes), completing the slot update.

Note that if there are multiple replication slots (each corresponding to its own standby), the smallest xmin among all slots is chosen when updating the global procArray->replication_slot_xmin.

The second is in ProcessStandbyReplyMessage. This function processes the restart lsn sent from the standby (the WAL position the standby has safely received and flushed), writes it directly into the slot's restart_lsn field, and persists it to disk; the primary then uses it to decide whether WAL can be removed or archived.
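The effect is observable from SQL: the position reported by the standby and the slot's restart_lsn on the primary should track each other closely (a sketch; run each query on the node indicated in the comment):

-- on the standby
postgres=# SELECT pg_last_xlog_replay_location();
-- on the primary
postgres=# SELECT slot_name, restart_lsn FROM pg_replication_slots;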

C) How does a replication slot cooperate with hot_standby_feedback to avoid standby query conflicts?

Here, a standby query conflict refers to the following situation: a long-running query is executing on the standby while the primary performs a normal vacuum, clearing out useless record versions. Vacuum on the primary does not know that the standby's query exists, so it considers only the transaction state on the primary and ignores the running query on the standby. As a result, vacuum may remove record versions that standby queries are still using. When these vacuum operations are replayed on the standby through the WAL while the standby query is still running, the replay must either wait or cancel the query to ensure that synchronization continues normally and the query does not produce wrong results. Thus, whenever a user runs a long query on the standby, the query conflict error mentioned above is prone to occur.
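The standby keeps per-database counters of such cancellations, which makes it easy to confirm whether this error is occurring; snapshot conflicts are the case described above (run on the standby):

postgres=# SELECT datname, confl_snapshot, confl_lock, confl_bufferpin FROM pg_stat_database_conflicts;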

How can this conflict be avoided? The best solution at present is hot_standby_feedback + replication slot. The principle, in short, is that the standby expresses the record versions its queries depend on as a transaction ID (xmin) and sends it to the primary in the reply to the primary's wal_sender (see the function XLogWalRcvSendHSFeedback); this is ultimately conveyed to vacuum on the primary, asking it to stay its hand so that the relevant record versions are not cleared for the time being.

The specific flow is: on the standby, XLogWalRcvSendHSFeedback calls GetOldestXmin to obtain xmin and puts it in the reply to the primary. When the primary's wal_sender receives it, if a replication slot is in use, it stores this xmin in the slot's status and updates the minimum xmin across all slots in the system. How is this system-wide minimum slot xmin conveyed to vacuum on the primary? Taking an automatically triggered vacuum as an example, the logical sequence is as follows:

GetSnapshotData (when the vacuum transaction starts, fetch the slot xmin and store it in a global variable) -> vacuum_set_xid_limits (call GetOldestXmin, which reads the system xmin and the slot xmin from the global variable and takes the smaller value) -> lazy_vacuum_rel (use this xmin to decide which record versions can be removed)
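The end result of this chain can be demonstrated directly (a sketch using a hypothetical table t, output abridged): while a standby transaction still holds a snapshot and hot_standby_feedback is on, vacuum on the primary reports dead rows that it cannot yet remove:

postgres=# DELETE FROM t WHERE id < 1000;
postgres=# VACUUM VERBOSE t;
INFO:  "t": found 0 removable, 999 nonremovable row versions ...
DETAIL:  999 dead row versions cannot be removed yet.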

In this way, using the replication slot as a channel, standby query conflicts are resolved.

Notes

Finally, some precautions for using replication slots:

1) If replies from the standby cannot be received, the slot's restart_lsn stays unchanged, causing the primary to keep WAL locally, which may fill the WAL disk. Therefore, disk usage should be monitored in real time, and a smaller wal_sender_timeout should be set to detect standby disconnection as early as possible.
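A monitoring query along these lines reports how much WAL each slot is holding back (a sketch for 9.4; pg_xlog_location_diff returns the difference in bytes):

postgres=# SELECT slot_name, active, pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn) AS retained_bytes FROM pg_replication_slots;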

2) With hot_standby_feedback set to on, note that if long slow queries occur on the standby, the xmin reported back to the primary may advance slowly and vacuum on the primary will stall, causing frequently updated tables on the primary to grow sharply in size.
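The xmin held by each slot can be monitored in the same way; a steadily growing age suggests that vacuum is being held back by standby feedback (a sketch; age() measures the distance from the current transaction ID):

postgres=# SELECT slot_name, xmin, age(xmin) FROM pg_replication_slots WHERE xmin IS NOT NULL;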

Beyond physical replication, replication slots are even more significant for logical replication. We look forward to seeing them come into play in version 9.5, where logical replication functionality may appear.