PostgreSQL 14 Management (6) – Backup and Recovery

Definition

Backup: copy the database's files (physical copy) or its structure and data (logical export) to another location for safekeeping;

Restore: partial recovery. Use a backup file to return the database to its state at backup time; changes made after that point in time cannot be recovered;

Recovery: first restore, then replay the log files (WAL) generated between the backup point and the failure point to bring the database back to its latest state;

Category

Physical Backup: directly copies database-related files. It is generally faster than logical backup but takes more disk space. PostgreSQL supports both online and offline physical backup; in production, physical backup should be the primary method.

Logical Backup: exports the structure and data of the database into a SQL file. During restoration, the database is rebuilt and the data restored by replaying the SQL statements and commands in that file. Logical backup usually needs more backup and restore time; it can supplement physical backups, or serve for data import/export and testing.

Method

Online Backup (Hot Backup): taken while PostgreSQL is running;

Offline Backup (Cold Backup): taken while PostgreSQL is stopped;

Strategy

Full Backup: backs up everything at once; long backup time, short recovery time;
Incremental Backup: backs up only the files changed since the last backup (full or incremental); short backup time, long recovery time;
Differential Backup: backs up all files changed since the last full backup; backup and recovery times fall in between;


PostgreSQL achieves the effect of incremental backup through a base backup plus continuous archiving of the transaction log files (WAL).
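The idea — one full copy plus a stream of change records that can be replayed — can be sketched with plain files. This is a toy model for illustration only, not PostgreSQL's actual mechanism; all file names are made up:

```shell
#!/bin/sh
# Toy model of "base backup + log replay" (illustrative only)
echo "balance=100" > data.txt        # the live "database"
cp data.txt base_backup.txt          # 1. base backup: a full copy
echo "balance=150" > data.txt        # later changes to the data...
echo "set balance=150" >> wal.log    # ...are also appended to a change log
rm data.txt                          # simulated failure: data lost
cp base_backup.txt data.txt          # 2. restore the base backup (old state)
# 3. replay the log to reach the latest state
grep -q "set balance=150" wal.log && echo "balance=150" > data.txt
cat data.txt
```

Only the full copy and the (small, append-only) log need to be saved between backups, which is exactly what makes the scheme incremental.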

Tools

pg_dump: logical backup tool that exports a single database (a schema or table can be specified); the output format is selectable;

pg_dumpall: logical backup tool, used to export the entire database cluster, including shared global objects (roles, tablespaces);

pg_basebackup: physical backup tool that creates a base backup of the database cluster; it can also be used to initialize a standby node for log-shipping or streaming replication;

psql: an interactive command line tool that can also be used to import SQL files generated by logical backups;

pg_restore: logical restore tool, used to restore backup files in archive format exported by pg_dump;

COPY: PostgreSQL-specific SQL command that copies data from a table to a file, or from a file into a table;

Third-party open-source tools: pgAdmin, Barman, pg_probackup, pgBackRest

Logical backup and recovery

#plain format: Dump a readable script
pg_dump -U postgres -d hrdb -f "D:\bak\hrdb.sql"
#import
psql -U postgres -d newdb -f "D:\bak\hrdb.sql"

#-Fc Custom format, cannot be viewed directly
pg_dump -U postgres -d hrdb -f "D:\bak\hrdb.dmp" -Fc
#recover
pg_restore -U postgres -d hrdb "D:\bak\hrdb.dmp"

Back up the entire pg

#A password must be entered for each database connection; this can be avoided with a ~/.pgpass password file.
pg_dumpall -U postgres -f "D:\bak\cluster.sql"
#recovery
psql -f "D:\bak\cluster.sql" postgres
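The per-database password prompts can be avoided with a password file: `~/.pgpass` on Unix, `%APPDATA%\postgresql\pgpass.conf` on Windows. Each line has the format `hostname:port:database:username:password`; the entry below uses placeholder values:

```
# ~/.pgpass — must have permissions 0600 on Unix or it is ignored
# hostname:port:database:username:password ("*" matches anything)
localhost:5432:*:postgres:your_password_here
```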

Export table data

#-a exports only data (no structure), -t specifies the tables to export, -T excludes tables
pg_dump -a -t 'emp*' -T employees testdb > testdb_table
#Other output formats can also be specified when exporting; import with the matching method.
#Connect to the corresponding database
\c hrdb
#COPY supports different writing/reading file formats: text, csv or binary. The default is text.
COPY employees TO 'D:\bak\cluster.txt' (format csv, header true);
#csv format, including the header row

#Import the exported data into a new table
COPY newemployees FROM 'D:\bak\cluster.txt' (format csv, header true);

Back up large database

#Dump first and then compress
pg_dumpall | gzip > cluster.sql.gz
#Extract first and then import
gunzip -c cluster.sql.gz | psql new
#Split the export files into clusteraa, clusterab
pg_dumpall | split -b 1G - cluster
#recover
cat cluster* | psql newdb
#Compress/decompress on the fly through an external program
COPY products TO PROGRAM 'gzip > /bak/products.dat.gz';
COPY products FROM PROGRAM 'gunzip < /bak/products.dat.gz';
#Concurrent dump (requires the directory format, -F d)
pg_dump -j 8 -F d -f testdb_dir testdb
#restore
pg_restore -j 8 -d newdb testdb_dir
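The split/cat pipeline above can be verified without a running server. The sketch below uses an ordinary text file in place of the pg_dumpall output (file names and the 1 KB chunk size are illustrative; the article uses -b 1G):

```shell
#!/bin/sh
set -e
# Simulate a dump: a file larger than one chunk
seq 1 1000 > cluster.sql
# Split into 1 KB pieces named cluster_part_aa, cluster_part_ab, ...
split -b 1k cluster.sql cluster_part_
# Reassemble in shell-glob order, exactly as `cat cluster* | psql newdb` would
cat cluster_part_* > restored.sql
# The reassembled stream is byte-identical to the original dump
cmp -s cluster.sql restored.sql && echo "round-trip OK"
```

This works because `split` names the pieces in lexicographic order, so the shell glob concatenates them back in the order they were cut.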

Physical backup and recovery

Since offline backup requires stopping the service, it is rarely used in practice. The following describes online backup.

Every modification PostgreSQL makes to data is first written to the write-ahead log (WAL), stored in the pg_wal subdirectory (pg_xlog before version 10) of the data directory. When a failure requires recovery, the database can be recovered by redoing the WAL records written since the last checkpoint.
Based on this mechanism, you can take a file-level full backup and then continuously archive the WAL files that are generated, achieving the effect of incremental backup. All of these operations can be performed online, which makes them essential for business systems that cannot tolerate downtime.

Write-ahead logging is a standard method for achieving reliability and data integrity while also reducing disk I/O and improving database performance.

Archive log mode (WAL archiving)

Configuration

Modify postgresql.conf

#WAL detail level; options: minimal, replica, logical
wal_level = replica
#Enable WAL archiving
archive_mode = on
#Command executed to archive a completed WAL segment
# test checks whether the file already exists, ! negates it, so the copy only
# runs when the segment is not yet in the archive
# %p is the full path of the segment file, %f is its file name
# on Windows use something like 'copy "%p" "D:\bak\%f"'
archive_command = 'test ! -f /var/lib/pgsql/wal_archive/%f && cp %p /var/lib/pgsql/wal_archive/%f'
#Force a WAL segment switch after this many seconds
archive_timeout = 300
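The `test ! -f … && cp …` idiom can be checked in a plain shell, with a scratch directory standing in for the WAL archive (all paths and the segment name are illustrative):

```shell
#!/bin/sh
# Stand-ins for %p (source path) and %f (file name)
mkdir -p wal_archive
echo "wal-data-v1" > 000000010000000000000001   # pretend WAL segment
# First run: file absent from the archive, so the copy happens (exit 0)
test ! -f wal_archive/000000010000000000000001 && cp 000000010000000000000001 wal_archive/000000010000000000000001
echo "changed" > 000000010000000000000001
# Second run: file already archived, test fails, cp is skipped; the non-zero
# exit tells PostgreSQL the archive attempt failed instead of overwriting it
test ! -f wal_archive/000000010000000000000001 && cp 000000010000000000000001 wal_archive/000000010000000000000001
cat wal_archive/000000010000000000000001        # still the original content
```

Refusing to overwrite an existing archive file is deliberate: an archive_command that silently overwrites can destroy a segment you still need for recovery.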

Restart the service: pg_ctl restart

-- View wal archive status
SELECT * FROM pg_stat_archiver;

Baseline backup

pg_start_backup() creates a backup label file named backup_label in the cluster's data directory to record information about this backup, including the start time and the backup label.

-- Start a backup
SELECT pg_start_backup('basebackup_date');

In addition, if there are custom tablespaces, a tablespace mapping file named tablespace_map is also created, recording information about the tablespace symbolic links in pg_tblspc/.

PostgreSQL is now in backup mode, and operating system commands (cp, tar, rsync, etc.) can be used to copy the data directory:

cp -r /var/lib/pgsql/14/data/ /var/lib/pgsql/14/backups/basebackup_date/

pg_stop_backup() ends backup mode and performs a WAL segment file switch.

-- End backup
SELECT pg_stop_backup();

In addition, this function generates a '<WAL segment number>.backup' file in the pg_wal (pg_xlog) directory, e.g. 00000001000000000000002E.00000060.backup, which records the history information of this backup.

Point-in-Time Recovery

#Table created after baseline backup
\c hrdb
create table t1(id int);
insert into t1 values(1);
pg_ctl stop
#Simulate data loss: move the data directory elsewhere
mkdir /monidiushi
mv /var/lib/pgsql/14/data/* /monidiushi
#Starting now fails: the directory is not a valid database cluster directory
pg_ctl start

#Restore from baseline backup
cp -R /var/lib/pgsql/14/backups/basebackup_date/* /var/lib/pgsql/14/data/
#Check symbolic links
ls /var/lib/pgsql/14/data/pg_tblspc/ -l

#Delete the wal log of the baseline backup
rm -rf /var/lib/pgsql/14/data/pg_wal/*
#Copy unarchived wal files to the original directory
cp -R /monidiushi/pg_wal/* /var/lib/pgsql/14/data/pg_wal

restore_command specifies the command used to fetch archived WAL files, and recovery_target_time sets the recovery target, i.e. the point in time to recover to. PostgreSQL supports several recovery target settings; the default is to recover to the latest state.

#Since PostgreSQL 12 there is no recovery.conf: add the recovery settings to
#postgresql.conf and create an empty recovery.signal file in the data directory
echo "restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
#recovery_target_time = '2019-3-28 12:05 GMT'" >> /var/lib/pgsql/14/data/postgresql.conf
touch /var/lib/pgsql/14/data/recovery.signal

pg_ctl start
#On Windows: pg_ctl start -D "absolute path to the data directory"

Check

select * from t1;