rsync backup tool (with rsync+inotify real-time synchronization deployment example)

rsync

  • 1. Overview of rsync
    • 1.1 About rsync
    • 1.2 Features of rsync
    • 1.3 Working principle
  • 2. rsync related commands
    • 2.1 Basic format and common options
    • 2.2 Start and shut down the rsync service
    • 2.3 Basic format of downlink synchronization
    • 2.4 Basic format of upstream synchronization
    • 2.4 No interaction
      • 2.4.1 Specify password file
      • 2.4.2 rsync-daemon method
      • 2.4.3 rsync-ssh method
    • 2.5 Regular synchronization
  • 3. Deploy rsync for regular synchronization (upstream synchronization + downstream synchronization)
    • 3.1 Upstream synchronization
      • Step1 Configure serverP
      • Step2 Configure serverQ
      • Step3 The client synchronizes data to the server
      • Step4 Observe whether the uplink synchronization is successful
    • 3.2 Downlink synchronization
      • Step1 Modify the rsync configuration file and restart the service
      • Step2 Downlink synchronization effect test
  • 4. Combine with inotify to realize rsync real-time synchronization
    • 4.1 Why use real-time synchronization
    • 4.2 Principles
    • 4.3 inotify kernel parameters
    • 4.4 inotify-tools (providing auxiliary tools)
  • 5. Deploy real-time synchronization
    • Step1 Modify the rsync source server configuration file
    • Step2 The sending end adjusts the inotify kernel parameters
    • Step3 Install inotify-tools on the sending side
    • Step4: The sender writes a triggered synchronization script
    • Step5 Real-time synchronization effect test
  • 6. Use rsync to quickly delete a large number of files
    • 6.1 Usage background
    • 6.2 Simulation implementation

1. Overview of rsync

1.1 About rsync

Rsync (remote sync) is a fast incremental backup and remote data synchronization tool that runs on multiple platforms, including Unix, Linux, and Windows.

Rsync uses the so-called “rsync algorithm” to synchronize files between local and remote hosts. The algorithm transmits only the parts of a file that differ between the two sides instead of the entire file each time, which makes transfers fast.

The machine running Rsync server is also called backup server. One Rsync server can back up the data of multiple clients at the same time; multiple Rsync servers can also back up the data of one client.

Rsync can be used with rsh or ssh or even in daemon mode.

In daemon mode, the rsync server listens on port 873 and waits for clients to connect.
When a client connects to a module that requires authentication, the server checks the password. If the check passes, the file transfer can begin.

On the first synchronization every file is transferred in full; on later runs only the parts that differ between the two sides are transferred.

Official website: http://rsync.samba.org

1.2 Characteristics of rsync

The entire directory tree and file system can be mirrored;

It is easy to maintain the original file permissions, time, soft and hard links, etc.;

No special permissions are required to install;

Optimized process and high file transfer efficiency;

You can use rcp, ssh, etc. to transfer files, and of course you can also connect through direct socket;

Supports anonymous transmission.

1.3 Working Principle

In a remote synchronization task, the client that initiates the rsync operation is called the initiator, and the server that responds to rsync operations from clients is called the rsync synchronization source.

First, server B (the original source) backs up its data to server A (the synchronization source).

When data on server B is lost or new data is added, the two servers are synchronized again.

If data on server B is lost, the missing part is restored from server A.

If data on server B grows, it is backed up to server A again, but as an incremental backup rather than a full one: only the data that the synchronization source does not yet have is transferred.
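
The incremental behaviour described above can be tried locally, without a daemon. The following is only a sketch: the two temporary directories stand in for server B and server A, and the function name demo_incremental is made up for this illustration. It assumes rsync is installed.

```shell
#!/bin/bash
# Local illustration of incremental backup: two temporary directories stand in
# for server B (original data) and server A (synchronization source).
demo_incremental() {
    local work
    work=$(mktemp -d) || return 1
    mkdir "$work/serverB" "$work/serverA"

    echo "first file" > "$work/serverB/a.txt"
    echo "run 1:"
    # Everything is new on the first run, so a.txt is copied.
    # --out-format='%n' prints the name of each item rsync updates.
    rsync -a --out-format='%n' "$work/serverB/" "$work/serverA/" || return 1

    echo "second file" > "$work/serverB/b.txt"
    echo "run 2:"
    # a.txt is unchanged (same size and modification time), so only b.txt is copied.
    rsync -a --out-format='%n' "$work/serverB/" "$work/serverA/" || return 1

    rm -rf "$work"
}

demo_incremental || echo "demo failed (is rsync installed?)" >&2
```

In the output, a.txt is listed only under "run 1" and b.txt only under "run 2".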

2. rsync related commands

2.1 Basic format and common options

#Basic format
rsync [options] source destination

Common options   Description
-v               Show details of the synchronization process
-z               Compress file data during transfer
-a               Archive mode: keep file permissions, attributes, and other information; equivalent to the combined options "-rlptgoD"
-r               Recursive mode: include all files in the directory and its subdirectories
-l               Copy symbolic links as symbolic links
-p               Preserve file permissions
-t               Preserve file modification times
-g               Preserve the file's group (a non-root receiver can only set groups it belongs to)
-o               Preserve the file's owner (super-user only)
-H               Preserve hard links
-A               Preserve ACLs
-D               Preserve device files and other special files
--delete         Delete files that exist in the target location but not in the original location
--checksum       Decide whether to skip a file based on its checksum rather than its size and modification time
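
Because --delete removes files on the target side, it can help to preview its effect first. The sketch below is an illustration rather than part of the original procedure: it uses rsync's -n (--dry-run) option, which is not listed in the table above, on two temporary directories. The function name preview_delete and the file names are hypothetical.

```shell
#!/bin/bash
# Preview what --delete would remove, using -n (--dry-run) so nothing is changed.
preview_delete() {
    local work rc
    work=$(mktemp -d) || return 1
    mkdir "$work/origin" "$work/target"
    echo "keep"  > "$work/origin/keep.txt"
    echo "keep"  > "$work/target/keep.txt"
    echo "stale" > "$work/target/stale.txt"   # exists only in the target

    # -a archive, -v verbose, -n dry run; planned deletions are reported
    # as "deleting NAME"
    rc=0
    rsync -avn --delete "$work/origin/" "$work/target/" || rc=$?

    # The dry run must leave the target untouched.
    if [ -f "$work/target/stale.txt" ]; then
        echo "stale.txt still present after dry run"
    fi
    rm -rf "$work"
    return "$rc"
}

preview_delete || echo "demo failed (is rsync installed?)" >&2
```

Dropping -n from the rsync line would perform the deletion for real.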

2.2 Start and shut down the rsync service

Start service

#Start the rsync service and run it as an independent listening service (daemon process)
rsync --daemon

Close service

#Close rsync service
kill $(cat /var/run/rsyncd.pid)
rm -rf /var/run/rsyncd.pid
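
The two commands above fail noisily if the PID file is missing or stale. A slightly more defensive variant could look like the sketch below; the function name stop_rsyncd is hypothetical, and the default path is simply the one used in this article.

```shell
#!/bin/bash
# stop_rsyncd [PID_FILE]: stop the daemon recorded in PID_FILE, then clean up.
# PID_FILE defaults to /var/run/rsyncd.pid, the path used in this article.
stop_rsyncd() {
    local pid_file=${1:-/var/run/rsyncd.pid} pid tries
    if [ ! -f "$pid_file" ]; then
        echo "no pid file at $pid_file; is the daemon running?" >&2
        return 1
    fi
    pid=$(cat "$pid_file")
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid"                        # SIGTERM, same as the command above
        for tries in 1 2 3 4 5; do         # wait up to about 5 s for it to exit
            kill -0 "$pid" 2>/dev/null || break
            sleep 1
        done
    else
        echo "stale pid file: process $pid is not running" >&2
    fi
    rm -f "$pid_file"
}

# Typical restart on the server:
#   stop_rsyncd && rsync --daemon
```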

2.3 Basic format of downstream synchronization

Pull data from the source server

rsync [options] <source server location> <local location>


#For example
#Format 1
rsync -avz test@192.168.2.100::message /opt/

#Format 2
rsync -avz rsync://test@192.168.2.100/message /opt/

#test is the authorized account in the configuration file
#192.168.2.100 is the address of the synchronization source
#message is the shared module defined in the configuration file

2.4 Basic format of upstream synchronization

Push data to the source server

rsync [options] <local location> <source server location>

2.4 No interaction

2.4.1 Specify password file

echo "abc123" > /etc/server.pass
chmod 600 /etc/server.pass
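
rsync refuses to use a password file that other users can access, which is why the chmod 600 step matters. The sketch below shows one way to create the file with tight permissions from the start and to verify them afterwards. The function names write_secret and secret_is_private are hypothetical, and GNU stat (as shipped on typical Linux systems) is assumed.

```shell
#!/bin/bash
# write_secret FILE PASSWORD: create a password file that only its owner can read.
write_secret() {
    local file=$1 password=$2
    # umask 077 in a subshell: a newly created file is never readable by
    # group or others, not even briefly
    ( umask 077 && printf '%s\n' "$password" > "$file" ) || return 1
    chmod 600 "$file"    # also tightens a file that already existed
}

# secret_is_private FILE: succeed only if the mode is exactly 600 (GNU stat assumed)
secret_is_private() {
    [ "$(stat -c '%a' "$1")" = "600" ]
}

demo_file=$(mktemp)
write_secret "$demo_file" "abc123"
if secret_is_private "$demo_file"; then
    echo "mode ok: $demo_file"
fi
rm -f "$demo_file"
```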

2.4.2 rsync-daemon mode

rsync -avz --delete --password-file=/opt/userlist USER@SERVER_IP::wwwky31 /opt/data/ #rsync-daemon method (USER: authorized account, SERVER_IP: synchronization source)

2.4.3 rsync-ssh mode

rsync -avz --delete -e 'sshpass -p abc1234 ssh -p 22' /etc/yum.repos.d USER@SERVER_IP:/opt/data #rsync-ssh method (USER: system account on SERVER_IP)

2.5 Regular synchronization

Combined with crontab scheduled tasks, regular synchronization can be achieved.

#For example
crontab -e
30 22 * * * /usr/bin/rsync -az --delete --password-file=/etc/server.pass backuper@SERVER_IP::wwwroot /opt/
#To avoid entering a password during synchronization, save the password of the backuper user in a password file such as /etc/server.pass,
#and pass it to rsync with the option "--password-file=/etc/server.pass". SERVER_IP is the address of the synchronization source.

systemctl restart crond
systemctl enable crond
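
If a scheduled run can take longer than the interval between runs, two rsync processes may end up working on the same directory. One possible guard, sketched below, is a small wrapper that skips a run while the previous one is still active. This is an optional addition, not part of the original setup; the function name run_exclusive, the lock location, and the exit code 75 are all choices made for this illustration.

```shell
#!/bin/bash
# run_exclusive LOCK_DIR COMMAND [ARGS...]
# Runs COMMAND only if LOCK_DIR can be created. mkdir either creates the
# directory or fails, so two overlapping runs cannot both get past it.
run_exclusive() {
    local lock=$1 rc=0
    shift
    if ! mkdir "$lock" 2>/dev/null; then
        echo "previous run still holds $lock, skipping" >&2
        return 75
    fi
    "$@" || rc=$?
    rmdir "$lock"
    return "$rc"
}

# Stand-in command for demonstration; in the cron job this would be the rsync line.
run_exclusive "${TMPDIR:-/tmp}/rsync-demo.lock.$$" echo "sync would run here"
```

One limitation of this approach: if the job is killed before rmdir runs, the lock directory stays behind and has to be removed by hand.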

3. Deploy rsync regular synchronization (upstream synchronization + downstream synchronization)

Server                    IP
Sync source 1 (serverP)   192.168.2.100
Sync source 2 (serverQ)   192.168.2.103
Client                    192.168.2.102

Preparation

rpm -q rsync #Check whether rsync is installed

rpm -qc rsync #Show the location of the rsync configuration file

#Stop the firewall now and keep it disabled at boot
systemctl disable firewalld --now

#Permanently disable SELinux (takes effect after a reboot)
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

3.1 Upstream synchronization

rsync client synchronizes data to rsync server.

Step1 Configure serverP

vim /etc/rsyncd.conf #Add the following configuration items

#Comments must be on their own line; rsyncd.conf does not support trailing comments
uid = root
gid = root
#Confine clients to the source directory
use chroot = yes
#Listening address
address = 192.168.2.100
#Listening port tcp/udp 873, can be viewed with: grep rsync /etc/services
port = 873
#Log file location
log file = /var/log/rsyncd.log
#File in which the process ID is stored
pid file = /var/run/rsyncd.pid
#Client addresses that are allowed to connect
hosts allow = 192.168.2.0/24
#File types that are not compressed again during synchronization
dont compress = *.gz *.bz2 *.tgz *.zip *.rar *.z


#Shared module name
[message]
#Actual path of the source directory
path = /data
comment = test
#Upstream synchronization: clients may upload but not download
write only = yes
read only = no
#Authorized accounts, multiple accounts separated by spaces
auth users = test
#Data file that stores account information
secrets file = /etc/rsyncd_users.db

#For anonymous mode, just remove the "auth users" and "secrets file" configuration items.

#Create the data file for the backup account and add the password of the authorized account
vim /etc/rsyncd_users.db

#No system user with the same name needs to be created
test:abc123

#Allow only the owner to read and modify the password file
chmod 600 /etc/rsyncd_users.db

#Make sure all users have read permission on the source directory /data
mkdir /data
chmod +r /data

#Start the rsync service and run it as an independent listening service (daemon process)
rsync --daemon
#Observe whether the startup is successful
ss -napt | grep rsync
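
A common pitfall when copying annotated examples into /etc/rsyncd.conf: according to the rsyncd.conf manual, only lines that begin with # are treated as comments, so an annotation placed after a value on the same line becomes part of that value. The sketch below flags such lines before the daemon is started. The function name find_inline_comments is hypothetical, and the pattern is a heuristic: it would also flag a value that legitimately contains " #".

```shell
#!/bin/bash
# find_inline_comments FILE: print "LINE_NUMBER:LINE" for every setting line
# that carries a trailing "#..." annotation. Full-line comments are skipped.
find_inline_comments() {
    grep -nE '^[[:space:]]*[^#;[:space:]][^#]*[[:space:]]#' "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
# a full-line comment is fine
use chroot = yes #imprisoned in the source directory
port = 873
EOF
# prints: 2:use chroot = yes #imprisoned in the source directory
find_inline_comments "$sample" || echo "no inline comments found"
rm -f "$sample"
```

On the server it would be run as find_inline_comments /etc/rsyncd.conf; no output means no trailing annotations were detected.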

Step2 Configure serverQ

The configuration is the same as that of serverP. Just change the listening IP address to the IP of serverQ.

vim /etc/rsyncd.conf
...
uid = root
gid = root
use chroot = yes
address = 192.168.2.103
port = 873
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
hosts allow = 192.168.2.0/24
dont compress = *.gz *.bz2 *.tgz *.zip *.rar *.z


[message]
path = /data
comment = test
write only = yes
read only = no
auth users = test
secrets file = /etc/rsyncd_users.db

chmod 600 /etc/rsyncd_users.db

mkdir /data
chmod +r /data

#Start service
rsync --daemon

Step3 The client synchronizes data to the server

#Synchronize to serverP
rsync -avz /data/message test@192.168.2.100::message

#Synchronize to serverQ
rsync -avz /data/message test@192.168.2.103::message

Step4 Observe whether the uplink synchronization is successful
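
One way to check the result, assuming shell access to both machines, is to compute a fingerprint of the directory on each side and compare the two values. With the commands from Step3, the pushed directory lands under the module path, that is /data/message on the server. The helper below is a sketch: the name dir_fingerprint is hypothetical, and GNU find, sort, xargs, and sha256sum are assumed.

```shell
#!/bin/bash
# dir_fingerprint DIR: print one SHA-256 digest that covers the relative path
# and the content of every regular file under DIR.
dir_fingerprint() {
    ( cd "$1" &&
      find . -type f -print0 | LC_ALL=C sort -z |
      xargs -0 -r sha256sum | sha256sum | cut -d' ' -f1 )
}

# Demonstration with two temporary directories that hold the same file
a=$(mktemp -d)
b=$(mktemp -d)
echo "data" > "$a/f.txt"
echo "data" > "$b/f.txt"
if [ "$(dir_fingerprint "$a")" = "$(dir_fingerprint "$b")" ]; then
    echo "directories match"
fi
rm -rf "$a" "$b"
```

Running dir_fingerprint /data/message on the client and on each server should print the same digest when the upload succeeded. Permissions and timestamps are not covered by this check.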


3.2 Downstream synchronization

Downstream synchronization only requires one change to the rsync configuration file: switch the shared module to read-only mode so that clients can download but not upload.

Step1 Modify the rsync configuration file and restart the service

vim /etc/rsyncd.conf
...
read only = yes
#Downloads fail while "write only = yes" is set, so turn it off
write only = no
....

#Restart service
kill $(cat /var/run/rsyncd.pid)
rm -rf /var/run/rsyncd.pid
rsync --daemon

Step2 Downlink synchronization effect test

rsync -avz test@192.168.2.100::message /data

4. Combine with inotify to realize rsync real-time synchronization

4.1 Why use real-time synchronization

Disadvantages of regular synchronization

The backup execution time is fixed, with obvious delay and poor real-time performance;

When the synchronization source does not change for a long time, intensive periodic tasks are unnecessary.

Advantages of real-time synchronization

Once the synchronization source changes, start the backup immediately;

As long as there are no changes to the synchronization source, the backup will not be performed.

4.2 Principle

The initiator is configured with rsync + inotify.

Using the inotify notification interface, you can monitor various changes in the file system, such as file access, deletion, movement, modification, etc.

Using this mechanism, you can easily implement file change alarms, incremental backups, and respond promptly to changes in directories or files.

Combining the inotify mechanism with the rsync tool achieves triggered backup (real-time synchronization): as soon as a document in the original location changes, an incremental backup starts immediately; otherwise rsync waits silently. This avoids the delays and overly frequent cycles that come with backing up on a fixed schedule.

Because the inotify notification mechanism is provided by the Linux kernel, it is mainly used for monitoring the local machine, so in triggered backups it is better suited to upstream synchronization.

4.3 inotify kernel parameters

In the Linux kernel, the inotify mechanism provides three control parameters by default:

1) max_queued_events (size of the monitoring event queue, default value 16384)

2) max_user_instances (maximum number of monitoring instances per user, default value 128)

3) max_user_watches (maximum number of monitored files per user, default value 8192)

When the number of directories and files to be monitored is large or changes frequently, it is recommended to increase the values of these three parameters.
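
Before raising the limits it is useful to see the current values and to estimate how many watches a directory tree needs. The sketch below reads the values from /proc/sys/fs/inotify and counts directories, since recursive monitoring places one watch on each directory. The function names show_inotify_limits and watches_needed are hypothetical.

```shell
#!/bin/bash
# show_inotify_limits [DIR]: print the three limits. DIR defaults to the
# kernel's procfs location and can be overridden for testing.
show_inotify_limits() {
    local dir=${1:-/proc/sys/fs/inotify} name
    for name in max_queued_events max_user_instances max_user_watches; do
        if [ -r "$dir/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s = (not available)\n' "$name"
        fi
    done
}

# watches_needed DIR: number of directories under DIR (including DIR itself),
# which is a lower bound for the max_user_watches value that is required.
watches_needed() {
    find "$1" -type d | wc -l | tr -d ' '
}

show_inotify_limits
watches_needed .
```

If watches_needed /data comes close to max_user_watches, the limit should be raised before starting the monitor.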

4.4 inotify-tools (providing auxiliary tools)

Installing inotify-tools provides the auxiliary programs inotifywait and inotifywatch, which monitor and summarize changes.
inotifywait: monitors events such as modify, create, move, delete, and attrib (attribute change), and outputs the result as soon as a change occurs.
inotifywatch: collects file system changes and outputs a summary after it finishes running.
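
In its default output format, inotifywait prints one line per event: the watched directory, the event names (comma-separated when several apply), and the file name. The sketch below shows how such lines can be split and classified in a shell loop. The function name handle_events is hypothetical, and the sample lines are hand-written stand-ins for real inotifywait output so that the block runs without inotify-tools.

```shell
#!/bin/bash
# handle_events: read "DIRECTORY EVENTS FILE" lines from stdin and classify them.
handle_events() {
    local dir events file
    while read -r dir events file; do
        case ",$events," in
            *,DELETE,*)
                echo "removed: ${dir}${file}" ;;
            *,CREATE,*|*,MODIFY,*|*,MOVED_TO,*)
                echo "changed: ${dir}${file}" ;;
            *)
                echo "other($events): ${dir}${file}" ;;
        esac
    done
}

# Real use (requires inotify-tools):
#   inotifywait -mrq -e modify,create,move,delete,attrib /data | handle_events
printf '%s\n' '/data/ CREATE a.txt' '/data/sub/ DELETE old.log' | handle_events
# prints:
#   changed: /data/a.txt
#   removed: /data/sub/old.log
```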

5. Deploy real-time synchronization

Server               IP address
Source server        192.168.2.100
Sender (client)      192.168.2.102

#Stop the firewall now and keep it disabled at boot
systemctl disable firewalld --now

#Permanently disable SELinux (takes effect after a reboot)
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

Step1 Modify the rsync source server configuration file

vim /etc/rsyncd.conf
...
#Turn off read-only; upstream synchronization needs the module to be writable
read only = no

#Restart service
kill $(cat /var/run/rsyncd.pid)
rm -rf /var/run/rsyncd.pid
rsync --daemon
ss -natp | grep rsync

mkdir /data
chmod 777 /data
#Password file for non-interactive synchronization
echo "abc123" > /etc/server.pass
chmod 600 /etc/server.pass

Step2 The sending end adjusts the inotify kernel parameters

vim /etc/sysctl.conf
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 1024
fs.inotify.max_user_watches = 1048576

sysctl -p

Step3 Install inotify-tools on the sender

tar zxvf inotify-tools-3.14.tar.gz -C /opt/

cd /opt/inotify-tools-3.14
./configure
make -j2 && make install

Step4 The sender writes a triggered synchronization script

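Note that the script name must not contain the string rsync. The script uses pgrep rsync to check for a running synchronization, and a script with rsync in its name would match that check itself, so the synchronization might never start.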

vim /opt/inotify.sh

#!/bin/bash
INOTIFY_CMD="inotifywait -mrq -e modify,create,attrib,move,delete /data/"
RSYNC_CMD="rsync -avzH --delete --password-file=/etc/server.pass /data test@192.168.2.100::message/"
#Use while and read to keep reading the monitoring records that inotifywait outputs; each record triggers the check below.
$INOTIFY_CMD | while read DIRECTORY EVENT FILE
do
    #If rsync is not already running, start it immediately
    if [ $(pgrep rsync | wc -l) -le 0 ] ; then
        $RSYNC_CMD
    fi
done

The above script detects changes in the local /data directory. Once there is an update, the rsync synchronization operation is triggered and the backup is uploaded to the message shared module of the server 192.168.2.100.
chmod +x /opt/inotify.sh
chmod 777 /data
chmod +x /etc/rc.d/rc.local
echo '/opt/inotify.sh &' >> /etc/rc.d/rc.local #Run automatically at boot, in the background so that rc.local can finish
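
The script above skips an event when rsync is already running, so a change that arrives during a transfer may have to wait for the next event. A possible alternative, sketched below and not part of the original deployment, is to treat a burst of events as one trigger: after the first event, keep reading until the stream has been quiet for a few seconds, then synchronize once. The function name sync_on_events and the quiet period are assumptions for this illustration; the demonstration uses echo in place of rsync.

```shell
#!/bin/bash
# sync_on_events QUIET_SECONDS COMMAND [ARGS...]
# Reads event lines from stdin. After the first event of a burst it keeps
# draining lines until none arrives for QUIET_SECONDS, then runs COMMAND once.
sync_on_events() {
    local quiet=$1 line
    shift
    while read -r line; do
        while read -r -t "$quiet" line; do
            :   # swallow the rest of the burst
        done
        # stdin is redirected so COMMAND cannot consume pending event lines
        "$@" </dev/null
    done
}

# Demonstration: three events arrive together, the command runs once.
printf '%s\n' '/data/ CREATE a' '/data/ MODIFY a' '/data/ CREATE b' |
    sync_on_events 1 echo "rsync would run once"

# Real use on the sender (addresses as in the table for this section):
#   inotifywait -mrq -e modify,create,attrib,move,delete /data/ |
#       sync_on_events 2 rsync -azH --delete --password-file=/etc/server.pass \
#           /data test@192.168.2.100::message/
```

Events that arrive while the command is running stay in the pipe and start the next round, so a change made during a transfer is picked up afterwards.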

Step5 Real-time synchronization effect test

#Verification process of triggered uplink synchronization
1) Run the /opt/inotify.sh script program on your local machine;

2) Switch to the /data directory of the local machine and perform operations such as adding, deleting, and modifying files;

3) Check the changes in the message directory in the remote server.

6. Use rsync to quickly delete a large number of files

6.1 Usage background

Deleting a large number of files with rm -rf * is inefficient.

In that case, rsync's replacement behavior combined with the --delete option can quickly delete a large number of files, such as service caches.

6.2 Simulation implementation

#Generate a large number of junk files to simulate the production environment
mkdir /opt/d1
cd /opt/d1
touch {1..9999}.txt

#empty folder
mkdir /opt/test

rsync --delete-before -avH --progress --stats /opt/test/ /opt/d1/

--delete-before: the receiver performs deletions before the transfer starts
-a: archive mode, transfers files recursively and keeps all file attributes
-H: keep hard links
-v: verbose output
--progress: show progress during the transfer
--stats: print statistics about the file transfer