NFS lock-related issues
Foreword
The root of the problem is lock control on an NFS shared directory. For a shared file system, there are usually two kinds of requirements.
Requirement 1
There needs to be a common lock: client software determines which machine currently holds the lock and manages the service accordingly.
Requirement 2
Clients share the same files but must not hold locks against one another; otherwise the services cannot be started at the same time, which is a problem when high availability is required.
This article works through a real setup to see how the NFS service behaves and how to enable or disable locking, so it can be configured to match the actual requirement.
Practice
Prepare three machines: one server exporting an NFS share, and two clients accessing that shared directory at the same time, using NFS v3.
lab201 nfs server
lab202 nfs client
lab203 nfs client
Configure the NFS service on lab201
[root@lab201 ~]# chmod 777 /nfsshare/
[root@lab201 ~]# cat /etc/exports
/nfsshare *(fsid=123,no_root_squash,async,no_subtree_check,rw)
Mount the NFS share with default parameters on both lab202 and lab203
[root@lab202 ~]# mount -t nfs -o v3 192.168.0.201:/nfsshare /mnt
[root@lab203 ~]# mount -t nfs -o v3 192.168.0.201:/nfsshare /mnt
View the mount parameters
192.168.0.201:/nfsshare on /mnt type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.201,mountvers=3,mountport=892,mountproto=udp,local_lock=none,addr=192.168.0.201)
Check the rpc-statd service on the server
[root@lab201 ~]# systemctl status rpc-statd
● rpc-statd.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; static; vendor preset: disabled)
   Active: active (running) since Fri 2021-10-08 14:52:07 CST; 2min 25s ago
  Process: 8400 ExecStart=/usr/sbin/rpc.statd $STATDARGS (code=exited, status=0/SUCCESS)
 Main PID: 8405 (rpc.statd)
    Tasks: 1
   CGroup: /system.slice/rpc-statd.service
           └─8405 /usr/sbin/rpc.statd -p 662 -o 2020

Oct 08 14:52:07 lab201 systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Oct 08 14:52:07 lab201 rpc.statd[8405]: Version 1.3.0 starting
Oct 08 14:52:07 lab201 rpc.statd[8405]: Flags: TI-RPC
Oct 08 14:52:07 lab201 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
When multiple NFS clients mount the share at the same time, the rpc-statd service handles lock state across machines. By default, locking goes through this service on the server, i.e. it is a remote lock.
Now let's see what happens when a lock is actually taken.
On lab202, lock a file while running a command:
[root@lab202 mnt]# flock lockfile -c top
At this point top is producing output and the file is locked. Run the same command on the other machine, lab203:
[root@lab203 mnt]# flock lockfile -c top
This command produces no output; it simply hangs, waiting for lockfile to be released, after which it can run. Now interrupt the command on lab202.
The command on lab203 immediately starts producing output. Remote locking is working normally.
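The blocking behavior seen above can be reproduced on a single machine with an ordinary local file, which is a convenient way to experiment with flock semantics before involving NFS. A minimal sketch (file names are made up; `-n` makes the second attempt fail immediately instead of hanging the way lab203 did):

```shell
# Hold an exclusive flock on a temp file in a background process,
# then show that a second, non-blocking attempt fails while it is held.
lockfile=$(mktemp)

flock "$lockfile" -c 'sleep 2' &   # holder keeps the lock for 2 seconds
holder=$!
sleep 0.5                          # give the holder time to acquire it

# flock -n exits non-zero immediately if the lock cannot be taken.
if flock -n "$lockfile" -c 'true'; then
    result="lock acquired"
else
    result="lock busy"
fi
echo "$result"

wait "$holder"
rm -f "$lockfile"
```

While the background holder owns the lock, the non-blocking attempt reports the lock as busy; rerun it after the holder exits and it succeeds.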
Now that we know rpc-statd manages the locks, let's stop that service and see what happens.
Note that the operation steps are as follows:

- unmount the client
- restart the NFS service
- stop rpc-statd
- mount the client again

The above steps must be performed in this order.
Now even a single machine running the command gets the following error and cannot take the lock:
[root@lab202 mnt]# flock lockfile -c top
flock: lockfile: No locks available
This situation is problematic: the default mount hands locking over to the remote NFS server, but the remote side can no longer grant locks, so the "No locks available" error appears.
Verify with a real service
Put the MySQL data directory onto the share configured above, which now has no lock service
[root@lab202 mariadb]# df -h /var/lib/mysql/
Filesystem               Size  Used Avail Use% Mounted on
192.168.0.201:/nfsshare   80G   68G   13G  84% /var/lib/mysql
[root@lab202 mariadb]# ls /var/lib/mysql/
aria_log.00000001  ibdata1      ib_logfile1  mysql  test
aria_log_control   ib_logfile0  lockfile     performance_schema
Start the service; you can see it hangs
[root@lab202 mysql]# systemctl start mariadb
Check the log
[root@lab202 mariadb]# tail -n 50 /var/log/mariadb/mariadb.log
211008 15:07:07 InnoDB: Using Linux native AIO
211008 15:07:07 InnoDB: Initializing buffer pool, size = 128.0M
211008 15:07:07 InnoDB: Completed initialization of buffer pool
InnoDB: Unable to lock ./ibdata1, error: 37
211008 15:07:13 InnoDB: Retrying to lock the first data file
InnoDB: Unable to lock ./ibdata1, error: 37
InnoDB: Unable to lock ./ibdata1, error: 37
InnoDB: Unable to lock ./ibdata1, error: 37
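The "error: 37" in the log is a raw errno value; on Linux, 37 is ENOLCK ("No locks available"), the same failure flock reported earlier. A quick way to confirm the mapping on a given box (a sketch that borrows python3 just for the errno lookup):

```shell
# Look up errno 37 on this machine; on Linux it is ENOLCK.
num=$(python3 -c 'import errno; print(errno.ENOLCK)')
msg=$(python3 -c 'import errno, os; print(os.strerror(errno.ENOLCK))')
echo "errno $num = $msg"
```

On a Linux host this prints the ENOLCK number and its message, tying the InnoDB log line back to the missing lock service.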
Even with only a single machine involved, the MySQL service cannot start, so we can conclude that starting MySQL requires locking its data files.
If no lock is available at the remote end, the service cannot start at all; but if remote locking is enabled, two servers can never hold the remote lock at the same time, so they cannot both start.
That leaves only one way to handle this: let each client manage its own locks. On the server side, locks are provided through the kernel lockd and there is no switch to turn them off there; instead, NFS controls this from the client side, through mount parameters.
The man page description makes this clearer
[root@lab202 ~]# man 5 nfs
lock / nolock
       Selects whether to use the NLM sideband protocol to lock files on the
       server.  If neither option is specified (or if lock is specified),
       NLM locking is used for this mount point.  When using the nolock
       option, applications can lock files, but such locks provide exclusion
       only against other applications running on the same client.  Remote
       applications are not affected by these locks.

       NLM locking must be disabled with the nolock option when using NFS to
       mount /var because /var contains files used by the NLM implementation
       on Linux.  Using the nolock option is also required when mounting
       exports on NFS servers that do not support the NLM protocol.
By default, i.e. with no extra mount parameters, NLM locking is enabled. With nolock, locks no longer take effect between machines: each machine's locks are provided by that machine itself and only exclude applications running on the same client.
When the /var directory itself is mounted over NFS, nolock is required, because /var contains files used by the NLM implementation on Linux.
So the plan is to mount /var/lib/mysql with nolock and let the local machine provide the locking.
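To make such a mount persistent across reboots, the same options can go into /etc/fstab. A sketch using the paths from this article (vers=3 is the fstab spelling of the -o v3 used on the command line):

```
# /etc/fstab entry (sketch): MySQL data on NFS v3 with local locking
192.168.0.201:/nfsshare  /var/lib/mysql  nfs  vers=3,nolock  0 0
```

After adding the entry, `mount /var/lib/mysql` picks it up without retyping the options.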
Mount again
[root@lab202 ~]# mount -t nfs -o v3,nolock 192.168.0.201:/nfsshare /var/lib/mysql/
[root@lab202 ~]# mount|grep mysql
192.168.0.201:/nfsshare on /var/lib/mysql type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.201,mountvers=3,mountport=892,mountproto=udp,local_lock=all,addr=192.168.0.201)
Start MySQL again; it starts normally
[root@lab202 ~]# systemctl start mariadb
Mount the share the same way on the other machine; MySQL can be started there directly as well.
With both machines now mounted with nolock, repeat the earlier flock test.
Open two terminals on lab202 and run in each:
flock lockfile -c top
Then open a terminal on lab203 and run:
flock lockfile -c top
lab202 and lab203 can both run the command at the same time, while the second terminal on lab202 hangs waiting. This shows that local locking works and no lock crosses machines, which is exactly the mode we need for running MySQL on two machines.
Looking back at the mount parameters: with the default parameters local_lock=none, and after adding nolock it automatically becomes local_lock=all, i.e. local locking.
The man page also explains this parameter
[root@lab202 ~]# man 5 nfs
local_lock=mechanism
       Specifies whether to use local locking for any or both of the flock
       and the POSIX locking mechanisms.  mechanism can be one of all,
       flock, posix, or none.  This option is supported in kernels 2.6.37
       and later.

       The Linux NFS client provides a way to make locks local.  This
       means, the applications can lock files, but such locks provide
       exclusion only against other applications running on the same
       client.  Remote applications are not affected by these locks.

       If this option is not specified, or if none is specified, the client
       assumes that the locks are not local.

       If all is specified, the client assumes that both flock and POSIX
       locks are local.

       If flock is specified, the client assumes that only flock locks are
       local and uses NLM sideband protocol to lock files when POSIX locks
       are used.

       If posix is specified, the client assumes that POSIX locks are local
       and uses NLM sideband protocol to lock files when flock locks are
       used.

       To support legacy flock behavior similar to that of NFS clients <
       2.6.12, use 'local_lock=flock'.  This option is required when
       exporting NFS mounts via Samba as Samba maps Windows share mode
       locks as flock.  Since NFS clients > 2.6.12 implement flock by
       emulating POSIX locks, this will result in conflicting locks.

       NOTE: When used together, the 'local_lock' mount option will be
       overridden by 'nolock'/'lock' mount option.
So if local_lock is specified, the client uses local locks; if it is not specified, local locks are not used by default. And if the nolock/lock option is also given, it overrides local_lock.
Let's try specifying local_lock directly instead of nolock:
[root@lab202 ~]# mount -t nfs -o v3,local_lock=all 192.168.0.201:/nfsshare /var/lib/mysql/
[root@lab202 ~]# mount |grep mysql
192.168.0.201:/nfsshare on /var/lib/mysql type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.201,mountvers=3,mountport=892,mountproto=udp,local_lock=all,addr=192.168.0.201)
Testing with the flock command shows the effect is identical to nolock.
Summary
When using NFS and locks are needed:
First decide whether you need a local lock or a remote lock.
For remote locking, check that the rpc-statd service is running normally to provide the remote lock service.
For local locking, check that local_lock=all is enabled, or that nolock is set (nolock alone suffices, since it overrides local_lock).
The flock command shown above gives a quick way to check which locking mode is in effect.
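That quick check can be wrapped in a tiny script: flock -n either takes the lock and exits 0, or fails immediately, so pointing it at a file on the mount in question distinguishes a healthy lock path from a broken one. A sketch (the probe path is an example; on an NFS client you would pass a file under the mount point):

```shell
# Probe whether a lock can be taken on a given file.
# Usage: pass a file path as $1; defaults to a local temp file.
probe=${1:-$(mktemp)}

if flock -n "$probe" -c 'true'; then
    status="lock ok"
else
    # On an NFS mount this typically means the lock is held elsewhere,
    # or the lock service is unavailable (the ENOLCK case seen earlier).
    status="lock failed"
fi
echo "$status"
```

Run against a local temp file the probe succeeds; run against a file on a mount whose lock service is down, it fails immediately instead of hanging.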
Remarks
NFS v3 handles locks through the NLM protocol, while NFS v4 handles locking within the NFS protocol itself. Everything in this article is about NFS v3.