Redis cluster distributed lock: the lock is lost when the master node goes down

Redis series directory

Redis series – distributed lock
Redis series – cache penetration, cache breakdown, cache avalanche
Redis series – Why is Redis so fast?
Redis series – data persistence (RDB and AOF)
Redis series – consistent hash algorithm
Redis series – high availability (master-slave, sentinel, cluster)
Redis series – transactions and optimistic locks
Redis series – data type geospatial: Is there a Lao Wang next door to you?
Redis series – data type bitmaps: Have you checked in today?
What is a Bloom filter?

1. Common implementation

When it comes to Redis distributed locks, most people will think of:

setnx + lua

set key value px milliseconds nx

The core implementation commands are as follows:

  • Get the lock (unique_value can be UUID, etc.)
    SET resource_name unique_value NX PX 30000

  • Release the lock (in the Lua script, the value must be compared to avoid releasing a lock held by another client)

    if redis.call("get",KEYS[1]) == ARGV[1]
    then return redis.call("del",KEYS[1])
    else return 0 end
    

There are three key points in this implementation (they are also very popular interview questions; a Java sketch follows the list):

  • the set command must be used in the form set key value px milliseconds nx;
  • the value must be unique;
  • the value must be verified when releasing the lock, so that a client never releases a lock held by someone else.
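
For reference, here is a minimal Java sketch of this single-instance lock. It assumes the Jedis 3.x client; the key name, host, port and timeouts are arbitrary example values:

import java.util.Collections;
import java.util.UUID;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class SimpleRedisLock {

    // Lua script from above: only delete the key if the value still matches
    private static final String RELEASE_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] " +
            "then return redis.call('del', KEYS[1]) " +
            "else return 0 end";

    public static void main(String[] args) {
        String uniqueValue = UUID.randomUUID().toString();
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            // SET resource_name unique_value NX PX 30000
            String result = jedis.set("resource_name", uniqueValue,
                    SetParams.setParams().nx().px(30000));
            if ("OK".equals(result)) {
                try {
                    // lock acquired: do the protected work here
                } finally {
                    // compare the value before deleting so another client's lock is never released
                    jedis.eval(RELEASE_SCRIPT,
                            Collections.singletonList("resource_name"),
                            Collections.singletonList(uniqueValue));
                }
            }
        }
    }
}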

In fact, the biggest weakness of this kind of lock is that it is only taken on a single Redis node. Even if Redis is made highly available with a cluster, the lock can be lost when the master fails over to a slave for some reason. The failure scenario is:

  • The lock is acquired on the Redis master node;
  • before the locked key is synchronized to the slave node, the master fails;
  • a failover occurs and the slave node is promoted to master;
  • as a result, the lock is lost.

Because of this, Redis author antirez proposed a more advanced distributed lock algorithm designed for this kind of distributed environment: Redlock.

2. Redlock implementation

First of all, it needs to be made clear that the Redlock solution is built on two premises:

  1. There is no need to deploy slave instances or sentinel instances; only master instances are deployed.
  2. Multiple masters are required, and the official recommendation is at least 5 instances.
    In other words, to use Redlock you must deploy at least 5 Redis instances, all of them masters, with no relationship between them; they are completely independent.

The Redlock algorithm proposed by antirez works roughly as follows:

In a distributed Redis environment, assume there are N Redis masters. These nodes are completely independent of each other, with no master-slave replication or any other cluster coordination mechanism. The lock is acquired and released on the N instances in exactly the same way as on a single Redis instance. Now assume there are 5 Redis master nodes running on 5 different servers, so that they will not all go down at the same time.

In order to acquire the lock, the client performs the following operations (a sketch of the whole procedure follows the list):

  • Get the current Unix time, in milliseconds.
  • Try to acquire the lock on the 5 instances in sequence, using the same key and a unique value (such as a UUID).
    When requesting the lock from a Redis instance, the client should set a network connection and response timeout that is much smaller than the lock's expiration time.
    For example, if the lock expires automatically after 10 seconds, the per-request timeout could be in the range of 5-50 milliseconds. This prevents the client from waiting on a Redis instance that has already gone down.
    If an instance does not respond within the timeout, the client should move on to the next Redis instance as soon as possible.
    The client then subtracts the start time recorded in step 1 from the current time to obtain the time spent acquiring the lock.
  • The lock is considered acquired if and only if it was obtained from more than half of the Redis nodes (N/2 + 1, here 3 nodes) and the total time spent is less than the lock's expiration time.
    If the lock is acquired, its real validity time equals the original expiration time minus the time spent acquiring it (the elapsed time computed in the previous step).
  • If the lock acquisition fails for any reason (the lock was not obtained on at least N/2 + 1 instances, or the acquisition took longer than the validity time), the client should release the lock on all Redis instances, even on those where it did not appear to succeed. This prevents the case where a node actually granted the lock but the client never received the response, which would leave the key locked until it expires.
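
To make these steps concrete, here is a minimal, library-agnostic Java sketch of the acquisition logic. The tryAcquire method and the BiPredicate-per-instance abstraction are assumptions made for illustration only; they are not part of any real client API:

import java.util.List;
import java.util.function.BiPredicate;

public class RedlockSketch {

    /**
     * instances - one lock attempt per Redis node; each returns true if SET key value NX PX succeeded
     *             (each attempt is expected to apply its own short network timeout internally)
     * ttlMillis - the expiration time set on each instance
     * Returns the remaining validity time in milliseconds, or -1 if the lock was not acquired.
     */
    static long tryAcquire(List<BiPredicate<String, String>> instances,
                           String key, String value, long ttlMillis) {
        long start = System.currentTimeMillis();              // step 1: current time in ms
        int acquired = 0;
        for (BiPredicate<String, String> instance : instances) {
            if (instance.test(key, value)) {                  // step 2: try each instance in turn
                acquired++;
            }
        }
        long elapsed = System.currentTimeMillis() - start;    // step 3: time spent acquiring
        long validity = ttlMillis - elapsed;                  // real validity of the lock
        // success only with a majority (N/2 + 1) of nodes and time left on the lock
        if (acquired >= instances.size() / 2 + 1 && validity > 0) {
            return validity;
        }
        // on failure the caller must still release the lock on ALL instances
        return -1;
    }
}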

3. Redlock source code

Redisson already encapsulates the Redlock algorithm. Next we will briefly introduce its usage and analyze the core source code (the official recommendation is 5 instances; the usage example below uses 3 independent Redis instances for simplicity).

POM dependency:

    <dependency>
        <groupId>org.redisson</groupId>
        <artifactId>redisson</artifactId>
        <version>3.3.2</version>
    </dependency>

Usage

First, let's look at how to use the distributed lock implemented by the Redlock algorithm as encapsulated by Redisson. It is very simple and somewhat similar to ReentrantLock:

Config config1 = new Config();
config1.useSingleServer()
       .setAddress("redis://192.168.0.1:5378")
       .setPassword("a123456").
       setDatabase(0);
RedissonClient redissonClient1 = Redisson.create(config1);

Config config2 = new Config();
config2.useSingleServer()
        .setAddress("redis://192.168.0.1:5379")
        .setPassword("a123456").
        setDatabase(0);
RedissonClient redissonClient2 = Redisson.create(config2);

Config config3 = new Config();
config3.useSingleServer()
       .setAddress("redis://192.168.0.1:5380")
       .setPassword("a123456")
       .setDatabase(0);
RedissonClient redissonClient3 = Redisson.create(config3);

String resourceName = "REDLOCK_KEY";
//Each lock has a unique value
RLock lock1 = redissonClient1.getLock(resourceName);
RLock lock2 = redissonClient2.getLock(resourceName);
RLock lock3 = redissonClient3.getLock(resourceName);

//Try to lock 3 redis instances
RedissonRedLock redLock = new RedissonRedLock(lock1, lock2, lock3);
boolean isLock;
try {
    // isLock = redLock.tryLock();
    // If the lock cannot be acquired within 500 ms, acquisition is considered to have failed; 10000 ms (10 s) is the lock expiration time.
    isLock = redLock.tryLock(500, 10000, TimeUnit.MILLISECONDS);
    System.out.println("isLock = " + isLock);
    if (isLock) {
        // TODO: if the lock was acquired successfully, do something
    }
} catch (Exception e) {
} finally {
    // No matter what happens, the lock must be released in the end
    redLock.unlock();
}

Unique ID

A very important point in implementing a distributed lock is that the value being set must be unique. How does Redisson ensure the value is unique? The answer is UUID + threadId.

The entry point is redissonClient.getLock("REDLOCK_KEY"), and the source code is in Redisson.java and RedissonLock.java:

protected final UUID id = UUID.randomUUID();

String getLockName(long threadId) {
    return id + ":" + threadId;
}
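
As a quick illustration of this naming scheme, here is a self-contained sketch that mirrors the two lines above (the class name and the main method are just for demonstration):

import java.util.UUID;

public class LockNameDemo {

    // one UUID per client, as in RedissonLock above
    static final UUID id = UUID.randomUUID();

    static String getLockName(long threadId) {
        return id + ":" + threadId;
    }

    public static void main(String[] args) {
        // prints something like "8743c9c0-0795-4e90-a2e5-66d5c2f30e05:1"
        System.out.println(getLockName(Thread.currentThread().getId()));
    }
}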

Acquire the lock
The code to acquire the lock is redLock.tryLock() or redLock.tryLock(500, 10000, TimeUnit.MILLISECONDS). Both end up in the core code below; the only difference is that the former uses the default lease time (leaseTime), LOCK_EXPIRATION_INTERVAL_SECONDS, which is 30 s:

<T> RFuture<T> tryLockInnerAsync(long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
    internalLockLeaseTime = unit.toMillis(leaseTime);
    // Lua script executed on the Redis instance when acquiring the lock
    return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, command,
            // If the lock KEY does not exist, take the lock: hset REDLOCK_KEY <uuid:threadId> 1,
            // then set the expiration time (which is also the lease time of the lock) with pexpire
            "if (redis.call('exists', KEYS[1]) == 0) then " +
                "redis.call('hset', KEYS[1], ARGV[2], 1); " +
                "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                "return nil; " +
            "end; " +
            // If the lock KEY exists and the value matches, the current thread already holds the lock:
            // increase the reentry count by 1 and refresh the expiration time
            "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                "return nil; " +
            "end; " +
            // Otherwise return the remaining TTL of the lock KEY in milliseconds
            "return redis.call('pttl', KEYS[1]);",
            // These three parameters correspond to KEYS[1], ARGV[1] and ARGV[2] respectively
            Collections.singletonList(getName()), internalLockLeaseTime, getLockName(threadId));
}

In the command to acquire the lock,

  • KEYS[1] is Collections.singletonList(getName()), which represents the key of the distributed lock, that is, REDLOCK_KEY;
  • ARGV[1] is internalLockLeaseTime, which is the lease time of the lock. The default is 30s;
  • ARGV[2] is getLockName(threadId), which is the unique value set when acquiring the lock, that is, UUID + threadId.
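
To see the structure this script creates, here is a small sketch that inspects the lock, assuming the Jedis client and the first Redis instance from the usage example above (host, port and password are taken from that example):

import java.util.Map;

import redis.clients.jedis.Jedis;

public class InspectLock {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("192.168.0.1", 5378)) {
            jedis.auth("a123456");
            // the lock is stored as a hash: field = UUID:threadId, value = reentry count
            Map<String, String> holder = jedis.hgetAll("REDLOCK_KEY");
            // remaining lease time in milliseconds, set by pexpire in the Lua script
            Long ttl = jedis.pttl("REDLOCK_KEY");
            System.out.println("holder = " + holder + ", ttl = " + ttl);
        }
    }
}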

Release the lock
The code to release the lock is redLock.unlock(). The core source code is as follows:

protected RFuture<Boolean> unlockInnerAsync(long threadId) {
    // Lua script executed on the Redis instance when releasing the lock
    return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
            // If the lock KEY no longer exists, publish the unlock message to the channel
            "if (redis.call('exists', KEYS[1]) == 0) then " +
                "redis.call('publish', KEYS[2], ARGV[1]); " +
                "return 1; " +
            "end;" +
            // If the lock KEY exists but the value does not match, the lock is held by another thread: return directly
            "if (redis.call('hexists', KEYS[1], ARGV[3]) == 0) then " +
                "return nil;" +
            "end; " +
            // The current thread holds the lock: decrease the reentry count by 1
            "local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1); " +
            // If the count is still greater than 0, the lock has been re-entered: only refresh the expiration time, do not delete the KEY
            "if (counter > 0) then " +
                "redis.call('pexpire', KEYS[1], ARGV[2]); " +
                "return 0; " +
            "else " +
                // If the count reaches 0, the lock has been fully released: delete the KEY and publish the unlock message
                "redis.call('del', KEYS[1]); " +
                "redis.call('publish', KEYS[2], ARGV[1]); " +
                "return 1; " +
            "end; " +
            "return nil;",
            // These 5 parameters correspond to KEYS[1], KEYS[2], ARGV[1], ARGV[2] and ARGV[3] respectively
            Arrays.asList(getName(), getChannelName()), LockPubSub.unlockMessage, internalLockLeaseTime, getLockName(threadId));
}
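
To tie the acquire and release scripts together, here is a minimal reentrancy sketch using a single Redisson lock; the RedissonClient is assumed to be configured as in the usage example above:

import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;

public class ReentrancyDemo {

    static void demo(RedissonClient redissonClient1) {
        RLock lock = redissonClient1.getLock("REDLOCK_KEY");
        lock.lock();    // hset REDLOCK_KEY uuid:threadId 1, then pexpire
        lock.lock();    // same thread re-enters: hincrby ... 1 -> reentry count = 2
        lock.unlock();  // hincrby ... -1 -> count = 1 > 0: key kept, expiration refreshed
        lock.unlock();  // count reaches 0: del REDLOCK_KEY and publish the unlock message
    }
}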