Usage scenarios and potential problems of the MultiLock and RedLock distributed locks implemented by Redisson.

Table of Contents

1. Introduction to Redisson

2. Interlocking MultiLock and RedLock

3. Possible functional defects of these two locks

4. Solutions to problems – personal ideas


1. Introduction to Redisson

Redisson is a Java in-memory data grid built on top of Redis and an advanced Redis-based client for distributed locks. It is also the Java implementation of the RedLock algorithm described on the official Redis website.

There are already many introductions online, alongside Redisson’s official documentation, covering the usage and source code of fair locks, read-write locks, semaphores, and plain locks, so I won’t repeat them here. This post focuses on the scenarios that call for the interlocking MultiLock and the red lock RedLock, and on the problems they may run into.

2. Interlocking MultiLock and red lock RedLock

In CAP terms Redis is an AP design: it trades away consistency (in relatively rare corner cases) for high availability and high performance. The RedLock algorithm was proposed to address the fact that a Redis-based distributed lock cannot guarantee strong consistency. Even so, RedLock only reduces the probability of data inconsistency; it cannot eliminate it. The rest of this post focuses on the problems RedLock may run into.

RedissonRedLock inherits from RedissonMultiLock and merely overrides a few methods such as failedLocksLimit (the number of lock-acquisition failures that are tolerated) and the per-key wait time. In essence it follows the same majority ("more than half") idea used by quorum protocols such as Paxos. So let’s look at RedissonMultiLock first.

The so-called interlock simply combines multiple RLocks into one composite lock, iterates over every key in the combination, and acquires each lock in turn; the number of keys that are allowed to fail is 0. The relevant code is as follows:

    // RedissonMultiLock: no lock-acquisition failures are tolerated
    protected int failedLocksLimit() {
        return 0;
    }

This kind of lock suits methods where resources are strictly mutually exclusive and need distributed-lock protection, i.e. businesses in which every key whose data may be updated must be protected: if another method needs to update even one of those keys, the two can only exclude each other and must not both obtain the resource. For example, in my last blog post, Scenario (1): an order contains multiple products, and when the order is submitted the inventory of each product must be deducted. In that scenario you need MultiLock, not RedLock (a usage sketch follows below). But what if Redis is deployed stand-alone or master-slave and the server goes down, with no backup instance, no automatic failover, and no disaster-recovery plan?
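Below is a minimal sketch of using MultiLock for that inventory scenario. It assumes a single RedissonClient named redisson and hypothetical per-product lock keys; it is an illustration, not the code from the original project.

import org.redisson.RedissonMultiLock;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;

public class OrderInventoryExample {

    public static void deductInventory(RedissonClient redisson) {
        // One lock per product whose stock will be deducted (key names are hypothetical)
        RLock lockA = redisson.getLock("stock:productA");
        RLock lockB = redisson.getLock("stock:productB");
        RLock lockC = redisson.getLock("stock:productC");

        // MultiLock only succeeds when every underlying lock is acquired
        RedissonMultiLock multiLock = new RedissonMultiLock(lockA, lockB, lockC);
        multiLock.lock();
        try {
            // deduct the inventory of each product here
        } finally {
            multiLock.unlock();
        }
    }
}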

Can sentinel or cluster deployment solve this? Both modes provide failover, but during the failover window split-brain problems can occur: part of the data on the instance that one of the keys maps to is lost during the failover, so after the failed node recovers the cached data for that key is gone and the corresponding lock no longer exists.

RedLock is designed for exactly this: even if a minority of the keys lose their locks through failure or data loss, the composite lock as a whole still works as a distributed lock. Concretely, it overrides two methods of RedissonMultiLock:

    public RedissonRedLock(RLock... locks) {
        super(locks);
    }

    // Number of lock-acquisition failures that can be tolerated:
    // everything beyond the majority quorum.
    @Override
    protected int failedLocksLimit() {
        return locks.size() - minLocksAmount(locks);
    }

    // Majority quorum: more than half of the locks.
    protected int minLocksAmount(final List<RLock> locks) {
        return locks.size() / 2 + 1;
    }

As the source shows, the changes relative to RedissonMultiLock are small, and the keys are still mapped to different nodes by CRC16 hash. So if the RedLock keys are chosen purely from the business’s mutually exclusive resources, the number of Redis master instances actually involved when acquiring the lock may be even, or an odd number smaller than 3, and in those cases the majority algorithm cannot work properly. A typical RedLock usage sketch is shown below.
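For reference, here is a minimal RedLock usage sketch in the style of the Redisson documentation, assuming three RedissonClient instances (client1, client2, client3), each connected to an independent Redis master; the key name is arbitrary.

import org.redisson.RedissonRedLock;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;

public class RedLockExample {

    public static void runWithRedLock(RedissonClient client1,
                                      RedissonClient client2,
                                      RedissonClient client3) {
        // The same logical lock, acquired on three independent masters
        RLock lock1 = client1.getLock("resource:lock");
        RLock lock2 = client2.getLock("resource:lock");
        RLock lock3 = client3.getLock("resource:lock");

        // Acquisition succeeds once the majority (here 2 of 3) of the locks are held
        RedissonRedLock redLock = new RedissonRedLock(lock1, lock2, lock3);
        redLock.lock();
        try {
            // protected section
        } finally {
            redLock.unlock();
        }
    }
}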

3. Functional defects that may occur in these two locks

Suppose there are 5 master nodes. If the keys are chosen poorly and all 5 keys map to the same master, and that master fails over, aren’t we back to the stand-alone situation? If instead you pick keys completely unrelated to the business and every request uses the same keys, then requests that are not mutually exclusive at all (for example, an earlier order for product A followed by a new order for product B) still have to contend for the same lock. And if the keys are chosen randomly on each request, then requests that really do need mutual exclusion end up not competing for the same lock at all, which defeats the purpose of a distributed lock and puts the cart before the horse.

Many people believe that RedLock requires at least 5 Redis master instances because that is "officially recommended". In fact the official example with 5 masters is just an example, not a requirement of at least 5 master instances; I think this is a translation problem.

Judging from the source code and the official description, the minimum is 3 nodes.

So what scenarios is RedLock suitable for? And is there any compensation scheme for how its lockKeys are chosen?

First of all, choosing the RedLock approach already implies that some data inconsistency can be tolerated, for example:

Scenario (2): Thread 1’s request carries the three keys A, B, and C, while thread 2’s request carries only C (or C, E, and F). If the lock on C held by thread 1 happens to be lost due to failover, thread 2 can acquire its lock successfully and modify the resources related to C at the same time (or thread 2 fails only on C but still acquires the composite lock as a whole). Doesn’t this defeat the original intention of guaranteeing data consistency through mutual exclusion?

So RedLock has significant limitations: it is probably only suitable for distributed deployments and for methods whose request parameters (and therefore lock keys) are fixed, and its purpose is to reduce the chance that two clients hold the lock simultaneously after a failover. This is also why I chose MultiLock rather than RedLock in my previous project.

If a compensation scheme is really needed, the current Redisson source code has to be adapted: no matter how many lockKeys there are, every key should create lock data on all master nodes. A Redis cluster, however, only shards data across masters and does not replicate it across them, so you have to add prefixes or suffixes; the same prefix/suffix rule always derives the same set of mutually exclusive keys. For example, the key test becomes testA, testB, and testC on nodes A, B, and C. Then even a single logical key can use the majority mechanism, which widens the scenarios where RedLock applies, both algorithmically and for the business. It can also be seen as patching a weakness of the implementation rather than of the algorithm itself (the algorithm is fine; the implementation is just not ideal). A hedged sketch of this idea follows.
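This is only a sketch of that compensation idea, assuming a hypothetical suffix rule and a RedissonClient named redisson. Note that in a Redis cluster the suffixes merely change the hash slot, so extra work (hash tags, or separate clients per master) would still be needed to guarantee the derived keys really land on different masters.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.redisson.RedissonRedLock;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;

public class PrefixSuffixRedLockSketch {

    public static RedissonRedLock buildLock(RedissonClient redisson, String logicalKey) {
        // Derive several physical lock keys from one logical key ("test" -> testA, testB, testC)
        List<RLock> locks = new ArrayList<>();
        for (String suffix : Arrays.asList("A", "B", "C")) {
            locks.add(redisson.getLock(logicalKey + suffix));
        }
        // Even a single logical key can now use the majority mechanism
        return new RedissonRedLock(locks.toArray(new RLock[0]));
    }
}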

Above I have mainly listed the problems that can cause conflicts at the business level. The problems RedLock can run into because of environmental factors have been summarized by many people online, so I will only quote them briefly:

Problem 1: After a node crashes and restarts, two clients hold the same lock.

Assume the five nodes are A, B, C, D, and E. Client 1 acquires the lock on A, B, and C but not on D and E, so it acquires the lock overall. Then C crashes and restarts, and the lock data on C is lost (for example the machine lost power before the data was flushed to disk, or the master on C died and its slave had not yet synchronized). Client 2 then tries to acquire the lock and succeeds on the 3 nodes C, D, and E, failing on A and B (still held by client 1). Client 2 also reaches the majority, so it acquires the lock too.

Solution – delayed restart; however, because of possible clock jumps the restart delay itself depends on the clock, so this problem cannot be fully solved.

Problem 2: Split-brain problem: Multiple clients compete for the same lock at the same time, and eventually they all fail.

For example, with nodes 1, 2, 3, 4, and 5, clients A, B, and C compete for the lock at the same time: A gets 1 and 2, B gets 3 and 4, and C gets 5. In the end none of A, B, or C acquires the lock, because none of them obtained more than half. The official recommendation is to send the lock-acquisition commands to all nodes as concurrently as possible: the shorter the time a client spends obtaining the majority of Redis instance locks, the lower the probability of split-brain. It is also emphasized that when a client fails to acquire locks from the majority of instances, it should release the locks it did acquire as soon as possible so that other clients can acquire them; if releasing fails, those locks can only expire at the timeout (the Redisson source already releases the partial locks as soon as acquisition fails). A short tryLock sketch follows.
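As an illustration of "give up and release quickly", here is a small sketch using tryLock with a short wait time; the numbers are arbitrary and redLock is assumed to be a RedissonRedLock built as shown earlier.

import java.util.concurrent.TimeUnit;

import org.redisson.RedissonRedLock;

public class QuickReleaseSketch {

    public static void acquireQuickly(RedissonRedLock redLock) throws InterruptedException {
        // Wait at most 500 ms for the majority; lease the lock for at most 10 s if acquired.
        // If acquisition fails, Redisson releases the partially acquired locks itself.
        boolean acquired = redLock.tryLock(500, 10_000, TimeUnit.MILLISECONDS);
        if (acquired) {
            try {
                // protected section
            } finally {
                redLock.unlock();
            }
        }
    }
}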

Problem 3: Low efficiency. The more master nodes there are, the longer it takes to acquire the lock;

Problem 4: Clock jump

The scheme discussed above relies strictly on clocks, and the clocks on the five machines may drift apart.

A clock jump means that, say, only 1 second of real time has passed, but the difference between two readings of the system clock is 1 minute; the system time has jumped. When this happens, operations staff may suspect the system time was modified.

Clock jumping has 2 consequences:

(1) The delayed-restart mechanism fails. A clock jump can make a crashed machine appear to have waited long enough and restart immediately, reintroducing the problem above.

(2) A clock jump can make a lock expire immediately after the client acquires it, because endTime – beginTime comes out too large. This does not affect correctness, but it does hurt the efficiency of acquiring locks.

What if the clock is turned back? Then endTime – beginTime becomes negative, which does not affect the correctness of the algorithm.

Problem 5: A large client-side pause (such as a full GC) lets two clients hold the same lock.

In theory, any lock with a forced release on timeout can hit this problem: the server force-releases the lock while the client’s code has not finished and is stuck somewhere (a full GC, or anything else that suspends the process), and the lock is handed to another client.

To address this, Redisson provides the watchdog mechanism: shortly before the lock is about to expire, if the client’s business logic has not finished, the lock’s lease is renewed so that it is not force-released and handed to another client. But the renewal itself is a network operation, and there is no way to guarantee it will succeed! A small sketch of the watchdog behaviour follows.
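A minimal sketch of the watchdog behaviour, assuming a RedissonClient named redisson and an arbitrary key; the 30-second default lease and the roughly 10-second renewal period are Redisson’s documented defaults.

import java.util.concurrent.TimeUnit;

import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;

public class WatchdogSketch {

    public static void withAndWithoutWatchdog(RedissonClient redisson) {
        RLock lock = redisson.getLock("order:1001");

        // No leaseTime: the watchdog keeps renewing the 30 s default lease (roughly every 10 s)
        // while the owning thread is alive, but each renewal is a network call that can fail.
        lock.lock();
        try {
            // long-running business logic
        } finally {
            lock.unlock();
        }

        // Explicit leaseTime: the watchdog is disabled and the lock is force-released
        // after 10 seconds even if the client is still working (e.g. stuck in a full GC).
        lock.lock(10, TimeUnit.SECONDS);
        try {
            // business logic that must finish within the lease
        } finally {
            lock.unlock();
        }
    }
}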

From this case, two important revelations can be drawn:

(1) In a distributed system, algorithms that strictly rely on the local clock of each machine may be risky.

(2) Any lock with a “timeout forced release mechanism” may cause the lock to be forcibly released while the client is still holding the lock.

4. Solutions to problems – personal ideas

Whether it is MultiLock or RedLock, setting aside the business-level defects (the limited applicable scenarios), the problems above boil down to one thing: the business code has not finished executing, the lock data is simply lost, and the watchdog cannot renew the lease.

So here is my own plan: after the business calculation is done, when you want to modify data through the Redis client, do not issue the writes one by one with RedisTemplate; instead, package all the modifications into a Redis transaction. Before committing the transaction, check again whether all the lock keys still exist. If any key is missing (for RedLock, if more than half are missing), manually roll back all operations; if a database operation is involved, roll it back with a database transaction.

Redis transactions are implemented with the MULTI, EXEC, WATCH, and UNWATCH commands. In the Redisson client, the RTransaction object buffers all operations and their parameters; when commit() is called it issues MULTI, replays the buffered operations, and then sends EXEC to the Redis server to execute everything in the transaction. If any operation fails, Redisson rolls back the whole transaction (with the help of WATCH); a failed Redis transaction is reported as an exception thrown from RedissonTransaction.

import java.util.concurrent.TimeUnit;

import org.redisson.Redisson;
import org.redisson.api.RTransaction;
import org.redisson.api.RedissonClient;
import org.redisson.api.TransactionOptions;

// Create the Redisson client (connects to localhost:6379 by default)
RedissonClient redisson = Redisson.create();

// Define transaction options
TransactionOptions options = TransactionOptions.defaults()
        .timeout(1, TimeUnit.SECONDS)                 // transaction timeout
        .retryAttempts(2)                             // number of transaction retries
        .retryInterval(100, TimeUnit.MILLISECONDS);   // interval between retries

// Create the RTransaction object and start the transaction
RTransaction transaction = redisson.createTransaction(options);

try {
    // Perform a series of operations inside the transaction
    transaction.getBucket("key1").set("value1");
    transaction.getMap("map1").put("field1", "value2");
    transaction.getSet("set1").add("value3");

    // Commit the transaction
    transaction.commit();
} catch (Exception e) {
    // Roll back the transaction
    transaction.rollback();
} finally {
    // Shut down the Redisson client
    redisson.shutdown();
}

This is where Redis transactions pull their weight: unlike a Lua script, a failure during execution rolls back all of the previous operations.

How do you roll back in the business code? That is even simpler: just clear all the buffered commands and release whichever lock keys still exist. A hedged sketch of the pre-commit check is shown below.
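A sketch of that pre-commit check, assuming the lock objects used for the MultiLock/RedLock are still in scope; isHeldByCurrentThread() is a real RLock method, everything else (names, the surrounding transaction) is illustrative.

import java.util.List;

import org.redisson.api.RLock;
import org.redisson.api.RTransaction;

public class PreCommitCheckSketch {

    public static void commitIfLocksStillHeld(RTransaction transaction, List<RLock> locks) {
        // Before committing, verify that every lock acquired earlier is still held
        long stillHeld = locks.stream().filter(RLock::isHeldByCurrentThread).count();

        // For MultiLock require all locks; for RedLock a majority (locks.size() / 2 + 1) would do
        if (stillHeld == locks.size()) {
            transaction.commit();
        } else {
            // A lock was lost (failover, expiry, ...): discard the buffered commands
            transaction.rollback();
            // and roll back any related database work in the surrounding DB transaction
        }
    }
}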
