Redisson’s watchdog strategy – a mechanism to ensure the security and stability of Redis data

Foreword

Hand-rolled Redis distributed locks are not renewed automatically. Suppose a lock is set to expire after 1 minute. If the thread holding it has not finished within that minute, the key expires and another thread can acquire the lock while the first one is still running, which can cause serious production incidents. In a flash-sale scenario, this defect easily leads to overselling.

In distributed systems, Redis, a high-performance, low-latency in-memory data store, is widely used in many scenarios. However, in complex environments a Redis-based lock can expire too early or, if it has no expiration at all, stay held forever. Both cases threaten the stability and correctness of the application. To solve these problems, the Redisson library provides a watchdog (Watch Dog) strategy: while a distributed lock is still held, it is renewed automatically before it expires. When no leaseTime is specified, Redisson enables the watchdog mechanism by default, with a default timeout of 30s.

What is a watchdog strategy

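A watchdog policy is a mechanism that automatically extends the expiration time of a lock that is still held. Despite the similar name, it is not based on the Redis “WATCH” command, which provides optimistic locking for MULTI/EXEC transactions. The watchdog (Watch Dog) is a timer task that runs inside the Redisson client and periodically resets the expiration of the lock key on the Redis server.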

When an application acquires a lock through Redisson without specifying a leaseTime, the lock key is written to Redis with an expiration equal to the watchdog timeout, and Redisson schedules a background task for that lock. As long as the client still holds the lock, the task periodically resets the expiration of the key. Once the lock is released, or the client process dies and the task stops running, the key is no longer renewed and expires on its own.

The watchdog mechanism is an automatic extension mechanism provided by Redisson. It allows the distributed locks provided by Redisson to be renewed automatically.

private long lockWatchdogTimeout = 30 * 1000;

The default timeout provided by the watchdog mechanism is 30 * 1000 milliseconds, that is, 30 seconds. The lock key is first written with this timeout as its expiration. While the thread that acquired the lock is still running and has not released it, Redisson keeps resetting the expiration of the lock in Redis back to this value, so the lock does not expire even if the work takes longer than 30s. To use the watchdog mechanism in Redisson, do not specify a leaseTime (the automatic release time of the lock) when acquiring the lock. If you specify the automatic release time yourself, the watchdog mechanism is not enabled, whether you call lock or tryLock. The one exception is a leaseTime of -1, which is treated as “not specified”, so the watchdog mechanism is enabled.
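
The leaseTime rule described above can be condensed into a few lines. This is only an illustrative sketch, not Redisson code: the class name LeasePolicySketch and its methods are hypothetical, and the constant simply mirrors the lockWatchdogTimeout default shown earlier.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that restates the leaseTime rule; not part of Redisson. */
final class LeasePolicySketch {
    // Mirrors the default value of lockWatchdogTimeout (30 seconds).
    static final long LOCK_WATCHDOG_TIMEOUT_MS = 30 * 1000;

    /** The watchdog is used only when the caller did not choose a lease time (-1). */
    static boolean watchdogEnabled(long leaseTime) {
        return leaseTime == -1;
    }

    /** Expiration that the lock key starts with, in milliseconds. */
    static long initialTtlMillis(long leaseTime, TimeUnit unit) {
        return watchdogEnabled(leaseTime) ? LOCK_WATCHDOG_TIMEOUT_MS : unit.toMillis(leaseTime);
    }

    /** Renewal runs every third of the watchdog timeout (10 seconds by default). */
    static long renewalIntervalMillis() {
        return LOCK_WATCHDOG_TIMEOUT_MS / 3;
    }
}
```

With these assumptions, a call such as tryLock(100, 10, TimeUnit.SECONDS) corresponds to a fixed 10-second expiration without renewal, while lock() corresponds to a 30-second expiration that is renewed every 10 seconds.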

A distributed lock cannot be set to never expire. Otherwise, if a node goes down after acquiring the lock, the lock is never released and the system deadlocks, so a distributed lock needs an expiration time. However, an expiration time creates the opposite problem: if the program has not finished when the expiration time arrives, the lock times out and is released, other threads can then acquire it, and problems follow. The automatic renewal of the watchdog mechanism solves this problem well.
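
To make the trade-off concrete, here is a small deterministic simulation. It is a sketch under stated assumptions: LeaseSketch is a hypothetical in-memory stand-in for a Redis key with a TTL, time is passed in as plain numbers instead of being read from a clock, and renewal happens every third of the timeout.

```java
/** In-memory stand-in for a lock key with an expiration; not Redisson code. */
final class LeaseSketch {
    static final long WATCHDOG_TIMEOUT_MS = 30_000L; // mirrors lockWatchdogTimeout

    private long expiresAt;       // simulated time at which the key would expire
    private boolean held = true;  // set to false once the owner releases the lock

    LeaseSketch(long now) {
        this.expiresAt = now + WATCHDOG_TIMEOUT_MS;
    }

    /** One watchdog tick: reset the expiration if the lock is still held and alive. */
    boolean renew(long now) {
        if (!held || now >= expiresAt) {
            return false;
        }
        expiresAt = now + WATCHDOG_TIMEOUT_MS;
        return true;
    }

    void release() {
        held = false;
    }

    boolean isExpired(long now) {
        return !held || now >= expiresAt;
    }

    /** Simulates a task of the given length; true if the lock never expired under it. */
    static boolean survives(long taskMillis, boolean watchdog) {
        LeaseSketch lease = new LeaseSketch(0);
        long interval = WATCHDOG_TIMEOUT_MS / 3;
        for (long now = interval; now <= taskMillis; now += interval) {
            if (watchdog) {
                lease.renew(now);
            }
            if (lease.isExpired(now)) {
                return false;
            }
        }
        return !lease.isExpired(taskMillis);
    }
}
```

In this model a 60-second task loses its lock at the 30-second mark without renewal, but keeps it for the whole run when the watchdog tick is applied.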

Source code interpretation

Enter the tryLock method. The tryLock(waitTime, -1, unit) call here has three parameters.

  • waitTime: the maximum time to wait for the lock (-1 if not passed)
  • leaseTime: the time after which the lock is released automatically (-1 if not passed, which enables the watchdog)
  • unit: the time unit used for both waitTime and leaseTime

public boolean tryLock(long waitTime, TimeUnit unit) throws InterruptedException {
    return tryLock(waitTime, -1, unit);
}

    @Override
    public boolean tryLock(long waitTime, long leaseTime, TimeUnit unit) throws InterruptedException {
        long time = unit.toMillis(waitTime);
        long current = System.currentTimeMillis();
        long threadId = Thread.currentThread().getId();
        Long ttl = tryAcquire(waitTime, leaseTime, unit, threadId);
        // lock acquired
        if (ttl == null) {
            return true;
        }
        
        time -= System.currentTimeMillis() - current;
        if (time <= 0) {
            acquireFailed(waitTime, unit, threadId);
            return false;
        }
        
        current = System.currentTimeMillis();
        RFuture<RedissonLockEntry> subscribeFuture = subscribe(threadId);
        if (!subscribeFuture.await(time, TimeUnit.MILLISECONDS)) {
            if (!subscribeFuture.cancel(false)) {
                subscribeFuture.onComplete((res, e) -> {
                    if (e == null) {
                        unsubscribe(subscribeFuture, threadId);
                    }
                });
            }
            acquireFailed(waitTime, unit, threadId);
            return false;
        }

        try {
            time -= System.currentTimeMillis() - current;
            if (time <= 0) {
                acquireFailed(waitTime, unit, threadId);
                return false;
            }
        
            while (true) {
                long currentTime = System.currentTimeMillis();
                ttl = tryAcquire(waitTime, leaseTime, unit, threadId);
                // lock acquired
                if (ttl == null) {
                    return true;
                }

                time -= System.currentTimeMillis() - currentTime;
                if (time <= 0) {
                    acquireFailed(waitTime, unit, threadId);
                    return false;
                }

                // waiting for message
                currentTime = System.currentTimeMillis();
                if (ttl >= 0 && ttl < time) {
                    subscribeFuture.getNow().getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
                } else {
                    subscribeFuture.getNow().getLatch().tryAcquire(time, TimeUnit.MILLISECONDS);
                }

                time -= System.currentTimeMillis() - currentTime;
                if (time <= 0) {
                    acquireFailed(waitTime, unit, threadId);
                    return false;
                }
            }
        } finally {
            unsubscribe(subscribeFuture, threadId);
        }
// return get(tryLockAsync(waitTime, leaseTime, unit));
    }
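
The core of the method above is a wait budget: every attempt and every wait is subtracted from the remaining time, and the thread never sleeps longer than the lock’s remaining TTL or its own remaining budget. The following JDK-only sketch isolates that pattern. It is a simplification under assumptions: WaitBudgetSketch is a hypothetical name, the Supplier stands in for tryAcquire (null means acquired, otherwise the remaining TTL in milliseconds), and a Semaphore replaces the subscription to the unlock message.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Isolates the time-budget loop of tryLock; not Redisson code. */
final class WaitBudgetSketch {

    /**
     * @param waitMillis    total time the caller is willing to wait
     * @param attempt       returns null when the lock was acquired, else the remaining TTL in ms
     * @param releaseSignal stand-in for the unlock notification latch
     */
    static boolean tryLock(long waitMillis, Supplier<Long> attempt, Semaphore releaseSignal) {
        long remaining = waitMillis;
        while (true) {
            long start = System.currentTimeMillis();
            Long ttl = attempt.get();
            if (ttl == null) {
                return true; // lock acquired
            }
            remaining -= System.currentTimeMillis() - start;
            if (remaining <= 0) {
                return false; // budget used up while trying
            }

            // Wait for an unlock signal, but no longer than the TTL or the budget.
            start = System.currentTimeMillis();
            long sleep = (ttl >= 0 && ttl < remaining) ? ttl : remaining;
            try {
                releaseSignal.tryAcquire(sleep, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            remaining -= System.currentTimeMillis() - start;
            if (remaining <= 0) {
                return false; // budget used up while waiting
            }
        }
    }
}
```

Waiting on a signal rather than polling in a tight loop is what keeps the retry cheap: the thread wakes up either when the holder releases the lock or when the lock would have expired anyway.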


Advantages of watchdog strategy:

1. Continuous monitoring:
While a lock is held, the watchdog task checks it at a fixed interval, so the lock stays valid for as long as the business logic is running.
2. Prevent expiration and invalidation:
By regularly resetting the expiration time of a held lock, the watchdog policy prevents the lock from becoming invalid while the work it protects is still in progress.
3. Prevent deadlock:
Under distributed high concurrency, a thread may acquire a lock and then, because of a system failure or other reasons, never execute the command that releases it. If the lock had no expiration, other threads could never acquire it, which causes a deadlock. With the watchdog policy, the lock always carries an expiration time and renewal stops when the holding client goes down, so the lock is released automatically after the timeout.
4. Scalability:
Renewal is scheduled inside each client that holds a lock, so no central coordinator is required as the number of application instances grows.

Disadvantages of the watchdog

Although the Redisson watchdog ensures that the lock is not released before the thread has finished, which suits strong-consistency scenarios such as flash sales, it is not suitable for duplicate-submission prevention. Under high concurrency, requests block while waiting for the lock and interface performance degrades.

For duplicate-submission prevention under high concurrency, a request that cannot get the lock should fail fast. In that case, use a custom lock or tryLock with a wait time of 0, as follows:

RLock lock = redissonClient.getLock("Export:create:" + Context.get().getCorpId());
try {
    // Try to lock: wait 0 seconds (fail fast), release automatically 5 seconds after locking (no watchdog)
    if (lock.tryLock(0, 5, TimeUnit.SECONDS)) {
        // Business processing
    } else {
        Assert.isTrue(false, "Queuing, please try again later!");
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // preserve the interrupt status
    Assert.isTrue(false, "Do not repeat the operation!");
} finally {
    // Unlock only if this thread holds the lock; isLocked() alone is also true when another thread holds it
    if (lock.isLocked() && lock.isHeldByCurrentThread()) {
        lock.unlock();
    }
}
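
A note on the finally block: isLocked() reports whether any thread holds the lock, not whether the current thread does, so it is not an ownership check on its own. The sketch below illustrates the difference with the JDK’s ReentrantLock, used here purely as a stand-in because it exposes methods with the same names (isLocked, isHeldByCurrentThread, unlock). The class UnlockGuardSketch is hypothetical, and the behaviour shown is that of ReentrantLock.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Shows why isLocked() is not an ownership check; uses the JDK lock as a stand-in. */
final class UnlockGuardSketch {

    /** A non-owner thread guards unlock() with isLocked() only. Returns true if unlock() threw. */
    static boolean unlockByNonOwnerThrows() {
        ReentrantLock lock = new ReentrantLock();
        boolean[] threw = new boolean[1];
        lock.lock(); // the calling thread is the owner
        try {
            Thread other = new Thread(() -> {
                if (lock.isLocked()) { // true, because the owner still holds it
                    try {
                        lock.unlock();
                    } catch (IllegalMonitorStateException e) {
                        threw[0] = true;
                    }
                }
            });
            other.start();
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.unlock();
        }
        return threw[0];
    }

    /** The same non-owner thread with the ownership check. Returns true if it skipped unlock(). */
    static boolean guardedNonOwnerSkipsUnlock() {
        ReentrantLock lock = new ReentrantLock();
        boolean[] skipped = new boolean[1];
        lock.lock();
        try {
            Thread other = new Thread(() -> {
                if (lock.isHeldByCurrentThread()) {
                    lock.unlock();
                } else {
                    skipped[0] = true;
                }
            });
            other.start();
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.unlock();
        }
        return skipped[0];
    }
}
```

In the fail-fast pattern this matters because the finally block also runs for requests whose tryLock returned false, which is exactly the case where some other thread holds the lock.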

Redisson locking and unlocking API

public void test() throws Exception {
        RLock lock = redissonClient.getLock("guodong");

        // Blocks and keeps retrying until the lock is acquired.
        // With Watch Dog automatic extension: the lock expires after 30s by default and is reset to 30s every 30/3 = 10 seconds.
        lock.lock();

        // Tries to get the lock for up to 10 seconds, then stops retrying and returns false.
        // With Watch Dog automatic extension, default timeout 30s.
        boolean res1 = lock.tryLock(10, TimeUnit.SECONDS);

        // No Watch Dog
        // Blocks until the lock is acquired; the lock is released automatically after 10 seconds
        lock.lock(10, TimeUnit.SECONDS);

        // No Watch Dog
        // Tries to acquire the lock, waits up to 100 seconds, holds the lock for 10 seconds
        boolean res2 = lock.tryLock(100, 10, TimeUnit.SECONDS);

        Thread.sleep(40000L);

        lock.unlock();
    }

The lock() method is a blocking way to acquire a lock. If the lock is currently held by another thread, the calling thread blocks and waits until it acquires the lock; lock() itself has no wait timeout, and lockInterruptibly() is the variant whose wait responds to interruption. Once the lock is acquired, access to the shared resource is mutually exclusive, which suits scenarios where the resource must be accessed by only one thread at a time. Redisson’s lock is reentrant, and a fair lock is available separately through getFairLock, which covers most needs of multi-threaded concurrent access.

The tryLock() method without arguments is a non-blocking way to acquire a lock. It does not block the current thread, but immediately returns the result of the attempt: true if the lock was acquired, false otherwise. The overloads that take a waitTime block for at most that long before returning false. Redisson’s tryLock() supports a lease time limit, a wait time limit, and reentrancy, which gives better control over how long acquisition may take and avoids problems such as the program being unable to respond for a long time.

By default, the watchdog timeout is 30s; it can be changed through Config.lockWatchdogTimeout. In addition, Redisson provides locking methods that take a leaseTime parameter to specify how long the lock is held. Once this time has passed, the lock is released automatically and its validity period is not extended.

Summary

There are many problems when using Redis to implement distributed locks.

For example, if the business logic takes longer than the automatic release time you set for the lock, Redis releases the lock when it times out, and other threads can grab the lock and cause problems, so a renewal operation is required. Moreover, if the lock is released in a finally block, you need to check that the current lock is your own before deleting it, to avoid releasing another thread’s lock. Done as two separate commands, the check and the release are not atomic. This problem is easily solved by performing both in a Lua script.

With the emergence of Redisson, the watchdog mechanism can solve the renewal problem very well. Its main steps are as follows:

  1. When acquiring the lock, either do not specify leaseTime or set it to -1; only then is the watchdog mechanism turned on.
  2. The lock is acquired in the tryLockInnerAsync method. If it is acquired successfully, scheduleExpirationRenewal is called to start the watchdog mechanism.
  3. The key method inside scheduleExpirationRenewal is renewExpiration. When the thread acquires the lock for the first time (that is, not a reentrant acquisition), renewExpiration is called to turn on the watchdog mechanism.
  4. renewExpiration adds a delayed task for the current lock. The task runs after 10s (one third of the watchdog timeout) and resets the validity period of the lock to 30s (the default lock release time of the watchdog mechanism).
  5. At the end of the task, renewExpiration is called again, so the task keeps rescheduling itself.

In other words, the general process is: acquire the lock (it would be released automatically after 30 seconds), then set a delayed task for the lock (executed after 10 seconds). The delayed task resets the release time of the lock to 30 seconds and sets up an identical delayed task (again executed after 10 seconds). As long as the lock has not been released (the program has not finished), the watchdog mechanism resets the automatic release time of the lock to 30 seconds every 10 seconds.
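
The self-rescheduling task can be sketched with a plain ScheduledExecutorService. This is a minimal illustration under assumptions, not Redisson’s implementation: the class RenewalLoopSketch is hypothetical, a counter stands in for the call that resets the TTL in Redis, and the timeout is passed in so that the example can run in milliseconds instead of 30 seconds.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** A delayed task that re-schedules itself while the lock is held; not Redisson code. */
final class RenewalLoopSketch {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "renewal-timer");
                t.setDaemon(true); // do not keep the JVM alive for the timer
                return t;
            });
    private final AtomicBoolean held = new AtomicBoolean(true);
    private final AtomicInteger renewals = new AtomicInteger();
    private final long timeoutMillis;

    RenewalLoopSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Schedules one delayed task; the task schedules the next one when it finishes. */
    void renewExpiration() {
        try {
            timer.schedule(() -> {
                if (!held.get()) {
                    return;                 // lock released: the chain ends here
                }
                renewals.incrementAndGet(); // stand-in for resetting the TTL to timeoutMillis
                renewExpiration();          // schedule the next round
            }, timeoutMillis / 3, TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException e) {
            // The timer was shut down by unlock(); nothing left to renew.
        }
    }

    void unlock() {
        held.set(false);
        timer.shutdownNow();
    }

    /** Holds the simulated lock for holdMillis and reports how many renewals ran. */
    static int renewalsDuring(long timeoutMillis, long holdMillis) {
        RenewalLoopSketch sketch = new RenewalLoopSketch(timeoutMillis);
        sketch.renewExpiration();
        try {
            Thread.sleep(holdMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        sketch.unlock();
        return sketch.renewals.get();
    }
}
```

Scheduling one task at a time, rather than a fixed-rate job, means the chain ends naturally as soon as one round decides not to schedule the next.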

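When the client process crashes or a renewal attempt fails, the watchdog mechanism stops calling renewExpiration, so the lock is released automatically after at most 30 seconds. Note that an exception thrown by business code does not by itself stop the renewal while the client is still running, which is why unlock belongs in a finally block. When the program actively releases the lock, the process is as follows: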

  1. Remove the thread ID corresponding to the lock
  2. Then obtain the delayed task from the lock and cancel the delayed task
  3. Remove this lock from EXPIRATION_RENEWAL_MAP.
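
The release steps above can be sketched as a small registry. This is a simplified model under assumptions, not Redisson’s source: the names RenewalRegistrySketch, Entry, onLocked and onUnlocked are hypothetical, the map plays the role of EXPIRATION_RENEWAL_MAP, and the scheduled task is an empty placeholder for the renewal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Simplified model of the renewal bookkeeping done on lock and unlock. */
final class RenewalRegistrySketch {
    static final class Entry {
        final Map<Long, Integer> holdCounts = new HashMap<>(); // thread id -> reentrant count
        ScheduledFuture<?> renewalTask;
    }

    // Plays the role of EXPIRATION_RENEWAL_MAP: lock name -> renewal entry.
    private final Map<String, Entry> renewalMap = new ConcurrentHashMap<>();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "renewal-timer");
                t.setDaemon(true);
                return t;
            });

    synchronized void onLocked(String lockName, long threadId) {
        Entry entry = renewalMap.computeIfAbsent(lockName, k -> new Entry());
        entry.holdCounts.merge(threadId, 1, Integer::sum);
        if (entry.renewalTask == null) {
            // Placeholder task; a real watchdog would reset the TTL here.
            entry.renewalTask = timer.scheduleAtFixedRate(() -> { }, 10, 10, TimeUnit.SECONDS);
        }
    }

    synchronized void onUnlocked(String lockName, long threadId) {
        Entry entry = renewalMap.get(lockName);
        if (entry == null) {
            return;
        }
        // 1. Remove the thread id (or decrement its reentrant count).
        entry.holdCounts.computeIfPresent(threadId, (id, n) -> n > 1 ? n - 1 : null);
        if (entry.holdCounts.isEmpty()) {
            // 2. Cancel the delayed task.
            entry.renewalTask.cancel(false);
            // 3. Remove the lock from the renewal map.
            renewalMap.remove(lockName);
        }
    }

    boolean isRenewing(String lockName) {
        return renewalMap.containsKey(lockName);
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```

Tracking a count per thread id is what lets a reentrant holder unlock once without stopping renewal for the acquisitions it still holds.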