How to ensure data consistency in seckill (flash sale) scenarios: a detailed plan

What is a seckill?

Taken literally, a seckill ("second kill", that is, a flash sale) is a high-concurrency scenario in which a large number of requests flood in within a very short time. Handled improperly, it easily leads to service crashes or data inconsistencies.

Common seckill scenarios include Taobao's Double Eleven sale, ride-hailing drivers grabbing orders, ticket grabbing on 12306, and so on.

Reproducing the seckill overselling bug under high concurrency

Here we have prepared a small product seckill example.

1. Write the code following the obvious logic: when a request comes in, check the inventory first, deduct the inventory if it is greater than 0, and then run the rest of the order logic.

/**
 * Product seckill
 */
@Service
public class GoodsOrderServiceImpl implements OrderService {

    @Autowired
    private GoodsDao goodsDao;

    @Autowired
    private OrderDao orderDao;

    /**
     * place an order
     *
     * @param goodsId commodity ID
     * @param userId user ID
     * @return
     */
    @Override
    public boolean grab(int goodsId, int userId) {
        // Query inventory
        int stock = goodsDao.selectStock(goodsId);
        try {
            // Sleep for 2 seconds so that all concurrent requests reach this point
            // with the same stock value, simulating a real burst of requests
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        // Inventory is greater than 0: deduct inventory and save the order
        if (stock > 0) {
            goodsDao.updateStock(goodsId, stock - 1);
            orderDao.insert(goodsId, userId);
            return true;
        }
        return false;
    }
}
@Service("grabNoLockService")
public class GrabNoLockServiceImpl implements GrabService {

    @Autowired
    OrderService orderService;

    /**
     * Lock-free buying logic
     *
     * @param goodsId
     * @param userId
     * @return
     */
    @Override
    public String grabOrder(int goodsId, int userId) {
        try {
            System.out.println("User:" + userId + "Execute buying logic");
            boolean b = orderService.grab(goodsId, userId);
            if (b) {
                System.out.println("User:" + userId + "Purchased successfully");
            } else {
                System.out.println("User:" + userId + "Buying failed");
            }
        } finally {

        }
        return null;
    }
}

2. Set the inventory to 2.

3. Use JMeter to run a stress test with 10 threads.

  • Stress test results

Stock remaining: 1

Successful orders: 10

Something is clearly wrong!

There were originally 2 items in stock and 1 is left, yet 10 seckill orders succeeded. This is a serious overselling problem!

Problem analysis:

The problem is actually simple. When the seckill starts, 10 requests arrive at the same time and query the inventory at the same time. Each one sees inventory = 2, deducts one, writes the inventory back as 1, and reports success. In total 10 items are sold, but the inventory drops by only 1.
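
The analysis above can be reproduced without a database. The following is a minimal sketch, not the project's code: `OversellDemo` and its in-memory `stock` and `orders` fields are hypothetical stand-ins for the goods and order tables, and a `CyclicBarrier` replaces the 2-second sleep so that every request reads the stock before any request writes it.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class OversellDemo {
    // Hypothetical in-memory stand-ins for the goods and order tables.
    static volatile int stock;
    static final AtomicInteger orders = new AtomicInteger();

    /** Runs `buyers` concurrent check-then-deduct requests; returns {remainingStock, orderCount}. */
    static int[] run(int initialStock, int buyers) {
        stock = initialStock;
        orders.set(0);
        // The barrier plays the role of Thread.sleep(2000): every request
        // finishes reading the stock before any request writes it back.
        CyclicBarrier allHaveRead = new CyclicBarrier(buyers);
        Thread[] threads = new Thread[buyers];
        for (int i = 0; i < buyers; i++) {
            threads[i] = new Thread(() -> {
                int seen = stock;                  // 1. query inventory
                try {
                    allHaveRead.await();
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
                if (seen > 0) {
                    stock = seen - 1;              // 2. deduct based on the stale read
                    orders.incrementAndGet();      // 3. save the order
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return new int[] {stock, orders.get()};
    }

    public static void main(String[] args) {
        int[] result = run(2, 10);
        System.out.println("Stock remaining: " + result[0] + ", orders: " + result[1]);
    }
}
```

With a stock of 2 and 10 buyers, every thread reads 2 and writes back 1, so the sketch ends with 1 item in stock and 10 orders, matching the stress test result.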

How do we solve this? Simply put: add a lock.

Solution in stand-alone mode

Add a JVM lock

In stand-alone mode there is only one service instance, so a JVM-level lock is enough. Either synchronized or Lock will do.

@Service("grabJvmLockService")
public class GrabJvmLockServiceImpl implements GrabService {

    @Autowired
    OrderService orderService;

    /**
     * The buying logic of JVM lock
     *
     * @param goodsId
     * @param userId
     * @return
     */
    @Override
    public String grabOrder(int goodsId, int userId) {
        String lock = (goodsId + "");

        synchronized (lock.intern()) {
            try {
                System.out.println("User:" + userId + "Execute buying logic");
                boolean b = orderService.grab(goodsId, userId);
                if (b) {
                    System.out.println("User:" + userId + "Purchased successfully");
                } else {
                    System.out.println("User:" + userId + "Buying failed");
                }
            } finally {

            }
        }
        return null;
    }
}

synchronized is used as the example here. After adding the lock, restore the inventory and rerun the stress test. The result is:

  • Stress test results

Inventory remaining: 0

Successful orders: 2

Problem solved!
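
For comparison, the Lock variant mentioned earlier might look like the sketch below. It is an illustration under assumptions rather than the project's code: `PerGoodsLockDemo` is a hypothetical class, the stock lives in an in-memory map instead of the database, and one `ReentrantLock` is kept per goods ID.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class PerGoodsLockDemo {
    // One lock per goods ID, so purchases of different goods do not block each other.
    private final Map<Integer, ReentrantLock> locks = new ConcurrentHashMap<>();
    // Hypothetical in-memory stand-in for the goods table.
    private final Map<Integer, Integer> stockByGoods = new ConcurrentHashMap<>();
    final AtomicInteger orders = new AtomicInteger();

    PerGoodsLockDemo(int goodsId, int initialStock) {
        stockByGoods.put(goodsId, initialStock);
    }

    /** Check-then-deduct, executed by one thread at a time per goods ID. */
    boolean grab(int goodsId) {
        ReentrantLock lock = locks.computeIfAbsent(goodsId, id -> new ReentrantLock());
        lock.lock();
        try {
            int stock = stockByGoods.getOrDefault(goodsId, 0);
            if (stock <= 0) {
                return false;
            }
            stockByGoods.put(goodsId, stock - 1);
            orders.incrementAndGet();
            return true;
        } finally {
            lock.unlock();   // always release, even if the business logic throws
        }
    }

    int stock(int goodsId) {
        return stockByGoods.getOrDefault(goodsId, 0);
    }

    /** Fires `buyers` concurrent requests for goods 1 and waits for all of them. */
    static PerGoodsLockDemo run(int initialStock, int buyers) {
        PerGoodsLockDemo shop = new PerGoodsLockDemo(1, initialStock);
        Thread[] threads = new Thread[buyers];
        for (int i = 0; i < buyers; i++) {
            threads[i] = new Thread(() -> shop.grab(1));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return shop;
    }

    public static void main(String[] args) {
        PerGoodsLockDemo shop = run(2, 10);
        System.out.println("Inventory remaining: " + shop.stock(1) + ", orders: " + shop.orders.get());
    }
}
```

A dedicated lock map avoids one side effect of synchronizing on `String.intern()`: interned strings are shared across the whole JVM, so unrelated code that synchronizes on the same string value would contend for the same monitor.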

Does the JVM lock still work in cluster mode?

The problem is solved in stand-alone mode. Is a JVM-level lock still effective in cluster mode?

Here we start two service instances, put a gateway in front of them for load balancing, and rerun the stress test.

  • Stress test results

Inventory remaining: 0

Successful orders: 4

The answer is obvious: the lock no longer works!

Solutions in cluster mode

Problem analysis:

The cause is that a JVM-level lock is a different lock object in each of the two services. Each service acquires its own lock and sells on its own, so the two are not mutually exclusive.

So what should we do? That is also easy: move the lock out of the services so that both acquire the same lock. That is a distributed lock.

Distributed lock:

A distributed lock is a way to control synchronized access to shared resources across distributed systems.

Distributed systems often need to coordinate their actions. If different systems, or different hosts of the same system, share one or more resources, access to those resources usually has to be mutually exclusive to prevent interference and ensure consistency. This is where a distributed lock is needed.

Common implementations of distributed locks are built on MySQL, Redis, Zookeeper, and so on.

Distributed lock–MySQL:

The MySQL approach is to prepare a table that serves as the lock:

  • To lock, insert the ID of the product being purchased into the lock table as the primary key (or under a unique index). Any other thread that tries to insert the same ID fails, which guarantees mutual exclusion.

  • To unlock, delete the record so that other threads can acquire the lock.

Part of the code written according to this scheme:

  • Lock

/**
 * Distributed lock implemented with MySQL
 */
@Service
@Data
public class MysqlLock implements Lock {

    @Autowired
    private GoodsLockDao goodsLockDao;

    private ThreadLocal<GoodsLock> goodsLockThreadLocal;

    @Override
    public void lock() {
        // Retry in a loop rather than recursively, so a long wait cannot overflow the stack
        while (!tryLock()) {
            // The lock is held by someone else: sleep briefly, then try again
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        System.out.println("Lock acquired");
    }

    /**
     * Non-blocking lock attempt: returns immediately, true on success and false on failure
     */
    @Override
    public boolean tryLock() {
        try {
            GoodsLock goodsLock = goodsLockThreadLocal.get();
            goodsLockDao.insert(goodsLock);
            System.out.println("Lock object:" + goodsLockThreadLocal.get());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    @Override
    public void unlock() {
        goodsLockDao.delete(goodsLockThreadLocal.get().getGoodsId());
        System.out.println("Unlock object:" + goodsLockThreadLocal.get());
        goodsLockThreadLocal.remove();
    }

    @Override
    public void lockInterruptibly() throws InterruptedException {
        // TODO Auto-generated method stub

    }

    @Override
    public boolean tryLock(long time, TimeUnit unit) throws InterruptedException {
        // TODO Auto-generated method stub
        return false;
    }
    
    @Override
    public Condition newCondition() {
        // TODO Auto-generated method stub
        return null;
    }
}

  • Purchase logic

@Service("grabMysqlLockService")
public class GrabMysqlLockServiceImpl implements GrabService {

    @Autowired
    private MysqlLock lock;
    
    @Autowired
    OrderService orderService;

    ThreadLocal<GoodsLock> goodsLock = new ThreadLocal<>();

    @Override
    public String grabOrder(int goodsId, int userId) {
        // generate key
        GoodsLock gl = new GoodsLock();
        gl.setGoodsId(goodsId);
        gl.setUserId(userId);
        goodsLock.set(gl);
        lock.setGoodsLockThreadLocal(goodsLock);

        // lock
        lock.lock();

        // perform business
        try {
            System.out.println("User:" + userId + "Execute buying logic");

            boolean b = orderService.grab(goodsId, userId);
            if(b) {
                System.out.println("User:" + userId + "Purchased successfully");
            } else {
                System.out.println("User:" + userId + "Buying failed");
            }
        } finally {
            // release the lock
            lock.unlock();
        }
        return null;
    }
}

After restoring the inventory and rerunning the stress test, the results are as expected and the data is consistent.

  • Stress test results

Remaining stock: 0

Successful orders: 2

Problems and solutions:

  • What if the lock is never released, for example because of a sudden network disconnection?

Answer: Add start-time and end-time fields to the lock table as the lock's validity period. If the lock is not released in time for whatever reason, the validity period tells you whether it is still valid.

  • After adding a validity period, what if the period ends before the thread has finished its task?

Answer: Introduce a watchdog mechanism that keeps renewing the lock until the task has finished. This is explained in detail later.
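
The validity-period idea from the first answer can be sketched in memory as follows. This is an assumption-laden illustration, not the project's DAO code: `LeaseLockTable`, its `Row` type and the explicit `nowMillis` parameter are hypothetical, and the map stands in for the lock table with the goods ID as primary key. In MySQL the takeover of an expired row would have to be a single conditional statement so that only one client wins.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LeaseLockTable {
    /** One row of the hypothetical lock table: who holds the lock and until when. */
    static final class Row {
        final int userId;
        final long expiresAtMillis;

        Row(int userId, long expiresAtMillis) {
            this.userId = userId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    // goodsId plays the role of the primary key.
    private final Map<Integer, Row> rows = new ConcurrentHashMap<>();

    /** Inserts a row, or takes over a row whose validity period has ended. */
    boolean tryLock(int goodsId, int userId, long nowMillis, long ttlMillis) {
        Row mine = new Row(userId, nowMillis + ttlMillis);
        // compute() is atomic per key, like a single conditional SQL statement.
        Row winner = rows.compute(goodsId, (id, current) ->
                (current == null || current.expiresAtMillis <= nowMillis) ? mine : current);
        return winner == mine;
    }

    /** Deletes the row only if it still belongs to the caller. */
    boolean unlock(int goodsId, int userId) {
        Row current = rows.get(goodsId);
        return current != null && current.userId == userId && rows.remove(goodsId, current);
    }

    public static void main(String[] args) {
        LeaseLockTable table = new LeaseLockTable();
        System.out.println(table.tryLock(1, 100, 0, 5_000));      // true: first insert
        System.out.println(table.tryLock(1, 200, 1_000, 5_000));  // false: lock still valid
        System.out.println(table.tryLock(1, 200, 6_000, 5_000));  // true: validity period ended
        System.out.println(table.unlock(1, 100));                 // false: user 100 no longer owns it
    }
}
```

Passing the current time as a parameter keeps the sketch deterministic; a real implementation would more likely rely on the database clock so that all services agree on when a lock has expired.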

Distributed lock–Redis:

The MySQL solution works for small and medium-sized projects, and for large projects too if MySQL is given more resources, but Redis is the most widely used option.

Redis implements locking with the setnx command, in the form: setnx key value.

setnx is short for “set if not exists”: if the key does not exist, set its value to value; if the key already exists, do nothing.

  • Locking: setnx key value

  • Unlock: del key

Redis distributed lock–deadlock problem

Cause

The service holding the lock crashes during execution and never releases it. The key stays in Redis forever, so no other service can acquire the lock.

Solution

Set an expiration time on the key so that it expires automatically. Once the key has expired it no longer exists, and other services can acquire the lock again.

  • Note that the expiration time must not be added like this:

setnx key value
expire key time_in_second

With this approach the service may still crash after setnx succeeds but before the expiration time is set, which again results in a deadlock.

  • An effective solution is to lock and set the expiration time with a single command, in the following form:

set key value nx ex time_in_second

This form is supported since Redis 2.6.12; older versions of Redis can use a Lua script.

Problems caused by the expiration time

Problem 1: Suppose the lock expiration time is 10 seconds and service 1 has still not finished 10 seconds after locking. The lock has now expired and service 2 can acquire it, so two services hold the lock at the same time.

Problem 2: Service 1 finishes after 14 seconds and releases the lock, but the lock it releases is the one added by service 2. Service 3 can now acquire the lock as well.

Solution:

Problem 2 is easy to solve. When releasing the lock, check whether it is the lock you added yourself. If it is, release it; if not, skip the release.
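
The ownership check for problem 2 can be sketched as below. This is a minimal in-memory model under assumptions: `OwnerCheckedLock` is a hypothetical class, a map stands in for Redis, and `expire` simulates the TTL running out. With a real Redis server the read-compare-delete sequence would have to run atomically, which is commonly done with a small Lua script.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class OwnerCheckedLock {
    // Hypothetical stand-in for Redis: key -> value of the current lock holder.
    private final Map<String, String> store = new ConcurrentHashMap<>();

    /** Like "set key value nx": returns a fresh token on success, null if the key exists. */
    String tryLock(String key) {
        String token = UUID.randomUUID().toString();
        return store.putIfAbsent(key, token) == null ? token : null;
    }

    /** Deletes the key only if it still holds the caller's token (atomic compare-and-delete). */
    boolean unlock(String key, String token) {
        return store.remove(key, token);
    }

    /** Simulates the key's expiration time running out. */
    void expire(String key) {
        store.remove(key);
    }

    public static void main(String[] args) {
        OwnerCheckedLock redis = new OwnerCheckedLock();
        String service1 = redis.tryLock("goods:1");
        redis.expire("goods:1");                        // service 1 overruns its 10 seconds
        String service2 = redis.tryLock("goods:1");     // service 2 now holds the lock
        System.out.println(redis.unlock("goods:1", service1)); // false: not service 1's lock
        System.out.println(redis.unlock("goods:1", service2)); // true: the owner releases it
    }
}
```

Using a random token per acquisition, rather than a fixed service name, also distinguishes two lock acquisitions made by the same service.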

Solution to problem 1: the watchdog mechanism mentioned above.

Put simply, a separate child thread (the watchdog) watches the expiration time on behalf of the main thread. While the main thread is still executing business logic, the watchdog extends the expiration time once every third of the expiration period, so the lock does not expire before the main thread finishes.

  • Implementation of the watchdog mechanism

@Service
public class RenewGrabLockServiceImpl implements RenewGrabLockService {

    @Autowired
    private RedisTemplate<String, String> redisTemplate;

    @Override
    @Async
    public void renewLock(String key, String value, int time) {
        System.out.println("continue" + key + " " + value);
        String v = redisTemplate.opsForValue().get(key);
        // Keep renewing (via the recursive call below) only while the key still holds our value
        if (StringUtils.isNotBlank(v) && v.equals(value)) {
            int sleepTime = time / 3;
            try {
                Thread.sleep(sleepTime * 1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            redisTemplate.expire(key,time,TimeUnit.SECONDS);
            renewLock(key, value, time);
        }
    }
}
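
As an alternative to the recursive @Async method, the renewal can be driven by a scheduled task. The sketch below is only an illustration under assumptions: `WatchdogDemo` is a hypothetical class, an in-memory map with expiry timestamps stands in for Redis, and the times are in milliseconds to keep the demo short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class WatchdogDemo {
    /** Hypothetical stand-in for a Redis key: owner value plus expiry timestamp. */
    static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> store = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "lock-watchdog");
                t.setDaemon(true);
                return t;
            });

    boolean tryLock(String key, String value, long ttlMillis) {
        long now = System.currentTimeMillis();
        Entry mine = new Entry(value, now + ttlMillis);
        return store.compute(key, (k, cur) ->
                (cur == null || cur.expiresAtMillis <= now) ? mine : cur) == mine;
    }

    boolean isHeldBy(String key, String value) {
        Entry e = store.get(key);
        return e != null && e.value.equals(value) && e.expiresAtMillis > System.currentTimeMillis();
    }

    /** Renews the key every ttl/3 while it still holds our value and has not expired. */
    ScheduledFuture<?> startWatchdog(String key, String value, long ttlMillis) {
        long period = Math.max(1, ttlMillis / 3);
        return scheduler.scheduleAtFixedRate(() ->
                store.computeIfPresent(key, (k, cur) -> {
                    long now = System.currentTimeMillis();
                    boolean stillOurs = cur.value.equals(value) && cur.expiresAtMillis > now;
                    return stillOurs ? new Entry(value, now + ttlMillis) : cur;
                }), period, period, TimeUnit.MILLISECONDS);
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Holds a 900 ms lock through 2000 ms of work; returns {heldAfterWork, heldAfterWatchdogStopped}. */
    static boolean[] demo() {
        WatchdogDemo redis = new WatchdogDemo();
        redis.tryLock("goods:1", "service-1", 900);
        ScheduledFuture<?> watchdog = redis.startWatchdog("goods:1", "service-1", 900);
        sleep(2000);                                   // business logic outlives the TTL
        boolean heldAfterWork = redis.isHeldBy("goods:1", "service-1");
        watchdog.cancel(false);                        // stop renewing
        sleep(1500);                                   // the last renewal runs out
        boolean heldAfterStop = redis.isHeldBy("goods:1", "service-1");
        redis.scheduler.shutdownNow();
        return new boolean[] {heldAfterWork, heldAfterStop};
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("Held after work: " + result[0] + ", held after watchdog stopped: " + result[1]);
    }
}
```

Cancelling the returned future when the business logic finishes stops the renewal explicitly, instead of relying on the next check noticing that the key has been deleted.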

Redis single node failure:

If Redis goes down during execution, no service can acquire the lock. This is the single-point-of-failure problem.

Solution:

Use multiple Redis instances.

First, one question needs to be analyzed: can the multiple Redis instances be set up as master and slave?

Redis master-slave problem:

A thread locks successfully, but before the key has been replicated the Redis master goes down. The slave does not have the key, so another service can still acquire the lock.

Therefore, the master-slave scheme cannot be used.

Another option is Red Lock.

Red Lock:

The Red Lock solution also uses multiple Redis instances, but they are independent of one another, with no master-slave relationship.

When locking, as soon as the lock is acquired on one Redis instance, move on to the next one. In the end, if more than half of the instances were locked successfully, the lock is acquired; otherwise locking fails.
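
The majority rule described above can be sketched as follows. It is a simplified model under assumptions: `MajorityLockDemo` and `Node` are hypothetical names, each `Node` is an in-memory stand-in for one independent Redis instance, and key expiration is left out. The full Redlock algorithm described in the Redis documentation also sets a TTL on every key and accounts for the time spent acquiring the locks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MajorityLockDemo {
    /** Hypothetical stand-in for one independent Redis instance. */
    static final class Node {
        private final Map<String, String> keys = new ConcurrentHashMap<>();

        boolean setIfAbsent(String key, String value) {
            return keys.putIfAbsent(key, value) == null;
        }

        void deleteIfOwner(String key, String value) {
            keys.remove(key, value);
        }

        boolean holds(String key, String value) {
            return value.equals(keys.get(key));
        }
    }

    /** Tries every node in turn; the lock counts as acquired only with a strict majority. */
    static boolean tryLock(List<Node> nodes, String key, String value) {
        List<Node> acquired = new ArrayList<>();
        for (Node node : nodes) {
            if (node.setIfAbsent(key, value)) {
                acquired.add(node);
            }
        }
        if (acquired.size() > nodes.size() / 2) {
            return true;
        }
        // No majority: roll back the partial locks so they do not block other clients.
        for (Node node : acquired) {
            node.deleteIfOwner(key, value);
        }
        return false;
    }

    static void unlock(List<Node> nodes, String key, String value) {
        for (Node node : nodes) {
            node.deleteIfOwner(key, value);
        }
    }

    static List<Node> cluster(int size) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            nodes.add(new Node());
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<Node> nodes = cluster(5);
        System.out.println(tryLock(nodes, "goods:1", "service-1")); // true: locked on 5 of 5 nodes
        System.out.println(tryLock(nodes, "goods:1", "service-2")); // false: locked on 0 of 5 nodes
        unlock(nodes, "goods:1", "service-1");
        System.out.println(tryLock(nodes, "goods:1", "service-2")); // true: lock was released
    }
}
```

Rolling back after a failed attempt matters: without it, two clients that each lock a minority of the nodes could block each other until the keys expire.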

Can overselling still happen with Red Lock?

Yes, it can!

Suppose the operations engineer is diligent and has automated recovery, so a crashed Redis instance is restarted immediately. The restarted instance no longer has the previously locked key, so another thread can acquire the lock on it, and two threads end up holding the lock at the same time.

  • Solution: Delay restarting the crashed Redis instance. Delaying the restart by as much as a day does no harm, whereas restarting too quickly causes problems.

Ultimate question:

Is the scheme perfect now?

Not at all!

While the program runs, the lock has been acquired and the watchdog keeps renewing it. Everything looks fine, but Java still has one ultimate problem: STW (Stop The World).

A Full GC makes the JVM stop the world. It is as if the pause button had been pressed: the main thread executing the task is suspended, and the watchdog responsible for renewal is suspended too. Meanwhile the lock in Redis runs out and expires, after which another JVM can acquire it. The original problem reappears: two services hold the lock at the same time.

Solution:

  • Solution 1: The ostrich algorithm (ignore the problem)

  • Solution 2: The ultimate solution, Zookeeper + MySQL optimistic lock

Distributed lock–Zookeeper + MySQL optimistic lock

How does Zookeeper solve the STW problem?

  • When locking, create an ephemeral sequential node in Zookeeper. Once the node is created, Zookeeper generates a sequence number; store this number in the version field in MySQL for later verification. If an STW pause occurs before the lock is released and the lock expires, another service acquires the lock and changes the version field in MySQL.

  • When unlocking, verify that the version field still holds the value you stored when you locked. If it does, delete the node and release the lock; if not, you were paused in the meantime and the execution fails.
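
The version check in the two steps above can be sketched in memory as follows. This is an illustration under assumptions, not the project's code: `VersionedGoods`, `onLocked` and `commit` are hypothetical names, the fields stand in for a MySQL row, and the sequence numbers are passed in by hand instead of coming from Zookeeper. The sketch also assumes the stock deduction is committed together with the version check, which the text does not spell out.

```java
class VersionedGoods {
    private long version;   // stands in for the version field in MySQL
    private int stock;      // stands in for the stock field in MySQL

    VersionedGoods(int stock) {
        this.stock = stock;
    }

    /** Called right after locking: record the lock's sequence number as the current version. */
    synchronized void onLocked(long sequence) {
        version = sequence;
    }

    /**
     * Called before unlocking: deduct stock only if the version is still ours,
     * the in-memory analogue of an UPDATE with a "WHERE version = ?" condition.
     */
    synchronized boolean commit(long sequence) {
        if (version != sequence || stock <= 0) {
            return false;
        }
        stock--;
        return true;
    }

    synchronized int stock() {
        return stock;
    }

    public static void main(String[] args) {
        VersionedGoods goods = new VersionedGoods(2);
        goods.onLocked(1);                      // service 1 locks with sequence number 1
        // ... service 1 is frozen by a long GC pause and its lock expires ...
        goods.onLocked(2);                      // service 2 locks with sequence number 2
        System.out.println(goods.commit(2));    // true: service 2 still owns the version
        System.out.println(goods.commit(1));    // false: service 1 woke up too late
        System.out.println(goods.stock());      // 1
    }
}
```

The point of the check is that a paused service cannot tell by itself that its lock has expired; the stored version gives the database a way to reject its late write.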

The world has become quieter.

Related code

  • gitee: distributed-lock