Designing a seckill (flash-sale) system under high concurrency

A design for a seckill system under high concurrency, similar to the regular snap-up sales seen during the pandemic.

Article directory

  • Core logic
  • Supplementary background knowledge
    • What is a Bloom filter?
    • How to deduct inventory with Redis?
    • What is the difference between cache breakdown and cache penetration?
    • Why should ordering be an asynchronous request?

Phenomenon: in the minutes before the sale, the number of concurrent users climbs sharply, peaking exactly at the seckill start time. Only a tiny fraction of users manage to complete the request and place an order; the load is instantaneous.
Users tend to keep refreshing the page, and the buy button only becomes clickable at the start time.

Core logic

1. Reduce unnecessary requests
   1. Avoid non-critical requests
      1. Static pages: routine actions such as browsing products do not hit the server at all. The server is contacted only when the seckill time arrives and the user actively clicks the seckill button.
      2. CDN acceleration: lets users fetch content from a nearby node, reducing network congestion and improving response speed and hit rate.
   2. Avoid accidental triggering of critical requests
      1. Seckill button: controlled by a JS file, grayed out before the start time.
         1. How is the CDN copy updated? When the seckill starts, the system generates a new JS file in which the flag is true, generates a fresh value for a random URL parameter, and synchronizes the file to the CDN. Because of the random parameter the CDN does not serve a stale cached copy, so the latest JS code is fetched from the CDN every time.
         2. The front end can also add a timer, e.g. allow only one request per 10 seconds: after the user clicks the seckill button it is grayed out for 10 seconds, and only after that interval can it be clicked again.
      2. Rate limiting (against scalpers and crawlers)
         1. Limit the same user: by user id or by IP (IP limits easily hit innocent users sharing an address).
         2. Rate-limit the interface itself. How? Either Nginx-based limiting or Redis-based limiting.
         3. Captcha mechanism (a slider captcha works better than a digit captcha).
         4. Raise the business threshold (only designated membership levels may place requests).
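The Redis-based limiting mentioned above is typically a fixed-window counter built on INCR and EXPIRE. A minimal sketch of that pattern, using an in-memory dict to stand in for Redis (class and method names are illustrative; the comments name the Redis command each step corresponds to):

```python
import time

class FixedWindowLimiter:
    """Fixed-window rate limiter, as one would build on Redis INCR + EXPIRE."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.store = {}  # key -> (count, window_start); a dict stands in for Redis

    def allow_request(self, user_id, now=None):
        now = time.time() if now is None else now
        key = f"rate:{user_id}"
        count, start = self.store.get(key, (0, now))
        if now - start >= self.window:   # window elapsed -> key would EXPIRE in Redis
            count, start = 0, now
        count += 1                       # INCR
        self.store[key] = (count, start)
        return count <= self.limit

limiter = FixedWindowLimiter(limit=3, window_seconds=10)
results = [limiter.allow_request("u1", now=100.0) for _ in range(4)]
print(results)  # first 3 allowed, 4th rejected
print(limiter.allow_request("u1", now=111.0))  # new window -> allowed again
```

In real Redis the INCR and EXPIRE would be issued together (or inside a Lua script) so the counter and its TTL stay consistent under concurrency.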
2. Speed up request completion
   1. Caching (Redis cluster)
      1. The scenario itself is read-heavy and write-light: before placing an order the remaining stock must be checked, and only then can the order be placed.
      2. Problems introduced by caching
         1. Cache breakdown (during the first seckill burst, the queried value is not yet in the cache)
            1. Distributed lock (as insurance, so only one request rebuilds the cache).
            2. Cache warm-up (pre-loading guards against breakdown; but if the cache still misses, the query falls through to the database and can crush it, and even warmed keys remain vulnerable to expiration).
         2. Cache penetration (requests carry large numbers of product ids that exist neither in the cache nor in the database)
            1. A lock works, but performance is poor.
            2. Bloom filter (checked before the cache is accessed)
               1. How to keep the Bloom filter consistent with the cache? It is mostly used where cached data is rarely updated; updates must be synchronized into the filter, failed synchronizations must be retried, and real-time consistency across data sources is hard to guarantee (Bloom filters only suit scenarios with few data updates).
            3. Cache the non-existent product id: check the Bloom filter first, then the cache, and when both miss and the database also has nothing, cache the non-existent product id (a null marker).
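The read path just described (Bloom filter, then cache, then database, caching a null marker for ids that exist nowhere) can be sketched as follows. Plain dicts and sets stand in for the Bloom filter, Redis, and the database; all names are illustrative:

```python
NULL_MARKER = object()  # cached placeholder for ids that exist nowhere

bloom = {"p1", "p2"}    # stand-in for a Bloom filter over known product ids
cache = {}              # stand-in for Redis
db = {"p1": {"stock": 10}, "p2": {"stock": 0}}

def get_product(product_id):
    # 1. Bloom filter: a miss here means the id definitely does not exist.
    if product_id not in bloom:
        return None
    # 2. Cache: may hold real data or the null marker.
    if product_id in cache:
        value = cache[product_id]
        return None if value is NULL_MARKER else value
    # 3. Database, then populate the cache. Cache the null marker when the id
    #    slipped past the Bloom filter (a false positive) but is not in the DB.
    value = db.get(product_id)
    cache[product_id] = NULL_MARKER if value is None else value
    return value

print(get_product("p1"))   # hits the DB once, then the cache thereafter
print(get_product("zzz"))  # rejected by the Bloom filter, DB never touched
```

A real Bloom filter has false positives (a set here has none), which is exactly why step 3 still caches the null marker as a second line of defense.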
   2. Process orders asynchronously
      1. Overview: there are three steps: seckill, order, payment. Ordering is made asynchronous: after the seckill succeeds, a message is sent to MQ, and the order service consumes the message to process the request.
      2. Problems:
         1. Message loss: add a message-send record table
            1. After the seckill, write a row into the message-send record table, and mark it processed once the order has been placed.
            2. A job periodically scans the record table and re-sends the MQ message for rows that failed.
            3. If the order was placed but the callback failed, could that lead to duplicate orders? (This is what the deduplication below handles.)
         2. Duplicate consumption: add a message-processing table
            1. After the consumer reads a message, it first checks whether the message exists in the message-processing table. If it does, this is a duplicate and the consumer returns immediately. If it does not, the consumer places the order, writes the message into the message-processing table, and then returns.
            2. A more critical point: placing the order and writing the message-processing table must happen in the same transaction, to guarantee atomicity.
         3. Junk-message problem
            1. If the consumer keeps failing to place the order and can never call the status-change interface back, the job keeps retrying and re-sending the message, eventually producing a large number of junk messages.
            2. On every retry, the job first checks whether the message's send count in the record table has reached the maximum limit. If it has, the job returns immediately; if not, it increments the count by 1 and sends the message.
            3. This way an exception produces only a small number of junk messages, and normal business is unaffected.
         4. Delayed consumption problem
            1. If payment is not completed within 15 minutes, the order should be cancelled automatically.
            2. A polling job has poor real-time behavior, so use a delay queue:
               1. When placing the order, the producer first creates the order in the pending-payment state, then sends a message to the delay queue.
               2. When the delay elapses, the consumer reads the message and checks whether the order is still pending payment.
               3. If it is pending payment, the order status is updated to cancelled.
               4. If it is not pending payment, the order has already been paid, and the consumer returns immediately.
               5. Another key point: when the user completes payment, the order status is changed to paid.
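The duplicate-consumption fix above (check the processing table, place the order, and record the message in one transaction) can be sketched with SQLite standing in for the order database. Table and column names are illustrative, not from the original article:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, product_id TEXT);
    CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY);
""")

def consume(message_id, order_id, product_id):
    # 1. Dedup check: if this message was already processed, return at once.
    row = conn.execute(
        "SELECT 1 FROM processed_messages WHERE message_id = ?", (message_id,)
    ).fetchone()
    if row:
        return "duplicate"
    # 2. Place the order and record the message in the SAME transaction,
    #    so either both happen or neither does.
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, product_id))
        conn.execute("INSERT INTO processed_messages VALUES (?)", (message_id,))
    return "ordered"

print(consume("m1", "o1", "p1"))  # ordered
print(consume("m1", "o1", "p1"))  # duplicate: redelivery of m1 is ignored
```

The `with conn:` block is what carries the "same transaction" point: if the order insert succeeds but the processing-table insert fails, both are rolled back and the message can safely be redelivered.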
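The delayed-cancellation flow above (create the order as pending-payment, enqueue a delayed message, and on expiry cancel only if still unpaid) can be sketched with a min-heap standing in for the delay queue; all names are illustrative:

```python
import heapq

orders = {}       # order_id -> status
delay_queue = []  # min-heap of (due_time, order_id), standing in for a delay queue

def place_order(order_id, now, timeout=15 * 60):
    orders[order_id] = "pending_payment"
    heapq.heappush(delay_queue, (now + timeout, order_id))  # delayed message

def pay(order_id):
    if orders.get(order_id) == "pending_payment":
        orders[order_id] = "paid"

def consume_due(now):
    # Consumer: for every due message, cancel the order only if still unpaid.
    while delay_queue and delay_queue[0][0] <= now:
        _, order_id = heapq.heappop(delay_queue)
        if orders.get(order_id) == "pending_payment":
            orders[order_id] = "cancelled"
        # otherwise the order was already paid: nothing to do, return immediately

place_order("o1", now=0)
place_order("o2", now=0)
pay("o2")                  # o2 is paid within the window
consume_due(now=15 * 60)   # both delayed messages are now due
print(orders)              # o1 is cancelled, o2 stays paid
```

A production system would use a real delay queue (e.g. RocketMQ delayed messages or a RabbitMQ dead-letter exchange), but the consumer-side status check is the same.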
3. Guarantee request validity
   1. Oversold-inventory problem
      1. Stock pre-deduction and return: if the user has not completed payment within the time limit, the deducted stock must be returned.
   2. Deducting inventory in the database
      1. A plain `update ... set stock = ...` is not enough on its own.
      2. How to prevent users from ordering when stock is insufficient?
      3. Checking the stock before updating does not work:
      4. the query and the update are not atomic, which leads to overselling under concurrency.
      5. `update product set stock = stock - 1 where id = #{productId} and stock > 0;` is atomic, but under high concurrency the row lock is heavily contended and deadlocks are easy to trigger.
   3. Deducting inventory in Redis
      1. Redis incr/incrby is atomic.
      2. Locking in the application (e.g. synchronized) performs poorly.
      3. The counter can go negative, but there is no overselling.
   4. Deduct inventory with a Lua script (duplicate check, stock check, and decrement executed atomically inside Redis).
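The conditional UPDATE in point 5 can be demonstrated with SQLite: the `stock > 0` guard plus the affected-row count make check-and-decrement a single atomic statement, with no separate read. A minimal sketch (table name and helper are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO product VALUES ('p1', 2)")

def deduct(product_id):
    # The WHERE clause makes check-and-decrement one atomic statement:
    # if stock is already 0, no row matches and nothing is updated.
    cur = conn.execute(
        "UPDATE product SET stock = stock - 1 WHERE id = ? AND stock > 0",
        (product_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # True if a row was updated, i.e. stock was deducted

print([deduct("p1") for _ in range(3)])  # [True, True, False]: only 2 in stock
```

The caller decides success by the affected-row count, never by a prior SELECT, which is what closes the query-then-update race described above.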
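The Lua approach bundles the stock check and the decrement into one script that Redis executes atomically, so the counter can never go below zero. A sketch of the pattern: the Lua source below is typical of such scripts (not from the original article), and the Python function only simulates its semantics in memory, mirroring Redis's one-command-at-a-time execution model.

```python
# Lua script of the kind run via EVAL: the check and the decrement
# execute atomically inside Redis, so stock never goes negative.
LUA_DEDUCT = """
local stock = tonumber(redis.call('GET', KEYS[1]) or '0')
if stock <= 0 then
    return 0                 -- sold out
end
redis.call('DECR', KEYS[1])
return 1                     -- deducted
"""

def deduct_simulated(store, key):
    # In-memory simulation of the script's semantics: because Redis runs
    # a script to completion before any other command, check + decrement
    # behave as a single step.
    stock = store.get(key, 0)
    if stock <= 0:
        return 0
    store[key] = stock - 1
    return 1

store = {"stock:p1": 2}
print([deduct_simulated(store, "stock:p1") for _ in range(3)])  # [1, 1, 0]
print(store["stock:p1"])  # 0: never negative, unlike bare incrby(-1)
```

This is what distinguishes the Lua script from the bare incrby approach shown later, where the counter is allowed to go negative.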

Supplementary background knowledge

What is a Bloom filter?

A Bloom filter is a probabilistic data structure used to quickly test whether an element belongs to a set. It can say with certainty that an element is definitely not in the set, but a "present" answer carries a certain false-positive rate.
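To make that concrete, here is a minimal Bloom filter built on salted SHA-256 digests; the bit-array size and hash count are illustrative, not tuned:

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely not present"; True means "probably present".
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("p1001")
print(bf.might_contain("p1001"))  # True: added items are never missed
print(bf.might_contain("p9999"))  # almost certainly False; a rare True is a false positive
```

Note the asymmetry: there are no false negatives (every added item answers True), while a small fraction of absent items may answer True, which is the misjudgment rate mentioned above.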

How to deduct inventory with Redis?

boolean exist = redisClient.query(productId, userId);  // has this user already seckilled this product?
if (exist) {
  return -1;  // duplicate seckill by the same user
}
if (redisClient.incrby(productId, -1) < 0) {
  return 0;   // stock exhausted
}
redisClient.add(productId, userId);  // record this user's successful seckill
return 1;     // success

The main flow of the code is as follows:
First check whether the user has already seckilled this product; if so, return -1 immediately.
Deduct the stock and check whether the returned value is less than 0; if it is, return 0, meaning the stock is insufficient.
If the value after deduction is greater than or equal to 0, save a record of this user's seckill.
Then return 1, indicating success.

At first glance, the program seems fine.

However, if many requests deduct stock simultaneously in a high-concurrency scenario, the incrby result for most of them will be less than 0. Although the counter goes negative, nothing is oversold.

But because this is a pre-deduction of stock, a deeply negative counter makes the stock figure inaccurate when stock later has to be returned (for example, for unpaid orders).

What is the difference between cache breakdown and cache penetration?

  • Cache breakdown: a hot key is missing from the cache (typically because it just expired or has not yet been loaded), so a burst of concurrent requests for that key falls through to the underlying storage and spikes its load.
  • Cache penetration: requests, often malicious, ask for large amounts of data that exist neither in the cache nor in the database; every request misses the cache, system load rises, and it can amount to a denial-of-service attack.

Why should ordering be an asynchronous request?

Designing the ordering process to be asynchronous brings the following benefits across the three steps of seckill, ordering, and payment:

  1. Shorter response time: during a flash sale, a huge number of requests floods the system in a short period. If ordering is synchronous, each order request must complete before the next can be handled, requests pile up, and response times degrade. With asynchronous ordering, the system immediately acknowledges the order request and processes the order logic in the background, so it responds to users much faster.
  2. Higher concurrent throughput: asynchronous ordering removes the serial processing of requests. During the seckill, many users submit orders simultaneously; handled synchronously, the system must process them one by one. Handled asynchronously, requests are placed on a message or task queue and consumed by background workers or consumers, which greatly raises the system's concurrent processing capability under high load.
  3. Better stability and reliability: asynchronous ordering decouples the order request from the downstream payment flow, so a problem after an order is placed (insufficient stock, payment failure, etc.) does not affect other users' orders. The system can verify stock and payment status asynchronously in the background to keep orders accurate and complete, reducing the risk of a single point of failure.