Improper use of CountDownLatch leaves a thread waiting forever

Introduction:

The concept of CountDownLatch: it allows one or more threads to wait until operations being performed in other threads have completed. Its constructor takes an int count, which can be thought of as a countdown counter (often, but not necessarily, the number of worker threads).
Each time a worker finishes, it calls countDown() to decrement the count by 1. When the count reaches 0, the CountDownLatch releases all waiting threads.
The await() method blocks the calling thread until the count reaches 0.
These two methods are the core of CountDownLatch.
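The following minimal Java sketch shows the two core methods working together. It is not from the project; the class and method names (LatchBasics, runWorkers) and the pool size of 4 are illustrative assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class LatchBasics {
    // Starts `workers` tasks; each one does its "work" (incrementing a shared
    // counter) and then calls countDown(). The calling thread blocks in await()
    // until every task has counted down, so the result always equals `workers`.
    static int runWorkers(int workers) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(workers); // one count per task
        AtomicInteger done = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (int i = 0; i < workers; i++) {
                pool.execute(() -> {
                    try {
                        done.incrementAndGet(); // the task's work
                    } finally {
                        latch.countDown(); // always count down, even on failure
                    }
                });
            }
            latch.await(); // returns once the count reaches 0
            return done.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWorkers(5)); // prints 5
    }
}
```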

Project scenario:

Because the project needs to update customer levels automatically, a scheduled task promotes and downgrades customers every morning. The task is divided into four stages:

  1. Fetch all customer-related order data from the last four months, and save or update it in the auto-promotion table

  2. Customer promotion (deal → long-term, FANS)

  3. Customer downgrade (to deal)

  4. Delete order data older than four months (from the promotion table)

To speed up execution, the first stage uses CountDownLatch to update the data with multiple threads: the main thread waits until all worker threads have finished and then continues with the promotion and downgrade operations.

Combining CountDownLatch with a thread pool did cut the interface's execution time considerably, but it also planted the hidden danger behind the bug I ran into this time.

Problem description:

Because the scheduled task runs in the early morning, I checked its status when I got to work the next day. The job had been scheduled successfully, but it had never finished. Going through the logs, I found that after the worker threads finished updating the data, the main thread entered the waiting stage and was never released. Here is the problem code:

//If it is greater than 2000, use threads to update data
if (orderDataOfTheCurrentMonthData.size() > 2000) {
    //Set the number of threads
    CountDownLatch latch = new CountDownLatch(10); // problem part

    List<List<TAutoPromotionGradePO>> partition = Lists.partition(orderDataOfTheCurrentMonthData, 2000);
    for (List<TAutoPromotionGradePO> autoPromotionGradePOList : partition) {
        taskExecutor.execute(() -> {
            saveOrUpdatePromotionData(autoPromotionGradePOList);
            //A single task ends, the counter is decremented by one
            latch.countDown();
        });
    }

    try {
        // wait for all tasks to finish
        latch.await();
    } catch (InterruptedException e) {
        log.error("automatically update customer levels error msg :", e);
        throw new BizException(50000, "Thread task release exception");
    } finally {
        log.info("The task of updating automatic promotion data is completed elapsed time :{}", stopwatch.elapsed(TimeUnit.MICROSECONDS));
    }
} else {
    saveOrUpdatePromotionData(orderDataOfTheCurrentMonthData);
}

Cause analysis:

Debugging showed that the main thread was never released. The cause: the CountDownLatch was initialized with a count of 10, but that month's order data did not reach 20,000 rows, so Lists.partition produced fewer than 10 batches of 2,000 and therefore fewer than 10 tasks. Only 7 tasks called countDown(), leaving the count stuck at 3, so **latch.await()** kept waiting for a count of 0 that never came.

Because the scheduled tasks run serially on a single machine, this one blocked task also blocks every task after it!
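The failure can be reproduced in isolation. This sketch assumes a hypothetical 13,500 rows (the post only says the month's data was under 20,000 and produced 7 batches) and replaces the real database work with a bare countDown(). It uses a 1-second timed await only so the demo terminates; getCount() then reports the counts that a plain await() would wait for forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class StuckLatchDemo {
    // Number of batches Lists.partition(list, batchSize) would produce:
    // ceil(rows / batchSize).
    static int batchCount(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    // Mirrors the buggy code: the latch count is fixed at `latchCount`, but only
    // one task per batch calls countDown(). Returns the count still outstanding
    // after waiting up to 1 second; non-zero means a plain await() would hang.
    static long remainingCount(int rows, int batchSize, int latchCount)
            throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(latchCount);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (int i = 0; i < batchCount(rows, batchSize); i++) {
                pool.execute(latch::countDown); // stands in for saveOrUpdate + countDown
            }
            latch.await(1, TimeUnit.SECONDS); // timed only so the demo terminates
            return latch.getCount();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical 13,500 rows / 2,000 per batch = 7 batches, latch expects 10.
        System.out.println(remainingCount(13_500, 2_000, 10)); // prints 3
    }
}
```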

The official description is as follows:

A CountDownLatch is initialized with a given count. The await methods block until the current count reaches zero due to invocations of the countDown method, after which all waiting threads are released and any subsequent invocations of await return immediately. This is a one-shot phenomenon – the count cannot be reset. If you need a version that resets the count, consider using a CyclicBarrier.
A CountDownLatch is a versatile synchronization tool and can be used for a number of purposes. A CountDownLatch initialized with a count of one serves as a simple on/off latch, or gate: all threads invoking await wait at the gate until it is opened by a thread invoking countDown. A CountDownLatch initialized to N can be used to make one thread wait until N threads have completed some action, or some action has been completed N times.
A useful property of a CountDownLatch is that it doesn’t require that threads calling countDown wait for the count to reach zero before proceeding, it simply prevents any thread from proceeding past an await until all threads could pass.

Put simply, a CountDownLatch is a gate: every thread that calls await() waits at the gate until enough countDown() calls have brought the count to zero and opened it.

It is like a tour guide who has to wait for all ten tourists to arrive before starting the next part of the itinerary; if only seven show up, the group never leaves.
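The gate and the tour guide can be sketched with two latches, loosely following the start-signal/done-signal pattern described in the CountDownLatch Javadoc. The names (GateDemo, startGate, doneSignal) are illustrative; this is not code from the project.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class GateDemo {
    // startGate (count 1) is the on/off gate: every worker blocks on it until the
    // main thread opens it. doneSignal (count n) lets the main thread wait until
    // all n workers have finished, like a guide waiting for n tourists.
    static int run(int n) throws InterruptedException {
        CountDownLatch startGate = new CountDownLatch(1);
        CountDownLatch doneSignal = new CountDownLatch(n);
        AtomicInteger arrived = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            new Thread(() -> {
                try {
                    startGate.await();         // wait at the gate
                    arrived.incrementAndGet(); // do the work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    doneSignal.countDown();    // report "I have arrived"
                }
            }).start();
        }
        startGate.countDown(); // open the gate: all waiting workers proceed
        doneSignal.await();    // the guide waits for all n tourists
        return arrived.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10)); // prints 10
    }
}
```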

Solution:

Once the cause was found, the fix was simple.

  1. The simplest fix is to derive the initial count from the data: set it to the number of batches the data is actually split into. Alternatively, keep a fixed initial count and split the data into exactly that many batches.

  2. Replace latch.await() with latch.await(60, TimeUnit.SECONDS) so that the main thread stops waiting after a timeout. This overload returns false if the count has not reached zero by then, so check the return value rather than assuming all tasks finished.
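A sketch combining both options, assuming the batches come from a partition helper equivalent to Guava's Lists.partition (a small hypothetical stand-in is included so the example needs no dependency). The names runBatches and demoSum and the 60-second timeout are illustrative choices, not the project's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

class BatchRunner {
    // Hypothetical stand-in for Guava's Lists.partition: consecutive sublists
    // of at most `size` elements (the last one may be smaller).
    static <T> List<List<T>> partition(List<T> items, int size) {
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += size) {
            batches.add(items.subList(from, Math.min(from + size, items.size())));
        }
        return batches;
    }

    // Option 1: the latch count is taken from the batches actually created.
    // Option 2: the timed await returns false instead of blocking forever.
    static <T> boolean runBatches(List<T> items, int batchSize, Consumer<List<T>> work,
                                  ExecutorService pool, long timeoutSeconds)
            throws InterruptedException {
        List<List<T>> batches = partition(items, batchSize);
        CountDownLatch latch = new CountDownLatch(batches.size()); // one count per batch
        for (List<T> batch : batches) {
            pool.execute(() -> {
                try {
                    work.accept(batch);
                } finally {
                    latch.countDown(); // decrement even if the work throws
                }
            });
        }
        return latch.await(timeoutSeconds, TimeUnit.SECONDS); // false on timeout
    }

    // Sums 0..n-1 in batches; returns -1 if the batches did not finish in time.
    static long demoSum(int n, int batchSize) throws InterruptedException {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < n; i++) items.add(i);
        AtomicLong sum = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            boolean finished = runBatches(items, batchSize,
                    batch -> batch.forEach(x -> sum.addAndGet(x)), pool, 60);
            return finished ? sum.get() : -1;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demoSum(4_500, 2_000)); // prints 10122750
    }
}
```

Taking the count from batches.size() means the latch can never expect more countDown() calls than there are tasks, and putting countDown() in a finally block keeps an exception in one batch from leaving the latch stuck.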

Either approach solves the problem, but I recommend the first: using a tool correctly is simply what a programmer should do. The second can be added depending on the situation, or combined with the first. Since this task does not need a timeout, I chose the first.

private static final int COUNT = 2000;

...

//If it is greater than 2000, use threads to update data
if (orderDataOfTheCurrentMonthData.size() > COUNT) {
    List<List<TAutoPromotionGradePO>> partition = Lists.partition(orderDataOfTheCurrentMonthData, COUNT);
    //Set the count to the number of batches actually created, i.e. ceil(size / COUNT).
    //(size / COUNT) + 1 over-counts by one when size is an exact multiple of COUNT.
    CountDownLatch latch = new CountDownLatch(partition.size());

    for (List<TAutoPromotionGradePO> autoPromotionGradePOList : partition) {
        taskExecutor.execute(() -> {
            try {
                saveOrUpdatePromotionData(autoPromotionGradePOList);
            } finally {
                //A single task ends (even if it threw), the counter is decremented by one
                latch.countDown();
            }
        });
    }

    try {
        // wait for all tasks to finish
        latch.await();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag
        log.error("automatically update customer levels error msg :", e);
        throw new BizException(50000, "Thread task release exception");
    } finally {
        log.info("The task of updating automatic promotion data is completed elapsed time :{}", stopwatch.elapsed(TimeUnit.MICROSECONDS));
    }
} else {
    saveOrUpdatePromotionData(orderDataOfTheCurrentMonthData);
}
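Computing the count as (orderDataOfTheCurrentMonthData.size() / COUNT) + 1 matches the batch count only when the size is not an exact multiple of COUNT. With exactly 4,000 rows the formula gives 3, but Lists.partition returns 2 sublists, so the latch would stop at 1 and the original hang would come back. The quick check below assumes Lists.partition returns ceil(size / COUNT) sublists, as Guava documents it; the class name LatchCountCheck is hypothetical.

```java
class LatchCountCheck {
    static final int COUNT = 2000; // batch size, as in the code above

    // The "(size / COUNT) + 1" formula.
    static int formulaCount(int size) {
        return size / COUNT + 1;
    }

    // The number of sublists Lists.partition(list, COUNT) returns: ceil(size / COUNT).
    static int batchCount(int size) {
        return (size + COUNT - 1) / COUNT;
    }

    public static void main(String[] args) {
        for (int size : new int[] {4_001, 13_500, 4_000, 20_000}) {
            System.out.printf("size=%d formula=%d batches=%d%n",
                    size, formulaCount(size), batchCount(size));
        }
    }
}
```

Deriving the count from partition.size() sidesteps this arithmetic entirely.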

Summary:

This kind of problem should never have happened; I am writing it down so that I remember this misuse.

There is not much more to take away from this bug, so I am just recording it briefly, and I hope it never happens again!