Stop worrying about the size of the thread pool and the number of threads. There is no fixed formula.


Many people may have seen a theory about setting the number of threads:

  • CPU-intensive programs – number of cores + 1

  • I/O intensive programs – number of cores * 2

No, no. Does anyone actually plan their thread counts by this rule of thumb?

Small test of thread count and CPU utilization

Setting aside the finer points of operating systems and computer architecture, let's start from a basic premise (don't worry about whether it is rigorous; it's just for ease of understanding):

One CPU core can only execute the instructions of one thread per unit time.

So in theory, one thread only needs to keep executing instructions to reach full utilization of one core.

Let’s write an example of running in an endless loop to verify:

Test environment: AMD Ryzen 5 3600, 6 cores / 12 threads


public class CPUUtilizationTest {
  public static void main(String[] args) {
    //Infinite loop, do nothing
    while (true){
    }
  }
}

After this example, let’s take a look at the current CPU utilization:

[Screenshot: CPU utilization with one busy-loop thread, a single core at 100%]

As you can see from the picture, core #3 is fully utilized.

Based on the above theory, should I try to open a few more threads?


public class CPUUtilizationTest {
  public static void main(String[] args) {
    // Start 6 busy-loop threads
    for (int j = 0; j < 6; j++) {
      new Thread(new Runnable() {
        @Override
        public void run() {
          while (true) {
          }
        }
      }).start();
    }
  }
}

Looking at CPU utilization now, cores 1/2/5/7/9/11 are fully utilized:

[Screenshot: 6 busy-loop threads, cores 1/2/5/7/9/11 at 100%]

So if 12 threads are opened, will all cores be fully utilized? The answer must be yes:

[Screenshot: 12 busy-loop threads, all cores at 100%]

If I continue to increase the number of threads in the above example to 24 threads at this time, what will be the result?

[Screenshot: 24 busy-loop threads, all cores still at 100%, load average roughly doubled]

As you can see from the picture above, CPU utilization is the same as in the previous step, still 100% on all cores, but the load average has risen from 11.x to 22.x (for an explanation of load average, see https://scoutapm.com/blog/understanding-load-averages). This indicates that the CPU is busier and the threads' tasks cannot be executed promptly.

Modern CPUs are basically all multi-core. For example, the AMD 3600 I tested here has 6 cores and 12 threads (hyper-threading). We can roughly treat it as a 12-core CPU, so it can do 12 things at the same time without them interfering with each other.

If the number of threads to be executed is greater than the number of cores, then it needs to be scheduled by the operating system. The operating system allocates CPU time slice resources to each thread, and then switches continuously to achieve the effect of “parallel” execution.

But is this really faster? As the example above shows, one thread can fully utilize one core. If every thread is "greedy", constantly executing instructions and never giving the CPU idle time, and the number of runnable threads exceeds the number of cores, the operating system has to switch between threads more frequently to ensure that each thread gets a chance to run.

However, switching comes at a cost: each switch involves saving and restoring register state, updating memory page tables, and so on. Although the cost of a single switch is negligible compared with an I/O operation, if there are too many threads and switches become too frequent, or the time spent switching per unit time even exceeds the time spent executing the program, then too much CPU is wasted on context switching instead of real work, and the loss outweighs the gain.
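This switching cost can be observed with a rough (and deliberately unscientific) benchmark. The class name, pool sizes, and iteration counts below are my own choices for illustration; the idea is simply that for purely CPU-bound work, a pool much larger than the core count should be no faster than a pool sized to the core count, and is often slightly slower:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SwitchCostDemo {
    // Run `tasks` purely CPU-bound tasks on a fixed pool of `poolSize` threads
    // and return the wall-clock time in milliseconds.
    static long runCpuBound(int poolSize, int tasks, long iterations) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<Future<Long>> futures = new ArrayList<>();
        long start = System.nanoTime();
        for (int t = 0; t < tasks; t++) {
            futures.add(pool.submit(() -> {
                long acc = 0;
                for (long i = 0; i < iterations; i++) {
                    acc += i; // pure computation, never yields the CPU
                }
                return acc;
            }));
        }
        for (Future<Long> f : futures) {
            f.get(); // wait for every task to finish
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        int tasks = cores * 4;
        // Same total work, two pool sizes: extra threads add switching, not speed.
        System.out.println("pool = cores     : " + runCpuBound(cores, tasks, 20_000_000L) + " ms");
        System.out.println("pool = cores * 8 : " + runCpuBound(cores * 8, tasks, 20_000_000L) + " ms");
    }
}
```

Exact timings will vary by machine and run, so treat any single result with suspicion; the trend over repeated runs is what matters.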

The above example of running in an endless loop is a bit too extreme, and it is unlikely that such a program would exist under normal circumstances.

Most programs perform some I/O while they run: reading and writing files, sending and receiving network messages, and so on. These I/O operations must wait for a response, for example waiting for a message to be sent or received. During that wait the thread sits idle and the CPU does nothing for it, so the operating system schedules the CPU to execute instructions from other threads. This puts the CPU's idle time to use and improves CPU utilization.

In the example above, the program loops constantly doing nothing, so the CPU executes instructions nonstop with almost no idle time. What if we insert an I/O operation, leaving the CPU idle while the I/O is in progress? What happens to CPU utilization then? Let's first look at the single-thread result:


public class CPUUtilizationTest {
  public static void main(String[] args) {
    for (int n = 0; n < 1; n++) {
      new Thread(new Runnable() {
        @Override
        public void run() {
          while (true) {
            // After every 100 million empty loop iterations, sleep 50 ms
            // to simulate I/O waiting and a thread switch
            for (long i = 0; i < 100_000_000L; i++) {
            }
            try {
              Thread.sleep(50);
            } catch (InterruptedException e) {
              e.printStackTrace();
            }
          }
        }
      }).start();
    }
  }
}

[Screenshot: 1 thread with sleep, one core at about 50%]

Wow, core #9's utilization is only about 50%, half of the 100% we saw without the sleep. Now adjust the thread count to 12 and look again:

[Screenshot: 12 threads with sleep, each core at about 60%]

Single-core utilization is around 60%, not much different from the single-thread result just now; the CPU is still not fully utilized. Now increase the thread count to 18:

[Screenshot: 18 threads with sleep, cores close to 100%]

At this point single-core utilization is close to 100%. This shows that when threads spend time on I/O and other operations that do not occupy the CPU, the operating system can schedule the CPU to run more threads concurrently.

Now increase the frequency of I/O events by halving the loop count to 50_000_000, with the same 18 threads:

[Screenshot: 18 threads with half the loop count, each core at about 70%]

At this time, the utilization rate of each core is only about 70%.

Small summary of thread number and CPU utilization

The examples above are just aids to understanding the relationship between thread count, program behavior, and CPU state. Let's briefly summarize:

  1. An "extreme" thread (one that constantly executes computation) can fully utilize a single core. A multi-core CPU can simultaneously run at most as many such "extreme" threads as it has cores.

  2. If every thread is this "extreme" and the number of simultaneously runnable threads exceeds the number of cores, it causes unnecessary switching, drives the load up, and only makes execution slower.

  3. During pauses such as I/O, the CPU is idle. The operating system schedules other threads onto it, which improves CPU utilization and lets more threads execute in the same period.

  4. The higher the frequency of I/O events, or the longer the wait/pause time, the more idle time the CPU has, the lower its utilization, and the more additional threads the operating system can schedule onto it.

Formula for Thread Count Planning

All the groundwork above was to aid understanding. Now let's look at the definition from the book. "Java Concurrency in Practice" introduces a formula for calculating the number of threads:

N_cpu = number of CPU cores
U_cpu = target CPU utilization (0 ≤ U_cpu ≤ 1)
W/C = ratio of wait time to compute time

If you want the program to run to the target CPU utilization, the formula for the number of threads required is:

N_threads = N_cpu × U_cpu × (1 + W/C)

The formula is very clear. Now let’s try it with the above example:

If I expect a target utilization of 90% (across all cores), then the number of threads required is:

12 cores × 0.9 target utilization × (1 + 50 ms sleep / 50 ms compute for the 50_000_000-iteration loop) ≈ 22
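This arithmetic can be captured in a tiny helper. A sketch, with class and method names of my own choosing:

```java
public class ThreadSizing {
    // N_threads = N_cpu * U_cpu * (1 + W/C), the formula from "Java Concurrency in Practice"
    static int threadCount(int nCpu, double targetUtilization, double waitTime, double computeTime) {
        return (int) Math.round(nCpu * targetUtilization * (1 + waitTime / computeTime));
    }

    public static void main(String[] args) {
        // The worked example above: 12 cores, 90% target, 50 ms wait vs 50 ms compute
        System.out.println(threadCount(12, 0.9, 50, 50)); // prints 22
    }
}
```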

Now adjust the number of threads to 22 and see the results:

[Screenshot: 22 threads, overall CPU utilization around 80%+]

CPU utilization is now around 80%+, close to expectations. Given the extra threads, some context-switching overhead, and the loose test setup, it is normal for actual utilization to come in a bit lower.

Rearranging the formula, you can also calculate CPU utilization from the number of threads:

U_cpu = N_threads / (N_cpu × (1 + W/C))

22 threads / (12 cores × (1 + 50 ms sleep / 50 ms compute)) ≈ 0.9
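The rearranged form, again as a small sketch with names of my own choosing:

```java
public class UtilizationEstimate {
    // U_cpu = N_threads / (N_cpu * (1 + W/C)), the sizing formula rearranged
    static double cpuUtilization(int nThreads, int nCpu, double waitTime, double computeTime) {
        return nThreads / (nCpu * (1 + waitTime / computeTime));
    }

    public static void main(String[] args) {
        // 22 threads, 12 cores, 50 ms wait vs 50 ms compute
        System.out.printf("%.2f%n", cpuUtilization(22, 12, 50, 50)); // prints 0.92
    }
}
```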

Although the formula is nice, in a real program it is generally difficult to obtain accurate wait and compute times, because programs are complex and do far more than just "compute". A piece of code involves memory reads and writes, computation, I/O, and other mixed operations, so these two metrics are hard to measure precisely, and sizing threads by formula alone is too idealistic.

Number of threads in real program

So in actual programs, or in some Java business systems, what is the appropriate number of threads (thread pool size) to plan?

Let me state the conclusion first: there is no fixed answer. First set expectations, such as the CPU utilization, load, GC frequency, and other indicators you can accept, and then keep adjusting through testing until you arrive at a reasonable number of threads.

For example, consider an ordinary Spring Boot business system with the default Tomcat container, HikariCP connection pool, and G1 collector, where the project also needs a multi-thread setup (or thread pool) to execute business logic asynchronously or in parallel.

If I planned the thread count purely by the formula above, the error would be very large, because many threads are already running on the host: Tomcat has its own thread pool, HikariCP has its own background threads, the JVM has compiler threads, and even G1 has background threads of its own. They all run in the same process on the same host and also consume CPU resources.

So, given this environmental interference, it is difficult to plan the thread count accurately by formula alone. It must be verified through testing.

The process is generally as follows:

1. Analyze whether there is interference from other processes on the current host

2. Analyze whether there are other running or possible running threads on the current JVM process

3. Set goals

  • Target CPU Utilization – How high can I tolerate my CPU going?

  • Target GC frequency/pause time – multi-threaded execution will increase GC frequency. What is the maximum frequency you can tolerate, and how long can each pause be?

  • Execution efficiency – For example, when batch processing, how many threads do I need to open per unit time to complete the processing in time?

4. Sort out the key points of the call chain to see if there are any choke points, because with too many threads, resource-limited nodes on the chain may leave large numbers of threads waiting for resources (e.g., third-party API rate limits, a limited connection pool, middleware that cannot take the extra pressure, etc.)

5. Keep increasing/decreasing the thread count in tests, testing against the most demanding requirement, until you arrive at a thread count that "meets the requirements"
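For step 5, it helps to make the pool size tunable from the outside so each test run can try a different value without recompiling. A minimal sketch, assuming a hypothetical `app.pool.size` system property:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TunablePool {
    // Read the pool size from a system property (the property name is an assumption)
    // so it can be changed between test runs; defaults to the logical core count.
    public static ThreadPoolExecutor create() {
        int size = Integer.getInteger("app.pool.size",
                Runtime.getRuntime().availableProcessors());
        return new ThreadPoolExecutor(size, size,
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(1_000),           // bounded queue: don't buffer unbounded work
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure when the queue is full
    }
}
```

The bounded queue and caller-runs policy are one way to surface the "choke points" from step 4 during a load test instead of silently queueing work.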

And one more thing! The concept of "thread count" differs across scenarios:

  1. Tomcat's maxThreads means different things under blocking I/O and non-blocking I/O

  2. Dubbo still uses a single connection by default, and it distinguishes I/O threads (pools) from business threads (pools). The I/O threads are generally not the bottleneck, so there is no need for many of them, but the business threads easily become the bottleneck.

  3. Redis is also multi-threaded since 6.0, but only for I/O; "business" processing (command execution) is still single-threaded.

So, don’t worry about how many threads to set. There is no standard answer. You must combine the scenario, with goals, and through testing to find the most appropriate number of threads.

Some students may ask: "Our system is under no pressure and doesn't need such a finely tuned thread count. It's just a simple asynchronous scenario that shouldn't affect the rest of the system."

That's normal. Many internal business systems don't need much performance; being stable, easy to use, and sufficient is enough. In that case, my recommended thread count is: the number of CPU cores.

Appendix

Java gets the number of CPU cores


Runtime.getRuntime().availableProcessors() // Returns the number of logical cores; e.g., 6 cores / 12 threads returns 12

Linux gets the number of CPU cores


# Total physical cores = number of physical CPUs × cores per physical CPU
# Total logical CPUs = number of physical CPUs × cores per physical CPU × hyperthreads per core


# Check the number of physical CPUs
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l


# Check the number of cores per physical CPU
cat /proc/cpuinfo | grep "cpu cores" | uniq


# Check the number of logical CPUs
cat /proc/cpuinfo | grep "processor" | wc -l

-end-
