How to Properly Estimate Java Thread Pool Size: A Comprehensive Guide

Creating a thread in Java is expensive: it takes time, adds latency to request processing, and requires work from both the JVM and the operating system. Thread pools reduce this overhead by reusing a set of existing threads.

In this article, we’ll delve into the art of determining the ideal thread pool size. A fine-tuned thread pool extracts the best performance from the system and helps us handle peak workloads with ease. However, it is important to remember that even with a thread pool, the management of the threads itself can become a bottleneck.


1 Reasons for using a thread pool

  • Performance: Thread creation and destruction are expensive; in Java, each platform thread is backed by an operating system thread. Thread pools reduce this overhead by creating threads once and reusing them for multiple tasks.

  • Scalability: The thread pool can be scaled to meet the needs of the application. For example, under heavy load, the thread pool can be expanded to handle additional tasks.

  • Resource Management: Thread pools can help manage the resources used by threads. For example, a thread pool can limit the number of threads that can be active at any given time, which helps prevent applications from running out of memory.
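To make the reuse and resource-limiting points concrete, here is a minimal sketch (the class and method names are illustrative, not from the original article): a fixed pool of two threads runs 100 short tasks, and we record which threads executed them.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadReuseDemo {
    // Runs `tasks` short tasks on a fixed-size pool and records the name of each thread that ran one.
    static Set<String> threadsUsed(int poolSize, int tasks) throws InterruptedException {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> names.add(Thread.currentThread().getName()));
        }
        pool.shutdown();                             // accept no new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS); // wait for queued tasks to finish
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> names = threadsUsed(2, 100);
        System.out.println(names.size() + " threads handled 100 tasks: " + names);
    }
}
```

Because Executors.newFixedThreadPool keeps at most its configured number of threads alive, the set of thread names stays small however many tasks are submitted.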

2 Sizing the thread pool: understand system and resource limitations

When sizing a thread pool, it is critical to understand the limitations of your system, including hardware and external dependencies. Let us elaborate on this concept with an example:

Scenario:

Suppose we are developing a web application that handles incoming HTTP requests. Each request may involve processing data in the database and calling external third-party services. Our goal is to determine the optimal thread pool size to efficiently handle these requests.

Factors to consider:

Database connection pool: Suppose we use a connection pool such as HikariCP to manage database connections. We have configured it to allow up to 100 connections. If we create more threads than available connections, these extra threads will end up waiting for available connections, causing resource contention and potential performance issues.

The following is an example of configuring a HikariCP database connection pool:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class DatabaseConnectionExample {
    public static void main(String[] args) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb");
        config.setUsername("username");
        config.setPassword("password");
        config.setMaximumPoolSize(100); // Set the maximum number of connections

        HikariDataSource dataSource = new HikariDataSource(config);

        // Use the dataSource to get database connections and perform queries.
    }
}
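One way to act on this, sketched under the assumption that every task holds a database connection for most of its run: cap the worker pool at the connection pool's maximum. MAX_DB_CONNECTIONS and workerThreadsFor are hypothetical names, not part of HikariCP.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DbBoundPoolSizing {
    // Assumed to match config.setMaximumPoolSize(100) in the HikariCP example above.
    static final int MAX_DB_CONNECTIONS = 100;

    // Hypothetical helper: never plan more workers than connections (and never fewer than one),
    // because workers beyond the connection count would only block waiting for a connection.
    static int workerThreadsFor(int desiredThreads, int maxConnections) {
        return Math.max(1, Math.min(desiredThreads, maxConnections));
    }

    public static void main(String[] args) {
        int desired = 200; // illustrative value, e.g. an estimate of peak concurrent requests
        int workers = workerThreadsFor(desired, MAX_DB_CONNECTIONS);
        ExecutorService requestPool = Executors.newFixedThreadPool(workers);
        System.out.println("Worker threads: " + workers); // prints "Worker threads: 100"
        // Submit request-handling tasks to requestPool here.
        requestPool.shutdown();
    }
}
```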

External service throughput: The external services our application calls have limited capacity. Suppose the service can handle only about 10 concurrent requests. Sending more requests simultaneously can overwhelm it and cause performance degradation or errors.
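If the thread pool is sized for other reasons (for example, CPU cores or database connections) and ends up larger than the external service can tolerate, a java.util.concurrent.Semaphore can cap the number of in-flight calls independently. The sketch below assumes the limit of 10 from the example; callExternalService is a hypothetical stand-in that sleeps instead of making a real HTTP request.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExternalServiceThrottle {
    // Assumed limit from the example: the service tolerates about 10 concurrent requests.
    static final int MAX_CONCURRENT_CALLS = 10;
    private static final Semaphore permits = new Semaphore(MAX_CONCURRENT_CALLS);

    // Bookkeeping used only to observe how many calls overlap.
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxObserved = new AtomicInteger();

    // Hypothetical stand-in for an HTTP call to the third-party service.
    static void callExternalService() throws InterruptedException {
        permits.acquire(); // blocks while 10 calls are already in flight
        try {
            maxObserved.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            Thread.sleep(20); // simulated network latency
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // The pool is deliberately larger than the service limit; the semaphore still caps outbound calls.
        ExecutorService pool = Executors.newFixedThreadPool(32);
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> {
                try {
                    callExternalService();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("Peak concurrent calls: " + maxObserved.get());
    }
}
```

Keeping the limit in a semaphore rather than in the pool size lets one pool serve several dependencies, each with its own cap.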

CPU Cores: Determining the number of CPU cores available on the server is critical to optimizing the thread pool size.

int numOfCores = Runtime.getRuntime().availableProcessors();

Each core runs one thread at a time (or one per hardware thread on CPUs with simultaneous multithreading, which availableProcessors() counts as separate processors). Running more CPU-bound threads than there are cores causes excessive context switching, which degrades performance.

3 CPU-intensive tasks and I/O-intensive tasks


CPU-intensive tasks are those that require a lot of processing power, such as performing complex calculations or running simulations. They are typically limited by the speed of the CPU rather than the speed of I/O devices. Examples include:

  • Encode or decode audio or video files

  • Compile and link software

  • Run complex simulations

  • Perform machine learning or data mining tasks

  • Play video games

Optimization:

Multithreading and parallelism: Parallel processing divides a large task into smaller subtasks and distributes them across multiple CPU cores or processors so that they execute concurrently, improving overall performance.

Suppose we have a large array of numbers, and we want to take advantage of parallel processing by calculating the square of each number simultaneously using multiple threads.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelSquareCalculator {
    public static void main(String[] args) {
        int[] numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        int numThreads = Runtime.getRuntime().availableProcessors(); // Get the number of CPU cores
        ExecutorService executorService = Executors.newFixedThreadPool(numThreads);

        for (int number : numbers) {
            executorService.submit(() -> {
                int square = calculateSquare(number);
                System.out.println("Square of " + number + " is " + square);
            });
        }

        executorService.shutdown();
        try {
            executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static int calculateSquare(int number) {
        // Simulate a slow calculation. Thread.sleep() blocks without using the CPU,
        // so a real CPU-bound task would perform actual computation here instead.
        try {
            Thread.sleep(1000); // Simulate a 1-second delay
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        return number * number;
    }
}
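Note that calculateSquare above only sleeps, so its threads sit idle rather than keeping the cores busy. As a sketch of the parallel-processing idea with genuinely CPU-bound work (ChunkedSquareSum and sumOfSquares are our own names), the version below splits the array into one contiguous chunk per thread, sums the squares of each chunk on the pool, and combines the partial results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedSquareSum {
    // Splits the array into contiguous chunks, sums the squares of each chunk in parallel, then combines them.
    static long sumOfSquares(int[] numbers, int numThreads) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            int chunkSize = (numbers.length + numThreads - 1) / numThreads; // ceiling division
            List<Future<Long>> partials = new ArrayList<>();
            for (int start = 0; start < numbers.length; start += chunkSize) {
                final int from = start;
                final int to = Math.min(start + chunkSize, numbers.length);
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += (long) numbers[i] * numbers[i]; // pure CPU work, no blocking
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // waits for each chunk to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        int numThreads = Runtime.getRuntime().availableProcessors(); // one thread per core
        System.out.println("Sum of squares: " + sumOfSquares(numbers, numThreads)); // prints "Sum of squares: 385"
    }
}
```

For large arrays, Arrays.stream(numbers).parallel() or a ForkJoinPool offers the same split-and-combine pattern with less code; the explicit version just makes the chunking visible.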

I/O-intensive tasks are those that spend most of their time interacting with storage devices (e.g., reading/writing files), network sockets (e.g., making API calls), or user input (e.g., user interaction in a graphical user interface). Examples include:

  • Read or write large files to disk (e.g., save video files, load databases)

  • Download or upload files over the Internet (e.g., browse the web, watch streaming videos)

  • Send and receive emails

  • Run a web server or other network service that handles incoming requests

  • Execute database queries

Optimization:

  • Caching: Cache frequently accessed data in memory to reduce the need for repeated I/O operations.

  • Load Balancing: Distribute I/O-intensive tasks across multiple threads or processes to efficiently handle concurrent I/O operations.

  • SSDs: Solid-state drives (SSDs) can significantly speed up I/O operations compared to traditional hard disk drives (HDDs).

  • Efficient data structures: Use structures such as hash tables and B-trees to reduce the number of I/O operations required.

  • Fewer file operations: Avoid unnecessary file operations, such as opening and closing the same file multiple times.
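As an illustration of the caching point, here is a minimal in-memory cache sketch built on ConcurrentHashMap.computeIfAbsent. IoResultCache and slowLoad are hypothetical names; slowLoad is a placeholder for a disk read or remote call. The cache has no size limit or eviction, so treat it as a starting point rather than production code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class IoResultCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader;

    IoResultCache(Function<String, String> loader) {
        this.loader = loader;
    }

    // Returns the cached value, calling the slow loader only the first time a key is requested.
    String get(String key) {
        return cache.computeIfAbsent(key, loader);
    }

    // Counts how often the slow path actually runs.
    static final AtomicInteger loads = new AtomicInteger();

    // Hypothetical stand-in for a disk read or remote call.
    static String slowLoad(String key) {
        loads.incrementAndGet();
        return "content-of-" + key;
    }

    public static void main(String[] args) {
        IoResultCache cache = new IoResultCache(IoResultCache::slowLoad);
        cache.get("config.json");
        cache.get("config.json"); // served from memory; the loader is not called again
        System.out.println("Loads performed: " + loads.get()); // prints "Loads performed: 1"
    }
}
```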

Determine the number of threads

4 For CPU-intensive tasks:

For CPU-intensive tasks, we want to maximize CPU utilization without creating so many threads that the system spends its time context switching. A common rule of thumb is to set the pool size to the number of available CPU cores.

Real-life example: video encoding

Imagine we are developing a video processing application that runs on a multi-core CPU. Video encoding is CPU-intensive: it applies complex algorithms to compress video files.

Determine the number of threads for CPU-intensive tasks:

Calculate the number of available CPU cores: In Java, use Runtime.getRuntime().availableProcessors() to determine the number of available CPU cores. Let's say we have 8 cores.

Create a thread pool: Create a thread pool with a size close to or slightly smaller than the number of available CPU cores. In this case, we can choose 6 or 7 threads, leaving some CPU capacity for other tasks and system processes.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VideoEncodingApp {
    public static void main(String[] args) {
        int availableCores = Runtime.getRuntime().availableProcessors();
        int numberOfThreads = Math.max(availableCores - 1, 1); // Adjust as needed

        ExecutorService threadPool = Executors.newFixedThreadPool(numberOfThreads);

        // Submit video encoding tasks to the thread pool.
        for (int i = 0; i < 10; i++) {
            threadPool.execute(() -> {
                encodeVideo(); // Simulated video encoding task
            });
        }

        threadPool.shutdown();
    }

    private static void encodeVideo() {
        // Simulate video encoding (CPU-bound) task.
        // Complex calculations and compression algorithms here.
    }
}

5 For I/O-intensive tasks:

For I/O-intensive tasks, the optimal number of threads is usually determined by the nature of the I/O operations and expected latency. We want to have enough threads to keep the I/O devices busy without overloading them. The ideal number does not necessarily equal the number of CPU cores.

Real-life example: web scraping

Consider building a web crawler to download web pages and extract information. This involves making HTTP requests, which are I/O-intensive tasks due to network latency.

Determine the number of threads for I/O-intensive tasks:

Analyze I/O latency: Estimate the expected I/O latency, which depends on the network or storage. For example, if each HTTP request takes approximately 500 milliseconds, a thread spends most of that time idle, so we want several requests in flight at once to overlap their waits.

Create a thread pool: Size the pool to balance parallelism against the expected I/O latency. We don't need one thread per task; a moderately sized pool can keep several requests in flight while the remaining tasks wait in the queue.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WebPageCrawler {
    public static void main(String[] args) {
        int expectedIOLatency = 500; // Estimated I/O latency in milliseconds
        int numberOfThreads = 4; // Adjust based on your expected latency and system capabilities

        ExecutorService threadPool = Executors.newFixedThreadPool(numberOfThreads);

        // List of URLs to crawl.
        String[] urlsToCrawl = {
            "https://example.com",
            "https://google.com",
            "https://github.com",
            // Add more URLs here
        };

        for (String url : urlsToCrawl) {
            threadPool.execute(() -> {
                crawlWebPage(url, expectedIOLatency);
            });
        }

        threadPool.shutdown();
    }

    private static void crawlWebPage(String url, int expectedIOLatency) {
        // Simulate web page crawling (I/O-bound) task.
        // Perform HTTP request and process the page content.
        try {
            Thread.sleep(expectedIOLatency); // Simulating I/O latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
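To see why a handful of threads helps with I/O-bound work, the hypothetical OverlapDemo below times the same eight simulated 100 ms requests on one thread and on four. Sleeping stands in for network latency, as in crawlWebPage, so the numbers are only indicative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OverlapDemo {
    // Runs `tasks` simulated I/O calls of `latencyMs` each on a pool of `threads`; returns wall-clock time in ms.
    static long runMillis(int tasks, int threads, long latencyMs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(latencyMs); // stand-in for waiting on the network
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("1 thread:  ~" + runMillis(8, 1, 100) + " ms"); // roughly 8 x 100 ms
        System.out.println("4 threads: ~" + runMillis(8, 4, 100) + " ms"); // roughly 2 x 100 ms
    }
}
```

With one thread the waits happen back to back; with four, up to four requests wait at the same time, so the total drops to roughly a quarter.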

6 Is there a specific formula we can follow?

The formula for determining the thread pool size can be written as follows:

Number of Threads = Number of Available Cores * Target CPU Utilization * (1 + Wait Time / Service Time)

Available cores: This is the number of CPU cores available to our application. Note that this is not the same as the number of CPUs, as each CPU may have multiple cores.

Target CPU Utilization: This is the fraction of CPU time (between 0 and 1) we want the application to use. If we set it too high, the application may become unresponsive; if we set it too low, the application will not fully use the available CPU resources.

Wait Time: This is the time a thread spends waiting for an I/O operation to complete. This may include waiting for a network response, database query, or file operation.

Service Time: This is the amount of time a thread spends performing computations.

Blocking Factor: This is the ratio of wait time to service time (Wait Time / Service Time), so the last factor in the formula is 1 + blocking factor. It measures how long a thread waits for I/O relative to how long it spends computing.
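The formula translates directly into a small helper. This is a sketch: ThreadPoolSizer, idealThreads, and poolSize are illustrative names, and the inputs in main are made-up measurements, not recommendations.

```java
public class ThreadPoolSizer {
    // Number of threads = cores * targetUtilization * (1 + waitTime / serviceTime).
    // targetUtilization is a fraction in (0, 1]; waitTime and serviceTime must use the same unit.
    static double idealThreads(int cores, double targetUtilization, double waitTime, double serviceTime) {
        if (cores < 1 || targetUtilization <= 0 || targetUtilization > 1 || serviceTime <= 0 || waitTime < 0) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        double blockingFactor = waitTime / serviceTime;
        return cores * targetUtilization * (1 + blockingFactor);
    }

    // Rounds down so the pool stays at or under the target, but never below one thread.
    static int poolSize(double idealThreads) {
        return Math.max(1, (int) Math.floor(idealThreads));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Hypothetical measurements: 50 ms waiting on I/O and 10 ms computing per task, 80% target utilization.
        double ideal = idealThreads(cores, 0.8, 50, 10);
        System.out.println("Ideal: " + ideal + ", pool size: " + poolSize(ideal));
    }
}
```

In practice, wait and service times come from profiling or timing real tasks under load; the result is a starting point to tune from.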

7 Usage example

Suppose we have a server with 4 CPU cores and we want the application to use 50% of the available CPU resources.

Our application has two categories of tasks: I/O-bound tasks and CPU-bound tasks.

I/O-bound tasks have a blocking factor of 0.5, which means they wait half as long as they compute; about one third of their time (0.5 / 1.5) is spent waiting for I/O operations to complete.

Number of threads = 4 cores * 0.5 * (1 + 0.5) = 3 threads

CPU-intensive tasks have a blocking factor of 0.1, which means they wait one tenth as long as they compute; only about 9% of their time (0.1 / 1.1) is spent waiting for I/O operations to complete.

Number of threads = 4 cores * 0.5 * (1 + 0.1) = 2.2 threads

In this example, we create two thread pools, one for I/O-intensive tasks and another for CPU-intensive tasks. The I/O-intensive pool has 3 threads, and the CPU-intensive pool has 2 threads (2.2 rounded down).
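A sketch of the two-pool setup from this example, assuming the sizes of 3 and 2 computed above. Giving each pool its own ThreadFactory (the named helper is our own, not a JDK method) makes the pools easy to tell apart in logs and thread dumps.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class SeparatePools {
    // Names threads "<prefix>-1", "<prefix>-2", ... so each pool is recognizable.
    static ThreadFactory named(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> new Thread(runnable, prefix + "-" + counter.incrementAndGet());
    }

    public static void main(String[] args) throws Exception {
        // Sizes from the worked example: 3 threads for I/O-bound work, 2 for CPU-bound work.
        ExecutorService ioPool = Executors.newFixedThreadPool(3, named("io"));
        ExecutorService cpuPool = Executors.newFixedThreadPool(2, named("cpu"));
        try {
            Future<String> io = ioPool.submit(() -> Thread.currentThread().getName());
            Future<String> cpu = cpuPool.submit(() -> Thread.currentThread().getName());
            System.out.println(io.get() + " / " + cpu.get()); // prints "io-1 / cpu-1"
        } finally {
            ioPool.shutdown();
            cpuPool.shutdown();
        }
    }
}
```

Keeping the pools separate means a burst of slow I/O tasks cannot occupy the threads reserved for CPU-bound work, and vice versa.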

This formula is a rule of thumb summarized from many cases. Real workloads differ in what matters most, so treat its result as a starting point and fine-tune it for the actual scenario. We hope this approach helps you determine thread pool sizes in your own development work!

Source: https://dip-mazumder.medium.com
