“Multi-threading Killer”: ThreadPoolExecutor, a Python concurrent programming tool that lets you easily start multiple threads at once and work through a large number of tasks quickly!

As program complexity and data volume continue to increase, traditional synchronous programming methods can no longer meet the needs of developers. Asynchronous programming emerged, providing higher concurrency performance and better resource utilization.

Python’s concurrent.futures module is a good asynchronous programming tool. It provides a set of interfaces that can facilitate concurrent programming.

Python already has the threading module, so why do we still need thread pools and process pools? Take a Python crawler as an example: we need to control how many threads crawl at the same time. Suppose we have 20 or even 100 tasks but only allow 5-10 threads to run at once. If every task gets its own thread, 20-100 threads still have to be created and destroyed, and creating a thread consumes system resources. Is there a better solution?

In fact, you only need to create and run 5-10 threads at the same time. Each thread is assigned a task, and the remaining tasks are queued to wait. When a thread completes the task, the queued tasks can be arranged for this thread to continue execution.
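The idea above can be sketched with ThreadPoolExecutor itself. This is only a minimal sketch: `fetch` is a hypothetical stand-in for a crawler request (it just sleeps), and the figures of 20 tasks and 5 threads follow the numbers used in the text.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(page):
    # Hypothetical stand-in for a crawler request: it only sleeps briefly.
    time.sleep(0.05)
    # Report which worker thread handled this page.
    return page, threading.current_thread().name

# 20 tasks are queued, but at most 5 worker threads are ever created.
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(fetch, range(20)))

pages = [page for page, _ in results]
workers = {name for _, name in results}
print(len(pages), "tasks handled by", len(workers), "threads")
```

However many tasks are submitted, the set of worker names never grows beyond max_workers, because finished threads are reused for queued tasks.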

However, it is difficult to write a robust thread pool yourself. You also need to handle thread synchronization in complex situations, where deadlocks can easily occur. Since Python 3.2, the standard library has provided the concurrent.futures module with two classes, ThreadPoolExecutor and ProcessPoolExecutor, which are a further abstraction over threading and multiprocessing. They not only schedule threads for us automatically, but also offer the following:

- The main thread can obtain the status and the return value of a thread (or task).
- When a thread completes, the main thread knows immediately.
- The coding interfaces for multi-threading and multi-processing are consistent.
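The first two points can be illustrated with as_completed from the same module, which hands futures back to the main thread as they finish. This is a minimal sketch; `work` and its delays are hypothetical and only simulate tasks of different lengths.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(delay):
    # Hypothetical task: sleep for `delay` seconds, then return it.
    time.sleep(delay)
    return delay

finished = []
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(work, d) for d in (0.3, 0.1, 0.2)]
    # as_completed yields each future as soon as it finishes,
    # regardless of the order in which the tasks were submitted.
    for future in as_completed(futures):
        finished.append(future.result())
        print("finished:", future.result(), "done:", future.done())
```

The shortest task is normally reported first, even though it was submitted second.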

Introduction

The concurrent.futures module was introduced in Python 3.2 to support asynchronous execution and efficient concurrent programming for multi-core CPU work and network I/O. It provides two classes, ThreadPoolExecutor and ProcessPoolExecutor, which simplify cross-platform asynchronous programming.

First, let us look at the two approaches to concurrent programming:

1. Multi-process

When concurrent programming is implemented through multi-processing, the program assigns tasks to multiple processes, and these processes can run simultaneously on different CPUs. The processes are independent and each has its own memory space, which allows true parallel execution. However, communication between processes is relatively slow and requires an IPC (Inter-Process Communication) mechanism, and creating processes and switching between them costs more than doing the same with threads.

2. Multi-threading

When concurrent programming is implemented through multi-threading, the program assigns tasks to multiple threads within the same process. Threads share the memory space of the process, so the overhead is relatively small. However, note that in the standard Python interpreter (CPython), threads cannot achieve true parallel execution of Python code, because the GIL (global interpreter lock) ensures that only one thread runs Python bytecode at any moment. Therefore, multi-threaded programs cannot fully utilize multi-core CPUs for CPU-bound work. Threads are still useful for I/O-bound tasks, because the GIL is released while a thread waits on blocking I/O.
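To illustrate the practical effect: a blocking wait such as time.sleep releases the GIL, so threads can overlap their waiting time even though they cannot run Python bytecode in parallel. The following is a minimal sketch, assuming a hypothetical `wait_io` function that uses time.sleep in place of real network I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def wait_io(seconds):
    # Hypothetical stand-in for a blocking network call.
    time.sleep(seconds)
    return seconds

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    # Four 0.2-second waits run concurrently in four threads.
    results = list(executor.map(wait_io, [0.2] * 4))
elapsed = time.perf_counter() - start

print(f"4 waits of 0.2 s took {elapsed:.2f} s in total")
```

Run one after another, the four waits would need about 0.8 seconds; with four threads the total stays close to a single wait.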

Simple use (cases and usage parameter description)

concurrent.futures is an important tool for performing asynchronous programming in Python. It provides the following two classes:

1. ThreadPoolExecutor

ThreadPoolExecutor creates a thread pool, and tasks can be submitted to this pool for execution. It is easier to use than ProcessPoolExecutor and avoids the overhead of starting processes. It does not circumvent the GIL, so it is best suited to I/O-bound tasks, where threads spend most of their time waiting rather than running Python code.

Example:

from concurrent.futures import ThreadPoolExecutor

def test(num):
    print("Thread", num)

# Create a new ThreadPoolExecutor object and specify the maximum number of threads
with ThreadPoolExecutor(max_workers=3) as executor:
    # Submit multiple tasks to the thread pool
    executor.submit(test, 1)
    executor.submit(test, 2)
    executor.submit(test, 3)

Output result (the order may vary between runs):

Thread 1
Thread 2
Thread 3

2. ProcessPoolExecutor

ProcessPoolExecutor creates a process pool, and tasks can be submitted to this pool for execution. Use a process pool when a single task is expensive to process, as in large-scale compute-intensive applications, because separate processes are not limited by a single GIL.

Example:

from concurrent.futures import ProcessPoolExecutor

def test(num):
    print("Process", num)

# The __main__ guard is needed on platforms that start worker processes by re-importing this module (e.g. Windows)
if __name__ == "__main__":
    # Create a new ProcessPoolExecutor object and specify the maximum number of processes
    with ProcessPoolExecutor(max_workers=3) as executor:
        # Submit multiple tasks to the process pool
        executor.submit(test, 1)
        executor.submit(test, 2)
        executor.submit(test, 3)

Output result (the order may differ from run to run):

Process 2
Process 1
Process 3

Waiting for task completion

1. When constructing a ThreadPoolExecutor instance, pass in the max_workers parameter to set the maximum number of threads that can run simultaneously in the thread pool.

2. Use the submit function to hand the task (function and arguments) to the thread pool; it returns a handle to the task (a Future object). Note that submit() does not block, but returns immediately.

3. With the handle returned by submit, you can call the done() method to check whether the task has finished.

4. Use the cancel() method to cancel a submitted task. If the task is already running in the thread pool, it cannot be canceled, and cancel() returns False.

5. Use the result() method to obtain the return value of the task. As the internal code shows, this method blocks.
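The points above can be tried out in one short script. This is a minimal sketch: the two task functions and the two threading.Event objects are hypothetical helpers, used only to make sure the first task is really running when done() and cancel() are called.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

started = threading.Event()
release = threading.Event()

def blocking_task():
    # Signal that the task is running, then wait until the main thread releases it.
    started.set()
    release.wait()
    return "first finished"

def quick_task():
    return "second finished"

with ThreadPoolExecutor(max_workers=1) as executor:
    first = executor.submit(blocking_task)   # occupies the only worker
    second = executor.submit(quick_task)     # stays queued behind it
    started.wait()                           # make sure `first` is running

    running_done = first.done()              # False: still running
    cancel_running = first.cancel()          # False: a running task cannot be canceled
    cancel_queued = second.cancel()          # True: it had not started yet

    release.set()                            # let the first task finish
    first_result = first.result()            # blocks until the task returns

print(running_done, cancel_running, cancel_queued, first_result)
```

With max_workers=1 the second task is guaranteed to still be waiting in the queue, which is why it can be canceled.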

After submitting tasks, we usually need to wait for them to complete. You can use the following methods:

1. result()

Used to obtain the result of the Future object returned by the submit() method. This method is synchronous: it blocks the calling thread until the result is available or the task's exception is raised.

Example:

from concurrent.futures import ThreadPoolExecutor

def test(num):
    print("Task", num)

# Create a new ThreadPoolExecutor object and specify the maximum number of threads
with ThreadPoolExecutor(max_workers=3) as executor:
    # Submit multiple tasks to the thread pool and use the result method to wait for the tasks to complete
    future_1 = executor.submit(test, 1)
    future_2 = executor.submit(test, 2)
    future_3 = executor.submit(test, 3)
    print(future_1.result())

Output:

Task 1
Task 2
Task 3
None
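result() also re-raises any exception raised inside the task, and it accepts an optional timeout in seconds. The following is a minimal sketch of both behaviors; `fail` and `slow` are hypothetical tasks written only for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def fail():
    # Hypothetical task that always raises.
    raise ValueError("bad input")

def slow():
    # Hypothetical task that takes half a second.
    time.sleep(0.5)
    return "slow done"

with ThreadPoolExecutor(max_workers=2) as executor:
    failing = executor.submit(fail)
    slow_future = executor.submit(slow)

    try:
        failing.result()                  # re-raises the ValueError from the worker
    except ValueError as exc:
        error_message = str(exc)

    try:
        slow_future.result(timeout=0.05)  # give up waiting after 0.05 s
        timed_out = False
    except TimeoutError:
        timed_out = True

    final = slow_future.result()          # no timeout: wait until the task completes

print(error_message, timed_out, final)
```

A timeout only ends the wait; the task itself keeps running and its result can still be collected later.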

2. add_done_callback()

Attach an “on completion” callback function to a Future object returned by submit(). The main thread can continue running without waiting for the task to complete; the callback is executed automatically, with the Future as its only argument, once the task finishes.

Example:

from concurrent.futures import ThreadPoolExecutor

def callback(future):
    print("Task done? ", future.done())
    print("Result: ", future.result())

# Create a new ThreadPoolExecutor object and specify the maximum number of threads
with ThreadPoolExecutor(max_workers=3) as executor:
    # Submit multiple tasks to the thread pool and attach a "completion" callback to each Future
    future_1 = executor.submit(pow, 2, 4)
    future_2 = executor.submit(pow, 3, 4)
    future_1.add_done_callback(callback)
    future_2.add_done_callback(callback)

Common methods of ThreadPoolExecutor class

Most of the method names in the ThreadPoolExecutor and ProcessPoolExecutor classes are the same, but because one works with threads and the other with processes, the underlying implementation may differ. Since ThreadPoolExecutor is the one used most often in daily development, we will use it as the main object for explanation.

With the thread pool object created by ThreadPoolExecutor, we can use submit, map, shutdown and other methods to operate on the threads and tasks in the pool.

1. submit method

The submit method of ThreadPoolExecutor is used to submit tasks to the thread pool for processing. This method returns a Future object, representing the value of the result that will be returned in the future. The syntax of the submit method is as follows:

submit(fn, *args, **kwargs)

Among them, the fn parameter is the function to be executed, and *args and **kwargs are the parameters of fn.

Example:

from concurrent.futures import ThreadPoolExecutor

def multiply(x, y):
    return x*y

with ThreadPoolExecutor(max_workers=3) as executor:
    future = executor.submit(multiply, 10, 5)
    print(future.result()) # 50

2. map method

ThreadPoolExecutor’s map method applies a function to each element of one or more iterables and returns an iterator over the results, in the same order as the inputs. The syntax of the map method is as follows:

map(func, *iterables, timeout=None, chunksize=1)

Among them, the func parameter is the function to be executed, *iterables is one or more iterables, and timeout and chunksize are optional parameters. With ThreadPoolExecutor, chunksize has no effect; it only matters for ProcessPoolExecutor.

Example:

from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

def cube(x):
    return x * x * x

with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(square, [1, 2, 3, 4, 5])
    for square_result in results:
        print(square_result)

    results = executor.map(cube, [1, 2, 3, 4, 5])
    for cube_result in results:
        print(cube_result)
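map can also take several iterables, passing one element from each to every call, and it returns results in input order even when the tasks finish in a different order. The following is a minimal sketch; `slow_power` is a hypothetical function whose sleep times are chosen so that later inputs finish first.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_power(base, exponent):
    # Larger bases sleep less, so completion order differs from input order.
    time.sleep(0.05 * (4 - base))
    return base ** exponent

with ThreadPoolExecutor(max_workers=3) as executor:
    # One argument is taken from each iterable per call: (1, 2), (2, 2), (3, 2).
    powers = list(executor.map(slow_power, [1, 2, 3], [2, 2, 2]))

print(powers)  # [1, 4, 9]
```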

3. shutdown method

The shutdown method of ThreadPoolExecutor is used to close the thread pool. After it is called, the pool accepts no new tasks, and its resources are released once all pending tasks have finished. The syntax of the shutdown method is as follows:

shutdown(wait=True)

Among them, the wait parameter indicates whether to wait for all tasks to be completed before closing the thread pool. The default is True.

Example:

from concurrent.futures import ThreadPoolExecutor
import time

def task(num):
    print("Task {} is running".format(num))
    time.sleep(1)
    return "Task {} is complete".format(num)

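# No "with" block here, so the pool has to be shut down explicitly
executor = ThreadPoolExecutor(max_workers=3)
futures = [executor.submit(task, i) for i in range(1, 4)]
# Block until all submitted tasks have finished, then release the pool's resources
executor.shutdown(wait=True)
for future in futures:
    print(future.result())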
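One consequence is worth seeing in code: once shutdown() has been called, the pool rejects new tasks with a RuntimeError, while results of tasks submitted earlier remain available. This is a minimal sketch; `double` is a hypothetical task.

```python
from concurrent.futures import ThreadPoolExecutor

def double(x):
    return 2 * x

executor = ThreadPoolExecutor(max_workers=2)
future = executor.submit(double, 21)
executor.shutdown(wait=True)      # blocks until the pending task has finished

value = future.result()           # already available: 42

try:
    executor.submit(double, 1)    # the pool no longer accepts work
    rejected = False
except RuntimeError:
    rejected = True

print(value, rejected)
```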

Source code analysis

The Future in the concurrent.futures module represents an operation that will complete at some point in the future, which is the basis of asynchronous programming. Calling submit() on the thread pool returns a Future object. The task is not yet finished when the object is returned, but it will finish later. The Future can also be seen as the task's return container, which stores the result and status of the task.
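The "container" idea can be seen by filling a Future by hand. This is for illustration only: in normal code the executor creates the Future inside submit() and a worker thread stores the result.

```python
from concurrent.futures import Future

# Created by hand purely for illustration; normally submit() does this.
future = Future()
before = future.done()        # False: no result stored yet

future.set_result(42)         # what a worker does when the task returns
after = future.done()         # True: the container now holds a result
value = future.result()       # returns immediately with the stored value

print(before, after, value)
```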

So how does ThreadPoolExecutor work with this object internally? The following is a brief look at part of the ThreadPoolExecutor code:

1. __init__ method

The most important things set up in the __init__ method are the task queue and the set of threads, which are used by the other methods.

(Figure: __init__ source code analysis)

2. submit method

There are two important objects in submit: _base.Future() and _WorkItem(). The _WorkItem() object is responsible for running the task and setting the result on the future object. Finally, the future object is returned. The whole process returns immediately without blocking.
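The interplay of the task queue, the thread set, the work item and the Future can be sketched as a toy executor. This is a simplified illustration and not the CPython source: the names `TinyExecutor`, `WorkItem` and `_worker` are hypothetical, and details of the real implementation such as locking, reuse of idle workers and interpreter-exit handling are left out.

```python
import queue
import threading
from concurrent.futures import Future

class WorkItem:
    """Pairs a callable with the Future that will hold its outcome."""
    def __init__(self, future, fn, args, kwargs):
        self.future, self.fn, self.args, self.kwargs = future, fn, args, kwargs

    def run(self):
        # Skip the task if it was canceled while waiting in the queue.
        if not self.future.set_running_or_notify_cancel():
            return
        try:
            self.future.set_result(self.fn(*self.args, **self.kwargs))
        except BaseException as exc:
            self.future.set_exception(exc)

class TinyExecutor:
    def __init__(self, max_workers):
        self._max_workers = max_workers
        self._work_queue = queue.SimpleQueue()   # pending work items
        self._threads = set()                    # worker threads started so far

    def submit(self, fn, *args, **kwargs):
        future = Future()
        self._work_queue.put(WorkItem(future, fn, args, kwargs))
        # Start another worker only while the pool is below its limit.
        if len(self._threads) < self._max_workers:
            thread = threading.Thread(target=self._worker, daemon=True)
            self._threads.add(thread)
            thread.start()
        return future                            # returned without waiting

    def _worker(self):
        while True:
            item = self._work_queue.get()
            if item is None:                     # sentinel: stop this worker
                return
            item.run()

    def shutdown(self):
        # One sentinel per worker, queued behind all remaining work items.
        for _ in self._threads:
            self._work_queue.put(None)
        for thread in self._threads:
            thread.join()

pool = TinyExecutor(max_workers=2)
futures = [pool.submit(pow, 2, n) for n in range(5)]
values = [f.result() for f in futures]
pool.shutdown()
print(values)  # [1, 2, 4, 8, 16]
```

Note that submit only enqueues the work item and hands back the Future; the result is stored later by whichever worker thread picks the item up.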

(Figure: submit source code analysis)

Summary

Built on the threading and multiprocessing modules, the concurrent.futures module provides a simple and efficient way to do asynchronous programming in Python. It supports both thread-based and process-based concurrent execution, giving developers a more flexible and efficient concurrency solution. We can use submit, map, shutdown and other methods to operate on threads and tasks in the pool, and use Future objects (the core of asynchronous programming) to manage task status, which makes task submission, status management, and control of the pool more convenient.

In the actual development process, we need to choose appropriate asynchronous programming tools and methods based on specific application scenarios to obtain better results. In short, the concurrent.futures module is a very good tool in Python asynchronous programming.
