As programs grow more complex and process ever more data, purely synchronous code often leaves the hardware underused. Asynchronous programming addresses this by running independent tasks concurrently, which improves throughput and resource utilization.
Python's concurrent.futures module is a convenient tool for this kind of programming. It provides a high-level set of interfaces that make concurrent programming easier.
Python already has the threading module, so why do we still need thread pools and process pools? Take a web crawler as an example: we want to limit how many pages are fetched at the same time. If we create 20 or even 100 threads but only allow 5 to 10 of them to run at once, we still pay for creating and destroying all 20 to 100 threads, and every thread creation consumes system resources. Is there a better solution?
In fact, we only need to create 5 to 10 threads and keep them running. Each thread is given a task, and the remaining tasks wait in a queue. When a thread finishes its task, it takes the next queued task and continues.
However, writing a correct thread pool yourself is difficult: you must handle thread synchronization in complex situations, and deadlocks are easy to introduce. Since Python 3.2, the standard library has provided the concurrent.futures module, whose two classes, ThreadPoolExecutor and ProcessPoolExecutor, add a further layer of abstraction over threading and multiprocessing. Besides scheduling threads automatically, they also let us:
- Obtain the status and return value of a task (or thread) from the main thread.
- Have the main thread learn immediately when a task completes.
- Use the same coding interface for multiple threads and multiple processes.
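To make the crawler scenario above concrete, here is a minimal sketch in which 20 tasks share a pool of at most 5 threads. The task function fetch is hypothetical (it sleeps instead of downloading a page), and the sketch uses concurrent.futures.as_completed(), which yields each Future as soon as it finishes, to show the main thread learning about completions as they happen.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(page):
    """Hypothetical stand-in for downloading one page."""
    time.sleep(0.05)                      # simulate network latency
    return page, threading.current_thread().name

results = {}
threads_used = set()

with ThreadPoolExecutor(max_workers=5) as executor:
    # 20 tasks are queued, but the pool never runs more than 5 threads
    futures = [executor.submit(fetch, page) for page in range(20)]
    for future in as_completed(futures):  # yields futures as they finish
        page, thread_name = future.result()
        results[page] = thread_name
        threads_used.add(thread_name)

print(len(results), "pages fetched by", len(threads_used), "threads")
```

The threads are reused: no matter how many tasks are submitted, the number of distinct thread names stays at or below max_workers.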
Introduction
The concurrent.futures module was introduced in Python 3.2 to support asynchronous execution and to simplify concurrent programming on multi-core CPUs and for network I/O. It provides two classes, ThreadPoolExecutor and ProcessPoolExecutor, which offer the same cross-platform interface for running tasks asynchronously.
First, let us look at the two ways of programming concurrently:
1. Multi-process
With multiprocessing, the program assigns tasks to multiple processes that can run simultaneously on different CPUs. Each process is independent and has its own memory space, which allows true parallel execution. However, processes must communicate through IPC (Inter-Process Communication) mechanisms, which is comparatively slow, and creating and switching between processes costs more than creating and switching between threads.
2. Multi-threading
With multithreading, the program assigns tasks to multiple threads within the same process. The threads share the process's memory space, so their overhead is relatively small. However, in the standard CPython interpreter, threads cannot execute Python code in parallel, because the GIL (Global Interpreter Lock) ensures that only one thread runs Python bytecode at a time. Multithreading therefore cannot fully use a multi-core CPU for CPU-bound Python code. It is still effective for I/O-bound work, because a thread releases the GIL while it waits on I/O such as a network request or time.sleep().
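A quick way to see why threads still help with I/O-bound work despite the GIL: in the sketch below, five tasks that each wait 0.2 seconds finish in roughly 0.2 seconds in total rather than 1 second, because a sleeping thread releases the GIL. Using time.sleep as a stand-in for a real network request is an assumption made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def wait_for_io(n):
    time.sleep(0.2)   # stands in for a blocking I/O call; releases the GIL
    return n

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(wait_for_io, range(5)))
elapsed = time.perf_counter() - start

print(results)                      # [0, 1, 2, 3, 4]
print(f"elapsed: {elapsed:.2f} s")  # roughly 0.2 s, not 1.0 s
```

Replacing time.sleep with a CPU-bound loop would remove most of this speed-up under CPython's GIL; that is the case where a process pool fits better.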
Basic usage (examples and parameter descriptions)
concurrent.futures is an important tool for asynchronous programming in Python. It provides the following two classes:
1. ThreadPoolExecutor
ThreadPoolExecutor creates a thread pool to which tasks can be submitted for execution. It is lighter than ProcessPoolExecutor because it avoids the overhead of starting processes and passing data between them. Note that it does not bypass the GIL: it is best suited to I/O-bound tasks, where threads spend most of their time waiting.
Example:
```python
from concurrent.futures import ThreadPoolExecutor

def test(num):
    print("Thread", num)

# Create a ThreadPoolExecutor and set the maximum number of worker threads
with ThreadPoolExecutor(max_workers=3) as executor:
    # Submit several tasks to the thread pool
    executor.submit(test, 1)
    executor.submit(test, 2)
    executor.submit(test, 3)
```
Output (the order of the lines may vary between runs):
```
Thread 1
Thread 2
Thread 3
```
2. ProcessPoolExecutor
ProcessPoolExecutor creates a process pool to which tasks can be submitted for execution. Use a process pool when each task is expensive to compute, as in large CPU-bound workloads, because separate processes are not limited by the GIL. Arguments and return values must be picklable, since they are sent between processes.
Example:
```python
from concurrent.futures import ProcessPoolExecutor

def test(num):
    print("Process", num)

# The __main__ guard is required on platforms that start worker
# processes by re-importing the main module (e.g. Windows, macOS)
if __name__ == "__main__":
    # Create a ProcessPoolExecutor and set the maximum number of worker processes
    with ProcessPoolExecutor(max_workers=3) as executor:
        # Submit several tasks to the process pool
        executor.submit(test, 1)
        executor.submit(test, 2)
        executor.submit(test, 3)
```
Output (the order depends on process scheduling and is not guaranteed):
```
Process 2
Process 1
Process 3
```
Waiting for task completion
1. When constructing a ThreadPoolExecutor instance, pass the max_workers parameter to set the maximum number of threads that can run in the pool at the same time. If it is omitted, Python 3.8+ uses min(32, os.cpu_count() + 4).
2. Use submit() to hand a task (a callable and its arguments) to the pool. It returns a Future object that acts as a handle to the task. submit() does not block; it returns immediately.
3. Call done() on the returned Future to check whether the task has finished.
4. Call cancel() to cancel a submitted task. A task that is already running (or has finished) cannot be cancelled, and cancel() then returns False.
5. Call result() to get the task's return value. This call blocks until the task finishes; an optional timeout argument limits how long it waits.
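The sketch below exercises done(), cancel(), and result() on a single-worker pool. The two threading.Event objects are an addition made only to keep the timing deterministic: they hold the first task in the running state while the second one is still queued.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

started = threading.Event()   # set once the first task is running
gate = threading.Event()      # released by the main thread later

def blocked_task():
    started.set()
    gate.wait()               # keep the only worker busy
    return "first done"

def queued_task():
    return "second done"

with ThreadPoolExecutor(max_workers=1) as executor:
    first = executor.submit(blocked_task)    # returns a Future immediately
    started.wait()                           # first task now occupies the worker
    second = executor.submit(queued_task)    # waits in the queue

    running_done = first.done()       # False: still running
    cancel_running = first.cancel()   # False: a running task cannot be cancelled
    cancel_queued = second.cancel()   # True: not started yet, so it is cancelled
    gate.set()                        # let the first task finish
    print(first.result(timeout=5))    # blocks until the value is available

print(running_done, cancel_running, cancel_queued, second.cancelled())
# False False True True
```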
After submitting tasks, we usually need to wait for them to complete. You can use the following methods:
1. result()
Returns the result of the Future object returned by submit(). The call is synchronous: it blocks the calling thread until the result is available, or re-raises the exception raised by the task.
Example:
```python
from concurrent.futures import ThreadPoolExecutor

def test(num):
    print("Task", num)

# Create a ThreadPoolExecutor and set the maximum number of worker threads
with ThreadPoolExecutor(max_workers=3) as executor:
    # Submit several tasks, then wait for the first one with result()
    future_1 = executor.submit(test, 1)
    future_2 = executor.submit(test, 2)
    future_3 = executor.submit(test, 3)
    # test() has no return statement, so its result is None
    print(future_1.result())
```
Output (None always follows Task 1, but the other lines may appear in a different order):
```
Task 1
Task 2
Task 3
None
```
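As noted for result(), an exception raised inside a task does not crash the pool; it is stored on the Future and re-raised when result() is called. The divide function below is a hypothetical example; exception() is the Future method that returns the stored exception instead of raising it.

```python
from concurrent.futures import ThreadPoolExecutor

def divide(a, b):
    return a / b

with ThreadPoolExecutor(max_workers=2) as executor:
    ok = executor.submit(divide, 10, 2)
    bad = executor.submit(divide, 1, 0)   # raises inside the worker thread

    print(ok.result())                    # 5.0
    try:
        bad.result()                      # the worker's exception is re-raised here
    except ZeroDivisionError as exc:
        print("caught:", repr(exc))
    # exception() returns the stored exception instead of raising it
    print(type(bad.exception()).__name__)  # ZeroDivisionError
```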
2. add_done_callback()
Attaches a completion callback to a Future object returned by submit(). The main thread does not have to wait for the task: the callback is called automatically, with the Future as its only argument, once the task finishes. If the task has already finished when the callback is added, it is called immediately.
Example:
```python
from concurrent.futures import ThreadPoolExecutor

def callback(future):
    # Called with the finished Future as its only argument
    print("Task done?", future.done())
    print("Result:", future.result())

# Create a ThreadPoolExecutor and set the maximum number of worker threads
with ThreadPoolExecutor(max_workers=3) as executor:
    # Submit tasks and register a completion callback on each Future
    future_1 = executor.submit(pow, 2, 4)
    future_2 = executor.submit(pow, 3, 4)
    future_1.add_done_callback(callback)
    future_2.add_done_callback(callback)
```
Common methods of the ThreadPoolExecutor class
ThreadPoolExecutor and ProcessPoolExecutor share most of their method names, but because one uses threads and the other uses processes, their underlying implementations differ. Since ThreadPoolExecutor is the one used most often in everyday development, we use it as the main example here.
With a thread pool created by ThreadPoolExecutor, we can use submit, map, shutdown, and other methods to manage the threads and tasks in the pool.
1. Submit method
The submit method of ThreadPoolExecutor submits a task to the thread pool for processing. It returns a Future object, which represents a result that will become available in the future. Its syntax is as follows:
submit(fn, *args, **kwargs)
Here fn is the callable to execute, and *args and **kwargs are the arguments passed to fn.
Example:
```python
from concurrent.futures import ThreadPoolExecutor

def multiply(x, y):
    return x * y

with ThreadPoolExecutor(max_workers=3) as executor:
    future = executor.submit(multiply, 10, 5)
    print(future.result())  # 50
```
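The example above only passes positional arguments. This sketch shows keyword arguments flowing through **kwargs as well; the greet function is hypothetical and exists only to make the argument passing visible.

```python
from concurrent.futures import ThreadPoolExecutor

def greet(name, greeting="Hello", punctuation="!"):
    """Hypothetical helper used only to show argument passing."""
    return f"{greeting}, {name}{punctuation}"

with ThreadPoolExecutor(max_workers=2) as executor:
    # Positional arguments go to *args, keyword arguments to **kwargs
    f1 = executor.submit(greet, "Alice")
    f2 = executor.submit(greet, "Bob", greeting="Hi", punctuation=".")
    print(f1.result())  # Hello, Alice!
    print(f2.result())  # Hi, Bob.
```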
2. map method
The map method of ThreadPoolExecutor applies a function to every element of one or more iterables and returns an iterator over the results, in the same order as the inputs. Its syntax is as follows:
map(func, *iterables, timeout=None, chunksize=1)
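Here func is the function to execute and *iterables are one or more iterables; when several are given, func receives one element from each. timeout and chunksize are optional. timeout is the number of seconds, counted from the call to map(), that the returned iterator waits for results before raising TimeoutError. chunksize only has an effect with ProcessPoolExecutor, where it batches items sent to the worker processes.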
Example:
```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

def cube(x):
    return x * x * x

with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(square, [1, 2, 3, 4, 5])
    for square_result in results:
        print(square_result)

    results = executor.map(cube, [1, 2, 3, 4, 5])
    for cube_result in results:
        print(cube_result)
```
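Two properties of map are easy to miss: it accepts several iterables at once, and it yields results in input order even when later tasks finish first. In this sketch the hypothetical slow_power function deliberately makes small bases sleep longest, yet the output still follows the input order.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_power(base, exp):
    # Smaller bases sleep longer, so later inputs tend to finish first
    time.sleep(0.05 * (5 - base))
    return base ** exp

with ThreadPoolExecutor(max_workers=4) as executor:
    # With two iterables, func receives one element from each: (1, 2), (2, 3), ...
    results = list(executor.map(slow_power, [1, 2, 3, 4], [2, 3, 2, 3]))

print(results)  # [1, 8, 9, 64]
```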
3. shutdown method
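The shutdown method of ThreadPoolExecutor shuts down the thread pool and frees its resources; after it is called, submitting new tasks raises RuntimeError. Its syntax is as follows:
shutdown(wait=True, *, cancel_futures=False)
The wait parameter determines whether the call blocks until all submitted tasks have finished; it defaults to True. With wait=False the call returns immediately, although the program still will not exit until the pending tasks are done. The keyword-only cancel_futures parameter (Python 3.9+) cancels tasks that have not started running yet. When the executor is used in a with statement, shutdown(wait=True) is called automatically on exit, so an explicit call is only needed when the executor is created without with.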
Example:
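```python
from concurrent.futures import ThreadPoolExecutor
import time

def task(num):
    print("Task {} is running".format(num))
    time.sleep(1)
    return "Task {} is complete".format(num)

# Created without `with`, so shutdown() must be called explicitly
executor = ThreadPoolExecutor(max_workers=3)
futures = [executor.submit(task, i) for i in range(1, 4)]
# Block until all submitted tasks finish, then release the threads
executor.shutdown(wait=True)
for future in futures:
    print(future.result())
```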
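shutdown() also accepts a keyword-only cancel_futures argument (Python 3.9+) that cancels every task still waiting in the queue. The sketch below assumes Python 3.9 or later; the threading.Event is only there to guarantee that the first task is already running, and therefore not cancellable, when shutdown() is called.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

started = threading.Event()

def slow_task():
    started.set()          # signal that the worker has picked up this task
    time.sleep(0.5)
    return "slow task finished"

def quick_task(n):
    return n

executor = ThreadPoolExecutor(max_workers=1)
first = executor.submit(slow_task)
started.wait()             # the only worker is now busy with slow_task
pending = [executor.submit(quick_task, i) for i in range(3)]

# Running tasks complete; queued tasks are cancelled instead of executed
executor.shutdown(wait=True, cancel_futures=True)

print(first.result())                    # slow task finished
print([f.cancelled() for f in pending])  # [True, True, True]
```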
Source code analysis
In the concurrent.futures module, a Future object represents an operation that will complete in the future, and it is the basis of this style of asynchronous programming. submit() returns a Future immediately: the task has not necessarily finished at that point, but it will. A Future can be thought of as a container for the task's outcome, holding both its result (or exception) and its state.
So how does ThreadPoolExecutor work with this object internally? The following briefly walks through part of its code:
1. __init__ method
The most important things set up in __init__ are the task queue and the set of worker threads, which the other methods rely on.
[Figure: annotated source of ThreadPoolExecutor.__init__]
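To illustrate the task-queue-plus-thread-set design described above, here is a toy pool. MiniPool is hypothetical and much simpler than the real class: it starts all its threads eagerly in __init__ (the real ThreadPoolExecutor starts them lazily from submit()), it returns no Future objects, and it stops workers by putting one None sentinel per thread on the queue.

```python
import queue
import threading

class MiniPool:
    """Hypothetical toy pool: a task queue plus a set of worker threads."""

    def __init__(self, max_workers=3):
        self._work_queue = queue.SimpleQueue()   # tasks waiting to run
        self._threads = set()                    # the worker threads
        for _ in range(max_workers):
            t = threading.Thread(target=self._worker, daemon=True)
            t.start()
            self._threads.add(t)

    def _worker(self):
        while True:
            task = self._work_queue.get()        # blocks until a task arrives
            if task is None:                     # sentinel: stop this worker
                return
            fn, args = task
            fn(*args)

    def submit(self, fn, *args):
        self._work_queue.put((fn, args))         # returns immediately

    def shutdown(self):
        for _ in self._threads:                  # one sentinel per worker
            self._work_queue.put(None)
        for t in self._threads:
            t.join()

results = []
lock = threading.Lock()

def record(n):
    with lock:
        results.append(n * n)

pool = MiniPool(max_workers=3)
for i in range(5):
    pool.submit(record, i)
pool.shutdown()
print(sorted(results))  # [0, 1, 4, 9, 16]
```

Because the queue is FIFO and the sentinels are added after all tasks, every submitted task is taken before any worker sees a sentinel.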
2. Submit method
submit relies on two important objects, _base.Future() and _WorkItem(). submit wraps the callable and its arguments in a _WorkItem, puts it on the task queue, starts a new worker thread if needed, and returns the future object. When a worker thread later runs the _WorkItem, it executes the task and sets the result (or exception) on the future object. The whole submit call therefore returns immediately without blocking.
[Figure: annotated source of ThreadPoolExecutor.submit]
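The interplay between the work item and the Future can be sketched in a few lines. WorkItem, submit, and worker below are hypothetical, simplified stand-ins for the internal code, not copies of it. They use only public Future methods (set_running_or_notify_cancel, set_result, set_exception), which the documentation reserves for executor implementations and tests, and a toy executor is exactly that.

```python
import queue
import threading
from concurrent.futures import Future

class WorkItem:
    """Hypothetical, simplified stand-in for the internal _WorkItem class."""

    def __init__(self, future, fn, args, kwargs):
        self.future = future
        self.fn = fn
        self.args = args
        self.kwargs = kwargs

    def run(self):
        # Mark the future RUNNING; skip the task if it was cancelled meanwhile
        if not self.future.set_running_or_notify_cancel():
            return
        try:
            result = self.fn(*self.args, **self.kwargs)
        except BaseException as exc:
            self.future.set_exception(exc)   # result() will re-raise this
        else:
            self.future.set_result(result)   # result() will return this

work_queue = queue.SimpleQueue()

def submit(fn, *args, **kwargs):
    future = Future()
    work_queue.put(WorkItem(future, fn, args, kwargs))
    return future                            # returned before the task runs

def worker():
    while True:
        item = work_queue.get()
        if item is None:                     # sentinel: stop the worker
            return
        item.run()

thread = threading.Thread(target=worker)
thread.start()
f = submit(pow, 2, 10)
g = submit(int, "not a number")
print(f.result())                            # 1024
print(type(g.exception()).__name__)          # ValueError
work_queue.put(None)
thread.join()
```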
Summary
The concurrent.futures module gives Python a simple and efficient way to run tasks asynchronously on threads or processes behind a single interface, offering developers a flexible concurrency solution. It is independent of the asyncio module; in fact, asyncio can hand blocking work to these executors through loop.run_in_executor(). We can use submit, map, shutdown, and other methods to manage the threads and tasks in a pool, and use Future objects, the core of this model, to track each task's status. This makes submitting tasks, managing their state, and controlling the pool much more convenient.
In practice, we should choose the asynchronous tools and techniques that suit the specific application. Overall, the concurrent.futures module is a very useful tool for concurrent programming in Python.