Comparison of processes and threads
1. The overhead of the process is much greater than the overhead of the thread.
2. Data between processes is isolated, but data between threads is not isolated.
3. Data is not shared between processes, so processes must use inter-process communication (IPC) to exchange it. Threads within a process can also communicate explicitly, typically through a queue.
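A minimal sketch of points 2 and 3 for threads: every thread in the process appends to the same module-level list, and the main thread sees all of the writes after join(). A child process started with `multiprocessing` would instead work on its own copy of the list, and the parent would not see its changes. The names `results` and `worker` are illustrative.

```python
import threading

results = []  # module-level list, visible to every thread in this process

def worker(i):
    results.append(i * i)  # writes straight into the shared list

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has written its value

print(sorted(results))  # [0, 1, 4, 9, 16]
```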
GIL global interpreter lock (important theory)
When Python was designed, it was assumed that only one thread would execute in the main loop at any given time. Although multiple threads can be “running” in the Python interpreter, only one of them is actually executing in the interpreter at any moment.
Access to the Python interpreter is controlled by the Global Interpreter Lock (GIL). It is this lock that ensures only one thread in a process is running at a time.
Background information:
1. Python code runs on an interpreter, which executes (interprets) it.
2. Types of Python interpreters:
   1. CPython
   2. IPython (strictly speaking, an interactive shell built on CPython rather than a separate interpreter)
   3. PyPy
   4. Jython
   5. IronPython
3. CPython is by far the most commonly used interpreter today (roughly 95% of the market).
4. The GIL exists in CPython.
5. Why does only one thread execute at a time? Because the GIL is meant to prevent multiple threads from competing for the same resources.
For example, suppose one thread is garbage-collecting the variable a = 1 while another thread is still using a. If the second thread grabs a before the garbage collector has finished reclaiming it, the program ends up working with corrupted data.
To avoid this problem, a single lock was placed directly on the interpreter, so that only one thread can execute at a time. Any thread that wants to execute must first acquire the GIL, and other threads can acquire it and run only after the current thread releases it.
Conclusion: The GIL ensures that only one thread executes at a time; every thread must acquire the GIL before it can execute.
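A rough sketch for checking this on your own machine, assuming a standard (GIL-enabled) CPython build: the same CPU-bound countdown is run twice in a row, then in two threads. Because the threads take turns holding the GIL, the threaded run is typically no faster than the sequential one. Timings vary by machine, so no output is shown. `count_down` and `N` are illustrative names.

```python
import sys
import threading
import time

def count_down(n):
    # Pure CPU work with no I/O: the thread gives up the GIL only when CPython forces a switch
    while n > 0:
        n -= 1

N = 2_000_000

# Run the work twice in the main thread
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same work in two threads; they take turns holding the GIL
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# Interval (in seconds) after which CPython asks the running thread to release the GIL
print("switch interval:", sys.getswitchinterval())
print(f"sequential: {sequential:.2f}s, two threads: {threaded:.2f}s")
```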
The following questions need to be understood and memorized:
1. Because CPython has the GIL, multiple threads in the same process never truly execute at the same instant; only one thread runs at any given moment.
2. Starting multiple processes is common mainly in Python. Other languages generally do not need multiple processes; multiple threads are enough.
3. Under CPython, multiple threads cannot take advantage of multiple cores; only multiple processes can. Languages without a GIL do not have this problem.
4. On an 8-core machine, making full use of all 8 cores takes at least 8 threads; if all 8 threads are doing pure computation, CPU usage reaches 100%.
5. Without the GIL, 8 threads started in one process could make full use of the CPU and run it at full capacity.
6. Much of the code and many modules in CPython are built around the GIL, so it cannot simply be removed. The result is that although the machine has 8 cores, a single process can effectively use only 1 of them. The workaround is to start multiple processes: the threads in each process can then be scheduled onto different CPUs.
7. Under CPython, use multithreading for I/O-bound work and multiprocessing for CPU-bound work.
   - I/O-bound: when a thread hits an I/O operation, the CPU switches to another thread. Suppose you start 8 threads and all of them perform I/O. Waiting on I/O does not consume CPU, so over a period of time all 8 threads make progress. Multithreading is the better choice here.
   - CPU-bound: the work consumes CPU. If you start 8 threads, they must take turns holding the GIL, so only one of them computes at any moment and the job takes about as long as running it in a single thread. If instead you start 8 processes with one thread each, the 8 threads can run on 8 CPUs at once, which is far more efficient.
For CPU-bound tasks in CPython, multiprocessing is therefore the better choice; in other languages, multithreading is usually chosen instead.
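A minimal sketch of the I/O-bound case, using `time.sleep` as a stand-in for real I/O (the helper `fake_io` is hypothetical). Like a blocking socket read, `time.sleep` releases the GIL, so eight waits of 0.2 s overlap. For CPU-bound work, `concurrent.futures.ProcessPoolExecutor` offers the same interface and can be swapped in, as long as the pool is created under an `if __name__ == '__main__':` guard (required on platforms that start processes with spawn, such as Windows and macOS).

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(i):
    # Hypothetical I/O wait; time.sleep releases the GIL like a real blocking read would
    time.sleep(0.2)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fake_io, range(8)))
elapsed = time.perf_counter() - start

# The eight 0.2 s waits overlap, so this takes roughly 0.2 s rather than 1.6 s
print(results, f"{elapsed:.2f}s")
```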
Mutex lock
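With multiple threads, if several threads modify the same data at the same time, the data can become corrupted. A mutex (Lock) lets only one thread at a time run the code that touches the shared data.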
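from threading import Thread, Lock
import time

n = 10

def task(lock):
    # Only the thread holding the lock may run this read-modify-write
    lock.acquire()
    global n
    temp = n
    time.sleep(0.5)  # simulate work; without the lock every thread would read n == 10
    n = temp - 1
    lock.release()

if __name__ == '__main__':
    tt = []
    lock = Lock()
    for i in range(10):
        t = Thread(target=task, args=(lock,))
        t.start()
        tt.append(t)
    for j in tt:
        j.join()  # wait for all ten threads
    print("main", n)  # main 0
Note the trade-off: the lock makes the threads run the critical section one at a time (here about 10 × 0.5 s = 5 s in total), trading speed for correctness, in the same spirit as exchanging time for space or space for time.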
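To see what the lock prevents, here is a sketch that runs the same read-sleep-write update with and without a lock (the `run` helper and the shorter 0.01 s sleep are assumptions for the demo). Without the lock, the threads usually all read n == 10 before any of them writes, so most updates are lost. With the lock, written here as `with lock:`, which calls acquire() on entry and release() on exit even if an exception occurs, the result is always 0.

```python
import threading
import time
from contextlib import nullcontext

def run(use_lock):
    n = 10
    # nullcontext() is a do-nothing stand-in, so both runs share the same code path
    guard = threading.Lock() if use_lock else nullcontext()

    def task():
        nonlocal n
        with guard:
            temp = n          # read
            time.sleep(0.01)  # give other threads a chance to run in between
            n = temp - 1      # write back

    threads = [threading.Thread(target=task) for _ in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return n

print("without lock:", run(use_lock=False))  # typically not 0: concurrent updates are lost
print("with lock:", run(use_lock=True))      # always 0
```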
Thread Queue
Why use queues with threads at all? Threads in the same process already share data, so why use a queue within a single process?
Because a queue is essentially a data channel plus a lock ("pipe + lock"): it does the locking internally, so data passed through a queue stays safe without any extra synchronization in your code.
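A minimal producer/consumer sketch of that idea: one thread puts items into a `queue.Queue` and another takes them out, with no explicit Lock in user code because `put()` and `get()` synchronize internally. Using `None` as the "no more items" sentinel is a common convention, not part of the queue API; `producer` and `consumer` are illustrative names.

```python
import queue
import threading

SENTINEL = None  # hypothetical "no more items" marker

def producer(q, count):
    for i in range(count):
        q.put(i)          # put() locks internally; no extra Lock needed
    q.put(SENTINEL)

def consumer(q, out):
    while True:
        item = q.get()    # blocks until an item is available
        if item is SENTINEL:
            break
        out.append(item)

q = queue.Queue()
received = []
threads = [
    threading.Thread(target=producer, args=(q, 5)),
    threading.Thread(target=consumer, args=(q, received)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(received)  # [0, 1, 2, 3, 4]
```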
Thread queue:
1. First in, first out
2. Last in, first out
3. Priority queue
First in, first out
import queue

# queue.Queue is the thread-safe FIFO queue. Its implementation involves several locks
# and condition variables, so it may cost some performance and memory.
q = queue.Queue()  # maxsize defaults to 0, meaning the queue is unbounded
q.put('first')
q.put('second')
q.put('third')
print(q.get())
print(q.get())
print(q.get())
'''
Result (first in, first out):
first
second
third
'''
LIFO
import queue

q = queue.LifoQueue()  # LIFO: last in, first out
q.put('first')
q.put('second')
q.put('third')
print(q.get())
print(q.get())
print(q.get())
'''
Result (last in, first out):
third
second
first
'''
Priority Queue
import queue

q = queue.PriorityQueue()
# put() takes a tuple whose first element is the priority (usually a number, but any
# mutually comparable values work). The smaller the value, the higher the priority.
q.put((20, 'a'))
q.put((10, 'b'))
q.put((30, 'c'))
print(q.get())
print(q.get())
print(q.get())
'''
Result (smaller numbers mean higher priority and are dequeued first):
(10, 'b')
(20, 'a')
(30, 'c')
'''
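One caveat, following a pattern shown in the `queue` module documentation: when two tuples have the same priority, PriorityQueue compares their second elements, which raises TypeError for values such as dicts. Wrapping entries in a dataclass whose payload field is excluded from comparison avoids this. `PrioritizedItem` is the name used in that documentation; the task dicts are illustrative.

```python
import queue
from dataclasses import dataclass, field
from typing import Any

@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any = field(compare=False)  # the payload is never compared

q = queue.PriorityQueue()
q.put(PrioritizedItem(20, {'task': 'a'}))
q.put(PrioritizedItem(10, {'task': 'b'}))
q.put(PrioritizedItem(10, {'task': 'c'}))  # same priority as 'b'; dicts cannot be compared with <

order = [q.get().item['task'] for _ in range(3)]
print(order)  # 'b' and 'c' come out first (their relative order is not guaranteed), then 'a'
```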
Using process pools and thread pools
Pool: a container that can hold multiple elements (here, workers).
Process pool: create a pool in advance and fill it with processes. From then on, you only need to submit tasks to the pool, and one of its processes will pick up each task and execute it.
Thread pool: create a pool in advance and fill it with threads. From then on, you only need to submit tasks to the pool, and one of its threads will pick up each task and execute it.
from concurrent.futures import ProcessPoolExecutor

def task(n, m):
    return n + m

def task1():
    return {'username': 'kevin', 'password': 123}

def callback(res):
    # res is the finished Future object; call result() to get the return value
    print(res)  # <Future at 0x1ed5a5e5610 state=finished returned int>
    print(res.result())  # 3

def callback1(res):
    print(res)  # <Future at 0x... state=finished returned dict>
    print(res.result())  # {'username': 'kevin', 'password': 123}
    print(res.result().get('username'))  # kevin

if __name__ == '__main__':
    # 1. Open a process pool with 3 processes in it
    pool = ProcessPoolExecutor(3)
    # 2. Submit tasks to the pool; each callback runs once its task has finished
    pool.submit(task, m=1, n=2).add_done_callback(callback)
    pool.submit(task1).add_done_callback(callback1)
    pool.shutdown()  # wait for all tasks to finish (like close() + join() on a multiprocessing Pool)
    print(123)
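For many calls to the same function, `Executor.map()` inside a `with` block is a more compact alternative to submit() plus shutdown(): leaving the block calls shutdown(wait=True). This sketch uses ThreadPoolExecutor so it runs without a `__main__` guard; ProcessPoolExecutor accepts the same calls.

```python
from concurrent.futures import ThreadPoolExecutor

def task(n, m):
    return n + m

# Leaving the with-block calls pool.shutdown(wait=True), so no explicit shutdown is needed
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() submits one task per pair of arguments and yields results in input order
    results = list(pool.map(task, [1, 2, 3], [10, 20, 30]))

print(results)  # [11, 22, 33]
```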
Multi-threaded crawling of web pages
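import requests
from concurrent.futures import ThreadPoolExecutor

def get_page(url):
    # Download the page in a worker thread
    res = requests.get(url)
    name = url.rsplit('/', 1)[-1] + '.html'  # e.g. www.baidu.com.html
    return {'name': name, 'text': res.content}

def call_back(fut):
    # Runs when a download finishes: save the page to disk
    result = fut.result()
    print(result['name'])
    with open(result['name'], 'wb') as f:
        f.write(result['text'])

if __name__ == '__main__':
    pool = ThreadPoolExecutor(2)
    urls = ['http://www.baidu.com', 'http://www.cnblogs.com', 'http://www.taobao.com']
    for url in urls:
        pool.submit(get_page, url).add_done_callback(call_back)
    pool.shutdown()  # wait for all downloads to finish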
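As an alternative to add_done_callback, `concurrent.futures.as_completed()` yields futures in the order they finish, so results can be handled in the main thread. This sketch replaces the network request with a hypothetical `fake_get_page` that just sleeps, so it runs offline; the delays are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_get_page(url, delay):
    # Hypothetical stand-in for requests.get: sleeps instead of downloading
    time.sleep(delay)
    return {'name': url.rsplit('/', 1)[-1] + '.html', 'text': b'<html></html>'}

jobs = {'http://www.baidu.com': 0.3, 'http://www.cnblogs.com': 0.1, 'http://www.taobao.com': 0.2}

finished = []
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(fake_get_page, url, d): url for url, d in jobs.items()}
    for fut in as_completed(futures):  # yields each future as soon as it finishes
        finished.append(fut.result()['name'])

print(finished)  # usually in order of delay: cnblogs, taobao, baidu
```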
Coroutine theory
Process: the unit of resource allocation.
Thread: the smallest unit of execution.
Coroutine: a construct invented by programmers and implemented in user code; it does not exist at the operating-system level. Coroutines provide concurrency within a single thread.
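Concurrency = switching + saving state.
The concurrency covered earlier switched between processes or threads, which the operating system does for us. Coroutines do the switching themselves inside a single thread.
Since everything runs in one thread, does this save resources? Yes: coroutines use the fewest resources, processes use the most, and threads fall in between.
The goal is to make the most of the CPU: whenever one task would wait on I/O, the thread switches to another task instead of sitting idle.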
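A toy sketch of "switching + saving state" in one thread, using plain Python generators rather than gevent: each `yield` switches away from a task, and the generator keeps its local variables so it can resume later. The names `task` and `run` are hypothetical; libraries such as gevent switch automatically when a task blocks on I/O rather than at explicit `yield` points.

```python
from collections import deque

def task(name, steps):
    # Local variables (name, i) survive between resumptions: the "save state" part
    for i in range(steps):
        yield f"{name} step {i}"  # pause here and hand control back: the "switch" part

def run(tasks):
    # Toy round-robin scheduler running every task inside one thread
    ready = deque(tasks)
    log = []
    while ready:
        t = ready.popleft()
        try:
            log.append(next(t))  # resume the task until its next yield
            ready.append(t)      # not finished: back to the end of the line
        except StopIteration:
            pass                 # task finished: drop it
    return log

log = run([task("A", 2), task("B", 3)])
print(log)  # ['A step 0', 'B step 0', 'A step 1', 'B step 1', 'B step 2']
```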
pip install gevent
import gevent
Coroutines achieve high concurrency
Server:
from gevent import monkey
monkey.patch_all()  # patch blocking calls (socket, time.sleep, ...) so they yield to other greenlets

import gevent
from socket import socket
# Alternatives: handle each connection with a Process or a Thread instead of a greenlet
# from multiprocessing import Process
# from threading import Thread

def talk(conn):
    while True:
        try:
            data = conn.recv(1024)
            if len(data) == 0:  # the client closed the connection
                break
            print(data)
            conn.send(data.upper())
        except Exception as e:
            print(e)
            break  # stop serving this connection after an error
    conn.close()

def server(ip, port):
    server = socket()
    server.bind((ip, port))
    server.listen(5)
    while True:
        conn, addr = server.accept()
        # t = Process(target=talk, args=(conn,))
        # t = Thread(target=talk, args=(conn,))
        # t.start()
        gevent.spawn(talk, conn)  # one greenlet per connection

if __name__ == '__main__':
    g1 = gevent.spawn(server, '127.0.0.1', 8080)
    g1.join()
Client:
import socket
from threading import current_thread, Thread

def socket_client():
    cli = socket.socket()
    cli.connect(('127.0.0.1', 8080))
    while True:
        ss = '%s say hello' % current_thread().name
        cli.send(ss.encode('utf-8'))
        data = cli.recv(1024)
        print(data)

# Start 5000 client threads to put concurrent load on the server
for i in range(5000):
    t = Thread(target=socket_client)
    t.start()