Comparison of processes and threads, the GIL (global interpreter lock), mutex locks, thread queues, using process pools and thread pools, multithreaded web crawling, coroutine theory, and achieving high concurrency with coroutines

Comparison of processes and threads

1. A process has much greater overhead than a thread.
2. Data is isolated between processes, but not between threads in the same process.
3. Because threads in different processes do not share data, processes have to exchange data through inter-process communication (IPC). Threads within one process can communicate too, and in both cases the usual tool is a queue.

GIL: the global interpreter lock (important theory)

When Python was designed, it was decided that only one thread would execute in the interpreter's main loop at a time. Although multiple threads can be “running” in the Python interpreter, only one of them is actually executing at any moment.

Access to the Python interpreter is controlled by the Global Interpreter Lock (GIL). It is this lock that ensures only one thread per process is running at a time.

Background information:

1. Python code is executed by an interpreter.
2. There are several Python interpreters: CPython, PyPy, Jython and IronPython. (IPython, often listed alongside them, is an enhanced interactive shell that runs on top of CPython rather than a separate interpreter.)
3. CPython, the reference implementation, is by far the most commonly used interpreter today.
4. The GIL exists in CPython (PyPy also has one; Jython and IronPython do not).
5. Why allow only one thread to execute at a time? To stop multiple threads from competing for the same resources.
For example, suppose one thread is doing garbage collection and is reclaiming the variable a = 1, while another thread is also using a. If the second thread grabs a before the garbage-collection thread has finished reclaiming it, the data is corrupted.
Python avoids this problem by putting a lock directly on the interpreter, so that only one thread can execute at a time. Any thread that wants to execute must first acquire the lock (the GIL); only after that thread releases the GIL can another thread acquire it and gain permission to execute.

Conclusion: the GIL ensures that only one thread in a process executes at a time. Every thread must acquire the GIL before it is allowed to execute.
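
To make the hand-off concrete: CPython periodically asks the thread holding the GIL to release it so that waiting threads get a turn. The interval is exposed through `sys.getswitchinterval()` and `sys.setswitchinterval()`. A minimal look, assuming a standard CPython process in which the default has not been changed:

```python
import sys

# CPython asks the thread holding the GIL to release it after this many seconds,
# so that other threads waiting for the GIL can run. The default is 0.005 (5 ms).
print(sys.getswitchinterval())  # 0.005

# The interval can be tuned, although the default is rarely worth changing.
sys.setswitchinterval(0.001)
print(sys.getswitchinterval())  # 0.001
sys.setswitchinterval(0.005)  # restore the default
```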

The following points need to be understood and memorized:
1. Because of the GIL, although a process may contain multiple threads, only one of them is actually executing at any given moment.
2. Python programs often start multiple processes to get parallelism; in most other languages, starting multiple threads is enough.
3. In CPython, multiple threads cannot take advantage of multiple cores; only multiple processes can. Languages without a GIL do not have this problem.
4. To make full use of an 8-core CPU, you need at least 8 threads, all doing computation; only then does CPU usage reach 100%.
5. Without the GIL, 8 threads started in one process could use all 8 cores and run the CPU at full capacity.
6. A lot of the code and modules in CPython are written around the GIL, so it cannot simply be removed. Even with 8 cores, a single CPython process can therefore only use 1 core for Python code. The workaround is to start multiple processes: each process has its own interpreter and its own GIL, so threads in different processes can be scheduled on different CPUs. (CPython 3.13 added an optional, experimental free-threaded build without the GIL, described in PEP 703.)
7. Rule of thumb for CPython: use multithreading for I/O-bound work and multiprocessing for CPU-bound work.

I/O-bound tasks: when a thread hits an I/O operation, it releases the GIL and the CPU switches to another thread. Suppose you start 8 threads that all perform I/O. Because waiting on I/O does not consume CPU, over a period of time all 8 threads make progress, so multithreading is the better choice.
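
A minimal sketch of the I/O-bound case, assuming `time.sleep` as a stand-in for real I/O (like blocking socket or file calls, it releases the GIL while waiting); `fake_io()` and `run_threads()` are hypothetical names:

```python
import threading
import time


def fake_io(seconds):
    """Stand-in for an I/O call: time.sleep releases the GIL while it waits."""
    time.sleep(seconds)


def run_threads(n_threads, seconds):
    """Start n_threads that each 'wait on I/O' and return the wall-clock time taken."""
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # 8 threads x 0.5 s of waiting take roughly 0.5 s in total, not 4 s,
    # because the waits overlap.
    print(f"{run_threads(8, 0.5):.2f} s")
```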

CPU-bound tasks: these consume CPU time. If you start 8 threads in one CPython process, they have to take turns holding the GIL (the interpreter forces a hand-off every few milliseconds), so only one of them is computing at any moment and together they are no faster, often slightly slower, than running the work sequentially. If you instead start 8 processes with one thread each, every process has its own interpreter and GIL, so the 8 threads can run on 8 CPUs in parallel, which is much more efficient.
For CPU-bound tasks in CPython, multiprocessing is the better choice; in languages without a GIL, multithreading is usually chosen instead.
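
A rough timing sketch of the CPU-bound case. The function `count_down()` and the job sizes are arbitrary choices for illustration; on a multi-core machine with a standard (GIL) CPython build, the process pool should finish noticeably faster than the thread pool, but the exact numbers depend on the machine:

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_down(n):
    """Pure CPU work: no I/O, so a thread running this keeps competing for the GIL."""
    while n > 0:
        n -= 1
    return n


def timed(executor_cls, jobs, n):
    """Run `jobs` copies of count_down(n) on the given executor; return seconds taken."""
    start = time.perf_counter()
    with executor_cls(max_workers=jobs) as pool:
        list(pool.map(count_down, [n] * jobs))
    return time.perf_counter() - start


if __name__ == "__main__":
    jobs, n = 4, 5_000_000
    print(f"threads:   {timed(ThreadPoolExecutor, jobs, n):.2f} s")
    print(f"processes: {timed(ProcessPoolExecutor, jobs, n):.2f} s")
```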

Mutex lock

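When multiple threads modify the same data at the same time, the data can become corrupted. A mutex lock (threading.Lock) prevents this by letting only one thread at a time run the code between acquire() and release():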

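from threading import Thread, Lock
import time

n = 10


def task(lock):
    global n
    lock.acquire()  # only one thread at a time gets past this point
    try:
        temp = n
        time.sleep(0.5)  # simulate work between the read and the write
        n = temp - 1
    finally:
        lock.release()  # always release, even if the code above raises


# Trade-off: just as you can exchange time for space or space for time, the lock
# exchanges speed for correctness: the 10 updates now run one after another (about 5 s).
if __name__ == '__main__':
    tt = []
    lock = Lock()
    for i in range(10):
        t = Thread(target=task, args=(lock,))
        t.start()
        tt.append(t)
    for j in tt:
        j.join()

    print("main", n)  # main 0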
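
To see the "data corruption" the lock prevents, the same read-sleep-write pattern can be run with and without a lock. A minimal sketch; `run()` is a hypothetical helper, `contextlib.nullcontext()` serves as a do-nothing stand-in for the lock, and the unlocked result depends on thread scheduling:

```python
import contextlib
import threading
import time


def run(use_lock, workers=10):
    """Decrement a shared counter once per thread and return its final value."""
    n = workers
    # nullcontext() is a do-nothing "with" target, so both variants share one code path.
    guard = threading.Lock() if use_lock else contextlib.nullcontext()

    def task():
        nonlocal n
        with guard:
            temp = n          # read
            time.sleep(0.05)  # other threads get to run between the read and the write
            n = temp - 1      # write back, possibly based on a stale read

    threads = [threading.Thread(target=task) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return n


if __name__ == "__main__":
    print("with lock:   ", run(True))   # 0
    print("without lock:", run(False))  # usually 9: most threads read 10 before anyone writes
```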

Thread Queue

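Why use queues between threads?

Threads in the same process already share data, so why use a queue in the first place?
Because a queue is essentially
pipe + lock:
it does the locking for you. (multiprocessing.Queue is built on a pipe and locks; the thread queue, queue.Queue, is a deque protected by a lock and condition variables.)
Passing data through a queue therefore keeps it safe from concurrent modification.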
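
A typical use of a thread queue is the producer-consumer pattern: one thread puts work on the queue, another takes it off, and the queue handles all the locking. A minimal sketch, assuming a single consumer and using `None` as a hypothetical "no more work" sentinel:

```python
import queue
import threading

SENTINEL = None  # hypothetical "no more work" marker chosen for this sketch


def producer(q, items):
    """Put each item on the queue, then the sentinel to tell the consumer to stop."""
    for item in items:
        q.put(item)  # blocks while the queue is full
    q.put(SENTINEL)


def consumer(q, results):
    """Take items off the queue until the sentinel arrives."""
    while True:
        item = q.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        results.append(item * 2)


def main(items):
    q = queue.Queue(maxsize=3)  # bounded: put() waits when 3 items are queued
    results = []
    t1 = threading.Thread(target=producer, args=(q, items))
    t2 = threading.Thread(target=consumer, args=(q, results))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results


if __name__ == "__main__":
    print(main([1, 2, 3, 4, 5]))  # [2, 4, 6, 8, 10]
```

With a single consumer the results keep the producer's order; with several consumers they would not.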

Thread queue:
1. First in, first out
2. Last in, first out
3. Priority queue

First in, first out

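# Process queue (for communication between processes):
from multiprocessing import Queue

# Thread queue (for communication between threads in one process):
import queue
queue.Queue()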

# A drawback of queue.Queue: it is implemented with a lock and several condition
# variables, which adds some time and memory overhead.
import queue

q = queue.Queue()  # maxsize defaults to 0, which means unbounded
q.put('first')
q.put('second')
q.put('third')

print(q.get())
print(q.get())
print(q.get())
'''Result (first in, first out)
first
second
third
'''

LIFO

import queue

# LIFO: last in, first out
q=queue.LifoQueue()
q.put('first')
q.put('second')
q.put('third')

print(q.get())
print(q.get())
print(q.get())
'''
Results (last in, first out)
third
second
first
'''

Priority Queue

import queue

q=queue.PriorityQueue()
# put() takes a tuple whose first element is the priority (usually a number, but any
# values that can be compared with each other work). The smaller the value, the higher the priority.
q.put((20,'a'))
q.put((10,'b'))
q.put((30,'c'))

print(q.get())
print(q.get())
print(q.get())
'''
Result (the smaller the number, the higher the priority, and those with higher priority will be dequeued first):
(10, 'b')
(20, 'a')
(30, 'c')
'''
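
One caveat with PriorityQueue: when two entries have the same priority, the tuples go on to compare their second elements, which raises TypeError if those are not comparable (for example, dicts). A common workaround, sketched here as one option rather than the only one, is an increasing counter as a tie-breaker:

```python
import itertools
import queue

q = queue.PriorityQueue()
tie_breaker = itertools.count()  # 0, 1, 2, ... in insertion order

# The payloads are dicts, which cannot be compared with "<". Without the counter,
# putting the second priority-10 entry would compare two dicts and raise TypeError;
# with it, equal priorities come out in insertion order.
for priority, job in [(20, {"name": "a"}), (10, {"name": "b"}), (10, {"name": "c"})]:
    q.put((priority, next(tie_breaker), job))

order = []
while not q.empty():
    priority, _, job = q.get()
    order.append((priority, job["name"]))

print(order)  # [(10, 'b'), (10, 'c'), (20, 'a')]
```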

Using process pools and thread pools

Pool: a container that can hold multiple elements (here, workers).

Process pool: create a pool in advance and fill it with processes. From then on you only need to submit tasks to the pool, and any idle process in the pool will pick a task up and run it.

Thread pool: the same idea with threads. Create a pool in advance and fill it with threads; afterwards you only need to submit tasks to the pool, and any idle thread in the pool will run them.

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


def task(n, m):
    return n + m


def task1():
    return {'username': 'kevin', 'password': 123}


def callback(res):
    # res is the finished Future, e.g. <Future at 0x1ed5a5e5610 state=finished returned int>
    print(res)
    print(res.result())  # 3


def callback1(res):
    print(res)  # <Future at 0x... state=finished returned dict>
    print(res.result())  # {'username': 'kevin', 'password': 123}
    print(res.result().get('username'))  # kevin


# Open a process pool
if __name__ == '__main__':
    pool = ProcessPoolExecutor(3)  # 1. Define a process pool with 3 processes in it
    # 2. Throw tasks into the pool; each callback runs when its task finishes
    pool.submit(task, m=1, n=2).add_done_callback(callback)
    pool.submit(task1).add_done_callback(callback1)
    pool.shutdown()  # wait for all submitted tasks to finish (like close() + join())
    print(123)
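
The same executor API works with a thread pool, and using the pool as a context manager replaces the explicit `shutdown()` call. A minimal sketch with `ThreadPoolExecutor`, reusing the `task(n, m)` function from above (redefined so the block runs on its own):

```python
from concurrent.futures import ThreadPoolExecutor


def task(n, m):
    return n + m


with ThreadPoolExecutor(max_workers=3) as pool:
    # map() submits one call per pair of arguments and yields results in input order.
    results = list(pool.map(task, [1, 2, 3], [10, 20, 30]))
    # submit() returns a Future; result() blocks until that call has finished.
    future = pool.submit(task, n=1, m=2)
    single = future.result()
# Leaving the "with" block calls pool.shutdown(wait=True).

print(results, single)  # [11, 22, 33] 3
```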

Multi-threaded crawling of web pages

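import requests
from concurrent.futures import ThreadPoolExecutor


def get_page(url):
    res = requests.get(url, timeout=10)  # a timeout keeps a dead site from hanging a worker
    name = url.rsplit('/', 1)[-1] + '.html'  # e.g. 'www.baidu.com.html'
    return {'name': name, 'text': res.content}


def call_back(fut):
    page = fut.result()  # re-raises any exception from get_page
    print(page['name'])
    with open(page['name'], 'wb') as f:
        f.write(page['text'])


if __name__ == '__main__':
    pool = ThreadPoolExecutor(2)
    urls = ['http://www.baidu.com', 'http://www.cnblogs.com', 'http://www.taobao.com']
    for url in urls:
        pool.submit(get_page, url).add_done_callback(call_back)
    pool.shutdown()  # wait for all downloads (and their callbacks) to finish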
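
Done callbacks run in a worker thread, and an exception raised inside one is logged rather than propagated. An alternative is `concurrent.futures.as_completed()`, which hands each finished future back to the main thread in completion order. The sketch below replaces the real HTTP request with a hypothetical `fake_get_page()` that only sleeps, so it runs without network access; `crawl()` is likewise a hypothetical name:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_get_page(url, delay):
    """Hypothetical stand-in for get_page(): sleeps instead of making a real request."""
    time.sleep(delay)
    return {"name": url.rsplit("/", 1)[-1] + ".html", "text": b"<html></html>"}


def crawl(jobs):
    """Submit every download, then handle each page as soon as it is ready."""
    names = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {pool.submit(fake_get_page, url, delay): url for url, delay in jobs}
        for fut in as_completed(futures):  # yields futures in completion order
            try:
                page = fut.result()
            except Exception as exc:  # a failed download surfaces here, in the main thread
                print(futures[fut], "failed:", exc)
                continue
            names.append(page["name"])
    return names


if __name__ == "__main__":
    jobs = [("http://www.baidu.com", 0.3), ("http://www.cnblogs.com", 0.1)]
    print(crawl(jobs))  # the faster "page" is usually listed first
```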

Coroutine theory

Process: the unit of resource allocation.
Thread: the smallest unit of execution.
Coroutine: something programmers created themselves; it does not exist in the operating system at all.

Coroutines provide concurrency within a single thread.

Concurrency = switching + saving state.
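
The "switching + saving state" idea can be seen with plain Python generators: each `yield` suspends a function while keeping its local variables, and a loop decides which one to resume next. This only illustrates the principle; gevent itself uses greenlets rather than generators, and `worker()` and `round_robin()` are hypothetical names:

```python
def worker(name, steps):
    """A generator: each yield hands control back while keeping the local state."""
    for i in range(steps):
        yield f"{name} step {i}"  # `i` survives across the switch


def round_robin(*tasks):
    """Tiny scheduler: resume each task in turn until all of them are finished."""
    log = []
    tasks = list(tasks)
    while tasks:
        task = tasks.pop(0)
        try:
            log.append(next(task))  # switch to the task; it resumes where it stopped
            tasks.append(task)      # re-queue it for its next turn
        except StopIteration:
            pass                    # the task is finished; drop it
    return log


print(round_robin(worker("A", 2), worker("B", 3)))
# ['A step 0', 'B step 0', 'A step 1', 'B step 1', 'B step 2']
```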

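In the earlier sections, concurrency meant the operating system switching between processes or threads. With coroutines, the program itself switches between tasks inside a single thread.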

Since everything runs in a single thread, does this save resources? Yes: coroutines are the most economical, processes are the most expensive, and threads sit in between.

The basic goal is to make maximum use of the CPU: whenever one task would wait on I/O, switch to another task that can run.
pip install gevent

import gevent
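
gevent is a third-party package. For comparison, the standard library's `asyncio` provides the same kind of single-threaded concurrency with explicit `async`/`await` instead of monkey-patching. A minimal sketch in which `asyncio.sleep` stands in for real I/O (`fetch()` is a hypothetical name):

```python
import asyncio
import time


async def fetch(name, delay):
    """Pretend I/O: awaiting asyncio.sleep hands control back to the event loop."""
    await asyncio.sleep(delay)
    return name


async def main():
    # Three coroutines in ONE thread; their waits overlap.
    return await asyncio.gather(fetch("a", 0.3), fetch("b", 0.2), fetch("c", 0.1))


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(main()))  # ['a', 'b', 'c'] (gather keeps the argument order)
    print(f"{time.perf_counter() - start:.1f} s")  # about 0.3 s, not 0.6 s
```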

Achieving high concurrency with coroutines

Server:

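from gevent import monkey

monkey.patch_all()  # patch first, so blocking socket calls yield to other greenlets
import gevent
from socket import socket
# from multiprocessing import Process
# from threading import Thread


def talk(conn):
    while True:
        try:
            data = conn.recv(1024)
            if len(data) == 0:  # the client closed the connection
                break
            print(data)
            conn.send(data.upper())
        except Exception as e:  # e.g. ConnectionResetError
            print(e)
            break  # without this, a broken connection would loop forever
    conn.close()


def server(ip, port):
    server = socket()
    server.bind((ip, port))
    server.listen(5)
    while True:
        conn, addr = server.accept()
        # Handle each client in its own greenlet. A process or a thread per client
        # also works, but use only one mechanism per connection:
        # Process(target=talk, args=(conn,)).start()
        # Thread(target=talk, args=(conn,)).start()
        gevent.spawn(talk, conn)


if __name__ == '__main__':
    g1 = gevent.spawn(server, '127.0.0.1', 8080)
    g1.join()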

Client:

import socket
from threading import current_thread, Thread


def socket_client():
    cli = socket.socket()
    cli.connect(('127.0.0.1', 8080))
    while True:
        ss = '%s say hello' % current_thread().name  # getName() is deprecated
        cli.send(ss.encode('utf-8'))
        data = cli.recv(1024)
        print(data)


if __name__ == '__main__':
    # Simulate 5000 concurrent clients, one thread each
    for i in range(5000):
        t = Thread(target=socket_client)
        t.start()
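
For comparison with the gevent server above, a similar upper-casing echo server can be written with the standard library's `asyncio`, which switches between clients whenever one of them awaits I/O rather than relying on monkey-patched sockets. A minimal sketch: `talk()` mirrors the handler above, `client()` and `demo()` are hypothetical helpers, and the server binds to port 0 (any free port) so it does not clash with the original example on 8080:

```python
import asyncio


async def talk(reader, writer):
    """Handle one client: echo each chunk back in upper case until it disconnects."""
    while True:
        data = await reader.read(1024)
        if not data:  # empty bytes means the client closed the connection
            break
        writer.write(data.upper())
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def client(port, message):
    """Hypothetical test client: send one message and return the reply."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message)
    await writer.drain()
    reply = await reader.readexactly(len(message))  # upper() keeps ASCII length
    writer.close()
    await writer.wait_closed()
    return reply


async def demo(n_clients=100):
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(talk, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        replies = await asyncio.gather(
            *(client(port, f"client {i} say hello".encode()) for i in range(n_clients))
        )
    return replies


if __name__ == "__main__":
    replies = asyncio.run(demo())
    print(len(replies), replies[0])  # 100 b'CLIENT 0 SAY HELLO'
```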