Python's global interpreter lock (GIL) and coroutines

Table of Contents

1. Comparison of processes and threads

2. GIL global interpreter lock

1. Introduction

2. Background knowledge

3. What is the GIL interpreter lock?

4. GIL and Lock

5. GIL and multithreading

6. Things to remember

3. Mutex lock

4. Thread queue

1. Why do we still use queues in threads?

2. Use of queues

5. Use of process pool and thread pool

1. What is a process pool and what is a thread pool?

2. What are the benefits of process pool and thread pool?

6. Coroutine

1. The difference between processes, threads, and coroutines

2. Advantages and disadvantages of coroutines

3. Coroutines achieve high concurrency


1. Comparison of processes and threads

  1. Process overhead is much greater than thread overhead.
  2. Data is isolated between processes, but shared between the threads of a single process.
  3. Threads belonging to different processes do not share data.

2. GIL global interpreter lock

1. Introduction

The first thing to make clear is that the GIL is not a feature of the Python language. It is a concept introduced by one implementation of the Python interpreter, CPython. An analogy: C++ is a language (grammar) standard, but it can be compiled into executable code by different compilers, such as GCC, Intel C++, and Visual C++. The same is true for Python: the same piece of code can be executed by different Python runtimes such as CPython, PyPy, and Psyco, and Jython, for example, has no GIL. However, CPython is the default Python runtime in most environments, so in many people's minds CPython is Python, and they take it for granted that the GIL is a defect of the Python language. So let's be clear: the GIL is not a feature of Python, and Python does not depend on the GIL at all.

“Python was designed to have only one thread executing in the main loop at a time. Although multiple threads can be ‘running’ in the Python interpreter, only one thread is actually executing in the interpreter at any moment.”

2. Background knowledge

(1) Python code runs on an interpreter; the interpreter executes (interprets) it
(2) Types of Python interpreters: CPython, IPython, PyPy, Jython, IronPython
(3) The most commonly used interpreter today is CPython
(4) The GIL (global interpreter lock) exists in CPython
(5) Conclusion: the GIL ensures that only one thread executes at a time; a thread must acquire the GIL to get permission to execute

3. What is the GIL interpreter lock

The GIL is essentially a mutex lock, and all mutex locks work the same way: they turn concurrent execution into serial execution, ensuring that shared data can be modified by only one task at a time and thereby keeping the data safe.
Access to the Python interpreter is controlled by the GIL, and it is this lock that ensures only one thread is running at any moment.

One thing is certain: to protect different data, you add different locks.

# Verify that `python test.py` starts only one process
# test.py content:
import os, time
print(os.getpid())
time.sleep(1000)

# Run it:
python3 test.py

# Under Windows
tasklist | findstr python

# Under Linux
ps aux | grep python

In summary: if multiple threads all have target=work, the execution flow is that each thread first acquires access to the interpreter code, i.e., gets execution permission, and then hands its target code to the interpreter code to run.

The interpreter's code is shared by all threads, so the garbage-collection thread may also run interpreter code. This leads to a problem: for the same object 100, thread 1 might execute x = 100 at the very moment the garbage collector is reclaiming 100. There is no clever way to solve this other than locking: the GIL ensures that the Python interpreter executes the code of only one task at a time.

4. GIL and Lock

GIL protects interpreter-level data. To protect the user’s own data, you need to lock it yourself.
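
As an illustration, here is a minimal sketch of a user-level race the GIL does not prevent (the 0.1-second sleep is just a stand-in for any operation that releases the GIL); the mutex section below shows the locked version of the same pattern:

from threading import Thread
import time

n = 10

def work():
    global n
    temp = n
    time.sleep(0.1)  # the GIL is released during the sleep, so every thread still sees n = 10
    n = temp - 1

if __name__ == '__main__':
    threads = [Thread(target=work) for _ in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(n)  # prints 9, not 0: nine of the ten decrements were lost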

5. GIL and multithreading

Because of the GIL, only one thread in a given process executes at any moment.

  • A worker is the CPU: computation is the worker working, and I/O blocking is the process of supplying the raw materials the worker needs. If the raw materials run out, the worker must stop until more arrive.
  • If most of your factory's tasks involve waiting for raw materials (I/O-intensive), then having more workers does not help much; it is better to have a single worker do other jobs while the materials are on the way. Conversely, if your factory has all its raw materials on hand (compute-intensive), then the more workers you have, the higher the throughput.

In conclusion:

For computation, more CPUs help; for I/O, more CPUs are useless. Of course, for running a whole program, execution efficiency still improves as CPUs are added (however small the improvement, there is some), because a real program is almost never pure computation or pure I/O. So we can only judge, relatively, whether a program is compute-intensive or I/O-intensive, and from there analyze whether Python's multithreading is of any use.

  • Suppose we have four tasks to process and we want a concurrent effect. The options are:
    • Option 1: start four processes
    • Option 2: start four threads in one process
  • On a single core:
    • If the four tasks are compute-intensive, there are no extra cores to compute in parallel anyway; Option 1 only adds the cost of creating processes, so Option 2 is better.
    • If the four tasks are I/O-intensive, Option 1 still pays the cost of creating processes, and process switching is much slower than thread switching, so Option 2 is better.
  • On multiple cores:
    • If the four tasks are compute-intensive, multiple cores mean parallel computation, but in Python only one thread per process executes at a time, so multithreading cannot use the extra cores; Option 1 is better.
    • If the four tasks are I/O-intensive, no number of cores solves the I/O waiting problem, so Option 2 is better.

In conclusion:
Today's computers are basically all multi-core. For compute-intensive tasks, Python multithreading brings little performance gain and may even be worse than serial execution (serial avoids a lot of switching). For I/O-intensive tasks, however, Python multithreading improves efficiency significantly.

6. Things to remember

1. Python has the GIL to prevent multiple threads in the same process from actually executing interpreter code at the same time (CPython's memory management is not thread-safe).
2. Python is essentially the only language where you open multiple processes for this; other languages generally do not open multiple processes, since opening multiple threads is enough.
3. The CPython interpreter cannot exploit multiple cores with multiple threads; only multiple processes can take advantage of multiple cores. Other languages do not have this problem.
4. On an 8-core CPU, to make full use of all 8 cores you need at least 8 threads, all doing computation → CPU usage reaches 100%.
5. If there were no GIL, starting 8 threads in one process could make full use of CPU resources and run the CPU at full capacity.
6. A great deal of code and many modules in the CPython interpreter are written around the GIL mechanism and cannot be changed → even with 8 cores, one process can effectively use only 1 core → so we open multiple processes → the threads started under each process can be scheduled and executed by multiple CPUs.
7. CPython interpreter: use multithreading for I/O-intensive work and multiprocessing for compute-intensive work (see the sketch after this list).
# - I/O-intensive: the CPU switches away whenever a thread hits an I/O operation. Suppose you open 8 threads and all 8 perform I/O → I/O does not consume CPU → over a period of time, all 8 threads effectively get executed. Multithreading is the better choice.

# - Compute-intensive: computation consumes CPU. If you open 8 threads, the first thread occupies the CPU and the other 7 are barely scheduled. So open 8 processes instead, each with one thread; the threads under the 8 processes are executed by 8 CPUs, which is far more efficient.
'''For compute-intensive tasks, choose multiprocessing in Python; in other languages, multithreading is chosen instead.'''
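
A minimal benchmark sketch of that rule of thumb, assuming a multi-core machine (the loop count and any timings are illustrative only):

from multiprocessing import Process
from threading import Thread
import time

def count(n):
    # pure computation: burns CPU and rarely releases the GIL
    while n > 0:
        n -= 1

if __name__ == '__main__':
    for worker_cls in (Thread, Process):
        start = time.time()
        workers = [worker_cls(target=count, args=(10_000_000,)) for _ in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(worker_cls.__name__, round(time.time() - start, 2))
    # On a multi-core machine the Process run is typically several times faster,
    # because each process has its own interpreter and its own GIL.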

3. Mutex lock

With multiple threads, if the same data is modified by several threads at the same time, the data can become corrupted.

from threading import Thread, Lock
import time

n = 10

def task(lock):
    lock.acquire()
    global n
    temp = n
    time.sleep(0.5)  # simulate work while holding the lock
    n = temp - 1
    lock.release()

if __name__ == '__main__':
    tt = []
    lock = Lock()
    for i in range(10):
        t = Thread(target=task, args=(lock, ))
        t.start()
        tt.append(t)
    for j in tt:
        j.join()

    print("main", n)  # prints: main 0 (without the lock, every thread would read 10 and it would print: main 9)

Locking trades speed for safety: the locked region runs serially, sacrificing time to keep the data correct.

4. Thread Queue

1. Why queues are still used in threads

Data is shared among the threads of a process, but a queue is essentially a pipe plus a lock, so using a queue keeps that shared data safe.

2. Use of queue

class queue.Queue(maxsize=0) # first in, first out (FIFO)

import queue

q=queue.Queue()
q.put('first')
q.put('second')
q.put('third')

print(q.get())
print(q.get())
print(q.get())
'''
Result (first in, first out):
first
second
third
'''

class queue.LifoQueue(maxsize=0) # last in, first out (LIFO)

import queue

q=queue.LifoQueue()
q.put('first')
q.put('second')
q.put('third')

print(q.get())
print(q.get())
print(q.get())
'''
Result (last in, first out):
third
second
first
'''

class queue.PriorityQueue(maxsize=0) # A queue that can set priority when storing data

import queue

q=queue.PriorityQueue()
# put() takes a tuple; the first element is the priority (usually a number, though any mutually comparable values work). The smaller the value, the higher the priority.
q.put((20,'a'))
q.put((10,'b'))
q.put((30,'c'))

print(q.get())
print(q.get())
print(q.get())
'''
Result (the smaller the number, the higher the priority, and those with higher priority will be dequeued first):
(10, 'b')
(20, 'a')
(30, 'c')
'''

Summary of the queue.Queue API:

Constructor for a priority queue. maxsize is an integer that sets the upperbound limit on the number of items that can be placed in the queue. Insertion will block once this size has been reached, until queue items are consumed. If maxsize is less than or equal to zero, the queue size is infinite.

The lowest valued entries are retrieved first (the lowest valued entry is the one returned by sorted(list(entries))[0]). A typical pattern for entries is a tuple in the form: (priority_number, data).

exception queue.Empty
Exception raised when non-blocking get() (or get_nowait()) is called on a Queue object which is empty.

exception queue.Full
Exception raised when non-blocking put() (or put_nowait()) is called on a Queue object which is full.

Queue.qsize()   # return the approximate size of the queue
Queue.empty()   # return True if the queue is empty
Queue.full()    # return True if the queue is full

Queue.put(item, block=True, timeout=None)
Put item into the queue. If optional args block is true and timeout is None (the default), block if necessary until a free slot is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Full exception if no free slot was available within that time. Otherwise (block is false), put an item on the queue if a free slot is immediately available, else raise the Full exception (timeout is ignored in that case).

Put an item into the queue. If the optional argument block is true and timeout is None (the default), blocks if necessary until a free slot becomes available. If the parameter timeout is a positive number, it blocks for up to timeout seconds. If there are no free slots available during this time, a Full exception will be thrown. Otherwise (block is false), if a free slot is available, an item is placed in the queue, otherwise a Full exception is raised (in this case, timeout is ignored).

Queue.put_nowait(item)
Equivalent to put(item, False).

Queue.get(block=True, timeout=None)
Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).

Removes and returns an item from the queue. If the optional argument block is true and timeout is None (the default), blocks if necessary until an item is available. If timeout is a positive number, it will block for up to timeout seconds, and if no items are available within that time, an Empty exception will be thrown. Otherwise (block is false), if an item is available, that item is returned, otherwise an Empty exception is raised (in this case, timeout is ignored).

Queue.get_nowait()
Equivalent to get(False).

Two methods are offered to support tracking whether enqueued tasks have been fully processed by daemon consumer threads.

Two methods are provided to support tracking whether tasks entering the queue have been fully processed by the producer's daemon thread.

Queue.task_done()
Indicate that a formerly enqueued task is complete. Used by queue consumer threads. For each get() used to fetch a task, a subsequent call to task_done() tells the queue that the processing on the task is complete.

Tasks previously enqueued are assumed to have completed. and used by queue producers. For each get() used to obtain a task, subsequent calls to task_done() will tell the queue that processing of the task is complete.

If a join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue).

If join() is currently blocking, it will resume when all items have been processed (this means that a task_done() call is received for each item that has been put() into the queue).

Raises a ValueError if called more times than there were items placed in the queue.

If the number of calls exceeds the number of items put into the queue, a ValueError will be raised.

Queue.join()

Block until all items in the queue have been retrieved and processed.
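
A minimal producer/consumer sketch tying task_done() and join() together, assuming a single daemon worker thread:

import queue
from threading import Thread

q = queue.Queue()

def worker():
    while True:
        item = q.get()             # blocks until an item is available
        print('processing', item)
        q.task_done()              # tell the queue this item is fully processed

Thread(target=worker, daemon=True).start()

for i in range(3):
    q.put(i)

q.join()   # blocks until task_done() has been called once per put()
print('all tasks done')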

5. Use of process pool and thread pool

1. What is a process pool and what is a thread pool

Pool: a container type that can hold multiple elements
Process pool: define a pool in advance and add processes to it; afterwards, just put tasks into the pool, and any free process in the pool executes the task.
Thread pool: define a pool in advance and add threads to it; afterwards, just put tasks into the pool, and any free thread in the pool executes the task.

2. What are the benefits of process pool and thread pool

Process pools and thread pools are common techniques in concurrent programming; both manage and reuse workers that execute multiple concurrent tasks.

  • Improving performance:
    • Process pools and thread pools can perform multiple tasks in parallel, thereby improving the overall performance of your program. By distributing tasks to multiple processes or threads, you can take advantage of the parallel computing power of multi-core processors to speed up task execution
  • Resource Management:

    • Process pools and thread pools manage and reuse system resources, avoiding the overhead of frequently creating and destroying processes or threads. Creating and destroying a process or thread consumes system resources; a pool pre-creates a fixed number of them when the program starts and reuses them whenever a task needs to run, reducing wasted resources.

  • Control concurrency:

    • Process pools and thread pools can cap the number of concurrent tasks, preventing excessive consumption of system resources. Limiting concurrency avoids resource contention and overload, keeping the program stable and reliable.

  • Simplified programming:

    • Process pools and thread pools provide high-level interfaces and abstractions that simplify the complexity of concurrent programming. By using process pools and thread pools, developers can focus on the implementation of tasks without paying attention to underlying concurrency details, improving development efficiency and code readability

In general, process pools and thread pools improve program performance, simplify concurrent programming, and optimize resource management; they are standard tools in concurrent programming. Whether to use a process pool or a thread pool is a trade-off that depends on the specific application scenario and its needs.

"""Open a process pool"""
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


def task(n, m):
    return n + m

def task1():
    return {'username': 'kevin', 'password': 123}


def callback(res):
    print(res)            # <Future at 0x1ed5a5e5610 state=finished returned int>
    print(res.result())   # 3

def callback1(res):
    print(res)            # <Future at 0x1ed5a5e5610 state=finished returned dict>
    print(res.result())   # {'username': 'kevin', 'password': 123}
    print(res.result().get('username'))  # kevin

if __name__ == '__main__':
    # 1. Define a process pool with 3 processes in it
    pool = ProcessPoolExecutor(3)
    # 2. Throw tasks into the pool; each callback runs when its task finishes
    pool.submit(task, m=1, n=2).add_done_callback(callback)
    pool.submit(task1).add_done_callback(callback1)
    pool.shutdown()  # join + close: wait for all tasks to finish
    print(123)
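
For comparison, a minimal thread-pool sketch (same concurrent.futures interface; thread pools are the usual choice for I/O-intensive tasks):

from concurrent.futures import ThreadPoolExecutor

def task(n, m):
    return n + m

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=3) as pool:        # pool of 3 threads
        results = pool.map(task, [1, 2, 3], [10, 20, 30])  # task(1, 10), task(2, 20), task(3, 30)
    print(list(results))  # [11, 22, 33]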

6. Coroutine

1. The difference between processes, threads, and coroutines

Process: the unit of resource allocation
Thread: the smallest unit of execution (the unit the CPU schedules)

Coroutine: concurrency within a single thread (invented by programmers; it does not exist as such in the operating system)

Concurrency: switching + saving state (before coroutines, concurrent switching meant switching between processes or threads)

Coroutines consume the fewest resources, processes the most, with threads in between.

2. Advantages and Disadvantages of Coroutines

  • Advantages:
    • Coroutine switching has lower overhead: it is program-level switching that the operating system cannot perceive at all, so it is more lightweight.
    • A concurrency effect can be achieved within a single thread, making maximal use of the CPU.
  • Disadvantages:
    • Coroutines are by nature single-threaded and cannot use multiple cores; however, one program can open multiple processes, each process multiple threads, and each thread multiple coroutines.
    • Because coroutines live in a single thread, once one coroutine blocks, the entire thread is blocked.

Summary:

  1. Concurrency is achieved within a single thread
  2. Modifying shared data does not require locking
  3. The user program saves multiple control-flow context stacks
  4. Additionally: a coroutine automatically switches to another coroutine when it encounters an I/O operation (yield and greenlet cannot detect I/O, so the gevent module, built on the select mechanism, is used; see the sketch below)
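
A minimal gevent sketch of “switching + saving state” on I/O, assuming gevent is installed (pip install gevent); the sleeps stand in for real I/O:

from gevent import monkey; monkey.patch_all()  # patch blocking calls so gevent can switch on them
import gevent
import time

def eat():
    print('eat 1')
    time.sleep(2)   # patched by monkey.patch_all: gevent switches away instead of blocking
    print('eat 2')

def play():
    print('play 1')
    time.sleep(1)
    print('play 2')

start = time.time()
g1 = gevent.spawn(eat)
g2 = gevent.spawn(play)
g1.join()
g2.join()
print(time.time() - start)  # about 2 seconds, not 3: the two waits overlapped in one thread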

3. Coroutines achieve high concurrency

Server:
from gevent import monkey; monkey.patch_all()  # patch blocking calls before importing socket
import gevent
from socket import socket
# from multiprocessing import Process
# from threading import Thread


def talk(conn):
    while True:
        try:
            data = conn.recv(1024)
            if len(data) == 0: break
            print(data)
            conn.send(data.upper())
        except Exception as e:
            print(e)
            break  # stop on error instead of spinning on a dead connection
    conn.close()


def server(ip, port):
    server = socket()
    server.bind((ip, port))
    server.listen(5)
    while True:
        conn, addr = server.accept()
        # t=Process(target=talk,args=(conn,))
        # t=Thread(target=talk,args=(conn,))
        #t.start()
        gevent.spawn(talk, conn)


if __name__ == '__main__':
    g1 = gevent.spawn(server, '127.0.0.1', 8080)
    g1.join()

Client:
import socket
from threading import current_thread, Thread


def socket_client():
    cli = socket.socket()
    cli.connect(('127.0.0.1', 8080))
    while True:
        ss = '%s say hello' % current_thread().name
        cli.send(ss.encode('utf-8'))
        data = cli.recv(1024)
        print(data)


for i in range(5000):
    t = Thread(target=socket_client)
    t.start()
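
Run the server first, then the client: a single server thread handles all 5000 client connections concurrently, because monkey.patch_all() makes recv() and send() cooperative, so gevent switches to another coroutine whenever one connection blocks on I/O.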