GIL global interpreter lock


1. Introduction:

First of all, you must understand that the GIL is not a feature of the Python language. What we usually call the Python interpreter is actually the CPython interpreter, because most Python programs run on it; there are other interpreters as well, such as Jython (written in Java). The GIL is a feature of the CPython interpreter, not of Python itself.

GIL stands for Global Interpreter Lock. The official explanation:

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

Conclusion: in the CPython interpreter, when multiple threads are started in the same process, only one of them can execute Python bytecode at any given moment, so threads cannot take advantage of multiple cores.

2. What are the commonly used types of Python interpreters?

1. CPython

When you download and install Python 2.7 from the official Python website, you get the official interpreter, CPython. It is written in C, hence the name. Running python on the command line starts the CPython interpreter. CPython is the most widely used Python interpreter.

2. IPython

IPython is an interactive interpreter built on top of CPython. It only enhances the interactive experience; executing Python code works exactly as it does in CPython. By analogy, many Chinese-made browsers look different on the surface, but underneath they all call the IE engine.

3. PyPy

PyPy is another Python interpreter whose goal is execution speed. PyPy uses JIT (just-in-time) compilation to compile Python code dynamically, so it can significantly speed up the execution of Python code.

4. Jython

Jython is a Python interpreter running on the Java platform, which can directly compile Python code into Java bytecode for execution.

5. IronPython

IronPython is similar to Jython, except that it is a Python interpreter running on the Microsoft .NET platform and can compile Python code directly into .NET bytecode.

To sum up, these are the five commonly used Python interpreters. Among them, CPython is the most widely used. For Python compilation
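To check which of these interpreters a script is actually running on, the standard library can report it. The following is a minimal sketch; the helper name `describe_interpreter` is an illustrative assumption, not something from the original article.

```python
import platform
import sys


def describe_interpreter():
    """Return (implementation name, version string) of the running interpreter."""
    # platform.python_implementation() returns a name such as
    # 'CPython', 'PyPy', 'Jython' or 'IronPython'.
    return platform.python_implementation(), platform.python_version()


name, version = describe_interpreter()
print(f"Running on {name} {version}")
# sys.implementation.name reports the implementation in lowercase form.
print(sys.implementation.name)
```

Since the GIL belongs to CPython, a check like this is a quick way to tell whether the GIL discussion below applies to the interpreter you are using.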

3. Introduction to GIL

The GIL is essentially a mutex. Like any mutex, it turns concurrent execution into serial execution, so that shared data can be modified by only one task at a time, which keeps the data safe.

The effect of the GIL is that only one thread can hold it at any given time. Only the thread that holds the GIL can use the interpreter to run code.

A small example illustrates part of the GIL's role:

Suppose there were no GIL and two threads were running: one creates a piece of data and binds it to a variable, and the other runs the garbage collection mechanism. A problem can arise here. The first thread may have created the data but not yet bound its memory address to a variable when the garbage collection thread reclaims it, because the data's reference count is still 0. Both threads belong to the same process and run concurrently, so if execution really happened this way, it would put the program at serious risk.
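The reference count mentioned above can be observed directly in CPython. The sketch below is CPython-specific (`sys.getrefcount` is not available on every interpreter), and the function name `refcount_delta` is an illustrative assumption. It shows that storing one more reference to an object raises its count by one; this is the bookkeeping that the GIL keeps consistent across threads.

```python
import sys


def refcount_delta():
    """Return how much an object's reference count grows when one more reference is stored."""
    data = object()
    before = sys.getrefcount(data)
    holders = [data]               # the list now holds one extra reference to data
    after = sys.getrefcount(data)
    del holders                    # dropping the list releases that reference again
    return after - before


print(refcount_delta())
```

The absolute numbers returned by `sys.getrefcount` include temporary references made by the call itself, which is why the sketch compares two readings instead of printing a single one.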

4. GIL and Lock

You may wonder: since the GIL already ensures that only one thread runs at a time, why do we still need Lock?

First, we need to agree on one thing: the purpose of a lock is to protect shared data, so that only one thread can modify that data at a time.

From this we can conclude that different data should be protected by different locks.

Now the answer becomes clear. The GIL and Lock are two locks that protect different data. The former works at the interpreter level and protects interpreter-level data, such as the data used by garbage collection. The latter protects the data of the application that the user develops. The GIL is clearly not responsible for that data, so users have to do the locking themselves, that is, use Lock.

The GIL protects interpreter-level data. To protect your own data, you need to add a lock yourself.
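The original illustration for this point appears to be missing, so here is a minimal sketch of the scenario analyzed next: 100 threads each add 1 to a global variable `count` while holding a `threading.Lock`. The names `add_one` and `run` and the short `time.sleep` are assumptions made for illustration.

```python
import threading
import time

count = 0
lock = threading.Lock()  # user-level lock protecting the shared variable count


def add_one():
    global count
    with lock:                    # only one thread at a time may run this block
        current = count
        time.sleep(0.001)         # blocking here releases the GIL, but not the Lock
        count = current + 1


def run(n_threads=100):
    """Start n_threads threads that each add 1 to count, and return the final value."""
    global count
    count = 0
    threads = [threading.Thread(target=add_one) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count


if __name__ == "__main__":
    print(run())
```

Because the read and the write both happen while the Lock is held, no update is lost even though the GIL changes hands during the sleep.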


Let’s walk through the question:

  1. Suppose there are 100 threads. One of them grabs the GIL; call it thread 1.
  2. Now that thread 1 holds the GIL and can use the interpreter, it acquires a mutex (Lock) inside the function before performing the +1 operation on the global variable count.
  3. Before thread 1 has finished the +1 operation, thread 2 grabs the GIL. Thread 2 now holds the GIL and wants to perform the +1 operation too, but when it runs the function it finds that the Lock has not been released, so it blocks and gives up the GIL.
  4. When thread 1 gets the GIL again, it resumes the +1 operation from where it was paused. Once it is done, it releases the Lock and the GIL. The other threads then repeat steps 1 to 4.

To protect different data, you need to add different locks.
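To see why the GIL alone does not protect user data, consider the same +1 update without a Lock. This is a sketch under assumptions: the function name `run_without_lock` is hypothetical, and the `time.sleep` call only widens the window in which threads switch, so that the lost updates become easy to observe.

```python
import threading
import time


def run_without_lock(n_threads=20):
    """Increment a shared counter from n_threads threads with no Lock and return the result."""
    state = {"count": 0}

    def add_one():
        current = state["count"]       # read the shared value
        time.sleep(0.01)               # other threads run here and read the same value
        state["count"] = current + 1   # write back, possibly overwriting other updates

    threads = [threading.Thread(target=add_one) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


if __name__ == "__main__":
    # Usually prints a value well below 20, because many updates are lost.
    print(run_without_lock())
```

Guarding the read and the write with a `threading.Lock` removes the lost updates, which is exactly the job the GIL does not do for application data.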

5. GIL and multi-threading

Because of the GIL, only one thread in a process executes at any given moment.

Processes are expensive to create. Threads are cheap, but they cannot take advantage of multiple cores. Is Python doomed?
To answer this question, we need to agree on a few points:

  1. Is the CPU used for computation or for I/O? It is used for computation.
  2. Multiple CPUs mean that multiple cores can compute in parallel, so multi-core improves computing performance.
  3. Once a CPU hits blocking I/O it still has to wait, so multiple cores help little with I/O operations.

So in Python, multiple threads run concurrently, while multiple processes can run in parallel.

For I/O-bound work in Python, threads therefore use fewer resources and are faster than processes.

For compute-bound work, multiple processes have the advantage.

Code example: compute-intensive work, where multiple processes are more efficient

from multiprocessing import Process
from threading import Thread
import os, time


def work():
    # CPU-bound loop: res stays 0, the loop exists only to keep the CPU busy
    res = 0
    for i in range(100000000):
        res *= i

if __name__ == '__main__':
    l = []
    print(os.cpu_count()) # This machine has 8 cores
    start = time.time()
    for i in range(4):
        p = Process(target=work)  # Multiple processes, takes more than 6s
        # p = Thread(target=work)  # Multi-threading, takes more than 19s
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop = time.time()
    print('run time is %s' % (stop - start))
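
As a complement to the measurement above, the following sketch times the same CPU-bound function run sequentially and run in several threads within one process. The function names and the loop size are illustrative assumptions. On a CPython build with the GIL, the threaded run is generally no faster than the sequential one; actual numbers depend on the machine and the Python version, so none are claimed here.

```python
import threading
import time


def burn(n=2_000_000):
    """Pure computation with no I/O, so the thread never waits on anything."""
    total = 0
    for i in range(n):
        total += i
    return total


def timed(fn):
    """Return the wall-clock seconds that fn() takes."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_sequential(tasks=4):
    for _ in range(tasks):
        burn()


def run_threaded(tasks=4):
    threads = [threading.Thread(target=burn) for _ in range(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    print(f"sequential: {timed(run_sequential):.2f}s")
    print(f"threaded:   {timed(run_threaded):.2f}s")
```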

Code example: I/O-intensive work, with the I/O wait simulated by sleep, where thread concurrency is more efficient

from multiprocessing import Process
from threading import Thread
import os, time


def work():
    # Simulate a blocking I/O wait
    time.sleep(4)


if __name__ == '__main__':
    l = []
    print(os.cpu_count()) # This machine has 8 cores
    start = time.time()
    for i in range(4):
        # p = Process(target=work)  # Multiple processes, takes about 4.1s
        p = Thread(target=work)  # Multi-threading, takes about 4.005s
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop = time.time()
    print('run time is %s' % (stop - start))

Because starting a process costs much more than starting a thread, multi-threading has the advantage for I/O-intensive operations like this.

Examples of application scenarios:

Multi-threading suits I/O-intensive tasks such as sockets, crawlers, and web applications.
Multiple processes suit compute-intensive tasks such as financial analysis.
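The same idea can be sketched with a thread pool. Note that `concurrent.futures.ThreadPoolExecutor` is not used in the original article; it is swapped in here as a convenient standard-library way to run blocking tasks in threads. The names `fake_io` and `run_io_tasks` and the 0.2 second delay are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Stand-in for a blocking I/O call; sleeping releases the GIL."""
    time.sleep(delay)
    return delay


def run_io_tasks(n_tasks=4, delay=0.2):
    """Run n_tasks simulated I/O waits in a thread pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        results = list(pool.map(fake_io, [delay] * n_tasks))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_io_tasks()
    # The waits overlap, so the total is close to one delay, not four.
    print(f"{len(results)} tasks finished in {elapsed:.2f}s")
```

While one thread is blocked in its wait, the GIL is free for the others, which is why the waits overlap instead of adding up.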

Summary

  1. Because of the GIL, multi-threading only delivers better performance in I/O-heavy scenarios.
  2. If you need high parallel computing performance, consider rewriting the core part as a C module, or use multiple processes.
  3. The GIL will continue to exist for a long time, but it will keep being improved.