Learn how to speed up Python running, and learn how to run it much faster!

Python is a scripting language that has some shortcomings in efficiency and performance compared to compiled languages such as C/C++.
However, there are many times when Python’s efficiency is not as exaggerated as imagined. This article summarizes some tips for speeding up the running of Python code.

Article directory

  • 0. Code optimization principles
          • The first basic principle: don’t optimize prematurely
          • The second basic principle: weigh the cost of optimization
          • The third principle: don’t optimize the parts that don’t matter
  • 1. Avoid global variables
  • 2. Avoid `.`
      • 2.1 Avoid module and function attribute access
      • 2.2 Avoid intra-class attribute access
    • 3. Avoid unnecessary abstractions
    • 4. Avoid data duplication
      • 4.1 Avoid meaningless data copying
      • 4.2 Do not use intermediate variables when exchanging values
      • 4.3 Use join instead of + for string concatenation
    • 5. Take advantage of the short-circuit characteristics of `if` conditions
    • 6. Loop optimization
      • 6.1 Use `for` loop instead of `while` loop
      • 6.2 Use implicit `for` loops instead of explicit `for` loops
      • 6.3 Reduce the calculation of the inner `for` loop
    • 7. Using `numba.jit
    • 8. Choose the right data structure

0. Code optimization principles

This article will introduce many techniques to speed up the running of Python code. Before going into the details of code optimization, you need to understand some basic principles of code optimization.

The first basic principle: don’t optimize prematurely

Many people start writing code with the goal of performance optimization. “It is much easier to make a correct program faster than to make a fast program correct.” Therefore, the prerequisite for optimization is that the code can work properly. Optimizing prematurely may neglect grasping the overall performance indicators. Don’t reverse priorities before getting global results.

The second basic principle: weigh the cost of optimization

Optimization comes at a cost, and it is almost impossible to solve all performance problems. The choice usually faced is time for space or space for time. In addition, development costs also need to be considered.

The third principle: Don’t optimize those irrelevant parts

If every part of the code were to be optimized, these changes would make the code difficult to read and understand. If your code is running slowly, first find where the code is slow, usually the inner loop, and focus on optimizing where it is slow. Elsewhere, a little loss of time makes little difference.

1. Avoid global variables

# This writing method is not recommended. Code time: 26.8 seconds
import math

size=10000
for x in range(size):
    for y in range(size):
        z = math.sqrt(x) + math.sqrt(y)

Many programmers will start by writing some simple scripts in Python. When writing scripts, they are usually accustomed to writing them directly as global variables, such as the above code. However, due to the different implementations of global variables and local variables, code defined in the global scope will run much slower than code defined in a function. By putting script statements into functions, you can typically get a 15% – 30% speedup.

# Recommended writing method. Code time: 20.6 seconds
import math

def main(): # Define it into a function to reduce the use of all variables
    size=10000
    for x in range(size):
        for y in range(size):
            z = math.sqrt(x) + math.sqrt(y)

main()

2. Avoid .

2.1 Avoid module and function attribute access

# This writing method is not recommended. Code time: 14.5 seconds
import math

def computeSqrt(size: int):
    result = []
    for i in range(size):
        result.append(math.sqrt(i))
    return result

def main():
    size=10000
    for _ in range(size):
        result = computeSqrt(size)

main()

Each time . (attribute access operator) is used, specific methods, such as __getattribute__() and __getattr__(), will be triggered. Dictionary operations are performed, so additional time overhead is incurred. Attribute access can be eliminated through the from import statement.

# Optimize the writing method for the first time. Code time: 10.9 seconds
from math import sqrt

def computeSqrt(size: int):
    result = []
    for i in range(size):
        result.append(sqrt(i)) # Avoid the use of math.sqrt
    return result

def main():
    size=10000
    for _ in range(size):
        result = computeSqrt(size)

main()

In Section 1, we mentioned that local variables are searched faster than global variables, so for frequently accessed variables sqrt, you can speed up the operation by changing them to local variables.

# Second optimization of writing. Code time: 9.9 seconds
import math

def computeSqrt(size: int):
    result = []
    sqrt = math.sqrt #Assign value to local variable
    for i in range(size):
        result.append(sqrt(i)) # Avoid the use of math.sqrt
    return result

def main():
    size=10000
    for _ in range(size):
        result = computeSqrt(size)

main()

In addition to math.sqrt, there is also . in the computeSqrt function, which is the that calls list >append method. By assigning this method to a local variable, the use of . inside the for loop in the computeSqrt function can be completely eliminated.

# Recommended writing method. Code time: 7.9 seconds
import math

def computeSqrt(size: int):
    result = []
    append = result.append
    sqrt = math.sqrt #Assign value to local variable
    for i in range(size):
        append(sqrt(i)) # Avoid the use of result.append and math.sqrt
    return result

def main():
    size=10000
    for _ in range(size):
        result = computeSqrt(size)

main()

2.2 Avoid intra-class attribute access

# This writing method is not recommended. Code time: 10.4 seconds
import math
from typing import List

classDemoClass:
    def __init__(self, value: int):
        self._value = value
    
    def computeSqrt(self, size: int) -> List[float]:
        result = []
        append = result.append
        sqrt = math.sqrt
        for _ in range(size):
            append(sqrt(self._value))
        return result

def main():
    size=10000
    for _ in range(size):
        demo_instance = DemoClass(size)
        result = demo_instance.computeSqrt(size)

main()

The principle of avoiding . also applies to in-class properties. Accessing self._value will be slower than accessing a local variable. By assigning frequently accessed in-class properties to a local variable, you can speed up your code.

# Recommended writing method. Code time: 8.0 seconds
import math
from typing import List

classDemoClass:
    def __init__(self, value: int):
        self._value = value
    
    def computeSqrt(self, size: int) -> List[float]:
        result = []
        append = result.append
        sqrt = math.sqrt
        value = self._value
        for _ in range(size):
            append(sqrt(value)) # Avoid the use of self._value
        return result

def main():
    size=10000
    for _ in range(size):
        demo_instance = DemoClass(size)
        demo_instance.computeSqrt(size)

main()

3. Avoid unnecessary abstractions

# Not recommended, code takes 0.55 seconds
classDemoClass:
    def __init__(self, value: int):
        self.value = value

    @property
    def value(self) -> int:
        return self._value

    @value.setter
    def value(self, x: int):
        self._value = x

def main():
    size=1000000
    for i in range(size):
        demo_instance = DemoClass(size)
        value = demo_instance.value
        demo_instance.value = i

main()

Any time you wrap code with additional layers of processing (such as decorators, property access, descriptors), you’re going to make the code slower. In most cases, it is necessary to re-examine the definition of using property accessors. Using getter/setter functions to access properties is usually a coding style left by C/C++ programmers. If it’s really not necessary, use simple attributes.

# Recommended writing method, code time: 0.33 seconds
classDemoClass:
    def __init__(self, value: int):
        self.value = value # Avoid unnecessary property accessors

def main():
    size=1000000
    for i in range(size):
        demo_instance = DemoClass(size)
        value = demo_instance.value
        demo_instance.value = i

main()

4. Avoid data duplication

4.1 Avoid meaningless data copy

# This writing method is not recommended. The code takes 6.5 seconds.
def main():
    size=10000
    for _ in range(size):
        value = range(size)
        value_list = [x for x in value]
        square_list = [x * x for x in value_list]

main()

The value_list in the above code is completely unnecessary and would create unnecessary data structures or copies.

# Recommended writing method, code time: 4.8 seconds
def main():
    size=10000
    for _ in range(size):
        value = range(size)
        square_list = [x * x for x in value] # Avoid meaningless copying

main()

Another situation is that you are too paranoid about Python’s data sharing mechanism, do not understand or trust Python’s memory model well, and abuse functions such as copy.deepcopy(). Usually the copy operation can be eliminated in these codes.

4.2 Do not use intermediate variables when exchanging values

# Not recommended, code takes 0.07 seconds
def main():
    size=1000000
    for _ in range(size):
        a=3
        b = 5
        temp=a
        a = b
        b=temp

main()

The above code creates a temporary variable temp when exchanging values. Without the help of intermediate variables, the code is more concise and runs faster.

# Recommended writing method, code time: 0.06 seconds
def main():
    size=1000000
    for _ in range(size):
        a=3
        b = 5
        a, b = b, a # without intermediate variables

main()

4.3 Use join instead of + for string concatenation

# This writing method is not recommended. The code takes 2.6 seconds.
import string
from typing import List

def concatString(string_list: List[str]) -> str:
    result = ''
    for str_i in string_list:
        result + = str_i
    return result

def main():
    string_list = list(string.ascii_letters * 100)
    for _ in range(10000):
        result = concatString(string_list)

main()

When using a + b to splice strings, since strings in Python are immutable objects, they will allocate a memory space and combine a and b are copied to the newly applied memory space respectively. Therefore, if you want to splice n strings, n-1 intermediate results will be generated. Each intermediate result needs to apply for and copy memory, which seriously affects the operating efficiency. . When using join() to splice strings, the total memory space that needs to be applied for will be calculated first, then the required memory will be applied for all at once, and each string element will be copied to the memory. go.

# Recommended writing method, code time: 0.3 seconds
import string
from typing import List

def concatString(string_list: List[str]) -> str:
    return ''.join(string_list) # Use join instead of +

def main():
    string_list = list(string.ascii_letters * 100)
    for _ in range(10000):
        result = concatString(string_list)

main()

5. Utilize the short-circuit characteristics of if conditions

# Not recommended, code takes 0.05 seconds
from typing import List

def concatString(string_list: List[str]) -> str:
    abbreviations = {<!-- -->'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
    abbr_count = 0
    result = ''
    for str_i in string_list:
        if str_i in abbreviations:
            result + = str_i
    return result

def main():
    for _ in range(10000):
        string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.\ ']
        result = concatString(string_list)

main()

The short-circuit characteristic of if conditions refers to statements such as if a and b. When a is False, Return directly without calculating b; for statements such as if a or b, when a is True Will be returned directly without calculating b. Therefore, in order to save running time, for the or statement, variables with a higher probability of being True should be written before or, and and should be pushed back.

# Recommended writing method, code takes 0.03 seconds
from typing import List

def concatString(string_list: List[str]) -> str:
    abbreviations = {<!-- -->'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
    abbr_count = 0
    result = ''
    for str_i in string_list:
        if str_i[-1] == '.' and str_i in abbreviations: # Take advantage of the short circuit characteristic of if condition
            result + = str_i
    return result

def main():
    for _ in range(10000):
        string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.\ ']
        result = concatString(string_list)

main()

6. Loop optimization

6.1 Use for loop instead of while loop

# This writing method is not recommended. Code time: 6.7 seconds
def computeSum(size: int) -> int:
    sum_ = 0
    i = 0
    while i < size:
        sum_ + = i
        i+=1
    return sum_

def main():
    size=10000
    for _ in range(size):
        sum_ = computeSum(size)

main()

Python’s for loop is much faster than the while loop.

# Recommended writing method. Code time: 4.3 seconds
def computeSum(size: int) -> int:
    sum_ = 0
    for i in range(size): # for loop instead of while loop
        sum_ + = i
    return sum_

def main():
    size=10000
    for _ in range(size):
        sum_ = computeSum(size)

main()

6.2 Use implicit for loops instead of explicit for loops

For the above example, you can go one step further and use an implicit for loop instead of an explicit for loop.

# Recommended writing method. Code time: 1.7 seconds
def computeSum(size: int) -> int:
    return sum(range(size)) # Implicit for loop instead of explicit for loop

def main():
    size=10000
    for _ in range(size):
        sum = computeSum(size)

main()

6.3 Reduce the calculation of the inner for loop

# This writing method is not recommended. Code time: 12.8 seconds
import math

def main():
    size=10000
    sqrt = math.sqrt
    for x in range(size):
        for y in range(size):
            z = sqrt(x) + sqrt(y)

main()

In the above code, sqrt(x) is located inside the for loop. It will be recalculated during each training process, which increases the time overhead.

# Recommended writing method. Code time: 7.0 seconds
import math

def main():
    size=10000
    sqrt = math.sqrt
    for x in range(size):
        sqrt_x = sqrt(x) # Reduce the calculation of the inner for loop
        for y in range(size):
            z = sqrt_x + sqrt(y)

main()

7. Using `numba.jit

We follow the example introduced above and use numba.jit on this basis. numba can JIT compile Python functions into machine code for execution, greatly improving the code running speed. For more information about numba see the home page below:

http://numba.pydata.org/numba.pydata.org

# Recommended writing method. Code time: 0.62 seconds
import numba

@numba.jit
def computeSum(size: float) -> int:
    sum = 0
    for i in range(size):
        sum + = i
    return sum

def main():
    size=10000
    for _ in range(size):
        sum = computeSum(size)

main()

8. Choose the appropriate data structure

Python’s built-in data structures such as str, tuple, list, set, dict The bottom layer is implemented in C, which is very fast. It is almost impossible to implement a new data structure yourself and achieve the built-in speed in terms of performance.

list is similar to std::vector in C++ and is a dynamic array. It will pre-allocate a certain amount of memory space. When the pre-allocated memory space is used up and elements are added to it, it will apply for a larger memory space, then copy all the original elements there, and then destroy the previous memory. space before inserting new elements. The operation is similar when deleting elements. When the used memory space is less than half of the pre-allocated memory space, an additional small memory will be applied for, an element copy will be made, and then the original large memory space will be destroyed. Therefore, if there are frequent addition and deletion operations, and the number of added and deleted elements is large, the efficiency of the list will not be high. In this case, you should consider using collections.deque. collections.deque is a double-ended queue that has the characteristics of both stack and queue. It can perform insertion and deletion operations of O(1) complexity at both ends.

The search operation of list is also very time-consuming. When you need to frequently search for certain elements in list, or frequently access these elements in order, you can use bisect to maintain the order of the list object and store it in order. A binary search is performed to improve search efficiency.

Another common requirement is to find minimum or maximum values. At this time, you can use the heapq module to convert list into a heap, so that the time complexity of obtaining the minimum value is O(1).

Interested friends will receive a complete set of Python learning materials, including interview questions, resume information, etc. See below for details.

1. Python learning routes in all directions

The technical points in all directions of Python have been compiled to form a summary of knowledge points in various fields. Its usefulness is that you can find corresponding learning resources according to the following knowledge points to ensure that you learn more comprehensively.

img
img

2. Python essential development tools

The tools have been organized for you, and you can get started directly after installation! img

3. Latest Python study notes

When I learn a certain basic and have my own understanding ability, I will read some books or handwritten notes compiled by my seniors. These notes record their understanding of some technical points in detail. These understandings are relatively unique and can be learned. to a different way of thinking.

img

4. Python video collection

Watch a comprehensive zero-based learning video. Watching videos is the fastest and most effective way to learn. It is easy to get started by following the teacher’s ideas in the video, from basic to in-depth.

img

5. Practical cases

What you learn on paper is ultimately shallow. You must learn to type along with the video and practice it in order to apply what you have learned into practice. At this time, you can learn from some practical cases.

img

6. Interview Guide

Resume template

If there is any infringement, please contact us for deletion.