Python is a scripting language that has some shortcomings in efficiency and performance compared to compiled languages such as C/C++. However, there are many times when Python’s efficiency is not as exaggerated as imagined. This article summarizes some tips for speeding up the running of Python code.
0. Code optimization principles
This article will introduce many techniques to speed up the running of Python code. Before going into the details of code optimization, you need to understand some basic principles of code optimization.
The first basic principle: don’t optimize prematurely
Many people start writing code with the goal of performance optimization. “It is much easier to make a correct program faster than to make a fast program correct.” Therefore, the prerequisite for optimization is that the code can work properly. Optimizing prematurely may neglect grasping the overall performance indicators. Don’t reverse priorities before getting global results.
The second basic principle: weigh the cost of optimization
Optimization comes at a cost, and it is almost impossible to solve all performance problems. The choice usually faced is time for space or space for time. In addition, development costs also need to be considered.
The third principle: Don’t optimize those irrelevant parts
If every part of the code were to be optimized, these changes would make the code difficult to read and understand. If your code is running slowly, first find where the code is slow, usually the inner loop, and focus on optimizing where it is slow. Elsewhere, a little loss of time makes little difference.
1. Avoid global variables
# This writing method is not recommended. Code time: 26.8 seconds import math size=10000 for x in range(size): for y in range(size): z = math.sqrt(x) + math.sqrt(y)
Many programmers will start by writing some simple scripts in Python. When writing scripts, they are usually accustomed to writing them directly as global variables, such as the above code. However, due to the different implementations of global variables and local variables, code defined in the global scope will run much slower than code defined in a function. By putting script statements into functions, you can typically get a 15% – 30% speedup.
# Recommended writing method. Code time: 20.6 seconds import math def main(): # Define it into a function to reduce the use of all variables size=10000 for x in range(size): for y in range(size): z = math.sqrt(x) + math.sqrt(y) main()
2. Avoid .
2.1 Avoid module and function attribute access
# This writing method is not recommended. Code time: 14.5 seconds import math def computeSqrt(size: int): result = [] for i in range(size): result.append(math.sqrt(i)) return result def main(): size=10000 for _ in range(size): result = computeSqrt(size) main()
Each time .
(attribute access operator) is used, specific methods, such as __getattribute__()
and __getattr__()
, will be triggered. Dictionary operations are performed, so additional time overhead is incurred. Attribute access can be eliminated through the from import
statement.
# Optimize the writing method for the first time. Code time: 10.9 seconds from math import sqrt def computeSqrt(size: int): result = [] for i in range(size): result.append(sqrt(i)) # Avoid the use of math.sqrt return result def main(): size=10000 for _ in range(size): result = computeSqrt(size) main()
In Section 1, we mentioned that the search for local variables will be faster than that of global variables, so for frequently accessed variables sqrt
, you can speed up the operation by changing it to a local variable.
# Second optimization of writing. Code time: 9.9 seconds import math def computeSqrt(size: int): result = [] sqrt = math.sqrt #Assign value to local variable for i in range(size): result.append(sqrt(i)) # Avoid the use of math.sqrt return result def main(): size=10000 for _ in range(size): result = computeSqrt(size) main()
In addition to math.sqrt
, there is also .
in the computeSqrt
function, which is the that calls
method. By assigning this method to a local variable, the use of list
>append.
inside the for
loop in the computeSqrt
function can be completely eliminated.
# Recommended writing method. Code time: 7.9 seconds import math def computeSqrt(size: int): result = [] append = result.append sqrt = math.sqrt #Assign value to local variable for i in range(size): append(sqrt(i)) # Avoid the use of result.append and math.sqrt return result def main(): size=10000 for _ in range(size): result = computeSqrt(size) main()
2.2 Avoid intra-class attribute access
# This writing method is not recommended. Code time: 10.4 seconds import math from typing import List classDemoClass: def __init__(self, value: int): self._value = value def computeSqrt(self, size: int) -> List[float]: result = [] append = result.append sqrt = math.sqrt for _ in range(size): append(sqrt(self._value)) return result def main(): size=10000 for _ in range(size): demo_instance = DemoClass(size) result = demo_instance.computeSqrt(size) main()
The principle of avoiding .
also applies to in-class properties. Accessing self._value
will be slower than accessing a local variable. By assigning frequently accessed intra-class properties to a local variable, you can improve code running speed.
# Recommended writing method. Code time: 8.0 seconds import math from typing import List classDemoClass: def __init__(self, value: int): self._value = value def computeSqrt(self, size: int) -> List[float]: result = [] append = result.append sqrt = math.sqrt value = self._value for _ in range(size): append(sqrt(value)) # Avoid the use of self._value return result def main(): size=10000 for _ in range(size): demo_instance = DemoClass(size) demo_instance.computeSqrt(size) main()
3. Avoid unnecessary abstractions
# Not recommended, code takes 0.55 seconds classDemoClass: def __init__(self, value: int): self.value = value @property def value(self) -> int: return self._value @value.setter def value(self, x: int): self._value = x def main(): size=1000000 for i in range(size): demo_instance = DemoClass(size) value = demo_instance.value demo_instance.value = i main()
Any time you wrap code with additional layers of processing (such as decorators, property access, descriptors), you’re going to make the code slower. In most cases, it is necessary to re-examine the definition of using property accessors. Using getter/setter
functions to access properties is usually a coding style left by C/C++ programmers. If it’s really not necessary, use simple attributes.
# Recommended writing method, code time: 0.33 seconds classDemoClass: def __init__(self, value: int): self.value = value # Avoid unnecessary property accessors def main(): size=1000000 for i in range(size): demo_instance = DemoClass(size) value = demo_instance.value demo_instance.value = i main()
4. Avoid data duplication
4.1 Avoid meaningless data copy
# This writing method is not recommended. The code takes 6.5 seconds. def main(): size=10000 for _ in range(size): value = range(size) value_list = [x for x in value] square_list = [x * x for x in value_list] main()
The value_list
in the above code is completely unnecessary and would create unnecessary data structures or copies.
# Recommended writing method, code time: 4.8 seconds def main(): size=10000 for _ in range(size): value = range(size) square_list = [x * x for x in value] # Avoid meaningless copying main()
Another situation is that you are too paranoid about Python’s data sharing mechanism, do not understand or trust Python’s memory model well, and abuse functions such as copy.deepcopy()
. Usually the copy operation can be eliminated in these codes.
4.2 Do not use intermediate variables when exchanging values
# Not recommended, code takes 0.07 seconds def main(): size=1000000 for _ in range(size): a=3 b = 5 temp=a a = b b=temp main()
The above code creates a temporary variable temp
when exchanging values. Without the help of intermediate variables, the code is more concise and runs faster.
# Recommended writing method, code time: 0.06 seconds def main(): size=1000000 for _ in range(size): a=3 b = 5 a, b = b, a # without intermediate variables main()
4.3 Use join instead of + for string concatenation
# This writing method is not recommended. The code takes 2.6 seconds. import string from typing import List def concatString(string_list: List[str]) -> str: result = '' for str_i in string_list: result + = str_i return result def main(): string_list = list(string.ascii_letters * 100) for _ in range(10000): result = concatString(string_list) main()
When using a + b
to splice strings, since strings in Python are immutable objects, they will allocate a memory space and combine a
and b
are copied to the newly applied memory space respectively. Therefore, if you want to splice n
strings, n-1
intermediate results will be generated. Each intermediate result needs to apply for and copy memory, which seriously affects the operating efficiency. . When using join()
to splice strings, the total memory space that needs to be applied for will be calculated first, then the required memory will be applied for all at once, and each string element will be copied to the memory. go.
# Recommended writing method, code time: 0.3 seconds import string from typing import List def concatString(string_list: List[str]) -> str: return ''.join(string_list) # use join instead of + def main(): string_list = list(string.ascii_letters * 100) for _ in range(10000): result = concatString(string_list) main()
5. Utilize the short-circuit characteristics of if
conditions
# Not recommended, code takes 0.05 seconds from typing import List def concatString(string_list: List[str]) -> str: abbreviations = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'} abbr_count = 0 result = '' for str_i in string_list: if str_i in abbreviations: result + = str_i return result def main(): for _ in range(10000): string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.'] result = concatString(string_list) main()
The short-circuit characteristic of if
conditions refers to statements such as if a and b
. When a
is False
, Return directly without calculating b
; for statements such as if a or b
, when a
is True
Will be returned directly without calculating b
. Therefore, in order to save running time, for the or
statement, variables with a higher probability of being True
should be written before or
, and and
should be pushed back.
# Recommended writing method, code takes 0.03 seconds from typing import List def concatString(string_list: List[str]) -> str: abbreviations = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'} abbr_count = 0 result = '' for str_i in string_list: if str_i[-1] == '.' and str_i in abbreviations: # Take advantage of the short circuit characteristic of if condition result + = str_i return result def main(): for _ in range(10000): string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.'] result = concatString(string_list) main()
6. Loop optimization
6.1 Use for
loop instead of while
loop
# This writing method is not recommended. Code time: 6.7 seconds def computeSum(size: int) -> int: sum_ = 0 i = 0 while i < size: sum_ + = i i+=1 return sum_ def main(): size=10000 for _ in range(size): sum_ = computeSum(size) main()
Python’s for
loop is much faster than the while
loop.
# Recommended writing method. Code time: 4.3 seconds def computeSum(size: int) -> int: sum_ = 0 for i in range(size): # for loop instead of while loop sum_ + = i return sum_ def main(): size=10000 for _ in range(size): sum_ = computeSum(size) main()
6.2 Use implicit for
loops instead of explicit for
loops
For the above example, you can go one step further and use an implicit for
loop instead of an explicit for
loop.
# Recommended writing method. Code time: 1.7 seconds def computeSum(size: int) -> int: return sum(range(size)) # Implicit for loop instead of explicit for loop def main(): size=10000 for _ in range(size): sum = computeSum(size) main()
6.3 Reduce the calculation of the inner for
loop
# This writing method is not recommended. Code time: 12.8 seconds import math def main(): size=10000 sqrt = math.sqrt for x in range(size): for y in range(size): z = sqrt(x) + sqrt(y) main()
In the above code, sqrt(x)
is located inside the for
loop. It will be recalculated during each training process, which increases the time overhead.
# Recommended writing method. Code time: 7.0 seconds import math def main(): size=10000 sqrt = math.sqrt for x in range(size): sqrt_x = sqrt(x) # Reduce the calculation of the inner for loop for y in range(size): z = sqrt_x + sqrt(y) main()
7. Use numba.jit
We follow the example introduced above and use numba.jit
on this basis. numba
can JIT compile Python functions into machine code for execution, greatly improving the code running speed. For more information about numba
see the home page below:
http://numba.pydata.org/numba.pydata.org
# Recommended writing method. Code time: 0.62 seconds import numba @numba.jit def computeSum(size: float) -> int: sum = 0 for i in range(size): sum + = i return sum def main(): size=10000 for _ in range(size): sum = computeSum(size) main()
8. Choose the appropriate data structure
Python’s built-in data structures such as str
, tuple
, list
, set
, dict
The bottom layer is implemented in C, which is very fast. It is almost impossible to implement a new data structure yourself and achieve the built-in speed in terms of performance.
list
is similar to std::vector
in C++ and is a dynamic array. It will pre-allocate a certain amount of memory space. When the pre-allocated memory space is used up and elements are added to it, it will apply for a larger memory space, then copy all the original elements there, and then destroy the previous memory. space before inserting new elements. The operation is similar when deleting elements. When the used memory space is less than half of the pre-allocated memory space, an additional small memory will be applied for, an element copy will be made, and then the original large memory space will be destroyed. Therefore, if there are frequent addition and deletion operations, and the number of added and deleted elements is large, the efficiency of the list will not be high. In this case, you should consider using collections.deque
. collections.deque
is a double-ended queue that has the characteristics of both stack and queue. It can perform insertion and deletion operations of O(1)
complexity at both ends.
The search operation of list
is also very time-consuming. When you need to frequently find certain elements in list
, or frequently access these elements in order, you can use bisect
to maintain the order of list
objects and store them in order. A binary search is performed to improve search efficiency.
Another common requirement is to find minimum or maximum values. At this time, you can use the heapq
module to convert list
into a heap, so that the time complexity of obtaining the minimum value is O(1)
.
The following web page gives the time complexity of various operations on commonly used Python data structures:
TimeComplexity – Python Wikiwiki.python.org
Reference materials
-
https://zhuanlan.zhihu.com/p/143052860
-
David Beazley & Brian K. Jones. Python Cookbook, Third edition. O’Reilly Media, ISBN: 9781449340377, 2013.
-
Zhang Ying & Lai Yonghao. Writing high-quality code: 91 suggestions for improving Python programs. Machinery Industry Press, ISBN: 9787111467045, 2014.
The knowledge points of the article match the official knowledge files, and you can further learn relevant knowledge. Python entry skill treeHomepageOverview 386872 people are learning the system