Python memory optimization

In Python, memory management and optimization is a complex topic because it involves the internal mechanisms of the Python interpreter, especially its garbage collection and memory allocation strategies. Python manages memory through automatic garbage collection, based mainly on reference counting supplemented by a mark-and-sweep cycle collector.

Python memory management mechanisms:

1. Reference counting
Python uses reference counting internally to track how many references point to each object. When you create an object, its reference count starts at 1. Whenever the object is bound to another name or added to a container (such as a list, tuple, or dictionary), its reference count increases. The count decreases when a name referencing the object is deleted or rebound to a different object. When an object's reference count drops to 0, its memory is released immediately.
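As a rough, CPython-specific illustration, sys.getrefcount lets you observe the count (note that it reports one extra reference for its own argument):

```python
import sys

x = []                        # one reference held by the name x
before = sys.getrefcount(x)   # includes a temporary reference for the call itself

y = x                         # binding a second name increments the count
after = sys.getrefcount(x)

print(after - before)         # -> 1: exactly one additional reference from y
```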

2. Garbage collection
Reference counting cannot solve the problem of circular references (two or more objects refer to each other, but are no longer referenced by other objects). To solve this problem, Python has a garbage collector that tracks these circular references and removes them.
Python’s garbage collector uses a generational collection algorithm that divides objects into three “generations”. Newly created objects go into the youngest generation; objects that survive a collection are promoted to the next, older generation, and so on. Younger generations are collected more frequently, reflecting the observation that most objects die young, while objects that have already survived several collections tend to live longer.
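A minimal sketch of the cycle detector at work: reference counting alone can never free these two objects, because each keeps the other's count above zero, but gc.collect() finds and reclaims them:

```python
import gc

class Node:
    def __init__(self):
        self.partner = None

a, b = Node(), Node()
a.partner, b.partner = b, a   # a and b now reference each other
del a, b                      # reference counts never reach zero

collected = gc.collect()      # cycle detector finds and frees the pair
print(collected >= 2)         # -> True: at least the two Nodes were unreachable
```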

3. Memory pool
Python (specifically CPython) uses a memory pool mechanism to avoid making a system call for every small object. This specialized small-object allocator, known as pymalloc, handles objects up to 512 bytes in current versions (256 bytes in older ones). It improves efficiency by reducing the number of system calls, since requesting and freeing memory from the operating system is a relatively expensive operation.

Optimization strategies:

1. Use built-in containers:

Python’s built-in containers, such as lists, tuples, and dictionaries, are highly optimized and are often more memory efficient than custom data structures.

2. Data structure selection:

Choose a data structure appropriately based on the amount of data and type of operations. For example, for dense data sets that contain only numeric values, using an array.array or numpy array is often more efficient than using a list.
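For instance (exact sizes vary by platform and Python version), an array.array of machine ints is far more compact than a list of Python int objects:

```python
import sys
from array import array

n = 10_000
as_list = list(range(n))          # stores pointers to boxed int objects
as_array = array('i', range(n))   # stores raw C ints in one contiguous buffer

print(sys.getsizeof(as_array) < sys.getsizeof(as_list))  # -> True
# Note: getsizeof(as_list) does not even include the int objects themselves,
# so the real gap is larger than this comparison suggests.
```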

3. Object pool:

For small objects that are frequently created and destroyed, using an object pool can avoid constant memory allocation and deallocation.
An object pool is a design pattern for managing a cache of reusable objects. Its purpose is to avoid frequent creation and destruction of objects by reusing ones that already exist, thereby reducing allocation and collection overhead while the program runs and improving performance.

How to implement an object pool

A simple object pool can be implemented using a queue or stack data structure. The following is a simple Python object pool implementation example:

import queue

class ObjectPool:
    def __init__(self, create_func, max_size):
        self._create_func = create_func
        self._pool = queue.Queue(max_size)

    def get(self):
        try:
            return self._pool.get_nowait()
        except queue.Empty:
            return self._create_func()

    def put(self, item):
        try:
            self._pool.put_nowait(item)
        except queue.Full:
            # If the queue is full, ignore or optionally destroy the object
            pass

# Assume there is a creation function for complex objects
def create_expensive_object():
    return SomeExpensiveObject()

# Create an object pool with a capacity of 10
pool = ObjectPool(create_expensive_object, 10)

# Get an object
obj = pool.get()

# ... use the object ...

# When finished using it, put the object back into the pool
pool.put(obj)

In the above code, create_func is a function used to create new objects. The object pool will try to get an existing object from the queue, and if the queue is empty, it will create a new object. After using the object, call the put() method to return the object to the pool.

4. Lazy loading:

Loading data only when needed can reduce memory usage.

Lazy loading is a design pattern and optimization strategy that is usually used to postpone the creation of an object, the execution of a calculation, or the occurrence of a process until it is actually needed. This can significantly reduce the startup time and runtime memory footprint of the program, because instead of loading all the resources that may be needed at the beginning, they are loaded at the exact moment they are needed.

Techniques for implementing lazy loading
Using a lazy property descriptor
class LazyProperty:
    def __init__(self, method):
        self.method = method
        self.method_name = method.__name__

    def __get__(self, obj, cls):
        if obj is None:
            return self  # accessed on the class rather than an instance
        value = self.method(obj)
        setattr(obj, self.method_name, value)
        return value

class MyClass:
    @LazyProperty
    def expensive_to_compute(self):
        print("Computing value...")
        return sum(i * i for i in range(10000))

obj = MyClass()
print(obj.expensive_to_compute) # Calculate and return the result
print(obj.expensive_to_compute) # Return the result directly without calculation

In the above example, the result of the expensive_to_compute method will be calculated and cached on the first access, and subsequent accesses will directly return the cached value.

Using lazy imports
# lazy_module.py

def expensive_import():
    from some_expensive_module import ExpensiveClass
    return ExpensiveClass()

In this example, some_expensive_module will only be imported when the expensive_import function is called, not when the module is loaded.

Using Generators
def read_large_file(file_name):
    """Lazily yield lines from a large file."""
    with open(file_name, "r") as f:
        for line in f:
            yield line

# This allows you to read one line at a time without having to load the entire file into memory at once
for line in read_large_file("large_file.txt"):
    process(line)

Lazy loading is especially useful when working with large data sets or resource-intensive operations, as it can help avoid unnecessary memory consumption and computational overhead. When implementing lazy loading, developers should pay attention to ensuring code clarity and maintainability, and ensuring that lazy-loaded resources can be loaded and initialized correctly when used.

5. Memory analysis and profiling:

Regularly use memory profiling tools, such as memory_profiler or tracemalloc, to find and fix memory problems.

  1. memory_profiler:

    • This is a library for monitoring the memory usage of Python code.
    • Can be run as a standalone program or added as a decorator to your functions.
    • It provides a detailed line-by-line memory usage report.
      How to use:
    from memory_profiler import profile
    
    @profile
    def my_func():
        a = [1] * (10**6)
        b = [2] * (2 * 10**7)
        del b
        return a
    
    if __name__ == '__main__':
        my_func()
    

    Running this script will generate a line-by-line memory usage report.

  2. objgraph:

    • objgraph is a library used to display object reference relationships in Python programs, which can help analyze memory leaks.
    • It can generate object reference diagrams to help visualize memory usage.

    How to use:

    import objgraph
    x = [1]
    y = [x, [x], {'x': x}]
    objgraph.show_refs([y], filename='ref_graph.png') # Create a graphics file to display the reference graph of y.
    
  3. Pympler:

    • Pympler is a library for analyzing Python memory usage, providing a convenient web interface.
    • It can track memory usage, detect memory leaks, and help developers understand memory consumption.

    How to use:

    from pympler import summary, muppy
    all_objects = muppy.get_objects()
    sum1 = summary.summarize(all_objects)
    summary.print_(sum1)
    
  4. tracemalloc:

    • tracemalloc is part of the Python standard library and can trace memory allocations.
    • It can tell you how much memory is allocated on which lines.

    How to use:

    import tracemalloc
    
    tracemalloc.start()
    
    # ... execute code ...
    
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')
    
    print("[Top 10]")
    for stat in top_stats[:10]:
        print(stat)
    

6. Reduce references:

Reduce unnecessary object references, and promptly release objects that are no longer needed.
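A small sketch: dropping the last reference to a large temporary lets CPython free it immediately, rather than keeping it alive for the rest of the enclosing scope:

```python
data = [0] * 1_000_000   # large temporary structure
total = sum(data)

del data                 # reference count drops to zero; memory is released now
# ... continue with long-running work that no longer needs the list ...

print(total)             # -> 0
```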

7. Avoid circular references:

Avoid circular references where possible, or use the weakref module (weak references) to handle them.
In Python, a weak reference is a special reference that does not increase an object’s reference count. It allows an object to be referenced without preventing it from being reclaimed by the garbage collector. This is particularly useful in caches or maps, as they can store objects without hampering the object’s lifecycle.

Usage scenarios of weak references:
  • Caching: Weak references are useful when you want to cache a large number of objects without these caches preventing the objects from being recycled.
  • Circular references: Weak references can help solve circular reference problems that can cause memory leaks.
  • Observer Pattern: When you use the Observer pattern, weak references can refer to observers without keeping them alive.
Implementation of weak references:

Python provides the weakref module to support weak references. Here are some examples of using weakref:

Weak reference object
import weakref

class MyClass:
    pass

obj = MyClass() # Create an object
r = weakref.ref(obj) # Create a weak reference

print(r())  # Access the object pointed to by the weak reference

del obj # Explicitly delete the object
print(r()) # After the object is deleted, the weak reference returns None

In this example, r is a weak reference to obj. When obj is deleted, r() will return None.

Weak reference dictionary
import weakref

class MyClass:
    pass

obj = MyClass()
weak_dict = weakref.WeakValueDictionary()

weak_dict['primary'] = obj # Add to weak reference dictionary

print(weak_dict['primary']) # Get the object as long as it is alive

del obj # Delete object
print(weak_dict.get('primary')) # The object is recycled and is no longer included in the dictionary.

WeakValueDictionary is a special dictionary whose values only hold weak references to objects. This means that if there are no other strong references pointing to these objects, they can be collected by the garbage collector.
You need to be careful when using weak references because if you don’t hold another active reference to the object, the object may be collected when you don’t expect it. Additionally, not all objects can be weakly referenced; for example, lists and dictionaries cannot be weakly referenced directly unless they are subclassed.

8. Use __slots__:

Use __slots__ when defining a class to limit the attributes that an instance can have. This avoids the use of dynamic dictionaries and thus reduces memory usage.

__slots__ is a class attribute used to declare a fixed set of attributes and can significantly save memory. By default, every class in Python will have a __dict__ attribute, which is a dynamic dictionary that allows us to add arbitrary new attributes to the instance at runtime. While this provides great flexibility, this dynamic allocation also comes with additional memory overhead. When you know in advance the set of attributes a class instance will have and wish to restrict those attributes, you can use __slots__ in place of the instance’s __dict__. By doing this, each instance no longer has its own attribute dictionary, but instead stores its attributes in a small, fixed array, thus reducing memory consumption.

Advantages of using __slots__:
  1. Memory Savings: If you have millions of instances, using __slots__ will significantly save memory.
  2. Faster attribute access: Accessing attributes in a fixed collection is faster than accessing attributes in __dict__ because it does not require going through a hash table.
  3. Prevent dynamic creation of attributes: __slots__ can also prevent the dynamic creation of attributes that are not defined in __slots__, thereby avoiding incorrect attribute assignments.
How to use __slots__:
class MyClass:
    __slots__ = ['name', 'description']

    def __init__(self, name, description):
        self.name = name
        self.description = description

obj = MyClass("Example", "This is an example.")

# Attempts to dynamically add properties will fail
try:
    obj.new_attribute = "Value"
except AttributeError as e:
    print(e) # 'MyClass' object has no attribute 'new_attribute'

In the above example, trying to add new_attribute to obj is not allowed because new_attribute is not declared in __slots__.

Note:
  • Classes using __slots__ can no longer dynamically add attributes not declared in __slots__ to their instances.
  • If a class defines __slots__, then its subclass also needs to define __slots__ to extend the behavior of the parent class, otherwise the subclass instance will regain the default __dict__.
  • __slots__ provides substantial memory savings only when a program creates a large number of instances.
  • Property names listed in __slots__ must be strings.
  • __slots__ should not be used solely to prevent users of a class from adding attributes. The interface of a design class should be controlled through documentation and convention, not through enforced restrictions.
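One rough way to see the effect (exact byte counts vary across Python versions) is to compare a slotted and an unslotted instance; only the unslotted one carries a per-instance __dict__:

```python
import sys

class Plain:
    def __init__(self):
        self.a, self.b = 1, 2

class Slotted:
    __slots__ = ('a', 'b')
    def __init__(self):
        self.a, self.b = 1, 2

p, s = Plain(), Slotted()

print(hasattr(p, '__dict__'))   # -> True: dynamic attribute dictionary
print(hasattr(s, '__dict__'))   # -> False: attributes live in fixed slots

# The slotted instance avoids the cost of a separate dict per instance
print(sys.getsizeof(s) <= sys.getsizeof(p) + sys.getsizeof(p.__dict__))  # -> True
```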

9. String optimization:

In Python, strings are immutable: once created, their contents cannot be changed. Immutability brings advantages, such as thread safety and the ability to share a single copy of identical strings (interning), but it also means that operations which appear to modify a string actually create new string objects, which can incur unnecessary performance overhead. Here are some strategies for optimizing string operations:

1. Avoid concatenating strings continuously in a loop

Every time you perform a concatenation operation on a string, because strings are immutable, Python actually creates a new string and copies the contents of the old string. Concatenating strings continuously within a loop is particularly inefficient because it produces more and more temporary strings as the loop progresses.

Not recommended:

s = ""
for substring in list_of_strings:
    s += substring  # inefficient: creates a new string each iteration

Recommended method:

Use the str.join() method to create a new string all at once after completing the loop.

s = "".join(list_of_strings)
2. Use string formatting

When you need to create a string containing multiple variables or expressions, it is recommended to use the string formatting function. Python provides a variety of string formatting methods.

Old style % formatting:

name = "John"
age = 30
s = "%s is %d years old." % (name, age)

The str.format() method:

s = "{} is {} years old.".format(name, age)

f-strings (Python 3.6+):

s = f"{name} is {age} years old."

Not only are f-strings cleaner to read, they are also generally faster than the other formatting methods because the interpolation is compiled directly into bytecode.

3. Use generator expressions instead of list comprehensions for string concatenation

When string concatenation comes from iterating over a collection, using a generator expression can save memory because it avoids creating the entire list.

s = "".join(str(number) for number in range(100))
4. Be aware of string immutability

Operations that appear to modify a string in place actually create new string objects. For example, methods such as str.replace(), str.lower(), and str.upper() always return new strings, even if the result is identical to the original.

5. Use built-in methods to process strings

The built-in string methods (such as str.split(), str.strip(), str.find(), etc.) are implemented in C and are typically faster and more memory-efficient than hand-written equivalents.

6. Consider using sys.intern

For large numbers of repeated strings, sys.intern() can help: it ensures that equal strings passed to it share a single copy in memory, which can save memory in certain scenarios.

import sys
s = sys.intern('some long string')
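Interning guarantees that equal strings passed to sys.intern() come back as the same object. Since string literals within one module are often shared automatically by the compiler, this demo builds the strings at runtime to show the effect:

```python
import sys

# Two equal strings constructed dynamically are normally distinct objects
a = sys.intern(''.join(['some long', ' string']))
b = sys.intern(''.join(['some', ' long string']))

print(a == b)   # -> True: equal contents
print(a is b)   # -> True: interning collapses them into one shared object
```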

10. Disable debugging tools:

Make sure debugging tools and verbose logging are disabled in production environments, as they can consume a lot of memory (for example, Flask and Django offer a debug mode that adds extra logging, error checking, and other diagnostic information).

11. Limit introspection:

Introspection facilities such as dir(), type(), repr(), locals(), and globals() are very useful when debugging, but they add overhead and can keep extra references alive, so avoid or minimize their use in production code.