.NET/C#/GC and memory management (including in-depth analysis)

For details, please refer to the reference article: Analysis of .NET interview questions (06)-GC and memory management

First, object creation and life cycle

The life cycle of an object can be summarized as create > use > release. In .NET, the life cycle of an object is:

  • new creates an object and allocates memory

  • object initialization

  • Object manipulation, use

  • Resource cleanup (unmanaged resources)

  • GC garbage collection

GC memory management mainly targets reference-type objects. Reference-type objects are allocated on the managed heap, where they are stored sequentially. The managed heap maintains a pointer, NextObjPtr, which points to the location where the next object will be allocated. The basic structure of the managed heap is as follows:

Take the following code as an example to simulate the creation process of an object:

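using System.Collections.Generic; // needed for List<string>

public class User
{
    public int Age { get; set; }         // value type
    public string Name { get; set; }     // reference type, null until assigned

    public string _Name = "123" + "abc"; // constant-folded by the compiler to "123abc"
    public List<string> _Names;          // reference type, null until assigned
}

Its creation works as follows: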

  • Object size estimation, 44 bytes in total (the figures assume a 32-bit process, where a reference takes 4 bytes):

      • Property Age, value type int, 4 bytes;

      • Property Name, reference type, initially null, 4 bytes, pointing to an empty address;

      • The field _Name has an initial value, and the compiler optimizes the code to _Name = "123abc". A character is two bytes, so the string occupies 2×6 + 8 (additional members: 4-byte TypeHandle address, 4-byte sync block index) = 20 bytes. Total memory size = 20 bytes for the string object + 4 bytes for the _Name reference that points to it = 24 bytes;

      • The reference-type field List<string> _Names defaults to null, 4 bytes;

      • The additional members of the User object itself (4-byte TypeHandle address, 4-byte sync block index), 8 bytes;

  • Memory request: request a 44-byte memory block; starting from the pointer NextObjPtr, check whether there is enough space, and trigger garbage collection if there is not.

  • Memory allocation: carve out the 44-byte memory block starting at the pointer NextObjPtr.

  • Object initialization: first initialize the additional members of the object, then call the constructor of the User object to initialize its members. Value types are initialized to 0 by default, and reference types are initialized to null by default;

  • The managed heap pointer is advanced: NextObjPtr moves past the new object by 44 bytes.

  • Return memory address: Return the memory address of the object to the reference variable.
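The allocation steps above can be sketched with a toy "bump pointer" heap. This is only an illustration of the idea, not how the CLR is implemented: ManagedHeapSketch, Allocate, and the 100-byte capacity are hypothetical names and values, and the 44-byte size is the 32-bit estimate from the list above.

```csharp
using System;

// Toy model of bump-pointer allocation on the managed heap.
// ManagedHeapSketch and Allocate are illustrative names, not CLR APIs.
public sealed class ManagedHeapSketch
{
    private readonly int _capacity;               // total bytes in this toy heap
    public int NextObjPtr { get; private set; }   // offset where the next object will go

    public ManagedHeapSketch(int capacity) { _capacity = capacity; }

    // Returns the offset ("address") of the new object, or -1 when space runs out
    // (the real runtime would trigger a garbage collection at that point).
    public int Allocate(int size)
    {
        if (NextObjPtr + size > _capacity)
            return -1;
        int address = NextObjPtr;   // the block starts at the current pointer
        NextObjPtr += size;         // advance the pointer past the new object
        return address;
    }
}

public static class AllocationDemo
{
    public static void Main()
    {
        // Size estimate from the text (32-bit layout): 4 + 4 + 24 + 4 + 8 bytes.
        int userSize = 4 + 4 + 24 + 4 + 8;
        var heap = new ManagedHeapSketch(100);
        Console.WriteLine(heap.Allocate(userSize)); // 0: first object at offset 0
        Console.WriteLine(heap.Allocate(userSize)); // 44: second object right after it
        Console.WriteLine(heap.Allocate(userSize)); // -1: only 12 bytes are left
    }
}
```

Because allocation is just a bounds check and a pointer increment, it is very cheap; the cost is paid later, when the collector has to compact the heap to keep free space contiguous.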

Second, GC garbage collection

GC is short for Garbage Collection (or Garbage Collector), an important part of the core .NET mechanism. Its basic working principle is to traverse the objects in the managed heap, mark which objects are still in use (those not in use are the so-called garbage), move the reachable objects into a contiguous address space (also called compaction), and reclaim the memory of all remaining unused objects.

First, it is worth emphasizing again the structure of managed heap memory. As the figure below shows, only the GC heap is under the GC's jurisdiction. To improve memory management efficiency, among other reasons, the GC heap is divided into several parts, the two main ones being:

  • Generations 0/1/2: the generational part of the heap, where objects are grouped by age;

  • Large Object Heap (LOH): objects of 85,000 bytes or more are allocated in this area. Its main characteristic is that it is not collected often (it is only collected along with generation 2); even when it is collected, it is not compacted by default, because the objects are large and the cost of copying them is too high;
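As a small illustration of the two areas, the generation of an object can be queried with GC.GetGeneration. The sketch below is hedged: the exact values can vary with the runtime and with when collections happen, so the comments describe what is typically observed rather than guaranteed output. GenerationDemo is a hypothetical name.

```csharp
using System;

public static class GenerationDemo
{
    public static void Main()
    {
        var small = new object();       // small object, goes to the generational heap
        var large = new byte[100000];   // above the 85,000-byte threshold, goes to the LOH

        // A freshly allocated small object is typically in generation 0.
        Console.WriteLine(GC.GetGeneration(small));

        // Large objects are typically reported as belonging to the oldest generation.
        Console.WriteLine(GC.GetGeneration(large));

        // The highest generation number supported by the runtime.
        Console.WriteLine(GC.MaxGeneration);

        GC.KeepAlive(small);
        GC.KeepAlive(large);
    }
}
```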

What is garbage? Simply put, it is any object that is no longer referenced.

The basic process of garbage collection includes the following three key steps:

① Mark

First assume that all objects are garbage. Starting from the application's root pointers (Root), traverse every referenced object on the heap to build a graph of reachable objects, and mark the objects that are still in use (in practice, by turning on a flag bit in the object's sync block index).

The Root pointers hold all the object references currently in use. "Root" is really just a general term meaning these objects are still needed, and mainly includes: static objects and static field references; thread stack references (local variables, method parameters, stack frames); CPU registers holding object references; objects referenced by root-referenced objects; the GC handle table; the Freachable queue; and so on.
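The mark step is essentially a graph traversal from the roots. Below is a minimal sketch under simplifying assumptions: Node, Marker, and the Marked field are hypothetical stand-ins (the real runtime uses a flag bit on the object itself, and its roots come from the sources listed above).

```csharp
using System;
using System.Collections.Generic;

// Toy object graph node; Node and Marker are illustrative names, not CLR types.
public sealed class Node
{
    public string Name;
    public bool Marked;                                // stands in for the mark bit
    public List<Node> References = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class Marker
{
    // Marks everything reachable from the roots, using an explicit stack.
    public static void Mark(IEnumerable<Node> roots)
    {
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (current.Marked) continue;              // already visited (handles cycles)
            current.Marked = true;
            foreach (Node child in current.References)
                pending.Push(child);
        }
    }

    public static void Main()
    {
        var a = new Node("a");
        var b = new Node("b");
        var c = new Node("c");
        a.References.Add(b);
        b.References.Add(a);      // cycle between a and b
        c.References.Add(a);      // c points at a, but nothing points at c
        Mark(new[] { a });        // only a is a root
        Console.WriteLine($"{a.Marked} {b.Marked} {c.Marked}"); // True True False
    }
}
```

Note that c is garbage even though it holds a reference to a live object: what matters is whether something reachable points to it, not what it points to. The cycle between a and b also causes no trouble, which is one advantage of tracing over reference counting.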

② Sweep

All unreachable objects are swept. For ordinary objects the memory is reclaimed directly, while objects that implement a finalizer (that is, a destructor) must be reclaimed separately. After sweeping, the memory is no longer contiguous, which is what step 3 addresses.

③ Compact

Move the surviving objects into contiguous memory. Because the addresses of these objects change, the roots and any other pointers to them must be updated to the new addresses after the move.
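The sweep and compact steps can be sketched together on a toy heap. This is a simplified model, not the CLR's algorithm: HeapObject, Compactor, and SweepAndCompact are hypothetical names, objects are assumed to be listed in address order, and the Address field stands in for the pointer fix-up the real collector performs.

```csharp
using System;
using System.Collections.Generic;

// Toy heap slot; HeapObject and Compactor are illustrative names.
public sealed class HeapObject
{
    public string Name;
    public int Address;
    public int Size;
    public bool Marked;
}

public static class Compactor
{
    // Drops unmarked objects and slides survivors to the start of the heap.
    // Returns the new allocation pointer (the first free offset).
    public static int SweepAndCompact(List<HeapObject> heap)
    {
        heap.RemoveAll(o => !o.Marked);   // sweep: unreachable objects are discarded
        int next = 0;
        foreach (HeapObject o in heap)    // heap is assumed to be sorted by address
        {
            o.Address = next;             // compact: the survivor gets its new address
            next += o.Size;
            o.Marked = false;             // reset the mark for the next collection
        }
        return next;
    }

    public static void Main()
    {
        var heap = new List<HeapObject>
        {
            new HeapObject { Name = "A", Address = 0,  Size = 16, Marked = true  },
            new HeapObject { Name = "B", Address = 16, Size = 24, Marked = false },
            new HeapObject { Name = "C", Address = 40, Size = 8,  Marked = true  },
        };
        int nextObjPtr = SweepAndCompact(heap);
        foreach (HeapObject o in heap)
            Console.WriteLine($"{o.Name} @ {o.Address}");   // A @ 0, then C @ 16
        Console.WriteLine(nextObjPtr);                       // 24
    }
}
```

After compaction the free space is one contiguous block again, so NextObjPtr-style allocation can continue from offset 24.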

The schematic diagram of the garbage collection process is as follows:

Garbage collection is a fairly expensive process, so it is recommended not to call GC.Collect() manually without good reason; the GC will choose the right time and the right way to reclaim memory.

Unmanaged resource recycling

The main ways to release unmanaged resources in .NET are: Finalize() and Dispose().

Dispose():

Dispose needs to be called manually. There are two calling methods in .NET:

// Method 1: explicit call (SomeType stands for any type that implements IDisposable)
SomeType st1 = new SomeType();
// do sth
st1.Dispose();

// Method 2: using() syntax, which calls Dispose automatically
using (var st2 = new SomeType())
{
    // do sth
}

The first method, the explicit call, has obvious disadvantages. If the programmer forgets to call Dispose, the resources are not released; the same happens if an exception is thrown before the call, although this can be avoided with try…finally.

The second method is generally recommended. It guarantees that Dispose is called no matter what. The principle is simple: as the IL code for using() in the figure below shows, using is just syntactic sugar, and it is essentially a try…finally structure.
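To make the try…finally equivalence concrete, here is roughly what the using block corresponds to when written by hand. It is a sketch of the idea rather than the exact compiler output; SomeResource and RunExpanded are hypothetical names introduced for the example.

```csharp
using System;

// SomeResource is a hypothetical disposable type used only for illustration.
public sealed class SomeResource : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() { Disposed = true; }
}

public static class UsingDemo
{
    // Roughly equivalent to: using (var r = new SomeResource()) { work(r); }
    public static SomeResource RunExpanded(Action<SomeResource> work)
    {
        SomeResource r = new SomeResource();
        try
        {
            work(r);
        }
        finally
        {
            if (r != null) r.Dispose();   // runs on normal exit and on exceptions
        }
        return r;
    }

    public static void Main()
    {
        SomeResource ok = RunExpanded(_ => { });
        Console.WriteLine(ok.Disposed);         // True

        SomeResource captured = null;
        try
        {
            RunExpanded(r => { captured = r; throw new InvalidOperationException(); });
        }
        catch (InvalidOperationException) { }
        Console.WriteLine(captured.Disposed);   // True: finally ran despite the exception
    }
}
```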

Finalize(): finalizer (destructor)

First, consider where the Finalize method comes from. It is the protected virtual method Finalize defined in System.Object. In C#, it cannot be overridden directly by subclasses, nor can it be called explicitly, which may seem a bit strange; instead, a class declares a destructor (~ClassName()), which the compiler turns into a Finalize override. Its role is to release unmanaged resources, and because the GC invokes it during collection, it helps ensure that unmanaged resources get released.

A brief summary: Finalize() can ensure that unmanaged resources are released, but it requires a lot of extra work (such as special tracking of finalizable objects), and the GC has to run twice before the object's memory is actually reclaimed. It has many shortcomings; its only advantage is that it does not need to be called explicitly.

Some programming guidelines and some programmers recommend against using Finalize and suggest using Dispose instead. I think the main reasons are: first, Finalize itself performs poorly; second, many people do not understand how Finalize works and may misuse it, causing memory leaks, so the advice becomes simply not to use it at all. In fact, Microsoft does recommend using it, but together with Dispose: implement both the IDisposable interface and Finalize (a destructor). Many classes in the FCL are implemented this way.

This gives you the best of both worlds:

If Dispose is called, the object's finalizer can be skipped, and the object is reclaimed in a single collection;

If the programmer forgets to call Dispose, there is a second layer of protection: the GC takes responsibility for releasing the object's resources;
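A minimal sketch of this combined pattern is shown below. NativeBuffer is a hypothetical class that wraps a block of unmanaged memory purely for illustration; the structure (a public Dispose, a protected Dispose(bool), a finalizer, and GC.SuppressFinalize) follows the commonly documented dispose pattern.

```csharp
using System;
using System.Runtime.InteropServices;

// NativeBuffer is a hypothetical wrapper around one block of unmanaged memory.
public class NativeBuffer : IDisposable
{
    private IntPtr _handle;
    private bool _disposed;

    public NativeBuffer(int bytes)
    {
        _handle = Marshal.AllocHGlobal(bytes);   // unmanaged allocation
    }

    public bool IsDisposed { get { return _disposed; } }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);   // finalizer no longer needed: reclaimed in one pass
    }

    // Finalizer (destructor syntax): the safety net if Dispose was never called.
    ~NativeBuffer()
    {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;                   // make repeated calls harmless
        // if (disposing) { release other managed IDisposable members here }
        if (_handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_handle);        // always release the unmanaged block
            _handle = IntPtr.Zero;
        }
        _disposed = true;
    }
}

public static class DisposePatternDemo
{
    public static void Main()
    {
        var buffer = new NativeBuffer(64);
        buffer.Dispose();
        buffer.Dispose();                        // second call is a no-op
        Console.WriteLine(buffer.IsDisposed);    // True
    }
}
```

The disposing flag exists because the finalizer must not touch other managed objects (they may already have been finalized), whereas an explicit Dispose call can safely clean them up as well.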

Third, performance optimization suggestions

Try not to manually perform garbage collection: GC.Collect()

Garbage collection is relatively expensive to run (it involves moving blocks of objects, traversing the heap to find objects that are no longer used, updating many state variables, calling Finalize methods, and so on) and has a significant impact on performance. So when writing programs, avoid unnecessary memory allocation, and minimize or avoid calling GC.Collect() to force a collection. In general, the GC will collect at the most suitable time.

One more thing to note: while a garbage collection is in progress, all managed threads are suspended (if code kept executing during the collection, the state of objects would be unstable and they could not be collected safely).

Dispose is recommended instead of Finalize

If you understand the principles of GC memory management and Finalize, you can use both Dispose and Finalize for double insurance, otherwise try to use Dispose.

Choose the appropriate garbage collection mechanism: workstation mode, server mode
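The mode itself is chosen through configuration rather than code (for example, the gcServer element in an application configuration file on .NET Framework), but a program can check which mode it is running under. The sketch below uses System.Runtime.GCSettings; GcModeDemo and Describe are hypothetical names.

```csharp
using System;
using System.Runtime;

public static class GcModeDemo
{
    // Builds a short description of the GC configuration of the current process.
    public static string Describe()
    {
        string mode = GCSettings.IsServerGC ? "server" : "workstation";
        return mode + " GC, latency mode " + GCSettings.LatencyMode;
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

As a rough guide, workstation mode favors responsiveness for desktop applications, while server mode favors throughput on multi-core machines; which one is appropriate depends on the workload.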

Personal learning summary:

First understand the creation and life cycle of objects

  • new creates an object and allocates memory

  • object initialization

  • Object manipulation, use

  • Resource cleanup (unmanaged resources)

  • GC garbage collection

Second, understand the basic process of allocating to the managed heap

  • Object size estimation

  • Memory request

  • Memory allocation

  • Object initialization

  • The managed heap pointer is advanced

  • Return the memory address

Then, the basic working principle of the GC is to traverse all reference-type objects in the managed heap, mark the objects still in use (also called reachable objects), sweep the unreachable objects (after which the memory is no longer contiguous), and then move the surviving objects into a contiguous address space (also called compaction).

Finally, some suggestions on using the GC-related interfaces:

Try not to manually perform garbage collection: GC.Collect()

Dispose is recommended instead of Finalize

If you understand the principles of GC memory management and Finalize, you can use both Dispose and Finalize for double insurance, otherwise try to use Dispose.