What are the jvm garbage collection algorithms and their principles?

Directory

  • Garbage collector
    • 1 Serial collector
    • 2 Parallel collector
    • 3 ParNew Collector
    • 4 CMS collector
    • 5 G1 Recycler
    • Three-color labeling algorithm
      • The process of labeling algorithm
      • Three-color marking algorithm defects
        • Multiple standards
        • Missing mark

Garbage Collector

Garbage collection mechanism, we already know what kind of objects will become garbage. What goes through object recycling – garbage collection algorithm. So who is responsible for recycling?

1 Serial Collector

-XX: + UseSerialGC -XX: + UseSerialOldGC

If a single thread performs garbage collection, there will be a relatively large STW (stop the world) during the collection process, and the operating thread cannot operate during GC. Although STW is relatively simple, it is simple and direct.

The new generation adopts the copying algorithm, and the previous generation adopts the mark-organizing algorithm.

Since the Serial garbage collector is single-threaded, it has the advantage of being simple and less resource intensive; it is suitable for small applications such as mobile applications and desktop applications.

2 Parallel Collector

-XX: + UseParallelGC,-XX: + UseParallelOldGC

Using multi-threads to run GC in parallel will make full use of the CPU, but there will still be stw, which is the new generation and generation garbage collector used by jdk8 by default. Make full use of CPU resources and throughput.

The new generation adopts the copying algorithm, and the previous generation adopts the mark-organizing algorithm.

Parallel garbage collector is suitable for medium-sized applications, especially those that require high throughput, such as: web applications and large-scale enterprise applications
sequence.

3 ParNew Collector

-XX: + UseParNewGC

The operating principle is the same as that of the Parallel collector, which uses multiple threads to perform GC, but the difference is that the ParNew collector can cooperate with the CMS collector. Mainstream solutions:

The ParNew collector is responsible for collecting the new generation, and CMS is responsible for collecting the old generation.

4 CMS Collector

-XX: + UseConcMarkSweepGC

Goal: Minimize stw time and improve user experience. The gc thread and the user thread can truly operate at almost the same time. CMS uses a mark-and-sweep algorithm.

Tagged here are useful objects.

  • Initial mark: Pause all other threads (STW) and record objects that gc roots can directly reference?.

    For example: there is a reference in the local variable table of the thread stack frame that points to the heap space object A, and the heap space variable refers to another object B, then only A is recorded, not B.

  • Concurrency Marking: The process of traversing the entire object graph starting from the directly associated objects of GC Roots. This process is time-consuming, but STW is not required and it can be concurrent with the application thread. transport?. During this process, the user thread and the GC thread are concurrent, which may cause the status of the marked object to change, so the next step needs to be remarking

    The last sentence means that it will cause the omission of the mark:

  • Remarking: In order to correct the marking records of those objects that have changed due to the user program continuing to run during concurrent marking, the pause time of this phase is generally longer than that of the initial marking phase. The time is slightly longer, which is much shorter than the concurrent marking phase. The main method is to use the three-dimensional marking algorithm for re-marking.

  • Concurrent cleaning: Start the user thread, and at the same time, the GC thread starts cleaning the unmarked area. If there are new objects at this stage, they will be marked black and no processing will be done. (Everything that ends up being marked white is garbage)

  • Concurrency reset: Reset the mark data in this GC process.

Summary:

  • Among these steps, the most time-consuming is Concurrent Cleanup, so concurrent processing is adopted
  • The STW format is used for initial marking and re-marking because the marking speed is very fast.
  • Re-marking adopts STW mode, because it is the last step of marking, and it is necessary to ensure that everything marked must be used.

5 G1 Recycler

Used for recycling large objects, and the jdk8 version is not very complete for G1

Three-color marking algorithm

The process of marking algorithm

In the above cms collector, the algorithm used is the three-color marking method.

The three-color labeling method divides objects in memory into three colors: white, gray, and black. The specific labeling process is as follows:

  1. According to the reachability analysis algorithm, traversal access starts from GC Roots. In the initial state, all objects are white and only GC Roots are black.

  2. In the Initial Marking phase, objects directly related to the GC Roots mark are set to Gray.

  3. The Concurrent Marking phase scans the entire reference chain.

    • If there are no child nodes, turn this node black.

    • If there are child nodes, the current node will turn black and the child nodes will turn gray.

  4. Repeat the concurrent marking phase until the gray object has no other child node references.

The scan is completed. At this time, black objects are living objects, and white objects are dead and recyclable objects.

That is, (A, D, E, F, G) are reachable, which are living objects, and (B, C, H) are unreachable and recyclable objects.

The final status corresponding to the three colors is as follows:

Defects in the three-color marking algorithm

The three-color marking algorithm may produce multiple markings or missing markings during the concurrent marking phase because the user thread and the GC thread run at the same time.

Multiple standards

Assume that E has been traversed (turned gray), and the application executes objD.fieldE = null (the reference of D → E is disconnected).

After the reference of D → E is disconnected, the three objects E, F, and G are unreachable and should be recycled. However, because E has turned gray, it will still be treated as a living object and continue to be traversed. The final result is that this part of the object will still be marked as alive, that is, this part of the memory will not be reclaimed by this round of GC.

This part of memory that should have been recycled but was not is called Floating garbage. Floating garbage does not affect the correctness of the application, it just needs to wait until the next round of garbage collection to be cleared.

In addition, for new objects after concurrent marking starts, the usual approach is to directly treat them all as black and not clear them this round. This part of the object may become garbage during the period, which is also considered a part of floating garbage.

Missing mark

Assume that the GC thread has traversed to E (turns gray), and the application thread executes first:

var G = objE.fieldG; objE.fieldG = null; // Gray E breaks reference to white G objD.fieldG = G; // Black D references white G

At this time, switch back to the GC thread. Because E no longer has a reference to G, G will not be set to gray. Although D re-references G, because D is already black, the traversal process will not be redone. .

The final result is: G will always be white, and will eventually be cleared as garbage. This directly affects the correctness of the application and is unacceptable.

Missing bids will only occur when the following two conditions are met at the same time:

  • One or more black objects re-reference the white object; that is, a new reference is added to the black object member variable.
  • The gray object has broken the reference of the white object (direct or indirect reference); that is, the reference of the original member variable of the gray object has changed.

There are also ways to solve it:

You need to record object G in any of the above three steps, and then traverse it as a gray object. For example, put it in a specific collection, wait for the initial GC Roots to be traversed (concurrent marking), and then traverse the objects in the collection during the re-marking stage (re-marking).

var G = objE.fieldG; // 1. Read objE.fieldG = null; // 2. Write objD.fieldG = G; // 3. Write