JVM source code analysis: how soft, weak, and phantom references are processed

Table of Contents

Preface:

Source code analysis:

Java level:

JVM level:

Usage pitfalls:

Summary:


Version Information:

jdk version: jdk8u40
Garbage collector: Serial new/old

Preface:

Different garbage collectors use different algorithms and differ in efficiency. The JDK 8 default is ParallelScavenge new/old, while the author used Serial new/old when writing this article. The two use the same basic algorithms; ParallelScavenge new/old mainly adds multi-threading, so the details described here are largely the same.

Most Java business code uses only strong references and rarely touches soft, weak, or phantom references. These reference types, introduced in JDK 1.2, show up mostly in caches: in the JDK class library (ThreadLocal and WeakHashMap) and in frameworks such as MyBatis, Netty, and various caching frameworks. Why caches is easy to see: cached data is dispensable by nature. It speeds the system up when memory is available, and it can be dropped to free memory for the core business when memory is tight.

Source code analysis:

This article is fairly long and not easy to follow. Soft, weak, and phantom references are handled partly at the Java level and partly at the JVM level, and the JVM level is tightly coupled to the details of garbage collection, so the author can only do his best.

Java level:

The Java level requires some background knowledge first, followed by a look at how Java code handles these references.

[Figure: basic representation of soft, weak, and phantom references]

The figure shows the basic structure shared by soft, weak, and phantom references. Two different objects need to be distinguished here:

The reference object itself: an instance of SoftReference, WeakReference, or PhantomReference
The referent: the object that the reference object points to

The rest of this article therefore covers how the referents of soft, weak, and phantom references are reclaimed, and when each kind is appropriate. (Most readers will have memorized the standard interview answers; what follows differs from them slightly.)

Soft: When memory is tight but not critically so, soft references are cleared on a least-recently-used (LRU) basis. When memory is critically low, all soft references are cleared. The referent can be retrieved through get(), and a ReferenceQueue can be used to process companion objects.

Weak: The referent is reclaimed at the next GC, provided nothing else references it strongly. The referent can be retrieved through get(), and a ReferenceQueue can be used to process companion objects.

Phantom: Like a weak reference, it is processed at the next GC once the referent is no longer strongly reachable. The referent can never be retrieved (get() always returns null), so a ReferenceQueue is the only way to process companion objects.
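
To make the difference in get() concrete, here is a minimal sketch. The class name ReferenceKindsDemo and the method probe are made up for this illustration; only the java.lang.ref classes are real. It keeps a strong reference to the referent, so no GC is involved and the outcome does not depend on collector timing.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Hypothetical demo class, not part of the JDK.
class ReferenceKindsDemo {

    /** Reports, per kind, whether get() returns the referent while it is still strongly held. */
    static boolean[] probe() {
        Object referent = new Object();   // the local variable is a strong reference
        ReferenceQueue<Object> queue = new ReferenceQueue<>();

        SoftReference<Object> soft = new SoftReference<>(referent, queue);
        WeakReference<Object> weak = new WeakReference<>(referent, queue);
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);

        return new boolean[] {
            soft.get() == referent,      // soft: referent is retrievable
            weak.get() == referent,      // weak: referent is retrievable
            phantom.get() == referent    // phantom: get() always returns null
        };
    }

    public static void main(String[] args) {
        boolean[] r = probe();
        System.out.println("soft=" + r[0] + " weak=" + r[1] + " phantom=" + r[2]);
    }
}
```

Soft and weak references hand the referent back as long as it has not been cleared, while a phantom reference never does, which is why it is only useful together with a ReferenceQueue.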

That covers when the referents are reclaimed. Since ReferenceQueue came up, the next step is to see how the Java level uses a ReferenceQueue to react to reclamation.

// Static initializer in the java.lang.ref.Reference class
static {
    // Create the ReferenceHandler thread.
    // tg is the top-level thread group; the code that looks it up is omitted here.
    Thread handler = new ReferenceHandler(tg, "Reference Handler");

    handler.setPriority(Thread.MAX_PRIORITY);
    handler.setDaemon(true);
    handler.start();
}

The static initializer of the java.lang.ref.Reference class creates a ReferenceHandler thread. Next, look at the body that this thread executes.

public void run() {
    while (true) {
        tryHandlePending(true);
    }
}

static boolean tryHandlePending(boolean waitForNotify) {
    Reference<Object> r;
    …………

    synchronized (lock) {
        if (pending != null) {
            // pending is not null: a GC has processed soft, weak, or phantom references
            // and linked them into the pending list. Unlink the first one.
            r = pending;
            pending = r.discovered;
            r.discovered = null;
        } else {
            if (waitForNotify) {
                // The pending list is empty (no GC has handed over any references yet),
                // so the current thread blocks until the JVM wakes it up.
                lock.wait();
            }
            return waitForNotify;
        }
    }
    …………

    // Put the Reference handed over by the GC into its ReferenceQueue.
    // The business code is responsible for draining that queue.
    ReferenceQueue<? super Object> q = r.queue;
    if (q != ReferenceQueue.NULL) q.enqueue(r);
    return true;
}

// ReferenceQueue.enqueue, called by the ReferenceHandler thread
boolean enqueue(Reference<? extends T> r) {
    synchronized (lock) {
        ReferenceQueue<?> queue = r.queue;
        // No queue, or already enqueued: nothing to do.
        if ((queue == NULL) || (queue == ENQUEUED)) {
            return false;
        }
        // Insert at the head of the queue's linked list.
        r.queue = ENQUEUED;
        r.next = (head == null) ? r : head;
        head = r;
        queueLength++;
        lock.notifyAll();
        return true;
    }
}
  1. Check whether pending is null
  2. If it is null, no GC has yet handed over any soft, weak, or phantom references, so the thread waits
  3. If it is not null, a GC has processed soft, weak, or phantom references and linked them into pending
  4. Move the Reference taken from pending into its ReferenceQueue
  5. The business code, which owns the ReferenceQueue, polls References from the queue and handles them
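
The business side of steps 4 and 5 can be sketched as follows. This is only an illustration: QueuePollDemo and roundTrip are invented names, and the call to Reference.enqueue() stands in for what the ReferenceHandler thread does after a real GC, so that the example does not depend on when the collector runs.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical demo class, not part of the JDK.
class QueuePollDemo {

    /** Returns true if the Reference travels through the queue exactly once. */
    static boolean roundTrip() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent, queue);

        // Nothing has been enqueued yet, so poll() returns null immediately.
        boolean emptyAtStart = queue.poll() == null;

        // Stand-in for the GC plus the ReferenceHandler thread.
        boolean enqueued = ref.enqueue();

        // Business code: take the Reference out of the queue and handle it.
        Reference<?> polled = queue.poll();

        // A Reference is enqueued at most once; a second attempt is rejected.
        boolean enqueuedAgain = ref.enqueue();

        return emptyAtStart && enqueued && polled == ref
                && !enqueuedAgain && queue.poll() == null;
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + roundTrip());
    }
}
```

In real code the poll() call usually sits in a cleanup method or a dedicated thread, and ReferenceQueue.remove() can be used instead when blocking until a Reference arrives is acceptable.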

The ReferenceQueue is therefore owned by the business code, which passes it in when constructing the Reference. After a GC has processed soft, weak, or phantom references, the ReferenceHandler thread puts each Reference into its ReferenceQueue. The business code then retrieves the Reference with poll and handles it (for example, by cleaning up companion objects).

The following figure shows how WeakHashMap uses ReferenceQueue.

[Figure: use of ReferenceQueue in WeakHashMap]

That completes the Java level. Next, we need to understand how the JVM's GC processes soft, weak, and phantom references and links them into pending, which closes the loop.
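
The pattern WeakHashMap follows can be approximated with a much smaller class. The sketch below is not WeakHashMap's source: TinyWeakMap, its list-based storage, and the demo method are assumptions made for illustration. What it shares with WeakHashMap is the idea that each entry is itself a WeakReference to the key, registered with a private ReferenceQueue, and that stale entries are expunged by polling that queue before the map is read.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, list-backed stand-in for the WeakHashMap pattern (no hashing, no duplicate-key handling).
class TinyWeakMap<K, V> {

    /** The entry IS the weak reference to the key; the value is the companion object. */
    static final class Entry<K, V> extends WeakReference<K> {
        final V value;

        Entry(K key, V value, ReferenceQueue<K> queue) {
            super(key, queue);
            this.value = value;
        }
    }

    private final ReferenceQueue<K> queue = new ReferenceQueue<>();
    private final List<Entry<K, V>> entries = new ArrayList<>();

    /** Returns the new entry so that the demo below can simulate a GC on it. */
    Entry<K, V> put(K key, V value) {
        expungeStaleEntries();
        Entry<K, V> entry = new Entry<>(key, value, queue);
        entries.add(entry);
        return entry;
    }

    int size() {
        expungeStaleEntries();
        return entries.size();
    }

    /** Drops every entry whose Reference has arrived in the queue. */
    private void expungeStaleEntries() {
        Reference<? extends K> stale;
        while ((stale = queue.poll()) != null) {
            entries.remove(stale);   // List.remove(Object), matched by identity
        }
    }

    /** Returns {size before, size after} the entry is reported as dead. */
    static int[] demo() {
        TinyWeakMap<Object, String> map = new TinyWeakMap<>();
        Object key = new Object();
        Entry<Object, String> entry = map.put(key, "companion data");
        int before = map.size();
        boolean intact = entry.get() == key;   // key is still strongly held here
        entry.enqueue();                       // stand-in for GC + ReferenceHandler
        int after = map.size();
        return new int[] { intact ? before : -1, after };
    }

    public static void main(String[] args) {
        int[] sizes = demo();
        System.out.println("before=" + sizes[0] + " after=" + sizes[1]);
    }
}
```

Because the cleanup runs inside put and size, the map needs no background thread; the cost is that stale entries linger until the map is touched again.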

JVM level:

The GC algorithm itself is out of scope for this article and is treated as a black box.

/hotspot/src/share/vm/memory/genCollectedHeap.cpp file

// /hotspot/src/share/vm/memory/genCollectedHeap.cpp
// Here is the process of GC garbage collection
void GenCollectedHeap::do_collection(bool full,
                                     bool clear_all_soft_refs,
                                     size_t size,
                                     bool is_tlab,
                                     int max_level) {
  …………

  // Do you need to clean up all soft references?
  const bool do_clear_all_soft_refs = clear_all_soft_refs ||
                          collector_policy()->should_clear_all_soft_refs();
  {
    …………

    for (int i = starting_level; i <= max_level; i++) {
      if (_gens[i]->should_collect(full, size, is_tlab)) {
        {
          // Each generation has its own ReferenceProcessor.
          ReferenceProcessor* rp = _gens[i]->ref_processor();

          rp->enable_discovery(true /*verify_disabled*/, true /*verify_no_refs*/);
          // Select the soft reference clearing policy for this collection.
          rp->setup_policy(do_clear_all_soft_refs);

          // Collect this generation.
          _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);

          // After the collection, link the processed soft, weak, and phantom references
          // into pending and hand them over to the Java level (see tryHandlePending earlier).
          if (!rp->enqueuing_is_done()) {
            rp->enqueue_discovered_references();
          } else {
            rp->set_enqueuing_is_done(false);
          }
        }

      }
    }
    …………
  }
}
  1. The policy decides whether all soft references must be cleared (usually only when memory is critically low)
  2. The new generation or old generation collector performs the collection (corresponding to YGC and Full GC)
  3. After the collection, the processed soft, weak, and phantom references are linked into pending and handed over to the Java level

So next we need to see how the new generation collector handles soft, weak, and phantom references during a collection.

/hotspot/src/share/vm/memory/defNewGeneration.cpp file

//Garbage collection of the new generation
void DefNewGeneration::collect(bool full,
                               bool clear_all_soft_refs,
                               size_t size,
                               bool is_tlab) {
  …………

  // Answers whether an object is still alive; passed to the reference processing below.
  IsAliveClosure is_alive(this);
  // Used to scan whether the referents of soft, weak, and phantom references are alive.
  ScanWeakRefClosure scan_weak_ref(this);

  // Object scanners that copy the objects reached from GC roots.
  FastScanClosure fsc_with_no_gc_barrier(this, false);
  FastScanClosure fsc_with_gc_barrier(this, true);

  // GC root scan for Klass objects.
  KlassScanClosure klass_scan_closure(&fsc_with_no_gc_barrier,
                                      gch->rem_set()->klass_rem_set());

  // Breadth-first scanner starting from the GC roots:
  // objects referenced by the roots become the next batch to scan, until all live objects are found.
  FastEvacuateFollowersClosure evacuate_followers(gch, _level, this,
                                                  &fsc_with_no_gc_barrier,
                                                  &fsc_with_gc_barrier);
  // Find the GC roots.
  // This is the new generation algorithm, so objects referenced directly by the roots
  // are copied to the to area or promoted to the old generation.
  gch->gen_process_strong_roots(_level,
                                true,  // Process younger gens, if any,
                                       // as strong roots.
                                true,  // activate StrongRootsScope
                                true,  // is scavenging
                                SharedHeap::ScanningOption(so),
                                &fsc_with_no_gc_barrier,
                                true,  // walk *all* scavengable nmethods
                                &fsc_with_gc_barrier,
                                &klass_scan_closure);

  // Follow all references reachable from the GC roots.
  // Soft, weak, and phantom reference objects encountered on the way are discovered here.
  evacuate_followers.do_void();

  // Used to keep referents alive when a reference must not be cleared.
  FastKeepAliveClosure keep_alive(this, &scan_weak_ref);

  ReferenceProcessor* rp = ref_processor();
  // Decide whether to clear all soft references based on clear_all_soft_refs.
  rp->setup_policy(clear_all_soft_refs);
  // The actual reference processing.
  const ReferenceProcessorStats& stats =
  rp->process_discovered_references(&is_alive, &keep_alive, &evacuate_followers,
                                    NULL, _gc_timer);
                                    
  …………
}

The code above is the new generation collection performed during a YGC. Both Full GC and YGC process soft, weak, and phantom references, so YGC is chosen for the analysis (it is simpler, and its handling of these references is the same).

Because the processing of soft, weak, and phantom references is tightly coupled to the collection itself, much of this code is GC detail. The author has commented it and otherwise treats it as a black box.

  1. Create the closures (scanners) that the collection needs
  2. These closures share one ultimate task: copying live objects to the to area or to the old generation
  3. Scan the GC roots
  4. Starting from the GC roots, traverse breadth-first: find the objects the roots reference, treat those as the next batch of roots, and repeat until every live object has been reached
  5. Process the soft, weak, and phantom references (the focus of the next step)

After all GC roots have been traced, the reference objects and their referents must still be told apart. The copying algorithm copies only the soft, weak, and phantom reference objects themselves; their referents are not copied at this stage and are processed later.

Before looking at the process_discovered_references method of the ReferenceProcessor class, we need to introduce the ReferenceProcessor class.

/hotspot/src/share/vm/memory/referenceProcessor.hpp file

class ReferenceProcessor : public CHeapObj<mtGC> {
 protected:
  // Soft reference policies: the default one and the one that clears everything.
  static ReferencePolicy* _default_soft_ref_policy;
  static ReferencePolicy* _always_clear_soft_ref_policy;
  // The policy selected by setup_policy() for the current collection.
  ReferencePolicy* _current_soft_ref_policy;

  uint _num_q;      // number of discovered lists currently in use per reference kind
  uint _max_num_q;  // maximum number of discovered lists per reference kind

  // One array holding all discovered lists; it serves as the base address.
  DiscoveredList* _discovered_refs;

  DiscoveredList* _discoveredSoftRefs;    // first part of the array
  DiscoveredList* _discoveredWeakRefs;    // second part of the array
  DiscoveredList* _discoveredFinalRefs;   // third part of the array
  DiscoveredList* _discoveredPhantomRefs; // fourth part of the array
  …………
};

The class holds the soft reference policy objects and several DiscoveredList linked lists. While the GC traces from the roots, each soft, weak, final, or phantom reference object it encounters is placed on the matching list, and these lists are what the processing below works through.
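
The "parts of one base address" layout can be pictured as plain index arithmetic. The following is a model written for this article's reader, not HotSpot code: the class DiscoveredLayout, the enum Kind, and both methods are invented, and the count of four reference kinds is taken from the listing above.

```java
// Hypothetical model of how one array is partitioned into per-kind sub-ranges.
class DiscoveredLayout {

    /** Order matches the four pointers in the listing: soft, weak, final, phantom. */
    enum Kind { SOFT, WEAK, FINAL, PHANTOM }

    /** Total number of lists in the base array for a given number of queues per kind. */
    static int totalLists(int maxNumQ) {
        return maxNumQ * Kind.values().length;
    }

    /** Index into the base array of the given kind's queue number 'queue'. */
    static int index(Kind kind, int queue, int maxNumQ) {
        if (queue < 0 || queue >= maxNumQ) {
            throw new IllegalArgumentException("queue out of range: " + queue);
        }
        return kind.ordinal() * maxNumQ + queue;
    }

    public static void main(String[] args) {
        // With a single queue per kind (the serial case) the array has 4 slots.
        System.out.println("lists (1 queue per kind): " + totalLists(1));
        System.out.println("weak list index: " + index(Kind.WEAK, 0, 1));
        // With 2 queues per kind the phantom lists occupy the last two slots.
        System.out.println("phantom queue 1 index: " + index(Kind.PHANTOM, 1, 2));
    }
}
```

Under this model a loop over all `maxNumQ * 4` slots visits every list of every kind, which is the shape of the loop in enqueue_discovered_reflists later in the article.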

So next, see the specific processing details of the process_discovered_references method.

/hotspot/src/share/vm/memory/referenceProcessor.cpp file

ReferenceProcessorStats ReferenceProcessor::process_discovered_references(
  BoolObjectClosure* is_alive,
  OopClosure* keep_alive,
  VoidClosure* complete_gc,
  AbstractRefProcTaskExecutor* task_executor,
  GCTimer* gc_timer) {

  _soft_ref_timestamp_clock = java_lang_ref_SoftReference::clock();

  //Soft reference processing
  size_t soft_count = 0;
  {
    GCTraceTime tt("SoftReference", trace_time, false, gc_timer);
    soft_count =
      process_discovered_reflist(_discoveredSoftRefs, _current_soft_ref_policy, true,
                                 is_alive, keep_alive, complete_gc, task_executor);
  }

  //Modify timestamp.
  // The timestamp is used in the LRU algorithm to find the least recently used soft reference.
  update_soft_ref_master_clock();

  // Handling of weak references
  size_t weak_count = 0;
  {
    GCTraceTime tt("WeakReference", trace_time, false, gc_timer);
    weak_count =
      process_discovered_reflist(_discoveredWeakRefs, NULL, true,
                                 is_alive, keep_alive, complete_gc, task_executor);
  }

  // FinalReference processing; this is what drives finalization
  size_t final_count = 0;
  {
    GCTraceTime tt("FinalReference", trace_time, false, gc_timer);
    final_count =
      process_discovered_reflist(_discoveredFinalRefs, NULL, false,
                                 is_alive, keep_alive, complete_gc, task_executor);
  }

  // Phantom reference processing
  size_t phantom_count = 0;
  {
    GCTraceTime tt("PhantomReference", trace_time, false, gc_timer);
    phantom_count =
      process_discovered_reflist(_discoveredPhantomRefs, NULL, false,
                                 is_alive, keep_alive, complete_gc, task_executor);
  }

  return ReferenceProcessorStats(soft_count, weak_count, final_count, phantom_count);
}

As the code shows, process_discovered_reflist is called once for each kind of reference: soft, weak, final, and phantom.

/hotspot/src/share/vm/memory/referenceProcessor.cpp file

size_t
ReferenceProcessor::process_discovered_reflist(
  DiscoveredList refs_lists[],
  ReferencePolicy* policy,
  bool clear_referent,
  BoolObjectClosure* is_alive,
  OopClosure* keep_alive,
  VoidClosure* complete_gc,
  AbstractRefProcTaskExecutor* task_executor)
{
  // Phase 1: decide, based on the policy, whether each reference may be cleared.
  // Only soft references come with a policy.
  // Weak and phantom references have none; they are processed whenever a GC occurs.
  if (policy != NULL) {
    for (uint i = 0; i < _max_num_q; i++) {
      process_phase1(refs_lists[i], policy,
                     is_alive, keep_alive, complete_gc);
    }
  }

  // Phase 2: walk the remaining lists and filter again.
  // This checks whether the referent of each soft, weak, or phantom reference is still alive.
  // If it is, the reference is dropped from the list.
  for (uint i = 0; i < _max_num_q; i++) {
    process_phase2(refs_lists[i], is_alive, keep_alive, complete_gc);
  }

  // Phase 3: clear the referent or keep it alive, depending on clear_referent.
  for (uint i = 0; i < _max_num_q; i++) {
    process_phase3(refs_lists[i], clear_referent,
                   is_alive, keep_alive, complete_gc);
  }
  // total_list_count is computed from refs_lists in code omitted here.
  return total_list_count;
}
  1. Only soft references have a policy, and the policy decides whether the referent may be reclaimed. If the policy says no, the reference is removed from the DiscoveredList and its referent is kept alive until the next GC tries again.
  2. The references that remain after the policy step are filtered again: is the referent of each soft, weak, or phantom reference still alive (strongly reachable)? If it is, the reference cannot be processed, so it is removed from the DiscoveredList and left untouched until the next GC tries again. (This is why careless use can easily cause memory leaks.)
  3. What happens to the references that remain after the second step depends on the clear_referent variable. If it is true (soft and weak references), the referent field is set to null. If it is false (final and phantom references), the referent is kept alive for this cycle instead. In both cases the reference object stays on the DiscoveredList and is later linked into pending. Not clearing a phantom reference's referent does no harm at the Java level, because get() on a phantom reference always returns null.
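
The three filtering steps can be replayed on a toy model. Everything in this sketch is an assumption made for illustration: PhaseModel, Ref, and process are invented, a boolean stands in for the is_alive closure, and a Predicate stands in for the soft reference policy. It mirrors the control flow described above, not HotSpot's actual data structures.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical toy model of the three processing phases.
class PhaseModel {

    static final class Ref {
        final String name;
        boolean referentAlive;     // what is_alive would report for the referent
        boolean referentCleared;   // true once the referent field has been nulled

        Ref(String name, boolean referentAlive) {
            this.name = name;
            this.referentAlive = referentAlive;
        }
    }

    /**
     * Returns the references that stay on the discovered list and would be linked into pending.
     * shouldClear is the soft reference policy; pass null for weak, final, and phantom lists.
     */
    static List<Ref> process(List<Ref> discovered, Predicate<Ref> shouldClear, boolean clearReferent) {
        List<Ref> list = new ArrayList<>(discovered);

        // Phase 1: the policy may veto clearing; vetoed referents are kept alive.
        if (shouldClear != null) {
            for (Iterator<Ref> it = list.iterator(); it.hasNext(); ) {
                Ref ref = it.next();
                if (!ref.referentAlive && !shouldClear.test(ref)) {
                    ref.referentAlive = true;
                    it.remove();
                }
            }
        }

        // Phase 2: drop references whose referent is still alive.
        list.removeIf(ref -> ref.referentAlive);

        // Phase 3: clear the referent, or keep it alive for this cycle.
        for (Ref ref : list) {
            if (clearReferent) {
                ref.referentCleared = true;
            } else {
                ref.referentAlive = true;
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<Ref> weak = new ArrayList<>();
        weak.add(new Ref("strongly-held", true));
        weak.add(new Ref("unreachable", false));
        for (Ref ref : process(weak, null, true)) {
            System.out.println("to pending: " + ref.name);
        }
    }
}
```

In the main method only the reference named "unreachable" survives the filtering, which is the weak reference case from step 2: a referent that is still strongly held keeps its reference off the pending list.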

So next we can look at the strategic processing of soft references.

This part is relatively simple. A policy either always clears, never clears, or uses an LRU rule to clear the soft references that have gone unused the longest.

That is why the top of this article says: when memory is tight but not critically so, the least recently used soft references are cleared (LRU algorithm). When memory is critically low, the policy switches to AlwaysClearPolicy and all soft references are cleared.

After these layers of filtering, the reference objects that remain are still held in their separate DiscoveredList linked lists. The Java level takes References from pending, so the DiscoveredList linked lists now have to be linked into pending.
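
A rough model of the LRU rule, assuming it works the way HotSpot's LRU policies are commonly described: a soft reference is cleared once the time since it was last accessed exceeds a budget proportional to the free heap, with the factor set by the -XX:SoftRefLRUPolicyMSPerMB flag (default 1000). The class SoftRefPolicyModel and its method are invented for this sketch; which heap figure counts as "free" depends on the concrete policy and is simply a parameter here.

```java
// Hypothetical model of an LRU-style soft reference policy.
class SoftRefPolicyModel {

    /** Default of -XX:SoftRefLRUPolicyMSPerMB: milliseconds of lifetime per free megabyte. */
    static final long DEFAULT_MS_PER_MB = 1000;

    /**
     * @param clockMs      the GC-maintained clock at the time of this collection
     * @param lastAccessMs the timestamp recorded when the soft reference was last accessed
     * @param freeHeapBytes free heap used as the budget basis (an assumption of this model)
     * @param msPerMb      lifetime granted per free megabyte
     * @return true if the soft reference should be cleared
     */
    static boolean shouldClear(long clockMs, long lastAccessMs, long freeHeapBytes, long msPerMb) {
        long maxIntervalMs = (freeHeapBytes / (1024L * 1024L)) * msPerMb;
        long idleMs = clockMs - lastAccessMs;
        return idleMs > maxIntervalMs;
    }

    public static void main(String[] args) {
        long tenMb = 10L * 1024L * 1024L;
        // 10 MB free gives a budget of 10,000 ms with the default factor.
        System.out.println(shouldClear(20_000, 15_000, tenMb, DEFAULT_MS_PER_MB)); // idle 5 s
        System.out.println(shouldClear(30_000, 10_000, tenMb, DEFAULT_MS_PER_MB)); // idle 20 s
    }
}
```

The model shows why the same soft reference can survive one GC and be cleared by the next: as free memory shrinks, the allowed idle time shrinks with it, and with no free memory every idle soft reference qualifies.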

So next go back to the GenCollectedHeap::do_collection method and see the enqueue_discovered_references method

/hotspot/src/share/vm/memory/referenceProcessor.cpp file

bool ReferenceProcessor::enqueue_discovered_references(AbstractRefProcTaskExecutor* task_executor) {
  return enqueue_discovered_ref_helper<oop>(this, task_executor);
}

template <class T>
bool enqueue_discovered_ref_helper(ReferenceProcessor* ref,
                                   AbstractRefProcTaskExecutor* task_executor) {
  // Get the address of the pending field of the Reference class. pending is a static field, so it is read from the class mirror.
  T* pending_list_addr = (T*)java_lang_ref_Reference::pending_list_addr();
  // Remember the old head of the pending list so that a change can be detected below.
  T old_pending_list_value = *pending_list_addr;

  // Link the discovered lists into pending.
  ref->enqueue_discovered_reflists((HeapWord*)pending_list_addr, task_executor);
  return old_pending_list_value != *pending_list_addr;
}

void ReferenceProcessor::enqueue_discovered_reflists(HeapWord* pending_list_addr,
  AbstractRefProcTaskExecutor* task_executor) {
  // Serial case: walk the 4 discovered lists one after another.
  for (uint i = 0; i < _max_num_q * number_of_subclasses_of_ref(); i++) {
    // Just link the head of each linked list into pending.
    enqueue_discovered_reflist(_discovered_refs[i], pending_list_addr);
    _discovered_refs[i].set_head(NULL);
    _discovered_refs[i].set_length(0);
  }
}

This links the reference objects that are left in the soft, weak, and phantom lists after all the filtering into the pending field of the Reference class. From there, the Java-level ReferenceHandler thread takes over.

Usage pitfalls:
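
The hand-over can be modelled in a few lines of Java. This is a simplification written for illustration: PendingListModel and its methods are invented, and the lists here are null-terminated, which need not match how HotSpot terminates its discovered lists. The point is only the shape of the operation: each discovered list is spliced in front of pending through the discovered link, and the Java side pops one Reference at a time, as tryHandlePending does.

```java
// Hypothetical model of linking discovered lists into the pending list.
class PendingListModel {

    static final class Ref {
        final String name;
        Ref discovered;   // next link, playing the role of Reference.discovered

        Ref(String name) {
            this.name = name;
        }
    }

    private Ref pending;  // plays the role of the static Reference.pending field

    /** GC side: splice a whole discovered list (given by its head) in front of pending. */
    void enqueueDiscoveredList(Ref head) {
        if (head == null) {
            return;
        }
        Ref tail = head;
        while (tail.discovered != null) {
            tail = tail.discovered;
        }
        tail.discovered = pending;
        pending = head;
    }

    /** Java side: unlink and return the first pending Ref, or null if there is none. */
    Ref takePending() {
        Ref ref = pending;
        if (ref != null) {
            pending = ref.discovered;
            ref.discovered = null;
        }
        return ref;
    }

    /** Links two lists and drains pending, returning the names in drain order. */
    static String demo() {
        Ref a = new Ref("a");
        Ref b = new Ref("b");
        Ref c = new Ref("c");
        a.discovered = b;                    // first list: a -> b

        PendingListModel model = new PendingListModel();
        model.enqueueDiscoveredList(a);      // pending: a -> b
        model.enqueueDiscoveredList(c);      // pending: c -> a -> b

        StringBuilder order = new StringBuilder();
        for (Ref ref = model.takePending(); ref != null; ref = model.takePending()) {
            order.append(ref.name);
        }
        return order.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Splicing whole lists keeps the GC-side work proportional to the number of lists plus their lengths, and leaves the per-Reference work (choosing the right ReferenceQueue) to the ReferenceHandler thread.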

We have analyzed all the processing details above, so let’s recall one detail.

Recall the process_discovered_reflist method in the /hotspot/src/share/vm/memory/referenceProcessor.cpp file. Its process_phase2 step checks whether the referent of each soft, weak, or phantom reference is still alive. If it is, the referent cannot be reclaimed and the reference is not handed over to the Java level. This makes memory leaks easy to create, as the following Java code shows.

public class ReferenceTest {

    public static void main(String[] args) {

        WeakHashMap<Object,User> weakHashMap = new WeakHashMap<>();
        Object o1 = new Object();

        weakHashMap.put(o1,new User("lihayyds")); // As long as o1 stays strongly reachable, this entry is never removed (a leak).
        weakHashMap.put("1",new User("lihayyds")); // "1" is held by the JVM string constant pool, so this entry leaks too.

        // Five 1 GiB arrays that all stay reachable: this needs a heap comfortably larger than 5 GiB.
        byte[] bytes1 = new byte[1024 * 1024 * 1024];
        byte[] bytes2 = new byte[1024 * 1024 * 1024];
        byte[] bytes3 = new byte[1024 * 1024 * 1024];
        byte[] bytes4 = new byte[1024 * 1024 * 1024];
        byte[] bytes5 = new byte[1024 * 1024 * 1024];

        // Request a Full GC manually.
        System.gc();

        // size() first removes the entries whose References have arrived in the map's ReferenceQueue.
        System.out.println("The processed size of Reference Queue is: " + weakHashMap.size());
    }
}

class User{

    String name;

    public User(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

As the result in the figure above shows, nothing was removed after the GC. The keys that the weak references point to are still strongly referenced elsewhere, so they are filtered out during processing and cannot be reclaimed. As long as those external strong references are not released, neither the referent nor the weak reference object can ever be reclaimed, and the benefit of using weak references is lost. In effect, this is a memory leak.
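
The pinned-key case can also be reproduced without the large arrays. The sketch below is a reduced variant of the listing above, with String values instead of User to keep it self-contained; the class name HeldKeyDemo is invented. Both keys stay strongly reachable for the whole method, so the entries must survive whether or not System.gc() actually triggers a collection.

```java
import java.util.WeakHashMap;

// Hypothetical reduced variant of the ReferenceTest listing.
class HeldKeyDemo {

    /** Returns the map size after a requested GC, or -1 if the held key went missing. */
    static int sizeAfterGc() {
        WeakHashMap<Object, String> map = new WeakHashMap<>();
        Object held = new Object();
        map.put(held, "key pinned by a local variable");
        map.put("1", "key pinned by the string constant pool");

        System.gc();   // only a request; the JVM is free to ignore it

        int size = map.size();
        // Use 'held' after the GC request so it is clearly still strongly reachable there.
        return map.containsKey(held) ? size : -1;
    }

    public static void main(String[] args) {
        System.out.println("entries left: " + sizeAfterGc());
    }
}
```

A WeakHashMap only helps when the map is the last thing that refers to the key; any other strong reference to the key, including an interned string literal, keeps the entry alive indefinitely.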

Then let’s improve the Java code.

public class ReferenceTest {

    public static void main(String[] args) {

        WeakHashMap<Object,User> weakHashMap = new WeakHashMap<>();
        
        // No strong reference to these keys exists outside the map.
        weakHashMap.put(new Object(),new User("lihayyds"));
        weakHashMap.put(new Object(),new User("lihayyds"));

        // Five 1 GiB arrays that all stay reachable: this needs a heap comfortably larger than 5 GiB.
        byte[] bytes1 = new byte[1024 * 1024 * 1024];
        byte[] bytes2 = new byte[1024 * 1024 * 1024];
        byte[] bytes3 = new byte[1024 * 1024 * 1024];
        byte[] bytes4 = new byte[1024 * 1024 * 1024];
        byte[] bytes5 = new byte[1024 * 1024 * 1024];

        // Request a Full GC manually.
        System.gc();

        // size() first removes the entries whose References have arrived in the map's ReferenceQueue.
        System.out.println("The processed size of Reference Queue is: " + weakHashMap.size());
    }
}

class User{

    String name;

    public User(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

As the figure above shows, once the keys have no strong references from outside the map, things behave as expected and the entries are removed when a GC occurs.

Summary:

Because the process is long and tightly coupled to the GC itself, the author could only try to describe it as clearly as possible through source code comments, summaries, and diagrams.