Translation: The JDK's implementation of the default hashCode method [How does the default hashCode() work?]

Original link: How does the default hashCode() work?
Translation:

A minor issue

At work last week I submitted a trivial change to a class: an implementation of toString() to make logging more useful. To my surprise, the change caused about a 5% drop in coverage. I knew that all the new code was covered by existing unit tests, so why did coverage drop?
Comparing the coverage reports, an astute colleague noticed that the unit tests covered our implementation of hashCode() before the change but not after it. Of course, that makes sense: the default toString() calls hashCode(), and the overridden one does not.

public String toString() {
    return getClass().getName() + "@" + Integer.toHexString(hashCode());
}

After we overrode toString(), our custom hashCode() was no longer called, so coverage dropped. Everyone knows how the default toString() is implemented, but…
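
The effect described above can be reproduced with a minimal sketch. The class names (Inherited, Overridden) and the call counter are hypothetical, invented for illustration; the only fact relied on is the default toString() shown earlier, which calls hashCode().

```java
class ToStringCoverageDemo {
    // Hypothetical class that overrides hashCode() but inherits Object.toString().
    static class Inherited {
        int hashCalls = 0;

        @Override
        public int hashCode() {
            hashCalls++;   // record every invocation
            return 42;
        }
    }

    // Same class, but with toString() overridden, as in the logging change.
    static class Overridden extends Inherited {
        @Override
        public String toString() {
            return "Overridden{}";
        }
    }

    // Returns how many times hashCode() ran while toString() was evaluated.
    static int hashCallsDuringToString(Inherited o) {
        o.hashCalls = 0;
        o.toString();
        return o.hashCalls;
    }

    public static void main(String[] args) {
        System.out.println("inherited toString:  " + hashCallsDuringToString(new Inherited()));
        System.out.println("overridden toString: " + hashCallsDuringToString(new Overridden()));
    }
}
```

With the inherited toString() the custom hashCode() is exercised by any test that logs the object; once toString() is overridden, that incidental coverage disappears.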

How is the default hashCode method implemented?

The default hashCode() returns the object's identity hash code. Note that this is not the same as the value returned by an overridden hashCode(). If a class overrides hashCode(), we can still obtain an object's identity hash code with System.identityHashCode(o) (it feels a bit like the object's ID number).
It is commonly believed that the identity hash code is the integer form of the object's memory address (but what happens if the object is moved in memory?). The Java API documentation, however, says this:
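
As a small illustration of the difference, the sketch below uses a hypothetical Point class whose hashCode() is derived from its fields. Only Object.hashCode() and System.identityHashCode() are real APIs here; everything else is made up for the example.

```java
class IdentityHashDemo {
    // Hypothetical value class with a field-based hashCode().
    static final class Point {
        final int x, y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);

        // Overridden hashCode(): equal objects share a hash code.
        System.out.println(a.hashCode() == b.hashCode());

        // Identity hash codes belong to the individual objects; they usually
        // differ for distinct objects, although that is not guaranteed.
        System.out.println(System.identityHashCode(a));
        System.out.println(System.identityHashCode(b));

        // For a class that does not override hashCode(), both calls agree.
        Object plain = new Object();
        System.out.println(plain.hashCode() == System.identityHashCode(plain));
    }
}
```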

... is typically implemented by converting the internal address of the object into an integer,
but this implementation technique is not required by the Java™ programming language.

Because the JVM may relocate an object (for example, during promotion or compaction in garbage collection), the identity hash code must be stored somewhere once it has been computed, so that later calls return the same value.
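
The requirement can be observed from Java code. The sketch below is only a rough check, not a proof: System.gc() is merely a request, and whether the object is actually moved depends on the collector, which is an assumption here.

```java
class StableIdentityHashDemo {
    // Returns true if the identity hash code is unchanged after allocating
    // garbage and requesting a collection (which may or may not move the object).
    static boolean stableAcrossGc() {
        Object o = new Object();
        int before = System.identityHashCode(o);

        // Create some short-lived garbage so a young collection has work to do.
        for (int i = 0; i < 100_000; i++) {
            byte[] junk = new byte[128];
        }
        System.gc(); // only a hint; the JVM is free to ignore it

        return before == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        System.out.println("stable across GC: " + stableAcrossGc());
    }
}
```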

Default hashCode implementation

Different JVMs may implement the default hashCode() in different ways; this article looks only at the OpenJDK source code. hashCode() is a native method, and its entry point is in src/share/vm/prims/jvm.h and src/share/vm/prims/jvm.cpp:

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
  JVMWrapper("JVM_IHashCode");
  // as implemented in the classic virtual machine; return 0 if object is NULL
  return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;
JVM_END

Next comes ObjectSynchronizer::FastHashCode(), in src/share/vm/runtime/synchronizer.cpp. One might naively expect the method to be as simple as this:

if (obj.hash() == 0) {
    obj.set_hash(generate_new_hash());
}
return obj.hash();

But in fact it is far longer and more involved. As the file name suggests, synchronization is involved here, that is, the implementation of synchronized: the object's built-in lock. We will come back to that later; first, let's look at how the identity hash code is generated:

static inline intptr_t get_next_hash(Thread* self, oop obj) {
  intptr_t value = 0;
  if (hashCode == 0) {
    // This form uses global Park-Miller RNG.
    // On MP system we'll have lots of RW access to a global, so the
    // mechanism induces lots of coherency traffic.
    value = os::random();
  } else if (hashCode == 1) {
    // This variation has the property of being stable (idempotent)
    // between STW operations. This can be useful in some of the 1-0
    // synchronization schemes.
    intptr_t addr_bits = cast_from_oop<intptr_t>(obj) >> 3;
    value = addr_bits ^ (addr_bits >> 5) ^ GVars.stw_random;
  } else if (hashCode == 2) {
    value = 1; // for sensitivity testing
  } else if (hashCode == 3) {
    value = ++GVars.hc_sequence;
  } else if (hashCode == 4) {
    value = cast_from_oop<intptr_t>(obj);
  } else {
    // Marsaglia's xor-shift scheme with thread-specific state
    // This is probably the best overall implementation -- we'll
    // likely make this the default in future releases.
    unsigned t = self->_hashStateX;
    t ^= (t << 11);
    self->_hashStateX = self->_hashStateY;
    self->_hashStateY = self->_hashStateZ;
    self->_hashStateZ = self->_hashStateW;
    unsigned v = self->_hashStateW;
    v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
    self->_hashStateW = v;
    value = v;
  }

  value &= markWord::hash_mask;
  if (value == 0) value = 0xBAD;
  assert(value != markWord::no_hash, "invariant");
  return value;
}

The value of hashCode therefore selects one of six strategies:

0. A randomly generated number.
1. A function of the object's memory address.
2. A hardcoded 1 (used for sensitivity testing).
3. A sequence (an auto-incrementing counter).
4. The object's memory address, cast to int.
5. Thread state combined with xorshift (https://en.wikipedia.org/wiki/Xorshift).
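
To make strategy 5 concrete, here is a Java port of the xor-shift branch shown above. It is a sketch, not HotSpot code: the class name, the ThreadLocal holder and the seed constants are assumptions chosen for illustration, and the 31-bit mask assumes the 64-bit header layout. Java's int arithmetic with the unsigned shift >>> matches the unsigned 32-bit arithmetic of the C++ original.

```java
import java.util.concurrent.ThreadLocalRandom;

class XorShiftHashSketch {
    // Assumed mask: 31 hash bits, as in the 64-bit mark word layout.
    static final int HASH_MASK = 0x7FFFFFFF;

    // Per-thread state, standing in for _hashStateX.._hashStateW.
    private int x, y, z, w;

    XorShiftHashSketch(int seedX) {
        x = seedX;
        // Arbitrary non-zero constants picked for this sketch.
        y = 842502087;
        z = 0x8767;
        w = 273326509;
    }

    // One instance per thread, so no shared state is written concurrently.
    static final ThreadLocal<XorShiftHashSketch> PER_THREAD =
        ThreadLocal.withInitial(
            () -> new XorShiftHashSketch(ThreadLocalRandom.current().nextInt()));

    // Mirrors the final else-branch of get_next_hash plus the masking step.
    int nextHash() {
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;

        int value = v & HASH_MASK;
        return value == 0 ? 0xBAD : value; // 0 is reserved for "no hash yet"
    }

    public static void main(String[] args) {
        XorShiftHashSketch gen = PER_THREAD.get();
        for (int i = 0; i < 5; i++) {
            System.out.println(Integer.toHexString(gen.nextHash()));
        }
    }
}
```

Keeping the state per thread is what avoids the coherency traffic that the comment on strategy 0 warns about.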

According to src/share/vm/runtime/globals.hpp, the default in product builds is 5, that is, xorshift, which is also a (pseudo-)random number scheme:

product(intx, hashCode, 5, \
        "(Unstable) select hashCode generation algorithm") \

OpenJDK 8 and 9 default to 5, while OpenJDK 6 and 7 default to option 0 (the random number scheme).
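
Because hashCode is declared as a VM flag, the strategy can be switched from the command line on the JDK versions discussed here, for example with java -XX:hashCode=2 HashCodeFlagDemo. The small program below is a hypothetical helper for observing the result. Note the assumptions: the flag is marked unstable, and other JDK releases may require additional unlock options or not accept it at all.

```java
class HashCodeFlagDemo {
    // Collects the identity hash codes of n freshly allocated objects.
    static int[] sampleHashes(int n) {
        int[] hashes = new int[n];
        for (int i = 0; i < n; i++) {
            hashes[i] = System.identityHashCode(new Object());
        }
        return hashes;
    }

    public static void main(String[] args) {
        // Going by the source above: with hashCode=2 every line should read 1,
        // and with hashCode=3 the values should form an increasing sequence.
        for (int h : sampleHashes(5)) {
            System.out.println(h);
        }
    }
}
```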

Object headers and synchronization

In OpenJDK, the mark word is described as follows:

// The markOop describes the header of an object.
//
// Note that the mark is not a real oop but just a word.
// It is placed in the oop hierarchy for historical reasons.
//
// Bit-format of an object header (most significant first, big endian layout below):
//
//  32 bits:
//  --------
//  hash:25 ------------>| age:4 biased_lock:1 lock:2 (normal object)
//  JavaThread*:23 epoch:2 age:4 biased_lock:1 lock:2 (biased object)
//  size:32 --------------------------------------------->| (CMS free block)
//  PromotedObject*:29 ---------->| promo_bits:3 ----->| (CMS promoted object)
//
//  64 bits:
//  --------
//  unused:25 hash:31 -->| unused:1 age:4 biased_lock:1 lock:2 (normal object)
//  JavaThread*:54 epoch:2 unused:1 age:4 biased_lock:1 lock:2 (biased object)
//  PromotedObject*:61 --------------------->| promo_bits:3 ----->| (CMS promoted object)
//  size:64 ----------------------------------------------------->| (CMS free block)
//
//  unused:25 hash:31 -->| cms_free:1 age:4 biased_lock:1 lock:2 (COOPs && normal object)
//  JavaThread*:54 epoch:2 cms_free:1 age:4 biased_lock:1 lock:2 (COOPs && biased object)
//  narrowOop:32 unused:24 cms_free:1 unused:4 promo_bits:3 ----->| (COOPs && CMS promoted object)
//  unused:21 size:35 -->| cms_free:1 unused:7 ------------------>| (COOPs && CMS free block)

The mark word format differs slightly between 32-bit and 64-bit JVMs. The 64-bit format has two variants, depending on whether compressed object pointers (COOPs) are enabled; both Oracle JDK 8 and OpenJDK 8 enable them by default. In a normal object the header has room for the hash. If the object is in the biased state, however, those bits hold a pointer to the biasing thread (23 bits on 32-bit, 54 bits on 64-bit), so where is the identity hash code kept?
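
To make the 64-bit "normal object" layout concrete before moving on, the sketch below packs and unpacks its fields with plain bit arithmetic on a long. The class and method names are hypothetical; the shifts simply follow the field widths listed in the comment (lock:2, biased_lock:1, age:4, unused:1, hash:31, counted from the least significant bit). The lock value 1 in main is only an example value.

```java
class MarkWordLayoutSketch {
    // Field positions, counted from the least significant bit.
    static final int BIASED_SHIFT = 2; // after lock:2
    static final int AGE_SHIFT = 3;    // after biased_lock:1
    static final int HASH_SHIFT = 8;   // after age:4 and unused:1

    static final long LOCK_MASK = 0x3L;
    static final long BIASED_MASK = 0x1L;
    static final long AGE_MASK = 0xFL;
    static final long HASH_MASK = 0x7FFFFFFFL; // 31 bits

    // Packs the fields of a normal (unbiased) object header into one word.
    static long encode(int hash, int age, int biasedLock, int lock) {
        return ((hash & HASH_MASK) << HASH_SHIFT)
             | ((age & AGE_MASK) << AGE_SHIFT)
             | ((biasedLock & BIASED_MASK) << BIASED_SHIFT)
             | (lock & LOCK_MASK);
    }

    static int hash(long mark) {
        return (int) ((mark >>> HASH_SHIFT) & HASH_MASK);
    }

    static int age(long mark) {
        return (int) ((mark >>> AGE_SHIFT) & AGE_MASK);
    }

    static int biasedLock(long mark) {
        return (int) ((mark >>> BIASED_SHIFT) & BIASED_MASK);
    }

    static int lock(long mark) {
        return (int) (mark & LOCK_MASK);
    }

    public static void main(String[] args) {
        long mark = encode(0x12345678, 3, 0, 1);
        System.out.println(Long.toHexString(mark));
        System.out.println(Integer.toHexString(hash(mark)) + " age=" + age(mark));
    }
}
```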

Biased locking

An object enters the biased state as a result of biased locking. Introduced in HotSpot 6, biased locking tries to reduce the cost of locking an object. Lock and unlock operations are expensive because their implementation typically relies on atomic CPU instructions (CAS) to safely handle requests from different threads. Analysis shows, however, that in most applications most objects are only ever locked by a single thread, so executing those atomic instructions is wasted work (a CAS is already very fast, much faster than a context switch, but it is still wasted work). To avoid this waste, a JVM with biased locking lets a thread bias an object toward itself. Once an object is biased, the lucky thread does not even need to execute a CAS instruction to lock and unlock it. As long as multiple threads do not contend for the same object, biased locking performs very well. Let's continue with FastHashCode:

intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {
  if (UseBiasedLocking) {
    if (obj->mark()->has_bias_pattern()) {
      ...
      BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());
      ...
      assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
    }
  }

When an identity hash code is generated, any existing bias is revoked and the object can no longer be biased (the false argument means "do not attempt to rebias"). A few lines further down, an assertion confirms that this still holds:

// object should remain ineligible for biased locking
assert (!mark->has_bias_pattern(), "invariant");

This means that requesting an object's identity hash code disables biased locking for that object, and any later attempt to lock the object has to use expensive atomic instructions, even if only one thread ever requests the lock.
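
The interaction can be summarised with a toy model. This is not HotSpot code: the class, its fields and its methods are invented for illustration, it is single-threaded, and it ignores epochs, lock inflation and the CAS normally needed to install a bias. It only captures the rule described above: asking for the identity hash revokes the bias and makes the object ineligible for biasing.

```java
class ToyBiasedHeader {
    private Thread biasOwner;        // stands in for the JavaThread* bits
    private boolean biasable = true; // stands in for the bias pattern
    private int hash;                // 0 means "no hash computed yet"

    // Returns true if the lock can be taken on the cheap biased path,
    // false if the (simulated) CAS-based path would be needed.
    boolean lockOnBiasedPath(Thread t) {
        if (biasable && (biasOwner == null || biasOwner == t)) {
            biasOwner = t;
            return true;
        }
        return false;
    }

    // Mirrors the shape of FastHashCode: revoke the bias first, then store the hash.
    int identityHash() {
        if (biasable) {
            biasOwner = null;
            biasable = false; // corresponds to "do not attempt to rebias"
        }
        if (hash == 0) {
            int value = System.identityHashCode(this) & 0x7FFFFFFF;
            hash = (value == 0) ? 0xBAD : value;
        }
        return hash;
    }

    public static void main(String[] args) {
        ToyBiasedHeader header = new ToyBiasedHeader();
        Thread me = Thread.currentThread();
        System.out.println("before hash: biased path = " + header.lockOnBiasedPath(me));
        header.identityHash();
        System.out.println("after hash:  biased path = " + header.lockOnBiasedPath(me));
    }
}
```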

Why does biased locking conflict with the identity hash code?

To answer this question, we need to understand where the mark word can be located, which depends on the lock state of the object. The diagram on the HotSpot Wiki shows the following transitions. (The rest of the original is not translated here; only the key points are given.)