Analysis of the underlying principle of HashMap

Table of Contents

  • 1. Introduction to HashMap
    • 1.1 Features
    • 1.2 Underlying implementation
  • 2. structure and corresponding method analysis
      • 2.1.2.1 Linked list Node class
    • 2.2 Method implementation
      • 2.2.1 Array initialization of HashMap
      • 2.2.2 Calculate the hash value
      • 2.2.3 Add element put (K key, V value) method
      • 2.2.4 Array expansion
  • 3. Summary

1. Introduction to HashMap

1.1 Features

HashMap is the interface implementation class of the Map interface. It is implemented using the hash algorithm and is the most commonly used implementation class of the Map interface. Since the bottom layer uses a hash table to store data, it is required that the key cannot be repeated. If there is a repetition, the new value will replace the old value. HashMap is very efficient in searching, deleting, and modifying.

1.2 Underlying implementation

The underlying implementation of HashMap uses a hash table, which not only combines the advantages of an array (Continuous space, easy addressing, fast query speed?), but also a linked list (Extremely efficient addition and deletion) The advantages. In fact, the essence of the hash table is “array + linked list”.

2. structure and corresponding method analysis

In HashMap, when the number of inter-linked list nodes is maintained, when the number of linked list nodes is greater than 8, it will be converted into a red-black tree to store , thereby improving query efficiency.
2.1 Structure composition
2.1.1 Member variables
DEFAULT_INITIAL_CAPACITY = 1 << 4: The default initial capacity is 16, and the annotation states that this default initial capacity must be a multiple of 2?.
MAXIMUM_CAPACITY = 1 << 30:? The maximum initialization capacity is 2^30?.
DEFAULT_LOAD_FACTOR = 0.75f: ? Load factor, used to determine when the array starts to expand, that is, when the array length reaches 75%, it will expand?.
TREEIFY_THRESHOLD = 8: ? threshold, the current array length is > 64, and the linked list with more than 8 nodes will be converted into a red-black tree?.
UNTREEIFY_THRESHOLD = 6:?Similarly, when the number of red-black tree nodes is less than 6, convert this red-black tree into a linked list?.
MIN_TREEIFY_CAPACITY = 64: ?Set when the length of the array exceeds how much, the red-black tree conversion will be performed on the number of linked list nodes greater than 8?.
transient Node[] table: It is the mysterious array mentioned above. (Why is it Nodel type?) /**
* The default initial capacity – MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;
/**
 * The bin count threshold for using a tree rather than list for a
 * bin. Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
transient Node[] table; 2.1.2 Node type for storing elements
Since it is said that the hash table is composed of an array + a linked list, and it will be converted into a red-black tree later, it must have a corresponding node class. Its source code type is as follows:

2.1.2.1 Linked list Node class

/**
* Basic hash bin node, used for most entries. (See below for
* TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
*/
static class Node implements Map.Entry {
final int hash;
final K key;
V value;
Node next;

 Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this. next = next;
    }

    public final K getKey() { return key; }
    public final V getValue() { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) & amp; & amp;
                Objects. equals(value, e. getValue()))
                return true;
        }
        return false;
    }
} It can be seen from the source code that the Node node type of the linked list implements the internal interface class Entry<K, V> of the Map interface. This interface defines some behaviors that can operate a key->value storage structure of the HashMap (such as obtaining the key value right key).

member traversal
hash: record the hash value of the storage key, which cannot be changed (final modification)
key: record key, which cannot be changed (final modification), so the key of the hashmap is unique and cannot be repeated.
value: record value, which can be changed.
next: The current node records the address of the next node (from this we can see that the linked list is a one-way linked list) 2.1.2.2 Tree node class
/**

  • Entry for Tree bins. Extends LinkedHashMap. Entry (which in turn
  • extends Node) so can be used as extension of either regular or
  • linked node.
    */
    static final class TreeNode extends LinkedHashMap.Entry {
    TreeNode parent; // red-black tree links
    TreeNode left;
    TreeNode right;
    TreeNode prev; // needed to unlink next upon deletion
    boolean red;
    TreeNode(int hash, K key, V val, Node next) {
    super(hash, key, val, next);
    }

/**

  • Returns root of tree containing this node.
    */
    final TreeNode root() {
    for (TreeNode r = this, p; {
    if ((p = r. parent) == null)
    return r;
    r = p;
    }
    }Member variables:
    parent: record parent node
    left: left subtree
    right: right subtree
    prev: previous node
    red: Record the state of the red-black tree (true is a red tree, and vice versa.) 2.1.2.3 Inheritance relationship

The array of HashMap has both a linked list and a red-black tree. Why is this mysterious array of Node type? I think it makes sense here:
The linked list node class Node implements the Entry interface, and the internal class Entry of LinkedHashMap inherits the Node class, and TreeNode inherits the Entry class, so the node class of the red-black tree has an inheritance relationship with the Node of the linked list, and can be unified as one It can be viewed by type, so an array of type Node can store both a linked list and a red-black tree.

2.2 Method Implementation

2.2.1 Array initialization of HashMap

In JDK11’s HashMap, the initialization of arrays is lazy initialization. via the resize method
Implement initialization processing. The resize method implements both array initialization and array expansion processing. tips: What is delayed initialization?
The array is initialized when the first element is added to the array. /**
* Initializes or doubles table size. If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
*
* @return the table
*/
final Node[] resize() {
Node[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab. length;
int oldThr = threshold;
int newCap, newThr = 0;
if (oldCap > 0) {
if (oldCap >= MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return oldTab;
}
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY & amp; & amp;
oldCap >= DEFAULT_INITIAL_CAPACITY)
newThr = oldThr << 1; // double threshold
}
else if (oldThr > 0) // initial capacity was placed in threshold
newCap = oldThr;
else { // zero initial threshold signs using defaults
newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
if (newThr == 0) {
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY & & ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
threshold = newThr;
@SuppressWarnings({“rawtypes”, “unchecked”})
Node[] newTab = (Node[])new Node[newCap];
table = newTab;
if (oldTab != null) {
for (int j = 0; j < oldCap; + + j) {
Node e;
if ((e = oldTab[j]) != null) {
oldTab[j] = null;
if (e. next == null)
newTab[e.hash & amp; (newCap – 1)] = e;
else if (e instanceof TreeNode)
((TreeNode)e).split(this, newTab, j, oldCap);
else { // preserve order
Node loHead = null, loTail = null;
Node hiHead = null, hiTail = null;
Node next;
do {
next = e. next;
if ((e.hash & amp; oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail. next = e;
loTail = e;
}
else {
if (hiTail == null)
hiHead = e;
else
hiTail. next = e;
hiTail = e;
}
} while ((e = next) != null);
if (loTail != null) {
loTail. next = null;
newTab[j] = loHead;
}
if (hiTail != null) {
hiTail. next = null;
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
} First, when returning to the member variable of the HashMap just now, the member variable table just made a statement, as shown in the figure:

So the table is null, so when executing int oldCap = (oldTab == null) ? 0 : oldTab.length, oldCap=0, and the threshold is also 0 at this time, so when executing the first if , both variables are 0, so execute the statement in else directly.
newCap = DEFAULT_INITIAL_CAPACITY; newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); The member variable (16) that initializes the array length is about to be assigned to newCap, and the next sentence is to assign the length of the next expansion to newThr (12 at this time), then skip the if statement and reassign a value to the member variable threshold. Re-execute
Node[] newTab = (Node[])new Node[newCap], assign newTab to the member variable table, and then return newTab, so that the initialization is complete.

2.2.2 Calculate hash value

In map storage, we store elements based on the hash value of the key. Therefore, a series of operations need to be performed on the hash value of the key:
1. Get the hashCode of the key.
2. Calculate the hash value according to the hashCode. (However, since it is required to be converted into the range of the length -1 of the table array, a series of operations are required.)
3. Conversion algorithm: hash = hashcode & amp; (n-1) to get the storage location in the array. /**

  • Associates the specified value with the specified key in this map.
  • If the map previously contained a mapping for the key, the old
  • value is replaced.
  • @param key key with which the specified value is to be associated
  • @param value value to be associated with the specified key
  • @return the previous value associated with key, or
  • <tt>null</tt> if there was no mapping for <tt>key</tt>.
    
  • (A <tt>null</tt> return can also indicate that the map
    
  • previously associated <tt>null</tt> with <tt>key</tt>.)
    

*/
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}Here is the method to calculate the hash value of the key. First assign the hashcode value of the key to h, and then perform an XOR operation with the upper 16 bits of h (that is, perform an XOR operation on the lower 16 bits and upper 16 bits of h).
Let’s demonstrate the operation process:
Assume: key = 123456, use a calculator to calculate its binary value as:
Then perform XOR operation (same as 0, different as 1):

Calculated in decimal as:

At this point, 123457 is returned, and the putVal() method is returned below:
/**

  • Implements Map.put and related methods
  • @param hash hash for key
  • @param key the key
  • @param value the value to put
  • @param onlyIfAbsent if true, don’t change existing value
  • @param evict if false, the table is in creation mode.
  • @return previous value, or null if none
    */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
    boolean evict) {
    Node[] tab; Node p; int n, i;
    if ((tab = table) == null || (n = tab. length) == 0)
    n = (tab = resize()). length;
    if ((p = tab[i = (n – 1) & amp; hash]) == null)
    tab[i] = newNode(hash, key, value, null);
    else {
    Node e; K k;
    if (p.hash == hash & amp; & amp;
    ((k = p.key) == key || (key != null & amp; & amp; key.equals(k))))
    e = p;
    else if (p instanceof TreeNode)
    e = ((TreeNode)p).putTreeVal(this, tab, hash, key, value);
    else {
    for (int binCount = 0; ; + + binCount) {
    if ((e = p. next) == null) {
    p.next = newNode(hash, key, value, null);
    if (binCount >= TREEIFY_THRESHOLD – 1) // -1 for 1st
    treeifyBin(tab, hash);
    break;
    }
    if (e.hash == hash & amp; & amp;
    ((k = e.key) == key || (key != null & amp; & amp; key.equals(k))))
    break;
    p = e;
    }
    }
    if (e != null) { // existing mapping for key
    V oldValue = e.value;
    if (!onlyIfAbsent || oldValue == null)
    e. value = value;
    afterNodeAccess(e);
    return oldValue;
    }
    }
    + + modCount;
    if ( + + size > threshold)
    resize();
    afterNodeInsertion(evict);
    return null;
    After the }putVal method gets the hashCode of the key, it performs the & operation with 15 (the same is 1, and the difference is 0):

Finally, the position stored in the array is 1.

2.2.3 Add element put (K key, V value) method

The putVal() method is called (the source code is above)

  1. putVal() is mainly to calculate the hash value to obtain the position of the element in the array (which has been analyzed before), if there is no element in the position array, put the new node into it;
  2. We all know that hashMap overwrites its value for the same value of the key, and the key remains unchanged. The following is the implementation method:
  3. If the p node is of the same type as the TreeNode node (red-black tree), hang it on the red-black tree:
  4. If none of the previous executions are executed, the end of the linked list that is mounted to the location of the array is finally:
    Let’s take a look at the linked list –> the method of red-black tree ( treeifyBin(Node[] tab, int hash))
    /**
    • Replaces all linked nodes in bin at index for given hash unless
    • The table is too small, in which case resizes instead.
      */
      final void treeifyBin(Node[] tab, int hash) {
      int n, index; Node e;
      if (tab == null || (n = tab. length) < MIN_TREEIFY_CAPACITY)
      resize();
      else if ((e = tab[index = (n – 1) & amp; hash]) != null) {
      TreeNode hd = null, tl = null;
      do {
      TreeNode p = replacementTreeNode(e, null);
      if (tl == null)
      hd = p;
      else {
      p.prev = tl;
      tl.next = p;
      }
      tl = p;
      } while ((e = e. next) != null);
      if ((tab[index] = hd) != null)
      hd. treeify(tab);
      }
      }Here, we see that the linked list does not do red-black tree conversion immediately, but first judges whether the length of the array is greater than MIN_TREEIFY_CAPACITY (this is explained earlier), if it is less than MIN_TREEIFY_CAPACITY, the resize() method will be called to expand the array .

2.2.4 Array expansion

Three. Summary

The underlying layer of HashMap is implemented by a hash algorithm (that is, in the form of an array + linked list). When the length of the array is greater than 64 and the number of nodes in the linked list is greater than 8, the linked list will be converted into a red-black tree, which greatly reduces the traversal Time, improve efficiency, the reason why an array can store two data structures is because the data type of the array is the node Node of the linked list, and the red-black tree node TreeNode has an inheritance relationship with Node . In addition, HashMap uses delayed initialization to initialize the array, that is, when the user adds the first element, resize() is called to initialize the array length (16), and the next expansion length of the array is scheduled (12). There is also the principle of calculating the hash value and adding elements, etc., waiting for the exploration of friends!
Learning source code knowledge will help us solidify our internal strength and improve programmers’ self-cultivation. If you don’t want to directly view the source code in idea, but also want to know him, you can follow the blogger and organize it for you. Okay, the article is here it’s over