LinkedHashMap source code analysis

Table of Contents

1. Foreword

2. Source code analysis

2.1. Class structure

2.2. Member variables

2.3. Construction method

2.4. accessOrder

2.5. Adding elements

2.6. Get elements

2.7. Delete elements

2.8. Iterators

3. Simple implementation of LRU


1. Foreword

HashMap does not preserve insertion order. To make the traversal order consistent with the insertion order, we can use LinkedHashMap, which internally maintains a doubly linked list to record element order and, through the accessOrder attribute, can switch the traversal order between insertion order and access order. This article records the internal implementation of LinkedHashMap, based on JDK 1.8, and uses LinkedHashMap to implement a simple LRU cache.

LinkedHashMap implements the Map interface and allows both null keys and null values. As the name suggests, this container is a mixture of LinkedList and HashMap, which means it combines certain characteristics of both. You can think of LinkedHashMap as a HashMap enhanced with a linked list.

In fact, LinkedHashMap is a direct subclass of HashMap. The only difference between the two is that LinkedHashMap additionally connects all entries with a doubly linked list, which ensures that the iteration order is the same as the insertion order. Structurally, the main part is exactly the same as HashMap, with the addition of head and tail references to the doubly linked list; the iteration order of this doubly linked list is the insertion order of the entries.

In addition to preserving the iteration order, this structure has another advantage: when iterating a LinkedHashMap, there is no need to traverse the entire table as HashMap does; it is enough to traverse the doubly linked list starting from head. In other words, the iteration time of a LinkedHashMap is proportional only to the number of entries and is unrelated to the capacity of the table.

There are two parameters that can affect the performance of LinkedHashMap: initial capacity and load factor. The initial capacity specifies the size of the initial table, and the load factor is used to specify the critical value for automatic expansion. When the number of entries exceeds capacity*load_factor, the container will automatically expand and rehash. For scenarios where a large number of elements are inserted, setting a larger initial capacity can reduce the number of rehashes.
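For example, if we expect to insert roughly 1,000 entries (an assumed workload figure used only for illustration), a minimal pre-sizing sketch looks like this:

float loadFactor = 0.75f;                     // default load factor
int expectedEntries = 1000;                   // assumed number of insertions
// Choose the capacity so that expectedEntries < capacity * loadFactor,
// which avoids rehashing while the expected entries are being inserted.
int initialCapacity = (int) (expectedEntries / loadFactor) + 1;

Map<Integer, String> map = new LinkedHashMap<>(initialCapacity, loadFactor);
for (int i = 0; i < expectedEntries; i++) {
    map.put(i, "value-" + i);                 // no resize is triggered during the loop
}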

When putting objects into LinkedHashMap or LinkedHashSet, there are two methods that require special attention: hashCode() and equals(). The hashCode() method determines which bucket the object will be placed in. When the hash values of multiple objects conflict, the equals() method determines whether these objects are “the same object”. Therefore, if you want to put a custom object into a LinkedHashMap or LinkedHashSet, you need to override the hashCode() and equals() methods.
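As a sketch, consider a hypothetical Point key class (not part of the JDK, used only for illustration) that overrides both methods; a logically equal key then overwrites the existing mapping instead of creating a duplicate entry:

// Hypothetical key type, for illustration only
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);   // must be consistent with equals()
    }
}

Map<Point, String> map = new LinkedHashMap<>();
map.put(new Point(1, 2), "first");
map.put(new Point(1, 2), "second");   // equal key: the value is overwritten, size stays 1
System.out.println(map.size());       // 1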

For performance reasons, LinkedHashMap is not synchronized. If it needs to be used in a multi-threaded environment, the programmer must synchronize manually, or the LinkedHashMap can be wrapped in a synchronized view as follows: Map m = Collections.synchronizedMap(new LinkedHashMap(...));
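A minimal sketch of the wrapping approach; note that even with the wrapper, iteration is a compound operation and must still be synchronized manually on the wrapper object, as documented for Collections.synchronizedMap:

Map<String, Object> syncMap =
        Collections.synchronizedMap(new LinkedHashMap<>(16, 0.75f, true));
syncMap.put("1", "a");   // individual calls are synchronized by the wrapper

// Iteration must be guarded manually
synchronized (syncMap) {
    for (Map.Entry<String, Object> e : syncMap.entrySet()) {
        System.out.println(e.getKey() + "=" + e.getValue());
    }
}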

LinkedHashSet: inherits from HashSet, but in the LinkedHashSet constructors the internal map is instantiated as a LinkedHashMap (an adapter-style trick), as shown below.
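For reference, the relevant JDK 1.8 constructors look roughly like this: LinkedHashSet delegates to a package-private HashSet constructor, which creates a LinkedHashMap instead of a HashMap (the unused boolean parameter only exists to distinguish this constructor from the public ones):

// LinkedHashSet
public LinkedHashSet() {
    super(16, .75f, true);
}

// HashSet (package-private constructor used only by LinkedHashSet)
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<>(initialCapacity, loadFactor);
}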

2. Source code analysis

2.1. Class structure

LinkedHashMap inherits from HashMap, and most of its methods come directly from HashMap without being overridden.

2.2. Member variables

//The head node of the doubly linked list (the earliest inserted, the oldest node)
transient LinkedHashMap.Entry<K,V> head;

//The tail node of the doubly linked list (the latest inserted, the youngest node)
transient LinkedHashMap.Entry<K,V> tail;

// Controls the iteration order. When true, entries are ordered by access order; when false, by insertion order.
final boolean accessOrder;

Why are head and tail declared transient?
Fields marked transient are excluded from default serialization. So how does HashMap recover its data when it is deserialized? HashMap customizes serialization and deserialization through its own readObject/writeObject methods, mainly for the following two reasons:
1. The table is generally not full, that is, its capacity is greater than the actual number of key-value pairs. Serializing the unused part of the table wastes both time and space;
2. If the key's type does not override the hashCode method, Object's hashCode is used. That method is a native method and may be implemented differently on different JVMs; in other words, the same key-value pair may end up at different positions in the table under different JVMs, so deserializing the table directly could produce errors.

So in the hash-based classes (such as Hashtable, HashSet, LinkedHashMap, etc.) we can see that the fields used to store data are marked transient, and all of them define custom readObject/writeObject methods. The source code of readObject/writeObject will not be analyzed here; if you are interested, you can study it yourself.
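As a quick sanity check rather than a source-level analysis, the following round-trip sketch shows that the custom writeObject/readObject logic restores both the data and the iteration order (the transient table, head and tail are simply rebuilt while re-inserting the pairs):

import java.io.*;
import java.util.LinkedHashMap;
import java.util.Map;

public class LinkedHashMapSerializationDemo {
    public static void main(String[] args) throws Exception {
        Map<String, Integer> original = new LinkedHashMap<>();
        original.put("first", 1);
        original.put("second", 2);
        original.put("third", 3);

        // Serialize: writeObject stores the key-value pairs, not the (transient) table
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(original);
        }

        // Deserialize: readObject re-inserts the pairs, rebuilding the table and the doubly linked list
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            @SuppressWarnings("unchecked")
            Map<String, Integer> copy = (Map<String, Integer>) in.readObject();
            System.out.println(copy);   // {first=1, second=2, third=3} -- insertion order preserved
        }
    }
}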

LinkedHashMap inherits from HashMap, so its internal data storage is the same as HashMap: it uses an array plus linked list (red-black tree) structure to store data. Compared with HashMap, LinkedHashMap additionally maintains a doubly linked list to record the order of nodes. The node type of this doubly linked list is LinkedHashMap.Entry:

static class Entry<K,V> extends HashMap.Node<K,V> {
    Entry<K,V> before, after;
    Entry(int hash, K key, V value, Node<K,V> next) {
        super(hash, key, value, next);
    }
}

LinkedHashMap.Entry inherits from HashMap's Node class and adds the before and after references to track the predecessor and successor nodes, forming a doubly linked list.

2.3. Construction method

The constructors of LinkedHashMap are nothing special: they call the parent HashMap constructors for initialization, with the only addition of setting LinkedHashMap's accessOrder attribute:

public LinkedHashMap(int initialCapacity, float loadFactor) {
    super(initialCapacity, loadFactor);
    accessOrder = false;
}

public LinkedHashMap(int initialCapacity) {
    super(initialCapacity);
    accessOrder = false;
}

public LinkedHashMap() {
    super();
    accessOrder = false;
}

public LinkedHashMap(Map<? extends K, ? extends V> m) {
    super();
    accessOrder = false;
    putMapEntries(m, false);
}

public LinkedHashMap(int initialCapacity,
                     float loadFactor,
                     boolean accessOrder) {
    super(initialCapacity, loadFactor);
    this.accessOrder = accessOrder;
}

2.4. accessOrder

Before analyzing the other methods of LinkedHashMap, let us first get a feel for its behaviour through an example:

LinkedHashMap<String, Object> map = new LinkedHashMap<>(16, 0.75f, false);
map.put("1", "a");
map.put("6", "b");
map.put("3", "c");
System.out.println(map);

map.get("6");
System.out.println(map);

map.put("4", "d");
System.out.println(map);


Output:
{1=a, 6=b, 3=c}
{1=a, 6=b, 3=c}
{1=a, 6=b, 3=c, 4=d}

You can see that the output order of the elements is the order in which we inserted them.

Change the accessOrder attribute to true and the output becomes:

{1=a, 6=b, 3=c}
{1=a, 3=c, 6=b}
{1=a, 3=c, 6=b, 4=d}

As you can see, {1=a, 6=b, 3=c} is printed at first. When we access the key-value pair with key 6 through the get method, the program outputs {1=a, 3=c, 6=b}. That is, when the accessOrder attribute is true, the elements are arranged in access order: the most recently accessed element is moved to the tail of the doubly linked list. "Access" here does not only mean the get method; the operations that count as an access include the put, putIfAbsent, get, getOrDefault, compute, computeIfAbsent, computeIfPresent and merge methods.
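For example, getOrDefault and putIfAbsent also move an existing entry to the tail when accessOrder is true (same style as the example above):

LinkedHashMap<String, Object> map = new LinkedHashMap<>(16, 0.75f, true);
map.put("1", "a");
map.put("6", "b");
map.put("3", "c");

map.getOrDefault("1", "default");   // counts as an access, so 1=a moves to the tail
System.out.println(map);            // {6=b, 3=c, 1=a}

map.putIfAbsent("6", "x");          // key exists: value is kept, but 6=b moves to the tail
System.out.println(map);            // {3=c, 1=a, 6=b}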

2.5. Adding elements

put(K key, V value): LinkedHashMap does not override the put(K key, V value) method and uses HashMap's put(K key, V value) directly. So the question is: since LinkedHashMap does not override put, how does it maintain the order of elements through its internal doubly linked list? We can find the answer by looking at the source code of the put method (the put source was already analyzed in the article "HashMap Source Code Analysis", so below only the code related to LinkedHashMap is commented):

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        //Create node
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            //The method contains the operations of newTreeNode
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    //Create node
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // Subsequent operations for node access
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    // Node insertion subsequent operations
    afterNodeInsertion(evict);
    return null;
}

The newNode method is used to create linked list nodes. LinkedHashMap overrides the newNode method:

Node<K,V> newNode(int hash, K key, V value, Node<K,V> e) {
    // Create LinkedHashMap.Entry instance
    LinkedHashMap.Entry<K,V> p =
        new LinkedHashMap.Entry<K,V>(hash, key, value, e);
    //Put the new node at the end of the doubly linked list maintained by LinkedHashMap
    linkNodeLast(p);
    return p;
}

private void linkNodeLast(LinkedHashMap.Entry<K,V> p) {
    LinkedHashMap.Entry<K,V> last = tail;
    tail = p;
    // If the tail node is empty, it means that the doubly linked list is empty, so the node is assigned to the head node and the doubly linked list is initialized.
    if (last == null)
        head = p;
    else {
        // Otherwise, put the node at the end of the doubly linked list
        p.before = last;
        last.after = p;
    }
}

It can be seen that for a LinkedHashMap instance, the node type created inside the put operation is LinkedHashMap.Entry. Besides being inserted into HashMap's internal table, the new node is also appended to the tail of the doubly linked list maintained by LinkedHashMap.

If data is inserted into a red-black tree structure, then put will call the putTreeVal method to insert nodes into the red-black tree. The putTreeVal method internally creates tree nodes through the newTreeNode method. LinkedHashMap overrides the newTreeNode method:

TreeNode<K,V> newTreeNode(int hash, K key, V value, Node<K,V> next) {
    //Create TreeNode instance
    TreeNode<K,V> p = new TreeNode<K,V>(hash, key, value, next);
    //Put the new node at the end of the doubly linked list maintained by LinkedHashMap
    linkNodeLast(p);
    return p;
}

The node type is TreeNode, so where is this type defined? In fact, TreeNode is defined in HashMap. View its source code:

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent; // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev; // needed to unlink next upon deletion
    boolean red;
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }

    ...
}

TreeNode inherits from LinkedHashMap.Entry, so it also contains the before and after references. Even when the inserted node is a TreeNode, LinkedHashMap's doubly linked list can still maintain the node order.

In the put method, if the inserted key already exists, the afterNodeAccess operation is also performed. This method is empty in HashMap: void afterNodeAccess(Node<K,V> p) { }. As its name suggests, afterNodeAccess performs follow-up work after a node is accessed. LinkedHashMap overrides this method:

void afterNodeAccess(Node<K,V> e) { // move node to last
    LinkedHashMap.Entry<K,V> last;
    // If the accessOrder attribute is true and the current node is not already the tail of the doubly linked list, execute the logic below
    if (accessOrder && (last = tail) != e) {
        // Move the current node to the tail of the doubly linked list, fixing up the before/after references of the affected nodes
        LinkedHashMap.Entry<K,V> p =
            (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
        p.after = null;
        if (b == null)
            head = a;
        else
            b.after = a;
        if (a != null)
            a.before = b;
        else
            last = b;
        if (last == null)
            head = p;
        else {
            p.before = last;
            last.after = p;
        }
        tail = p;
        ++modCount;
    }
}

So when accessOrder is true and LinkedHashMap's put method is called with a key that already exists, the corresponding key-value pair is moved to the tail:

LinkedHashMap<String, Object> map = new LinkedHashMap<>(16, 0.75f, true);
map.put("1", "a");
map.put("6", "b");
map.put("3", "c");
System.out.println(map);
map.put("6", "b");
System.out.println(map);


Program output:
{1=a, 6=b, 3=c}
{1=a, 3=c, 6=b}

At the end of the put method, the afterNodeInsertion method is also called. As the name suggests, it performs follow-up work after a node is inserted. It is also an empty method in HashMap: void afterNodeInsertion(boolean evict) { }.

LinkedHashMap overrides this method:

// When called from put, evict is true
void afterNodeInsertion(boolean evict) { // possibly remove eldest
    LinkedHashMap.Entry<K,V> first;
    // If the head node is not empty and removeEldestEntry returns true
    if (evict && (first = head) != null && removeEldestEntry(first)) {
        // Get the key of the head node
        K key = first.key;
        //Call the removeNode method of the parent class HashMap to delete the node
        removeNode(hash(key), key, null, false, true);
    }
}

protected boolean removeEldestEntry(Map.Entry<K,V> eldest) {
    // In LinkedHashMap, this method always returns false
    return false;
}

Based on this feature, we can implement an LRU cache by subclassing LinkedHashMap and overriding the removeEldestEntry method, which we will do below.

You may ask: removeNode deletes the node from the HashMap table, so shouldn't the node also be removed from the doubly linked list that maintains the order? Why is that operation not visible in the code above? In fact, looking at the source code of removeNode shows where it happens:

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            //After the node is deleted, perform subsequent operations
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}

The afterNodeRemoval method, as its name suggests, performs follow-up work after a node is deleted. It is an empty method in HashMap: void afterNodeRemoval(Node<K,V> p) { }.

LinkedHashMap overrides this method:

// Unlink the node from the doubly linked list by fixing up the before/after references of its neighbours
void afterNodeRemoval(Node<K,V> e) { // unlink
    LinkedHashMap.Entry<K,V> p =
        (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
    p.before = p.after = null;
    if (b == null)
        head = a;
    else
        b.after = a;
    if (a == null)
        tail = b;
    else
        a.before = b;
}

Through this method, the removed node is unlinked from LinkedHashMap's doubly linked list (in the afterNodeInsertion case above, that node is the head of the list).

2.6. Get elements

get(Object key), LinkedHashMap overrides the get method of HashMap:

public V get(Object key) {
    Node<K,V> e;
    if ((e = getNode(hash(key), key)) == null)
        return null;
    // Additional step: when the accessOrder attribute is true, move the node corresponding to the key to the tail of the doubly linked list
    if (accessOrder)
        afterNodeAccess(e);
    return e.value;
}

2.7. Delete elements

remove(Object key): LinkedHashMap does not override the remove method, so look at HashMap's remove method:

public V remove(Object key) {
    Node<K,V> e;
    // Call removeNode to delete the node. removeNode internally calls the afterNodeRemoval method,
    // which was already analyzed in the put section above, so it is not repeated here.
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

2.8. Iterators

Since LinkedHashMap maintains the order of key-value pairs internally through a doubly linked list, we can guess that traversing a LinkedHashMap is actually traversing this doubly linked list. Look at the implementation of LinkedHashMap's entrySet method:

public Set<Map.Entry<K,V>> entrySet() {
    Set<Map.Entry<K,V>> es;
    //Create LinkedEntrySet
    return (es = entrySet) == null ? (entrySet = new LinkedEntrySet()) : es;
}

final class LinkedEntrySet extends AbstractSet<Map.Entry<K,V>> {
    public final int size() { return size; }
    public final void clear() { LinkedHashMap.this.clear(); }
    public final Iterator<Map.Entry<K,V>> iterator() {
        //The iterator type is LinkedEntryIterator
        return new LinkedEntryIterator();
    }
    public final boolean contains(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry<?,?> e = (Map.Entry<?,?>) o;
        Object key = e.getKey();
        Node<K,V> candidate = getNode(hash(key), key);
        return candidate != null && candidate.equals(e);
    }
    public final boolean remove(Object o) {
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            Object key = e.getKey();
            Object value = e.getValue();
            return removeNode(hash(key), key, value, true, true) != null;
        }
        return false;
    }
    public final Spliterator<Map.Entry<K,V>> spliterator() {
        return Spliterators.spliterator(this, Spliterator.SIZED |
                                        Spliterator.ORDERED |
                                        Spliterator.DISTINCT);
    }
    public final void forEach(Consumer<? super Map.Entry<K,V>> action) {
        if (action == null)
            throw new NullPointerException();
        int mc = modCount;
        for (LinkedHashMap.Entry<K,V> e = head; e != null; e = e.after)
            action.accept(e);
        if (modCount != mc)
            throw new ConcurrentModificationException();
    }
}

// LinkedEntryIterator inherits from LinkedHashIterator
final class LinkedEntryIterator extends LinkedHashIterator
    implements Iterator<Map.Entry<K,V>> {
    // The next method internally calls the nextNode method of LinkedHashIterator
    public final Map.Entry<K,V> next() { return nextNode(); }
}

abstract class LinkedHashIterator {
    LinkedHashMap.Entry<K,V> next;
    LinkedHashMap.Entry<K,V> current;
    int expectedModCount;

    LinkedHashIterator() {
        // During initialization, assign the head node of the doubly linked list to next:
        // traversal of LinkedHashMap starts from the head of its doubly linked list
        next = head;
        // The iterator is also fail-fast
        expectedModCount = modCount;
        current = null;
    }

    public final boolean hasNext() {
        return next != null;
    }

    final LinkedHashMap.Entry<K,V> nextNode() {
        LinkedHashMap.Entry<K,V> e = next;
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        if (e == null)
            throw new NoSuchElementException();
        current = e;
        // Traverse by repeatedly following the after reference of the current node
        next = e.after;
        return e;
    }
    
    public final void remove() {
        Node<K,V> p = current;
        if (p == null)
            throw new IllegalStateException();
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        current = null;
        K key = p.key;
        removeNode(hash(key), key, null, false, false);
        expectedModCount = modCount;
    }
}

The code above confirms our guess: iteration simply walks the doubly linked list from head via the after references.
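Because the iterator records expectedModCount when it is created, structurally modifying the map during traversal fails fast, while removing through the iterator itself resynchronizes expectedModCount and is allowed. A minimal sketch:

LinkedHashMap<String, Object> map = new LinkedHashMap<>();
map.put("1", "a");
map.put("2", "b");
map.put("3", "c");

Iterator<Map.Entry<String, Object>> it = map.entrySet().iterator();
while (it.hasNext()) {
    Map.Entry<String, Object> entry = it.next();
    if ("2".equals(entry.getKey())) {
        it.remove();        // allowed: the iterator updates expectedModCount afterwards
        // map.remove("2"); // would throw ConcurrentModificationException on the next call to next()
    }
}
System.out.println(map);    // {1=a, 3=c}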

3. Simple implementation of LRU

LRU (Least Recently Used) is a cache eviction algorithm: the entries that have been used least recently are evicted first.

We know that the removeEldestEntry method in LinkedHashMap always returns false and never deletes elements, so we can implement LRU by subclassing LinkedHashMap and overriding removeEldestEntry.

Suppose we have the following requirement: use LinkedHashMap to implement a cache that holds at most 5 elements. When the number of elements exceeds 5, the least recently used entry is deleted (evicted) so that only hot data is kept.

Create a new LRUCache class and inherit LinkedHashMap:

import java.util.LinkedHashMap;
import java.util.Map;

public class LRUCache<K, V> extends LinkedHashMap<K, V> {
    /**
     * Maximum capacity allowed by cache
     */
    private final int maxSize;

    public LRUCache(int initialCapacity, int maxSize) {
        // accessOrder must be true
        super(initialCapacity, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // When the number of key-value pairs exceeds the maximum capacity, return true and trigger the delete operation
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LRUCache<String, String> cache = new LRUCache<>(5, 5);
        cache.put("1", "a");
        cache.put("2", "b");
        cache.put("3", "c");
        cache.put("4", "d");
        cache.put("5", "e");
        cache.put("6", "f");
        System.out.println(cache);
    }
}


Program output:
{2=b, 3=c, 4=d, 5=e, 6=f}

You can see that the earliest inserted 1=a has been deleted.
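Because accessOrder is true, reading an entry also refreshes it. Reusing the same LRUCache, accessing 1=a before inserting a sixth element makes 2=b the eldest entry, so it is evicted instead:

LRUCache<String, String> cache = new LRUCache<>(5, 5);
cache.put("1", "a");
cache.put("2", "b");
cache.put("3", "c");
cache.put("4", "d");
cache.put("5", "e");

cache.get("1");              // refreshes 1=a, it becomes the most recently used entry
cache.put("6", "f");         // now 2=b is the eldest and is evicted
System.out.println(cache);   // {3=c, 4=d, 5=e, 1=a, 6=f}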

Implementing LRU with LinkedHashMap is quite common; one example is the LRUMessageCache in the logback framework:

class LRUMessageCache extends LinkedHashMap<String, Integer> {
    private static final long serialVersionUID = 1L;

    final int cacheSize;

    LRUMessageCache(int cacheSize) {
        super((int) (cacheSize * (4.0f / 3)), 0.75f, true);
        if (cacheSize < 1) {
            throw new IllegalArgumentException("Cache size cannot be smaller than 1");
        }
        this.cacheSize = cacheSize;
    }

    int getMessageCountAndThenIncrement(String msg) {
        // don't insert null elements
        if (msg == null) {
            return 0;
        }

        Integer i;
        // LinkedHashMap is not thread-safe. See also LBCLASSIC-255
        synchronized (this) {
            i = super.get(msg);
            if (i == null) {
                i = 0;
            } else {
                i = i + 1;
            }
            super.put(msg, i);
        }
        return i;
    }

    // called indirectly by get() or put() which are already supposed to be
    // called from within a synchronized block
    protected boolean removeEldestEntry(Map.Entry eldest) {
        return (size() > cacheSize);
    }

    @Override
    synchronized public void clear() {
        super.clear();
    }
}
