A brief discussion of HashMap and ConcurrentHashMap

As a back-end Java developer, you will almost certainly be asked about HashMap and ConcurrentHashMap in interviews. Let me walk you through both.

First of all, a Map is a key-value structure that is commonly used to store data in memory.

HashMap:

The data structure is based on **array + linked list**, but the specific implementation differs slightly between JDK 1.7 and 1.8.

Version 1.7:
HashMap has several core fields:

   
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
    transient int size;
    int threshold;
    final float loadFactor;
    transient int modCount;
threshold: when the actual number of entries exceeds this value, the table is expanded.
loadFactor: the load factor used to decide when to expand.
capacity * loadFactor = threshold, where capacity is the initial default capacity.
Entry<K,V>[] table: an array of Entry<K,V> that represents the whole hash table. Each slot is a bucket holding a linked list, so even when hash collisions occur the colliding entries can still be placed in the same bucket.
modCount:
HashMap is not thread-safe, which means other threads may be modifying the map while you operate on it; modCount records the number of structural modifications made to the map.
This is why a ConcurrentModificationException is thrown when you delete elements from the collection while iterating over it. Whether you iterate via entrySet() or keySet(), the for loop is still backed by the collection's built-in Iterator and its nextEntry() method. If you remove an element with anything other than the Iterator's own remove() method, the modification count recorded inside the iterator is not kept in sync, and the next call to nextEntry() throws the exception.
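A minimal, self-contained sketch of both behaviors (the class name and sample data are made up for illustration): removing through the map while iterating triggers the fail-fast check, while removing through the iterator is safe.

    import java.util.ConcurrentModificationException;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    public class FailFastDemo {
        public static void main(String[] args) {
            Map<String, Integer> map = new HashMap<>();
            map.put("a", 1);
            map.put("b", 2);
            map.put("c", 3);

            // Wrong: removing through the map bypasses the iterator's bookkeeping,
            // so the next call to next() sees modCount != expectedModCount and throws.
            try {
                for (String key : map.keySet()) {
                    map.remove("b");
                }
            } catch (ConcurrentModificationException e) {
                System.out.println("fail-fast triggered: " + e);
            }

            // Right: Iterator.remove() keeps the iterator's expectedModCount in sync.
            Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
            while (it.hasNext()) {
                if ("c".equals(it.next().getKey())) {
                    it.remove();
                }
            }
            System.out.println(map);
        }
    }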

Load factor:

    public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        init();
    }

Summary:

The default capacity is 16 and the default load factor is 0.75. As the map keeps storing data, once the number of entries reaches 16 * 0.75 = 12, the current capacity of 16 must be expanded. Expansion involves rehashing and copying data, so it is quite expensive.

Therefore, it is recommended to estimate the size of the HashMap in advance to minimize the performance cost of expansion.
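For example, if you expect roughly 1,000 entries (an illustrative number), choose an initial capacity so the threshold is never crossed while filling the map:

    // Expecting about 1000 entries: capacity must satisfy capacity * 0.75 >= 1000,
    // so request expected / 0.75 (+1 to be safe). HashMap rounds the requested
    // capacity up to the next power of two internally.
    int expected = 1000;
    Map<String, String> map = new HashMap<>((int) (expected / 0.75f) + 1);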

Store data:

transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

How is the element type of this array defined?

Entry is an inner class of HashMap, as can be seen from its member variables:
key: the key that was written.
value: the corresponding value.
next: as mentioned above, HashMap consists of an array plus linked lists, and next is what links entries in the same bucket into a list.
hash: stores the hash of the current key.
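A sketch of the 1.7 Entry inner class, reduced to the fields just described (constructor and Map.Entry interface methods omitted):

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;         // the key that was written
        V value;             // the associated value
        Entry<K,V> next;     // next entry in the same bucket (linked list)
        int hash;            // cached hash of the key
        // constructor, getKey()/getValue()/setValue(), equals(), hashCode() omitted
    }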

Next, let’s look at the put() and get() methods:
1.1: put(): storage

    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

1. Check whether the array needs to be initialized.
2. If the key is null, store the value under the null key (putForNullKey).
3. Compute the hash from the key.
4. Locate the bucket from that hash.
5. If the bucket already holds a linked list, traverse it and check whether an entry with the same hash and key exists; if so, overwrite its value and return the old value.
6. If the bucket is empty, no data is stored at that position yet; create an Entry and write it there (addEntry).

    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

1. When addEntry is called to write the Entry, it first checks whether expansion is needed.
2. If so, the table is doubled in size, and the current key is rehashed and relocated.
3. In createEntry, the existing head of the bucket is passed as the next pointer of the newly created Entry (head insertion), so if the bucket already holds entries, a linked list forms at that position.

1.2. get(): Get

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }
First, the hash is computed from the key and used to locate the specific bucket.
Then check whether that position holds a single entry or a linked list.
If it is not a linked list, the entry's value is returned when its key and hash match the query key.
If it is a linked list, it is traversed until an entry with matching key and hash is found, and its value is returned.
If nothing matches, null is returned.

Next, let's look at what changed in version 1.8.
The problem in 1.7 is:
When hash collisions are severe, the linked list in a bucket grows longer and longer, and query efficiency drops accordingly; the time complexity is O(n).

Core variables:

    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    static final int MAXIMUM_CAPACITY = 1 << 30;

    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    static final int TREEIFY_THRESHOLD = 8;

    transient Node<K,V>[] table;

    transient Set<Map.Entry<K,V>> entrySet;

    transient int size;
TREEIFY_THRESHOLD is the threshold used to decide whether a linked list should be converted into a red-black tree.
The 1.7 Entry class is replaced by Node.

put():

1. Check whether the table is empty; if so, it needs to be initialized (resize() decides whether to initialize).
2. Locate the bucket from the hash of the current key and check whether it is empty. If it is empty, there is no hash collision, so simply create a new Node at that position.
3. If the bucket already has a value (a hash collision), compare the key and hash stored in the bucket's head node with the written key; if they are equal, assign that node to e, and step 8 will overwrite its value and return.
4. If the head node is a red-black tree node, the data must be written in red-black-tree fashion.
5. Otherwise it is a linked list: wrap the current key and value in a new Node and append it to the end of the list.
6. Then check whether the length of the list exceeds the preset threshold (TREEIFY_THRESHOLD); if so, convert it into a red-black tree.
7. If an equal key is found while traversing, the traversal exits immediately.
8. e != null means an entry with the same key already exists, so its value is overwritten.
9. Finally, check whether expansion is needed.

(See the annotated putVal() sketch below.)
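A lightly annotated sketch closely following the JDK 8 putVal() logic; comments are added, and the exact text may differ slightly between JDK builds.

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;               // lazy initialization via resize()
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);  // empty bucket: just place a new Node
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;                                 // head node has the same key
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) { // walk the linked list
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1)
                            treeifyBin(tab, hash);     // long list -> red-black tree
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;                         // found an equal key, stop
                    p = e;
                }
            }
            if (e != null) {                           // existing mapping: overwrite
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();                                  // finally, expand if needed
        afterNodeInsertion(evict);
        return null;
    }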

get():

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

First hash the key to locate the bucket.
If the bucket is empty, return null directly.
Otherwise, check whether the first node of the bucket (which may head a linked list or a red-black tree) matches the query key; if so, return its value directly.
If the first node does not match, check whether what follows is a red-black tree or a linked list.
For a red-black tree, the value is looked up tree-style.
Otherwise, the linked list is traversed until a match is found and its value returned.

In 1.8, long linked lists were optimized: after conversion to a red-black tree, query time drops to O(log n).

HashMap's fundamental problem remains, however: it is not thread-safe, and in concurrent scenarios it even tends to fall into an infinite loop, for example:

final HashMap<String, String> map = new HashMap<String, String>();
for (int i = 0; i < 1000; i++) {
    new Thread(new Runnable() {
        @Override
        public void run() {
            map.put(UUID.randomUUID().toString(), "");
        }
    }).start();
}

Reason:

HashMap calls resize() when it expands. Under concurrency, this can easily produce a circular linked list in a bucket; then, when a non-existent key is looked up and its computed index happens to be that bucket, traversal never terminates and an infinite loop occurs.
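The culprit is 1.7's head-insertion rehash. For reference, this is roughly what the 1.7 transfer() method looks like (lightly annotated; the exact code varies by JDK build). Two threads interleaving in this loop can re-link the same entries into a cycle.

    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while (null != e) {
                Entry<K,V> next = e.next;      // another thread may already have moved e/next
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];          // head insertion reverses the list order
                newTable[i] = e;
                e = next;
            }
        }
    }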

Traversal methods:
Method one:

Iterator<Map.Entry<String, Integer>> entryIterator = map.entrySet().iterator();
while (entryIterator.hasNext()) {
    Map.Entry<String, Integer> next = entryIterator.next();
    System.out.println("key=" + next.getKey() + " value=" + next.getValue());
}


Method two:

Iterator<String> iterator = map.keySet().iterator();
while (iterator.hasNext()) {
    String key = iterator.next();
    System.out.println("key=" + key + " value=" + map.get(key));
}

Recommended: method one.

The first method retrieves the key and value together, while the second still has to look the value up by key again, which is less efficient.

Whether in 1.7 or 1.8, it is clear that the JDK performs no synchronization on HashMap, so concurrency problems arise, and an infinite loop can even render the system unusable.

ConcurrentHashMap:

In 1.7, ConcurrentHashMap is composed of a Segment array and HashEntry nodes. Like HashMap, it is still an array plus linked lists.
 /**
     * Segment array, when storing data, you first need to locate it in a specific Segment.
     */
    final Segment<K,V>[] segments;

    transient Set<K> keySet;
    transient Set<Map.Entry<K,V>> entrySet;

Segment:

    static final class Segment<K,V> extends ReentrantLock implements Serializable {

        private static final long serialVersionUID = 2249069246763182397L;

        // Plays the same role as the Entry table in HashMap: the buckets that actually store data
        transient volatile HashEntry<K,V>[] table;

        transient int count;

        transient int modCount;

        transient int threshold;

        final float loadFactor;

    }

The structure of HashEntry:
It is very similar to HashMap's Entry. The key difference is that the core data such as value, and the next link of the list, are declared volatile, which guarantees visibility to readers.
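A sketch of the 1.7 HashEntry fields as just described (constructor and Unsafe-based helpers omitted):

    static final class HashEntry<K,V> {
        final int hash;
        final K key;
        volatile V value;              // volatile: readers always see the latest value
        volatile HashEntry<K,V> next;  // volatile: safe traversal without locking
        // constructor and Unsafe-based setNext()/helpers omitted
    }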

The principle: ConcurrentHashMap uses lock striping (segment locks), with Segment extending ReentrantLock. Unlike Hashtable, where both put and get must synchronize on a single lock, ConcurrentHashMap in theory supports concurrency up to concurrencyLevel (the number of Segments): a thread that holds the lock on one Segment does not affect access to the other Segments.

put:

    public V put(K key, V value) {
        Segment<K,V> s;
        if (value == null)
            throw new NullPointerException();
        int hash = hash(key);
        int j = (hash >>> segmentShift) & segmentMask;
        if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
             (segments, (j << SSHIFT) + SBASE)) == null) //  in ensureSegment
            s = ensureSegment(j);
        return s.put(key, hash, value, false);
    }

First locate the Segment from the key's hash, then perform the actual put inside that Segment.

        final V put(K key, int hash, V value, boolean onlyIfAbsent) {
            HashEntry<K,V> node = tryLock() ? null :
                scanAndLockForPut(key, hash, value);
            V oldValue;
            try {
                HashEntry<K,V>[] tab = table;
                int index = (tab.length - 1) & hash;
                HashEntry<K,V> first = entryAt(tab, index);
                for (HashEntry<K,V> e = first;;) {
                    if (e != null) {
                        K k;
                        if ((k = e.key) == key ||
                            (e.hash == hash && key.equals(k))) {
                            oldValue = e.value;
                            if (!onlyIfAbsent) {
                                e.value = value;
                                ++modCount;
                            }
                            break;
                        }
                        e = e.next;
                    }
                    else {
                        if (node != null)
                            node.setNext(first);
                        else
                            node = new HashEntry<K,V>(hash, key, value, first);
                        int c = count + 1;
                        if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                            rehash(node);
                        else
                            setEntryAt(tab, index, node);
                        ++modCount;
                        count = c;
                        oldValue = null;
                        break;
                    }
                }
            } finally {
                unlock();
            }
            return oldValue;
        }

Although HashEntry's value is declared volatile, that does not provide atomicity for compound operations, so the put operation still needs to lock.

The first step tries to acquire the lock with tryLock(). If that fails, another thread is competing for it, so scanAndLockForPut() is used to spin for the lock:

It spins, repeatedly trying to acquire the lock.
If the number of retries reaches MAX_SCAN_RETRIES, it falls back to a blocking lock() to guarantee acquisition.

The put flow inside the Segment is:
1. Use the key's hash to locate the HashEntry slot in the current Segment's table.
2. Traverse that HashEntry chain. If it is not empty, check whether the incoming key equals the key currently being traversed; if so, overwrite the old value.
3. If no matching entry is found (or the slot is empty), create a new HashEntry and add it to the Segment, first checking whether expansion is needed.
4. Finally, release the current Segment's lock.

get:

    public V get(Object key) {
        Segment<K,V> s; // manually integrate access methods to reduce overhead
        HashEntry<K,V>[] tab;
        int h = hash(key);
        long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
        if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
            (tab = s.table) != null) {
            for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
                     (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
                 e != null; e = e.next) {
                K k;
                if ((k = e.key) == key || (e.hash == h && key.equals(k)))
                    return e.value;
            }
        }
        return null;
    }

The key's hash is used first to locate the specific Segment, and then again to locate the specific element within it.

Since the value attribute in HashEntry is modified with the volatile keyword, memory visibility is guaranteed, so the latest value is obtained every time.

The get method of ConcurrentHashMap is very efficient because the entire process does not require locking.

1.8 ConcurrentHashMap:
The 1.7 version still has the same weakness: traversing the linked list during queries is too slow.

In 1.8, the Segment-based lock striping was abandoned in favor of CAS + synchronized to guarantee concurrency safety.

put:

1. Compute the hash from the key.
2. Check whether the table needs to be initialized.
3. f is the Node located by the current key's hash. If it is null, the slot is free, so try to write with CAS; if the CAS fails, spin and retry until it succeeds.
4. If the hash at the current location is MOVED (== -1), the table is being expanded, so help with the data migration.
5. If none of the above applies, write the data under a synchronized lock on the head node.
6. If the list length then exceeds TREEIFY_THRESHOLD, convert the list to a red-black tree.

(A condensed sketch of this flow is shown below.)
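A condensed, annotated sketch closely following the JDK 8 putVal() logic (counting and some helper details are simplified; not verbatim source):

    final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());              // 1. compute the hash
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();                      // 2. lazy initialization
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                // 3. empty bin: CAS in a new node without locking; loop again on failure
                if (casTabAt(tab, i, null, new Node<K,V>(hash, key, value, null)))
                    break;
            }
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);             // 4. table is resizing: help move bins
            else {
                V oldVal = null;
                synchronized (f) {                      // 5. lock only the head node of this bin
                    if (tabAt(tab, i) == f) {
                        if (fh >= 0) {                  // ordinary linked list
                            binCount = 1;
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                                if (e.hash == hash &&
                                    ((ek = e.key) == key || (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;  // overwrite existing key
                                    break;
                                }
                                Node<K,V> pred = e;
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key, value, null);
                                    break;              // append at the tail
                                }
                            }
                        }
                        else if (f instanceof TreeBin) { // red-black tree bin
                            Node<K,V> p;
                            binCount = 2;
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key, value)) != null) {
                                oldVal = p.val;
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }
                    }
                }
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);             // 6. long list -> red-black tree
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
        addCount(1L, binCount);                         // update the count, possibly resize
        return null;
    }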

get():
Locate the bucket from the computed hash; if the first node there matches, return its value directly.
If that node is a red-black tree node, look the value up tree-style.
Otherwise, traverse the linked list and return the matching value. (A sketch follows below.)
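For reference, a sketch following the JDK 8 get() logic (annotated; a node hash below zero means the bin is a tree bin or a forwarding node, so the lookup is delegated to its find() method):

    public V get(Object key) {
        Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
        int h = spread(key.hashCode());
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (e = tabAt(tab, (n - 1) & h)) != null) {
            if ((eh = e.hash) == h) {                   // first node matches the hash
                if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                    return e.val;
            }
            else if (eh < 0)                            // tree bin or forwarding node
                return (p = e.find(h, key)) != null ? p.val : null;
            while ((e = e.next) != null) {              // plain linked list: traverse
                if (e.hash == h &&
                    ((ek = e.key) == key || (ek != null && key.equals(ek))))
                    return e.val;
            }
        }
        return null;
    }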

1.8 makes major changes to the 1.7 data structure: using a red-black tree keeps query efficiency at O(log n), and it even drops ReentrantLock in favor of synchronized, which shows how effective the synchronized optimizations in newer JDK versions are.