How the added elements of HashSet are implemented at the bottom level and the details of the changes

Description of the underlying mechanism of HashSet

1.The bottom layer of HashSet is HashiMap

2. To add an element, first get the hash value -> it will be converted into an index value

3. Find the storage data table table and see if there are elements stored at this index position.

4. If not, join directly

5. If there are, call equals to compare. If they are the same, give seven days off. If they are not the same, add them to the end.

6. If the data in a linked list exceeds TREEIFY_THRESHOLD (default value is 8), and the table array size is greater than or equal to MIN_TREEIFY_CAPACITY (default value is 64), it will be treed (black red tree)

Use code analysis

We use a piece of code to analyze the code step by step through debugging. I will explain the process very clearly. After reading this process, you will have a deeper understanding of the underlying mechanism of HashSet.

1. Test code

HashSet hashSet = new HashSet();
        hashSet.add("jack");
        hashSet.add("marry");
        hashSet.add("jack");
        System.out.println("hashSet = " + hashSet);

First we need to have a general understanding of the structure of HashSet–

There is a table at the bottom, which is essentially an array. It saves the linked list as array elements.

We will analyze the details of first addition, second addition, addition of repeated elements and table space.

2.Debug analysis of the process of adding data for the first time

Set the breakpoint at the step of creating the object

Call the HashSet parameterless constructor

The parameterless constructor of HashSet is called here, and a new HashMap object is created in the constructor (in fact, when we talk about HashSet, we are talking about HashMap)

Call the add() method

As you can see, this method returns a boolean value to determine whether the operation is successful.

The parameter passed in is the element to be added

The return value calls a method and compares it with null. Let’s go into it to find out. Finally, the method returns and we will talk about this.

Let’s look back at this PRESENT. In fact, this object has no real effect and serves as a placeholder.

Call the put() method

The formal parameter key passed in by this method is the element object passed in, and the value is the value of PRESENT, the underlying value that appears for the two parameters of HashMap (don’t worry too much)

Then this method will call the putVal method. When passing in parameters, it will call the hash() method to give a value.

Let’s get into this method to find out.

Call the hash() method

Pass in the element object to be added into the method

This method uses an algorithm. In the ternary operation, it first determines whether the incoming object is empty. If it is empty, it returns 0. Otherwise, the hashCode of the element object is unsigned and right-shifted by 16 bits, and then the value is Assign to h and return

Get the index into putVal(method)
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & amp; hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash & amp; & amp;
                ((k = p.key) == key || (key != null & amp; & amp; key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; + + binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash & amp; & amp;
                        ((k = e.key) == key || (key != null & amp; & amp; key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
         + + modCount;
        if ( + + size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

You can see that the structure of this method is quite complicated, but don’t be afraid, we will analyze it step by step, step by step

First of all, the first judgment statement is to point the tab to the table to determine whether it is null, or assign the length of the tab array pointing to the table to n to determine whether the length is 0. If the conditions are met, execute the statement within the if

Enter the resize() method
final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY & amp; & amp;
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else { // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY & amp; & amp; ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; + + j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & amp; (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & amp; oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

Similarly, this method body is also very long. Let’s analyze it step by step.

First, some amount of assignment operations are performed. This table is assigned to oldTab to represent the array that was initially passed in.

Next is a ternary operation to determine whether oldTab is empty. If it is empty, it is 0, otherwise it is the length of oldTab.

Get this value and assign it to oldCap, which represents the length of the incoming array.

Assign threshold to oldThr, which is also 0. This represents a threshold. We will mention why this threshold exists below.

For this if statement, odlCap is 0, so it is not executed and continues going down.

Judge again

oldThr is also 0, this if will not be executed, and continues.

So only the statements of this code block can be executed

There is a new quantity newCap here, which represents the new array length (it can be seen here that this method is actually a method of judgment and expansion)

DEFAULT_INITIAL_CAPACITY is a system-defined quantity, its default value is 16

DEFAULT_LOAD_FACTOR is a calculation factor, its default value is 0.75

Here I will talk about the meaning of its existence–

In this method, the source code author is very smart. He did one thing. He made a buffer of data.

Let me give an analogy. We need to collect water from a basin. We have 16 basins and the water is constantly flowing. When we receive the 12th basin, we will bring a new basin and prepare it instead of waiting until all 16 basins are connected. Take it when it’s full

In fact, this is a kind of data buffering, which improves efficiency and speed.

So what this code block means is to first change the array length to 16, and then assign 16 * calculation factor = 12 to newThr as a threshold. This is the threshold just mentioned, which means that once it is reached, this critical value, Just expand the capacity instead of waiting for the array to be filled up before expanding again.

Continue to code

newThr is not 0, so the statement inside the if is not executed and execution continues.

Assign new threshold value to threshold

Create a new array newTab with length newCap

Let table point to this new array newTab

Then determine whether the array passed in at the beginning is not null. Because we started with an empty array, we will not execute it and continue with the code.

and then return this new array

Back to putVal() method

Change this The returned array object is assigned to tab and its length is assigned to n

There will be an algorithm here to get the index of this position in the tab array and determine whether this position is null.

His algorithm is (n-1) & amp;hash

If this place is empty, then put the key, which is the element object to be added, at the index position of the array, and let the next point at this position point to null, because it is essentially a linked list, and it needs a next pointer.

Then modCount records the number of operations

The following if statement determines whether the existing elements in the array plus 1 are greater than the critical value

You can think of it this way, this critical value is used to determine whether expansion is needed. Compare the existing size + 1 with the critical value. Size + 1 is the space required to add another element. If this space is greater than the critical value, expansion is required.

This is not greater than the critical value, so resize is not performed.

Going down, this afterNodeInsertion is an abstract method given by HashMap, which can implement some business logic.

then returns a null

Return to put() method

Return to add() method

The return value null and null are used here to compare. If it is true, it means that the operation was successfully completed.

This is the process of adding data for the first time. Next, we continue to analyze the process of continuing to add data.

3.Debug analysis continues to add data

I won’t demonstrate the above process that has not changed. I will only talk about the process that has changed.

Enter the PutVal() method

Here we see that this time the tab is no longer empty, and the length of the tab is no longer 0, so the if statement is not executed and continues.

The same method is used to determine whether the element exists at the index position. If it does not exist, add it directly.

But if there is an element at the index position here, it will be executed

if (p.hash == hash & amp; & amp;
                ((k = p.key) == key || (key != null & amp; & amp; key.equals(k))))
                e = p;

Make a judgment here, if the incoming hash value is the same as the hash value of the index position, that is, p.hash, and the incoming element is the same as the index position element, or the incoming element is not empty and the incoming object is the same as the index position object. different

Then e will point to p

Then this object will be returned. According to the above process, if the return value is not null, it means that the adding operation failed.

else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);

Or if the element at the index position is a red-black tree, the black-red tree operation will be performed.

If it is neither of these two, then execute (this means that the first node is different, tian’ji to the back of the chain)

else {
                for (int binCount = 0; ; + + binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash & amp; & amp;
                        ((k = e.key) == key || (key != null & amp; & amp; key.equals(k))))
                        break;
                    p = e;
                }
            }

Let’s analyze it. First, it is an infinite loop. The first judgment statement makes e point to the element pointed to by next at the index position. If it is empty, then directly connect this element to the back of the element at the index position.

There is also a judgment statement in this if statement, which is to judge whether there are eight elements in the chain where the element is located. If there are eight elements, it will be treed, and then break to exit this infinite loop.

The next if statement, because e points to the element pointed to by next of p, which is the next element of p, and e represents the element after p

If the hash value of e is the same as the hash value of the element to be added and the element at the position of e is the same as the element to be added or the element to be added is not empty and the object pointed by e is the same as the element object to be added, exit this infinite loop.

Otherwise, let p point to e, which means moving p backward and continuing to loop.

Then keep walking

if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }

It can be analyzed here that e itself is empty, but if it enters this large else statement, it will be assigned under certain conditions, that is, when elements cannot be added.

So far we can wait until the process of adding data again is to first determine whether there are elements at the array node. If not, just put them here. There is no need to consider whether the same elements will be added to different indexes because It is an index obtained through hash value and some algorithms. The same element will not be added because the index will be the same. In other words, HashSet only accepts one type of element, because the same element has the same index value and the position is the same. If it is not empty, it will judge one by one whether the elements in this chain are the same. If not, it will be added. If there are, it will not be added!

4.table expansion rules and details

The reseze expansion method is actually written very clearly.

He determines that twice the capacity of the old array is less than the maximum capacity, and the capacity of the old array is greater than 16, so the capacity is expanded, and the value assigned is oldCap, shifted by one to the left, multiplied by two -> assigned to newCap

If it does not reach 16, use the size of the critical value as the array size. If it is none of these, it is the first expansion, which is the following statement

The size of the first expansion

It can be seen that when the critical value is reached, the array will be expanded by twice the size instead of expanding until the array is full.

In order to be able to see the addition of linked lists and tree formation, I came up with an idea

I write a class with a number in it, and override the hashCode method so that they return the same value every time, but the element is different, so that they can be arranged in order!

 for (int i = 1; i <= 7; i + + ) {
            hashSet.add(new A(i));
        }
        for (int i = 1; i <= 7; i + + ) {
            hashSet.add(new B(i));
        }
    }
}

class A {
    private int n;

    public A(int n) {
        this.n = n;
    }

    @Override
    public int hashCode() {
        return 100;
    }
}
class B {
    private int n;

    public B(int n) {
        this.n = n;
    }

    @Override
    public int hashCode() {
        return 200;
    }
}

To get this effect, have them chained

Next we add it from scratch

It can be seen that the size of size is actually the number of all elements, not the number of nodes. Each element in the chain will also increase size.

By 13, it was expanded again

but!

When the array length is 64 and the number of added chains exceeds 8, it will be treed! These two conditions are indispensable

At this point we have reached the conclusion–

1. When the amount of data reaches a critical value, it will be expanded with twice the capacity of the original array until 64

2. If the array reaches length 64 and the chain length on the node is greater than or equal to 8, then the storage structure will change and become a tree.

The knowledge points of the article match the official knowledge files, and you can further learn related knowledge. Algorithm skill tree Home page Overview 57159 people are learning the system