hashCode in Java

1. hashCode method

Java is an object-oriented programming language. All classes inherit from the Object class by default. The Object class contains the hashCode() method.

// Java 1.8
public native int hashCode();

// Java 9
@HotSpotIntrinsicCandidate
public native int hashCode();

This means that all classes will have a hashCode() method, which returns a value of type int. Since the hashCode() method is a native method (a method modified with the native keyword, implemented in C/C++ language, and called by Java), it means that no specific implementation is given in the Object class.

In Java 9, the hashCode() method is decorated with the @HotSpotIntrinsicCandidate annotation, indicating that it has an efficient implementation in the HotSpot virtual machine and is based on CPU instructions.

Then there is a question: Why does the Object class need a hashCode() method?

In Java, the main function of the hashCode() method is to be used with hash tables.

HashTable, also called hash table, is a data structure that can be directly accessed through key-value. Its biggest feature is that it can quickly achieve search, insertion and deletion. The algorithm used is called hashing, which converts input of any length into an output of fixed length, and the output is the hash value. picture
HashSet, Hashtable, and HashMap in Java are all specific implementations of hash tables. Among them, HashMap is the most typical representative.

We can imagine what we should do if there is no hash table, but we need such a data structure, and the data stored in it is not allowed to be repeated.

Obviously, the method we think of is to use the equals() method to compare one by one. However, if the amount of data is abnormally large, the efficiency of using the equals() method to compare one by one must be very, very low. Therefore, the best solution is a hash table.

☆☆☆ Take HashMap as an example. When we want to add an object to it, we first call the hashCode() method of this object to get the corresponding hash value, and then put the hash value and the object together into the HashMap. When we want to add a new object, we will perform the following steps:

Get the hash value of the object;
Compare with the previously existing hash value, if not equal, store it directly;
If there are equals, call the equals() method to compare the objects. If they are equal, they are not stored; if they are not equal, it means a hash conflict and a linked list is added to store the new object (if the length of the linked list is greater than 8, it is converted to a red-black tree for processing).
After such a set of processes, the frequency of calling the equals() method is greatly reduced. In other words, as long as the hash algorithm is efficient enough to minimize the frequency of hash conflicts, the efficiency of the hash table will be particularly high.

Let’s first look at the hash algorithm of HashMap:

static final int hash(Object key) {<!-- -->
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

It first calls the object’s hashCode() method, then right-shifts the value, and then XORs it.

Generally speaking, String will be used as the key of HashMap for hash operation, so let’s take a look at String’s hashCode() method:

public int hashCode() {<!-- -->
    int h = hash;
    if (h == 0 & amp; & amp; value.length > 0) {<!-- -->
        hash = h = isLatin1() ? StringLatin1.hashCode(value)
                : StringUTF16.hashCode(value);
    }
    return h;
}

public static int hashCode(byte[] value) {<!-- -->
    int h = 0;
    int length = value.length >> 1;
    for (int i = 0; i < length; i + + ) {<!-- -->
        h = 31 * h + getChar(value, i);
    }
    return h;
}

Although it is not specifically clear what operations were performed, it can be seen that after such a series of complex operations, coupled with such a design, the probability of hash conflicts may have been reduced to a minimum.

However, in theory, two different objects may have the same value calculated by the hashCode() method. Therefore, you cannot use the hashCode() method to determine whether two objects are equal, you must use the equals() method to determine whether two objects are equal.

That is to say:
If the result of calling the equals() method on two objects is true, the result of calling the hashCode() method must be equal;
If the results obtained by calling the hashCode() method on two objects are not equal, the results obtained by calling the equals() method must not be equal.
If the result of calling the equals() method on two objects is false, the results of calling the hashCode() method may be equal;
If the results obtained by calling the hashCode() method on two objects are equal, the results obtained by calling the equals() method may not be equal.
for example!

Without overriding the equals() method:

/**
 * Employee category
 *
 * @author qiaohaojie
 * @date 2023/3/15 22:03
 */
public class Employee {<!-- -->

    private int age;
    private String name;

    public Employee(Integer age, String name) {<!-- -->
        this.age = age;
        this.name = name;
    }

    @Override
    public String toString() {<!-- -->
        return super.toString().substring(18) + "{" +
                "age=" + age +
                ", name='" + name + '\'' +
                '}';
    }
}
/**
 * hashCode method in Java
 *
 * @author qiaohaojie
 * @date 2023/3/14 21:42
 */
public class HashCodeTest {<!-- -->
    public static void main(String[] args) {<!-- -->
        Employee employee1 = new Employee(23,"green pepper");
        Employee employee2 = new Employee(23,"green pepper");
        System.out.println(employee1); // Employee@5a07e868{age=23, name='Green Pepper'}
        System.out.println(employee2); // Employee@76ed5528{age=23, name='Green Pepper'}
        Map<Employee, Integer> map = new HashMap<>();
        map.put(employee1, 99);
        System.out.println(map.get(employee2)); // null
    }
}

Override the equals() method:

/**
 * Employee category
 *
 * @author qiaohaojie
 * @date 2023/3/15 22:03
 */
public class Employee {<!-- -->

    private int age;
    private String name;

    public Employee(Integer age, String name) {<!-- -->
        this.age = age;
        this.name = name;
    }

    @Override
    public String toString() {<!-- -->
        return super.toString().substring(18) + "{" +
                "age=" + age +
                ", name='" + name + '\'' +
                '}';
    }

    @Override
    public boolean equals(Object o) {<!-- -->
        Employee employee = (Employee) o;
        return age == employee.age & amp; & amp;
                Objects.equals(name, employee.name);
    }
}
/**
 * hashCode method in Java
 *
 * @author qiaohaojie
 * @date 2023/3/14 21:42
 */
public class HashCodeTest {<!-- -->
    public static void main(String[] args) {<!-- -->
        Employee employee1 = new Employee(23,"green pepper");
        Employee employee2 = new Employee(23,"Green Zanthoxylum bungeanum");
        System.out.println(employee1); // Employee@5a07e868{age=23, name='Green Pepper'}
        System.out.println(employee2); // Employee@76ed5528{age=23, name='Green Pepper'}
        Map<Employee, Integer> map = new HashMap<>();
        map.put(employee1, 99);
        System.out.println(map.get(employee2)); // null
    }
}

Obviously, the results of both are the same (neither is null). The reason is that when overriding the equals() method, the hashCode() method is not overridden. By default, the hashCode() method is a local method and will return the storage address of the object, while employee1 and map in put() The objects in .get(employee2) are two objects, so their storage addresses must be different.

The get() method of HashMap will call hash(key.hashCode()) to calculate the hash value of the object. Although two different hashCode() results may get the same result after being calculated by the hash() method, this probability It is very, very small, so map.get(employee2) cannot get the expected value of 99.

How to solve this problem? Override hashCode() method:

@Override
public int hashCode() {<!-- -->
    return Objects.hash(age, name);
}
/**
 * hashCode method in Java
 *
 * @author qiaohaojie
 * @date 2023/3/14 21:42
 */
public class HashCodeTest {<!-- -->
    public static void main(String[] args) {<!-- -->
        Employee employee1 = new Employee(23,"green pepper");
        Employee employee2 = new Employee(23,"green pepper");
        System.out.println(employee1); // Employee@2484ddd{age=23, name='Green Pepper'}
        System.out.println(employee2); // Employee@2484ddd{age=23, name='Green Pepper'}
        Map<Employee, Integer> map = new HashMap<>();
        map.put(employee1, 99);
        System.out.println(map.get(employee2)); // 99
    }
}

This gives the expected result of 99.

The hash() method of the Objects class can generate new hashCode() values for different numbers of arguments:

// Lines 127~129 of the Objects class
public static int hash(Object... values) {<!-- -->
    return Arrays.hashCode(values);
}

// Lines 4139~4149 of the Arrays class
public static int hashCode(Object a[]) {<!-- -->
    if(a==null)
        return 0;

    int result = 1;

    for (Object element : a)
        result = 31 * result + (element == null ? 0 : element.hashCode());

    return result;
}

The mathematical formula summarized is as follows (n is the length of the string):

Note (I don’t quite understand it, I heard people say it this way): 31 is an odd prime number, neither big nor small. Generally, prime numbers are very suitable for hash calculations. Even numbers are equivalent to displacement operations and are prone to overflow, resulting in loss of data information.

This means that when age and name are the same, the same hash value will be obtained. Therefore, map.get(employee2) will return the expected result 99.

There is a passage in “Java Becomes Thought” that describes the hashCode() method:

The most important factor when designing hashCode() is that whenever hashCode() is called on the same object, it should produce the same value. If an object is added to a HashMap using the put() method and a hashCode() value is generated, and another hashCode() value is generated when it is taken out using the get() method, then the object cannot be retrieved. Therefore, if your hashCode() method relies on volatile data in the object, users should be careful, because when this data changes, hashCode() will generate a different hash value, which is equivalent to a different hash value. key.

In other words, if a field in the object is prone to change when overriding the hashCode() and equals() methods, it is best to discard these fields to avoid unpredictable results.

2. Why must the hashCode method be rewritten when overriding the equals method?

As mentioned above, Java is an object-oriented programming language. All classes inherit from the Object class by default. The Object class contains these two methods:

public native int hashCode();

public boolean equals(Object obj) {<!-- -->
    return (this == obj);
}

hashCode() method: is a local method used to return the hash value of the object (an integer). During the execution of a Java program, calling this method multiple times on the same object returns the same hash value.
equals() method: For task non-null references x and y, the equals() method returns true if and only if x and y refer to the same object.
Judging from these explanations, there seems to be no connection between the two methods, but there are two pieces of information from CNOOC in the documents of these two methods:

First, if two objects call the equals() method and return true, then the two objects call the hashCode() method and return the same result.
Second, whenever the equals() method is overridden, the hashCode() method also needs to be rewritten in order to maintain the previous specification.
Why is this?
The function of thehashCode() method is to obtain the hash value, and the function of the hash value is to determine the index position of the object in the hash table.

A typical representative of a hash table is HashMap, which stores key-value pairs and can quickly retrieve the corresponding value according to the key through the get() method:

public V get(Object key) {<!-- -->
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

This is the get method of HashMap, a method to get the value through the key. It calls the getNode() method:

final Node<K,V> getNode(int hash, Object key) {<!-- -->
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null & amp; & amp; (n = tab.length) > 0 & amp; & amp;
        (first = tab[(n - 1) & amp; hash]) != null) {<!-- -->
        if (first.hash == hash & amp; & amp; // always check first node
            ((k = first.key) == key || (key != null & amp; & amp; key.equals(k))))
            return first;
        if ((e = first.next) != null) {<!-- -->
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {<!-- -->
                if (e.hash == hash & amp; & amp;
                    ((k = e.key) == key || (key != null & amp; & amp; key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

Under normal circumstances (no hash conflict occurs), first = tab[(n – 1) & amp; hash] is the value corresponding to the key. In terms of time complexity, it can be expressed as O(1).

If a hash conflict occurs, that is, in the if ((e = first.next) != null) {} clause, you can see that if the node is not a red-black tree, the do-while loop statement will be used to determine whether it is < em>equals() returns the corresponding value. In terms of time complexity, it can be expressed as O(n).

HashMap solves hash conflicts through the zipper method. That is, if a hash conflict occurs, multiple values will be placed in the pit of the same key. After more than 8 values, it will be changed to a red-black tree in order to improve the efficiency of query.

Obviously, in terms of time complexity, the performance of O(n) is worse than O(1), which is where the value of the hash table lies.

You can think about it, if there is no hash table, but you need such a data structure, the data stored in it is not allowed to be repeated, what should you do? Do we really need to use the equals() method to compare them one by one? If the amount of data is particularly large, performance will be poor, so the best solution is HashMap.

HashMap is essentially implemented through arrays. When we want to get a value from HashMap, we actually need to get the element at a certain position in the array, and the position is determined by the key. When stored in the put() method, Stored key-value pairs:

public V put(K key, V value) {<!-- -->
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {<!-- -->
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & amp; hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {<!-- -->
        Node<K,V> e; K k;
        if (p.hash == hash & amp; & amp;
            ((k = p.key) == key || (key != null & amp; & amp; key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {<!-- -->
            for (int binCount = 0; ; + + binCount) {<!-- -->
                if ((e = p.next) == null) {<!-- -->
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash & amp; & amp;
                    ((k = e.key) == key || (key != null & amp; & amp; key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) {<!-- --> // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
     + + modCount;
    if ( + + size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Usually, p = tab[i = (n – 1) & hash]) is the value corresponding to the key. The index of the array (n – 1) & hash is calculated based on the hashCode() method:

static final int hash(Object key) {<!-- -->
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

So why should we override the hashCode() method when overriding the equals() method?
Then the test class above:

/**
 * hashCode method in Java
 *
 * @author qiaohaojie
 * @date 2023/3/14 21:42
 */
public class HashCodeTest {<!-- -->
    public static void main(String[] args) {<!-- -->
        Employee employee1 = new Employee(23,"green pepper");
        Employee employee2 = new Employee(23,"green pepper");
        System.out.println(employee1); // Employee@2484ddd{age=23, name='Green Pepper'}
        System.out.println(employee2); // Employee@2484ddd{age=23, name='Green Pepper'}
        Map<Employee, Integer> map = new HashMap<>();
        map.put(employee1, 99);
        System.out.println(map.get(employee2)); // null does not override the hashCode() method
        System.out.println(map.get(employee2)); // 99 Override hashCode() method
    }
}

When the hashCode() method is not overridden, the hashCode values of the two objects:

After overriding the hashCode() method, the hashCode values of the two objects:

When overriding the hashCode() method, you can directly call the hash() method of the Objects class:

@Override
public int hashCode() {<!-- -->
    return Objects.hash(age, name);
}

Therefore, every time the equals() method is rewritten, the hashCode() method also needs to be rewritten to ensure that if two objects call the equals() method and the result returned is true, then the two objects call the hashCode() method and return The result must be the same.