[Data structure] The underlying data structure of HashSet

Personal homepage: Ye Luoxianting
My column:
c language
Data structure
javaEE
Operating System
Redis


Stones can be broken, but not strong; elixirs can be ground, but not red.

HashSet

  • 1. The underlying data structure of the HashSet collection
  • 2. The process of adding elements to HashSet
  • 3. Why the order of storage and retrieval of HashSet is different?
  • 4. Why does HashSet have no index?
  • 5. HashSet’s deduplication mechanism
  • Set series collection
    • Out of order: The access sequence is inconsistent
    • No duplication: Duplication can be removed
    • No index: There is no indexed method, so you cannot use ordinary fo loop traversal, nor can you obtain elements through indexes

1. The underlying data structure of the HashSet collection

  • HashSet: Unordered, non-repeating, no index
  • The bottom layer of HashSet uses a hash table to store data. The hash table is a structure with good performance for adding, deleting, modifying and querying data.
  • Before JDK8, the hash table was composed of array + linked list. After JDK8, it was composed of array + linked list + red-black tree.
  • In a hash table, the most important thing is the hash value. The hash value is the integer representation of the object. When HashSet stores data, it will calculate the location to be stored based on the array length and hash value. The hash value It is an int type integer calculated based on the hashCode() method. The hashCode() method is defined in the Object class and can be called by all objects. By default, the address value is used for calculation. In general, custom objects must override hashCode. () method, uses the attribute value inside the object to calculate the hash value.
int index = (array length - 1) & amp; hash value;
  • Object hash value characteristics:
    • If the hashCode() method is not overridden, the hash values calculated by different objects created by the same class are different.
public class Student {<!-- -->
    private String name;
    private int age;

    public Student() {<!-- -->
    }

    public Student(String name, int age) {<!-- -->
        this.name = name;
        this.age = age;
    }

    /**
     * Obtain
     * @return name
     */
    public String getName() {<!-- -->
        return name;
    }

    /**
     * set up
     * @param name
     */
    public void setName(String name) {<!-- -->
        this.name = name;
    }

    /**
     * Obtain
     * @return age
     */
    public int getAge() {<!-- -->
        return age;
    }

    /**
     * set up
     * @param age
     */
    public void setAge(int age) {<!-- -->
        this.age = age;
    }

    public String toString() {<!-- -->
        return "Student{name = " + name + ", age = " + age + "}";
    }
}
public static void main(String[] args) {<!-- -->
        //Create object
        //The hashCode method is not overridden, the calculated hash value is different
        Student s1 = new Student();
        Student s2 = new Student();
        System.out.println(s1.hashCode());//460141958
        System.out.println(s2.hashCode());//1163157884
    }
  • If the hashcode method has been overridden, as long as the attribute values of different objects are the same, the calculated hash values will be the same.
public class Student {<!-- -->
    private String name;
    private int age;

    @Override
    public boolean equals(Object o) {<!-- -->
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Student student = (Student) o;
        return age == student.age & amp; & amp; Objects.equals(name, student.name);
    }

    @Override
    public int hashCode() {<!-- -->
        return Objects.hash(name, age);
    }

    public Student() {<!-- -->
    }

    public Student(String name, int age) {<!-- -->
        this.name = name;
        this.age = age;
    }

    /**
     * Obtain
     * @return name
     */
    public String getName() {<!-- -->
        return name;
    }

    /**
     * set up
     * @param name
     */
    public void setName(String name) {<!-- -->
        this.name = name;
    }

    /**
     * Obtain
     * @return age
     */
    public int getAge() {<!-- -->
        return age;
    }

    /**
     * set up
     * @param age
     */
    public void setAge(int age) {<!-- -->
        this.age = age;
    }

    public String toString() {<!-- -->
        return "Student{name = " + name + ", age = " + age + "}";
    }
}
public static void main(String[] args) {<!-- -->
        //Create object
        //If the hashcode method has been overridden, as long as the attribute values of different objects are the same, the calculated hash values will be the same.
        Student s1 = new Student();
        Student s2 = new Student();
        System.out.println(s1.hashCode());//961
        System.out.println(s2.hashCode());//961
    }
  • In a small number of cases, the hash values calculated from different attribute values or different address values may be the same (hash collision)
public static void main(String[] args) {<!-- -->
        //In a small number of cases, the hash values calculated from different attribute values or different address values may be the same. (hash collision)
        System.out.println("abc".hashCode());//96354
        System.out.println("acD".hashCode());//96354
    }

2. The process of adding elements to HashSet

The underlying principle of HashSet after JDK8:

  • Create an array with a default length of 16 and a default loading factor of 0.75. The array is named table.
  • The location where the element should be stored is calculated based on the hash value of the element and the length of the array.
int index = (array length - 1) & amp; hash value;
  • Determine whether the current position is null. If it is null, store it directly.
  • If the current position is not null, it means there is an element, and the equals() method is called to compare with the properties of the current position.
    • If they are the same, discard them and do not save them.
    • If they are different, store them in an array to form a linked list.
  • Before JDK8, new elements were stored in arrays, and old elements were hung under new elements to form a linked list.
  • After JDK8, new elements are hung under old elements to form a linked list.
  • When the length of the linked list is greater than 8 and the length of the array is greater than or equal to 64, the current linked list will automatically be converted into a red-black tree
  • If the collection stores custom objects, you must override the hashCode and equals methods

3. Why the order of storage and retrieval of HashSet is different

When traversing HashSet, it starts traversing from the 0 index of the array. Under each index, the corresponding linked list under the index must be traversed. When an index is traversed and the value of this index is empty, it will be skipped and the next one will be traversed. Index, when there is a linked list under the index, the linked list will be traversed. If it is a red-black tree, the red-black tree will also be traversed. The array will be traversed according to this principle, because the corresponding element under a certain index is not necessarily the one when it is stored. order, so the order of HashSet storage and retrieval is not necessarily the same.

4. Why HashSet has no index

HashSet is composed of array + linked list + red-black tree. The array is indexed, but the elements in HashSet are hung under each index of the array in the form of linked list or red-black tree, that is, each index may Corresponds to multiple elements, so HashSet cannot find the corresponding element by index.

5. HashSet deduplication mechanism

HashSet uses HashCode to calculate the location where each element should be stored, and then uses the equals method to compare whether the attribute values in the object are consistent to ensure that there will be no duplicate elements.