Collection framework: characteristics of Set collection, underlying principles of HashSet collection, hash table, implementation of deduplication

Characteristics of Set collection

Set is an unordered, non-repeating data structure. Its characteristics are as follows:

1. The elements in the set are unordered: The elements in the Set have no order and cannot be accessed through indexes.

2. The elements in the set are unique: Duplicate elements are not allowed in the Set, and each element can only appear once in the set.

3.The internal implementation uses a hash table or tree structure: Set is usually implemented internally based on data structures such as hash tables or balanced trees.

4. Can be used for deduplication and quick search: Because the elements in the Set are unique, it can be easily used for deduplication. At the same time, since the internal implementation uses a hash table or tree structure, the time complexity of finding an element is O(1) or O(log n).

5. The elements in the Set must be hashable: Since the elements in the Set are implemented based on hash tables, the elements in the set must be hashable, that is, the elements must have a Explicit hash value. If an element does not have a hash value, then it cannot be used as an element of the Set.

Note:

The common methods used by Set are basically provided by Collection! I hardly added any additional common methods!

Practice code

import java.util.Set;
import java.util.TreeSet;

public class Test_set {
    public static void main(String[] args) {
        //1. Create a set collection object

        //HashSet: unordered, no duplication, no index
        //Set<Integer> set = new HashSet<>(); //Created a HashSet collection object. One line of classic code

        //LinkedHashSet: ordered, non-repeating, no index
        //Set<Integer> set = new LinkedHashSet<>(); //Created a LinkedHashSet collection object

        //TreeSet: sortable (default ascending order), no repetition, no index
        Set<Integer> set = new TreeSet<>(); //Created a TreeSet collection object
        set.add(666);
        set.add(555);
        set.add(555);
        set.add(888);
        set.add(888);
        set.add(777);
        set.add(777);
        System.out.println(set);


    }
}

Hash value

Before learning the underlying principles of the HashSet collection, let’s first understand what a hash value is↓↓↓

Concept

Hash value refers to mapping data of any length into a fixed-length value, usually represented by an integer or a fixed-length byte array. The hash value is also called the hash value (Hash Code) or digest (Digest).

Features

In the computer field, hash values are often used for operations such as data storage, indexing, and encryption. It has the following characteristics:

1. Hash values are fixed-length: No matter what the length of the input data is, the hash function will generate a fixed-length hash value. For example, the common hashing algorithm MD5 produces a hash value of 128 bits, and SHA-1 produces a hash value of 160 bits.

2. Small changes in the input data can lead to huge changes in the hash value: Just changing a tiny part of the input data can cause a huge change in the hash value. This property is called the “avalanche effect” and makes hash values very useful when verifying the integrity of data.

3. Hash values are generally irreversible: Normally, the content of the original data cannot be deduced from the hash value. Hash functions are designed to make it very difficult to produce the same hash value of the original data.

4. The same input data generates the same hash value: The hash function always generates the same hash value for the same input data, which facilitates data storage and comparison.

5. The distribution of hash values should be uniform: A good hash function should be able to evenly map the input data into the hash value space and try to avoid collisions (multiple different input data generating the same hash value).

The public int hashCode() method provided by the Object class in Java can return the hash code value of the object.

The underlying principle of the HashSet collection

In a HashSet, elements are stored in an instance of HashMap, where the value of the element serves as the key, and the hash value of the key (by calling the hashCode() method of the element) is used Determine the position of the element in the hash table. When an element is added to a HashSet, the HashSet will first calculate the hash value of the element and then find the corresponding storage location. If an element already exists at this position, HashSet will use the equals() method to check whether the two elements are equal. If they are equal, it will be considered a duplicate element and will not be added to the set.

To put it simply, the underlying principle of HashSet is based on a hash table, using hash values to quickly find elements and providing efficient addition, deletion and search operations.

Hash table

Since the HashSet collection is implemented based on the hash table, let’s learn about the hash table↓↓↓

Hash table (also called hash table) is a data structure that is directly accessed based on the key value. That is, it accesses records by mapping key values to a location in the table to speed up lookups. This mapping function is called a hash function, and the array storing the records is called a hash table.

Given a table M, there is a function f(key). For any given keyword value key, if the address of the record containing the keyword in the table can be obtained after substituting the function, the table M is called a hash (Hash). Table, function f(key) is a hash (Hash) function.

Realize deduplication

Let’s look at a piece of code first

import java.util.HashSet;
import java.util.Set;

public class Test {
    public static void main(String[] args) {
        //In-depth understanding of the deduplication mechanism of HashSet
        Set<Student> students = new HashSet<>();
        Student st1 = new Student("Supreme Treasure",18,167.5);
        Student st2 = new Student("Spider Spirit",22,169.8);
        Student st3 = new Student("Spider Spirit",22,169.8);
        Student st4 = new Student("牛魔王",19,183.5);
        students.add(st1);
        students.add(st2);
        students.add(st3);
        students.add(st4);
        System.out.println(students);

    }
}

Run it

There are two different objects with the same content st1 and st2, so the HashSet collection cannot be repeated by default. In actual operation, we hope to leave only one object to represent it. How to do this?

//For two objects with the same content, HashSet considers them to be non-duplicate.

/*
If you want the Set collection to consider two objects with the same content as duplicates, you must override the hashcode() and equals() methods of the object.
 */

We can go to the Student class to override the hashcode() and equals() methods

import java.util.Objects;

public class Student {
    private String name;
    private int age;
    private double height;

    public Student() {
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Student student = (Student) o;
        return age == student.age & amp; & amp; Double.compare(height, student.height) == 0 & amp; & amp; Objects.equals(name, student.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age, height);
    }

    public Student(String name, int age, double height) {
        this.name = name;
        this.age = age;
        this.height = height;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public double getHeight() {
        return height;
    }

    public void setHeight(double height) {
        this.height = height;
    }

    @Override
    public String toString() {
        return "Student{" +
                "name='" + name + ''' +
                ", age=" + age +
                ", height=" + height +
                '}';
    }
}

This way there will only be one spider spirit left↓

In terms of length, this blog ends here. In the next article, I will introduce the two hash tables before and after JDK8 in detail. Friends who need it can pay attention~