Why use hashing algorithm? Hand-shred HashSet: add, delete, exist, expand

#I have been practicing algorithms and structures recently, and their importance is self-evident. What I’m sharing today is a hand-shredded HashSet. Let’s start from the interviewer’s perspective#

1. Interviewer: Please tell me about your understanding of HashSet and why you should use the hash algorithm

Little Z: Let’s talk about its official definition first (list these to the interviewer first and then explain them one by one)

1. Understanding:

1. HashSet is a set that does not allow duplicate elements.

2. It implements the Set interface and inherits the AbstractSet abstract class.

3. HashSet is implemented based on HashMap. The underlying implementation of the set is a hash table, which uses a hash algorithm to store and manage the elements in the set.

4.The elements in the HashSet collection are not ordered.

5.HashSet allows null values.

6.HashSet is not thread-safe.

2. Why use hash algorithm (benefits)

1.Performance: HashSet has fast lookup performance. Due to the design of hash tables, the time complexity required to find an element is O(1) on average. But in the worst case, when hash collisions occur more often, the lookup time may increase to O(n), where n is the number of elements. The big O notation will not be introduced in detail here. Simply put, it is because of its duplication characteristics and partition comparison, so it is very fast to search. (Don’t understand? Picture above)

This is the case without using hashSet: as shown below

The following is the situation of using hashSet. Imagine that you are traveling with an extra-long suitcase and classify the luggage. Each major category has a hash code. For example, the hash code of a change of clothes is 2. When a new pair of pants is needed, the hash code of the pants is also calculated to be 2. At this time, there is no need to search the entire suitcase (HashSet). You only need to open the bucket with a hash code of 2 and search to see if there are the same pants. If there are the same pants, give up adding them. If not, add them. The strict expression is as follows: When an element is to be inserted, HashSet will first calculate the hash code of the element, and then determine which bucket to put the element into. If multiple elements have the same hash code, they are put into the same bucket and form a linked list.

Benefits: Improved performance, no need to compare each time you add, but partition comparison.

Disadvantages: It takes up space, because it is hashed, so space is exchanged for time. For example, there are 10 empty positions in the suitcase, which may not all be full, and each major category is not closely connected, and the empty positions are randomly selected.

Tip: The hash codes with the same content are the same; conversely, the hash codes are the same, but the content is not necessarily the same. eg: Toothbrush and toothpaste both belong to toiletries and have the same hash code. But the hash codes are the same, they are different.

The interviewer may continue to ask: Why is HashSet thread-unsafe?

Little Z: Because the add method of HashSet calls the put method of HashMap at the bottom. As for why HashMap is thread-unsafe, please jump to the link to continue learning.

3. Shred HashSet by hand

The HashSet class is located in the java.util package and needs to be introduced before use. The syntax format is as follows:

import java.util.HashSet; //Introduce the HashSet class

1. Add elements (from tutorial https://www.runoob.com/java/java-hashset.html)

The HashSet class provides many useful methods. To add elements, you can use the add() method:

//Introduce HashSet class
import java.util.HashSet;

public class RunoobTest {
public static void main(String[] args) {
HashSet sites = new HashSet();
sites.add(“Google”);
sites.add(“Runoob”);
sites.add(“Taobao”);
sites.add(“Zhihu”);
sites.add(“Runoob”); // Duplicate elements will not be added
System.out.println(sites);
}
}

Executing the above code, the output results are as follows:

[Google, Runoob, Zhihu, Taobao]

2. Determine whether the element exists

We can use the contains() method to determine whether an element exists in the collection:

//Introduce HashSet class
import java.util.HashSet;

public class RunoobTest {
public static void main(String[] args) {
HashSet sites = new HashSet();
sites.add(“Google”);
sites.add(“Runoob”);
sites.add(“Taobao”);
sites.add(“Zhihu”);
sites.add(“Runoob”); // Duplicate elements will not be added
System.out.println(sites.contains(“Taobao”));
}
}

Executing the above code, the output results are as follows:

true

3. Delete elements

We can use the remove() method to remove elements from the collection:

//Introduce HashSet class
import java.util.HashSet;

public class RunoobTest {
public static void main(String[] args) {
HashSet sites = new HashSet();
sites.add(“Google”);
sites.add(“Runoob”);
sites.add(“Taobao”);
sites.add(“Zhihu”);
sites.add(“Runoob”); // Duplicate elements will not be added
sites.remove(“Taobao”); // Delete the element. If the deletion is successful, it will return true, otherwise it will be false.
System.out.println(sites);
}
}

Executing the above code, the output results are as follows:

[Google, Runoob, Zhihu]

To delete all elements in the collection, use the clear method:

//Introduce HashSet class
import java.util.HashSet;

public class RunoobTest {
public static void main(String[] args) {
HashSet sites = new HashSet();
sites.add(“Google”);
sites.add(“Runoob”);
sites.add(“Taobao”);
sites.add(“Zhihu”);
sites.add(“Runoob”); // Duplicate elements will not be added
sites.clear();
System.out.println(sites);
}
}

Executing the above code, the output results are as follows:

[]

4. Calculate size

If you want to count the number of elements in a HashSet, you can use the size() method:

//Introduce HashSet class
import java.util.HashSet;

public class RunoobTest {
public static void main(String[] args) {
HashSet sites = new HashSet();
sites.add(“Google”);
sites.add(“Runoob”);
sites.add(“Taobao”);
sites.add(“Zhihu”);
sites.add(“Runoob”); // Duplicate elements will not be added
System.out.println(sites.size());
}
}

Executing the above code, the output results are as follows:

4

5. Iterate HashSet

You can use for-each to iterate over the elements in a HashSet.

//Introduce HashSet class
import java.util.HashSet;

public class RunoobTest {
public static void main(String[] args) {
HashSet sites = new HashSet();
sites.add(“Google”);
sites.add(“Runoob”);
sites.add(“Taobao”);
sites.add(“Zhihu”);
sites.add(“Runoob”); // Duplicate elements will not be added
for (String i : sites) {
System.out.println(i);
}
}
}

Executing the above code, the output results are as follows:

Google
Runoob
Zhihu
Taobao

6. Code implementation (expanded version)

public class SlowHashSet {
    private static final int DEFAULT_CAPACITY = 7; //Default capacity
    private static final double DEFAULT_LOAD_FACTOR = 0.75; //Default load factor
    //Define an array elements
    private LinkedList[] elements;//First it is an array, each element in the array is a collection
    private int capacity; // The capacity of the collection
    private double loadFactor; // load factor
    private int size; // size of collection


    // No parameter constructor, default capacity and load factor are used by default
    public SlowHashSet() {
        this(DEFAULT_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

    // Constructor with parameters allows specifying initial capacity initialCapacity and load factor loadFactor
    public SlowHashSet(int initialCapacity, double loadFactor) {
        this.capacity = initialCapacity;
        this.loadFactor = loadFactor;
        this.elements = new LinkedList[initialCapacity];//Given initial capacity
        this.size = 0;
    }

    //add method (consider expansion factors after changes)
    public boolean add(Object element) {
        /*First determine whether expansion is required.
         * This condition checks whether the size of the current collection (size)
         * Exceeds the product of loadFactor and capacity. */
        //size represents the current number of elements (i.e. size) of the collection
        //capacity represents the capacity of the collection, that is, the upper limit of the number of elements that the collection can store.
        //loadFactor means that when the current number of elements in the collection reaches the value of the capacity multiplied by the load factor, the expansion operation will be triggered.
        //size + 1 means adding another element, the size of the collection will increase by 1. When it is greater than loadFactor, it is time to expand
        if (size + 1 > capacity * loadFactor) {
            expansion(); // If the number of elements exceeds the load factor threshold, perform expansion operations
        }
        //1.Element: Calculate the hash code of the new element to know its category and which bucket to put it in.
        int hashCode = element.hashCode();
        //2. Calculate the bucket position based on the hash code, which is the subscript
        int index = Math.abs(hashCode % capacity);
        //3. Determine whether the bucket position is empty. If it is empty, put it directly.
        if (elements[index] == null) {
            elements[index] = new LinkedList();//Instance collection
            elements[index].add(element);//Add elements to the collection and then put the collection in the bucket
            size + + ;
            return true;
        }
        //If the process reaches this point, it means that there is already an element in the bucket where the current new element hashed.
        //Then you need to compare the new elements one by one with the old elements that already exist in the bucket.
        //Only new elements that are not equal to all old elements are added to the set
        int i;
        for (i = 0; i < elements[index].size(); i + + ) {
            //1.elements is an array. The subscript can extract elements. The elements are sets. i determines which element to extract.
            Object oldElement = elements[index].get(i);
            //2. If equal, end. Avoid pitfalls. The old element and the new element are compared with the content rather than the address
            //Whenever hashcode is rewritten, equals also needs to be rewritten in this class, otherwise the equals of the parent class object is used, which is the address.
            if (element.equals(oldElement)) {
                break;

            }
            //The process reaches this point to indicate that new elements can be placed in the collection
            if (i == elements[index].size()) {
                elements[index].add(element);
                size + + ;
                return true;
            }

        }
        return false;
    }

    public int size() {
        return size;
    }

    // Expansion operation
    private void expansion() {
        int newCapacity = capacity * 2; // Create a new capacity twice the current capacity
        LinkedList<Object>[] newElements = new LinkedList[newCapacity]; // Create a new element array
        int newSize = 0; // new size variable

        //Traverse elements and move the old bucket to the new bucket
        for (LinkedList<Object> bucket : elements) {//Outer loop, traverse each bucket in the current elements array
            if (bucket != null) {//If there are elements in the bucket
                for (Object element : bucket) {//Inner loop, take out each element in the bucket
                    int index = element.hashCode() % newCapacity; // Calculate new position
                    if (newElements[index] == null) {//If there is no bucket in the new location
                        newElements[index] = new LinkedList<>();//, create a new LinkedList bucket and put it in the new array
                    }
                    LinkedList<Object> newBucket = newElements[index]; //Get the new bucket position based on the hash code LinkedList It’s time to put the element
                    if (!newBucket.contains(element)) {//If the new bucket does not exist, add it directly
                        newBucket.add(element); //Add elements to the new bucket
                        newSize + + ; // update new size
                    }
                }
            }
        }

        capacity = newCapacity; // update capacity
        elements = newElements; // Update element array
        size = newSize; // Update size to calculate the size of the new collection
    }


    //encapsulation subscript
    private int getIndex(Object element) {
        int hashCode = element.hashCode();
        return Math.abs(hashCode % elements.length); // Use absolute values to ensure that the index is within the legal range
    }

    //does it exist
    public boolean contains(Object element) {
        int index = getIndex(element);
        return elements[index].contains(element);
    }

    //delete
    public void remove(Object element) {
        int index = getIndex(element);
        if (elements[index] == null) {
            System.out.println("Nothing to delete");
        }
        elements[index].remove(element);
        size--;
    }

   


    public static void main(String[] args) {

        // Expansion test
        Scanner scanner = new Scanner(System.in);
        System.out.print("Do you want to use the default capacity and default load factor (y/n)? ");
        String choice = scanner.nextLine();

        SlowHashSet hashSet;
        if (choice.equalsIgnoreCase("y")) {
            hashSet = new SlowHashSet();
        } else if (choice.equalsIgnoreCase("n")) {
            System.out.print("Please enter custom capacity: ");
            int customCapacity = scanner.nextInt();
            System.out.print("Please enter custom load factor: ");
            double customLoadFactor = scanner.nextDouble();
            hashSet = new SlowHashSet(customCapacity, customLoadFactor);
        } else {
            System.out.println("Invalid selection. Use default capacity and default load factor.");
            hashSet = new SlowHashSet();
        }

        for (int i = 1; i < 15; i + + ) {
            hashSet.add(i);
            String message = "Add " + i + " elements to the set. There are: " + hashSet.size() + " elements in the set. The capacity of the array is: " + hashSet.capacity;
            if (hashSet.size() + 1 > hashSet.capacity * hashSet.loadFactor) {
                message + = "There are "↓↓↓↓↓↓↓↓↓↓" in the set + hashSet.capacity + "*" + hashSet.loadFactor + "=" + (int) (hashSet.capacity * hashSet.loadFactor) + " The element has reached the expansion conditions, so I doubled the expansion.";
            }
            System.out.println(message);
        }

        scanner.close();
    }

}


Thanks for learning, if it is useful, please like + follow it~