How to sort millions of orders according to amount

Bucket Sort

As the name suggests, "buckets" are used. The core idea is to divide the data to be sorted into several ordered buckets and sort the data in each bucket separately. After every bucket is sorted, the data is taken out of the buckets in sequence, and the resulting sequence is ordered.

To make bucket sorting efficient, we need to do two things:

  1. Increase the number of buckets as much as the available extra space allows.
  2. Use a mapping function that distributes the n input values evenly across the k buckets.
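For the second point, a linear mapping is a common choice. The sketch below (the class and method names are mine, not from the article) spreads values in [minValue, maxValue] evenly across k buckets:

```java
public class BucketMapping {

    // Map a value in [minValue, maxValue] to one of k buckets, spreading evenly.
    static int bucketIndex(int value, int minValue, int maxValue, int k) {
        // Use long arithmetic so large value ranges cannot overflow int.
        return (int) ((long) (value - minValue) * k / ((long) maxValue - minValue + 1));
    }

    public static void main(String[] args) {
        // Values from 1 to 100000 mapped into 100 buckets of width 1000.
        System.out.println(bucketIndex(1, 1, 100000, 100));      // first bucket: 0
        System.out.println(bucketIndex(1000, 1, 100000, 100));   // still bucket 0
        System.out.println(bucketIndex(100000, 1, 100000, 100)); // last bucket: 99
    }
}
```

Because the mapping is monotonic, bucket i never holds a value larger than anything in bucket i + 1, which is what lets us simply concatenate the sorted buckets at the end.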

At the same time, the choice of sorting algorithm for the elements within each bucket has an important impact on performance.

The basic idea of bucket sorting is: divide the array arr into n subranges (buckets) of the same size, sort each subrange separately, and finally merge.


Why is the time complexity said to be O(n)? Let’s find out.

If there are n pieces of data to be sorted, we need to use function mapping to evenly distribute them into k buckets, and the number of elements in each bucket is y = n / k.

Then quick sort is used inside each bucket, with time complexity O(y log y). Sorting all k buckets takes O(k · y log y), and since y = n / k, the time complexity of the entire bucket sort is O(n log(n / k)). When the number of buckets k is close to the number of data items n, log(n / k) is a small constant, and the time complexity of bucket sorting approaches O(n).

It looks so good. Can it replace the O(n log n) sorting algorithms introduced earlier?

Unfortunately, the answer is no. Like a sports car that can only run on special tracks, it cannot replace the family car: its applicable scenarios are quite demanding.

  1. The data to be sorted can easily be divided evenly into k buckets, and there is a natural order between the buckets. That way, after the data in each bucket is sorted, there is no need to sort across buckets.
  2. The data is evenly distributed across the buckets. If some buckets hold many elements and others few, the time spent sorting within a bucket is no longer constant. In the extreme case, all the data falls into one bucket and the algorithm degenerates to O(n log n).

Applicable scenarios

Bucket sort is especially suitable for external sorting. External sorting means the data is stored on disk: the amount of data is large and the memory is limited, so it cannot all be loaded into memory at once.

For example, we have 10GB of order data, and we want to sort by order amount (assuming the amounts are all positive integers), but our memory is limited, only a few hundred MB, and there is no way to load all 10GB of data into the memory at one time. What should we do at this time?

Solution ideas

The idea for sorting the 10GB of order data by amount is the same. Suppose the minimum order amount is 1 yuan and the maximum is 100,000 yuan. We divide all orders into 100 buckets by amount: the first bucket stores orders with amounts between 1 and 1,000 yuan, the second stores orders between 1,001 and 2,000 yuan, and so on. Each bucket corresponds to a file, numbered and named according to its amount range (00, 01, 02 … 99).

Ideally, if the order amounts are evenly distributed between 1 and 100,000, the orders will be evenly divided into 100 files, each storing about 100MB of order data. We then load these 100 small files into memory one at a time and sort each with quick sort. After all the files are sorted, we only need to read the order data from each small file in order of file number, from small to large, and append it to a single output file. That file then contains the order data sorted by amount from small to large.
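The splitting step described above could be sketched like this. It is a simplified illustration under assumed conventions, not the article's code: each input line is assumed to hold one order amount as a positive integer in [1, 100000], and the bucket file names are invented for the example.

```java
import java.io.*;

public class OrderSplitter {

    // Bucket i holds amounts in (i*1000, (i+1)*1000]:
    // 1..1000 -> bucket 0, 1001..2000 -> bucket 1, ..., 99001..100000 -> bucket 99.
    static int bucketOf(int amount) {
        return Math.min((amount - 1) / 1000, 99);
    }

    // Stream the orders once, appending each line to its bucket file.
    public static void split(BufferedReader orders) throws IOException {
        BufferedWriter[] writers = new BufferedWriter[100];
        for (int i = 0; i < 100; i++) {
            // File names like "bucket-00.txt" are illustrative.
            writers[i] = new BufferedWriter(new FileWriter(String.format("bucket-%02d.txt", i)));
        }
        String line;
        while ((line = orders.readLine()) != null) {
            int amount = Integer.parseInt(line.trim());
            BufferedWriter w = writers[bucketOf(amount)];
            w.write(line);
            w.newLine();
        }
        for (BufferedWriter w : writers) {
            w.close();
        }
    }
}
```

Because the split is a single sequential pass, memory usage stays bounded by the write buffers regardless of how large the input file is.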

Code Practice

```java
/**
 * Bucket sort: divide the array arr into n subranges (buckets) of the same size,
 * sort each subrange separately, and finally merge them.
 */
public class BucketSort implements LineSort {

    private static final QuickSort quickSort = new QuickSort();

    @Override
    public int[] sort(int[] sourceArray, int bucketSize) {
        // Find the maximum and minimum values
        int minValue = sourceArray[0];
        int maxValue = sourceArray[0];
        for (int value : sourceArray) {
            minValue = Math.min(minValue, value);
            maxValue = Math.max(maxValue, value);
        }

        // Number of buckets
        int bucketCount = (maxValue - minValue) / bucketSize + 1;
        int[][] buckets = new int[bucketCount][bucketSize];
        // Next free index in each bucket's array, default value 0
        int[] indexArr = new int[bucketCount];

        // Distribute the values in the array into the buckets
        for (int value : sourceArray) {
            int bucketIndex = (value - minValue) / bucketSize;
            // The current bucket's array is full and needs to be expanded
            if (indexArr[bucketIndex] == buckets[bucketIndex].length) {
                ensureCapacity(buckets, bucketIndex);
            }
            // Put the value into the bucket and advance that bucket's index
            buckets[bucketIndex][indexArr[bucketIndex]++] = value;
        }

        // Sort each bucket, using quick sort here
        int k = 0;
        for (int i = 0; i < buckets.length; i++) {
            if (indexArr[i] == 0) {
                continue;
            }
            // The capacity defaults to bucketSize; sort only the occupied part,
            // otherwise the unused slots (default value 0) would be included
            quickSort.quickSortInternal(buckets[i], 0, indexArr[i] - 1);
            for (int j = 0; j < indexArr[i]; j++) {
                sourceArray[k++] = buckets[i][j];
            }
        }
        return sourceArray;
    }

    /**
     * Double the bucket's array and copy over the existing data.
     *
     * @param buckets     all buckets
     * @param bucketIndex the bucket to expand
     */
    private void ensureCapacity(int[][] buckets, int bucketIndex) {
        int[] tempArr = buckets[bucketIndex];
        int[] newArr = new int[tempArr.length * 2];
        for (int j = 0; j < tempArr.length; j++) {
            newArr[j] = tempArr[j];
        }
        buckets[bucketIndex] = newArr;
    }
}
```

Unit testing

Generate one million data items in the range [1, 100000].

```java
@DisplayName("Linear sorting algorithm test")
public class LineSortTest {

    private static int length = 1000000;
    private int[] array = new int[length];

    @BeforeEach
    public void beforeEach() {
        Random rand = new Random();
        for (int i = 0; i < length; i++) {
            // Randomly generate data in [1, 100000]
            array[i] = rand.nextInt(100000) + 1;
        }
    }

    @DisplayName("Bucket Sort")
    @Test
    public void testBucketSort() {
        BucketSort bucketSort = new BucketSort();
        // bucketSize = 10: each bucket covers a value range of width 10
        int[] sort = bucketSort.sort(array, 10);
        System.out.println(Arrays.toString(sort));
    }
}
```

Summary

How to sort 1 million users by age? Has the thinking question become simple now? Let me share my solution.

In fact, sorting 1 million users by age is just like sorting 500,000 examinees by score. Assume the age range runs from a minimum of 1 year to a maximum of 120 years. We iterate through the 1 million users and divide them into 120 buckets by age, then traverse the elements in these 120 buckets in order. The result is the 1 million users sorted by age.
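Since each of the 120 buckets holds only a single age, no within-bucket sorting is needed at all, and the approach reduces to counting sort. A minimal sketch (class and method names are illustrative, sorting bare ages rather than user records):

```java
import java.util.*;

public class AgeSort {

    // Sort ages in [1, 120] by counting: one bucket per age value.
    static int[] sortByAge(int[] ages) {
        int[] count = new int[121]; // index = age, valid ages are 1..120
        for (int age : ages) {
            count[age]++;
        }
        // Emit each age as many times as it occurred, in ascending order.
        int[] sorted = new int[ages.length];
        int k = 0;
        for (int age = 1; age <= 120; age++) {
            for (int c = 0; c < count[age]; c++) {
                sorted[k++] = age;
            }
        }
        return sorted;
    }
}
```

With real user records, each bucket would instead hold a list of users of that age, but the two linear passes (count, then emit) stay the same, giving O(n) time for any n.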