Data structure-heap sort and its complexity calculation

Table of Contents

1. Heap sort

1.1 Adjust heap construction upward

1.2 Adjust heap construction downwards

2. Comparison of time complexity between two heap building methods

2.1 Adjust downward the time complexity of heap building

2.2 Adjust the time complexity of heap construction upward

Topk problem

In the previous section, we talked about Heap implementation, and also included Upward adjustment method and Downward adjustment method. Finally, we used heap to implement it. To sort the data:

int main()
{
HP hp;
HeapInit( & amp;hp);
int arr[] = { 65,100,70,32,50,60 };
int i = 0;
for (i = 0; i < sizeof(arr) / sizeof(int); i + + )
{
HeapPush( & amp;hp, arr[i]);
}
while (!HeapEmpty( & amp;hp))
{
HeapDatatype top = HeapTop( & amp;hp);
printf("%d ", top);
HeapPop( & amp;hp);
}
return 0;
}

Can the above code sort the data?

The answer is yes, but the above method has two drawbacks:

1. It is too troublesome to write a heap first.

2. Space complexity + copying data.

1.Heap sort

In the previous section, using a heap to sort data involves inserting the data into the heap one by one and then adjusting the sorting. Can we directly build the data into a heap?

Of course, there are two ways to build a heap: upward adjustment to build a heap and downward adjustment to build a heap.

1.1 Adjust heap construction upward

Let’s first talk about Upward adjustment of heap building:

Adjusting the heap upward is actually the logic of inserting the heap, which requires that the previous data must be a heap. The subscript starts from 1 because a piece of data itself can be regarded as a heap, and then adjust upward.

The picture below is the result after we upwardly adjust an array data to build a heap. It can be seen that at this time we are building a small heap:

Now the question comes, we need to sort the data in ascending order, is it better to build a large pile or a small pile?

Let’s talk about the conclusion first: Ascending order – build a big pile Descending order – build a small pile.

Suppose we want to get ascending order, and we build a small heap at this time, then we put the smallest selected data at the position with subscript 0. If we want to continue to select the next smallest data and place it at the position with subscript 1, The remaining data must be regarded as a heap. In this way, the relationship between the heap and the heap will be completely messed up, and the heap can only be rebuilt, which is too costly.

And if we build a large heap, adjust downward to select the largest data, swap it head to tail, put the largest data at the last subscript, then isolate the last data, treat other data as a heap, and then adjust downward to select Take the next largest one and swap it head to tail… until all the data is sorted. At this time, the data is in ascending order.

code show as below:

#define _CRT_SECURE_NO_WARNINGS 1
#include<stdio.h>

typedef int HeapDatatype;

swap(HeapDatatype* p1, HeapDatatype* p2)
{
HeapDatatype tmp = *p1;
*p1 = *p2;
*p2 = tmp;
}
//Upward adjustment method
void AdjustUp(HeapDatatype* a, int child)
{
int parent = (child - 1) / 2;
while (child > 0)
{
if (a[parent] < a[child])
{
HeapDatatype p = a[parent];
a[parent] = a[child];
a[child] = p;
child = parent;
parent = (child - 1) / 2;
}
else
{
break;
}
}
}
//Downward adjustment method
void AdjustDown(HeapDatatype* a, int n, int parent)
{
int child = parent * 2 + 1;
while (child < n)
{
if (child + 1 < n & amp; & amp; a[child] < a[child + 1])
{
child + + ;
}
if (a[parent] < a[child])
{
swap( & amp;a[parent], & amp;a[child]);
parent = child;
child = parent * 2 + 1;
}
else
{
break;
}
}
}
//Heap sort
void HeapSort(int* a, int n)
{
//Build the heap - adjust the heap upward
for (int i = 1; i < n; i + + )
{
AdjustUp(a, i);
}
//Adjust downward to get the next largest data
int end = n - 1;
while (end > 0)
{
swap( & amp;a[0], & amp;a[end]);
AdjustDown(a, end, 0);
end--;
}
}

int main()
{
int a[] = { 7,8,3,5,1,9,5,4};
HeapSort(a, sizeof(a) / sizeof(int));
return 0;
}

What we build is a big pile, and what we get in the end is ascending order:

To get the data in descending order, build a small heap, adjust downward to select the smallest data, swap the beginning and the end, put the smallest data at the last subscript, isolate the last data, treat other data as a heap, and then move downward Adjust and select the next smallest data, swap it head to tail… until all the data is sorted, which results in the descending order of the data.

The code is as follows: (Just change the ‘<' in downward adjustment and upward adjustment to '>‘)

#define _CRT_SECURE_NO_WARNINGS 1
#include<stdio.h>

typedef int HeapDatatype;

swap(HeapDatatype* p1, HeapDatatype* p2)
{
HeapDatatype tmp = *p1;
*p1 = *p2;
*p2 = tmp;
}
//Upward adjustment method
void AdjustUp(HeapDatatype* a, int child)
{
int parent = (child - 1) / 2;
while (child > 0)
{
if (a[parent] > a[child])
{
HeapDatatype p = a[parent];
a[parent] = a[child];
a[child] = p;
child = parent;
parent = (child - 1) / 2;
}
else
{
break;
}
}
}
//Downward adjustment method
void AdjustDown(HeapDatatype* a, int n, int parent)
{
int child = parent * 2 + 1;
while (child < n)
{
if (child + 1 < n & amp; & amp; a[child] > a[child + 1])
{
child + + ;
}
if (a[parent] > a[child])
{
swap( & amp;a[parent], & amp;a[child]);
parent = child;
child = parent * 2 + 1;
}
else
{
break;
}
}
}
//Heap sort
void HeapSort(int* a, int n)
{
//Build the heap - adjust the heap upward
for (int i = 1; i < n; i + + )
{
AdjustUp(a, i);
}
//Adjust downward to get the next smallest data
int end = n - 1;
while (end > 0)
{
swap( & amp;a[0], & amp;a[end]);
AdjustDown(a, end, 0);
end--;
}
}

int main()
{
int a[] = { 7,8,3,5,1,9,5,4};
HeapSort(a, sizeof(a) / sizeof(int));
return 0;
}

Since we are building a small heap, what we get is the data in descending order:

Note: Regardless of whether it is ascending or descending order, the data is placed from back to front, so as not to mess up the heap relationship.

1.2 Adjust heap construction downward

We can see that heap sort uses upward adjustment to build the heap, and two functions need to be written: Downward adjustment function and upward adjustment function.

So what if we want to use a downward adjustment function to solve the problem?

This requires downward adjustment of the heap building:

Downward adjustment to build a heap requires that the left and right subtrees of the root node are both large heaps (small heaps). If the left and right subtrees do not satisfy the large heap, we only need to ensure that the left and right subtrees of the left and right subtrees are large heaps. A heap (small heap) is enough. If not, we will search further down, so as long as the left and right subtrees of all parent nodes are large heaps (small heaps), then we will adjust it backwards, because the leaf node itself is It is a heap, so there is no need to adjust, then start adjusting from the parent node of the last node.

code show as below:

#define _CRT_SECURE_NO_WARNINGS 1
#include<stdio.h>

typedef int HeapDatatype;

swap(HeapDatatype* p1, HeapDatatype* p2)
{
HeapDatatype tmp = *p1;
*p1 = *p2;
*p2 = tmp;
}

//Downward adjustment method
void AdjustDown(HeapDatatype* a, int n, int parent)
{
int child = parent * 2 + 1;
while (child < n)
{
if (child + 1 < n & amp; & amp; a[child] > a[child + 1])
{
child + + ;
}
if (a[parent] > a[child])
{
swap( & amp;a[parent], & amp;a[child]);
parent = child;
child = parent * 2 + 1;
}
else
{
break;
}
}
}
//Heap sort
void HeapSort(int* a, int n)
{
//Build the heap - adjust the heap downward
for (int i = (n - 1 - 1) / 2; i >= 0; i--)
{
AdjustDown(a, n, i);
}
int end = n - 1;
while (end > 0)
{
swap( & amp;a[0], & amp;a[end]);
AdjustDown(a, end, 0);
end--;
}
}

int main()
{
int a[] = { 7,8,3,5,1,9,5,4};
HeapSort(a, sizeof(a) / sizeof(int));
return 0;
}

In the code, int i=(n-1-1)/2 is the subscript to find the father through the child. n is the size of the array. First subtract one to get the last subscript, then subtract one and divide by two to get the parent node of the last child. .

This is to adjust the heap construction downward. In the future, we will use the heap construction to adjust downward and no longer use the heap construction to adjust upward. There is a gap between the two methods not only in the amount of code, but also in the time complexity. Adjusting the heap downward is also a gap. The time complexity of heap is smaller.

2. Comparison of time complexity of two methods of building a heap

2.1 Downwardly adjust the time complexity of heap construction

We know from the previous article that when adjusting downward to build a heap, we must ensure that the left and right subtrees of each parent node are large heaps (small heaps), so we adjust from bottom to top, and each of the last layer Each leaf node itself can be regarded as a heap. There is no need to adjust. The adjustment starts from their parent node (that is, the penultimate layer starts to adjust), so the time complexity is as follows:

Total number of steps = ∑ (number of nodes in each layer * number of layers that need to be adjusted for this node)

2.2 Upward adjustment of time complexity of heap construction

Upward adjustment and downward adjustment are just the opposite. When adjusting downward, the 2^(h-2) nodes in row h-1 need to be adjusted downward by 1 layer, while when adjusting upward, the 2^(h-2) nodes in row h-1 need to be adjusted downward. h-1) nodes need to be adjusted upward by h-2. The downward adjustment is large multiplication small and small multiplication large, while the upward adjustment is large multiplication large and small multiplication small. The time complexity is as follows:

The above is the time complexity of adjusting heap construction upwards and adjusting heap construction downwards. So what is the time complexity of our entire heap sorting process?

During the heap sorting process, in addition to building the heap, there is also a downward adjustment of the number of selections. When selecting a number, the head and tail must be swapped, once, and adjusted downwards once. Therefore, the 2^(h-1) nodes in row h, each time The head-to-tail exchange has to be adjusted (h-1) times, a total of 2^(h-1)*(h-1). It can be seen that the time complexity of the data selection process and the time complexity of upward adjustment to build the heap remain the same. Consistent, that is, O(N*logN)

So the overall time complexity of heap sorting is: Building heap + number of selections = O(N + N*logN), that is, O(N*logN).

Topk question

TOP-K problem: It is to find the top K largest elements or smallest elements in data combination. Generally, the amount of data is relatively large.
For example: top 10 professionals, Fortune 500, rich list, top 100 active players in the game, etc.
For the Top-K problem, the simplest and most direct way that can be thought of is sorting. However, if the amount of data is very large, sorting is not advisable (the data may not all be loaded into the memory at once). The best way is to use a heap to solve the problem. The basic idea is as follows:

1. Build a heap using the first K elements in the data set
For the first k largest elements, build a small heap
For the first k smallest elements, build a big pile
2. Use the remaining N-K elements to compare with the top element of the heap in sequence. If not satisfied, replace the top element of the heap
After comparing the remaining N-K elements with the top element of the heap in turn, the remaining K elements in the heap are the first K smallest or largest elements required.

For example: if we want to find the first K smallest numbers among 10,000 numbers, we first build a small heap of the first K numbers, and then compare the remaining N-K elements with the top element of the heap in sequence. If not satisfied, replace the heap. top element

code show as below:

#define _CRT_SECURE_NO_WARNINGS 1
#include<stdio.h>
#include<time.h>
#include<stdlib.h>

//Top-K question
typedef int HeapDatatype;

swap(HeapDatatype* p1, HeapDatatype* p2)
{
HeapDatatype tmp = *p1;
*p1 = *p2;
*p2 = tmp;
}
//Downward adjustment method
void AdjustDown(HeapDatatype* a, int n, int parent)
{
int child = parent * 2 + 1;
while (child < n)
{
if (child + 1 < n & amp; & amp; a[child] > a[child + 1])
{
child + + ;
}
if (a[parent] > a[child])
{
swap( & amp;a[parent], & amp;a[child]);
parent = child;
child = parent * 2 + 1;
}
else
{
break;
}
}
}
void CreateNDate()
{
//Create data
int n = 10000;
srand(time(0));
const char* file = "data.txt";
FILE* fin = fopen(file, "w");
if (fin == NULL)
{
perror("fopen error");
return;
}

for (size_t i = 0; i < n; + + i)
{
int x = rand() % 1000000;
fprintf(fin, "%d\
", x);
}

fclose(fin);
}

void PrintTopK(int k)
{
const char* file = "data.txt";
FILE* fout = fopen(file, "r");
if (fout == NULL)
{
perror("fopen error");
return;
}

int* kminheap = (int*)malloc(sizeof(int) * k);
if (kminheap == NULL)
{
perror("malloc error");
return;
}

for (int i = 0; i < k; i + + )
{
fscanf(fout, "%d", & amp;kminheap[i]);
}

// Build a small pile
for (int i = (k - 1 - 1) / 2; i >= 0; i--)
{
AdjustDown(kminheap, k, i);
}

int val = 0;
while (!feof(fout))
{
fscanf(fout, "%d", & amp;val);
if (val > kminheap[0])
{
kminheap[0] = val;
AdjustDown(kminheap, k, 0);
}
}

for (int i = 0; i < k; i + + )
{
printf("%d ", kminheap[i]);
}
printf("\
");
}


int main()
{
CreateNDate();
PrintTopK(5);
return 0;
}

We have finished learning all about heap sorting. In the next section, we will continue to talk about preorder, midorder, postorder and level order of binary trees.

To be continued. . .