Data structures: the heap and heap sort (a sorting method with excellent time complexity)

1. The concept and structure of the heap

1.1 The concept of heap

Given a set of keys K = {k_0, k_1, k_2, …, k_(n-1)}, store all its elements in a one-dimensional array in the level order of a complete binary tree. If K_i <= K_(2i+1) and K_i <= K_(2i+2) for every valid i = 0, 1, 2, … (or, respectively, K_i >= K_(2i+1) and K_i >= K_(2i+2)), the array is called a small heap (respectively, a large heap). The heap whose root node is the largest is called the max heap or large root heap, and the heap whose root node is the smallest is called the min heap or small root heap.

1.2 The nature of the heap

(1) The value of every node in the heap is never greater than the value of its parent (large heap), or never less than it (small heap);

(2) The heap is always a complete binary tree.
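The two properties can be checked directly on the array. The following is a minimal sketch for the small heap case; the function name IsMinHeap and the int element type are assumptions made for illustration, not part of the original code.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical helper: returns true when a[0..n-1], read as a complete
// binary tree in level order, satisfies the small heap property
// (no child is smaller than its parent).
bool IsMinHeap(const int* a, int n)
{
    assert(n >= 0);
    for (int child = 1; child < n; ++child)
    {
        int parent = (child - 1) / 2;
        if (a[child] < a[parent])
        {
            return false;
        }
    }
    return true;
}
```

Because the array layout already guarantees a complete binary tree, only property (1) needs to be tested. For a large heap the comparison is reversed.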

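Big root heap instance

[Figure: big root heap. The root value is not recoverable here; the second level holds 56 and 78, the third level holds 32, 28, 65, 49.]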

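Small root heap instance

[Figure: small root heap. The root value is not recoverable here; the second level holds 29 and 31, the third level holds 34, 48, 35, 69.]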

1.3 The underlying structure of the heap

Logical structure: a complete binary tree.

Physical structure: a one-dimensional array that stores the tree level by level.
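The code in the following sections uses the names HP, HPDataType, Swap and HeapEmpty without showing their definitions. The sketch below is one plausible set of definitions, given as an assumption: the field names a, size and capacity are taken from how the later code uses them, while HeapInit, HeapDestroy and the three index helpers are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef int HPDataType;  // assumed element type

typedef struct Heap
{
    HPDataType* a;  // physical structure: array holding the tree level by level
    int size;       // number of elements currently stored
    int capacity;   // number of elements the array has room for
} HP;

// Index arithmetic that maps the array onto the complete binary tree
int ParentIndex(int child) { return (child - 1) / 2; }
int LeftChildIndex(int parent) { return parent * 2 + 1; }
int RightChildIndex(int parent) { return parent * 2 + 2; }

void Swap(HPDataType* x, HPDataType* y)
{
    HPDataType tmp = *x;
    *x = *y;
    *y = tmp;
}

// Hypothetical: start with an empty heap that owns no memory yet
void HeapInit(HP* php)
{
    assert(php);
    php->a = NULL;
    php->size = 0;
    php->capacity = 0;
}

bool HeapEmpty(const HP* php)
{
    assert(php);
    return php->size == 0;
}

// Hypothetical: release the array and reset the bookkeeping fields
void HeapDestroy(HP* php)
{
    assert(php);
    free(php->a);
    php->a = NULL;
    php->size = 0;
    php->capacity = 0;
}
```

Starting with a NULL array and capacity 0 works with a push routine that grows the array through realloc, since realloc on a NULL pointer behaves like malloc.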

2. Upward adjustment algorithm

When pushing data, the underlying storage of the heap is an array, so appending an element only requires writing it at index size and then doing ++size. But is appending alone enough? Obviously not: after the insertion, the structure must still be a small heap (or a large heap). So the new element has to be compared with its parent and moved to a suitable position with the upward adjustment algorithm.

To give a simple example, the diagram is as follows:

Now that we understand the idea of this algorithm, how is it implemented in code?

The examples below use a small heap; a large heap works the same way with the comparisons reversed.

void AdjustUp(HPDataType* a,int child)
{
    int parent=(child-1)/2;
    //while(parent>=0) cannot be used as the loop condition: (0-1)/2 truncates to 0 in C, so parent never becomes negative and the loop would stop only through the break
    while(child>0)
    {
        if(a[child]<a[parent])
        {
            Swap( &a[child], &a[parent]);
            child=parent;
            parent=(child-1)/2;
        }
        else
        {
            break;
        }
    }
}

With the upward adjustment algorithm, we can easily insert an element at the end of the heap.

The code implementation is as follows:

void HeapPush(HP* php, HPDataType x)
{
    assert(php);
    // Grow the array when it is full: start at 4 elements, then double
    if (php->size == php->capacity)
    {
        int newCapacity = php->capacity == 0 ? 4 : php->capacity * 2;
        HPDataType* tmp = (HPDataType*)realloc(php->a, newCapacity * sizeof(HPDataType));
        if (tmp == NULL)
        {
            perror("realloc fail");
            exit(-1);
        }

        php->a = tmp;
        php->capacity = newCapacity;
    }

    // Append at the end, then restore the heap property
    php->a[php->size] = x;
    php->size++;

    AdjustUp(php->a, php->size - 1);
}

Time complexity of the upward adjustment algorithm:

O(log2N)

3. Downward adjustment algorithm

When we pop the top of the heap, the top is the smallest number, but its two children have no fixed order between them: either the left child or the right child may be the smaller one. So after deleting the top element, how should the data be rearranged so that the heap property still holds?

First, let’s look at the downward adjustment algorithm.

(Similarly we will use the small heap as an example)

If the top element of the heap is too large for its position, and the left and right subtrees of the top element are both small heaps, then we can restore the small heap by adjusting the top element downward.

The diagram is as follows:

After understanding the principle, how do we implement it in code?

The code implementation is as follows:

void AdjustDown(HPDataType* a, int n, int parent)
{
    // Assume the left child is the smaller child
    int minChild = parent * 2 + 1;
    while (minChild < n)
    {
        // Find the smaller child: if the right child exists and is smaller
        // than the left child, the right child is the smaller child
        if (minChild + 1 < n && a[minChild + 1] < a[minChild])
        {
            minChild++;
        }

        // Small heap: the parent sinks while it is larger than its smaller child
        if (a[minChild] < a[parent])
        {
            Swap(&a[minChild], &a[parent]);
            parent = minChild;
            minChild = parent * 2 + 1;
        }
        else
        {
            break;
        }
    }
}

Note: downward adjustment requires that the left and right subtrees of the parent element are already heaps of the same kind (both small heaps or both large heaps).

Next, let’s implement the deletion of the top element of the heap.

We take a slightly indirect approach:

(1) Swap the top element of the heap with the last element in the heap (this lets us use downward adjustment, because the left and right subtrees of the new top element are still small heaps)

(2) Decrement the size of the heap array (--size), which removes the last element (the original top element of the heap)

(3) Apply the downward adjustment algorithm to the top element until the heap property is restored

The specific code implementation is as follows:

void HeapPop(HP* php)
{
    assert(php);
    assert(!HeapEmpty(php));
    Swap(&php->a[0], &php->a[php->size - 1]);
    php->size--;

    AdjustDown(php->a, php->size, 0);
}
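To see push and pop working together, here is a minimal self-contained sketch. To keep it short it works on a caller-provided int buffer instead of the HP structure, and the function name PushAllThenPopAll is hypothetical. It pushes every input element into a small heap and then pops them all; since each pop removes the current minimum, the output comes out in ascending order.

```c
#include <assert.h>

void Swap(int* x, int* y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

// Upward adjustment for a small heap
void AdjustUp(int* a, int child)
{
    while (child > 0)
    {
        int parent = (child - 1) / 2;
        if (a[child] >= a[parent])
        {
            break;
        }
        Swap(&a[child], &a[parent]);
        child = parent;
    }
}

// Downward adjustment for a small heap of n elements
void AdjustDown(int* a, int n, int parent)
{
    int minChild = parent * 2 + 1;
    while (minChild < n)
    {
        if (minChild + 1 < n && a[minChild + 1] < a[minChild])
        {
            ++minChild;
        }
        if (a[minChild] >= a[parent])
        {
            break;
        }
        Swap(&a[minChild], &a[parent]);
        parent = minChild;
        minChild = parent * 2 + 1;
    }
}

// Hypothetical demo: buf and out must each have room for n elements
void PushAllThenPopAll(const int* in, int n, int* buf, int* out)
{
    assert(in && buf && out && n >= 0);
    int size = 0;
    for (int i = 0; i < n; ++i)
    {
        buf[size] = in[i];   // append at the end
        ++size;
        AdjustUp(buf, size - 1);
    }
    for (int i = 0; i < n; ++i)
    {
        out[i] = buf[0];     // the top is the current minimum
        Swap(&buf[0], &buf[size - 1]);
        --size;              // drop the old top
        AdjustDown(buf, size, 0);
    }
}
```

With the input {15,1,19,25,8,34,65,4,27,7}, out receives 1 4 7 8 15 19 25 27 34 65.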

Downward adjustment time complexity:

O(log2N)

4. Building a heap with the downward adjustment method

Idea: (take building a small heap from the array {15,1,19,25,8,34,65,4,27,7} as an example)

Because downward adjustment has a precondition (the left and right subtrees must already be heaps of the same kind), we start from the back of the array. Treat the array elements as a complete binary tree in the given order and find the last non-leaf node (that is, the parent of the last leaf node; readers can draw a picture to check this). Every leaf is trivially a heap, so this node meets the precondition. Adjust it downward, then move backward through the array one index at a time and adjust each node in turn. When the root has been adjusted, the heap is built.

The code is as follows:

for (int i = (n - 1 - 1) / 2; i >= 0; --i)
{
    // n-1 is the subscript of the last number in the array; (subscript-1)/2 gives the subscript of the last non-leaf node
    AdjustDown(a, n, i);
}
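Applied to the example array, the loop can be wrapped as follows. This is a sketch under stated assumptions: the wrapper name BuildMinHeapDown is hypothetical, and the element type is fixed to int so the block stands alone.

```c
#include <assert.h>

void Swap(int* x, int* y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

// Downward adjustment for a small heap of n elements
void AdjustDown(int* a, int n, int parent)
{
    int minChild = parent * 2 + 1;
    while (minChild < n)
    {
        if (minChild + 1 < n && a[minChild + 1] < a[minChild])
        {
            ++minChild;
        }
        if (a[minChild] >= a[parent])
        {
            break;
        }
        Swap(&a[minChild], &a[parent]);
        parent = minChild;
        minChild = parent * 2 + 1;
    }
}

// Hypothetical wrapper: turn a[0..n-1] into a small heap in place
void BuildMinHeapDown(int* a, int n)
{
    assert(a && n >= 0);
    // Start at the last non-leaf node and walk back to the root
    for (int i = (n - 1 - 1) / 2; i >= 0; --i)
    {
        AdjustDown(a, n, i);
    }
}
```

Tracing it by hand on {15,1,19,25,8,34,65,4,27,7}: index 4 swaps 8 with 7, index 3 swaps 25 with 4, indexes 2 and 1 need no change, and the root 15 sinks past 1 and then 4. The array ends as {1,4,19,15,7,34,65,25,27,8}.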

5. Use the upward adjustment method to build a heap

Idea: (take building a small heap from the array {15,1,19,25,8,34,65,4,27,7} as an example)

Take the first element 15 as the top of the heap. Insert the second element 1: because 1 < 15, the small heap property no longer holds after the insertion, so apply the upward adjustment algorithm to 1. Continue in the same way until every element of the array has been put into the heap with the small heap property maintained; the small heap is then built.

Code:

for (int i = 1; i < n; ++i)
{
    // a[0..i-1] is already a small heap; insert a[i] into it
    AdjustUp(a, i);
}
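The same example array can be built with upward adjustment. In this sketch the wrapper name BuildMinHeapUp is hypothetical and the element type is fixed to int.

```c
#include <assert.h>

void Swap(int* x, int* y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

// Upward adjustment for a small heap
void AdjustUp(int* a, int child)
{
    while (child > 0)
    {
        int parent = (child - 1) / 2;
        if (a[child] >= a[parent])
        {
            break;
        }
        Swap(&a[child], &a[parent]);
        child = parent;
    }
}

// Hypothetical wrapper: a[0..i-1] is a small heap before each step,
// and a[i] is inserted into it by adjusting upward
void BuildMinHeapUp(int* a, int n)
{
    assert(a && n >= 0);
    for (int i = 1; i < n; ++i)
    {
        AdjustUp(a, i);
    }
}
```

Tracing it by hand on {15,1,19,25,8,34,65,4,27,7} gives {1,4,19,8,7,34,65,25,27,15}. This is also a valid small heap, although it differs from the one that the downward method produces from the same input: a heap for a given set of values is not unique.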

6. Comparison of heap building time complexity of two algorithms

6.1 Time complexity of building a heap with downward adjustment:

Suppose the height of the tree is h

The number of adjustments = the number of nodes in each layer * the worst-case number of downward moves for a node in that layer

T(N) = 2^0*(h-1) + 2^1*(h-2) + … + 2^(h-3)*2 + 2^(h-2)*1

Using the shifted subtraction method (compute 2*T(N) - T(N)) and N = 2^h - 1 for a full binary tree, we find

T(N) = 2^h - 1 - h = N - log2(N+1)

so building the heap with downward adjustment is O(N).
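For readers who want the intermediate steps, the shifted subtraction can be written out as follows. This is a worked sketch that assumes a full binary tree, so that N = 2^h - 1.

```latex
\begin{aligned}
T(N)  &= 2^{0}(h-1) + 2^{1}(h-2) + \dots + 2^{h-3}\cdot 2 + 2^{h-2}\cdot 1 \\
2T(N) &= 2^{1}(h-1) + 2^{2}(h-2) + \dots + 2^{h-2}\cdot 2 + 2^{h-1}\cdot 1 \\
2T(N) - T(N) &= -(h-1) + \left(2^{1} + 2^{2} + \dots + 2^{h-2}\right) + 2^{h-1} \\
T(N)  &= \left(2^{h} - 2\right) - (h-1) = 2^{h} - 1 - h \\
      &= N - \log_2(N+1) \qquad \text{since } 2^{h} = N + 1
\end{aligned}
```

The linear term N dominates, which is why the downward build runs in O(N).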

6.2 Time complexity of building a heap with upward adjustment:

Suppose the height of the tree is h

The number of adjustments = the number of nodes in each layer * the worst-case number of upward moves for a node in that layer

T(N) = 2^1*1 + 2^2*2 + … + 2^(h-2)*(h-2) + 2^(h-1)*(h-1)

For the exact result, the reader can use the shifted subtraction method again.

A useful fact: for a full binary tree with height h and N nodes, 2^h - 1 = N, so h = log2(N + 1)

Let’s make a rough calculation here, that is, only consider the number of adjustments in the last layer:

2^(h-1)*(h-1)*2/2=2^h*(h-1)/2=(N+1)*(log2(N+1)-1)/2

Then the time complexity is roughly O(N*log2N)
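The exact sum confirms the rough estimate. The derivation below is a worked sketch that again assumes a full binary tree with N = 2^h - 1 nodes.

```latex
\begin{aligned}
T(N)  &= 2^{1}\cdot 1 + 2^{2}\cdot 2 + \dots + 2^{h-1}(h-1) \\
2T(N) &= 2^{2}\cdot 1 + 2^{3}\cdot 2 + \dots + 2^{h}(h-1) \\
2T(N) - T(N) &= 2^{h}(h-1) - 2^{1} - \left(2^{2} + 2^{3} + \dots + 2^{h-1}\right) \\
T(N)  &= 2^{h}(h-1) - 2 - \left(2^{h} - 4\right) = (h-2)\,2^{h} + 2 \\
      &= (N+1)\left(\log_2(N+1) - 2\right) + 2
\end{aligned}
```

The leading term is N*log2(N), in agreement with the estimate taken from the last layer alone.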

The comparison shows that the downward adjustment method is better (O(N) versus O(N*log2N)), so we choose the downward adjustment method to build the heap
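The difference can also be observed by counting swaps. The sketch below is an illustration only: the function names BuildDownCountSwaps and BuildUpCountSwaps are hypothetical, and the adjustment loops are inlined so that each swap can be counted.

```c
#include <assert.h>

void Swap(int* x, int* y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

// Builds a small heap with downward adjustment; returns the number of swaps
long BuildDownCountSwaps(int* a, int n)
{
    assert(a && n >= 0);
    long swaps = 0;
    for (int i = (n - 1 - 1) / 2; i >= 0; --i)
    {
        int parent = i;
        int minChild = parent * 2 + 1;
        while (minChild < n)
        {
            if (minChild + 1 < n && a[minChild + 1] < a[minChild])
            {
                ++minChild;
            }
            if (a[minChild] >= a[parent])
            {
                break;
            }
            Swap(&a[minChild], &a[parent]);
            ++swaps;
            parent = minChild;
            minChild = parent * 2 + 1;
        }
    }
    return swaps;
}

// Builds a small heap with upward adjustment; returns the number of swaps
long BuildUpCountSwaps(int* a, int n)
{
    assert(a && n >= 0);
    long swaps = 0;
    for (int i = 1; i < n; ++i)
    {
        int child = i;
        while (child > 0)
        {
            int parent = (child - 1) / 2;
            if (a[child] >= a[parent])
            {
                break;
            }
            Swap(&a[child], &a[parent]);
            ++swaps;
            child = parent;
        }
    }
    return swaps;
}
```

A strictly descending array is the worst case for the upward build of a small heap, because every inserted element is smaller than everything before it and climbs all the way to the root. For 1023 elements (h = 10) that is 8194 swaps, the exact value of 2^1*1 + 2^2*2 + … + 2^(h-1)*(h-1), while the downward build can never exceed N - log2(N+1) = 1013 swaps.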

7. Heap sort

Main idea: selection sort. Build a heap, then repeatedly select the top element and move it to the end of the unsorted range, filling the array from back to front.

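void HeapSort(int* a, int n)
{
    // Build a small heap with downward adjustment
    for (int i = (n - 1 - 1) / 2; i >= 0; --i)
    {
        AdjustDown(a, n, i);
    }

    // Select the numbers: move the current top to the end of the unsorted
    // range, then re-adjust the remaining n - i elements.
    // With the small heap AdjustDown, the result is in descending order.
    int i = 1;
    while (i < n)
    {
        Swap(&a[0], &a[n - i]);
        AdjustDown(a, n - i, 0);
        ++i;
    }
}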
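Because each selected top element is moved to the back, sorting with a small heap leaves the array in descending order. For ascending order, the usual choice is a large heap. The sketch below assumes a large heap variant of downward adjustment; the names AdjustDownMax and HeapSortAscending are hypothetical.

```c
#include <assert.h>

void Swap(int* x, int* y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

// Downward adjustment for a large heap: the parent sinks while it is
// smaller than its larger child
void AdjustDownMax(int* a, int n, int parent)
{
    int maxChild = parent * 2 + 1;
    while (maxChild < n)
    {
        if (maxChild + 1 < n && a[maxChild + 1] > a[maxChild])
        {
            ++maxChild;
        }
        if (a[maxChild] <= a[parent])
        {
            break;
        }
        Swap(&a[maxChild], &a[parent]);
        parent = maxChild;
        maxChild = parent * 2 + 1;
    }
}

void HeapSortAscending(int* a, int n)
{
    assert(a && n >= 0);
    // Build a large heap in O(N)
    for (int i = (n - 1 - 1) / 2; i >= 0; --i)
    {
        AdjustDownMax(a, n, i);
    }

    // Move the current maximum to index end, then shrink the heap to
    // the first end elements and restore it
    for (int end = n - 1; end > 0; --end)
    {
        Swap(&a[0], &a[end]);
        AdjustDownMax(a, end, 0);
    }
}
```

Each of the N - 1 selections costs one downward adjustment of at most log2(N) steps, so the whole sort is O(N*log2N) and needs no extra array.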

8. TopK problem: select the K largest or K smallest numbers from a large set of data

For the TopK problem we have two methods; let’s talk about the ideas first

Method 1: Heap sort, O(N*logN)

Method 2: Heap selection

(1) Build a large heap? Build a large heap of all N numbers, then select K times (Pop K times). The time complexity of this is O(N + K*logN)

(2) Build a small heap? Suppose N is large and K is small, for example N = 10 billion and K = 100. Then method (1) is impractical, because it has to keep all N numbers in the heap at once, which costs far too much memory.

Instead, we can use the first K numbers to build a small heap of size K, and then traverse the remaining N-K numbers in turn. If a number is larger than the data at the top of the heap, it replaces the top and is adjusted downward into the heap.

At the end, the heap holds the K largest numbers.
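Before the file-based version, the idea can be shown on an in-memory array. This is a minimal sketch; the function name TopKLargest and its parameters are hypothetical, and the caller supplies the buffer of K elements that serves as the small heap.

```c
#include <assert.h>

void Swap(int* x, int* y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

// Downward adjustment for a small heap of n elements
void AdjustDown(int* a, int n, int parent)
{
    int minChild = parent * 2 + 1;
    while (minChild < n)
    {
        if (minChild + 1 < n && a[minChild + 1] < a[minChild])
        {
            ++minChild;
        }
        if (a[minChild] >= a[parent])
        {
            break;
        }
        Swap(&a[minChild], &a[parent]);
        parent = minChild;
        minChild = parent * 2 + 1;
    }
}

// Hypothetical helper: leaves the k largest values of data[0..n-1] in
// minHeap[0..k-1], arranged as a small heap (not sorted)
void TopKLargest(const int* data, int n, int k, int* minHeap)
{
    assert(data && minHeap && k > 0 && k <= n);

    // The first k numbers form the initial small heap
    for (int i = 0; i < k; ++i)
    {
        minHeap[i] = data[i];
    }
    for (int j = (k - 2) / 2; j >= 0; --j)
    {
        AdjustDown(minHeap, k, j);
    }

    // A later number enters only if it beats the smallest of the current top k
    for (int i = k; i < n; ++i)
    {
        if (data[i] > minHeap[0])
        {
            minHeap[0] = data[i];
            AdjustDown(minHeap, k, 0);
        }
    }
}
```

The top of the small heap is always the weakest of the current K candidates, so a single comparison decides whether a new number belongs in the result. The cost is O(K) to build the heap plus at most O(logK) per remaining number, about O(N*logK) in total, with only K numbers held in memory.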

void CreateDataFile(const char* filename, int N)
{
    FILE* fin = fopen(filename, "w");
    if (fin == NULL)
    {
        perror("fopen fail");
        return;
    }
    srand((unsigned)time(0));

    // Write N random numbers, one per line
    for (int i = 0; i < N; ++i)
    {
        fprintf(fin, "%d\n", rand() % 1000000);
    }

    fclose(fin);
}

void PrintTopK(const char* filename, int k)
{
    assert(filename);

    FILE* fout = fopen(filename, "r");
    if (fout == NULL)
    {
        perror("fopen fail");
        return;
    }

    int* minHeap = (int*)malloc(sizeof(int) * k);
    if (minHeap == NULL)
    {
        perror("malloc fail");
        fclose(fout);
        return;
    }

    // Read the first k numbers
    for (int i = 0; i < k; ++i)
    {
        fscanf(fout, "%d", &minHeap[i]);
    }

    // Build a small heap of k elements
    for (int j = (k - 2) / 2; j >= 0; --j)
    {
        AdjustDown(minHeap, k, j);
    }

    // Continue reading the remaining N-K numbers
    int val = 0;
    while (fscanf(fout, "%d", &val) == 1)
    {
        // A number larger than the top replaces it and sinks into the heap
        if (val > minHeap[0])
        {
            minHeap[0] = val;
            AdjustDown(minHeap, k, 0);
        }
    }

    // Print the k largest numbers (in heap order, not sorted)
    for (int i = 0; i < k; ++i)
    {
        printf("%d ", minHeap[i]);
    }

    free(minHeap);
    fclose(fout);
}