[Algorithms and Data Structures] Merge sort: code implementation (with detailed illustrations) and an explanation of the master formula

Table of Contents

1. Merge sort

1.1. Algorithm description

1.2. Illustrations

2. Code implementation

3. Master formula

3.1. Formulas and conclusions

3.2. Applicable to some special recursions

3.3. Calculate the time complexity of merge sort

1. Merge sort

Merge sort is an efficient sorting algorithm based on the merge operation. It is a very typical application of recursion, or Divide and Conquer: already-ordered subsequences are merged to obtain a completely ordered sequence. In other words, first make each subsequence ordered, then merge the ordered subsequences into longer ordered segments.

Merging two ordered lists into one ordered list is called a two-way merge.

1.1. Algorithm description

  • Divide the input sequence of length n into two subsequences of length n/2;
  • Sort each of the two subsequences recursively with merge sort;
  • Merge the two sorted subsequences into the final sorted sequence.

Merging two ordered sequences into one ordered sequence is called “merging”, which is where the name merge sort comes from.

1.2. Illustrations

In a nutshell: to sort the range from L to R, first find the midpoint M of [L, R], sort the data on the left, then sort the data on the right, and finally merge the two ordered subsequences into a new ordered sequence.


For example:

Merge sort an array [8,3,6,4,2,1,5,7].

Step 1: Divide the input sequence of length n into two subsequences of length n/2, then divide each new subsequence into two subsequences of half its length (n/4), and so on. When a subsequence contains only one number, it is naturally ordered, so at that point both the left and right sides are sorted.
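Concretely, for this example the splitting looks like this (using mid = L + (R - L) / 2, as in the code below):

[8,3,6,4,2,1,5,7]
→ [8,3,6,4] and [2,1,5,7]
→ [8,3], [6,4], [2,1], [5,7]
→ [8], [3], [6], [4], [2], [1], [5], [7]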


Step 2: Merge the two sorted subsequences into a new sorted sequence. Each subsequence has a pointer pointing to its first element. The two pointed-to elements are compared; the smaller one is put into the new sequence first, then that pointer is advanced and the comparison continues until all elements have been placed into the new sequence, which completes one merge. Repeating this merge level by level eventually makes all the elements ordered, and the merge sort is complete.
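Continuing the same example, the merges then run bottom-up:

[8] + [3] → [3,8]    [6] + [4] → [4,6]    [2] + [1] → [1,2]    [5] + [7] → [5,7]
[3,8] + [4,6] → [3,4,6,8]    [1,2] + [5,7] → [1,2,5,7]
[3,4,6,8] + [1,2,5,7] → [1,2,3,4,5,6,7,8]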


This idea is the essence of the algorithm. Once you understand it, you can try to implement it in code.

2. Code implementation

To call MergeSort you need the array arr, the leftmost index L, and the rightmost index R of the range to be sorted, so compute them and pass them into MergeSort.

#include <stdio.h>
#include <stdlib.h>

// Forward declarations; the definitions are given below.
void MergeSort(int* arr, int L, int R);
void ExternalSort(int* arr, int L, int M, int R);

int main()
{
    int arr[] = { 8,3,6,4,2,1,5,7 }; // the example array; sized by its initializers, so sz is 8
    int sz = sizeof(arr) / sizeof(arr[0]);
    MergeSort(arr, 0, sz - 1);
    int i = 0;
    for (i = 0; i < sz; i++)
    {
        printf("%d ", arr[i]);
    }
    return 0;
}
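Assuming MergeSort and ExternalSort are defined as shown below, compiling and running this main should print the sorted array:

1 2 3 4 5 6 7 8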

Next, let’s look at the implementation of MergeSort.

  1. First, check whether L equals R. If they are equal, the subsequence contains only one element and is therefore already ordered, so the function can return directly without doing anything, i.e. if (L == R) return;
  2. If L is not equal to R, the subsequence can still be split, so compute mid and make two recursive calls to sort the two ranges: L to mid, and mid + 1 to R.
  3. Finally, when both recursive calls return, the ranges L to mid and mid + 1 to R are each ordered, but the whole range is not yet ordered. At this point ExternalSort is used to merge the two ordered subsequences into a new ordered sequence.
void MergeSort(int* arr, int L, int R)
{
    if (L == R) // the subsequence has only one number, so it is already ordered
    {
        return;
    }
    int mid = L + (R - L) / 2;
    MergeSort(arr, L, mid);       // sort the left half
    MergeSort(arr, mid + 1, R);   // sort the right half
    ExternalSort(arr, L, mid, R); // merge the two ordered halves
}
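A small side note on the code above: mid is computed as L + (R - L) / 2 rather than (L + R) / 2. The two are mathematically equivalent, but the first form avoids the integer overflow that (L + R) could cause when L and R are both very large.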

The job of ExternalSort is to merge the ranges L to M and M + 1 to R of arr into a new ordered sequence. The merged result is first stored in the area pointed to by the help pointer; once all the merging is done, the data in the help area is copied back to the corresponding positions in arr.

The rules for filling help are:

1. While neither p1 nor p2 has crossed its boundary, run a loop:

Compare the elements at p1 and p2. If the element at p1 is greater than the element at p2, put the element pointed to by p2 into help, move p2 one position to the right, and go to the next round of comparison.

If the element at p1 is less than or equal to the element at p2, put the element pointed to by p1 into help, move p1 one position to the right, and go to the next round of comparison.

2. As soon as one side crosses its boundary, exit the loop and check whether p1 or p2 still has elements that have not yet been copied into help; if so, append them directly to help.


void ExternalSort(int* arr, int L, int M, int R)
{
    int* help = (int*)malloc(sizeof(int) * (R - L + 1)); // auxiliary space for the merged data, of size R - L + 1
    if (help == NULL)
    {
        perror("ExternalSort->malloc");
        return;
    }
    int helpSz = R - L + 1;
    int i = 0;
    int p1 = L;
    int p2 = M + 1;
    while (p1 <= M && p2 <= R)
    {
        // If the value at p1 is less than or equal to the value at p2, put the value at p1 into help and advance p1; otherwise do the same with p2.
        help[i++] = arr[p1] <= arr[p2] ? arr[p1++] : arr[p2++];
    }
    while (p1 <= M) // if p1 has not crossed its boundary, copy all its remaining elements to the end of help
    {
        help[i++] = arr[p1++];
    }
    while (p2 <= R) // if p2 has not crossed its boundary, copy all its remaining elements to the end of help
    {
        help[i++] = arr[p2++];
    }
    for (i = 0; i < helpSz; i++)
    {
        arr[L + i] = help[i]; // copy the merged data back to the corresponding positions of the original array
    }
    free(help);
    help = NULL;
}
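One detail worth pointing out: in this version the help buffer is allocated with malloc and freed on every ExternalSort call. A common variant (not shown here) allocates a single buffer of N ints once, before the first recursive call, and reuses it for every merge, which avoids repeated allocations; the sorting logic itself stays the same.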

This concludes the code implementation of merge sort. Because recursion is used, the amount of code is not large; the hardest part is understanding the idea behind merge sort, so make sure you understand its steps and the reasoning behind them.

3. Master formula

Having implemented merge sort, the next question is its time complexity. How should it be calculated? Some people will say you can simply look it up on Baidu, and that is indeed fine; but in the spirit of “giving a man a fish is not as good as teaching him to fish”, I want to take you a bit deeper so that you can learn to compute the time complexity of a recursion on your own.

The tool used for this calculation is the master formula. When an algorithm involves recursion, working out its complexity directly becomes a bit troublesome; the master formula is used to analyze recursive behavior and estimate the time complexity of that recursion.

3.1. Formulas and conclusions

  • Master formula: T(N) = a * T(N/b) + O(N^d)

  • Formula explanation: N is the size of the parent problem, and N/b is the size of each sub-problem. The sub-problems must all have the same size, namely N/b. a is the number of recursive calls, i.e. how many times the sub-problem is invoked inside the parent problem. O(N^d) is the complexity of all the work other than the recursive calls.

  • Conclusions (the proof is too complicated; just remember the conclusions):
  1. When a, b, and d in the formula satisfy d < logb a, the time complexity is O(N^(logb a))
  2. When a, b, and d in the formula satisfy d = logb a, the time complexity is O((N^d)*logN)
  3. When a, b, and d in the formula satisfy d > logb a, the time complexity is O(N^d)
  • Note: the master formula only applies to certain special recursions, that is, the sub-problems must be of equal size. No matter how many parts the problem is divided into, and even if the divided regions overlap, the formula can be used as long as the region sizes are the same.

[Example]

The following code uses recursion to find the maximum value in an array, using a bisection approach. The sub-problems are all of the same size, so the master formula can be used to calculate the time complexity.

  1. First, process calls itself twice, that is, there are two sub-problem calls inside the parent problem, so a = 2.
  2. Since the range is bisected, each sub-problem is half the size of the parent problem, that is, N/2, so b = 2.
  3. All other operations are executed only a constant number of times, so their time complexity is O(1), that is, d = 0.
  4. Therefore the master formula for this recursion is T(N) = 2 * T(N/2) + O(1).

That is, a = 2, b = 2, d = 0. Substituting into the conclusions gives d < logb a (0 < log2 2 = 1), so the time complexity is O(N^(logb a)) = O(N^(log2 2)) = O(N).

#include <stdio.h>

int process(int* arr, int L, int R)
{
    if (L == R)
        return arr[L];
    int mid = L + (R - L) / 2;
    int leftMAX = process(arr, L, mid);      // maximum of the left half
    int rightMAX = process(arr, mid + 1, R); // maximum of the right half
    return leftMAX > rightMAX ? leftMAX : rightMAX;
}

int main()
{
    int arr[] = { 8,3,6,4,2,1,5,7 };
    int sz = sizeof(arr) / sizeof(arr[0]);
    int max = process(arr, 0, sz - 1);
    printf("%d\n", max);
    return 0;
}
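As an extra check on the conclusions (my own example, not part of the original illustration): binary search fits the pattern T(N) = 1 * T(N/2) + O(1), so a = 1, b = 2, d = 0. Here d = logb a (0 = log2 1), which is the second case, giving O((N^0)*logN) = O(logN), the familiar complexity of binary search.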

3.2. Applicable to some special recursions

As mentioned above, the master formula only applies to certain special recursions: the sub-problems must be of equal size. No matter how many parts the problem is divided into, and even if the divided regions overlap, the formula can be used as long as the region sizes are the same.

However, some people may still misunderstand what this means, so let’s explain it with illustrations.

[Illustration]

First of all, the equally divided case is the easiest to understand: N is divided into several equal parts, two parts of N/2 each, three parts of N/3 each, and so on.

What if each of the two sub-problems covers two-thirds of the range, one taking the left two-thirds and the other the right two-thirds? Does this comply with the master formula?

The answer is yes, because we only care about whether the region sizes are equal; whether the regions overlap does not matter.
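For instance (my own numbers, just to show the form): if a recursion makes two calls, each on a region of size 2N/3, and does O(1) extra work, then T(N) = 2 * T(2N/3) + O(1), i.e. a = 2, b = 3/2, d = 0. Since logb a = log1.5 2 ≈ 1.71 > 0 = d, the master formula still applies and gives O(N^(log1.5 2)), roughly O(N^1.71).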

If, on the other hand, the sub-problems have different sizes, the recursion does not conform to the master formula.
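A typical example of a recursion the master formula cannot handle (again my own example, not from the original): T(N) = T(N/3) + T(2N/3) + O(N). The two sub-problems have different sizes, N/3 and 2N/3, so no single b fits the formula; a recursion tree or another method is needed instead. (For the curious, this particular recurrence can be shown by a recursion tree to still be O(N*logN).)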

3.3. Calculate the time complexity of merge sort

After learning the master formula, let’s calculate the time complexity of Merge Sort.

void MergeSort(int* arr, int L, int R)
{
    if (L == R) // the subsequence has only one number, so it is already ordered
    {
        return;
    }
    int mid = L + (R - L) / 2;
    MergeSort(arr, L, mid);       // sort the left half
    MergeSort(arr, mid + 1, R);   // sort the right half
    ExternalSort(arr, L, mid, R); // merge the two ordered halves
}

Assume the whole problem works on data of size N. Each of the two sub-problems is then T(N/2), so together they contribute 2 * T(N/2).

Now look at the statements other than the sub-problem calls: the if statement is O(1), and in ExternalSort the two pointers only move forward to the right, never backward, so all the data is processed exactly once; since the amount of data is N, the time complexity of ExternalSort is O(N).

That is: T(N) = 2 * T(N/2) + O(N) where a = 2, b = 2, d = 1

Substituting a, b, and d into the conclusions gives d = logb a (1 = log2 2), so the time complexity is O((N^d)*logN) = O(N*logN).
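For readers who would rather not memorize the conclusion, the same result can be sketched without the master formula: every level of the recursion merges a total of N elements, so each level costs O(N), and since the range is halved each time there are about log2 N levels, giving O(N) * O(logN) = O(N*logN) overall.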

If you think the author’s writing is good, please give the blogger a big thumbs up to support me. Your support is my biggest motivation for updating!

