[Data Structure] Recursive and non-recursive implementation of merge sort

Merge sort

Preface
1. Recursive implementation of merge sort
- (1) The core idea of merge sorting
- Merge sort run legend
- (2) Core steps of merge sort implementation
- (3) Detailed explanation of merge sorting code source
- (4) Merge sort efficiency analysis
- - 1) Time complexity O(N*logN)
  - 2) Space complexity O(N)
  - Stability: Very stable
2. Non-recursive implementation of merge sort
- (1) Discussion on the shortcomings of recursion
(2) Merge sort non-recursive algorithm implementation ideas
(3) Detailed explanation of code source
(4) Operation results

Foreword

Quick sort: front order
Merge Sort: Postorder

1. Recursive implementation of merge sort

(1) The core idea of merge sort

Merge the ordered subsequences to obtain a completely ordered sequence; that is, first make each subsequence sorted, and then make the subsequence segments sorted. If two ordered lists are merged into one ordered list, it is called a two-way merge.

Merge sort run legend

(2) Core steps of merge sort implementation

Downward recursion Split in half .
Recursive return conditions: Recurse to the minimum, 1 is in order [Control the range and merge interval]
Recurse to the deepest point, and when it is the smallest, recurse back, start dividing the separated range in half, merge the ordered subsequences, and merge them in tmp.
Then copy the ordered sequence formed in tmp back to the original array [Because the next time you recurse back to the previous level and perform the next merge process, the data will be merged in tmp, and the data will be merged in tmp. The data in tmp is overwritten, so a small part of the merged and sorted subsequences must be copied back to the original array in time]
Then recursively return to the recursive merge of the previous level until the number of recursive levels is completed.

(3) Detailed explanation of merge sort code source

void _MergeSort(DataType* a, DataType* tmp, int begin, int end) {<!-- -->
if (begin>=end) {<!-- --> //Condition for recursive return: abnormal range or only 1 number left
return;
}
\t
int mid = (begin + end) / 2;

//Recurse to the minimum first
//[begin,mid][mid + 1,end]
_MergeSort(a, tmp, begin, mid); //The array starts from 0, so end=mid-1 is designed like this
_MergeSort(a, tmp, mid + 1, end);

//Merge again - so the merging process is designed after the recursion (post-order)
//Merge into tmp array and copy back
int begin1 = begin; int end1 = mid;
int begin2 = mid + 1; int end2 = end;
int index = begin; //Point to tmp,=begin is based on the position of the array to be compared and inserted. Find its corresponding position in tmp, and the previously sorted data will not be overwritten.

//Prototype: merge two sets of numbers and order them
while (begin1 <= end1 & amp; & amp; begin2 <= end2) {<!-- --> // & amp; & amp;If one group meets the conditions, it will no longer continue and jump out of the loop.
if (a[begin1] < a[begin2]) {<!-- -->
tmp[index + + ] = a[begin1 + + ];
}
else {<!-- -->
tmp[index + + ] = a[begin2 + + ];
}
}

while (begin1 <= end1) {<!-- --> //Insert the remaining items that have not been inserted into tmp
tmp[index + + ] = a[begin1 + + ];
}
while (begin2 <= end2) {<!-- --> //Insert the remaining items that have not been inserted into tmp
tmp[index + + ] = a[begin2 + + ];
}

        //Copy back to original array
//source dest copied array size
memcpy(a + begin,tmp + begin,sizeof(DataType)*(end-begin + 1));
}



void MergeSort(DataType* a, int n) {<!-- -->
DataType* tmp = (DataType*)malloc(sizeof(DataType) * n); //Open a new array (temporary storage) for the merge sort process
if (tmp == NULL) {<!-- -->
perror("malloc fail");
return;
}

//Pass the array to be sorted, the tmp temporary array used in the merging process, and the merging range.
_MergeSort(a, tmp, 0, n - 1); //Because there is malloc tmp operation in the main function, if the main function is called recursively, malloc will be used every time it is called, and then free is a waste of space.
//So use sub-functions for recursion [_sub-function]
free(tmp);
}

(4) Merge sort efficiency analysis

1) Time complexity O(N*logN)

Bipartite has a logN layer. Precisely because it is a logN layer, the recursion depth will not be too deep, so there is no need to consider non-recursion. Of course, non-recursion can also be implemented.
Each layer has N numbers for merge sorting

=>N*logN

2) Space complexity O(N)

The space complexity is a bit high (if there are 1kw of data, 1kw of space must be opened)
A new array with space size N is opened for the merging and sorting process.

What problems will occur when merging on the original array:

Will overwrite data
After the smallest 1 is moved to the position of 8, 8 and 7 no longer remain in order.

Stability: Very stable

2. Non-recursive implementation of merge sort

Merge sort is a binary idea => logN level => The recursion will not be too deep, and after the compiler is optimized, the performance gap between recursion and non-recursion is not that big => So there is no need to consider non-recursion, but the non-recursive implementation does not Disaster. Let’s walk you through how to implement it simply.

(1) Discussion on the shortcomings of recursion

Disadvantages of recursion: Recursion consumes stack frames. If the recursion level is too deep, it is easy to explode the stack.
[The stack space is relatively small, only 8M in x86 (32-bit) environment. (Compared to the heap in the same environment, there is 2G +). Because usually function calls cannot open many stack frames. Theoretically, it may explode if the recursion depth is >1w, but in fact, it will explode if the recursion depth is about 5k]

At this time, you need to change the non-recursive

Recursive->Non-recursive:

Change cycle
Utilize [data structure] stack (essentially the memory space opened on the heap through malloc, the memory space is large enough)
Recursively and work backwards (for example, it is also feasible to work backwards for the Fibonacci sequence) [The non-recursive implementation of merge sort is also a good example]
The non-recursive implementation of merge sort uses the third point.

(2) Merge sort non-recursive algorithm implementation ideas

Although it is not recursive, it is the reverse version of recursion. It is directly starting from the deepest level, going back in reverse order, and starting the merge sort directly, eliminating the need to go deeper into the recursion. Although it is not recursive, it can be regarded as another implementation of the recursive idea.

Create a new array (temporary storage) for the merge sort process
int gap=1; gap*=2 [gap controls the scope of merging: 11 merges, 22 merges, 44 merges]
for (int i = 0; i < n; i + = 2 * gap) { [i controls the group number for comparison and controls the group number for merging]
After one round of merging, copy the merged ordered array back to the original array using memcpy.
Enter a new round of merging, until gap>n, the merging is completed

☆Two things to note
6. if (begin2 >= n) { break; } The second group does not exist, and this group does not need to be merged.
7. if (end2 > n) { end2 = n – 1; } The second group’s right boundary is out of bounds, please correct it

The difference between placing memcpy() inside a for() loop [copying one group after merging it together] and putting it outside the for() loop [copying it back together after merging one round] and the situations and problems that may arise

(3) Detailed explanation of code source

//Merge sort - non-recursive version: start from the bottom and work backwards (like Fibonacci)
void MergeSortNonR(DataType* a,int n) {<!-- -->
DataType* tmp = (DataType*)malloc(sizeof(DataType) * n); //Open a new array (temporary storage) for the merge sort process
if (tmp == NULL) {<!-- -->
perror("malloc fail");
return;
}
\t
int gap = 1;
while (gap < n) {<!-- --> //gap control 11 merge, 22 merge, 44 merge
//i controls the group number for comparison and controls the group number for merging
for (int i = 0; i < n; i + = 2 * gap) {<!-- --> //You can draw the array and demonstrate it on the scratch paper to clarify the numbers to be compared: begin1, end1, The quantitative relationship between begin2 and end2 and i and gap
//[begin1,end1][begin2,end2] merge
int begin1 = i; int end1 = i + gap - 1;
int begin2 = i + gap; int end2 = i + 2 * gap - 1;
\t\t\t

//If the second group does not exist, this group does not need to be merged
if (begin2 >= n) {<!-- -->
break;
}

//The second group's right boundary cross-border problem, correct it
if (end2 > n) {<!-- -->
end2 = n - 1;
}

int index = i;
while (begin1 <= end1 & amp; & amp; begin2 <= end2) {<!-- --> // & amp; & amp; One of the groups does not meet the conditions (one of the arrays has been traversed). If you continue, you will break out of the loop.
if (a[begin1] < a[begin2]) {<!-- --> //Compare the two arrays and put the smaller one in
tmp[index + + ] = a[begin1 + + ];
}
else
{<!-- -->
tmp[index + + ] = a[begin2 + + ];
}
}

while (begin1 <= end1) {<!-- --> //Traverse the remaining parts of the array that have not been traversed into
tmp[index + + ] = a[begin1 + + ];
}
while (begin2 <= end2) {<!-- -->
tmp[index + + ] = a[begin2 + + ];
}

//Copy back to original array
memcpy(a + i, tmp + i,(end2-i + 1)*sizeof(DataType)); //Put it in the for() loop, merge a group, and copy it back
//Put it outside, after one round of merging according to the gap, and then copy it back together.

//Error (2*gap) * sizeof(DataType): Not all 2*gap numbers must be copied. As gap*=2 increases, 2*gap may exceed the range of the original array.
//memcpy(a + i, tmp + i, (2*gap) * sizeof(DataType));
//Error begin1 will change (begin1 + +), and i represents the position of the array head
//memcpy(a + i, tmp + i, (end2 - begin1 + 1) * sizeof(DataType));
}

printf("\
");

gap *= 2; //gap controls overall merging
}
free(tmp);
}