Find the longest repeating substring in a string—-Finding feature vector N method

Article directory

    • topic
    • Question analysis
    • Implementation details
    • Functions and their functions
    • Use parameters and specific functions
      • getmax_N(string P,int* N):
      • Next(string P)
      • main()
    • Program running results
  • Supplement (code explanation of feature vector N)

Title

Write an algorithm to return the longest repeating substring and its subscript position for a given string str.

eg: str=”abcdacdac”, then the substring “cdac” is the longest repeated substring of str, and the subscript is 2

Question analysis

1. Step 1: Traverse the string to obtain the feature vectors N of all substrings
2. Step 2: Find the maximum eigenvector NN_max of all substrings
3. Step 3: Return the subscript of the longest repeated string

Implementation details

  • Traverse each character in the string, and obtain the characteristic value N corresponding to each character based on the matching method of the prefix substring and the left substring;

  • Compare the characteristic values corresponding to all strings obtained in a, and save the maximum value in the array N_Store;

  • Remove the first character of the string and repeat a and b again. And immediately compare the sizes of the largest feature vectors stored in the N_Store array, save the larger one in the variable NN_max, and use index_NN_max to record the subscript of the corresponding element in the N_Store array at this time;




  • If the length of the remaining string after deletion is smaller than NN_max, the deletion operation will no longer be performed, otherwise the deletion will continue, that is, the c operation will be repeated;
  • Finally, index_NN_max is the starting position of the longest repeated substring, and NN_max is the length of the longest repeated substring;

Functions and their functions

getmax_N(string P,int* N): Find the largest eigenvalue in the eigenvector array
Next(string P)

  1. Find the feature vector corresponding to each character of the string P;
  2. Find the maximum value of all eigenvectors in 1 and store the value in the array N_Store;
  3. If this operation has been performed multiple times, then compare the maximum value of the largest feature vector obtained by each operation to obtain the length and starting subscript of the longest repeated substring. After each recursion, update the longest substring. The string length and corresponding starting index will not be updated until the target value is found;

main(): Returns the longest repeated substring and its corresponding starting position;

Use parameters and specific functions

#define SIZE 1000
int N_Store[SIZE] = {<!-- -->0}; // Used to store the maximum value of the string feature vector obtained after each deletion operation
int NN_max = N_Store[0]; // Used to store the maximum value among all maximum values, that is, the length of the longest repeated substring
static int countn = 0; // Used to store the valid value length of the N_Store array
int index_NN_max = 0; // Store the starting index of the longest repeated substring

getmax_N(string P,int* N):

/*
Find the largest eigenvalue in the eigenvector array
P: string corresponding to the feature vector array
N: feature vector array
maxN: the maximum value in this feature vector array
*/
int getmax_N(string P,int* N) {<!-- -->
    int maxN = N[0];
    int n = 0;
    while (n<P.length()) // Traverse the array N (because N stores the eigenvalue vector array of this string P, so P.length is used directly), and take the maximum value
    {<!-- -->
        maxN = max(N[n + + ], maxN);
    }
    return maxN;
}

Next(string P)

/*
Feature vector update function
P: the string to be judged;
m: string length;
N[]: array of feature vectors corresponding to strings;
Get the job done:
1. Find the feature vector corresponding to each character of the string P
2. Find the maximum value of all eigenvectors in 1, and store this value in the array N_Store
3. If this operation has been performed multiple times, compare the maximum value of the largest eigenvector obtained each time from these operations.
*/
int* Next(string P) {<!-- -->
    int m = P.length();
    assert(m > 0);
    int* N = new int[m];
    assert(N != 0);
    N[0] = 0; //Set the first value of the feature vector to 0
    for (int i = 1; i < m; + + i) {<!-- --> // Traverse the string
        int k = N[i - 1]; // Find the previous eigenvalue N[i-1] at position i
        while (k > 0 & amp; & amp; P[k] != P[i]) // Drawing representation
        {<!-- -->
            k = N[k - 1];
        }
        N[i] = (P[i] == P[k]) ? k + 1 : 0; // shown in the figure
    }
    int maxN = getmax_N(P,N); // Call the function to find the maximum value N of the feature vector of this P string
    cout << "The largest eigenvalue in the eigenvector array is: " << endl;
    cout << maxN << endl;
    N_Store[countn] = maxN; // Put the maximum value of the feature vector of this P string into the N_Store array
    int flag_maxN = NN_max;
    if (countn >= 1) {<!-- -->
        NN_max = max(NN_max, N_Store[countn-1]); // Find the largest feature vector in the current N_Store array (the largest is the largest)
        if (NN_max != flag_maxN) {<!-- --> // When the maximum eigenvalue is updated, the corresponding subscript is also updated
            index_NN_max = countn - 1; // Record the starting index of the longest repeated substring
        }
    }
    countn + + ; // countn + 1, move the N_Store array backward
    if (P.length() > NN_max & amp; & amp; P.length() > 1) {<!-- --> // Judgment criteria for deleting the first element in the string, if the current string If the length is already smaller than the current maximum eigenvector value, there is no need to delete it.
        for (int t = 0; t < m - 1; t + + ) {<!-- -->
            P[t] = P[t + 1]; // The last element has not been deleted at this time. We are worried about crossing the boundary, so we set m-1
        }
        if (!P.empty()) {<!-- -->
            P.pop_back(); // Delete the element at the last position of the string
        }
        Next(P); // Recursive operation, delete the string after the first element, calculate the eigenvalues again, find the maximum value of all eigenvalues, etc.
    }

    return N;
}

main()

int main() {<!-- -->
    static string str;
    str = "abcdacdac";
    string str_Store = str; // This str_Store string is used to store the initial str string that has not been deleted.
    Next(str); // Perform the work of finding the feature vectors of strings and their substrings. The obtained N_Store array includes the largest feature vector in the feature vector array, as well as the starting subscript index_NN_max and length NN_max of the longest repeated substring.
    printf("After deleting %d characters, the maximum repeated substring length obtained is: %d\\
", index_NN_max, NN_max);
    for (int out_max = index_NN_max; out_max < index_NN_max + NN_max; out_max + + ) {<!-- -->
        cout << str_Store[out_max] << "";
    }
}

Program running results

Supplement (code explanation of feature vector N)