HDU 1686: Oulipo ← KMP algorithm (counting overlapping matches)

[Problem source]
http://acm.hdu.edu.cn/showproblem.php?pid=1686
http://poj.org/problem?id=3461

[Problem description]
The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter ‘e’. He was a member of the Oulipo group. A quote from the book:
Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit: la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive ‘T’s is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {‘A’, ‘B’, ‘C’, …, ‘Z’} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
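
To make the definition of "occurrences may overlap" concrete, here is a minimal brute-force sketch. It is not the intended solution (its worst case is O(|W|·|T|), too slow for the limits of this problem), and the function name countNaive is only an illustrative choice; it is meant as a reference for checking a faster solution on small inputs.

```cpp
#include <cstddef>
#include <string>

// Reference counter: repeatedly searches for the word with std::string::find.
// Worst case O(|W| * |T|), so use it only to check small inputs.
// Assumes the word is non-empty, as the problem guarantees.
int countNaive(const std::string& word, const std::string& text) {
    int count = 0;
    std::size_t pos = text.find(word);
    while (pos != std::string::npos) {
        ++count;
        // Resume one character later (not pos + word.size()),
        // so that overlapping occurrences are counted too.
        pos = text.find(word, pos + 1);
    }
    return count;
}
```

On the sample data, countNaive("AZA", "AZAZAZA") finds the word at positions 0, 2 and 4, which gives the expected answer 3.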

[Input format]
The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:
One line with the word W, a string over {‘A’, ‘B’, ‘C’, …, ‘Z’}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {‘A’, ‘B’, ‘C’, …, ‘Z’}, with |W| ≤ |T| ≤ 1,000,000.

[Output format]
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

[Input sample]
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

[Output sample]
1
3
0

[Algorithm Analysis]
1. By convention, a KMP function takes the main string (the text) as its first parameter and the pattern string as its second. The input of this problem, however, gives the pattern (the word W) first and the main string (the text T) second. So pay attention to the argument order when calling the KMP function defined here, and build the next table from the pattern, not from the main string.
2. This problem counts the occurrences of the pattern in the main string, and occurrences are allowed to overlap.
Therefore, in the KMP function, the reset after a full match (if(j==lent)) changes from j=0, which counts non-overlapping occurrences, to j=ne[j], which counts overlapping ones. For example:
“HDU 2087: Cutting Floral Cloth Strips” counts the occurrences of the pattern in the main string without overlap. (if(j==lent), j=0)
“HDU 1686: Oulipo” counts the occurrences of the pattern in the main string with overlap. (if(j==lent), j=ne[j])
3. In the author's tests, the string version of the code below was not accepted on HDU, so you can change the strings into character arrays and then submit to HDU. Note that the length of a character array x is computed with strlen(x), while the length of a string x is computed with x.length().
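
The difference between the two resets in point 2 can be sketched with a single function and a flag. This is only an illustration under assumptions: the names buildNext and countMatches and the allowOverlap parameter are hypothetical and do not appear in the listings of this post, and the next table is returned as a std::vector instead of being stored in a global array.

```cpp
#include <string>
#include <vector>

// Builds the -1-based next table of the pattern (size |pattern| + 1).
// ne[k] is the length of the longest proper border of pattern[0..k-1].
std::vector<int> buildNext(const std::string& pattern) {
    const int len = static_cast<int>(pattern.size());
    std::vector<int> ne(len + 1);
    int i = 0, j = -1;
    ne[0] = -1;
    while (i < len) {
        if (j == -1 || pattern[i] == pattern[j]) {
            ++i;
            ++j;
            ne[i] = j;
        } else {
            j = ne[j];
        }
    }
    return ne;
}

// Main string first, pattern second (the conventional order).
// allowOverlap == true  -> after a full match, j = ne[j] (HDU 1686 style)
// allowOverlap == false -> after a full match, j = 0     (HDU 2087 style)
int countMatches(const std::string& text, const std::string& pattern,
                 bool allowOverlap) {
    const std::vector<int> ne = buildNext(pattern);
    const int lens = static_cast<int>(text.size());
    const int lent = static_cast<int>(pattern.size());
    int i = 0, j = 0, cnt = 0;
    while (i < lens) {
        if (j == -1 || text[i] == pattern[j]) {
            ++i;
            ++j;
        } else {
            j = ne[j];
        }
        if (j == lent) {
            ++cnt;
            j = allowOverlap ? ne[j] : 0;
        }
    }
    return cnt;
}
```

For the pattern AZA the table is {-1, 0, 0, 1}. With the text AZAZAZA, countMatches returns 3 when allowOverlap is true, because after each match j falls back to ne[3] = 1 and the trailing A is reused as the start of the next match. It returns 2 when allowOverlap is false, because matching restarts from j = 0.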

[Algorithm code: string version]
This string version of the code produces the expected output on the sample, but in the author's tests it was not accepted on HDU.

#include <bits/stdc++.h>
using namespace std;

const int maxn=1e4 + 5;
int ne[maxn];

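// Builds the next table ne[0..len] of the string t (pass the pattern here).
void getNext(string t) {
    int len=t.length();
    int i=0, j=-1;
    ne[0]=-1;   // sentinel: no fallback left
    while(i<len) {
        if(j==-1 || t[i]==t[j]) {
            i++;
            j++;
            ne[i]=j;
        } else j=ne[j];
    }
}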

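// S is the main string, T is the pattern; ne must hold the next table of T.
int KMP(string S,string T) {
    int lens=S.length();
    int lent=T.length();
    int i=0;
    int j=0;
    int cnt=0;
    while(i<lens && j<lent) {
        if(j==-1 || S[i]==T[j]) {
            i++;
            j++;
        } else j=ne[j];   // mismatch: fall back along the next table
        if(j==lent) {     // full match ending at S[i-1]
            cnt++;
            j=ne[j]; // overlapping count; use j=0 for a non-overlapping count
        }
    }
    return cnt;
}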

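int main() {
    int T;
    cin>>T;
    while(T--) {
        string s,t;
        cin>>s>>t;              // s is the word (pattern), t is the text (main string)
        getNext(s);             // the next table must be built from the pattern
        cout<<KMP(t,s)<<endl;   // main string first, pattern second
    }
    return 0;
}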


/*
input:
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

output:
1
3
0
*/

[Algorithm code: character array version]
In the author's tests, this character array version is accepted on HDU. It differs only slightly from the string version.

#include <iostream>
#include <cstring>
using namespace std;

const int maxn=1e6 + 5;
char s[maxn];
char t[maxn];
int ne[maxn];

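// Builds the next table ne[0..len] of the string t (pass the pattern here).
void getNext(char t[]) {
    int len=strlen(t);
    int i=0, j=-1;
    ne[0]=-1;   // sentinel: no fallback left
    while(i<len) {
        if(j==-1 || t[i]==t[j]) {
            i++;
            j++;
            ne[i]=j;
        } else j=ne[j];
    }
}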

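// S is the main string, T is the pattern; ne must hold the next table of T.
int KMP(char S[],char T[]) {
    int lens=strlen(S);
    int lent=strlen(T);
    int i=0;
    int j=0;
    int cnt=0;
    while(i<lens && j<lent) {
        if(j==-1 || S[i]==T[j]) {
            i++;
            j++;
        } else j=ne[j];   // mismatch: fall back along the next table
        if(j==lent) {     // full match ending at S[i-1]
            cnt++;
            j=ne[j]; // overlapping count; use j=0 for a non-overlapping count
        }
    }
    return cnt;
}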

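int main() {
    int T;
    cin>>T;
    while(T--) {
        cin>>s>>t;              // s is the word (pattern), t is the text (main string)
        getNext(s);             // the next table must be built from the pattern
        cout<<KMP(t,s)<<endl;   // main string first, pattern second
    }
    return 0;
}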


/*
input:
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

output:
1
3
0
*/
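
The two listings differ mainly in how the length is obtained. As a small illustration of that point (the wrapper names lengthOfArray and lengthOfString are hypothetical and exist only for this sketch):

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Character array: std::strlen scans until the terminating '\0',
// so each call costs time proportional to the length.
std::size_t lengthOfArray(const char x[]) {
    return std::strlen(x);
}

// std::string: the length is stored in the object, so length() is O(1).
std::size_t lengthOfString(const std::string& x) {
    return x.length();
}
```

Because strlen rescans the array on every call, it is worth storing the result in a variable once, as both getNext and KMP above do with len, lens and lent, rather than calling it inside the loop condition.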

[References]
https://blog.csdn.net/hnjzsyjyj/article/details/127112363
https://blog.csdn.net/hnjzsyjyj/article/details/127140892
https://blog.csdn.net/Ezereal/article/details/50998678
