Arbitrary modulus NTT (decomposition factor FFT)

Prerequisite knowledge:

f

f

T

FFT

FFT

Introduction

General NTT modulus

P

P

P must be satisfied

P

=

r

x

2

k

+

1

P=r\times 2^k + 1

P=r×2k+1. But if this condition is not met, NTT cannot be used directly, and any modulus NTT needs to be used.

You can learn about the three-modulus NTT.

Let’s introduce a kind of arbitrary modulus NTT – dismantling coefficient FFT.

Coefficient FFT

For polynomial

f

(

x

)

f(x)

Coefficient of f(x)

A

A

A, you can

A

A

A is divided into high

A

1

A_1

A1? and low

A

0

A_0

A0?, such as

A

=

A

1

x

2

15

+

A

0

A=A_1\times 2^{15} + A_0

A=A1?×215+A0?. Then, you can put

A

1

A_1

A1? as a coefficient to construct a polynomial

f

1

(

x

)

f_1(x)

f1?(x), will

A

0

A_0

A0? as a coefficient to construct a polynomial

f

0

(

x

)

f_0(x)

f0?(x). Similarly, the polynomial

g

(

x

)

g(x)

Coefficient of g(x)

B

B

B is divided into high

B

1

B_1

B1? and low

B

0

B_0

B0?, get

g

1

(

x

)

g_1(x)

g1?(x) and

g

0

(

x

)

g_0(x)

g0?(x).

So,

f

(

x

)

g

(

x

)

=

f

1

(

x

)

g

1

(

x

)

x

2

30

+

(

f

1

(

x

)

g

0

(

x

)

+

f

0

(

x

)

g

1

(

x

)

)

x

2

15

+

f

0

(

x

)

g

0

(

x

)

f(x)g(x)=f_1(x)g_1(x)\times 2^{30} + (f_1(x)g_0(x) + f_0(x)g_1(x))\times 2^{15} + f_0(x)g_0(x)

f(x)g(x)=f1?(x)g1?(x)×230 + (f1?(x)g0?(x) + f0?(x)g1?(x))×215 + f0?(x)g0?(x)

use

f

f

T

FFT

FFT is used for processing, and the forward transformation needs four times. Originally, the inverse transformation also needs four times, but because

f

1

(

x

)

g

0

(

x

)

f_1(x)g_0(x)

f1?(x)g0?(x) and

f

0

(

x

)

g

1

(

x

)

f_0(x)g_1(x)

f0?(x)g1?(x) are of the same order, and can be added and then inversely transformed, so only three inverse transformations are required.

After obtaining the three parts, when multiplying the order of each part,

Throughout the process, a total of

7

7

7 times

D.

f

T

DFT

DFT or

I

D.

f

T

IDFT

IDFT.

The time complexity is

o

(

no

log

?

no

)

O(n\log n)

O(nlogn).

Compared with the three-modulus NTT, the factorized FFT is easy to understand, easy to implement, and runs fast, but sometimes the accuracy may be problematic, and if the coefficients in the resulting polynomial are large and the long double cannot be stored, it cannot be used. Be careful when doing the questions.

Example

Luogu P4245 [Template] Arbitrary modulus polynomial multiplication

#include <bits/stdc++.h>
using namespace std;
const int N=500000;
const long double pi=acos(-1.0);
int n,m,l=1;
long long p,f[N+5],g[N+5],w1[N+5],w2[N+5],w3[N+5],ans[N+5];
struct cp{<!-- -->
long double a, b;
cp operator + (const cp ax)const{<!-- -->
return (cp){<!-- -->a + ax.a,b + ax.b};
}
cp operator -(const cp ax)const{<!-- -->
return (cp){<!-- -->a-ax.a,b-ax.b};
}
cp operator *(const cp ax)const{<!-- -->
return (cp){<!-- -->a*ax.a-b*ax.b,b*ax.a + a*ax.b};
}
}w,wn,v1[N+5],v2[N+5],v3[N+5],v4[N+5],h1[N+5],h2[N+5],h3[N+5];
void ch(cp *a){<!-- -->
for(int i=1,j=l/2,k;i<l-1;i ++ ){<!-- -->
if(i<j) swap(a[i],a[j]);
k=l/2;
while(j>=k){<!-- -->
j-=k;k>>=1;
}
j + =k;
}
}
void fft(cp *a,int fl){<!-- -->
ch(a);
for(int i=2;i<=l;i<<=1){<!-- -->
wn=(cp){<!-- -->cos(fl*2*pi/i),sin(fl*2*pi/i)};
for(int j=0;j<l;j + =i){<!-- -->
w=(cp){<!-- -->1,0};
for(int k=j;k<j + i/2;k + + ,w=w*wn){<!-- -->
cp t=a[k],u=w*a[k + i/2];
a[k]=t + u;
a[k + i/2]=t-u;
}
}
}
}
int main()
{<!-- -->
scanf("%d%d%lld", &n, &m, &p);
for(int i=0;i<=n;i ++ ){<!-- -->
scanf("%lld", &f[i]);f[i]%=p;
}
for(int i=0;i<=m;i ++ ){<!-- -->
scanf("%lld", &g[i]);g[i]%=p;
}
while(l<n + m + 1) l<<=1;
for(int i=0;i<l;i ++ ){<!-- -->
v1[i].a=f[i]>>15;
v2[i].a=f[i] &((1<<15)-1);
v3[i].a=g[i]>>15;
v4[i].a=g[i] &((1<<15)-1);
}
fft(v1,1);fft(v2,1);fft(v3,1);fft(v4,1);
for(int i=0;i<l;i ++ ){<!-- -->
h1[i]=v1[i]*v3[i];
h2[i]=v1[i]*v4[i] + v2[i]*v3[i];
h3[i]=v2[i]*v4[i];
}
fft(h1,-1);fft(h2,-1);fft(h3,-1);
for(int i=0;i<l;i ++ ){<!-- -->
w1[i]=(long long)(h1[i].a/l + 0.5)%p;
w2[i]=(long long)(h2[i].a/l + 0.5)%p;
w3[i]=(long long)(h3[i].a/l + 0.5)%p;
ans[i]=(w1[i]*(1<<30)%p + w2[i]*(1<<15)%p+w3[i])%p;
}
for(int i=0;i<=n + m;i + + ){<!-- -->
printf("%lld ",ans[i]);
}
return 0;
}