Dynamic programming quickly sums the combination number and finds the inverse element of Fermat’s little theorem

Description of topic

In fact, it needs to be translated

no

2

n^2

Find the number of combinations of (m, n) point pairs of n2 kinds

(

m

no

)

\binom{m}{n}

(nm?)

However, the range of data given in this question is

2 \times 10^5

2×105, even with

(

)

\mathbb{O}(1)

The O(1) method to obtain the size of each combination will also be TLE!.

Dynamic programming method

In this question, two material packages need to be selected each time, and the number of combinations required is:

(

)

\binom{A_i + B_i + A_j + B_j}{A_i + B_i}

(Ai? + Bi?Ai? + Bi? + Aj? + Bj)
We can think of it as the so-called “walking grid” method to solve, that is, for example: from the upper left corner of a square matrix

(

)

(0, 0)

Start at (0,0) and end at the lower right corner of the square matrix

(

)

(m, n)

(m,n), you can only choose to go right or down each time, and the final number of methods is

(

)

\binom{m + n}{n}

(nm + n?) Therefore, we can correspond the two, and compare the above solution process to this problem, as

(

)

(-A_i, -B_i)

(?Ai?,?Bi?) to

(

)

(A_j, B_j)

(Aj?, Bj?) (since there are no negative coordinates, you need to add an offset offset)
There is also recursion:

[

]

[

]

[

]

[

]

[

]

[

]

dp[i][j]=dp[i][j – 1] + dp[i – 1][j]

dp[i][j]=dp[i][j?1] + dp[i?1][j]

[

]

[

]

dp[0][0] = 1

dp[0][0]=1

So this problem can be solved by recursion in dynamic programming, just need to coordinate dp[offset - A_i][offset - B_i] + + corresponding to the number of materials in each material package , and then use the above recursive formula to superimpose the contribution of each point.

Superposition may be a bit difficult to understand, in fact, it is seen as: for each point pair (A_i, B_i), from dp[offset - A_i][offset - B_i] Start to calculate this recursive formula, and finally add up the n recursive tables. That is to say, after completing the dp[offset - A_i][offset - B_i] + + operation for each point pair, you only need to perform the above recursion once to get all the and style. That is, the answer is in n dp[offset + A_i][offset + B_i]

But there are some details:

First of all, the contribution of itself to itself must be subtracted, because the same material package will not be selected at the same time.
Secondly, there will be such a situation
(

?

A

i

,

?

B

i

)

?

(

A

j

,

B

j

)

(-A_i, -B_i) \Rightarrow (A_j, B_j)

(?Ai?,?Bi?)?(Aj?,Bj?)

(

?

A

j

,

?

B

j

)

?

(

A

i

,

B

i

)

(-A_j, -B_j) \Rightarrow (A_i, B_i)

(?Aj?,?Bj?)?(Ai?,Bi?) That is, after processing the first point and adding the final n numbers, the result obtained is actually twice the answer, and still needs to be divided by two to get correct answer.

For the first point above

The initial thought was that this would only need to be done

(

)

\mathbb{O}(n)

O(n) times to find the number of combinations can be solved by Yang Hui Triangle. However, according to the data, it can be observed that the maximum need to reach

(

∣

)

800

(4 \cdot |A|) ^ 2 = 8000^2

(4?∣A∣)2=80002
MLE will happen!

This kind of observation is very important. It is easy to find if the array is too large, but if the array is not enough, it is easy to overflow and it is difficult to check (initially I only opened 4000 * 4000, so it is not too large but The input data would overflow it again, which ended up not being checked), in fact, this sentence was summed up by hours of debugging with no progress.

Therefore, it is necessary to change the direction, and you can store the factorial value of 1 ~ 8000 instead, which greatly optimizes the space. Since the data in this question is too large, the final answer also needs to be correct to 998244353The output can only be performed after taking the modulus, so all numbers are calculated in a space of mod = 998244353 . According to modular arithmetic there are:

(

)

(

)

(a + b) \% mod = (a \% mod + b\% mod) \% mod

(a + b)%mod = (a%mod + b%mod)%mod

(

)

(

)

(a – b) \% mod = (a \%mod – b\%mod + mod) \% mod

(a?b)%mod=(a%mod?b%mod + mod)%mod

(

)

(

)

(a * b)\%mod = (a\%mod * b\% mod) \%mod

(a?b)%mod=(a%mod?b%mod)%mod
But the division in this space is not so simple, it needs to be multiplied in the form of inverse elements to achieve the effect of division. When mod is a prime number, its inverse can be solved by Fermat’s little theorem.

≡

a^{p – 1} \equiv 1\enspace mod \enspace p

ap?1≡1modp

≡

\Rightarrow a \cdot a^{p – 2} \equiv 1 \enspace mod \enspace p

?a?ap?2≡1modp Therefore,

a^{p – 2}

ap?2 is

The inverse of a in this space can be solved by fast power

// Fast exponentiation inversion element just call qpow(a, MOD - 2)
ull qpow(ull a, ull b) {<!-- -->
    ull res = 1;
    while (b)
    {<!-- -->
        if (b & amp; 1) res = res * a % MOD;
        a = a * a % MOD;
        b >>= 1;
    }
    return res;
}

You can initialize the factorial fac and the inverse element inv_fac of the factorial and solve the number of combinations in the following way

factorial:

no

!

=

(

no

?

1

)

!

?

no

n! = (n – 1)! \cdot n

n!=(n?1)!?n , factorial inverse:

1

no

!

=

1

(

no

+

1

)

!

?

(

no

+

1

)

\frac{1}{n!} = \frac{1}{(n + 1)!} \cdot (n + 1)

n!1?=(n + 1)!1(n + 1)

void init() // Initialize factorial and factorial inverse
{<!-- -->
    fac[0] = 1;
    for(int i = 1; i <= 4 * offset; i ++ )
        fac[i] = (fac[i - 1] * i) % MOD;
    inv_fac[4 * offset] = qpow(fac[4 * offset], MOD - 2);
    for(int i = 4 * offset - 1;i >= 0;i--)
        inv_fac[i] = inv_fac[i + 1] * (i + 1) % MOD;
}

ull C(int p, int q) // Select q items from p items
{<!-- -->
    ull ret = ((fac[p] * inv_fac[q]) % MOD * inv_fac[p - q]) % MOD;
    // Every time you multiply, you must pay attention to taking the modulus to prevent overflow
    return ret;
}

Final code

#include 
#include 
using namespace std;

#define MOD 998244353
typedef pair PII;
typedef unsigned long long ull;
const int offset = 2000;
int n;
vector v;
ull dp[2 * offset + 1][2 * offset + 1];
//ull map[4 * offset + 1][4 * offset + 1];
ull fac[4 * offset + 1];
ull inv_fac[4 * offset + 1];

// Fast exponentiation inversion element just call qpow(a, MOD - 2)
ull qpow(ull a, ull b) {<!-- -->
    ull res = 1;
    while (b)
    {<!-- -->
        if (b & amp; 1) res = res * a % MOD;
        a = a * a % MOD;
        b >>= 1;
    }
    return res;
}
void init()
{
    fac[0] = 1;
    for(int i = 1; i <= 4 * offset; i ++ )
        fac[i] = (fac[i - 1] * i) % MOD;
    inv_fac[4 * offset] = qpow(fac[4 * offset], MOD - 2);
    for(int i = 4 * offset - 1;i >= 0;i--)
        inv_fac[i] = inv_fac[i + 1] * (i + 1) % MOD;
}
ull C(int p, int q)
{
    ull ret = ((fac[p] * inv_fac[q]) % MOD * inv_fac[p - q]) % MOD; // Pay attention to modulo every time to prevent overflow
    return ret;
}

int main()
{
    init();
    cin >> n;
    for(int i = 1; i <= n; i ++ )
    {
        int a, b;
        cin >> a >> b;
        v.push_back({a, b});
    }

    for(int i = 0; i < n; i ++ )
        dp[offset - v[i].first][offset - v[i].second] + = 1; // If there are two packages with the same content, they still need to be superimposed (the packages are regarded as different)

    for(int i = 0; i <= 2 * offset; i ++ )
    {
        for(int j = 0; j <= 2 * offset; j ++ )
        {
            if (i == 0 & amp; & amp; j == 0) continue;
            if (i > 0) dp[i][j] = dp[i][j] + dp[i - 1][j];
            if (j > 0) dp[i][j] = dp[i][j] + dp[i][j - 1];
            dp[i][j] %= MOD;
        }
    }

    ull ans = 0;
    ull inv = qpow(2, MOD - 2);
    for(int i = 0; i < n; i ++ )
    {
        // To subtract the "contribution" from the map to itself
        ans = (ans + (dp[offset + v[i].first][offset + v[i].second]
               - C(2 * (v[i].first + v[i].second),2 * v[i].second) + MOD) % MOD ) % MOD;
    }
    cout << ans * inv % MOD;
}

Differential Testing

According to the original idea, there is a code of brute_force:

#include <iostream>
#include <vector>
using namespace std;

typedef pair<int, int> PII;
#define MOD 998244353
const int offset = 2000;

int map[offset + 1][offset + 1];
vector<PII> v;

int main()
{<!-- -->
    int n;
    cin >> n;
    for (int i = 0; i < n; i ++ )
    {<!-- -->
        int a, b;
        cin >> a >> b;
        v.push_back({<!-- --> a, b });
    }

    for (int i = 0; i <= offset; i ++ ) map[i][0] = 1;
    for (int i = 1; i <= offset; i ++ )
        for (int j = 1; j <= i; j ++ )
            map[i][j] = (map[i - 1][j - 1] + map[i - 1][j]) % MOD;
    unsigned long long ans = 0;
    for (int i = n - 1; i >= 0; i--)
        for (int j = 0; j < i; j ++ )
            ans = (ans + map[v[i].first + v[i].second + v[j].first + v[j].second][v[i].first + v[j].first] ) % MOD;
    cout << ans;
}

You can perform differential testing on the code obtained after the above analysis and the brute_force version to verify its correctness. (For the specific test method, see the previous blog)

In fact, when debugging, I also used differential testing to check, but they are all in

no

≤

200

,

∣

A

i

∣

≤

500

n \leq 200, |A_i| \leq 500

n≤200,∣Ai?∣≤500 (Because if the data is too large, brute_force is too slow), the final answer is the same, but it just fails because of the small size Range data did not detect overflow errors!