Implementation of Fast Multiplication and Modular Reduction algorithms in finite fields

1. Introduction

For basic knowledge about finite fields, please refer to:

RISC Zero Team November 2022 Video Intro to Finite Fields: RISC Zero Study Club

Finite fields are the basis for almost all mathematics in cryptography.
All operations in the ZKP proof system are based on finite fields:

Digital circuits using Boolean operations: such as AND, OR, NOT.
Arithmetic circuits using finite field operations: such as addition, multiplication, and negation.

However, real computers do not have finite field circuit devices, only:

ADD rax, rbx
MUL rax
SHR rax, CL
etc.

Therefore, finite field operations need to be constructed based on the above operations.
The speed of finite field operations is critical for the following reasons:

The biggest obstacle to the usability of ZKP is the proof overhead.
Almost all the proof time is spent on finite field operations. To improve ZKP proof speed:
- Reduce the number of finite field operations (e.g., more efficient NTT or MSM algorithms)
- Make finite field operations more efficient (e.g., use optimized finite field representations)

The main contents of this article are:

BigInts
BigInts classic addition operation
BigInts classic multiplication operation
Modular reduction (Barrett’s algorithm): Most useful when the numerical representation cannot be changed.
Montgomery form
Multiplication and reduction (Montgomery algorithm): the most commonly used algorithm.
Other multiplication algorithms

And compared the classic algorithms for large integer multiplication operations, Barrett algorithm, and Montgomery algorithm:

2. Large integers and their addition and multiplication operations

Big integers, also known as BigInt or Multiprecision Integers.
The operators for real computers are word based:

Almost all modern computers use 64-bit words
But 32-bit words are not completely obsolete. For example, in the IoT world.

For larger (such as 256-bit) fields, they will be divided into words to represent:

For example, 256-bit numbers are usually represented by four 64-bit words.
For example, an 8-digit decimal number can be represented by four 2-digit words.

For example, if the large integer 27311837 is expressed in decimal digits, the corresponding value is:

(

)

100

(27\ 31\ 18\ 37)_{100}

(27 31 18 37)100?.

2.1 Classic addition of large integers

The corresponding large integer addition operation, such as

(

)

100

(

)

100

(27\ 31\ 18\ 37)_{100} + (88\ 68\ 97\ 89)_{100}

(27 31 18 37)100? + (88 68 97 89)100?, the calculation rule is:

For details, see the Algorithm 10.3 algorithm in the Handbook of Elliptic and Hyperelliptic Curve Cryptography:

2.2 Classic multiplication of large integers

(

)

100

(

)

100

(54\ 12)_{100}*(36\ 29)_{100}

(54 12) 100 (36 29) 100? Large integer multiplication operation is an example. For details, see the Algorithm 10.8 algorithm in the Handbook of Elliptic and Hyperelliptic Curve Cryptography:

The calculation data corresponding to each step is:

3. Modular Reduction

It should be noted that the results of the above addition and multiplication operations are all larger values, and these large result values need to be reduced to the corresponding canonical representation, such as:

Common Modular Reduction algorithms are:

1) Barret reduction algorithm: Most useful when the numerical representation cannot be changed.
2) Montgomery multiplication and reduction algorithm: the most commonly used algorithm.

Related blogs include:

Basic algorithm optimization–Fast Modular Multiplication
GPU/CPU friendly modular multiplication algorithm: Multi-Precision Fast Modular Multiplication
Montgomery reduction – multi-precision modular multiplication algorithm

3.1 Barret reduction algorithm

The most obvious way to do reduction is to divide, but division is expensive and may not be constant time. Single-word division operation

b=1，R=2^k

b=1, R=2k for example:

func reduce(a uint) uint {
    q:= a / n // Division implicitly returns the floor of the result.
    return a - q * n
}

There will be timing attack problems at non-constant time.
Barrett reduction is

1/n

1/n is approximately

m/2^k

m/2k, because

m/2^k

Division in m/2k is actually a right shift operation, which is much cheaper. [Can be approximated

The value of m is

m=\left \lfloor 2^k/n\right \rfloor

m=?2k/n?]

func reduce(a uint) uint {
    q := (a * m) >> k // ">> k" denotes bitshift by k.
    return a - q * n
}

However, the result after this reduce is

[

)

[0,2n)

[0,2n) instead of

[

)

[0,n)

[0,n), so further reduction is required:

func reduce(a uint) uint {
    q := (a * m) >> k
    a -= q * n
    if a >= n {
        a -= n
    }
    return a
}

The Algorithm 10.17 algorithm in the Handbook of Elliptic and Hyperelliptic Curve Cryptography extends it to the multi-word Barrett Reduction algorithm, and the result before the last step of reduce above is no longer

[

)

[0,2n)

[0,2n) but a possibly larger range value, so step 4 in the Algorithm 10.17 algorithm uses while:

3.2 Montgomery multiplication and reduction algorithm

Montgomery Form is another finite field representation that supports fast combined multiplication and reduction algorithms.

Previously the finite field elements were expressed as:

∈

[

]

x\in [0,N-1]

x∈[0,N?1]

The Montgomery Form representation is defined as:

[

]

(

)

[x]=(xR)\mod N

[x]=(xR)modN

The Montgomery Reduction algorithm calculates:

(

)

(

)

REDC(u)=(uR^{-1})\mod N

REDC(u)=(uR?1)modN
Instead of the previous calculation by Barrett Reduction

u\mod N

umodN.

REDC

REDC is a very versatile formula:

1) Convert Classic to Montgomery:
[

x

]

=

R

E

D

C

(

(

x

R

2

)

m

o

d

N

)

[x]=REDC((xR^2)\mod N)

[x]=REDC((xR2)modN)
2) Convert Montgomery to Classic:
R

E

D

C

(

[

x

]

)

=

x

REDC([x])=x

REDC([x])=x
3) Multiplication operation represented by Montgomery Form:
(

(

x

R

m

o

d

N

)

?

(

y

R

m

o

d

N

)

?

R

?

1

m

o

d

N

)

=

(

x

y

R

)

m

o

d

N

((xR\mod N)*(yR\mod N)*R^{-1}\mod N)=(xyR)\mod N

((xRmodN)?(yRmodN)?R?1modN)=(xyR)modN, corresponding to the corresponding implementation in the Algorithm 11.3 algorithm in the Handbook of Elliptic and Hyperelliptic Curve Cryptography:

Among them, the Montgomery reduction algorithm implemented in the Algorithm 10.22 algorithm in the Handbook of Elliptic and Hyperelliptic Curve Cryptography is:

4. Other multiplication algorithms

The evolution process of the Multiplication algorithm is:

The multiplication algorithm was once considered to have a runtime of approximately
O

(

n

2

)

O(n^2)

O(n2).
Karatsuba invented a divide-and-conquer algorithm whose runtime is
O

(

n

1.58

)

O(n^{1.58})

O(n1.58).
The Toom-Cook multiplication algorithm is similar to the Karatsuba algorithm, with slightly better performance.
Sch?nhage–Strassen invented an NTT algorithm whose runtime is
O

(

n

?

log

?

n

?

log

?

log

?

n

)

O(n\cdot \log n\cdot \log\log n)

O(n?logn?loglogn).
It is even slower when multiplying large integers, such as a 4096-bit RSA key.

Reference materials

[1] RISC Zero team February 2023 video Finite Field Implementations: Barrett & Montgomery [slide see Finite Field Implementations]
[2] Wikipedia Barrett reduction

RISC Zero Series Blog

RISC0: Towards a Unified Compilation Framework for Zero Knowledge
Risc Zero ZKVM: zk-STARKs + RISC-V
Highlights of ZK Hack and ZK Summit in 2023
RISC Zero zkVM White Paper
Risc0: Use Continunations to prove arbitrary EVM transactions
Zeth: The first Type 0 zkEVM
Introduction to the RISC Zero project
RISC Zero zkVM performance metrics
Continuations: Extending RISC Zero zkVM to support (infinitely) large computations
A summary on the FRI low degree test first 2 pages of introduction
Reed-Solomon Codes and their relationship to RISC Zero zkVM
RISC Zero zkVM architecture
The relationship between RISC-V and RISC Zero zkVM