[Record] PDF | Extracting the table of contents from scanned Chinese and English PDFs (1: QQ + GPT)

Requirements:
1) Quickly extract the table of contents from a PDF;
2) No additional software to download.

Article directory

  • 1. Exporting the table of contents directly with commonly used software
    • 1 (recommendation index ☆) QQ OCR text recognition
    • 2 (recommendation index 0 stars) GPT4 image recognition
    • 3 (recommendation index 0 stars) GPT4 AI PDF plug-in
    • 4 (recommendation index ☆☆☆☆) QQ + GPT3.5 combination
    • 5 (recommendation index ☆☆☆☆) QQ + GPT4 combination

The table of contents text extracted here will be embedded into the PDF. For the software that batch-adds a table of contents to a PDF, and how to use it, see my previous article: PDF batch insertion of table of contents.

A later article will cover OCR extraction with Python, which lets the extraction run in the background on a server and is more convenient to use (no need to open GPT or QQ).

Image of the table of contents used for testing:

1. Exporting the table of contents directly with commonly used software

This approach relies only on commonly used software, so there is no need to download any obscure tools.

1 (recommendation index ☆) QQ OCR text recognition

Points earned:
1) Ready to use
2) Chinese character recognition is particularly accurate

Points deducted:
1) It cannot run in the background
2) Number recognition is particularly poor
3) The formatting is very messy, so tidying the result into a usable table of contents takes a long time

Recognition results:

Abstract..
Abstract . . . . . . . . . . . . . . . . . . . . . ........[II Chapter 1 Introduction...... . . . .. . . . .
1.1: Research background......................11.2 Research status at home and abroad...... ... . . .. . . . . . . ............ .. ..... . ........2
1.2.1 Current research status of large integer decomposition.... . . ... . . . .. .... . . . . . . . . . . . . . . . . . . 3
1.2.2 Current status of research on general number field sieve method..... . Research progress..
.... . . . .. . ... . .. .. ... . . . . .. .. . . . . . .. ... 4
1.3 The main content of the paper........................................ ..41.4 Structural arrangement of the paper... .... .... . . . . . . . . .. . . . . . . . . . ....5
i . . i . . ... o
Chapter 2 Relevant Theoretical Basis... .. .. ........... 72.1 Basics of Cryptography... .........................72.1.1 Principles of Cryptography....... . . ............. …….. 2.1.2 Public Key Cryptosystem......i......................92.1.3 RSA Public Key Cryptosystem ....
2.2 Basics of integer decomposition... .. .9
2.2.1 The problem of integer factorization... . . . . . . . . . . . . . . . 102.2.2 Commonly used integer decomposition methods.... . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.3 RSA Factorization Challenge Number...... . . . . . . . . . . . . . .. . . . . . . . . . . . . . . 122.2.4 Determination of prime numbers... . . . ..................................................152.3 General Introduction to number field sieve method.... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
……………….l7
2.3.1 Polynomial selection... .. . . . . . . . . . . . . . . . . . . . . . . 172.3.2 Sieve number pairs.... . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.3 Solving a system of linear equations..... . .4﹑Solving the square roots of algebraic numbers.... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182.4 General number fields Typical application examples of the sieve method... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192.4.1 - Polynomial selection for factoring RSA-768 ......... .. . . . . . . . . . . . . . . . . .. .. 192.4.2 Screening and filtering of decomposed RSA-768..... . . . . . . . . . . . . . . . . . . . . . . . . . . . 202.4.3 Solve the system of equations by factoring RSA-768... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.4.4 Solving the square root of factoring RSA-768...
1K Little P· . . . . . . . . . . . . . . . . . . . . . . . 21
2.5 Summary of this chapter ........ .. . . . . . . . . . . . .
Chapter 3 Analysis and Comparison of Linear Polynomial Selection Methods... . . . . . . . . . . . . . . . .
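
Much of the mess above comes from dot leaders being recognized as irregular runs of dots and spaces. As a rough illustration of the kind of clean-up this requires (this is not the method used in this article, which hands the job to GPT), a minimal Python sketch might collapse each leader run into a single tab. The function name `collapse_leaders` and the "three or more dot-like characters" heuristic are assumptions made for the example.

```python
import re

# Assumed heuristic: a dot leader is a run of at least three dot-like
# characters ("." or "…"), optionally separated by whitespace.
# Section numbers such as "1.2.1" contain isolated dots and are not matched.
LEADER = re.compile(r"\s*(?:[.…]\s*){3,}")

def collapse_leaders(line: str) -> str:
    """Replace every dot-leader run in one OCR line with a single tab."""
    return LEADER.sub("\t", line).strip()
```

This only removes the leader noise. Entries fused onto one line, and page numbers glued to the next section number (a page number 1 directly followed by section 1.2 reads as "11.2"), are still ambiguous after this step, which is why the manual clean-up is so tedious.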

2 (recommendation index 0 stars) GPT4 image recognition

I had been quite happy with GPT4 until I needed it to recognize a table of contents.
Its output was gibberish, which was frustrating!

3 (recommendation index 0 stars) GPT4 AI PDF plug-in

I wondered whether my prompt to the native model was the problem and whether a plug-in would help. No luck with the plug-in either!
It doesn’t support recognition of scanned PDFs at all. Game over!

4 (recommendation index ☆☆☆☆) QQ + GPT3.5 combination

As noted above, QQ’s text recognition is accurate for the characters themselves, but tidying the result into a table of contents is a lot of work.
GPT happens to be good at exactly this kind of chore: it may talk nonsense at times, but it does tedious clean-up work well.

So the workflow is: let QQ recognize the text first, then copy the result into GPT with one click, using the prompt:

Revise this table of contents:
Summary..
Abstract . . . . . . . . . . . . . . . . . . . . . ........[II Chapter 1 Introduction...... . . . .. . . . .
1.1: Research background......................11.2 Research status at home and abroad...... ... . . .. . . . . . . ............ .. ..... . ........2
1.2.1 Current research status of large integer decomposition.... . . ... . . . .. .... . . . . . . . . . . . . . . . . . . 3
1.2.2 Current status of research on general number field sieve method..... . Research progress..
.... . . . .. . ... . .. .. ... . . . . .. .. . . . . . .. ... 4
1.3 The main content of the paper........................................ ..41.4 Structural arrangement of the paper... .... .... . . . . . . . . .. . . . . . . . . . ....5
i . . i . . ... o
Chapter 2 Relevant Theoretical Basis... .. .. ........... 72.1 Basics of Cryptography... .........................72.1.1 Principles of Cryptography....... . . ............. …….. 2.1.2 Public Key Cryptosystem......i......................92.1.3 RSA Public Key Cryptosystem ....
2.2 Basics of integer decomposition... .. .9
2.2.1 The problem of integer factorization... . . . . . . . . . . . . . . . 102.2.2 Commonly used integer decomposition methods.... . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.3 RSA Factorization Challenge Number...... . . . . . . . . . . . . . .. . . . . . . . . . . . . . . 122.2.4 Determination of prime numbers... . . . ..................................................152.3 General Introduction to number field sieve method.... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
……………….l7
2.3.1 Polynomial selection... .. . . . . . . . . . . . . . . . . . . . . . . 172.3.2 Sieve number pairs.... . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.3 Solving a system of linear equations..... . .4﹑Solving the square roots of algebraic numbers.... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182.4 General number fields Typical application examples of the sieve method... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192.4.1 - Polynomial selection for factoring RSA-768 ......... .. . . . . . . . . . . . . . . . . .. .. 192.4.2 Screening and filtering of decomposed RSA-768..... . . . . . . . . . . . . . . . . . . . . . . . . . . . 202.4.3 Solve the system of equations by factoring RSA-768... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.4.4 Solving the square root of factoring RSA-768...
1K Little P· . . . . . . . . . . . . . . . . . . . . . . . 21
2.5 Summary of this chapter ........ .. . . . . . . . . . . . .
Chapter 3 Analysis and Comparison of Linear Polynomial Selection Methods... . . . . . . . . . . . . . . . .

GPT3.5 output:

Fast and good!!!
One star is deducted because it does not fill in some of the missing page numbers.
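
Because some entries can come back without page numbers, it may be worth checking the cleaned text before embedding it into the PDF. Here is a minimal Python sketch, assuming each cleaned entry is on its own line and that a page number, when present, is the last whitespace-separated token (Arabic digits or uppercase Roman numerals). The function name `entries_missing_pages` and this line format are assumptions for illustration, not the actual GPT output format.

```python
import re
from typing import List

# Assumed format: a page number is a trailing token of digits or uppercase
# Roman numerals, separated from the title by whitespace (space or tab).
PAGE_AT_END = re.compile(r"\s+(\d+|[IVXLC]+)$")

def entries_missing_pages(toc_text: str) -> List[str]:
    """Return the non-empty entries that do not end in a page number."""
    missing = []
    for line in toc_text.splitlines():
        line = line.strip()
        if line and not PAGE_AT_END.search(line):
            missing.append(line)
    return missing
```

This is only a heuristic: a title that happens to end in a standalone number or Roman numeral would be counted as having a page number. A title ending in "RSA-768" is still flagged correctly, because the digits are not preceded by whitespace.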

5 (recommendation index ☆☆☆☆) QQ + GPT4 combination

Likewise, I also tested GPT4.
GPT4 loses one star because it responds more slowly than GPT3.5. It even outputs indentation, but the software that batch-adds tables of contents can add indentation automatically, so this is not an advantage.

Its real advantage is that it fills in all the missing page numbers. Very impressive.
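
On the indentation point: the nesting depth of an entry can usually be derived from its section number alone, which is why missing indentation is easy to restore afterwards. Here is a minimal Python sketch, assuming entries start with dotted numbers such as `1.2.1` and that chapter headings and unnumbered entries sit at the top level. The function names and the one-tab-per-level output are assumptions for illustration, not the format required by any particular tool.

```python
import re

# Dotted section number at the start of an entry, e.g. "1.2" or "2.3.1".
SECTION_NUMBER = re.compile(r"^(\d+(?:\.\d+)+)\b")

def nesting_level(entry: str) -> int:
    """Return 0 for chapters and unnumbered entries, 1 for "1.2", 2 for "1.2.1"."""
    match = SECTION_NUMBER.match(entry.strip())
    if match is None:
        return 0
    # The number of dots in the section number equals the nesting depth.
    return match.group(1).count(".")

def indent_entries(toc_text: str) -> str:
    """Prefix each non-empty entry with one tab per nesting level."""
    lines = [line.strip() for line in toc_text.splitlines() if line.strip()]
    return "\n".join("\t" * nesting_level(line) + line for line in lines)
```

For example, `nesting_level("1.2.1 Current research status of large integer decomposition 3")` gives 2, while a line starting with "Chapter 1" stays at level 0.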

GPT4 output:

In short, the best way is QQ recognition + GPT3.5.

Extended reading: [Tools] FreePic2PDF + PdgCntEditor | PDF batch bookmarks (Windows)