Python MNIST handwritten multi-digit recognition based on PaddleOCR 2.7.0

Directory of series articles

Chapter 1: Python MNIST handwritten multi-digit recognition based on PaddleOCR 2.7.0

Article directory

  • Directory of series articles
  • Preface
  • 1. Install PaddleOCR
  • 2. Configure the environment
  • 3. Create the handwritten digit dataset
    • 1. Splicing the dataset
        • Splicing the 0-99 dataset
    • 2. Data display
  • 4. Train the model
    • 1. Download the pre-trained model
    • 2. Modify parameters
    • 3. Model training
    • 4. Model export
  • 5. Model testing
    • 1. Generate test images
    • 2. Model testing
  • 6. System testing (detection + recognition)
    • 1. Download the detection and direction models
    • 2. System detection
  • Summary
  • References

Preface

This article references PaddleOCR: Handwritten multi-digit recognition based on the MNIST dataset.
That article uses PaddleOCR v2.1 with the PP-OCRv2.0 model to train handwritten digit recognition; after reproducing it, recognition accuracy reaches 99%.
PaddleOCR 2.7.0 has since been released, so this article uses the PP-OCRv4 model for handwritten digit recognition.

1. Install PaddleOCR

Website: https://github.com/PaddlePaddle/PaddleOCR
Select release/2.7, click the Code button, then Download ZIP, and extract the archive locally.

2. Configure the environment

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn

3. Create the handwritten digit dataset

The traditional MNIST handwriting dataset only contains the single digits 0-9. By splicing digit images together, we can train a multi-digit recognition model.

1. Splicing the dataset

In the project root directory, create a new folder dataset, and inside it create two folders, train and test:
~/PaddleOCR/dataset/train
~/PaddleOCR/dataset/test
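
These folders can also be created from Python; a minimal sketch, run from the PaddleOCR root:

import os

# Create the output folders for the spliced training and test images.
os.makedirs('dataset/train', exist_ok=True)
os.makedirs('dataset/test', exist_ok=True)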

Splicing the 0-99 dataset:

The code is as follows:

import cv2
import random
import numpy as np
from tqdm import tqdm
from paddle.vision.datasets import MNIST

# Load the datasets
mnist_train = MNIST(mode='train', backend='cv2')
mnist_test = MNIST(mode='test', backend='cv2')

# Dataset preprocessing
datas_train = {}
for i in range(len(mnist_train)):
    sample = mnist_train[i]
    x, y = sample[0], sample[1]  # x: 28*28 pixel array of the digit; y: the ground-truth label

    _sum = np.sum(x, axis=0)  # sum the pixels of x along axis 0 (per-column sums)
    _where = np.where(_sum > 0)  # indices of the columns that contain ink
    x = 255 - x[:, _where[0][0]: _where[0][-1] + 1]  # crop to the inked columns and invert to black digits on white
    # Group the training samples by digit, keyed by the label string (e.g. all 5s under '5')
    if str(y[0]) in datas_train:
        datas_train[str(y[0])].append(x)
    else:
        datas_train[str(y[0])] = [x]

# Preprocess the test set in the same way as the training set
datas_test = {}
for i in range(len(mnist_test)):
    sample = mnist_test[i]
    x, y = sample[0], sample[1]

    _sum = np.sum(x, axis=0)
    _where = np.where(_sum > 0)
    x = 255 - x[:, _where[0][0]: _where[0][-1] + 1]
    if str(y[0]) in datas_test:
        datas_test[str(y[0])].append(x)
    else:
        datas_test[str(y[0])] = [x]

# Image splicing and sampling
datas_train_list = []
for num in tqdm(range(0, 100)):  # numbers 0-99; widen the range as the project requires
    for _ in range(1000):  # generate 1000 training samples per number
        imgs = [255 - np.zeros((28, np.random.randint(10)))]  # white 28*n strip, n in [0, 10)
        for word in str(num):  # iterate over the digits of the number
            index = np.random.randint(0, len(datas_train[word]))  # pick a random sample of this digit
            imgs.append(datas_train[word][index])  # append the digit image
            imgs.append(255 - np.zeros((28, np.random.randint(10))))  # append another white strip
        img = np.concatenate(imgs, 1)  # concatenate along axis 1 (width)
        cv2.imwrite('dataset/train/%03d_%03d.jpg' % (num, _), img)  # write the image to the local dataset
        datas_train_list.append('train/%03d_%03d.jpg\t%d\n' % (num, _, num))

datas_test_list = []
for num in tqdm(range(0, 100)):
    for _ in range(50):
        imgs = [255 - np.zeros((28, np.random.randint(10)))]
        for word in str(num):
            index = np.random.randint(0, len(datas_test[word]))
            imgs.append(datas_test[word][index])
            imgs.append(255 - np.zeros((28, np.random.randint(10))))
        img = np.concatenate(imgs, 1)
        cv2.imwrite('dataset/test/%03d_%03d.jpg' % (num, _), img)
        datas_test_list.append('test/%03d_%03d.jpg\t%d\n' % (num, _, num))

# Data list generation
with open('dataset/train.txt', 'w') as f:
    for line in datas_train_list:
        f.write(line)

with open('dataset/test.txt', 'w') as f:
    for line in datas_test_list:
        f.write(line)
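
Before moving on, it is worth sanity-checking the generated label files. A minimal sketch, assuming the script above has been run from the PaddleOCR root:

import os

# Spot-check dataset/train.txt: each line is "relative_path\tlabel", and
# every referenced image should exist under dataset/.
with open('dataset/train.txt') as f:
    lines = f.read().splitlines()
print('train samples:', len(lines))  # expect 100 numbers x 1000 samples = 100000
for line in lines[:5]:
    path, label = line.split('\t')
    print(path, label, os.path.exists(os.path.join('dataset', path)))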

2. Data display

Examples of the generated training images:

Training Picture 1

Training Picture 2
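
Such previews can be reproduced locally with OpenCV. A sketch; the filename follows the %03d_%03d.jpg pattern used by the splicing script, so 042_000.jpg would be the first sample of the number 42:

import cv2

# Load one generated training sample and inspect it.
img = cv2.imread('dataset/train/042_000.jpg', cv2.IMREAD_GRAYSCALE)
print(img.shape)  # height is always 28; width varies with digit count and random padding
cv2.imshow('sample', img)
cv2.waitKey(0)
cv2.destroyAllWindows()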

4. Train the model

1. Download the pre-trained model

In ~/PaddleOCR, create a new folder pretrain_models, then download the training model:

https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_server_train.tar

Description: the training model is obtained by fine-tuning the pre-trained model on real data and vertically synthesized text data, so it performs better in real application scenarios. The pre-trained model is trained directly on the full set of real and synthetic data and is better suited for fine-tuning on your own dataset.
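
The archive can be fetched from a browser, or from Python with the standard library; a sketch, run from the PaddleOCR root after creating pretrain_models:

import urllib.request

# Download the PP-OCRv4 recognition training model archive.
url = 'https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_server_train.tar'
urllib.request.urlretrieve(url, 'pretrain_models/ch_PP-OCRv4_rec_server_train.tar')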

Unzip the tar file in the terminal:

cd pretrain_models
tar -xf ch_PP-OCRv4_rec_server_train.tar && del ch_PP-OCRv4_rec_server_train.tar

2. Modify parameters

Copy PaddleOCR/configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_hgnet.yml to the PaddleOCR root directory and rename it to multi_mnist.yml.

The parameters are modified as follows:

Global:
  debug: false
  use_gpu: True
  epoch_num: 10
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/multi_mnist
  save_epoch_step: 1
  eval_batch_step: [0, 200]
  cal_metric_during_train: true
  pretrained_model: ./pretrain_models/ch_PP-OCRv4_rec_server_train/best_accuracy
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ./label_list.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt


Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05


Architecture:
  model_type: rec
  algorithm: SVTR_HGNet
  Transform:
  Backbone:
    name: PPHGNet_small
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 120
            depth: 2
            hidden_dims: 120
            kernel_size: [1, 3]
            use_guide: True
          Head:
            fc_decay: 0.00001
      - NRTRHead:
          nrtr_dim: 384
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - NRTRLoss:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: ./dataset
    ext_op_transform_idx: 1
    label_file_list:
    - ./dataset/train.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
        max_text_length: *max_text_length
    - RecAug:
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales: [[320, 32], [320, 48], [320, 64]]
    first_bs: &bs 16
    fix_bs: false
    divided_factor: [8, 16] # w, h
    is_training: True
  loader:
    shuffle: true
    batch_size_per_card: *bs
    drop_last: true
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./dataset/
    label_file_list:
    - ./dataset/test.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 16
    num_workers: 4
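
Before launching training, the edited config can be loaded once to confirm it parses and that the key paths point where expected. A quick sketch (requires PyYAML):

import yaml

# Confirm multi_mnist.yml parses and the edited paths are in place.
with open('multi_mnist.yml', 'r', encoding='utf-8') as f:
    cfg = yaml.safe_load(f)

print(cfg['Global']['pretrained_model'])     # ./pretrain_models/ch_PP-OCRv4_rec_server_train/best_accuracy
print(cfg['Global']['character_dict_path'])  # ./label_list.txt
print(cfg['Train']['dataset']['label_file_list'])  # ['./dataset/train.txt']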

3. Model training

python tools/train.py -c ./multi_mnist.yml

1. If a "not exists" error message appears, fix the offending path according to the prompt.
2. If an out-of-memory error occurs, reduce first_bs: &bs 16 under Train and batch_size_per_card: 16 under Eval, adjusting both to match your GPU's capacity.
3. Because the batch size is small, each epoch takes about 8820 global steps. By the second epoch, the accuracy already stays around 99%.

ppocr INFO: epoch: [1/10], global_step: 8820, lr: 0.000200, acc: 0.968749, norm_edit_dis: 0.991667, CTCLoss: 0.105981, NRTRLoss: 0.609214, loss: 0.720151, avg_reader_cost: 0.00000 s, avg_batch_cost: 0.35449 s, avg_samples: 10.6, ips: 29.90187 samples/s, eta: 7:48:30

[2023/10/18 13:07:56] ppocr INFO: epoch: [2/10], global_step: 8850, lr: 0.000200, acc: 0.999999, norm_edit_dis: 1.000000, CTCLoss: 0.078658, NRTRLoss: 0.631617, loss: 0.715509, avg_reader_cost: 0.00040 s, avg_batch_cost: 0.35049 s, avg_samples: 11.4, ips: 32.52572 samples/s, eta: 7:48:30
[2023/10/18 13:07:59] ppocr INFO: epoch: [2/10], global_step: 8860, lr: 0.000201, acc: 0.999999, norm_edit_dis: 1.000000, CTCLoss: 0.076151, NRTRLoss: 0.631300, loss: 0.715509, avg_reader_cost: 0.00000 s, avg_batch_cost: 0.34813 s, avg_samples: 10.0, ips: 28.72520 samples/s, eta: 7:48:26
[2023/10/18 13:08:03] ppocr INFO: epoch: [2/10], global_step: 8870, lr: 0.000201, acc: 0.937499, norm_edit_dis: 0.989583, CTCLoss: 0.118480, NRTRLoss: 0.620982, loss: 0.725218, avg_reader_cost: 0.00000 s, avg_batch_cost: 0.36089 s, avg_samples: 11.4, ips: 31.58819 samples/s, eta: 7:48:23
[2023/10/18 13:08:06] ppocr INFO: epoch: [2/10], global_step: 8880, lr: 0.000201, acc: 0.937499, norm_edit_dis: 0.989583, CTCLoss: 0.132596, NRTRLoss: 0.633977, loss: 0.769039, avg_reader_cost: 0.00000 s, avg_batch_cost: 0.35529 s, avg_samples: 12.4, ips: 34.90077 samples/s, eta: 7:48:20
[2023/10/18 13:08:10] ppocr INFO: epoch: [2/10], global_step: 8890, lr: 0.000201, acc: 0.968749, norm_edit_dis: 0.994792, CTCLoss: 0.142756, NRTRLoss: 0.630706, loss: 0.783180, avg_reader_cost: 0.00040 s, avg_batch_cost: 0.36093 s, avg_samples: 11.4, ips: 31.58535 samples/s, eta: 7:48:17

4. Model export

cd ~/PaddleOCR

python3 tools/export_model.py -c ./multi_mnist.yml -o Global.pretrained_model=./output/multi_mnist/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/multi_mnist
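
If the export succeeds, the inference directory should contain the static-graph program and weight files; a quick check:

import os

# A Paddle inference model is saved as a program file plus weights.
print(sorted(os.listdir('./inference/multi_mnist')))
# Expect files such as inference.pdmodel, inference.pdiparams and inference.pdiparams.info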

5. Model testing

1. Generate test images

In ~/PaddleOCR, create a new folder test_imgs.
The code of create_test_imgs.py is as follows:

import cv2
import random
import numpy as np
from tqdm import tqdm
from paddle.vision.datasets import MNIST

# Load the dataset
mnist_test = MNIST(mode='test', backend='cv2')

# Dataset preprocessing (same as for the training script above)
datas_test = {}
for i in range(len(mnist_test)):
    sample = mnist_test[i]
    x, y = sample[0], sample[1]

    _sum = np.sum(x, axis=0)
    _where = np.where(_sum > 0)
    x = 255 - x[:, _where[0][0]: _where[0][-1] + 1]
    if str(y[0]) in datas_test:
        datas_test[str(y[0])].append(x)
    else:
        datas_test[str(y[0])] = [x]

# Image splicing and sampling: one test image per number 0-99
for num in tqdm(range(0, 100)):
    imgs = [255 - np.zeros((28, np.random.randint(10)))]
    for word in str(num):
        index = np.random.randint(0, len(datas_test[word]))
        imgs.append(datas_test[word][index])
        imgs.append(255 - np.zeros((28, np.random.randint(10))))
    img = np.concatenate(imgs, 1)
    cv2.imwrite('./test_imgs/%03d.jpg' % num, img)

The content of the label_list.txt file (placed in the PaddleOCR root directory, as referenced by character_dict_path in the config) is as follows:

0
1
2
3
4
5
6
7
8
9
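
Since the dictionary is just the ten digits, label_list.txt can also be generated with a few lines of Python:

# Write label_list.txt: one character per line, digits 0-9.
with open('label_list.txt', 'w') as f:
    for d in range(10):
        f.write('%d\n' % d)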

2. Model testing

cd ~/PaddleOCR

python tools/infer/predict_rec.py --image_dir="./test_imgs" --rec_model_dir="./inference/multi_mnist/" --rec_char_dict_path="./label_list.txt"

The test results are shown below:

[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\091.jpg:('91', 0.9993628263473511)
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\092.jpg:('92', 0.9998289346694946)
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\093.jpg:('93', 0.9994925260543823)
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\094.jpg:('94', 0.9997493028640747)
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\095.jpg:('95', 0.9997667074203491)
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\096.jpg:('96', 0.9991846084594727)
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\097.jpg:('97', 0.9987956285476685)
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\098.jpg:('98', 0.9908415675163269)

The digit recognition accuracy reaches about 99%.
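
That figure can be checked mechanically, because each test image is named after its ground-truth number (e.g. 091.jpg should be read as "91"). A sketch that parses saved predictor output, assuming the log was redirected to a file named rec_results.txt (a hypothetical name):

import re

# Match lines like: Predicts of ./test_imgs\091.jpg:('91', 0.9993...)
pattern = re.compile(r"test_imgs[\\/](\d+)\.jpg:\('(\d+)'")
total = correct = 0
with open('rec_results.txt') as f:
    for line in f:
        m = pattern.search(line)
        if m:
            total += 1
            if int(m.group(1)) == int(m.group(2)):
                correct += 1
print('accuracy: %.2f%% (%d/%d)' % (100.0 * correct / total, correct, total))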

6. System testing (detection + recognition)

1. Download the detection and direction models

Download the PP-OCRv4 detection inference model (ch_PP-OCRv4_det_infer) and the text direction classification inference model (ch_ppocr_mobile_v2.0_cls_infer) into the inference folder.

Unzip the tar files with the following commands:

tar -xf ch_PP-OCRv4_det_infer.tar && del ch_PP-OCRv4_det_infer.tar

tar -xf ch_ppocr_mobile_v2.0_cls_infer.tar && del ch_ppocr_mobile_v2.0_cls_infer.tar


2. System detection

Create a new template/test folder next to the PaddleOCR directory and put images containing handwritten digits into it:

├── PaddleOCR
└── template/test

python ./tools/infer/predict_system.py --image_dir="../template/test/" --rec_model_dir="./inference/multi_mnist/" --det_model_dir="./inference/ch_PP-OCRv4_det_infer" --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer" --rec_char_dict_path="./label_list.txt"

Analysis of the test results:

On the 8 handwritten test images, the PP-OCRv4 detection model is a clear improvement over PP-OCRv2: only one number in the 8 images went undetected. The recognition results, however, are not ideal; the per-image recognition accuracies are 40%, 60%, 10% (with one number missed by detection), 22%, 87.5%, 64%, 75%, and 95%.

Accuracy is closely tied to how standardized the handwriting is, and the recognition accuracy still needs further improvement.

Summary

This article introduced using PaddleOCR to recognize handwritten multi-digit numbers. To fine-tune the detection model as well, download the detection training model and train it in the same way.

References:

1. PaddleOCR: Handwritten multi-digit recognition based on the MNIST dataset
2. PaddleOCR: training on your own dataset (pitfalls encountered on Windows 10)