Directory of series articles
Chapter 1: Handwritten multi-digit recognition on MNIST in Python, based on PaddleOCR 2.7.0
Article directory
- Directory of series articles
- Foreword
- 1. Install PaddleOCR
- 2. Configuration environment
- 3. Create a handwritten digit data set
  - 1. Splicing data sets
    - Splicing the data set from 0-99
  - 2. Data display
- 4. Training model
  - 1. Download the pre-trained model
  - 2. Modify parameters
  - 3. Model training
  - 4. Model export
- 5. Model testing
  - 1. Sampling test pictures
  - 2. Model testing
- 6. System testing (detection + identification)
  - 1. Download the detection and direction model
  - 2. System detection
- Summary
- References
Foreword
This article follows the PaddleOCR tutorial "Handwritten multi-digit recognition based on the MNIST dataset" (reference 1).
That tutorial uses PaddleOCR v2.1 with the PP-OCRv2.0 model to train a handwritten digit recognizer. After reproducing it, the recognition accuracy reaches 99%.
PaddleOCR 2.7.0 has since been released, so this article uses the PP-OCRv4 model for handwritten digit recognition.
1. Install PaddleOCR
Website: https://github.com/PaddlePaddle/PaddleOCR
Switch to the release/2.7 branch, click Code, then Download ZIP, and unzip the archive locally.
2. Configuration environment
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn
3. Create a handwritten digit data set
The original MNIST handwriting dataset contains only the single digits 0-9. Splicing several digit images side by side produces multi-digit samples, which are then used to train a multi-digit recognition model.
1. Splicing data sets
In the project root directory, create a new folder dataset, and create two new folders train and test in the dataset.
~/PaddleOCR/dataset/train
~/PaddleOCR/dataset/test
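The two folders can also be created from Python. The following is a minimal sketch; the helper name make_dataset_dirs and its project_root parameter are illustrative and not part of PaddleOCR.

```python
from pathlib import Path


def make_dataset_dirs(project_root="."):
    """Create dataset/train and dataset/test under project_root and return both paths."""
    root = Path(project_root) / "dataset"
    subdirs = [root / "train", root / "test"]
    for d in subdirs:
        # parents=True also creates "dataset"; exist_ok=True makes reruns harmless
        d.mkdir(parents=True, exist_ok=True)
    return subdirs


if __name__ == "__main__":
    # Run from the PaddleOCR project root
    for d in make_dataset_dirs("."):
        print(d)
```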
Splicing the 0-99 data set:
The code is as follows:
import cv2
import numpy as np
from tqdm import tqdm
from paddle.vision.datasets import MNIST

# Load the dataset
mnist_train = MNIST(mode='train', backend='cv2')
mnist_test = MNIST(mode='test', backend='cv2')

# Dataset preprocessing
datas_train = {}
for i in range(len(mnist_train)):
    sample = mnist_train[i]
    x, y = sample[0], sample[1]  # x: 28*28 pixel matrix of the digit; y: its true label
    _sum = np.sum(x, axis=0)  # Sum the pixels of each column
    _where = np.where(_sum > 0)  # Indices of the columns that contain ink
    # Crop x to the columns that contain ink and invert it: dark digit on a white background
    x = 255 - x[:, _where[0][0]: _where[0][-1] + 1]
    # Group the cropped images by label, e.g. every image of the digit 5 is stored in datas_train['5']
    if str(y[0]) in datas_train:
        datas_train[str(y[0])].append(x)
    else:
        datas_train[str(y[0])] = [x]

# The test set is processed the same way as the training set
datas_test = {}
for i in range(len(mnist_test)):
    sample = mnist_test[i]
    x, y = sample[0], sample[1]
    _sum = np.sum(x, axis=0)
    _where = np.where(_sum > 0)
    x = 255 - x[:, _where[0][0]: _where[0][-1] + 1]
    if str(y[0]) in datas_test:
        datas_test[str(y[0])].append(x)
    else:
        datas_test[str(y[0])] = [x]

# Picture splicing sampling
datas_train_list = []
for num in tqdm(range(0, 100)):  # Numbers 0-99; change the range according to project needs
    for _ in range(1000):  # For each number, generate 1000 training samples
        imgs = [255 - np.zeros((28, np.random.randint(10)))]  # White 28*n margin, n in [0, 9]
        for word in str(num):  # Traverse the number digit by digit
            index = np.random.randint(0, len(datas_train[word]))  # Pick a random image of this digit
            imgs.append(datas_train[word][index])  # Add it to imgs
            imgs.append(255 - np.zeros((28, np.random.randint(10))))  # Then add a white gap
        img = np.concatenate(imgs, 1)  # Concatenate along the column dimension
        cv2.imwrite('dataset/train/%03d_%03d.jpg' % (num, _), img)  # Write the image to the local dataset
        datas_train_list.append('train/%03d_%03d.jpg\t%d\n' % (num, _, num))

datas_test_list = []
for num in tqdm(range(0, 100)):
    for _ in range(50):  # For each number, generate 50 test samples
        imgs = [255 - np.zeros((28, np.random.randint(10)))]
        for word in str(num):
            index = np.random.randint(0, len(datas_test[word]))
            imgs.append(datas_test[word][index])
            imgs.append(255 - np.zeros((28, np.random.randint(10))))
        img = np.concatenate(imgs, 1)
        cv2.imwrite('dataset/test/%03d_%03d.jpg' % (num, _), img)
        datas_test_list.append('test/%03d_%03d.jpg\t%d\n' % (num, _, num))

# Data list generation
with open('dataset/train.txt', 'w') as f:
    for line in datas_train_list:
        f.write(line)

with open('dataset/test.txt', 'w') as f:
    for line in datas_test_list:
        f.write(line)
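The cropping and splicing steps of the script can be checked on a tiny synthetic array without downloading MNIST. This is only a sketch under assumptions: the helper names crop_and_invert and splice are made up for illustration, and it uses NumPy's Generator API rather than the np.random.randint call in the script.

```python
import numpy as np


def crop_and_invert(digit):
    """Keep only the columns that contain ink, then invert to dark-on-light."""
    col_sum = digit.sum(axis=0)          # one total per column
    cols = np.where(col_sum > 0)[0]      # indices of the columns with ink
    return 255 - digit[:, cols[0]:cols[-1] + 1]


def splice(digits, rng, max_pad=10):
    """Join cropped digits side by side, separated by white gaps 0..max_pad-1 pixels wide."""
    height = digits[0].shape[0]
    parts = [np.full((height, rng.integers(max_pad)), 255.0)]
    for d in digits:
        parts.append(d)
        parts.append(np.full((height, rng.integers(max_pad)), 255.0))
    return np.concatenate(parts, axis=1)


# Toy 4x6 "digit" with ink only in columns 2 and 3
toy = np.zeros((4, 6))
toy[:, 2:4] = 200

cropped = crop_and_invert(toy)           # shape (4, 2), every pixel 255 - 200
rng = np.random.default_rng(0)
image = splice([cropped, cropped], rng)  # two digits plus three random gaps
print(cropped.shape, image.shape)
```

Cropping before splicing removes the empty margins of each 28*28 image, so the spacing between digits is controlled only by the random gaps.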
2. Data display
picture:
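Besides looking at the pictures, the generated label lists can be inspected directly. The sketch below assumes the format written by the splicing script (one image path and one label per line, separated by a tab); the function names are illustrative.

```python
from collections import Counter


def read_label_file(path):
    """Parse a label list with one 'image_path<TAB>label' entry per line."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            image_path, label = line.split("\t", 1)
            samples.append((image_path, label))
    return samples


def count_labels(samples):
    """Count how many samples exist for each label."""
    return Counter(label for _, label in samples)


if __name__ == "__main__":
    samples = read_label_file("dataset/train.txt")
    print(samples[:3])
    print(count_labels(samples).most_common(3))
```

With the script above, each of the 100 labels should appear 1000 times in train.txt and 50 times in test.txt.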
4. Training model
1. Download the pre-trained model
~/PaddleOCR
Create a new folder pretrain_models
Download the training model:
https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_server_train.tar
Note: the training model is obtained by fine-tuning the pre-trained model on real data and vertically synthesized text data, and it performs better in real application scenarios. The pre-trained model is trained directly on the full set of real and synthetic data and is better suited to fine-tuning on your own data set.
Unzip the tar file in the terminal (del is the Windows delete command; on Linux or macOS use rm instead):
cd pretrain_models
tar -xf ch_PP-OCRv4_rec_server_train.tar && del ch_PP-OCRv4_rec_server_train.tar
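Because del and rm differ between operating systems, the same step can be done with Python's standard library. This is a minimal sketch; extract_and_remove is an illustrative helper name, and the archive is assumed to have been downloaded into pretrain_models already.

```python
import os
import tarfile


def extract_and_remove(archive_path, dest_dir):
    """Extract a tar archive into dest_dir, then delete the archive."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest_dir)
    os.remove(archive_path)


if __name__ == "__main__":
    extract_and_remove(
        "pretrain_models/ch_PP-OCRv4_rec_server_train.tar",
        "pretrain_models",
    )
```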
2. Modify parameters
Copy PaddleOCR/configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_hgnet.yml
to the PaddleOCR directory and rename it multi_mnist.yml.
The parameters are modified as follows:
Global:
  debug: false
  use_gpu: True
  epoch_num: 10
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/multi_mnist
  save_epoch_step: 1
  eval_batch_step: [0, 200]
  cal_metric_during_train: true
  pretrained_model: ./pretrain_models/ch_PP-OCRv4_rec_server_train/best_accuracy
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ./label_list.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR_HGNet
  Transform:
  Backbone:
    name: PPHGNet_small
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 120
            depth: 2
            hidden_dims: 120
            kernel_size: [1, 3]
            use_guide: True
          Head:
            fc_decay: 0.00001
      - NRTRHead:
          nrtr_dim: 384
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - NRTRLoss:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: ./dataset
    ext_op_transform_idx: 1
    label_file_list:
    - ./dataset/train.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
        max_text_length: *max_text_length
    - RecAug:
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales: [[320, 32], [320, 48], [320, 64]]
    first_bs: &bs 16
    fix_bs: false
    divided_factor: [8, 16] # w, h
    is_training: True
  loader:
    shuffle: true
    batch_size_per_card: *bs
    drop_last: true
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./dataset/
    label_file_list:
    - ./dataset/test.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 16
    num_workers: 4
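Before starting training, it may help to confirm that the files this configuration points to actually exist. The sketch below lists the relative paths taken from the config above; the .pdparams suffix on the pre-trained weights is an assumption about how the checkpoint is stored on disk, and missing_paths is an illustrative helper name.

```python
from pathlib import Path

# Relative paths referenced by multi_mnist.yml.
# Assumption: the pretrained_model prefix resolves to a ".pdparams" file.
REQUIRED = [
    "dataset/train.txt",
    "dataset/test.txt",
    "label_list.txt",
    "pretrain_models/ch_PP-OCRv4_rec_server_train/best_accuracy.pdparams",
]


def missing_paths(project_root=".", required=REQUIRED):
    """Return the required paths that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in required if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing files:")
        for p in missing:
            print("  " + p)
    else:
        print("All files referenced by the config were found.")
```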
3. Model training
python tools/train.py -c ./multi_mnist.yml
1. If a "does not exist" error message appears, correct the path it names according to the prompt.
2. If an out-of-memory error occurs, reduce first_bs: &bs 16 under Train and batch_size_per_card: 16 under Eval. Both can be adjusted to suit your own GPU.
3. Possibly because the batch size is set so small, each epoch takes about 8820 global steps. By the second epoch, the accuracy stays at about 99%.
ppocr INFO: epoch: [1/10], global_step: 8820, lr: 0.000200, acc: 0.968749, norm_edit_dis: 0.991667, CTCLoss: 0.105981, NRTRLoss: 0.609214, loss : 0.720151, avg_reader_cost: 0.00000 s, avg_batch_cost: 0.35449 s, avg_samples: 10.6, ips: 29.90187 samples/s, eta: 7:48:30
[2023/10/18 13:07:56] ppocr INFO: epoch: [2/10], global_step: 8850, lr: 0.000200, acc: 0.999999, norm_edit_dis: 1.000000, CTCLoss: 0.078658, NRTRLoss: 0.631617, loss : 0.715509, avg_reader_cost: 0.00040 s, avg_batch_cost: 0.35049 s, avg_samples: 11.4, ips: 32.52572 samples/s, eta: 7:48:30
[2023/10/18 13:07:59] ppocr INFO: epoch: [2/10], global_step: 8860, lr: 0.000201, acc: 0.999999, norm_edit_dis: 1.000000, CTCLoss: 0.076151, NRTRLoss: 0.631300, loss : 0.715509, avg_reader_cost: 0.00000 s, avg_batch_cost: 0.34813 s, avg_samples: 10.0, ips: 28.72520 samples/s, eta: 7:48:26
[2023/10/18 13:08:03] ppocr INFO: epoch: [2/10], global_step: 8870, lr: 0.000201, acc: 0.937499, norm_edit_dis: 0.989583, CTCLoss: 0.118480, NRTRLoss: 0.620982, loss : 0.725218, avg_reader_cost: 0.00000 s, avg_batch_cost: 0.36089 s, avg_samples: 11.4, ips: 31.58819 samples/s, eta: 7:48:23
[2023/10/18 13:08:06] ppocr INFO: epoch: [2/10], global_step: 8880, lr: 0.000201, acc: 0.937499, norm_edit_dis: 0.989583, CTCLoss: 0.132596, NRTRLoss: 0.633977, loss : 0.769039, avg_reader_cost: 0.00000 s, avg_batch_cost: 0.35529 s, avg_samples: 12.4, ips: 34.90077 samples/s, eta: 7:48:20
[2023/10/18 13:08:10] ppocr INFO: epoch: [2/10], global_step: 8890, lr: 0.000201, acc: 0.968749, norm_edit_dis: 0.994792, CTCLoss: 0.142756, NRTRLoss: 0.630706, loss : 0.783180, avg_reader_cost: 0.00000 s, avg_batch_cost: 0.36093 s, avg_samples: 11.4, ips: 31.58535 samples/s, eta: 7:48:17
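The log can be read programmatically to follow accuracy and to sanity-check the step count. The sketch below is only an illustration: the regular expression is written against the two log lines quoted here, and parse_train_line and implied_batch are made-up helper names. It also divides the 100 * 1000 training images produced by the splicing script by the roughly 8820 steps per epoch, which gives the average number of samples per step.

```python
import re

LINE_RE = re.compile(
    r"epoch: \[(?P<epoch>\d+)/(?P<total>\d+)\], global_step: (?P<step>\d+), "
    r"lr: (?P<lr>[\d.]+), acc: (?P<acc>[\d.]+),.*?avg_samples: (?P<avg_samples>[\d.]+)"
)

SAMPLE_LOG = (
    "ppocr INFO: epoch: [1/10], global_step: 8820, lr: 0.000200, acc: 0.968749, "
    "norm_edit_dis: 0.991667, CTCLoss: 0.105981, NRTRLoss: 0.609214, loss : 0.720151, "
    "avg_reader_cost: 0.00000 s, avg_batch_cost: 0.35449 s, avg_samples: 10.6, "
    "ips: 29.90187 samples/s, eta: 7:48:30\n"
    "[2023/10/18 13:07:56] ppocr INFO: epoch: [2/10], global_step: 8850, lr: 0.000200, "
    "acc: 0.999999, norm_edit_dis: 1.000000, CTCLoss: 0.078658, NRTRLoss: 0.631617, "
    "loss : 0.715509, avg_reader_cost: 0.00040 s, avg_batch_cost: 0.35049 s, "
    "avg_samples: 11.4, ips: 32.52572 samples/s, eta: 7:48:30"
)


def parse_train_line(line):
    """Extract epoch, step, accuracy and average batch size from one log line."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "epoch": int(m["epoch"]),
        "step": int(m["step"]),
        "acc": float(m["acc"]),
        "avg_samples": float(m["avg_samples"]),
    }


def implied_batch(num_samples, steps_per_epoch):
    """Average samples per step if one epoch covers num_samples in steps_per_epoch steps."""
    return num_samples / steps_per_epoch


records = [r for r in map(parse_train_line, SAMPLE_LOG.splitlines()) if r]
for r in records:
    print(r)

# 100 numbers * 1000 samples each, about 8820 steps per epoch
print(round(implied_batch(100 * 1000, 8820), 2))
```

The result, about 11.3 samples per step, is in line with the avg_samples values of 10.0 to 12.4 reported in the log, so the large step count is consistent with the small batch size.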
4. Model export
~/PaddleOCR
python3 tools/export_model.py -c ./multi_mnist.yml -o Global.pretrained_model=./output/multi_mnist/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/multi_mnist
5. Model testing
1. Sampling test pictures
~/PaddleOCR
Create a new folder test_imgs
Run creat_test_imgs.py, whose code is as follows:
import cv2
import numpy as np
from tqdm import tqdm
from paddle.vision.datasets import MNIST

# Load the dataset
mnist_test = MNIST(mode='test', backend='cv2')

# Dataset preprocessing: crop each digit to its ink columns, invert it and group by label
datas_test = {}
for i in range(len(mnist_test)):
    sample = mnist_test[i]
    x, y = sample[0], sample[1]
    _sum = np.sum(x, axis=0)
    _where = np.where(_sum > 0)
    x = 255 - x[:, _where[0][0]: _where[0][-1] + 1]
    if str(y[0]) in datas_test:
        datas_test[str(y[0])].append(x)
    else:
        datas_test[str(y[0])] = [x]

# Picture splicing sampling: one test picture for each number 0-99
for num in tqdm(range(0, 100)):
    imgs = [255 - np.zeros((28, np.random.randint(10)))]
    for word in str(num):
        index = np.random.randint(0, len(datas_test[word]))
        imgs.append(datas_test[word][index])
        imgs.append(255 - np.zeros((28, np.random.randint(10))))
    img = np.concatenate(imgs, 1)
    cv2.imwrite('./test_imgs/%03d.jpg' % num, img)
The content of the label_list.txt file is as follows, one character per line:
0
1
2
3
4
5
6
7
8
9
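The dictionary file can also be generated rather than typed by hand. This is a small sketch that assumes the usual layout of a character dictionary, one character per line; write_digit_dict is an illustrative helper name.

```python
def write_digit_dict(path="label_list.txt"):
    """Write the digits 0-9 to path, one character per line."""
    with open(path, "w", encoding="utf-8") as f:
        for ch in "0123456789":
            f.write(ch + "\n")


if __name__ == "__main__":
    # Run from the PaddleOCR project root so the file matches character_dict_path
    write_digit_dict("label_list.txt")
```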
2. Model testing
cd ~/PaddleOCR
python tools/infer/predict_rec.py --image_dir="./test_imgs" --rec_model_dir="./inference/multi_mnist/" --rec_char_dict_path="./label_list.txt"
The prediction results are shown below:
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\091.jpg:('91', 0.9993628263473511)
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\092.jpg:('92', 0.9998289346694946)
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\093.jpg:('93', 0.9994925260543823)
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\094.jpg:('94', 0.9997493028640747)
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\095.jpg:('95', 0.9997667074203491)
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\096.jpg:('96', 0.9991846084594727)
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\097.jpg:('97', 0.9987956285476685)
[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\098.jpg:('98', 0.9908415675163269)
The accuracy of digit recognition reaches about 99%.
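Since each test picture is named after the number it contains (091.jpg holds 91), accuracy can be computed from the prediction log instead of being read off by eye. The sketch below assumes log lines in the form Predicts of <path>:('<text>', <score>); the regular expression and the accuracy_from_log helper are illustrative.

```python
import re

PRED_RE = re.compile(
    r"[\\/](?P<name>\d+)\.jpg:\('(?P<text>[^']*)', (?P<score>[\d.]+)\)"
)

SAMPLE_LINES = [
    r"[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\091.jpg:('91', 0.9993628263473511)",
    r"[2023/10/18 14:01:17] ppocr INFO: Predicts of ./test_imgs\098.jpg:('98', 0.9908415675163269)",
]


def accuracy_from_log(lines):
    """Fraction of predictions whose text equals the number encoded in the file name."""
    total = 0
    correct = 0
    for line in lines:
        m = PRED_RE.search(line)
        if m is None:
            continue  # not a prediction line
        total += 1
        expected = str(int(m["name"]))  # "091" -> "91"
        if m["text"] == expected:
            correct += 1
    return correct / total if total else 0.0


print(accuracy_from_log(SAMPLE_LINES))
```

Running it over the full log of 100 pictures would give the exact recognition accuracy on this test set.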
6. System testing (detection + identification)
1. Download the detection and direction model
Download the PP-OCRv4_det inference model and the PP-OCRv2_cls inference model into the inference folder.
Unzip the tar files using the following commands:
tar -xf ch_PP-OCRv4_det_infer.tar && del ch_PP-OCRv4_det_infer.tar
tar -xf ch_ppocr_mobile_v2.0_cls_infer.tar && del ch_ppocr_mobile_v2.0_cls_infer.tar
2. System detection
Create a new template/test folder next to the PaddleOCR folder and put pictures containing handwritten numbers in it:
PaddleOCR
template/test
python ./tools/infer/predict_system.py --image_dir="../template/test/" --rec_model_dir="./inference/multi_mnist/" --det_model_dir="./inference/ch_PP-OCRv4_det_infer" --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer" --rec_char_dict_path="./label_list.txt"
Test result analysis:
The results on 8 handwritten pictures show that the detection performance of the PP-OCRv4 model is significantly better than that of PP-OCRv2: only one number across the 8 pictures was not detected. The recognition performance, however, is not ideal. The recognition accuracy on the eight pictures is 40%, 60%, 10% (one number was not detected), 22%, 87.5%, 64%, 75% and 95%.
Recognition accuracy is closely related to how neatly the digits are written, and it needs to be further improved.
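The eight per-picture figures can be summarized with the standard library. This sketch uses only the percentages quoted above; how each per-picture accuracy was measured is not specified in the article, so the summary is no more precise than those inputs.

```python
from statistics import mean, median

# Per-picture recognition accuracy in percent, as reported above
per_picture = [40, 60, 10, 22, 87.5, 64, 75, 95]

average = mean(per_picture)
middle = median(per_picture)

print(f"mean: {average:.2f}%")    # mean: 56.69%
print(f"median: {middle:.2f}%")   # median: 62.00%
```

An average near 57% on real handwriting, against about 99% on spliced MNIST digits, illustrates the gap between the synthetic training data and real handwritten pictures.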
Summary
This article shows how to use PaddleOCR to recognize handwritten digits. If you need to fine-tune the detection model as well, you can download the detection model and train it.
References:
1. PaddleOCR: Handwritten multi-digit recognition based on the MNIST dataset
2. Training PaddleOCR on your own data set (pitfalls encountered on Windows 10)