Contents
- 1. My ubuntu version
- 2. First pull the paddleocr source code
- 3. Download the model
- 4. Preparation before training
  - 1. Create a folder in the source code folder to put your own things
  - 2. Prepare data
    - 2.1 Data annotation
    - 2.2 Data division
  - 3. Rewrite the yml configuration file
  - 4. Install anaconda
- 5. Start training
- 6. Error reporting
  - (1) libGL.so.1
  - (2) Polygon
  - (3) lanms
  - (4) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 2: invalid start byte
  - (5) Out of memory error on GPU 0. Cannot allocate xxxxMB memory on GPU 0, xxxxGB memory has been allocated and available memory is only 0.000000B.
1. My ubuntu version
![Ubuntu version screenshot](https://img-blog.csdnimg.cn/297945917309494ab03b50764e6fb775.png)
2. First pull the paddleocr source code
Download address: https://gitee.com/paddlepaddle/PaddleOCR
3. Download the model
I want to train a Chinese recognition model. From what I saw, the PP-OCRv3 Chinese pre-trained model generalizes best, so I downloaded it:
https://gitee.com/link?target=https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
Other model addresses: https://gitee.com/paddlepaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/models_list.md
4. Preparation before training
1. Create a folder in the source code folder to put your own things
Inside it I use four sub-folders:
- config holds the yml configuration files
- pretrained_model holds the pre-trained model downloaded in the previous step
- split_rec_label holds the data set
- output stores the trained model
Creating these folders is not mandatory, but it makes your own files easier to manage. The original yml files are under the path PaddleOCR-release-2.6/configs/rec/PP-OCRv3.
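The layout described above could be created with a few lines of Python. This is only a sketch: the root folder name `wjp` is taken from the paths used later in this post, and `make_workspace` is a hypothetical helper, not part of PaddleOCR.

```python
from pathlib import Path

# Sub-folders described above; the names are this post's convention,
# not a PaddleOCR requirement.
SUBFOLDERS = ("config", "pretrained_model", "split_rec_label", "output")


def make_workspace(root):
    """Create the working folders under `root` and return their paths."""
    root = Path(root)
    created = []
    for name in SUBFOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(folder)
    return created
```

Calling `make_workspace("wjp")` from the PaddleOCR source root would give `wjp/config`, `wjp/pretrained_model`, `wjp/split_rec_label` and `wjp/output`.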
2. Prepare data
2.1 Data annotation
Reference blog: https://blog.csdn.net/qq_49627063/article/details/119134847
2.2 Data division
Before the split, all images are in one folder and all label information is in a single txt file, so a script is needed to split them into training, validation and test sets in a ratio of 8:1:1.

    import argparse
    import os
    import random
    import re
    import shutil


    def split_label(all_label, train_label, val_label, test_label):
        """Shuffle the label lines and write them out in an 8:1:1 ratio."""
        with open(all_label, "r") as f:
            # Make sure every line ends with a newline so shuffled lines never merge.
            raw_list = [line if line.endswith("\n") else line + "\n" for line in f]
        random.shuffle(raw_list)

        num_train = int(len(raw_list) * 0.8)
        num_val = int(len(raw_list) * 0.1)
        # The test set takes whatever is left, so rounding never drops a line.
        parts = (
            (train_label, raw_list[:num_train]),
            (val_label, raw_list[num_train:num_train + num_val]),
            (test_label, raw_list[num_train + num_val:]),
        )
        for path, lines in parts:
            with open(path, "w") as f_out:
                f_out.writelines(lines)


    def split_img(all_imgs, train_label, train_imgs, val_label, val_imgs, test_label, test_imgs):
        """Move every image listed in a label file into the matching image folder."""
        pairs = ((train_label, train_imgs), (val_label, val_imgs), (test_label, test_imgs))
        for label_file, dst_dir in pairs:
            with open(label_file, "r") as f:
                for line in f:
                    # Split on "/" or tab; field 1 is taken as the image file name.
                    img_name = re.split("[/\t]", line)[1]
                    shutil.move(os.path.join(all_imgs, img_name), dst_dir)


    def get_args():
        parser = argparse.ArgumentParser()
        parser.add_argument("--all_label", default="../paddleocr/PaddleOCR/train_data/cls/cls_gt_train.txt")
        parser.add_argument("--all_imgs_dir", default="../paddleocr/PaddleOCR/train_data/cls/images/")
        parser.add_argument("--train_label", default="../paddleocr/PaddleOCR/train_data/cls/train.txt")
        parser.add_argument("--train_imgs_dir", default="../paddleocr/PaddleOCR/train_data/cls/train/")
        parser.add_argument("--val_label", default="../paddleocr/PaddleOCR/train_data/cls/val.txt")
        parser.add_argument("--val_imgs_dir", default="../paddleocr/PaddleOCR/train_data/cls/val/")
        parser.add_argument("--test_label", default="../paddleocr/PaddleOCR/train_data/cls/test.txt")
        parser.add_argument("--test_imgs_dir", default="../paddleocr/PaddleOCR/train_data/cls/test/")
        return parser.parse_args()


    def main(args):
        # Create the three target image folders if they do not exist yet.
        for img_dir in (args.train_imgs_dir, args.val_imgs_dir, args.test_imgs_dir):
            os.makedirs(img_dir, exist_ok=True)
        split_label(args.all_label, args.train_label, args.val_label, args.test_label)
        split_img(args.all_imgs_dir, args.train_label, args.train_imgs_dir,
                  args.val_label, args.val_imgs_dir, args.test_label, args.test_imgs_dir)


    if __name__ == "__main__":
        main(get_args())

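To see which part of a label line the script treats as the image name, here is a small sketch of the `re.split("[/\t]", ...)` call on a made-up line. The sample line (one folder level, a tab, then the text) is an assumption about the label format; if your paths contain more or fewer slashes, field 1 will not be the file name and the index has to be adjusted.

```python
import re


def image_name_from_label(line):
    """Return field 1 after splitting on '/' or tab, as the split script does."""
    return re.split("[/\t]", line)[1]


# Hypothetical label line: "<folder>/<image file>\t<text>"
sample = "images/word_001.jpg\t你好\n"
print(re.split("[/\t]", sample))      # all fields of the line
print(image_name_from_label(sample))  # the field used as the image name
```

A line without a folder prefix, such as `word_001.jpg<TAB>text`, would return the text instead of the file name, so it is worth checking one line of your own label file this way before moving any images.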
3. Rewrite the yml configuration file
Source address: https://gitee.com/paddlepaddle/PaddleOCR/blob/release/2.6/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml

    Global:
      debug: false
      use_gpu: true
      epoch_num: 800
      log_smooth_window: 20
      print_batch_step: 10
      save_model_dir: wjp/output/rec_ppocr_v3_distillation
      save_epoch_step: 3
      eval_batch_step: [0, 2000]
      cal_metric_during_train: true
      pretrained_model:
      checkpoints:
      save_inference_dir:
      use_visualdl: false
      infer_img: doc/imgs_words/ch/word_1.jpg
      character_dict_path: ppocr/utils/ppocr_keys_v1.txt
      max_text_length: &max_text_length 25
      infer_mode: false
      use_space_char: true
      distributed: true
      save_res_path: wjp/output/rec/predicts_ppocrv3_distillation.txt

    Optimizer:
      name: Adam
      beta1: 0.9
      beta2: 0.999
      lr:
        name: Piecewise
        decay_epochs: [700]
        values: [0.0005, 0.00005]
        warmup_epoch: 5
      regularizer:
        name: L2
        factor: 3.0e-05

    Architecture:
      model_type: &model_type "rec"
      name: DistillationModel
      algorithm: Distillation
      Models:
        Teacher:
          pretrained:
          freeze_params: false
          return_all_feats: true
          model_type: *model_type
          algorithm: SVTR
          Transform:
          Backbone:
            name: MobileNetV1Enhance
            scale: 0.5
            last_conv_stride: [1, 2]
            last_pool_type: avg
          Head:
            name: MultiHead
            head_list:
              - CTCHead:
                  Neck:
                    name: svtr
                    dims: 64
                    depth: 2
                    hidden_dims: 120
                    use_guide: True
                  Head:
                    fc_decay: 0.00001
              - SARHead:
                  enc_dim: 512
                  max_text_length: *max_text_length
        Student:
          pretrained:
          freeze_params: false
          return_all_feats: true
          model_type: *model_type
          algorithm: SVTR
          Transform:
          Backbone:
            name: MobileNetV1Enhance
            scale: 0.5
            last_conv_stride: [1, 2]
            last_pool_type: avg
          Head:
            name: MultiHead
            head_list:
              - CTCHead:
                  Neck:
                    name: svtr
                    dims: 64
                    depth: 2
                    hidden_dims: 120
                    use_guide: True
                  Head:
                    fc_decay: 0.00001
              - SARHead:
                  enc_dim: 512
                  max_text_length: *max_text_length

    Loss:
      name: CombinedLoss
      loss_config_list:
        - DistillationDMLLoss:
            weight: 1.0
            act: "softmax"
            use_log: true
            model_name_pairs:
              - ["Student", "Teacher"]
            key: head_out
            multi_head: True
            dis_head: ctc
            name: dml_ctc
        - DistillationDMLLoss:
            weight: 0.5
            act: "softmax"
            use_log: true
            model_name_pairs:
              - ["Student", "Teacher"]
            key: head_out
            multi_head: True
            dis_head: sar
            name: dml_sar
        - DistillationDistanceLoss:
            weight: 1.0
            mode: "l2"
            model_name_pairs:
              - ["Student", "Teacher"]
            key: backbone_out
        - DistillationCTCLoss:
            weight: 1.0
            model_name_list: ["Student", "Teacher"]
            key: head_out
            multi_head: True
        - DistillationSARLoss:
            weight: 1.0
            model_name_list: ["Student", "Teacher"]
            key: head_out
            multi_head: True

    PostProcess:
      name: DistillationCTCLabelDecode
      model_name: ["Student", "Teacher"]
      key: head_out
      multi_head: True

    Metric:
      name: DistillationMetric
      base_metric_name: RecMetric
      main_indicator: acc
      key: "Student"
      ignore_space: False

    Train:
      dataset:
        name: SimpleDataSet
        data_dir: wjp/split_rec_label/train
        ext_op_transform_idx: 1
        label_file_list:
          - wjp/split_rec_label/train.txt
        transforms:
          - DecodeImage:
              img_mode: BGR
              channel_first: false
          - RecConAug:
              prob: 0.5
              ext_data_num: 2
              image_shape: [48, 320, 3]
              max_text_length: *max_text_length
          - RecAug:
          - MultiLabelEncode:
          - RecResizeImg:
              image_shape: [3, 48, 320]
          - KeepKeys:
              keep_keys:
                - image
                - label_ctc
                - label_sar
                - length
                - valid_ratio
      loader:
        shuffle: true
        batch_size_per_card: 32
        drop_last: true
        num_workers: 4

    Eval:
      dataset:
        name: SimpleDataSet
        data_dir: wjp/split_rec_label/val
        label_file_list:
          - wjp/split_rec_label/val.txt
        transforms:
          - DecodeImage:
              img_mode: BGR
              channel_first: false
          - MultiLabelEncode:
          - RecResizeImg:
              image_shape: [3, 48, 320]
          - KeepKeys:
              keep_keys:
                - image
                - label_ctc
                - label_sar
                - length
                - valid_ratio
      loader:
        shuffle: false
        drop_last: false
        batch_size_per_card: 128
        num_workers: 4

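Before launching training it can help to check that `data_dir` and `label_file_list` in the configuration agree with what is on disk. The sketch below assumes each label line has the form `image path<TAB>text`, with the image path relative to `data_dir`; `check_label_file` is a hypothetical helper written for this post, not a PaddleOCR function.

```python
import os


def check_label_file(data_dir, label_file, encoding="utf-8"):
    """Return a list of human-readable problems found in one label file."""
    problems = []
    with open(label_file, "r", encoding=encoding) as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines
            parts = line.split("\t")
            if len(parts) != 2:
                problems.append(f"line {line_no}: expected 'path<TAB>text'")
                continue
            img_path = os.path.join(data_dir, parts[0])
            if not os.path.isfile(img_path):
                problems.append(f"line {line_no}: missing image {img_path}")
    return problems
```

For the configuration above, `check_label_file("wjp/split_rec_label/train", "wjp/split_rec_label/train.txt")` should return an empty list if every labelled image can be found.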
4. Install anaconda
Reference blog: https://blog.csdn.net/wyf2017/article/details/118676765
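- Create a python virtual environment
conda create -n ppocr
Append python=<version> to this command if you want conda to install a specific Python interpreter into the new environment.
- Switch to the virtual environment
source activate ppocr
On recent conda releases the equivalent command is conda activate ppocr.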
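A quick way to confirm that the `ppocr` environment is really the active one before installing anything is to look at the interpreter path and at the `CONDA_DEFAULT_ENV` variable that conda sets on activation. A minimal sketch; `describe_environment` is a hypothetical helper name.

```python
import os
import sys


def describe_environment(environ=None):
    """Report the active conda environment name and the Python interpreter in use."""
    environ = os.environ if environ is None else environ
    return {
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),  # None when no conda env is active
        "python": sys.executable,
    }


print(describe_environment())
```

If `conda_env` is not `ppocr`, or `python` does not point inside the environment's folder, packages installed with pip will end up somewhere else.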
5. Start training
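python tools/train.py -c wjp/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model=wjp/ch_PP-OCRv3_rec_train/best_accuracy
The -c parameter gives the path of the configuration file; the -o parameter overrides configuration entries, here Global.pretrained_model, the path of the pre-trained model.
If a Python package turns out to be missing, install it from the Tsinghua mirror:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple <package name>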
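If you restart training often with different overrides, the command line can be assembled in Python instead of being retyped. This is a sketch under the assumption that it is run from the PaddleOCR source root; `build_train_command` and `run_training` are hypothetical helper names, while `-c` and `-o key=value` are the options used in the command above.

```python
import subprocess
import sys


def build_train_command(config_path, overrides):
    """Build the argument list for tools/train.py with -c and -o options."""
    cmd = [sys.executable, "tools/train.py", "-c", config_path]
    if overrides:
        cmd.append("-o")
        # Each override is passed as key=value after a single -o
        cmd.extend(f"{key}={value}" for key, value in overrides.items())
    return cmd


def run_training(config_path, overrides):
    """Run training; raises CalledProcessError if the process exits with an error."""
    subprocess.run(build_train_command(config_path, overrides), check=True)
```

For example, `run_training("wjp/ch_PP-OCRv3_rec_distillation.yml", {"Global.pretrained_model": "wjp/ch_PP-OCRv3_rec_train/best_accuracy"})` would launch the same run as the command shown above.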
6. Error reporting
(1) libGL.so.1
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
- Solution:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple opencv-python-headless
(2) Polygon
ModuleNotFoundError: No module named 'Polygon'
- Solution:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple Polygon3
(3) lanms
ModuleNotFoundError: No module named 'lanms'
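Source code download address: https://github.com/AndranikSargsyan/lanms-nova/tree/master
Compile it with reference to my tutorial: http://t.csdnimg.cn/BqOW6
- Replace the __init__.py file with:

    import numpy as np

    # Assumption: the compiled extension is importable as `.adaptor`, as in the
    # upstream lanms package; adjust the import if your build names it differently.
    from .adaptor import merge_quadrangle_n9 as nms_impl


    def merge_quadrangle_n9(polys, thres=0.3, precision=10000):
        """Wrapper around the compiled locality-aware NMS implementation."""
        if len(polys) == 0:
            return np.array([], dtype='float32')
        p = polys.copy()
        # Scale the coordinates up before calling the compiled code, then back down.
        p[:, :8] *= precision
        # Call the compiled function, not this wrapper (which would recurse forever).
        ret = np.array(nms_impl(p, thres), dtype='float32')
        ret[:, :8] /= precision
        return ret

- Find where packages are installed in the current anaconda environment:
pip show numpy
The Location field in the output is the package installation directory of this environment.
- Move the entire lanms folder of the compiled library into that directory so that it can be imported.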
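The installation directory can also be read from Python itself, which avoids parsing the `pip show` output. A minimal sketch using the standard `sysconfig` module; it should normally report the same directory as the Location field of `pip show numpy`, and `site_packages_dir` is a hypothetical helper name.

```python
import sysconfig


def site_packages_dir():
    """Return the directory where compiled packages are installed for this interpreter."""
    return sysconfig.get_paths()["platlib"]


print(site_packages_dir())
```

Run it with the Python of the `ppocr` environment, otherwise it reports the directory of a different interpreter.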
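(4) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 2: invalid start byte
This error appears when a file that is not UTF-8 encoded is opened as UTF-8, for example:
f = open('txt01.txt', encoding='utf-8')
Change encoding='utf-8' to 'GB2312', 'gbk' or 'ISO-8859-1' and try them in turn. Note that ISO-8859-1 never raises this error, so check that the text it returns is actually readable.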
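Trying the encodings one after another can be automated. The sketch below is one possible way to do it, not the only one: it tries UTF-8 first, then the encodings listed above, and keeps ISO-8859-1 last because that codec accepts any byte sequence. `read_text_with_fallback` is a hypothetical helper name.

```python
def read_text_with_fallback(path, encodings=("utf-8", "gb2312", "gbk", "iso-8859-1")):
    """Try each encoding in order and return (text, encoding_that_worked)."""
    last_error = None
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next encoding
    raise last_error or ValueError("no encodings given")
```

For example, `text, enc = read_text_with_fallback("txt01.txt")` returns the decoded content together with the encoding that worked, which can then be written into the `open()` call that failed.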
(5) Out of memory error on GPU 0. Cannot allocate xxxxMB memory on GPU 0, xxxxGB memory has been allocated and available memory is only 0.000000B.
Halve the batch_size_per_card parameter in the training configuration yml file (for example 32, then 16, then 8) and restart training, repeating until this error is no longer reported.
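The halving procedure can be written down as a short list of values to try. A minimal sketch, where `halving_schedule` is a hypothetical helper; each value can either be written into the yml file or, assuming the config layout shown earlier, passed on the command line as `-o Train.loader.batch_size_per_card=<value>`.

```python
def halving_schedule(start):
    """Return the batch sizes to try after `start`, halving each time down to 1."""
    sizes = []
    size = start // 2
    while size >= 1:
        sizes.append(size)
        size //= 2
    return sizes


# Starting from the training batch size used in the configuration above
print(halving_schedule(32))
```

If even a batch size of 1 runs out of memory, the cause is probably something other than the batch size, for example another process already occupying the GPU.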