CNN model training–Code practice for verification code identification

Article directory

  • Summary
  • Abstract
  • CNN model training–Verification code identification
    • 1. Code practice
    • 2. Clarify training needs
    • 3. Collect training data
    • 4. CNN network architecture
    • 5. Model training
    • 5. Training and testing
    • 6. Test results
      • 6.1. 1K verification code image data
      • 6.2. 1W verification code image data
      • 6.3. 10W verification code image data
    • 7. Reflection on results and ideas for subsequent improvements
  • Summary

Abstract

This week I implemented the convolutional neural network (CNN) reviewed last week and applied it to verification code (captcha) recognition. The implementation has four steps: collecting training data, building the network, training the model, and testing the model. The training data is generated automatically by computer, which quickly yields labeled samples. The network consists of three convolutional layers and two fully connected layers. Across several training runs, recognition accuracy rose as the amount of training data increased.

Abstract

This week, I implemented the Convolutional Neural Network that we reviewed last week and used it to recognize captchas. The implementation is divided into four steps: collecting training data, building the network structure, training the model, and testing the model. For data acquisition, I generated the images automatically by computer so that labeled data was available quickly. The network structure includes three convolutional layers and two fully connected layers. After several training sessions, we can conclude that the recognition accuracy of the model increases with the size of the training data.

CNN model training – verification code identification

1. Code practice

Last week’s task was to review what I had learned about the convolutional neural network (CNN) model. The review refreshed my understanding of the architecture and left a deeper impression. Beyond the overall architecture, I also reviewed how each layer works (convolutional layer, activation layer, pooling layer, fully connected layer, and output layer). With that knowledge refreshed, the next step is to gather material, study the relevant details, and reproduce them by hand in code, which leads to a deeper understanding.

2. Clarify training needs

As an undergraduate during the epidemic, I had to complete a health check-in every day. The process was cumbersome, so I wanted to write a script that would do the check-in automatically. Unexpectedly, the check-in required a verification code, and that step stopped the whole project. After taking machine learning courses, I learned that a convolutional neural network can be trained to recognize verification codes, so this implementation fulfills that unfinished wish.

3. Collect training data

Training a neural network places high demands on data. A good model requires a high-quality, diverse, labeled, and balanced dataset. Collecting such data used to be difficult, but many public datasets now exist, such as the ImageNet image database, which is well suited to CNNs. For this project, the verification code data can be generated automatically with the ImageCaptcha class from the captcha package, which yields labeled data in whatever quantity training needs. The generation code is as follows:

from captcha.image import ImageCaptcha
from PIL import Image
import random

NUMBER = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
ALPHABET = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

ALL_CHAR_SET = NUMBER + ALPHABET #Characters that may appear in a verification code: digits and uppercase letters
ALL_CHAR_SET_LEN = len(ALL_CHAR_SET) #Size of the character set (36)
MAX_CAPTCHA = 4 #Number of characters in each verification code

# Image size
IMAGE_HEIGHT = 60
IMAGE_WIDTH = 160

def random_captcha(): #Generate the corresponding verification code text
    captcha_text = []
    for i in range(MAX_CAPTCHA):
        c = random.choice(ALL_CHAR_SET)
        captcha_text.append(c)
    return ''.join(captcha_text)

def gen_captcha_text_and_image(): #Generate verification code image
    image = ImageCaptcha()
    captcha_text = random_captcha()
    captcha_image = Image.open(image.generate(captcha_text))
    return captcha_text, captcha_image
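
To get a feel for how much variety the generator can produce, the following minimal sketch counts the distinct labels and draws labels reproducibly from a seeded generator. It uses only the standard library and no image rendering; the names `label_space_size` and `sample_labels` are illustrative and do not appear in the original code.

```python
import random
import string

# Same character set as above: 10 digits + 26 uppercase letters.
ALL_CHAR_SET = list(string.digits + string.ascii_uppercase)
MAX_CAPTCHA = 4


def label_space_size(num_chars=len(ALL_CHAR_SET), length=MAX_CAPTCHA):
    """Number of distinct captcha texts: each position is chosen independently."""
    return num_chars ** length


def sample_labels(n, seed=0):
    """Draw n captcha texts from a private, seeded generator (reproducible)."""
    rng = random.Random(seed)
    return [''.join(rng.choice(ALL_CHAR_SET) for _ in range(MAX_CAPTCHA))
            for _ in range(n)]


if __name__ == '__main__':
    print(label_space_size())   # 36 ** 4 = 1679616
    print(sample_labels(3))
```

With 36^4 = 1,679,616 possible texts, even 100,000 generated images cover at most about 6% of the label space, so the model has to learn individual characters rather than memorize whole codes.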

4. CNN network architecture

Based on what I learned, I defined a simple convolutional neural network in PyTorch. It consists of three convolutional layers and two fully connected layers. The definition is as follows:

import torch.nn as nn
import captcha_setting #Verification code related parameter definition set

# Defines a convolutional neural network (CNN) model, including three convolutional layers (layer1, layer2, layer3) and two fully connected layers (fc, rfc)
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential( #Define the first convolutional layer
            nn.Conv2d(1, 32, kernel_size=3, padding=1), #2D convolution layer, the number of input channels is 1, the number of output channels is 32, the convolution kernel size is 3, and the padding is 1
            nn.BatchNorm2d(32), #2D batch normalization layer, used to normalize the data of each batch
            nn.Dropout(0.5), #Dropout layer, randomly sets 50% of neurons to 0 to prevent overfitting
            nn.ReLU(), #ReLU activation function layer
            nn.MaxPool2d(2)) #2D maximum pooling layer, using a 2x2 pooling window
        self.layer2 = nn.Sequential( #The definitions of the following two convolutional layers are similar
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.Dropout(0.5),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.layer3 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.Dropout(0.5),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.fc = nn.Sequential(
            nn.Linear((captcha_setting.IMAGE_WIDTH//8)*(captcha_setting.IMAGE_HEIGHT//8)*64, 1024),
            # First fully connected layer: the input size is the flattened output of layer3 (three 2x2 poolings shrink each side by a factor of 8, with 64 channels); the output size is 1024
            nn.Dropout(0.5),
            nn.ReLU())
        self.rfc = nn.Sequential(
            nn.Linear(1024, captcha_setting.MAX_CAPTCHA*captcha_setting.ALL_CHAR_SET_LEN),
            # Second fully connected layer: 1024 inputs; one output per (position, character) pair, i.e. MAX_CAPTCHA * ALL_CHAR_SET_LEN
        )

    # Forward pass: not shown in the original listing, reconstructed here from the layers defined above
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = out.view(out.size(0), -1) #Flatten to (batch_size, features)
        out = self.fc(out)
        out = self.rfc(out)
        return out
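
The layer sizes in the definition above can be checked with a little arithmetic. The sketch below assumes the 60x160 image size from the data generation step and traces the spatial size through the three convolution and pooling blocks; `conv_block_output` is a hypothetical helper, not part of the original code.

```python
IMAGE_HEIGHT, IMAGE_WIDTH = 60, 160
MAX_CAPTCHA, ALL_CHAR_SET_LEN = 4, 36


def conv_block_output(size, kernel=3, padding=1, pool=2):
    """Spatial size after one Conv2d (stride 1) followed by MaxPool2d(pool)."""
    after_conv = size + 2 * padding - kernel + 1   # a 3x3 conv with padding 1 keeps the size
    return after_conv // pool                      # pooling floors odd sizes


h, w = IMAGE_HEIGHT, IMAGE_WIDTH
for _ in range(3):                                 # three convolutional blocks
    h, w = conv_block_output(h), conv_block_output(w)

fc_inputs = h * w * 64                             # 64 channels leave layer3
rfc_outputs = MAX_CAPTCHA * ALL_CHAR_SET_LEN       # one score per (position, character)

print(h, w, fc_inputs, rfc_outputs)                # 7 20 8960 144
```

The height goes 60, 30, 15, 7 and the width 160, 80, 40, 20, which agrees with the `//8` expressions used for the first fully connected layer.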

5. Model training

Model training involves two important concepts, epoch and batch, which are common training parameters in deep learning. Together they control the granularity and organization of the training process: the number of epochs is how many times the entire training set passes through the model, and the batch size is how many samples are processed in one training step. Their roles are as follows:

  • Epoch: one complete pass over the training set. When every sample in the dataset has gone through the network once, forward and backward, one epoch has been completed.
  • Batch: the group of samples used for one parameter update. The training set is usually split into many batches. The samples in a batch are fed to the model together; forward propagation computes the loss, and back propagation updates the model parameters. Each epoch consists of many such batch steps.
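
The relationship between the two can be made concrete with a small calculation. This sketch assumes the batch size of 100 and the 30 epochs used in this article's training script, and that the last partial batch is kept (the default behavior of a PyTorch DataLoader); `steps_per_epoch` and `total_updates` are illustrative names.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches (parameter updates) needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def total_updates(num_samples, batch_size, num_epochs):
    """Parameter updates performed over the whole training run."""
    return steps_per_epoch(num_samples, batch_size) * num_epochs


if __name__ == '__main__':
    for n in (1_000, 10_000, 100_000):
        print(n, steps_per_epoch(n, 100), total_updates(n, 100, 30))
```

For the three dataset sizes tested later (1K, 1W and 10W images) this gives 10, 100 and 1,000 steps per epoch, so the largest dataset also receives 100 times as many parameter updates as the smallest over the same number of epochs.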

The training script is as follows:

import torch
import torch.nn as nn
from torch.autograd import Variable
import my_dataset # Process data input to the model
from captcha_cnn_model import CNN #Load the defined cnn model

#Training parameters
num_epochs = 30
batch_size = 100
learning_rate = 0.001

def main():
    cnn = CNN() #Initialize CNN model
    cnn.train()
    print('init net')
    criterion = nn.MultiLabelSoftMarginLoss() #Define the loss function
    optimizer = torch.optim.Adam(cnn.parameters(), lr=learning_rate) #Define the optimizer and select Adam as the optimization method

    train_dataloader = my_dataset.get_train_data_loader() # Load training data
    for epoch in range(num_epochs): #Train for the number of epochs set in the training parameters above
        for i, (images, labels) in enumerate(train_dataloader): #One step per batch; the number of steps per epoch depends on the amount of training data
            images = Variable(images)
            labels = Variable(labels.float())
            predict_labels = cnn(images)
            loss = criterion(predict_labels, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if (i + 1) % 10 == 0:
                print("epoch:", epoch, "step:", i, "loss:", loss.item())
            if (i + 1) % 100 == 0:
                torch.save(cnn.state_dict(), "./model.pkl") #Save a checkpoint to model.pkl every 100 steps
                print("save model")
        print("epoch:", epoch, "step:", i, "loss:", loss.item())
    torch.save(cnn.state_dict(), "./model.pkl") #Save the final model parameters to model.pkl
    print("save last model")

if __name__ == '__main__':
    main()
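
To show what the loss function in the training script measures, here is a pure-Python sketch of the per-sample formula documented for `nn.MultiLabelSoftMarginLoss`: the mean binary cross-entropy over all outputs, with a sigmoid applied to each score. It is an illustration for a single sample, not the PyTorch implementation; `softplus` and `multilabel_soft_margin_loss` are names chosen here.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def multilabel_soft_margin_loss(scores, targets):
    """Mean over outputs of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]."""
    total = 0.0
    for x, y in zip(scores, targets):
        # -log(sigmoid(x)) == softplus(-x) and -log(1 - sigmoid(x)) == softplus(x)
        total += y * softplus(-x) + (1 - y) * softplus(x)
    return total / len(scores)


if __name__ == '__main__':
    print(multilabel_soft_margin_loss([0.0, 0.0], [1, 0]))     # undecided scores
    print(multilabel_soft_margin_loss([10.0, -10.0], [1, 0]))  # confident and right
    print(multilabel_soft_margin_loss([-10.0, 10.0], [1, 0]))  # confident and wrong
```

Each of the 144 outputs is treated as an independent yes/no decision, which is why the labels are passed to the criterion as a float vector of zeros and ones.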

5. Training and testing

The idea of testing is to let the trained model recognize the verification code images in the test set. For each image, the predicted code is extracted and compared with its label; if they match, a counter of correct predictions is incremented. After all test data has been processed, the counter divided by the total gives the final accuracy.

import numpy as np
import torch
from torch.autograd import Variable
import captcha_setting
import my_dataset
from captcha_cnn_model import CNN
import one_hot_encoding

def main():
    cnn = CNN()
    cnn.eval()
    cnn.load_state_dict(torch.load('model.pkl'))
    print("load cnn net.")

    test_dataloader = my_dataset.get_test_data_loader() # Load the test data set
    correct = 0 # correct number
    total = 0 # Total number of verifications
    for i, (images, labels) in enumerate(test_dataloader):
        image = images
        vimage = Variable(image)
        predict_label = cnn(vimage)
        #Extract the four predicted characters, one per block of ALL_CHAR_SET_LEN outputs
        c0 = captcha_setting.ALL_CHAR_SET[np.argmax(predict_label[0, 0:captcha_setting.ALL_CHAR_SET_LEN].data.numpy())]
        c1 = captcha_setting.ALL_CHAR_SET[np.argmax(predict_label[0, captcha_setting.ALL_CHAR_SET_LEN:2 * captcha_setting.ALL_CHAR_SET_LEN].data.numpy())]
        c2 = captcha_setting.ALL_CHAR_SET[np.argmax(predict_label[0, 2 * captcha_setting.ALL_CHAR_SET_LEN:3 * captcha_setting.ALL_CHAR_SET_LEN].data.numpy())]
        c3 = captcha_setting.ALL_CHAR_SET[np.argmax(predict_label[0, 3 * captcha_setting.ALL_CHAR_SET_LEN:4 * captcha_setting.ALL_CHAR_SET_LEN].data.numpy())]
        predict_label = '%s%s%s%s' % (c0, c1, c2, c3)
        true_label = one_hot_encoding.decode(labels.numpy()[0])
        total += labels.size(0)
        if predict_label == true_label:
            correct += 1
        if total % 200 == 0: #Report progress every 200 test images
            print('Test Accuracy of the model on the %d test images: %f %%' % (total, 100 * correct / total))
    print('Test Accuracy of the model on the %d test images: %f %%' % (total, 100 * correct / total))

if __name__ == '__main__':
    main()
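
The test script relies on a `one_hot_encoding` module that is not listed in this article. The following is a minimal sketch of what such an encoding could look like, assuming the layout implied by the slicing above: four consecutive blocks of 36 values, one block per character position. The function bodies are assumptions for illustration, written with plain lists instead of NumPy arrays.

```python
import string

ALL_CHAR_SET = list(string.digits + string.ascii_uppercase)
ALL_CHAR_SET_LEN = len(ALL_CHAR_SET)
MAX_CAPTCHA = 4


def encode(text):
    """Turn a captcha text into a flat 0/1 vector of MAX_CAPTCHA * 36 values."""
    vector = [0.0] * (MAX_CAPTCHA * ALL_CHAR_SET_LEN)
    for pos, ch in enumerate(text):
        vector[pos * ALL_CHAR_SET_LEN + ALL_CHAR_SET.index(ch)] = 1.0
    return vector


def decode(vector):
    """Pick the highest-scoring character in each block (argmax per position)."""
    chars = []
    for pos in range(MAX_CAPTCHA):
        block = vector[pos * ALL_CHAR_SET_LEN:(pos + 1) * ALL_CHAR_SET_LEN]
        best = max(range(ALL_CHAR_SET_LEN), key=block.__getitem__)
        chars.append(ALL_CHAR_SET[best])
    return ''.join(chars)


if __name__ == '__main__':
    print(decode(encode('A3ZX')))
```

Because `decode` takes an argmax per block, the same function works on both exact one-hot labels and raw model scores.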

6. Test results

6.1. 1K verification code image data

The model trained on 1K (1,000) generated verification code images was tested on a test set of 103 verification code images, as shown in the figure below. The final accuracy was 0%: the model did not recognize a single test image correctly.

6.2. 1W verification code image data

The model trained on 1W (10,000) generated verification code images was tested on the same kind of test set of 103 verification code images, as shown in the figure below. The final accuracy was 3.88%, which is still low.
![Insert image description here](https://img-blog.csdnimg.cn/a93c6aec85444bbbb6118e44a8523949.png)

6.3. 10W verification code image data

The model trained on 10W (100,000) generated verification code images was tested on a test set of 103 verification code images, as shown in the figure below. The final accuracy was 74.76%, which is relatively high.
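
The accuracy reported in these tests counts a verification code as correct only when all four characters match. The sketch below computes that exact-match accuracy and, for comparison, a per-character accuracy, which is not measured in this article and is included only as a possible diagnostic. The predictions and labels are made-up examples.

```python
def exact_match_accuracy(predictions, labels):
    """Percentage of captchas whose predicted text equals the label exactly."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def per_character_accuracy(predictions, labels):
    """Percentage of individual characters predicted correctly."""
    pairs = [(pc, tc) for p, t in zip(predictions, labels) for pc, tc in zip(p, t)]
    return 100.0 * sum(pc == tc for pc, tc in pairs) / len(pairs)


if __name__ == '__main__':
    predictions = ['AB12', 'CD34', 'EF56', 'GH78']   # hypothetical model outputs
    labels = ['AB12', 'CD3X', 'EF56', '0H78']        # hypothetical ground truth
    print(exact_match_accuracy(predictions, labels))
    print(per_character_accuracy(predictions, labels))
```

On a test set of 103 images, the reported 3.88% and 74.76% are consistent with 4 and 77 fully correct predictions respectively (4/103 and 77/103).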

7. Reflection on results and ideas for subsequent improvements

Training on different amounts of data produced models with different performance. The conclusion from testing is that, with the model fixed (the entire CNN structure is as shown in the figure below), recognition performance keeps improving as the amount of training data grows by orders of magnitude. Training on the CPU was very slow, so results took a long time to obtain. I eventually switched to training on the GPU, which is much faster, and subsequent training will continue to use the GPU so that results arrive sooner.

Next, I want to examine how the depth of the network affects model performance by changing the number of layers. I also want to try the ResNet and DenseNet architectures I studied earlier to see how much they improve this application.

Summary

Over the past few days I have been implementing a CNN model that recognizes verification codes. As an undergraduate I had not learned the relevant material, so I got stuck at this point. After studying CNNs, I revived the idea. By searching for related material and looking up how the relevant functions work, I gained a better understanding of the whole implementation. I believe machine learning can be used to realize more interesting ideas in the future. I will also use the GPU for training from now on: training on the CPU takes so long that I will not forget the lesson.