Article directory
- Abstract
- CNN model training – verification code identification
- 1. Code practice
- 2. Clarify training needs
- 3. Collect training data
- 4. CNN network architecture
- 5. Model training
- 5. Training and testing
- 6. Test results
- 6.1. 1K verification code image data
- 6.2. 10K verification code image data
- 6.3. 100K verification code image data
- 7. Reflection on results and ideas for subsequent improvements
- Summary
Abstract
This week I implemented the convolutional neural network (CNN) reviewed last week and applied it to verification code (captcha) recognition. The work has four steps: collecting training data, building the network, training the model, and testing the model. The training data is generated programmatically, which yields labeled samples quickly. The network has three convolutional layers and two fully connected layers. Across several training runs, recognition accuracy rose as the amount of training data increased.
CNN model training – verification code identification
1. Code practice
Last week's task was to review what I had learned about the convolutional neural network (CNN) model. The review refreshed my understanding of the framework and of how each of its layers works (convolutional layer, activation layer, pooling layer, fully connected layer, and output layer). The next step is to gather material, study the related details, and reproduce the model in code by hand, which is the best way to understand it more deeply.
2. Clarify training needs
During my undergraduate studies, the epidemic meant I had to complete a health check-in every day. The process was tedious, so I wanted to write a script that would do the check-in automatically. In the end, the verification code step blocked the whole plan. After studying machine learning, I learned that a convolutional neural network can be trained to recognize verification codes, so this implementation fulfills that unfinished wish.
3. Collect training data
Neural network training places high demands on data. A good model needs a data set that is high-quality, diverse, labeled, and balanced. Collecting such data used to be difficult, but many public data sets now exist, such as the ImageNet image database, which is well suited to CNNs. The verification code data used here is generated automatically with the ImageCaptcha class from the captcha package, so labeled data can be produced in whatever quantity training needs. The generation code is as follows:
from captcha.image import ImageCaptcha
from PIL import Image
import random

# Characters that may appear in a verification code: digits and uppercase letters
NUMBER = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
ALPHABET = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
            'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
ALL_CHAR_SET = NUMBER + ALPHABET
ALL_CHAR_SET_LEN = len(ALL_CHAR_SET)  # number of distinct characters (36)
MAX_CAPTCHA = 4  # number of characters in each verification code

# Image size
IMAGE_HEIGHT = 60
IMAGE_WIDTH = 160


def random_captcha():
    # Generate the verification code text
    captcha_text = []
    for i in range(MAX_CAPTCHA):
        c = random.choice(ALL_CHAR_SET)
        captcha_text.append(c)
    return ''.join(captcha_text)


def gen_captcha_text_and_image():
    # Generate a verification code image and return it together with its text label
    image = ImageCaptcha()
    captcha_text = random_captcha()
    captcha_image = Image.open(image.generate(captcha_text))
    return captcha_text, captcha_image
4. CNN network architecture
Based on what I learned, I defined a simple convolutional neural network in PyTorch. It consists of three convolutional layers and two fully connected layers. The code is as follows:
import torch.nn as nn
import captcha_setting  # verification code parameters (image size, character set)


# A CNN with three convolutional blocks (layer1, layer2, layer3)
# and two fully connected layers (fc, rfc)
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),  # 1 input channel, 32 output channels, 3x3 kernel, padding 1
            nn.BatchNorm2d(32),  # normalize each batch per channel
            nn.Dropout(0.5),  # randomly zero 50% of activations to reduce overfitting
            nn.ReLU(),  # ReLU activation
            nn.MaxPool2d(2))  # 2x2 max pooling halves height and width
        # The next two convolutional blocks follow the same pattern
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.Dropout(0.5),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.layer3 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.Dropout(0.5),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.fc = nn.Sequential(
            # First fully connected layer. Its input is the flattened feature map:
            # three 2x2 poolings shrink each side by a factor of 8, and layer3
            # outputs 64 channels. It has 1024 outputs.
            nn.Linear((captcha_setting.IMAGE_WIDTH // 8) * (captcha_setting.IMAGE_HEIGHT // 8) * 64, 1024),
            nn.Dropout(0.5),
            nn.ReLU())
        self.rfc = nn.Sequential(
            # Second fully connected layer: 1024 inputs, and one output score per
            # (character position, character class) pair
            nn.Linear(1024, captcha_setting.MAX_CAPTCHA * captcha_setting.ALL_CHAR_SET_LEN),
        )

    # The forward pass is missing from the original listing. This version is an
    # assumption reconstructed from the layers defined above.
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = out.view(out.size(0), -1)  # flatten to (batch, features)
        out = self.fc(out)
        out = self.rfc(out)
        return out
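The input and output sizes of the two fully connected layers can be checked by hand. The sketch below is only an illustration and assumes the 60x160 image size and 36-character set from the data generation code; `pooled_size` is a hypothetical helper, not part of the project. It relies on two facts: a 3x3 convolution with padding 1 keeps the spatial size, and a 2x2 max pooling halves it, rounding down.

```python
def pooled_size(size, num_pools, window=2):
    """Spatial size after num_pools max-pooling layers (floor division each time)."""
    for _ in range(num_pools):
        size //= window
    return size


# Values assumed from the data generation settings in this article.
IMAGE_HEIGHT, IMAGE_WIDTH = 60, 160
MAX_CAPTCHA, ALL_CHAR_SET_LEN = 4, 36
OUT_CHANNELS = 64  # output channels of layer3

height = pooled_size(IMAGE_HEIGHT, 3)  # 60 -> 30 -> 15 -> 7
width = pooled_size(IMAGE_WIDTH, 3)    # 160 -> 80 -> 40 -> 20

fc_inputs = height * width * OUT_CHANNELS     # inputs of the first fully connected layer
rfc_outputs = MAX_CAPTCHA * ALL_CHAR_SET_LEN  # outputs of the second fully connected layer
print(fc_inputs, rfc_outputs)  # 8960 144
```

So the network maps each image to 144 scores: 4 character positions times 36 possible characters per position.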
5. Model training
Model training involves two important parameters, epoch and batch, which are common in deep learning. Together they control the granularity and organization of training: the number of epochs is how many times the entire training set passes through the model, while the batch size is how many samples are processed in a single training step. Their roles are as follows:
- Epoch: one complete pass over the training set. During an epoch, every sample goes through one forward propagation and one backward propagation.
- Batch: the group of samples used for one parameter update. The training set is split into batches; the samples in each batch are fed into the model together, the loss is computed by forward propagation, and the parameters are updated by back propagation. Each epoch therefore consists of many batch-level training steps.
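To make the relationship concrete, the sketch below counts parameter updates for the three data set sizes tested in this article, using the `num_epochs = 30` and `batch_size = 100` values from the training script. `updates_per_run` is a hypothetical helper, and the calculation assumes the data loader keeps a final partial batch rather than dropping it.

```python
import math


def updates_per_run(num_samples, batch_size, num_epochs):
    """Return (steps per epoch, total parameter updates) for one training run."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * num_epochs


for num_samples in (1_000, 10_000, 100_000):
    steps, total = updates_per_run(num_samples, batch_size=100, num_epochs=30)
    print(f"{num_samples} samples: {steps} steps per epoch, {total} updates in total")
```

With the same number of epochs, the largest data set gives the model 100 times as many updates as the smallest one, in addition to far more varied samples.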
import torch
import torch.nn as nn
import my_dataset  # builds the data loaders that feed the model
from captcha_cnn_model import CNN  # the CNN model defined above

# Training parameters
num_epochs = 30
batch_size = 100
learning_rate = 0.001


def main():
    cnn = CNN()  # initialize the CNN model
    cnn.train()  # training mode: enables dropout and batch-norm updates
    print('init net')
    criterion = nn.MultiLabelSoftMarginLoss()  # loss function
    optimizer = torch.optim.Adam(cnn.parameters(), lr=learning_rate)  # Adam optimizer

    train_dataloader = my_dataset.get_train_data_loader()  # load training data
    for epoch in range(num_epochs):  # one full pass over the training set per epoch
        for i, (images, labels) in enumerate(train_dataloader):  # one parameter update per batch
            labels = labels.float()
            predict_labels = cnn(images)
            loss = criterion(predict_labels, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if (i + 1) % 10 == 0:
                print("epoch:", epoch, "step:", i, "loss:", loss.item())
            if (i + 1) % 100 == 0:
                torch.save(cnn.state_dict(), "./model.pkl")  # checkpoint every 100 steps
                print("save model")
        print("epoch:", epoch, "step:", i, "loss:", loss.item())
    torch.save(cnn.state_dict(), "./model.pkl")  # save the final weights to model.pkl
    print("save last model")


if __name__ == '__main__':
    main()
5. Training and testing
The test procedure has the trained model recognize each verification code image in the test set, decodes the predicted text, and compares it with the label. Each match increments a counter of correct predictions, and the final accuracy is the number of correct predictions divided by the total number of test images.
import numpy as np
import torch
import captcha_setting
import my_dataset
from captcha_cnn_model import CNN
import one_hot_encoding


def main():
    cnn = CNN()
    cnn.eval()  # evaluation mode: disables dropout
    cnn.load_state_dict(torch.load('model.pkl'))
    print("load cnn net.")

    test_dataloader = my_dataset.get_test_data_loader()  # load the test data set

    correct = 0  # number of correctly recognized images
    total = 0  # number of images tested
    n = captcha_setting.ALL_CHAR_SET_LEN
    for i, (images, labels) in enumerate(test_dataloader):
        with torch.no_grad():
            predict_label = cnn(images)
        # Only the first image of each batch is decoded, so the test loader is
        # expected to use a batch size of 1.
        scores = predict_label[0].numpy()
        # Each block of n scores belongs to one character position; take its argmax.
        predict_label = ''.join(
            captcha_setting.ALL_CHAR_SET[np.argmax(scores[k * n:(k + 1) * n])]
            for k in range(captcha_setting.MAX_CAPTCHA))
        true_label = one_hot_encoding.decode(labels.numpy()[0])
        total += labels.size(0)
        if predict_label == true_label:
            correct += 1
        # The original condition is garbled ("total 0==0"); an interval of 200 is
        # the most likely reading and should be treated as an assumption.
        if total % 200 == 0:
            print('Test Accuracy of the model on the %d test images: %f %%' % (total, 100 * correct / total))
    print('Test Accuracy of the model on the %d test images: %f %%' % (total, 100 * correct / total))


if __name__ == '__main__':
    main()
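The test script relies on a `one_hot_encoding` module that is not shown in this article. The following is a minimal sketch of what such a module might look like, assuming the label is a flat vector with one block of 36 entries per character position, which matches how the test script slices the model output. The function bodies are an illustration, not the project's actual code.

```python
import string

# Same character set as the data generation code: digits, then uppercase letters.
ALL_CHAR_SET = list(string.digits + string.ascii_uppercase)
ALL_CHAR_SET_LEN = len(ALL_CHAR_SET)  # 36
MAX_CAPTCHA = 4


def encode(text):
    """Turn a 4-character code into a flat 144-element one-hot vector."""
    vector = [0.0] * (MAX_CAPTCHA * ALL_CHAR_SET_LEN)
    for position, char in enumerate(text):
        vector[position * ALL_CHAR_SET_LEN + ALL_CHAR_SET.index(char)] = 1.0
    return vector


def decode(vector):
    """Recover the text by taking the largest entry in each 36-element block."""
    chars = []
    for position in range(MAX_CAPTCHA):
        start = position * ALL_CHAR_SET_LEN
        block = vector[start:start + ALL_CHAR_SET_LEN]
        best = max(range(ALL_CHAR_SET_LEN), key=lambda k: block[k])
        chars.append(ALL_CHAR_SET[best])
    return ''.join(chars)


label = encode("A3ZK")
print(len(label), sum(label), decode(label))
```

Because `decode` only looks for the largest entry in each block, the same logic works on raw model scores as well as on exact one-hot labels.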
6. Test results
6.1. 1K verification code image data
The model trained on 1,000 generated verification code images was tested on a test set of 103 verification code images, as shown in the figure below. The final accuracy was 0%, which is very low.
6.2. 10K verification code image data
The model trained on 10,000 generated verification code images was tested on the same kind of test set of 103 verification code images, as shown in the figure below. The final accuracy was 3.88%, which is still low.
6.3. 100K verification code image data
The model trained on 100,000 generated verification code images was tested on a test set of 103 verification code images, as shown in the figure below. The final accuracy was 74.76%, which is relatively high.
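A quick calculation puts these percentages in context. The sketch below converts the three reported accuracies back into counts out of 103 test images and compares them with blind guessing over a 36-character, 4-position code. The per-character figure is only a rough estimate that assumes the four positions are predicted independently, which the article does not verify.

```python
TEST_SET_SIZE = 103
NUM_CLASSES = 36   # digits plus uppercase letters
CODE_LENGTH = 4


def correct_count(accuracy_percent, total=TEST_SET_SIZE):
    """Number of fully correct codes implied by a reported accuracy."""
    return round(accuracy_percent / 100 * total)


for name, accuracy in (("1K", 0.0), ("10K", 3.88), ("100K", 74.76)):
    print(name, correct_count(accuracy), "of", TEST_SET_SIZE)

possible_codes = NUM_CLASSES ** CODE_LENGTH
chance_accuracy = 1 / possible_codes
print(possible_codes, "possible codes")

# Rough per-character accuracy if all four positions were independent.
per_char = (74.76 / 100) ** (1 / CODE_LENGTH)
print(f"estimated per-character accuracy: {per_char:.2f}")
```

Since a code only counts as correct when all four characters match, even a small per-character error rate lowers whole-code accuracy noticeably.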
7. Reflection on results and ideas for subsequent improvements
Training the same model on different amounts of data produced models of different quality. With the architecture fixed (the overall CNN structure is shown in the figure below), recognition performance improved steadily as the amount of training data grew by orders of magnitude. Training on the CPU was very slow, so results took a long time to obtain. I eventually switched to training on the GPU, which is much faster, and I will keep using the GPU for later experiments so that results arrive sooner.
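Moving training to the GPU in PyTorch mostly means placing the model and each batch on the same device. The sketch below shows one training step under that setup. The tiny `model` is a hypothetical stand-in for the article's CNN, and random tensors replace the real data loader so the example runs on its own; it falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the article's CNN, kept small so the sketch runs quickly.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 30 * 80, 144),  # 4 positions x 36 characters
).to(device)

criterion = nn.MultiLabelSoftMarginLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Random tensors in place of a real batch: 16 grayscale 60x160 images.
images = torch.rand(16, 1, 60, 160)
labels = torch.zeros(16, 144)
labels[:, ::36] = 1.0  # mark the first character class at each of the 4 positions

# Each batch must be moved to the same device as the model.
images, labels = images.to(device), labels.to(device)

loss = criterion(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(device.type, loss.item())
```

In the article's training script, the equivalent change would be calling `.to(device)` on the model once and on `images` and `labels` inside the batch loop.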
Next, I want to study how network depth affects performance by changing the number of layers in the model. I also want to try the ResNet and DenseNet architectures I studied earlier and measure how much they improve results on this task.
Summary
Over the past few days I have been implementing a CNN model and training it to recognize verification codes. As an undergraduate I lacked the necessary knowledge and got stuck at this point; after studying CNNs, I came back to the idea. Looking up the relevant material and the principles behind each function gave me a much better understanding of the whole implementation. I believe machine learning can be used to realize more interesting ideas in the future. I will also train on the GPU from now on: CPU training takes so long that I will not forget the lesson.