Foreword
A convolutional neural network (CNN) is a type of feedforward neural network that performs convolution operations and has a deep structure. It is one of the representative algorithms of deep learning. Through convolutional layers, pooling layers, fully connected layers and similar structures, it can effectively process grid-like data such as time series and pictures. There are already many explanations of convolution on the Internet, so I won't repeat them here. Instead, we will start from a practical problem and use code to deepen our understanding. Before writing the code, let me first explain why I wanted to write this article.
I had previously tried image classification with TensorFlow.js together with others. Although we got results, my understanding of the code was shallow. Later I came into contact with PyTorch for work reasons and found the framework easier to use, so after a while I wanted to use it to classify the pictures I had worked with before. At first I followed existing articles, but most handwriting recognition tutorials online use the MNIST data set, whereas real business cases often involve recognizing specific, irregular, niche images. So what follows is a simple implementation of classification on a custom image set.
Process
- Collect pictures and sort them into folders according to your own categories
- Read the image data and classification labels, and save the data set
- Resize images to a fixed size (the aspect ratio is not kept, so they will be distorted), convert them to tensors and normalize them
- Define hyperparameters, the loss function, the optimizer, etc.
- Train the model ("alchemy"), repeatedly checking the loss, accuracy and other metrics
- Save the model and load it to test the classification of new images
Environment
- Python 3.8
- Torch 1.9.0
- Pillow 10.0
- Torchvision
- Numpy
- Pandas
- Matplotlib
Coding
Before writing the code, the required images have already been sorted into categories and the dependency packages above have been installed. Since this is just a demonstration, no pre-trained models (ResNet, VGG) are used. Because training works on tensors, the pictures in each folder first need to be read and converted into PIL objects or NumPy data, then adjusted, and finally converted to tensors (you can also skip PIL and convert directly to tensors). What needs attention here is the uniform handling of grayscale images and of images with different sizes: the single channel of a grayscale image is copied to create three channels, and all images are resized to the same pixel size. In this network the number of input channels and the input size must be consistent, otherwise errors will be raised during training.
Image data generation
This step traverses the images in each category folder, converts them into records (image object plus label), collects all category names, and saves both to their designated locations. You could also split the data into training, validation and test sets here; that is left as an extension and skipped in this article.
```python
# -*- coding: utf-8 -*-
import os
import pickle as pkl

import pandas as pd
from PIL import Image

all_cate = []
data_set = []
directory = "./data/train"

for index, (root, dirs, files) in enumerate(os.walk(directory)):
    if index == 0:
        # The top level holds one sub-folder per category.
        # Sorting in place also makes os.walk visit them in a fixed order.
        dirs.sort()
        all_cate += dirs
    else:
        # The folder name is the category name
        dir_name = os.path.basename(root)
        for img in files:
            img_path = os.path.join(root, img)
            img_np = Image.open(img_path)
            record = {
                'img_np': img_np,
                'label': all_cate.index(dir_name) + 1,  # labels start at 1
            }
            data_set.append(record)

# List of dictionaries to DataFrame
df = pd.DataFrame(data_set)
with open('data/train_dataset.p', 'wb') as f:
    pkl.dump(df, f)

# One category name per line, so an index can be mapped back to a name later
with open("data/all_cate.txt", encoding="utf-8", mode="w") as f:
    f.write("\n".join(all_cate))

print("Archiving data successfully~")
```
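The train/validation/test split mentioned above could be sketched roughly as follows. This is only an illustration and not part of the original script: the helper name `split_records`, the 80/10/10 ratios and the fixed seed are all assumptions, and it works on a plain list such as `data_set` before it is turned into a DataFrame.

```python
import random


def split_records(records, val_ratio=0.1, test_ratio=0.1, seed=42):
    """Shuffle a list of records and split it into train/val/test lists.

    Hypothetical helper for illustration; the ratios and seed are assumptions.
    """
    if not 0 <= val_ratio + test_ratio < 1:
        raise ValueError("val_ratio + test_ratio must be in [0, 1)")
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_val = int(len(shuffled) * val_ratio)
    n_test = int(len(shuffled) * test_ratio)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


# 20 placeholder records standing in for the image dictionaries
records = [{"img_np": None, "label": i % 5 + 1} for i in range(20)]
train, val, test = split_records(records)
print(len(train), len(val), len(test))
```

Note that this is a purely random split. With very small or unbalanced categories, a per-category (stratified) split would probably be a better choice.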
Batch data set standardization
Here the serialized image information is read back, all images are resized to the same pixel size (on an ordinary computer it is best to stay within about 100px, otherwise training becomes very slow), converted to tensors and normalized. Then the number of channels is checked: if the image is grayscale, the tensor is repeated three times to create three channels. Finally, torch's DataLoader is used to load the data set before training.
```python
# -*- coding: utf-8 -*-
import pickle as pkl

import torch
from torch.utils.data import Dataset
from torchvision import transforms


class DataSet(Dataset):
    def __init__(self, pkl_file):
        with open(pkl_file, 'rb') as f:
            self.dataFrame = pkl.load(f)
        # Build the transform once instead of on every __getitem__ call
        self.transform = transforms.Compose([
            transforms.Resize((100, 100)),  # Resize the image as needed
            transforms.ToTensor(),
            # Normalization: first argument is the mean, second the standard deviation
            transforms.Normalize([0.5], [0.5])
        ])

    def __len__(self):
        return len(self.dataFrame)

    def __getitem__(self, item):
        img_np = self.dataFrame.iloc[item, 0]
        label = self.dataFrame.iloc[item, 1]
        image_tensor = self.transform(img_np)
        # Grayscale image: repeat the single channel to get three channels
        if image_tensor.shape[0] == 1:
            image_tensor = image_tensor.repeat(3, 1, 1)
        res = {
            'img_tensor': image_tensor,
            # Stored labels start at 1; the loss needs a class index starting at 0
            'label': torch.tensor([int(label) - 1], dtype=torch.long)
        }
        return res
```
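To make the effect of `Normalize([0.5], [0.5])` more concrete: for ordinary 8-bit images `ToTensor()` scales pixel values to the range [0, 1], and normalization then computes (x - mean) / std for each value, which maps that range to [-1, 1]. The plain-Python sketch below only illustrates the arithmetic; the function names are made up for this example and are not torchvision APIs.

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-value arithmetic of normalization: (x - mean) / std."""
    return (value - mean) / std


def denormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, useful when a normalized image should be displayed again."""
    return value * std + mean


# Sample pixel values in [0, 1], as produced for 8-bit images
for pixel in (0.0, 0.25, 0.5, 1.0):
    scaled = normalize(pixel)
    print(pixel, "->", scaled, "->", denormalize(scaled))
```

With mean and std both set to 0.5, the inverse is x * 0.5 + 0.5, which is the same as the `img / 2 + 0.5` line in the `imshow` helper of the detection script.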
Neural network model
What is created here is a convolutional neural network that takes a 3-channel input. The first convolutional layer uses a 3×3 kernel and outputs 25 feature channels, which are normalized with batch normalization (BatchNorm2d) and then passed through the ReLU activation function. The first pooling layer applies 2×2 max pooling to downsample the convolved feature map. The second convolution (50 output channels) and its pooling layer follow the same pattern. The pooled feature map is then flattened and passed through a fully connected layer with 1024 neurons and a ReLU activation, then through a fully connected layer with 128 neurons and another ReLU, and finally through an output layer with 5 neurons, one per category. The outputs are raw scores (logits) rather than probabilities; CrossEntropyLoss applies the softmax internally during training.
```python
# -*- coding: utf-8 -*-
import torch.nn as nn


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Input: (batch, 3, 100, 100)
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 25, kernel_size=3),        # -> (batch, 25, 98, 98)
            nn.BatchNorm2d(25),
            nn.ReLU(inplace=True)
        )
        self.layer2 = nn.Sequential(
            nn.MaxPool2d(kernel_size=2, stride=2)   # -> (batch, 25, 49, 49)
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(25, 50, kernel_size=3),       # -> (batch, 50, 47, 47)
            nn.BatchNorm2d(50),
            nn.ReLU(inplace=True)
        )
        self.layer4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=2, stride=2)   # -> (batch, 50, 23, 23)
        )
        self.fc = nn.Sequential(
            nn.Linear(50 * 23 * 23, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 5)                       # 5 categories, raw scores (logits)
        )

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 50 * 23 * 23)
        x = self.fc(x)
        return x
```
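The `50 * 23 * 23` in the first `nn.Linear` depends on the 100×100 input size, so it has to be recomputed whenever the image size or the layers change. The sketch below walks through that arithmetic in plain Python. The helper `conv_out` is a made-up name, and the formula assumes no dilation, which matches the layers used here.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Output side length of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 100                                      # images are resized to 100x100
size = conv_out(size, kernel_size=3)            # layer1: Conv2d(3, 25, 3)   -> 98
size = conv_out(size, kernel_size=2, stride=2)  # layer2: MaxPool2d(2, 2)    -> 49
size = conv_out(size, kernel_size=3)            # layer3: Conv2d(25, 50, 3)  -> 47
size = conv_out(size, kernel_size=2, stride=2)  # layer4: MaxPool2d(2, 2)    -> 23
flat_features = 50 * size * size                # 50 channels come out of layer3
print(size, flat_features)                      # 23 26450
```

If the input were resized to a different size, only the first line would change, and the printed `flat_features` would be the value to use as the input size of the first fully connected layer.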
Start training
```python
# -*- coding:utf-8 -*-
import os

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader

import cnn
from data_set import DataSet

# Define hyperparameters
batch_size = 1
learning_rate = 0.02
num_epochs = 1

# Load the image tensor training set
train_dataset = DataSet("data/train_dataset.p")
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

model = cnn.CNN()

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

# Train model
model.train()
train_losses = []
records = []
for epoch in range(num_epochs):
    for step, data in enumerate(train_loader):
        img = data['img_tensor']
        label = data['label'].view(-1)  # shape (batch,), as CrossEntropyLoss expects

        optimizer.zero_grad()
        out = model(img)
        loss = criterion(out, label)
        train_losses.append(loss.item())
        loss.backward()
        optimizer.step()

        if step % 50 == 0:
            # Running mean of all losses recorded so far
            mean_loss = np.mean(train_losses)
            print('epoch: {}, loop: {}, loss: {:.4}'.format(epoch, step, mean_loss))
            records.append(mean_loss)

# Draw the loss trend chart (one point every 50 steps)
plt.plot(records, label='Train Loss')
plt.xlabel('Steps (x50)')
plt.ylabel('Loss')
plt.legend()
plt.show()

# Model evaluation (omitted)
# model.eval()

# Save the whole model (structure and parameters)
os.makedirs('params', exist_ok=True)
torch.save(model, 'params/cnn_imgs_02.pkl')
```
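The evaluation step is omitted in the script above. As a rough idea of what an accuracy check computes, here is a framework-independent sketch in plain Python: for each sample, the index of the highest score is compared with the label. The function names and the scores are made up for illustration; in PyTorch the same comparison would be done on the model outputs after calling `model.eval()` and inside `torch.no_grad()`.

```python
def argmax(values):
    """Index of the largest value in a sequence."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(logits_batch, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    if len(logits_batch) != len(labels):
        raise ValueError("logits_batch and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(argmax(logits) == label
                  for logits, label in zip(logits_batch, labels))
    return correct / len(labels)


# Made-up scores for 4 samples and 5 classes (not real model output)
logits_batch = [
    [2.0, 0.1, -1.0, 0.3, 0.0],   # highest score at index 0
    [0.2, 0.1, 3.5, 0.3, 0.0],    # highest score at index 2
    [0.0, 0.1, 0.2, 0.3, 4.0],    # highest score at index 4
    [1.5, 0.1, 0.2, 0.3, 0.0],    # highest score at index 0
]
labels = [0, 2, 4, 1]
print(accuracy(logits_batch, labels))
```

Ideally this would be measured on images that were not used for training, for example a held-out validation set.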
Model detection
After training is completed, the model is saved locally. The following script loads it to test the classification of other pictures. The specified picture goes through the same conversion as during training. Finally, the index of the largest value in the prediction is taken and matched to the category name. The script also contains a utility function that displays a tensor-format image with pyplot.
```python
# -*- coding:utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np
import torch
import torchvision
from PIL import Image
from torchvision import transforms

import cnn  # the CNN class must be importable to unpickle the saved model


def imshow(img):
    img = img / 2 + 0.5  # undo Normalize([0.5], [0.5])
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# Loads the whole pickled model. Recent PyTorch versions may additionally
# require weights_only=False here.
model = torch.load("params/cnn_imgs_02.pkl")
model.eval()  # use the BatchNorm running statistics during inference

img_path = "imgs/05.jpg"
img_np = Image.open(img_path)
transform = transforms.Compose([
    transforms.Resize((100, 100)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])
])
image_tensor = transform(img_np)

# If it is a grayscale image
if image_tensor.shape[0] == 1:
    image_tensor = image_tensor.repeat(3, 1, 1)
image_tensor = image_tensor.unsqueeze(0)  # add batch dimension: (1, 3, 100, 100)

with torch.no_grad():
    predict = model(image_tensor)
indices = torch.max(predict, 1)[1].item()

with open("data/all_cate.txt", encoding="utf-8", mode="r") as f:
    all_cate = [line.strip() for line in f]

try:
    cate_name = all_cate[indices]
except IndexError:
    cate_name = "Unknown"
print("The recognition result is:", cate_name)

# imshow(torchvision.utils.make_grid(image_tensor))
# Original image display
img_np.show()
```
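The model outputs raw scores, so if a confidence value is wanted next to the category name, the scores can be passed through a softmax first. The plain-Python sketch below shows that step together with a safe index-to-name lookup. The logits, the category names and the helper `lookup_category` are invented for illustration and are not part of the scripts above.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lookup_category(index, categories, default="Unknown"):
    """Return the category name for an index, or a default if it is out of range."""
    return categories[index] if 0 <= index < len(categories) else default


# Made-up logits and category names, for illustration only
logits = [0.5, 2.5, 0.1, -1.0, 0.3]
categories = ["cat_a", "cat_b", "cat_c", "cat_d", "cat_e"]

probs = softmax(logits)
best = max(range(len(probs)), key=lambda i: probs[i])
print(lookup_category(best, categories), round(probs[best], 3))
```

Softmax does not change which index is largest, so the predicted category is the same as taking the maximum of the raw scores directly; it only makes the values easier to read as probabilities.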