[PyTorch Convolution] Hands-on custom image classification

Foreword

A convolutional neural network (CNN) is a feedforward neural network that performs convolution operations and has a deep structure. It is one of the representative algorithms of deep learning. Built from convolutional layers, pooling layers, fully connected layers and similar components, it can effectively process grid-like data such as time series and images. There are plenty of explanations of convolution concepts online, so I won't go through them one by one here. Instead, we will start from a practical problem and use code to deepen our understanding. Before writing the code, let me first explain why I wrote this article.

I had previously tried image classification with TensorFlow.js by following other people's examples. Although I got results, my understanding of the code stayed shallow. Later, for work reasons, I started using PyTorch and found it easier to use. After a while, I wanted to use it to classify the same images I had worked with before. I started by following online articles, but most of them implement handwritten-digit recognition on the MNIST dataset, whereas business work often involves specific, irregular, niche image categories. So what follows is a simple implementation of classification on a custom image set.

Process

Collect images and sort them into categories you define
Read the image data and category labels, and save the dataset
Resize images to a fixed size (this may distort them), then convert them to normalized tensors
Define hyperparameters, the loss function, the optimizer, etc.
"Alchemy" (training): repeatedly check the loss, accuracy and other metrics
Save the model parameters, then load them to test classification on new images

Environment

Python 3.8
Torch 1.9.0
Pillow 10.0
Torchvision
Numpy
Pandas
Matplotlib

Coding

Before writing the code, the images have already been sorted into category folders and the dependencies listed above have been installed. Since this is only a demonstration, no pre-trained model (such as ResNet or VGG) is used. Training works on tensors, so the images in each folder are first read into PIL (or NumPy) objects, then resized and otherwise adjusted, and finally converted to tensors (you can also skip PIL and convert directly to tensors). Two things need unified handling: grayscale images and images of different sizes. A grayscale image has a single channel, which must be copied to create three channels, and all images must be resized to the same pixel dimensions. The first convolutional layer expects a fixed number of input channels, and the fully connected layer after flattening expects a fixed number of features, so inconsistent channels or sizes cause errors during training.

Image data generation

This script assumes one sub-folder per category under ./data/train. It walks through the images in each category folder, loads them into image objects, collects all category names, and saves both to designated files. You could also split the data into training, validation and test sets at this point; that extension is skipped here.

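# -*- coding: utf-8 -*-
import os
import pickle as pkl
import pandas as pd
from PIL import Image

# Assumed image extensions; other files (e.g. Thumbs.db) are skipped
IMG_EXTS = (".jpg", ".jpeg", ".png", ".bmp", ".gif")

all_cate = []
data_set = []
directory = "./data/train"
for index, (root, dirs, files) in enumerate(os.walk(directory)):

    if index == 0:
        # The first entry is the top-level folder: its sub-folders are the categories.
        # Sort in place so the category order (and thus the label indices) is deterministic.
        dirs.sort()
        all_cate += dirs
    else:
        # Works with both "/" and "\" path separators
        dir_name = os.path.basename(root)

        for img in files:
            if not img.lower().endswith(IMG_EXTS):
                continue
            img_path = os.path.join(root, img)
            # Load the pixels and close the file, so many images don't exhaust file handles
            with Image.open(img_path) as im:
                img_np = im.copy()
            data_set.append({
                'img_np': img_np,
                # 1-based label; DataSet.__getitem__ subtracts 1 to get the class index
                'label': all_cate.index(dir_name) + 1,
            })

# List of dictionaries to DataFrame, then serialize it
df = pd.DataFrame(data_set)
with open('data/train_dataset.p', 'wb') as f:
    pkl.dump(df, f)
with open("data/all_cate.txt", encoding="utf-8", mode="w") as f:
    f.write("\n".join(all_cate))

print("Data archived successfully~")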
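If you do want separate training, validation and test sets, one option (an assumption, not part of the original pipeline) is PyTorch's `torch.utils.data.random_split`. In this sketch a small random `TensorDataset` stands in for the pickled data; with the article's code you would pass `DataSet("data/train_dataset.p")` instead. The 70/15/15 ratio and the seed are arbitrary choices.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in for the real dataset: 100 small 3x8x8 images, 5 classes.
images = torch.randn(100, 3, 8, 8)
labels = torch.randint(0, 5, (100,))
full_dataset = TensorDataset(images, labels)

# 70% train, 15% validation, the rest test (integer arithmetic avoids rounding surprises).
n_total = len(full_dataset)
n_train = n_total * 70 // 100
n_val = n_total * 15 // 100
n_test = n_total - n_train - n_val   # lengths must sum to len(full_dataset)

# A seeded generator makes the split reproducible between runs.
generator = torch.Generator().manual_seed(42)
train_set, val_set, test_set = random_split(
    full_dataset, [n_train, n_val, n_test], generator=generator
)
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```

Each returned subset can be wrapped in its own `DataLoader`, with `shuffle=True` usually only for the training subset.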

Dataset normalization and loading

Here the serialized image objects are read back, every image is resized to the same dimensions (on an ordinary computer, about 100 px or smaller is advisable; larger sizes make training very slow), normalized, and converted to a tensor. The number of channels is then checked: for a grayscale image, the single channel is repeated three times to create three channels. Finally, torch's DataLoader loads the dataset in batches for training.

# -*- coding: utf-8 -*-
# data_set.py
import torch
from torchvision import transforms
import pickle as pkl
from torch.utils.data import Dataset

class DataSet(Dataset):

    def __init__(self, pkl_file):
        with open(pkl_file, 'rb') as f:
            self.dataFrame = pkl.load(f)
        # Build the transform once instead of on every __getitem__ call
        self.transform = transforms.Compose([
            transforms.Resize((100, 100)),  # Resize the image as needed
            transforms.ToTensor(),          # PIL image -> (C, H, W) float tensor in [0, 1]
            transforms.Normalize([0.5], [0.5])  # (x - mean) / std with mean=0.5, std=0.5 -> [-1, 1]
        ])

    def __len__(self):
        return len(self.dataFrame)

    def __getitem__(self, item):

        row = self.dataFrame.iloc[item]
        img_np = row['img_np']
        label = int(row['label'])

        # Convert RGBA, palette, CMYK, etc. to RGB; grayscale ("L") is handled below
        if img_np.mode not in ("RGB", "L"):
            img_np = img_np.convert("RGB")
        image_tensor = self.transform(img_np)

        # Grayscale image: copy the single channel to get 3 channels
        if image_tensor.shape[0] == 1:
            image_tensor = image_tensor.repeat(3, 1, 1)

        res = {
            'img_tensor': image_tensor,
            'label': torch.LongTensor([label - 1])  # class index must start at 0
        }

        return res

Neural network model

The network built here is a convolutional neural network that accepts 3-channel input. The first convolutional layer uses 3×3 kernels to produce 25 feature maps (output channels), which pass through batch normalization (BatchNorm2d) and then a ReLU activation for a nonlinear transformation. The first pooling layer applies 2×2 max pooling to downsample the feature maps. The second block is another 3×3 convolution (25 to 50 channels, again with batch normalization and ReLU) followed by the same kind of pooling. Last come the fully connected layers: the pooled feature maps are flattened and passed through a fully connected layer with 1024 neurons and a ReLU, then one with 128 neurons and a ReLU, and finally an output layer with 5 neurons, one per category. The output layer has no activation: it produces raw scores (logits), and CrossEntropyLoss applies log-softmax to them internally during training; applying softmax to the logits gives the class probability distribution.
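The input size of the first fully connected layer, `50 * 23 * 23`, can be checked by hand with the standard output-size formula for convolution and pooling layers (input size $H_{\text{in}}$, kernel $k$, stride $s$, padding $p$). Assuming 100×100 inputs as set by the Resize transform, no padding, stride 1 for the convolutions and stride 2 for the pooling:

$$
H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2p - k}{s} \right\rfloor + 1
$$

$$
100 \xrightarrow{\text{conv } 3\times3,\ s=1} 98 \xrightarrow{\text{pool } 2\times2,\ s=2} 49 \xrightarrow{\text{conv } 3\times3,\ s=1} 47 \xrightarrow{\text{pool } 2\times2,\ s=2} \left\lfloor \tfrac{45}{2} \right\rfloor + 1 = 23
$$

With 50 output channels, the flattened vector therefore has $50 \times 23 \times 23 = 26450$ features. Changing the Resize size changes this number, so the first `nn.Linear` must be updated to match.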

# -*- coding: utf-8 -*-
# cnn.py
import torch.nn as nn
import torch

class CNN(nn.Module):

    def __init__(self, num_classes=5):
        # num_classes must equal the number of category folders (lines in all_cate.txt)
        super(CNN, self).__init__()

        # Conv 3x3: 3 input channels -> 25 feature maps, 100x100 -> 98x98
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 25, kernel_size=3),
            nn.BatchNorm2d(25),
            nn.ReLU(inplace=True)
        )

        # Max pooling 2x2: 98x98 -> 49x49
        self.layer2 = nn.Sequential(
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        # Conv 3x3: 25 -> 50 feature maps, 49x49 -> 47x47
        self.layer3 = nn.Sequential(
            nn.Conv2d(25, 50, kernel_size=3),
            nn.BatchNorm2d(50),
            nn.ReLU(inplace=True)
        )

        # Max pooling 2x2: 47x47 -> 23x23
        self.layer4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        self.fc = nn.Sequential(
            nn.Linear(50 * 23 * 23, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 128),
            nn.ReLU(inplace=True),
            # Raw scores (logits); CrossEntropyLoss applies log-softmax internally
            nn.Linear(128, num_classes)
        )

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 50 * 23 * 23)

        x = self.fc(x)

        return x
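Hard-coding `50 * 23 * 23` ties the model to 100×100 inputs. One common workaround, not used in the original code, is to compute the flattened size with a dummy forward pass through the convolutional part. The sketch below rebuilds the same convolution and pooling stack as `layer1` to `layer4` so it runs on its own; `flattened_size` is a hypothetical helper name.

```python
import torch
import torch.nn as nn

# Same convolution/pooling stack as layer1-layer4 in the CNN class.
features = nn.Sequential(
    nn.Conv2d(3, 25, kernel_size=3), nn.BatchNorm2d(25), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(25, 50, kernel_size=3), nn.BatchNorm2d(50), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

def flattened_size(module: nn.Module, input_size: int = 100) -> int:
    """Run one all-zero dummy image through `module` and count the output features."""
    module.eval()  # don't update BatchNorm running statistics with the dummy input
    with torch.no_grad():
        out = module(torch.zeros(1, 3, input_size, input_size))
    return out.numel()

print(flattened_size(features))      # 26450, i.e. 50 * 23 * 23
print(flattened_size(features, 64))  # 9800, i.e. 50 * 14 * 14 for a 64x64 input
```

The returned value could then be passed as the first argument of `nn.Linear(..., 1024)`. Because the helper switches the module to eval mode, call `.train()` again before training if the same module object is reused.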

Start training

# -*- coding:utf-8 -*-
import os
import torch
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
from data_set import DataSet
import cnn
import torch.nn as nn
import numpy as np
import torch.optim as optim

# Define hyperparameters
batch_size = 1
learning_rate = 0.02
num_epochs = 1

# Load the image tensor training set
train_dataset = DataSet("data/train_dataset.p")
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

model = cnn.CNN()

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()  # expects raw logits and 0-based class indices
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

# Train model
model.train()
train_losses = []
records = []
for i in range(num_epochs):
    for ii, data in enumerate(train_loader):
        img = data['img_tensor']
        label = data['label'].view(-1)  # (batch, 1) -> (batch,)

        optimizer.zero_grad()
        out = model(img)
        loss = criterion(out, label)
        train_losses.append(loss.item())
        loss.backward()
        optimizer.step()

        if ii % 50 == 0:
            print('epoch: {}, loop: {}, loss: {:.4}'.format(i, ii, np.mean(train_losses)))

        # Running mean of all losses so far, used for the plot
        records.append([np.mean(train_losses)])

# Draw the model's loss trend chart
train_loss = [data[0] for data in records]
plt.plot(train_loss, label='Train Loss')
plt.xlabel('Steps')
plt.ylabel('Loss')
plt.legend()
plt.show()

# Model evaluation (omitted)
# model.eval()

# Save model (make sure the params folder exists)
os.makedirs('params', exist_ok=True)
torch.save(model, 'params/cnn_imgs_02.pkl')
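The evaluation step is omitted above. A minimal sketch of an accuracy check could look like the following; it assumes batches shaped like the article's `DataSet` output (dicts with `img_tensor` and `label` keys). The tiny linear model and random data are hypothetical stand-ins so the sketch runs on its own; with the real code you would pass `model` and a `DataLoader` over held-out images.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

def evaluate(model: nn.Module, loader: DataLoader) -> float:
    """Return the classification accuracy of `model` over every batch in `loader`."""
    model.eval()                          # BatchNorm uses running statistics
    correct, total = 0, 0
    with torch.no_grad():                 # gradients are not needed for evaluation
        for data in loader:
            images = data['img_tensor']
            labels = data['label'].view(-1)
            preds = model(images).argmax(dim=1)  # index of the highest score
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    model.train()                         # restore training mode for further training
    return correct / total if total else 0.0

class ToyDataset(Dataset):
    """Hypothetical stand-in returning the same dict format as DataSet."""
    def __init__(self, n=20):
        self.images = torch.randn(n, 3, 8, 8)
        self.labels = torch.randint(0, 5, (n,))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        return {'img_tensor': self.images[i], 'label': self.labels[i:i + 1]}

torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 5))
val_loader = DataLoader(ToyDataset(), batch_size=4)
acc = evaluate(toy_model, val_loader)
print(f"validation accuracy: {acc:.2%}")
```

Tracking this accuracy on a validation set each epoch, alongside the loss curve, makes the "repeatedly check the indicators" step of the process concrete.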

Model testing

After training, the model is saved locally. The script below loads it to test how well it classifies other images. Each image goes through the same transformations as during training; the index of the largest output score is then taken as the prediction and mapped to a category name from data/all_cate.txt. The imshow function is a utility for displaying a tensor-format image with pyplot after prediction.

# -*- coding:utf-8 -*-
import numpy as np
import torch
import matplotlib.pyplot as plt
import torchvision
from PIL import Image
from torchvision import transforms
import cnn  # the CNN class must be importable for torch.load to unpickle the model


def imshow(img):
    img = img / 2 + 0.5  # undo Normalize([0.5], [0.5]): back to [0, 1]
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # (C, H, W) -> (H, W, C)
    plt.show()

# Note: PyTorch 2.6+ defaults to weights_only=True; pass weights_only=False to load a whole model
model = torch.load("params/cnn_imgs_02.pkl", map_location="cpu")
model.eval()  # use BatchNorm running statistics, not statistics of this single image

img_path = "imgs/05.jpg"
img_np = Image.open(img_path)
# Convert RGBA, palette, etc. to RGB, as in DataSet
if img_np.mode not in ("RGB", "L"):
    img_np = img_np.convert("RGB")
transform = transforms.Compose([
    transforms.Resize((100, 100)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])
])
image_tensor = transform(img_np)

# If it is a grayscale image
if image_tensor.shape[0] == 1:
    image_tensor = image_tensor.repeat(3, 1, 1)

image_tensor = image_tensor.unsqueeze(0)  # add batch dimension: (1, 3, 100, 100)

with torch.no_grad():
    predict = model(image_tensor)
indices = predict.argmax(dim=1).item()

all_cate = []
with open("data/all_cate.txt", encoding="utf-8", mode="r") as f:
    for line in f:
        all_cate.append(line.strip())

try:
    cate_name = all_cate[indices]
except IndexError:
    cate_name = "Unknown"

print("The recognition result is:", cate_name)
# imshow(torchvision.utils.make_grid(image_tensor))
# Original image display
img_np.show()
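The testing script reloads the whole pickled model object, which only works while `cnn.py` is importable with the same class definition. A commonly recommended alternative, sketched here as an assumption rather than the article's method, is to save only the `state_dict` and rebuild the architecture before loading it. The sketch also reports the top-k categories with softmax probabilities instead of only the arg-max index. The tiny model, the random image and the category names are hypothetical stand-ins for `cnn.CNN()`, a transformed test image and the lines of `all_cate.txt`.

```python
import os
import tempfile
import torch
import torch.nn as nn

# Hypothetical category names; in the article they come from data/all_cate.txt.
CATEGORIES = ["class_a", "class_b", "class_c", "class_d", "class_e"]

def build_model() -> nn.Module:
    """Hypothetical tiny stand-in for cnn.CNN()."""
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, len(CATEGORIES)))

torch.manual_seed(0)
model = build_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cnn_state.pt")
    torch.save(model.state_dict(), path)      # parameters only, no pickled class
    restored = build_model()                  # rebuild the same architecture...
    restored.load_state_dict(torch.load(path, map_location="cpu"))  # ...then load weights
restored.eval()

def predict_top_k(model, image_tensor, names, k=3):
    """Return the k most likely (category, probability) pairs for one (3, H, W) image."""
    with torch.no_grad():
        logits = model(image_tensor.unsqueeze(0))  # add batch dimension
        probs = torch.softmax(logits, dim=1)[0]    # logits -> probabilities summing to 1
        top_probs, top_idx = probs.topk(k)         # sorted from most to least likely
    return [(names[i], p) for i, p in zip(top_idx.tolist(), top_probs.tolist())]

image = torch.randn(3, 8, 8)  # stands in for a transformed test image
results = predict_top_k(restored, image, CATEGORIES)
for name, p in results:
    print(f"{name}: {p:.3f}")
```

Saving the `state_dict` keeps the checkpoint independent of the module path of the class, and the probabilities make low-confidence predictions visible instead of always returning a single category.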