Those things about PyTorch: peeking into PyTorch's working mechanism through handwritten digit recognition

Friends, family, and folks, hello!!!

This is a very, very, very basic blog about deep learning!!! It may be useful for buddies who are just getting started with, or have not yet started, deep learning. Everyone else can feel free to skip this post, hehe.

The content involved in this blog is as follows:

1. Peeking into the working mechanism of PyTorch through MNIST handwritten digit recognition

2. Summary of personal learning experience

Okay, enough rambling, let's get straight to the good stuff.

1. Peeking into the working mechanism of PyTorch through LeNet handwritten digit recognition

From today's point of view, the LeNet handwritten digit recognition network is a very simple deep learning model. It was proposed by Turing Award winner Yann LeCun in 1998 and put to use in real applications (so cool!!!). There are plenty of articles on CSDN explaining the structure of the LeNet model, so I won't go into detail here. Note: if you've read this far and still don't know what the LeNet network structure is, you really should go read some other bloggers' posts first, and then remember to come back to this one. (Because that gets my blog views +1)

In my opinion, any network model is composed of three parts: data preprocessing (resizing, normalization, etc.; if you are training the network on a private dataset, you also need to write your own Dataset class, which will be covered later, maybe in the next blog, so please don't worry), the construction of the network model itself, and the visualization of the model training process.

Next, let’s take a look at how to build a LeNet handwritten font recognition network.

Part1. Data preprocessing:

1. Instantiate data processing (Transform) operations

Here you need the transforms package in torchvision; the following code pattern is generally used to implement specific data processing operations (Resize, Crop, Normalize, etc.):

import torchvision.transforms as tf
transforms = tf.Compose([tf.Resize([32, 32]),  # change the shape of the image to [32, 32]
                         tf.ToTensor()])       # convert the image into tensor data the model can handle

Compose([]) combines multiple data processing operations. You put the operations (such as Resize, ToTensor, Normalize, etc.) into a list, with the different operations separated by commas, and then pass the list to Compose.
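For example, if you also want to normalize the pixel values (a step I'm adding here purely for illustration; the code in this post only uses Resize and ToTensor), the list would look like the sketch below. The mean/std values (0.1307, 0.3081) are the commonly cited MNIST statistics, not something computed in this post:

import torchvision.transforms as tf

transforms = tf.Compose([tf.Resize([32, 32]),
                         tf.ToTensor(),                        # PIL image -> tensor with values in [0, 1]
                         tf.Normalize((0.1307,), (0.3081,))])  # (x - mean) / std, per channel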

2. Download the MNIST dataset from the official source, using the datasets package in torchvision

(from torchvision import datasets)

datasets has many datasets built in. You can use the

dir(datasets)

statement to see which datasets are built into datasets (such as CIFAR10, CIFAR100, FashionMNIST, etc.). (It is recommended to run this in a Jupyter Notebook.)
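If you want a cleaner listing (just a small convenience sketch on top of dir()), you can filter out the private names:

from torchvision import datasets

# keep only the public names, which are the dataset classes and helper modules
print([name for name in dir(datasets) if not name.startswith("_")])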

Next, you can use the following code to download the required training set and validation set:

from torchvision import datasets
data_path = r"D:\pycharm-projects\handwriteidentity\minsit"  # raw string, so the backslashes are not treated as escapes
transforms = tf.Compose([tf.Resize([32, 32]),
                         tf.ToTensor()])
data_train = datasets.MNIST(data_path, train=True, download=True, transform=transforms)
data_val = datasets.MNIST(data_path, train=False, download=True, transform=transforms)

Let's analyze the above code. The first line imports the required library, the second line defines the save path of the dataset, the third line instantiates the preprocessing operations, and the last two lines download the MNIST dataset. As you can see, the core is the datasets.MNIST() call. Its initialization function has 5 parameters (root, train, transform, target_transform, download). root is the path where the dataset is saved (a string); train specifies whether the downloaded dataset is used for model training or model validation: train=True means it is used for training, otherwise for validation. download=True means the dataset is downloaded to the path specified by root, and transform applies our instantiated preprocessing. Finally, we use a variable to point to the downloaded dataset, which makes it convenient to access later with the data loader. So far, we have covered how to use datasets.
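To make those parameters concrete, here is a minimal sketch (the printed values are what I would expect, given the transforms above) showing what a downloaded dataset hands back when you index it:

# data_val[i] returns an (image_tensor, label) pair, with the transform already applied
img, label = data_val[0]
print(img.shape)  # torch.Size([1, 32, 32]) after Resize + ToTensor
print(label)      # a plain integer class label, e.g. 7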

2.1 About datasets, Dataset and DataLoader

PyTorch provides a package called torch.utils.data, and the following code imports the data loader and Dataset:

from torch.utils.data import DataLoader, Dataset

datasets comes from torchvision and contains many built-in datasets; Dataset has no such contents. In addition, when training a model on a private dataset, you need to customize Dataset yourself (how to define one will come later, so keep reading).

2.2 About Dataset and DataLoader, my understanding

When the model is trained, we need to feed it data. As I understand it, there are two ways. The first: no matter how many pictures the dataset contains, you write a pile of code to take the pictures and labels out of the dataset folder one by one (a big headache) and feed them to the model. The second: first turn the dataset into an iterable object, point a variable at it, and then let the data loader (DataLoader) extract pictures and labels from that iterable object batch_size at a time (saving most of the work of method one) before feeding them to the model. It's like fishing with a rod versus fishing with a net; of course the latter is more efficient.
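To see the "fishing with a net" idea in code, here is a minimal sketch (batch size chosen arbitrarily) of how DataLoader hands you whole batches at a time:

from torch.utils.data import DataLoader

loader = DataLoader(dataset=data_val, batch_size=50, shuffle=True)

imgs, labels = next(iter(loader))  # pull one batch out of the iterable object
print(imgs.shape)   # torch.Size([50, 1, 32, 32]) -- a whole "net" of 50 images at once
print(labels.shape) # torch.Size([50])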

Part2. Construction of the network model:

The construction of the network model can be divided into two parts: one is instantiating the model, and the other is defining the forward propagation function.

First, let’s see how to instantiate a model

Premise: the PyTorch framework requires that when you customize a model, you inherit from nn.Module. This is the key to understanding what follows.

To better explain how objects are instantiated (for example, which function do you use?), I recommend reading this blog:

Link (declaration: I also gained a lot from reading it, respect!!!): "The usage of __init__ in Python", luzhan66's blog on CSDN

Now you should understand how the magic method __init__(), the initialization function, is used to instantiate an object, so I'll attach the code I wrote for the model structure directly:

import torch.nn as nn


class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5, stride=1)
        self.pooling2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv3 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1)
        self.pooling4 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.linear1 = nn.Linear(in_features=400, out_features=120)
        self.linear2 = nn.Linear(in_features=120, out_features=84)
        self.linear3 = nn.Linear(in_features=84, out_features=10)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv1(x)
        out = self.relu(out)
        out = self.pooling2(out)
        out = self.conv3(out)
        out = self.relu(out)
        out = self.pooling4(out)
        out = out.view(-1, 400)             # flatten: 16 channels * 5 * 5 = 400 features
        out = self.relu(self.linear1(out))
        out = self.relu(self.linear2(out))
        out = self.linear3(out)
        # no softmax here: nn.CrossEntropyLoss (used later) applies log-softmax
        # internally, so the model should return raw logits
        return out

As you can see, we define the model's attributes in the initialization function, that is, the convolutional layers, pooling layers, fully connected layers, activation functions, etc. that we need, and we instantiate these attributes through self. Once the object's attributes are complete, the next step is to use them to implement operations, for example convolution first and then pooling, or pooling first and then convolution (the order in which data propagates is implemented in the forward() function). We can also use the class's defined attributes in the class's other functions. Of course, you may still have questions about super(LeNet5, self).__init__(); we'll put that off until later, or describe it in the next blog!!!
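As a quick sanity check (a sketch I'm adding here, not part of the original training script), you can instantiate the class and push a dummy tensor through it. Note that calling model(x) invokes forward() for you, via nn.Module's __call__:

import torch

model = LeNet5()
dummy = torch.randn(1, 1, 32, 32)  # one fake 32x32 grayscale image
out = model(dummy)                 # equivalent to model.forward(dummy), plus PyTorch's hooks
print(out.shape)                   # torch.Size([1, 10]) -- one score per digit class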

Part3. Visualization of model training results

In this part, I only focus on two things: the accuracy of model training, and the model's loss. The tqdm library is used to display the training progress in the terminal.

It should be noted that the tensor output by the model needs to be converted into a constant (I can't think of a better word for it) when reporting the loss and accuracy values. There is nothing else to emphasize.
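Concretely, what I mean by "constant" is calling .item() on a zero-dimensional tensor, which gives you back a plain Python number:

import torch

loss = torch.tensor(0.3521)  # stand-in for a real loss tensor
print(loss.item())           # a plain Python float, ready for tqdm / print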

Finally, here is the full code for this article.

import torch
from torch.utils.data import DataLoader
import torchvision.transforms as tf
from torchvision import datasets
from model import LeNet5
from tqdm import tqdm  # displays the training process as a progress bar in the terminal
# Define the path to save the dataset
data_path = r"D:\pycharm-projects\handwriteidentity\minsit"  # raw string, so the backslashes are not treated as escapes
#Use cuda to accelerate
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
#Define the dataset preprocessing method
transforms = tf.Compose([tf.Resize([32, 32]),
                         tf.ToTensor()])

# Use the datasets package in torchvision to load the MNIST dataset, split into a training set and a validation set
data_train = datasets.MNIST(data_path, train=True, download=True, transform=transforms)
data_val = datasets.MNIST(data_path, train=False, download=True, transform=transforms)

# Use the DataLoader to load the data in batches
train_data = DataLoader(dataset=data_train, batch_size=50, shuffle=True)
val_data = DataLoader(dataset=data_val, batch_size=50, shuffle=True)

#Import the written LeNet5 model
model = LeNet5().to(device)

#define hyperparameters
epochs = 50
lr = 1e-2  # note: 1e-2 is on the high side for Adam, whose default is 1e-3

#Define loss function and optimizer
loss_fn = torch.nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(model.parameters(), lr=lr)
# you can try switching the optimizer to SGD

# training loop

for epoch in range(epochs):
    pbar = tqdm(train_data)
    model.train(True)  # make sure the model is in training mode (we switch to eval mode below)
    for img, label in pbar:
        img = img.to(device)
        label = label.to(device)
        optimizer.zero_grad()  # clear the gradients left over from the previous step
        model_out = model(img)
        loss = loss_fn(model_out, label)
        _, pred = torch.max(model_out, dim=1)
        train_acc = (pred == label).sum() / label.shape[0]
        loss.backward()   # compute the gradients for this batch
        optimizer.step()  # update the weights
        pbar.set_description("Processing [%d/%d] loss:%.4f, acc:%.4f" % (epoch + 1, epochs, loss.item(), train_acc.item()))

    # val_data was loaded above, so use it here for a quick validation pass
    with torch.no_grad():
        model.eval()
        correct = total = 0
        for img, label in val_data:
            img, label = img.to(device), label.to(device)
            _, pred = torch.max(model(img), dim=1)
            correct += (pred == label).sum().item()
            total += label.shape[0]
        print("val acc: %.4f" % (correct / total))

        if epoch % 10 == 0:
            torch.save(model.state_dict(), f"model_epoch_{epoch}.pth")
            print("model saved successfully")

Tips:

1. During model training, be sure to pay attention to the order of optimizer.zero_grad(), loss.backward(), and optimizer.step(). Make sure the gradients from the previous step are cleared to zero on every pass through the for loop.

2. loss.backward() computes the gradient of the loss with respect to each weight, and optimizer.step() then performs a gradient descent update with those gradients, according to the rules set by the optimizer, so that the loss tends toward convergence (see the skeleton sketch after these tips).

3. Actually, I still want to write a lot more, but due to time constraints (mainly that I can't do the more advanced stuff yet!!!), I'll stop here for now.
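As promised in Tip 2, here is the skeleton that every training step in this post follows; burn this ordering into memory:

optimizer.zero_grad()              # 1. clear the gradients left over from the previous step
loss = loss_fn(model(img), label)  # 2. forward pass and loss
loss.backward()                    # 3. compute fresh gradients for this batch
optimizer.step()                   # 4. update the weights using those gradients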
