MNIST handwritten digit recognition

Foreword

These are my study notes.

Preparatory work

Necessary libraries

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

First, define the hyperparameters: the batch size, whether to run on the GPU or the CPU, and the number of training epochs.

The right values vary from machine to machine. My computer is not very powerful, so I cannot raise the batch size much. If you have a stronger machine, you can increase it, for example to 64.

#Define hyperparameters. Parameters are learned by the neural network itself; hyperparameters are set by hand.
Batch_size = 16 #Number of samples processed in each batch
Device = torch.device("cuda" if torch.cuda.is_available() else "cpu") #Use the GPU if available, otherwise the CPU
Epochs = 10 #Number of training epochs
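To get a feel for what the batch size changes, here is a small sketch that counts how many batches one epoch takes over the 60,000 MNIST training images. The helper name batches_per_epoch is my own and not part of PyTorch; it assumes the last partial batch is kept, which is the DataLoader default.

```python
import math

def batches_per_epoch(num_samples, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)

steps_16 = batches_per_epoch(60000, 16)  # batch size used in this post
steps_64 = batches_per_epoch(60000, 64)  # the larger batch size suggested above
print(steps_16, steps_64)
```

A larger batch means fewer parameter updates per epoch, but each update needs more memory.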

Image processing: compose two transforms that convert each image into a tensor and then standardize it.

This pipeline is applied to the data set when it is loaded in the next step.

# Build pipeline and process images
pipeline = transforms.Compose([
    transforms.ToTensor(), #Convert the image into tensor
    transforms.Normalize((0.1307,),(0.3081,)) #Standardize with the MNIST mean and standard deviation; standardized inputs help the network train faster
])
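As a rough illustration of what the standardization step computes, the sketch below applies (pixel - mean) / std by hand to a black and a white pixel. It assumes the pixel has already been scaled to the range [0, 1], which is what ToTensor does; the function name normalize is just for this example.

```python
MEAN, STD = 0.1307, 0.3081  # the values passed to transforms.Normalize in this post

def normalize(pixel):
    """Standardize one pixel value that has already been scaled to [0, 1]."""
    return (pixel - MEAN) / STD

black = normalize(0.0)  # background pixel
white = normalize(1.0)  # brightest stroke pixel
print(round(black, 4), round(white, 4))
```

After this step the background sits slightly below zero and the strokes well above it, so the inputs are roughly centered.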

MNIST

The MNIST data set is a collection of handwritten digit images and one of the most commonly used data sets in deep learning. Each image is a 28×28-pixel grayscale image; there are 60,000 training images and 10,000 test images. The value of each pixel represents the gray level (intensity) at that location.

First, you need to download the data set. This is very simple with torchvision, using the two imports below.

The first provides the data set itself and the second loads it in batches.

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

Create one data set for training and one for testing, then wrap each of them in a DataLoader.

# Download dataset
train_set = datasets.MNIST("data",train=True,download=True,transform=pipeline)
test_set = datasets.MNIST("data",train=False,download=True,transform=pipeline)
#Load the data in batches
train_loader = DataLoader(train_set,batch_size=Batch_size,shuffle=True) #shuffle=True: reshuffle the training data every epoch
test_loader = DataLoader(test_set,batch_size=Batch_size,shuffle=False) #Shuffling is not needed for evaluation
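Conceptually, a data loader shuffles the sample indices and then hands out fixed-size slices. The sketch below shows that idea in plain Python on a made-up toy data set. It is only an illustration: the real DataLoader also stacks samples into tensors and can use worker processes, and the name iterate_batches is my own.

```python
import random

def iterate_batches(dataset, batch_size, shuffle=True, seed=0):
    """Yield lists of samples in (optionally shuffled) order, batch_size at a time."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

# Toy stand-in for (image, label) pairs
toy_dataset = [("image_{}".format(i), i % 10) for i in range(10)]
batches = list(iterate_batches(toy_dataset, batch_size=4))
batch_sizes = [len(b) for b in batches]
print(batch_sizes)
```

Every sample appears exactly once per epoch, and the last batch is smaller when the data set size is not a multiple of the batch size.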

Show the data set

fig = plt.figure()
for i in range(12):
    plt.subplot(3, 4, i + 1)
    plt.tight_layout()
    plt.imshow(train_set.data[i], cmap='gray', interpolation='none') #.data replaces the deprecated .train_data
    plt.title("Label: {}".format(train_set.targets[i].item())) #.targets replaces the deprecated .train_labels
    plt.xticks([])
    plt.yticks([])
plt.show()

Build a network model

Building the network model is the more interesting part; it feels like stacking building blocks.

Here is some background on the layers involved. If you are interested, take a look.

Convolutional layer

A convolution slides a filter (kernel) over the image and produces a new feature map. The filter is a small matrix of learnable weights, so it acts like a learned image filter. A convolutional layer outputs several feature maps, each of which responds to one kind of feature.
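To make the sliding-filter idea concrete, here is a minimal pure-Python sketch with no padding and stride 1. The image and the filter values are made up by hand; in a real network the filter weights are learned. Like nn.Conv2d, it does not flip the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# 4x4 image: dark left half, bright right half
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written 2x2 filter that responds to a dark-to-bright step from left to right
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map is non-zero only in the middle column, where the window straddles the edge, which is the sense in which a filter "detects" a feature.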

Pooling layer

The pooling layer reduces the spatial size of the feature map; it is the down-sampling part of a convolutional neural network. Two types are introduced here: max pooling takes the maximum of each window, and average pooling takes the mean. Pooling has no learnable parameters of its own, but by shrinking the feature maps it reduces the computation and the number of parameters in later layers, and it makes the model more robust to small shifts in the input.
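The two pooling types can be sketched in a few lines of plain Python. This assumes non-overlapping windows (stride equal to the window size), and the feature map values are made up for the example.

```python
def pool2d(feature_map, size, reduce):
    """Apply `reduce` to each non-overlapping size x size window."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [
            reduce([feature_map[r + i][c + j]
                    for i in range(size) for j in range(size)])
            for c in cols
        ]
        for r in rows
    ]

def mean(values):
    return sum(values) / len(values)

feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [9, 8, 1, 1],
    [2, 4, 3, 5],
]
max_pooled = pool2d(feature_map, 2, max)   # keeps the strongest response per window
avg_pooled = pool2d(feature_map, 2, mean)  # keeps the average response per window
print(max_pooled, avg_pooled)
```

Either way, a 4x4 map becomes a 2x2 map, so width and height are halved.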

Fully connected layer

The final layers of a convolutional neural network are usually fully connected layers. The feature maps are first flattened into a vector, which is then fed into the fully connected layers for classification or prediction. In a fully connected layer, every neuron is connected to every neuron of the previous layer. The output of the last fully connected layer is the classification or prediction result. A loss function computed on this result is backpropagated to update the model parameters and improve the accuracy of the model.
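Here is a small sketch of the flatten-then-connect step in plain Python. The feature maps, weights, and biases are made-up numbers chosen so the arithmetic is easy to follow; in the real model they are learned.

```python
def flatten(feature_maps):
    """Turn a channels x height x width nested list into one flat vector."""
    return [v for channel in feature_maps for row in channel for v in row]

def linear(x, weights, bias):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

feature_maps = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # 2 channels of 2x2
x = flatten(feature_maps)                             # 8 values
weights = [[1] * 8, [0, 1] * 4]                       # 2 outputs, hypothetical weights
bias = [0.5, -1.0]
scores = linear(x, weights, bias)
print(x, scores)
```

Each output score depends on every input value, which is what "fully connected" means.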

Model construction

Our images are 28*28 grayscale images, so the first convolutional layer has 1 input channel. It has 10 output channels and a 5*5 convolution kernel.

The second convolutional layer therefore takes 10 input channels. It has 20 output channels and a 3*3 convolution kernel.

What is the output shape at this point? It is worked out in the explanation below.

Two fully connected layers follow. The first reduces the flattened 20*10*10 = 2000 values to 500, and the second maps 500 to the 10 classes.

Explanation:

The input starts out as 1*28*28. After the first convolutional layer it becomes 10*24*24 (28 - 5 + 1 = 24).

It then goes through a 2*2 max pooling layer, which halves the width and height: 10*12*12.

After the second convolutional layer it becomes 20*10*10 (12 - 3 + 1 = 10).
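The shape walk-through above can be checked with the usual output-size formula, floor((size + 2*padding - kernel) / stride) + 1. The helper name conv_out is my own; the layer sizes are the ones used in this post.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28
after_conv1 = conv_out(size, kernel=5)                  # conv1, 5x5 kernel
after_pool = conv_out(after_conv1, kernel=2, stride=2)  # 2x2 max pooling
after_conv2 = conv_out(after_pool, kernel=3)            # conv2, 3x3 kernel
flat_features = 20 * after_conv2 * after_conv2          # 20 channels, flattened
print(after_conv1, after_pool, after_conv2, flat_features)
```

The last number is the input size that the first fully connected layer has to accept.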

#Build network model
class Digit(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1,10,5) #1: input channel of the grayscale image, 10: output channels, 5: kernel size
        self.conv2 = nn.Conv2d(10,20,3) #10: input channels, 20: output channels, 3: kernel size
        self.fc1 = nn.Linear(20*10*10,500) #20*10*10: input features, 500: output features
        self.fc2 = nn.Linear(500,10) #500: input features, 10: output classes

    def forward(self,x):
        input_size = x.size(0) #Batch size; x has shape batch*1*28*28
        x = self.conv1(x) #Input: batch*1*28*28, output: batch*10*24*24 (28-5+1=24)
        x = F.relu(x) #Shape unchanged: batch*10*24*24
        x = F.max_pool2d(x,2,2) #Input: batch*10*24*24, output: batch*10*12*12
        x = self.conv2(x) #Input: batch*10*12*12, output: batch*20*10*10 (12-3+1=10)
        x = F.relu(x)

        x = x.view(input_size,-1) #Flatten; -1 infers the dimension automatically: 20*10*10=2000

        x = self.fc1(x) #Input: batch*2000, output: batch*500
        x = F.relu(x)
        x = self.fc2(x) #Input: batch*500, output: batch*10
        output = F.log_softmax(x,dim=1) #Log-probability of each digit class
        return output
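The last line of forward returns log-probabilities rather than probabilities. Here is a small pure-Python sketch of what log-softmax computes for one row of scores; the input scores are made up, and the function is a simplified stand-in for F.log_softmax, not its actual implementation.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one row of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]

logits = [2.0, 1.0, 0.1]                   # made-up raw scores for three classes
log_probs = log_softmax(logits)
probs = [math.exp(v) for v in log_probs]   # exponentiate to get probabilities back
predicted = max(range(len(log_probs)), key=log_probs.__getitem__)
print(probs, predicted)
```

The exponentiated values sum to 1, and the class with the largest log-probability is the prediction.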

Optimizer

We use the Adam optimizer, which is a common choice. Its role is to update the parameters of the model so that the loss function is minimized. It adapts the step size for each parameter automatically, so it usually converges well without much tuning of the learning rate.

#Model instance
model = Digit().to(Device)
#Optimizer
optimizer = optim.Adam(model.parameters()) #Adam with its default settings (learning rate 0.001)
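For the curious, here is a rough sketch of a single Adam update for one scalar parameter, following the standard Adam update rule. It is a simplified illustration rather than the code inside torch.optim.Adam, and the toy objective (w - 3)^2 is made up for the example.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# The very first step has size about lr, whatever the gradient scale is
first_w, _, _ = adam_step(0.0, 2 * (0.0 - 3), 0.0, 0.0, t=1, lr=0.01)

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * (w - 3), m, v, t, lr=0.01)
print(first_w, w)
```

Dividing by the running root-mean-square of the gradients is what gives each parameter its own effective step size.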

Training method

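Each training step moves the batch to the device, resets the gradients to 0, runs the forward pass, computes the cross-entropy loss, backpropagates, and updates the parameters.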

#Define training method
def train_model(model,device,train_loader,optimizer,epoch):
    #Put the model in training mode
    model.train()
    for batch_index ,(data,target) in enumerate(train_loader):
        #Move the batch to the device
        data,target = data.to(device),target.to(device)
        #Reset the gradients to 0
        optimizer.zero_grad()
        #Forward pass: model predictions for this batch
        output = model(data)
        #Calculate loss
        loss = F.nll_loss(output,target) #Cross-entropy for multi-class classification; the model already outputs log-probabilities
        #Backpropagation: compute the gradients
        loss.backward()
        #Update the parameters
        optimizer.step()
        if batch_index % 3000 == 0:
            print("Train Epoch: {} \t Loss: {:.6f}".format(epoch,loss.item()))
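To see what the loss value printed during training means, here is a pure-Python sketch of cross-entropy for a batch of raw scores: the mean of -log(softmax(scores)[target]). The scores below are made up, and the function is a simplified stand-in, not PyTorch's implementation.

```python
import math

def cross_entropy(logits_batch, targets):
    """Mean of -log(softmax(logits)[target]) over the batch."""
    total = 0.0
    for logits, target in zip(logits_batch, targets):
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_sum - logits[target]
    return total / len(targets)

confident_right = cross_entropy([[5.0, 0.0, 0.0]], [0])  # high score on the true class
confident_wrong = cross_entropy([[5.0, 0.0, 0.0]], [1])  # high score on a wrong class
uniform = cross_entropy([[0.0, 0.0, 0.0]], [2])          # no preference: log(3)
print(confident_right, confident_wrong, uniform)
```

The loss is close to 0 for a confident correct prediction and grows quickly for a confident wrong one, which is why it works as a training signal.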

Test method

with torch.no_grad(): disables gradient tracking inside the block, so no backpropagation can be performed. This saves memory and time during evaluation.

#Define test method
def test_model(model,device,test_loader):
    #Put the model in evaluation mode
    model.eval()
    #Number of correct predictions
    correct = 0.0
    #Summed test loss
    test_loss = 0.0
    with torch.no_grad(): #No gradients are tracked, so no backpropagation is performed
        for data ,target in test_loader:
            data,target = data.to(device),target.to(device)
            #Forward pass on the test batch
            output = model(data)
            #Sum the per-sample losses so that dividing by the data set size gives the true average
            test_loss += F.nll_loss(output,target,reduction="sum").item() #item: convert the 1-element tensor to a Python number
            #Find the index with the largest log-probability, i.e. the predicted digit
            pred = output.max(1,keepdim=True)[1] #max returns (value, index)
            #Accumulate the number of correct predictions
            correct += pred.eq(target.view_as(pred)).sum().item()
        test_loss /= len(test_loader.dataset)
        print("Test-- Average loss : {:.4f}, Accuracy : {:.3f}\n".format(
                test_loss,100.0 * correct / len(test_loader.dataset)))
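The accuracy bookkeeping in the test method boils down to "take the argmax of each row and compare it with the label". Here is a plain-Python sketch of that idea with made-up log-probabilities for a 3-class problem; the function name accuracy is my own.

```python
def accuracy(log_probs_batch, targets):
    """Percentage of rows whose highest-scoring class equals the target label."""
    correct = 0
    for row, target in zip(log_probs_batch, targets):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax of the row
        correct += int(predicted == target)
    return 100.0 * correct / len(targets)

outputs = [
    [-0.1, -2.5, -3.0],  # predicts class 0
    [-1.5, -0.3, -2.0],  # predicts class 1
    [-0.2, -1.9, -2.8],  # predicts class 0
    [-2.2, -2.0, -0.3],  # predicts class 2
]
labels = [0, 1, 2, 2]
acc = accuracy(outputs, labels)
print(acc)
```

Three of the four predictions match their labels here, so the sketch reports 75%.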

Complete code

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt


#Define hyperparameters. Parameters are learned by the neural network itself; hyperparameters are set by hand.
Batch_size = 16 #Number of samples processed in each batch
Device = torch.device("cuda" if torch.cuda.is_available() else "cpu") #Use the GPU if available, otherwise the CPU
Epochs = 10 #Number of training epochs

# Build pipeline and process images
pipeline = transforms.Compose([
    transforms.ToTensor(), #Convert the image into tensor
    transforms.Normalize((0.1307,),(0.3081,)) #Standardize with the MNIST mean and standard deviation; standardized inputs help the network train faster
])

# Download dataset
train_set = datasets.MNIST("data",train=True,download=True,transform=pipeline)
test_set = datasets.MNIST("data",train=False,download=True,transform=pipeline)

#Load the data in batches
train_loader = DataLoader(train_set,batch_size=Batch_size,shuffle=True) #shuffle=True: reshuffle the training data every epoch
test_loader = DataLoader(test_set,batch_size=Batch_size,shuffle=False) #Shuffling is not needed for evaluation

fig = plt.figure()
for i in range(12):
    plt.subplot(3, 4, i + 1)
    plt.tight_layout()
    plt.imshow(train_set.data[i], cmap='gray', interpolation='none') #.data replaces the deprecated .train_data
    plt.title("Label: {}".format(train_set.targets[i].item())) #.targets replaces the deprecated .train_labels
    plt.xticks([])
    plt.yticks([])
plt.show()



#Build network model
class Digit(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1,10,5) #1: input channel of the grayscale image, 10: output channels, 5: kernel size
        self.conv2 = nn.Conv2d(10,20,3) #10: input channels, 20: output channels, 3: kernel size
        self.fc1 = nn.Linear(20*10*10,500) #20*10*10: input features, 500: output features
        self.fc2 = nn.Linear(500,10) #500: input features, 10: output classes

    def forward(self,x):
        input_size = x.size(0) #Batch size; x has shape batch*1*28*28
        x = self.conv1(x) #Input: batch*1*28*28, output: batch*10*24*24 (28-5+1=24)
        x = F.relu(x) #Shape unchanged: batch*10*24*24
        x = F.max_pool2d(x,2,2) #Input: batch*10*24*24, output: batch*10*12*12
        x = self.conv2(x) #Input: batch*10*12*12, output: batch*20*10*10 (12-3+1=10)
        x = F.relu(x)

        x = x.view(input_size,-1) #Flatten; -1 infers the dimension automatically: 20*10*10=2000

        x = self.fc1(x) #Input: batch*2000, output: batch*500
        x = F.relu(x)
        x = self.fc2(x) #Input: batch*500, output: batch*10
        output = F.log_softmax(x,dim=1) #Log-probability of each digit class
        return output

#Model instance
model = Digit().to(Device)

#Optimizer
optimizer = optim.Adam(model.parameters()) #Adam with its default settings (learning rate 0.001)

#Define training method
def train_model(model,device,train_loader,optimizer,epoch):
    #Put the model in training mode
    model.train()
    for batch_index ,(data,target) in enumerate(train_loader):
        #Move the batch to the device
        data,target = data.to(device),target.to(device)
        #Reset the gradients to 0
        optimizer.zero_grad()
        #Forward pass: model predictions for this batch
        output = model(data)
        #Calculate loss
        loss = F.nll_loss(output,target) #Cross-entropy for multi-class classification; the model already outputs log-probabilities
        #Backpropagation: compute the gradients
        loss.backward()
        #Update the parameters
        optimizer.step()
        if batch_index % 3000 == 0:
            print("Train Epoch: {} \t Loss: {:.6f}".format(epoch,loss.item()))

#Define test method
def test_model(model,device,test_loader):
    #Put the model in evaluation mode
    model.eval()
    #Number of correct predictions
    correct = 0.0
    #Summed test loss
    test_loss = 0.0
    with torch.no_grad(): #No gradients are tracked, so no backpropagation is performed
        for data ,target in test_loader:
            data,target = data.to(device),target.to(device)
            #Forward pass on the test batch
            output = model(data)
            #Sum the per-sample losses so that dividing by the data set size gives the true average
            test_loss += F.nll_loss(output,target,reduction="sum").item() #item: convert the 1-element tensor to a Python number
            #Find the index with the largest log-probability, i.e. the predicted digit
            pred = output.max(1,keepdim=True)[1] #max returns (value, index)
            #Accumulate the number of correct predictions
            correct += pred.eq(target.view_as(pred)).sum().item()
        test_loss /= len(test_loader.dataset)
        print("Test-- Average loss : {:.4f}, Accuracy : {:.3f}\n".format(
                test_loss,100.0 * correct / len(test_loader.dataset)))

#Train and test for each epoch
for epoch in range(1,Epochs + 1): #Epochs 1 to Epochs inclusive
    train_model(model,Device,train_loader,optimizer,epoch)
    test_model(model,Device,test_loader)


torch.save(model.state_dict(), 'model.pt')
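Since only the state_dict is saved, loading it back later means rebuilding the same architecture first and then loading the weights into it. Below is a minimal sketch of that round trip. To keep it short it uses a tiny nn.Linear as a stand-in model and a temporary directory, both of which are assumptions for this example; with the code from this post you would create Digit() instead.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Tiny stand-in model (assumption for brevity); in this post it would be Digit()
model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(model.state_dict(), path)

# Rebuild the same architecture, then load the saved weights into it
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()  # switch to evaluation mode before inference

x = torch.ones(1, 4)
with torch.no_grad():
    same = torch.equal(model(x), restored(x))
print(same)
```

The map_location argument lets weights saved on a GPU machine be loaded on a CPU-only one.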

The trained model reaches about 99% accuracy on the test set.