Week P1: Implementing MNIST handwritten digit recognition

  • This article is my learning-record blog for the 365-day deep learning training camp
  • Reference article: [365-day deep learning training camp – Week P1: Implementing MNIST handwritten digit recognition] (PyTorch in practice | Week P1: Implementing MNIST handwritten digit recognition (qq.com))
  • Original author: Classmate K | Tutoring, project customization

Table of Contents

1. Code and running results
   1. Preliminary preparation
   2. Build a simple CNN network
   3. Train the model
   4. Results visualization
2. Personal summary: experience with the different layers used when building a neural network
   (nn.Conv2d, nn.MaxPool2d, nn.ReLU, nn.Linear, nn.Sequential)


1. Code and running results

1. Preliminary preparation

import torch
print(torch.__version__)

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import torchvision

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

device

train_ds = torchvision.datasets.MNIST('data',
                                      train=True,
                                      transform=torchvision.transforms.ToTensor(), # Convert data type to Tensor
                                      download=True)

test_ds = torchvision.datasets.MNIST('data',
                                      train=False,
                                      transform=torchvision.transforms.ToTensor(), # Convert data type to Tensor
                                      download=True)

batch_size = 32

train_dl = torch.utils.data.DataLoader(train_ds,
                                       batch_size=batch_size,
                                       shuffle=True)

test_dl = torch.utils.data.DataLoader(test_ds,
                                       batch_size=batch_size)

# Take one batch to inspect the data format
# The shape of the data is: [batch_size, channel, height, width]
# batch_size is set by us; channel, height and width are the number of channels, height and width of the image.
imgs, labels = next(iter(train_dl))
print(imgs.shape)

import numpy as np

# Set the figure size to 20 inches wide and 5 inches high
plt.figure(figsize=(20, 5))
for i, img in enumerate(imgs[:20]):
    # Remove the channel dimension: [1, 28, 28] -> [28, 28]
    npimg = np.squeeze(img.numpy())
    # Divide the figure into 2 rows and 10 columns, and draw the (i+1)-th subplot
    plt.subplot(2, 10, i + 1)
    plt.imshow(npimg, cmap=plt.cm.binary)
    plt.axis('off')
plt.show()
Outputs of the cells above:

device(type='cuda')

torch.Size([32, 1, 28, 28])

2. Build a simple CNN network

import torch.nn.functional as F

num_classes = 10 # Number of image classes

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Feature extraction network
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)  # First convolutional layer, 3x3 kernel
        self.pool1 = nn.MaxPool2d(2)                  # Pooling layer, 2x2 window
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3) # Second convolutional layer, 3x3 kernel
        self.pool2 = nn.MaxPool2d(2)

        # Classification network
        # in_features=1600 because the feature map here is [64, 5, 5]: 28 -> 26 -> 13 -> 11 -> 5
        self.fc1 = nn.Linear(1600, 64)
        self.fc2 = nn.Linear(64, num_classes)

    # Forward propagation
    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))

        x = torch.flatten(x, start_dim=1)

        x = F.relu(self.fc1(x))
        x = self.fc2(x)

        return x
    
from torchinfo import summary
# Move the model to the GPU (all model computation runs on the device selected earlier)
model = Model().to(device)

summary(model)
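The in_features=1600 of fc1 is determined by the feature-map shape after the second pooling layer: 28 -> 26 (conv1) -> 13 (pool1) -> 11 (conv2) -> 5 (pool2), so 64 x 5 x 5 = 1600. Here is a minimal sketch to verify this with a dummy tensor, using the Model class defined above:

# Sanity-check the flattened feature size with a dummy input (a sketch, not part of the original run)
m = Model()
x = torch.randn(1, 1, 28, 28)                  # one dummy MNIST-sized image
feat = m.pool2(F.relu(m.conv2(m.pool1(F.relu(m.conv1(x))))))
print(feat.shape)                              # torch.Size([1, 64, 5, 5])
print(torch.flatten(feat, start_dim=1).shape)  # torch.Size([1, 1600])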

3. Train the model

loss_fn = nn.CrossEntropyLoss() # Create loss function
learn_rate = 1e-2 # Learning rate
opt = torch.optim.SGD(model.parameters(), lr=learn_rate)

# training loop
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)  # Size of the training set: 60,000 images in total
    num_batches = len(dataloader)   # Number of batches: 1875 (60000/32)

    train_loss, train_acc = 0, 0 # Initialize training loss and accuracy
    
    for X, y in dataloader:  # Fetch a batch of images and their labels
        X, y = X.to(device), y.to(device)

        # Compute the prediction error
        pred = model(X)          # Network output
        loss = loss_fn(pred, y)  # Loss between the network output and the true labels

        # Backpropagation
        optimizer.zero_grad()    # Zero out the gradients
        loss.backward()          # Backpropagate the loss
        optimizer.step()         # Update the parameters

        # Accumulate accuracy and loss
        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()
            
    train_acc /= size
    train_loss /= num_batches

    return train_acc, train_loss

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset) #The size of the test set, a total of 10,000 images
    num_batches = len(dataloader) # Number of batches, 313 (10000/32=312.5, rounded up)
    test_loss, test_acc = 0, 0
    
    # When not training, disable gradient tracking to save memory and computation
    with torch.no_grad():
        for imgs, target in dataloader:
            imgs, target = imgs.to(device), target.to(device)
            
            # Calculate loss
            target_pred = model(imgs)
            loss = loss_fn(target_pred, target)
            
            test_loss += loss.item()
            test_acc += (target_pred.argmax(1) == target).type(torch.float).sum().item()

    test_acc /= size
    test_loss /= num_batches

    return test_acc, test_loss

epochs = 5
train_loss = []
train_acc = []
test_loss = []
test_acc = []

for epoch in range(epochs):
    model.train()
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, opt)
    
    model.eval()
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)
    
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)
    
    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}')
    print(template.format(epoch + 1, epoch_train_acc*100, epoch_train_loss, epoch_test_acc*100, epoch_test_loss))
print('Done')

Epoch: 1, Train_acc:79.2%, Train_loss:0.701, Test_acc:92.4%, Test_loss:0.245
Epoch: 2, Train_acc:94.8%, Train_loss:0.172, Test_acc:96.4%, Test_loss:0.119
Epoch: 3, Train_acc:96.6%, Train_loss:0.113, Test_acc:97.4%, Test_loss:0.082
Epoch: 4, Train_acc:97.3%, Train_loss:0.090, Test_acc:97.8%, Test_loss:0.068
Epoch: 5, Train_acc:97.7%, Train_loss:0.077, Test_acc:98.0%, Test_loss:0.061
Done
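As a quick sanity check after training, a minimal sketch for classifying a single test image (reusing the model, test_ds, and device defined above) might look like this:

# Classify one test image with the trained model (a minimal sketch)
model.eval()
img, label = test_ds[0]                          # one test image and its true label
with torch.no_grad():
    logits = model(img.unsqueeze(0).to(device))  # add the batch dimension: [1, 1, 28, 28]
    pred = logits.argmax(1).item()
print(f'Predicted: {pred}, actual: {label}')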

4. Results visualization

import matplotlib.pyplot as plt
# Hide warnings
import warnings
warnings.filterwarnings("ignore")             # Ignore warning messages
plt.rcParams['font.sans-serif'] = ['SimHei']  # Display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False    # Display minus signs correctly
plt.rcParams['figure.dpi'] = 100              # Figure resolution

epochs_range = range(epochs)

plt.figure(figsize=(12, 3))
plt.subplot(1, 2, 1)

plt.plot(epochs_range, train_acc, label='Training Accuracy')
plt.plot(epochs_range, test_acc, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_loss, label='Training Loss')
plt.plot(epochs_range, test_loss, label='Test Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

2. Personal summary

My experience with the different layers used when building a neural network (see the sketch after this list):

  1. nn.Conv2d:

    • Function: The convolutional layer performs feature extraction; the output feature map is computed by sliding convolution kernels over the input image. Each kernel learns a set of weights for detecting a different feature of the input, such as edges or textures.
    • Significance: Convolution captures local spatial structure in the image and shares weights across positions, giving the model translation invariance and local receptive fields.
  2. nn.MaxPool2d:

    • Function: The pooling layer downsamples the feature map by selecting the maximum (or average) value in each local region. It helps reduce the number of parameters, lowers the risk of overfitting, and makes the model more robust to input transformations.
    • Significance: Pooling retains the main features while discarding redundant information, improving the spatial invariance and abstraction level of the features, so the model tolerates small translations and scale changes in the input.
  3. nn.ReLU:

    • Function: ReLU (Rectified Linear Unit) is an activation function that sets negative values to zero and keeps positive values unchanged. It introduces nonlinearity, enabling neural networks to fit nonlinear data.
    • Significance: ReLU is cheap to compute, mitigates the vanishing gradient problem, and makes deep networks easier to train. It also encourages sparse activations and improves the representational capacity of the model.
  4. nn.Linear:

    • Function: The fully connected layer is one of the most basic layers in a neural network; each neuron is connected to every neuron in the previous layer. It combines features, maps them into a new feature space, and produces the final classification output.
    • Significance: Fully connected layers can learn complex relationships between features, improving the expressive power of the model. The last fully connected layer acts as the output layer, producing class scores (logits) that a softmax can turn into a probability distribution.
  5. nn.Sequential:

    • Function: nn.Sequential is a container that chains multiple network modules in order. It lets us define the network structure once at initialization instead of rewriting it in the forward pass.
    • Significance: nn.Sequential provides a simple and convenient way to build deep neural networks. By composing layers in sequence, we can define complex network structures easily and keep the code clear and readable.
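
To illustrate these points, here is a minimal sketch of the same network written with nn.Sequential, equivalent to the Model class from section 1 (the shape annotations assume a 28x28 single-channel MNIST input):

import torch
import torch.nn as nn

num_classes = 10

# The same CNN expressed as an nn.Sequential container (a sketch)
net = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3),   # [1, 28, 28] -> [32, 26, 26]
    nn.ReLU(),
    nn.MaxPool2d(2),                   # [32, 26, 26] -> [32, 13, 13]
    nn.Conv2d(32, 64, kernel_size=3),  # [32, 13, 13] -> [64, 11, 11]
    nn.ReLU(),
    nn.MaxPool2d(2),                   # [64, 11, 11] -> [64, 5, 5]
    nn.Flatten(),                      # [64, 5, 5] -> [1600]
    nn.Linear(1600, 64),
    nn.ReLU(),
    nn.Linear(64, num_classes),
)

x = torch.randn(1, 1, 28, 28)          # a dummy MNIST-sized input
print(net(x).shape)                    # torch.Size([1, 10])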
