Week P1: Implementing MNIST handwritten digit recognition

  • This article is my learning-record blog for the 365-day deep learning training camp
  • Reference article: [365-day deep learning training camp – Week P1: Implementing MNIST handwritten digit recognition] (PyTorch in practice | Week P1: Implementing MNIST handwritten digit recognition (qq.com))
  • Original author: Classmate K | Tutoring, project customization

Table of Contents

1. Code and running results
   1. Preliminary preparation
   2. Build a simple CNN network
   3. Train the model
   4. Results visualization
2. Personal summary: experience with the different layers used when building a neural network
   (nn.Conv2d, nn.MaxPool2d, nn.ReLU, nn.Linear, nn.Sequential)


1. Code and running results

1. Preliminary preparation

import torch
print(torch.__version__)

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import torchvision

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

device

train_ds = torchvision.datasets.MNIST('data',
                                      train=True,
                                      transform=torchvision.transforms.ToTensor(), # Convert data type to Tensor
                                      download=True)

test_ds = torchvision.datasets.MNIST('data',
                                      train=False,
                                      transform=torchvision.transforms.ToTensor(), # Convert data type to Tensor
                                      download=True)

batch_size = 32

train_dl = torch.utils.data.DataLoader(train_ds,
                                       batch_size=batch_size,
                                       shuffle=True)

test_dl = torch.utils.data.DataLoader(test_ds,
                                       batch_size=batch_size)

# Take one batch to inspect the data format
# The shape of the data is: [batch_size, channel, height, width]
# batch_size is set by us; channel, height and width are the number of channels, height and width of the image.
imgs, labels = next(iter(train_dl))
print(imgs.shape)

import numpy as np

# Set the figure size to 20 inches wide and 5 inches high
plt.figure(figsize=(20, 5))
for i, img in enumerate(imgs[:20]):
    # Remove the channel dimension: [1, 28, 28] -> [28, 28]
    npimg = np.squeeze(img.numpy())
    # Divide the figure into 2 rows and 10 columns, and draw the (i+1)-th subplot
    plt.subplot(2, 10, i + 1)
    plt.imshow(npimg, cmap=plt.cm.binary)
    plt.axis('off')
plt.show()
Outputs of the cells above:

device(type='cuda')

torch.Size([32, 1, 28, 28])

2. Build a simple CNN network

import torch.nn.functional as F

num_classes = 10 # Number of image classes

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Feature extraction network
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)  # First convolutional layer, 3x3 kernel
        self.pool1 = nn.MaxPool2d(2)                  # Pooling layer, 2x2 window
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3) # Second convolutional layer, 3x3 kernel
        self.pool2 = nn.MaxPool2d(2)

        # Classification network
        # in_features=1600 because the feature map here is [64, 5, 5]: 28 -> 26 -> 13 -> 11 -> 5
        self.fc1 = nn.Linear(1600, 64)
        self.fc2 = nn.Linear(64, num_classes)

    # Forward propagation
    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))

        x = torch.flatten(x, start_dim=1)

        x = F.relu(self.fc1(x))
        x = self.fc2(x)

        return x
    
from torchinfo import summary
# Move the model to the GPU (all model computation runs on the device selected earlier)
model = Model().to(device)

summary(model)
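The in_features=1600 of fc1 is determined by the feature-map shape after the second pooling layer: 28 -> 26 (conv1) -> 13 (pool1) -> 11 (conv2) -> 5 (pool2), so 64 x 5 x 5 = 1600. Here is a minimal sketch to verify this with a dummy tensor, using the Model class defined above:

# Sanity-check the flattened feature size with a dummy input (a sketch, not part of the original run)
m = Model()
x = torch.randn(1, 1, 28, 28)                  # one dummy MNIST-sized image
feat = m.pool2(F.relu(m.conv2(m.pool1(F.relu(m.conv1(x))))))
print(feat.shape)                              # torch.Size([1, 64, 5, 5])
print(torch.flatten(feat, start_dim=1).shape)  # torch.Size([1, 1600])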

3. Train the model

loss_fn = nn.CrossEntropyLoss() # Create loss function
learn_rate = 1e-2 # Learning rate
opt = torch.optim.SGD(model.parameters(), lr=learn_rate)

# training loop
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)  # Size of the training set: 60,000 images in total
    num_batches = len(dataloader)   # Number of batches: 1875 (60000/32)

    train_loss, train_acc = 0, 0 # Initialize training loss and accuracy
    
    for X, y in dataloader:  # Fetch a batch of images and their labels
        X, y = X.to(device), y.to(device)

        # Compute the prediction error
        pred = model(X)          # Network output
        loss = loss_fn(pred, y)  # Loss between the network output and the true labels

        # Backpropagation
        optimizer.zero_grad()    # Zero out the gradients
        loss.backward()          # Backpropagate the loss
        optimizer.step()         # Update the parameters

        # Accumulate accuracy and loss
        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()
            
    train_acc /= size
    train_loss /= num_batches

    return train_acc, train_loss

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset) #The size of the test set, a total of 10,000 images
    num_batches = len(dataloader) # Number of batches, 313 (10000/32=312.5, rounded up)
    test_loss, test_acc = 0, 0
    
    # When not training, disable gradient tracking to save memory and computation
    with torch.no_grad():
        for imgs, target in dataloader:
            imgs, target = imgs.to(device), target.to(device)
            
            # Calculate loss
            target_pred = model(imgs)
            loss = loss_fn(target_pred, target)
            
            test_loss += loss.item()
            test_acc += (target_pred.argmax(1) == target).type(torch.float).sum().item()

    test_acc /= size
    test_loss /= num_batches

    return test_acc, test_loss

epochs = 5
train_loss = []
train_acc = []
test_loss = []
test_acc = []

for epoch in range(epochs):
    model.train()
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, opt)
    
    model.eval()
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)
    
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)
    
    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}')
    print(template.format(epoch + 1, epoch_train_acc*100, epoch_train_loss, epoch_test_acc*100, epoch_test_loss))
print('Done')

Epoch: 1, Train_acc:79.2%, Train_loss:0.701, Test_acc:92.4%, Test_loss:0.245
Epoch: 2, Train_acc:94.8%, Train_loss:0.172, Test_acc:96.4%, Test_loss:0.119
Epoch: 3, Train_acc:96.6%, Train_loss:0.113, Test_acc:97.4%, Test_loss:0.082
Epoch: 4, Train_acc:97.3%, Train_loss:0.090, Test_acc:97.8%, Test_loss:0.068
Epoch: 5, Train_acc:97.7%, Train_loss:0.077, Test_acc:98.0%, Test_loss:0.061
Done
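As a quick sanity check after training, a minimal sketch for classifying a single test image (reusing the model, test_ds, and device defined above) might look like this:

# Classify one test image with the trained model (a minimal sketch)
model.eval()
img, label = test_ds[0]                          # one test image and its true label
with torch.no_grad():
    logits = model(img.unsqueeze(0).to(device))  # add the batch dimension: [1, 1, 28, 28]
    pred = logits.argmax(1).item()
print(f'Predicted: {pred}, actual: {label}')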

4. Results visualization

import matplotlib.pyplot as plt
# Hide warnings
import warnings
warnings.filterwarnings("ignore")             # Ignore warning messages
plt.rcParams['font.sans-serif'] = ['SimHei']  # Display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False    # Display minus signs correctly
plt.rcParams['figure.dpi'] = 100              # Figure resolution

epochs_range = range(epochs)

plt.figure(figsize=(12, 3))
plt.subplot(1, 2, 1)

plt.plot(epochs_range, train_acc, label='Training Accuracy')
plt.plot(epochs_range, test_acc, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_loss, label='Training Loss')
plt.plot(epochs_range, test_loss, label='Test Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

2. Personal summary

My experience with the different layers used when building a neural network (see the sketch after this list):

  1. nn.Conv2d:

    • Function: The convolutional layer performs feature extraction; the output feature map is computed by sliding convolution kernels over the input image. Each kernel learns a set of weights for detecting a different feature of the input, such as edges or textures.
    • Significance: Convolution captures local spatial structure in the image and shares weights across positions, giving the model translation invariance and local receptive fields.
  2. nn.MaxPool2d:

    • Function: The pooling layer downsamples the feature map by selecting the maximum (or average) value in each local region. It helps reduce the number of parameters, lowers the risk of overfitting, and makes the model more robust to input transformations.
    • Significance: Pooling retains the main features while discarding redundant information, improving the spatial invariance and abstraction level of the features, so the model tolerates small translations and scale changes in the input.
  3. nn.ReLU:

    • Function: ReLU (Rectified Linear Unit) is an activation function that sets negative values to zero and keeps positive values unchanged. It introduces nonlinearity, enabling neural networks to fit nonlinear data.
    • Significance: ReLU is cheap to compute, mitigates the vanishing gradient problem, and makes deep networks easier to train. It also encourages sparse activations and improves the representational capacity of the model.
  4. nn.Linear:

    • Function: The fully connected layer is one of the most basic layers in a neural network; each neuron is connected to every neuron in the previous layer. It combines features, maps them into a new feature space, and produces the final classification output.
    • Significance: Fully connected layers can learn complex relationships between features, improving the expressive power of the model. The last fully connected layer acts as the output layer, producing class scores (logits) that a softmax can turn into a probability distribution.
  5. nn.Sequential:

    • Function: nn.Sequential is a container that chains multiple network modules in order. It lets us define the network structure once at initialization instead of rewriting it in the forward pass.
    • Significance: nn.Sequential provides a simple and convenient way to build deep neural networks. By composing layers in sequence, we can define complex network structures easily and keep the code clear and readable.
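
To illustrate these points, here is a minimal sketch of the same network written with nn.Sequential, equivalent to the Model class from section 1 (the shape annotations assume a 28x28 single-channel MNIST input):

import torch
import torch.nn as nn

num_classes = 10

# The same CNN expressed as an nn.Sequential container (a sketch)
net = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3),   # [1, 28, 28] -> [32, 26, 26]
    nn.ReLU(),
    nn.MaxPool2d(2),                   # [32, 26, 26] -> [32, 13, 13]
    nn.Conv2d(32, 64, kernel_size=3),  # [32, 13, 13] -> [64, 11, 11]
    nn.ReLU(),
    nn.MaxPool2d(2),                   # [64, 11, 11] -> [64, 5, 5]
    nn.Flatten(),                      # [64, 5, 5] -> [1600]
    nn.Linear(1600, 64),
    nn.ReLU(),
    nn.Linear(64, num_classes),
)

x = torch.randn(1, 1, 28, 28)          # a dummy MNIST-sized input
print(net(x).shape)                    # torch.Size([1, 10])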
