Foreword
These notes are a record of my own study.
Preparatory work
Necessary libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
First, define the hyperparameters: the batch size, whether to run on the GPU or the CPU, and the number of training epochs.
The batch size varies from person to person. My computer is fairly modest, so I cannot raise the batch size very high. If you have a more powerful machine, you can increase it, for example to 64.
# Define hyperparameters. Parameters are learned by the neural network itself; hyperparameters are set by hand.
Batch_size = 16  # number of samples processed in each batch
Device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # GPU or CPU
Epochs = 10  # number of training epochs
Image processing: compose a pipeline that converts each image into tensor format and then standardizes it.
This mainly prepares for loading the data set later.
# Build the pipeline that processes the images
pipeline = transforms.Compose([
    transforms.ToTensor(),  # convert the image into a tensor
    transforms.Normalize((0.1307,), (0.3081,))  # standardize; standardized data helps speed up training
])
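To make the standardization step concrete, here is a minimal plain-Python sketch of what ToTensor followed by Normalize does to a single pixel. The helper name normalize_pixel is made up for illustration; 0.1307 and 0.3081 are the mean and standard deviation passed to Normalize above.

```python
MEAN = 0.1307  # mean passed to transforms.Normalize above
STD = 0.3081   # standard deviation passed to transforms.Normalize above


def normalize_pixel(raw_value):
    """Mimic ToTensor (scale 0..255 to 0..1) followed by Normalize for one pixel."""
    scaled = raw_value / 255.0    # ToTensor maps uint8 pixel values to [0, 1]
    return (scaled - MEAN) / STD  # Normalize subtracts the mean and divides by the std


black = normalize_pixel(0)    # darkest possible pixel
white = normalize_pixel(255)  # brightest possible pixel
print(f"black pixel -> {black:.4f}")
print(f"white pixel -> {white:.4f}")
```

After this step the pixel values are roughly centered around zero instead of lying in 0..255.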
MNIST
The MNIST data set is a collection of handwritten digit images and one of the most commonly used data sets in deep learning. Each sample is a 28*28-pixel grayscale image; there are 60,000 training images and 10,000 test images. The value of each pixel represents the gray level (intensity) at that location.
First, download the data set. This is simple: torchvision can download it directly. Two imports are needed:
one provides the data set, and the other loads it.
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
One data set is for training and one is for testing; each is then wrapped in a data loader.
# Download the data set
train_set = datasets.MNIST("data", train=True, download=True, transform=pipeline)
test_set = datasets.MNIST("data", train=False, download=True, transform=pipeline)
# Load the data in batches
train_loader = DataLoader(train_set, batch_size=Batch_size, shuffle=True)  # shuffle: randomize the sample order
test_loader = DataLoader(test_set, batch_size=Batch_size, shuffle=True)
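As a quick sanity check on these loaders, the number of batches per epoch follows directly from the data set size and the batch size. This is a small sketch in plain Python; the function name batches_per_epoch is hypothetical, and it assumes the DataLoader keeps the last partial batch, which is its default behavior.

```python
import math

BATCH_SIZE = 16         # same value as Batch_size above
TRAIN_SAMPLES = 60_000  # MNIST training images
TEST_SAMPLES = 10_000   # MNIST test images


def batches_per_epoch(num_samples, batch_size):
    """Number of batches per pass when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


train_batches = batches_per_epoch(TRAIN_SAMPLES, BATCH_SIZE)
test_batches = batches_per_epoch(TEST_SAMPLES, BATCH_SIZE)
print(train_batches, test_batches)
```

With a batch size of 16 the training loader yields 3750 batches per epoch, so a progress message printed every 3000 batches appears only twice per epoch (at batch 0 and batch 3000).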
Show the data set
fig = plt.figure()
for i in range(12):
    plt.subplot(3, 4, i + 1)
    plt.tight_layout()
    # .data and .targets replace the deprecated .train_data and .train_labels
    plt.imshow(train_set.data[i], cmap='gray', interpolation='none')
    plt.title("Labels: {}".format(train_set.targets[i]))
    plt.xticks([])
    plt.yticks([])
plt.show()
Build a network model
The more interesting part is building the network model, which is much like stacking building blocks.
Here are a few background points. If you are interested, take a look.
Convolutional layer
A convolution operation slides a filter (kernel) over the input to produce a new feature map. The filter is a small matrix whose values are learned, and it acts like a specific image filter applied to the input. A convolutional layer outputs multiple feature maps, each corresponding to one learned feature.
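The sliding-filter idea can be sketched in a few lines of plain Python. This is an illustration only, not how PyTorch implements it: the function conv2d_valid, the tiny 4*4 image, and the hand-picked edge filter are all made up here, and there is no padding and a stride of 1. Like nn.Conv2d, the sketch does not flip the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the window element-wise with the kernel and sum the result
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Dark left half, bright right half
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked vertical-edge filter; in a real network these values are learned
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map is non-zero only in the middle column, where the window straddles the dark-to-bright edge. A 4*4 input with a 2*2 kernel gives a 3*3 output, the same size rule used for the model later.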
Pooling layer
The pooling layer reduces the spatial size of the feature map; it is the downsampling part of a convolutional neural network. Two types are common: max pooling, which takes the maximum value in each window, and average pooling, which takes the average. Pooling has no learnable parameters of its own, but by shrinking the feature maps it reduces the computation and the number of parameters in later layers, and it tends to make the model more robust to small shifts in the input.
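Both pooling types can be illustrated with one small helper. This is a plain-Python sketch under simple assumptions (non-overlapping square windows, made-up input values); the names pool2d and mean are hypothetical.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply non-overlapping size x size pooling using reduce_fn (e.g. max or mean)."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values inside the current window
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 4],
    [0, 2, 9, 1],
    [6, 4, 3, 3],
]
max_pooled = pool2d(feature_map, 2, max)   # keeps the largest value per window
avg_pooled = pool2d(feature_map, 2, mean)  # keeps the average value per window
print(max_pooled)
print(avg_pooled)
```

In both cases a 4*4 map shrinks to 2*2, which is the halving that F.max_pool2d(x, 2, 2) performs in the model.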
Fully connected layer
The final layers of a convolutional neural network are usually fully connected. The feature maps are first flattened into a vector, which is then fed into the fully connected layers for classification or prediction. Every neuron in a fully connected layer is linked to every neuron in the previous layer. The output of the last fully connected layer is the classification or prediction result. A loss function computed from this result is backpropagated to update the model parameters and improve the accuracy of the model.
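The flatten-then-connect step can be sketched as follows. Everything here is illustrative: the helpers flatten and linear, the two tiny feature maps, and the weights are made up, whereas a trained network learns its weights.

```python
def flatten(feature_maps):
    """Flatten feature maps (channels x height x width) into one vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def linear(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(weight_row, inputs)) + b
            for weight_row, b in zip(weights, biases)]


feature_maps = [
    [[1, 2], [3, 4]],  # channel 0
    [[0, 1], [0, 1]],  # channel 1
]
vector = flatten(feature_maps)  # 2 * 2 * 2 = 8 values

# Hypothetical weights for 2 output classes (one row of 8 weights per class)
weights = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 1],
]
biases = [0.5, -1.0]
scores = linear(vector, weights, biases)
print(vector)
print(scores)
```

Each output score depends on every element of the flattened vector, which is what "fully connected" means.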
Model construction
The images in our data set are 28*28 grayscale images, so the first convolutional layer has 1 input channel. It has 10 output channels and a 5*5 convolution kernel.
The second convolutional layer takes those 10 channels as input, produces 20 output channels, and uses a 3*3 convolution kernel.
What is the output size at this point? It is worked out below.
Two fully connected layers follow. The first takes the 20*10*10 input and reduces it to 500; the second maps 500 to the 10 categories.
Explanation:
The input starts out as 1*28*28. After the first convolutional layer it becomes 10*24*24 (28 - 5 + 1 = 24).
It then goes through a pooling layer, which halves each side: 10*12*12.
After the second convolutional layer it becomes 20*10*10 (12 - 3 + 1 = 10).
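The same arithmetic can be checked with a short helper. This is a sketch in plain Python; the function name conv_out is made up, and the formula assumes no dilation.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling window (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28                                        # MNIST images are 28 x 28
after_conv1 = conv_out(side, kernel=5)           # first convolution, 5x5 kernel
after_pool = conv_out(after_conv1, kernel=2, stride=2)  # 2x2 max pooling, stride 2
after_conv2 = conv_out(after_pool, kernel=3)     # second convolution, 3x3 kernel
flat_features = 20 * after_conv2 * after_conv2   # 20 channels flattened

print(after_conv1, after_pool, after_conv2, flat_features)
```

The final number, 2000, is the input size that the first fully connected layer must accept.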
# Build the network model
class Digit(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, 5)   # 1: grayscale input channel, 10: output channels, 5: kernel size
        self.conv2 = nn.Conv2d(10, 20, 3)  # 10: input channels, 20: output channels, 3: kernel size
        self.fc1 = nn.Linear(20*10*10, 500)  # 20*10*10: input features, 500: output features
        self.fc2 = nn.Linear(500, 10)        # 500: input features, 10: output classes

    def forward(self, x):
        input_size = x.size(0)     # batch size; x has shape batch*1*28*28
        x = self.conv1(x)          # input: batch*1*28*28, output: batch*10*24*24 (28 - 5 + 1 = 24)
        x = F.relu(x)              # shape unchanged: batch*10*24*24
        x = F.max_pool2d(x, 2, 2)  # input: batch*10*24*24, output: batch*10*12*12
        x = self.conv2(x)          # input: batch*10*12*12, output: batch*20*10*10 (12 - 3 + 1 = 10)
        x = F.relu(x)
        x = x.view(input_size, -1)  # flatten; -1 infers the dimension automatically, 20*10*10 = 2000
        x = self.fc1(x)            # input: batch*2000, output: batch*500
        x = F.relu(x)
        x = self.fc2(x)            # input: batch*500, output: batch*10
        output = F.log_softmax(x, dim=1)  # log-probability of each digit class
        return output
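To get a feel for where the model's capacity sits, the number of learnable parameters per layer can be counted by hand. This is a rough sketch in plain Python; the helper names conv_params and linear_params are made up, and the counts assume every layer has a bias term, which is the default for nn.Conv2d and nn.Linear.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights (in * out * k * k) plus one bias per output channel."""
    return in_channels * out_channels * kernel * kernel + out_channels


def linear_params(in_features, out_features):
    """Weights (in * out) plus one bias per output feature."""
    return in_features * out_features + out_features


layers = {
    "conv1": conv_params(1, 10, 5),
    "conv2": conv_params(10, 20, 3),
    "fc1": linear_params(20 * 10 * 10, 500),
    "fc2": linear_params(500, 10),
}
total = sum(layers.values())

for name, count in layers.items():
    print(f"{name}: {count}")
print(f"total: {total}")
```

Almost all of the roughly one million parameters belong to the first fully connected layer, not to the convolutions.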
Optimizer
The Adam optimizer is a common choice. Its role is to minimize the loss function by updating the parameters of the model. It adapts the step size for each parameter from running estimates of the gradient's mean and variance, which often gives fast convergence without much learning-rate tuning.
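The adaptive step can be seen in a scalar sketch of the Adam update rule. This is written in plain Python for illustration and is not PyTorch's implementation; the function name adam_step and the toy objective f(x) = x^2 are made up, while the default values of lr, beta1, beta2, and eps match the defaults of optim.Adam.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter at step t (starting from 1)."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of the squared gradient
    m_hat = m / (1 - beta1 ** t)               # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy objective f(x) = x^2 with gradient 2x, starting at x = 1.0
x, m, v = 1.0, 0.0, 0.0
history = []
for t in range(1, 101):
    grad = 2 * x
    x, m, v = adam_step(x, grad, m, v, t)
    history.append(x)

print(history[0], history[-1])
```

Because the update divides by the square root of the squared-gradient estimate, the first step has a size close to lr regardless of how large the gradient is.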
# Model instance
model = Digit().to(Device)
# Optimizer
optimizer = optim.Adam(model.parameters())  # Adam: an optimizer
Training method
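Each training step does three things: it moves the batch to the device, resets the gradients to 0, and computes the cross-entropy loss, which is then backpropagated.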
# Define the training method
def train_model(model, device, train_loader, optimizer, epoch):
    # Put the model in training mode
    model.train()
    for batch_index, (data, target) in enumerate(train_loader):
        # Move the batch to the device
        data, target = data.to(device), target.to(device)
        # Reset the gradients to 0
        optimizer.zero_grad()
        # Forward pass
        output = model(data)
        # Calculate the loss
        loss = F.cross_entropy(output, target)  # cross-entropy loss: suitable for multi-class classification
        # Backpropagation: compute the gradients
        loss.backward()
        # Update the parameters
        optimizer.step()
        if batch_index % 3000 == 0:
            print("Train Epoch: {} \t Loss: {:.6f}".format(epoch, loss.item()))
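One detail worth checking: the model already returns log_softmax output, and F.cross_entropy applies log-softmax again internally before taking the negative log-likelihood. The plain-Python sketch below suggests why this is harmless: applying log-softmax to values that are already log-probabilities leaves them unchanged. The helper functions and the example scores are made up for illustration.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    top = max(scores)
    log_sum = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_sum for s in scores]


def cross_entropy(scores, target):
    """Cross-entropy for one sample: negative log-probability of the true class."""
    return -log_softmax(scores)[target]


logits = [2.0, 0.5, -1.0]  # made-up raw scores for 3 classes
target = 0                 # index of the true class

loss_from_logits = cross_entropy(logits, target)
# Feeding log-probabilities in again gives the same loss, because the
# exponentials of log-probabilities already sum to 1.
loss_from_log_probs = cross_entropy(log_softmax(logits), target)

print(loss_from_logits, loss_from_log_probs)
```

An equivalent and arguably clearer pairing would be to keep log_softmax in the model and use F.nll_loss, or to return raw scores and keep F.cross_entropy.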
Test method
with torch.no_grad(): disables gradient calculation inside the block, so no backpropagation is performed during testing.
# Define the test method
def test_model(model, device, test_loader):
    # Put the model in evaluation mode
    model.eval()
    # Number of correct predictions
    correct = 0.0
    # Accumulated test loss
    test_loss = 0.0
    with torch.no_grad():  # gradients are not calculated and backpropagation is not performed
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            # Forward pass on the test data
            output = model(data)
            # Sum the loss over the batch so that dividing by the data set size gives a per-sample average
            test_loss += F.cross_entropy(output, target, reduction="sum").item()
            # Find the index with the largest probability value
            pred = output.max(1, keepdim=True)[1]  # max returns (value, index)
            # Accumulate the number of correct predictions
            correct += pred.eq(target.view_as(pred)).sum().item()
        test_loss /= len(test_loader.dataset)
        print("Test-- Average loss : {:.4f}, Accuracy : {:.3f}\n".format(
            test_loss, 100.0 * correct / len(test_loader.dataset)))
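A note on the averaging: F.cross_entropy returns the mean loss of a batch by default, so dividing an accumulated total by the number of samples only gives a per-sample average if each batch contributes its summed loss. The bookkeeping can be sketched in plain Python; the function evaluate and the tiny batches below are made up for illustration.

```python
def evaluate(batches):
    """batches: list of (per_sample_losses, predictions, targets) tuples."""
    total_loss = 0.0
    correct = 0
    seen = 0
    for losses, preds, targets in batches:
        total_loss += sum(losses)  # summed loss of the batch, not its mean
        correct += sum(p == t for p, t in zip(preds, targets))
        seen += len(targets)
    # Per-sample average loss and accuracy in percent
    return total_loss / seen, 100.0 * correct / seen


# Two made-up batches of different sizes
batches = [
    ([0.2, 0.4], [3, 7], [3, 7]),  # both predictions correct
    ([0.6], [1], [4]),             # one wrong prediction
]
avg_loss, accuracy = evaluate(batches)
print(f"average loss {avg_loss:.4f}, accuracy {accuracy:.2f}%")
```

Summing per batch and dividing once at the end also stays correct when the last batch is smaller than the others.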
Complete code
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

# Define hyperparameters. Parameters are learned by the neural network itself; hyperparameters are set by hand.
Batch_size = 16  # number of samples processed in each batch
Device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # GPU or CPU
Epochs = 10  # number of training epochs

# Build the pipeline that processes the images
pipeline = transforms.Compose([
    transforms.ToTensor(),  # convert the image into a tensor
    transforms.Normalize((0.1307,), (0.3081,))  # standardize; standardized data helps speed up training
])

# Download the data set
train_set = datasets.MNIST("data", train=True, download=True, transform=pipeline)
test_set = datasets.MNIST("data", train=False, download=True, transform=pipeline)
# Load the data in batches
train_loader = DataLoader(train_set, batch_size=Batch_size, shuffle=True)  # shuffle: randomize the sample order
test_loader = DataLoader(test_set, batch_size=Batch_size, shuffle=True)

# Show the first 12 training images
fig = plt.figure()
for i in range(12):
    plt.subplot(3, 4, i + 1)
    plt.tight_layout()
    # .data and .targets replace the deprecated .train_data and .train_labels
    plt.imshow(train_set.data[i], cmap='gray', interpolation='none')
    plt.title("Labels: {}".format(train_set.targets[i]))
    plt.xticks([])
    plt.yticks([])
plt.show()

# Build the network model
class Digit(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, 5)   # 1: grayscale input channel, 10: output channels, 5: kernel size
        self.conv2 = nn.Conv2d(10, 20, 3)  # 10: input channels, 20: output channels, 3: kernel size
        self.fc1 = nn.Linear(20*10*10, 500)  # 20*10*10: input features, 500: output features
        self.fc2 = nn.Linear(500, 10)        # 500: input features, 10: output classes

    def forward(self, x):
        input_size = x.size(0)     # batch size; x has shape batch*1*28*28
        x = self.conv1(x)          # input: batch*1*28*28, output: batch*10*24*24 (28 - 5 + 1 = 24)
        x = F.relu(x)              # shape unchanged: batch*10*24*24
        x = F.max_pool2d(x, 2, 2)  # input: batch*10*24*24, output: batch*10*12*12
        x = self.conv2(x)          # input: batch*10*12*12, output: batch*20*10*10 (12 - 3 + 1 = 10)
        x = F.relu(x)
        x = x.view(input_size, -1)  # flatten; -1 infers the dimension automatically, 20*10*10 = 2000
        x = self.fc1(x)            # input: batch*2000, output: batch*500
        x = F.relu(x)
        x = self.fc2(x)            # input: batch*500, output: batch*10
        output = F.log_softmax(x, dim=1)  # log-probability of each digit class
        return output

# Define the model and the optimizer
model = Digit().to(Device)
optimizer = optim.Adam(model.parameters())  # Adam: an optimizer

# Define the training method
def train_model(model, device, train_loader, optimizer, epoch):
    # Put the model in training mode
    model.train()
    for batch_index, (data, target) in enumerate(train_loader):
        # Move the batch to the device
        data, target = data.to(device), target.to(device)
        # Reset the gradients to 0
        optimizer.zero_grad()
        # Forward pass
        output = model(data)
        # Calculate the loss
        loss = F.cross_entropy(output, target)  # cross-entropy loss: suitable for multi-class classification
        # Backpropagation: compute the gradients
        loss.backward()
        # Update the parameters
        optimizer.step()
        if batch_index % 3000 == 0:
            print("Train Epoch: {} \t Loss: {:.6f}".format(epoch, loss.item()))

# Define the test method
def test_model(model, device, test_loader):
    # Put the model in evaluation mode
    model.eval()
    # Number of correct predictions
    correct = 0.0
    # Accumulated test loss
    test_loss = 0.0
    with torch.no_grad():  # gradients are not calculated and backpropagation is not performed
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            # Forward pass on the test data
            output = model(data)
            # Sum the loss over the batch so that dividing by the data set size gives a per-sample average
            test_loss += F.cross_entropy(output, target, reduction="sum").item()
            # Find the index with the largest probability value
            pred = output.max(1, keepdim=True)[1]  # max returns (value, index)
            # Accumulate the number of correct predictions
            correct += pred.eq(target.view_as(pred)).sum().item()
        test_loss /= len(test_loader.dataset)
        print("Test-- Average loss : {:.4f}, Accuracy : {:.3f}\n".format(
            test_loss, 100.0 * correct / len(test_loader.dataset)))

# Run training and testing
for epoch in range(1, Epochs + 1):  # epochs 1 to 10
    train_model(model, Device, train_loader, optimizer, epoch)
    test_model(model, Device, test_loader)

# Save the trained weights
torch.save(model.state_dict(), 'model.pt')
The resulting accuracy on the test set is about 99%.