[pytorch] Manually build a neural network and train it

0. Preface

The previous post was mostly illustrative, and it was not clear whether the code there actually runs, so please read carefully. In this article we use the Fashion-MNIST data directly to implement a simple softmax training run, produce the output, make a prediction, and look at the result.

More importantly, this post also explains the training process, the form the data takes, and what we are mainly doing in softmax classification and how to do it.

Prerequisites: the required packages are listed first; install whatever is missing with pip3.

import torch
import torchvision
from torch.utils import data
from torchvision import transforms
from d2l import torch as d2l
import pandas
from torch import nn

1. Downloading and processing the data

(1) Data download:

First of all, for the data download, the approach chosen here is to fetch the Fashion-MNIST dataset directly in the d2l environment (d2l is the companion library of the book Dive into Deep Learning; look it up if you are not familiar with it).

# Use SVG to display figures in Jupyter; the actual download of the data set happens further below
d2l.use_svg_display()

After downloading, the data sits on disk rather than in memory; here I put it under ./data. Your directory may differ, so adjust the path to your own setup.

Next the data has to be processed. The raw samples are PIL images, and we need to convert each image into floating-point form. The key object here is the Dataset: think of it as a container that can be indexed like a two-dimensional array to get at our data.

# Convert image data from PIL type to 32-bit floating point format through ToTensor instance,
# And divide by 255 so that the values of all pixels are between 0 and 1
trans = transforms.ToTensor()
mnist_train = torchvision.datasets.FashionMNIST(
    root="./data", train=True, transform=trans, download=True)
mnist_test = torchvision.datasets.FashionMNIST(
    root="./data", train=False, transform=trans, download=True)

mnist_train and mnist_test are both what is commonly called a Dataset object; as the name suggests, each one is a data set.

Although a Dataset is not literally a two-dimensional array, we can index it as if it were one:
mnist_train behaves like an array of shape [n, 2],
where mnist_train[0][0] is a 1×28×28 image tensor, which is our input,
and mnist_train[0][1] is the label, a single number for this data set.

If we print a sample in Python, the result looks like this:

print(mnist_train[0][0].shape) # torch.Size([1, 28, 28])
print(mnist_train[0][1]) # 9 (the class label)

#The issues that need to be noted here will be explained later
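As a quick sanity check (a minimal sketch, using only the mnist_train and mnist_test objects already defined), a Dataset also supports len() and tuple unpacking:

image, label = mnist_train[0]                   # each sample is an (image, label) pair
print(len(mnist_train), len(mnist_test))        # 60000 10000
print(image.min().item(), image.max().item())   # pixel values are already scaled into [0, 1]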

(2) Data processing

When processing the data we have to consider two things. First, for training we need the data in tensor form, and, more importantly, we cannot feed in all of the data at once (even though Fashion-MNIST only has 60,000 samples, this idea matters): when computing the loss we want to keep things loosely coupled and work on an appropriately sized chunk of data at a time (more on this later).

Secondly, we also want a view of the overall picture, so we will compute a loss over the entire data set. The problem is that the Dataset object is not a tensor, so it needs some simple processing first.

So, first things first: how do we read the data? This is what we need for the training loop. The key object here is the DataLoader. It splits the Dataset into batches and exposes them as an iterable, as shown below:

batch_size=256
train_iter=data.DataLoader(mnist_train,batch_size=256,shuffle=True,num_workers=4) # Split the data set into batches of 256, shuffle the order, and read it with four worker processes
test_iter=data.DataLoader(mnist_test,batch_size=256,shuffle=True,num_workers=4) # This produces an iterable object whose batches are read with a for loop

The name iter obviously suggests iteration, which is simple enough, but there is one catch: a DataLoader is meant to be consumed by iterating over it (a for loop, or iter()/next()), not by indexing or printing.

If we just print it, we get the object itself rather than the data.

print(train_iter)
# Printing a DataLoader only shows the object itself; the batches it generates are obtained by looping over it

The correct way to read it looks like this. Reading the data this way guarantees that one batch of tensors is returned on every pass.

for features,labels in train_iter:
    print(features,labels)
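A for loop is not strictly the only option, by the way. As a minimal sketch (using the train_iter defined above), a single batch can also be pulled out by hand with iter() and next(), which is handy for quick shape checks:

X, y = next(iter(train_iter))   # grab one batch without writing a loop
print(X.shape, y.shape)         # torch.Size([256, 1, 28, 28]) torch.Size([256])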

It needs a bit of explanation. We set the batch size to 256 earlier, so features is a four-dimensional tensor and labels is a one-dimensional tensor of class indices (note that a scalar tensor and a vector tensor are two completely different things). The tensors printed here therefore have these two shapes:

[256, 1, 28, 28]
[256]

The batching shows up as the leading dimension: 256 samples are treated as one batch and stacked together.

You may well ask: doesn't this extra dimension of 256 affect training? It is not a problem, because the network has its own way of handling the batch dimension, as we will see below.

————————————————–

The other piece of data to prepare is the whole data set, since later we want to use the loss function to compute an overall estimate and judge the overall training effect.

There is nothing exotic here; it just takes a bit of manual work to turn the samples traversed from mnist_train and mnist_test into tensors in a form we can use.

# Turn the complete data set into tensors by hand
# Define empty lists to collect the inputs and labels
inputs = []
outputs = []

# Iterate through each sample of the data set
for sample in mnist_train:
    image = sample[0] #Image data
    label = sample[1] # label data

    # Add image data and label data to the tensor respectively
    inputs.append(image)
    outputs.append(label)

#Convert list to tensor object
inputs = torch.stack(inputs)
outputs = torch.tensor(outputs)

# Print the shapes; 60,000 samples are found here in total
print("The shape of the input tensor",inputs.shape) # torch.Size([60000, 1, 28, 28])
print("The shape of the output tensor",outputs.shape) # torch.Size([60000])

How these two tensors are used will be explained in detail later.

2. Building the neural network

(1) General construction and data processing

First is the general structure of the neural network we build in this code

net=nn.Sequential(
    nn.Flatten(),
    nn.Linear(784,256),
    nn.ReLU(),
    nn.Linear(256,10),
    nn.Softmax(dim=1)
)

Let me first explain what this network does. The first layer is a Flatten, which flattens each sample; the effect looks like this:

[256,1,28,28] => [256,784]

Then, after passing through the two dense (Linear) layers, the shape becomes

[256, 10]

The general internal structure is as follows:
0:[1,2,3,4,5,6,7,8,9,10]
1:[1,2,3,4,5,6,7,8,9,10]
...........
255:[1,2,3,4,5,6,7,8,9,10]

Finally there is the softmax layer; we will not re-derive softmax here. We specify dim=1, which means the softmax normalization is applied along each row, i.e. across the 10 class scores of every sample:

0:[0.1,0.3,............]
1:[0.4,............]
2:[....................]
.............
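To see these shapes concretely, one option is to push a dummy batch through the layers one at a time and print the intermediate shapes; a minimal sketch, assuming the net defined above:

X = torch.rand(256, 1, 28, 28)   # a fake batch with the same shape as the real data
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:', X.shape)
# Flatten output shape: torch.Size([256, 784])
# Linear  output shape: torch.Size([256, 256])
# ReLU    output shape: torch.Size([256, 256])
# Linear  output shape: torch.Size([256, 10])
# Softmax output shape: torch.Size([256, 10])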

Of course, two questions easily come up here, both related to the fact that our data is bundled into one huge tensor of 256 samples; that is what the rest of this section explains.

First point: during forward propagation we can clearly observe (and you will see something similar if you run each layer on its own) that for a layer such as Linear(256, 10), if we pass in a tensor of size [60000, 256] we get a forward-propagation result of shape [60000, 10]. The layer only applies its transformation to the last dimension; nothing else seems to happen.

This is an important convention of these layers. In practice we feed the data in batches, so layers such as nn.Linear in PyTorch only operate on the last dimension during forward propagation and simply carry every leading (batch) dimension through unchanged.
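A small check of that behaviour (a sketch with made-up shapes, independent of the Fashion-MNIST data):

layer = nn.Linear(256, 10)
x = torch.rand(60000, 256)
print(layer(x).shape)                       # torch.Size([60000, 10]); only the last dimension changes
print(layer(torch.rand(4, 7, 256)).shape)   # torch.Size([4, 7, 10]); extra leading dims pass through untouched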

Second point: the data comes in as a [256, 1, 28, 28] tensor, and the very first thing we do is a flattening operation. The thing to note is that if you run this layer on its own, the output is [256, 784]: nn.Flatten never touches the first dimension (by default it flattens from dim 1 onward).
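A quick check of that default (again a minimal sketch):

x = torch.rand(256, 1, 28, 28)
print(nn.Flatten()(x).shape)              # torch.Size([256, 784]); start_dim defaults to 1
print(nn.Flatten(start_dim=0)(x).shape)   # torch.Size([200704]); flattening from dim 0 would destroy the batch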

To sum up: the way net handles batched data really comes down to the mathematical behaviour of the individual layers and functions, not to anything net itself does deliberately. Combined with matrix operations, this is what gives us batched computation.

(2) Weight initialization

First of all, there are many ways to initialize the weights of a network like this; you can access and set the attributes of a given layer directly, but most of the time we rely on the network object's apply function.

# The weights also need to be initialized first, e.g. nn.init.normal_(m.weight, std=0.1); this is a fairly important piece of setup
def init(m):
    if type(m)==nn.Linear:
        nn.init.normal_(m.weight,std=0.1)
        nn.init.constant_(m.bias,0.1)

# The apply function makes sure every layer is run through this initialization function
net.apply(init)

apply walks through every layer and calls this function on it; whenever the layer turns out to be a Linear layer, the initialization is performed.

nn.init.normal_ fills a tensor with random values drawn from a normal distribution; here it is used for the weights.

nn.init.constant_ fills a tensor with a fixed value; here it is used for the biases.
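To convince yourself the initialization took effect, you can peek at the parameters after calling net.apply(init) (a minimal sketch, assuming the net defined above):

w = net[1].weight.data                  # net[1] is the first Linear layer
print(w.mean().item(), w.std().item())  # roughly 0 and 0.1
print(net[1].bias.data[:5])             # every entry equals 0.1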

(3) The optimizer and the loss function

First the optimizer, which drives the parameter updates. We use stochastic gradient descent and pass it the parameters of the net:

# Optimizer: stochastic gradient descent over the net's parameters
train = torch.optim.SGD(net.parameters(), lr=0.03) 

Then the loss function. You could write the loss calculation yourself, but since the library already provides it, why not use it?

Since we use softmax to produce the class scores, we use the cross-entropy loss function here.

The general principle of this function:

For classification problems like this softmax one it is better to use a dedicated loss, namely the cross-entropy loss function nn.CrossEntropyLoss(reduction='none').
To explain with an example: given the predictions [0.7, 0.2, 0.1], [0.1, 0.1, 0.8] (the probability of each possible label) and the labels [0, 2] (what the true label should be), this function returns a tensor of per-sample losses.
I already worked through this formula once in my earlier CSDN post; the values computed here are
-(1×log(0.7) + 0×log(0.2) + 0×log(0.1)), and
-(0×log(0.1) + 0×log(0.2) + 1×log(0.8)); these two elements form a one-dimensional tensor.

The formula is as follows. For the c-th sample, suppose the true label is i and the predicted probability of that label is y_c^i; then the loss for this sample is

loss(c) = -log(y_c^i)

Then, for a batch of n samples, we take the average of the per-sample losses to get the cost; that step is not demonstrated here.
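One caveat worth spelling out: PyTorch's nn.CrossEntropyLoss expects raw, unnormalized scores (logits) and applies log-softmax internally, which is also why the explicit nn.Softmax layer in the net above is often omitted in practice. A minimal sketch verifying what the function computes (the numbers are made up):

loss_fn = nn.CrossEntropyLoss(reduction='none')
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])
manual = -torch.log_softmax(logits, dim=1)[torch.arange(2), labels]
print(loss_fn(logits, labels))   # per-sample losses
print(manual)                    # the same values, computed by hand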

(4) Modes

Before starting training, the net object actually has two modes:

net.train() # Training mode
net.eval() #Evaluation mode

Strictly speaking, train() and eval() switch the behaviour of layers such as Dropout and BatchNorm; they do not by themselves turn gradient tracking on or off (that is what torch.no_grad() does). In short, one mode is meant for training and the other for evaluation.
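When evaluating, gradient tracking can be switched off explicitly with torch.no_grad(); a minimal sketch, assuming the net, loss, inputs and outputs defined in this post:

net.eval()
with torch.no_grad():                                 # no computation graph is built here
    eval_loss = loss(net(inputs), outputs).mean()
print(eval_loss.item())
net.train()                                           # switch back before training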

3. The training process and data handling

The training process itself follows the usual pattern of four steps: compute the loss, clear the gradients, compute the gradients with backpropagation, and update the parameters. If you want, you can also check the loss at the end of every epoch.

epoch_num=10
for epoch in range(epoch_num):
    for X,y in train_iter:
        l=loss(net(X),y).mean() # reduce the per-sample losses to a scalar
        train.zero_grad()       # clear the old gradients
        l.backward()            # backpropagation
        train.step()            # parameter update
    l=loss(net(inputs),outputs) # loss over the entire training set
    print(f'epoch {epoch + 1}, loss {l.mean():f}')

There are two things to note here:

(1) Different losses put different requirements on their inputs. For example, the cross-entropy loss function expects inputs of the following kind:

for example

predictions [0.7, 0.2, 0.1], [0.1, 0.1, 0.8] (the probability of each possible label) and labels [0, 2] (what the true label should be); the function then returns a tensor of per-sample losses

But a loss such as nn.MSELoss behaves quite differently: the prediction and the target must be arrays of the same shape, and the loss is computed element-wise from their differences.

(2) backward() can only be called on a scalar tensor (a single number), which is exactly what we get after reducing the per-sample losses with mean() or sum().
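A tiny illustration of that rule (a sketch with made-up tensors, unrelated to the model above):

v = torch.ones(3, requires_grad=True)
out = v * 2
# out.backward() would raise "grad can be implicitly created only for scalar outputs"
out.mean().backward()   # reducing to a scalar first makes backward() valid
print(v.grad)           # tensor([0.6667, 0.6667, 0.6667])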

4. A little summary and complete code

A brief summary:

1. First, about net:
net can accept a small batch or even the complete data set as input, which means the small batch we pass in is actually [256, 1, 28, 28],
and the final output of this net is [256, 10].
In essence, net is a tensor processor that 'compresses' the input into the required format.
Tensor processor: my initial guess was that it would process the tensors in a batch one by one, but in fact net itself does not make much of a distinction between them.
What is passed in is still the tensor as a whole; you have to handle dim yourself inside the net to get the result and form you need.

A side note: a custom layer with no trainable parameters may effectively be ignored by the optimizer, since there is nothing in it to update; that is probably why the loss did not change at all when I tried one during training.

Another thing to note is that Flatten does not flatten the dim=0 dimension, which is why the batch size stays stable through the network.

2. Another point: most of the problems actually show up when computing the loss. The basic requirement is that the prediction and the target are in the form the loss function expects. For example,
with a loss such as MSELoss that wants net(x) and y to have the same shape, say Size([256]), the first dimension is never folded away, so if net(x) comes out as [256, 1] it has to be processed with reshape([-1]) before the loss is computed.
This kind of shape mismatch is the problem much of the time, so some extra processing is needed.

3. To estimate the overall error, the whole data set has to be converted into tensors first; here that was done with a small loop.
It is a bit fiddly, but remember that net is just a tensor compressor.

4. A Dataset object can be indexed like a two-dimensional array to obtain the data and the labels.
A DataLoader turns it into an iterable; a for loop (or iter()/next()) then yields one small batch at a time, already in tensor format.

5. Finally, remember to keep lr small, otherwise training blows up.
After it blew up the loss went straight to nan; I never imagined that a pitfall I had never hit in JS would show up here. It was caused by the initial learning rate of 0.5 being too large.

6. Also remember the two initialization functions, which are used to initialize the parameters of a given layer.
nn.init.normal_(m.weight, std=0.1) assigns random values, usually to the weights w
nn.init.constant_(m.bias, 0.1) assigns a constant value, usually to the biases b

7. Finally, a summary of the whole process:
Build the network (Sequential) ===> set up the parameters (define how to initialize, then apply it to the model) ===> choose the loss function (the library one is fine) ===> set up the optimizer (pass in the net's parameters) ===> train

Training is: compute the loss ===> clear the gradients and run backpropagation ===> take an optimization step

The data side is: read the Dataset object from file (this object holds the data and is indexed with [][]), then use a DataLoader to obtain the iterable used for training.

import torch
import torchvision
from torch.utils import data
from torchvision import transforms
from d2l import torch as d2l
import pandas
from torch import nn

# Use SVG to display figures in Jupyter; the actual download of the data set happens below
d2l.use_svg_display()

# Convert image data from PIL type to 32-bit floating point format through ToTensor instance,
# And divide by 255 so that the values of all pixels are between 0 and 1
trans = transforms.ToTensor()
mnist_train = torchvision.datasets.FashionMNIST(
    root="./data", train=True, transform=trans, download=True)
mnist_test = torchvision.datasets.FashionMNIST(
    root="./data", train=False, transform=trans, download=True)



batch_size=256
train_iter=data.DataLoader(mnist_train,batch_size=256,shuffle=True,num_workers=4) # Split the data set into batches of 256, shuffle the order, and read it with four worker processes
test_iter=data.DataLoader(mnist_test,batch_size=256,shuffle=True,num_workers=4) # This produces an iterable object whose batches are read with a for loop
print(train_iter)



net=nn.Sequential(
    nn.Flatten(),
    nn.Linear(784,256),
    nn.ReLU(),
    nn.Linear(256,10),
    nn.Softmax(dim=1)#Use softmax to process it first
)


# Conclusion: the Flatten layer never has an effect on the first (batch) dimension

# So the problem with a layer stack like the one above can be: the output we expect is [256], not [256, 1]

# First put the network in training mode
net.train()

#Loss function
#loss = nn.MSELoss()
loss = nn.CrossEntropyLoss(reduction='none')



# The weights also need to be initialized first, e.g. nn.init.normal_(m.weight, std=0.1); this is a fairly important piece of setup
def init(m):
    if type(m)==nn.Linear:
        nn.init.normal_(m.weight,std=0.1)
        nn.init.constant_(m.bias,0.1)

# The apply function makes sure every layer is run through this initialization function
net.apply(init)

# Optimizer: stochastic gradient descent over the net's parameters
train = torch.optim.SGD(net.parameters(), lr=0.03)

# Turn the complete data set into tensors by hand
# Define empty lists to collect the inputs and labels
inputs = []
outputs = []

# Iterate through each sample of the data set
for sample in mnist_train:
    image = sample[0] #Image data
    label = sample[1] # label data

    # Add image data and label data to the tensor respectively
    inputs.append(image)
    outputs.append(label)

#Convert list to tensor object
inputs = torch.stack(inputs)
outputs = torch.tensor(outputs)

# Print the shapes; 60,000 samples are found here in total
print("The shape of the input tensor",inputs.shape) # torch.Size([60000, 1, 28, 28])
print("The shape of the output tensor",outputs.shape) # torch.Size([60000])


epoch_num=10
for epoch in range(epoch_num):
    for X,y in train_iter:
        l=loss(net(X),y).mean()
        train.zero_grad()
        l.backward()
        train.step()
    l=loss(net(inputs),outputs)
    print(f'epoch {epoch + 1}, loss {l.mean():f}')



    
#Then switch the model to evaluation mode
net.eval()
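
The post promised a prediction at the end. As a minimal sketch (not part of the original listing, reusing the net and test_iter defined above), evaluation and prediction could look like this:

correct, total = 0, 0
with torch.no_grad():                     # no gradients needed for evaluation
    for X, y in test_iter:
        preds = net(X).argmax(dim=1)      # index of the largest class score per sample
        correct += (preds == y).sum().item()
        total += y.numel()
print(f'test accuracy: {correct / total:.3f}')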