
《Dive-into-DL-PyTorch》: Simple implementation of linear regression with PyTorch

I started learning deep learning on November 3rd. There is still a lot I don't understand, but I have done my best to explain the code in the book as I understand it and to analyze it.

There are two places in the book's code that raise errors when run directly; these are briefly analyzed as well. Since I have only just started, my understanding of many points may not be entirely correct.

0. Model

Let the first feature (feature) be $x_1$, the second feature be $x_2$, and the real label (label) be $y$:

$$\hat{y} = \omega_1 x_1 + \omega_2 x_2 + b$$

$\omega_1$ and $\omega_2$ are the weights (weight), $b$ is the bias (bias), and together they are the parameters (parameter) of the linear regression model. $\hat{y}$ is the linear regression prediction of the true label $y$.
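As a concrete check (my own example with hypothetical values, not from the book): with $\omega_1 = 2$, $\omega_2 = -3.4$, $b = 4.2$ and a sample $x_1 = x_2 = 1$, the model predicts $\hat{y} = 2 - 3.4 + 4.2 = 2.8$:

# Hypothetical example values: w1 = 2, w2 = -3.4, b = 4.2, x1 = 1.0, x2 = 1.0
y_hat = 2 * 1.0 + (-3.4) * 1.0 + 4.2  # = 2.8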

1. Import library

import torch
import numpy as np
import random
from torch import nn

2. Generate data set

Given a randomly generated batch of sample features $\boldsymbol{X} \in \mathbb{R}^{1000 \times 2}$, use the true weights $\boldsymbol{\omega} = [2, -3.4]^\top$, the bias $b = 4.2$, and a random noise term $\epsilon$ to generate the labels:

$$\mathbf{y} = \boldsymbol{X}\boldsymbol{\omega} + b + \epsilon$$

num_inputs = 2 # Number of inputs for training data (number of features)
num_examples = 1000 # Number of samples of training data
true_w = [2, -3.4] #True weight (there are two features, so there are two weights)
true_b = 4.2 # True deviation
features = torch.tensor(np.random.normal(0, 1, (num_examples, num_inputs)), dtype=torch.float) # Use normal distribution to randomly generate two features of each sample

labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b # Compute the noise-free labels from the features, weights, and bias
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float) # Add a random noise term to represent meaningless interference in the data set
# The noise term follows a normal distribution with a mean of 0 and a standard deviation of 0.01
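As a quick sanity check (my own addition, not from the book), the shapes of the generated tensors can be inspected:

print(features.shape, labels.shape) # expected: torch.Size([1000, 2]) torch.Size([1000])
print(features[0], labels[0]) # the two features of the first sample and its noisy label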

3. Read data

# Import the data package provided by pytorch to read data
import torch.utils.data as Data

# Define the number of mini-batch samples
batch_size = 10
# Combine features and labels of training data
dataset = Data.TensorDataset(features, labels)
# Import dataset into dataloader
data_iter = Data.DataLoader(dataset, batch_size, shuffle=True)
# The first parameter is the dataset to load
# The second parameter is the number of samples read in each mini-batch
# The third parameter is whether to shuffle the data
    
#Print the first mini-batch of data
for X, y in data_iter:
    print(X, y)
    break
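With batch_size = 10, each X printed above should have shape (10, 2) and each y shape (10,). A small sketch (my own addition) that checks one batch explicitly:

X, y = next(iter(data_iter)) # take a single mini-batch without writing a loop
print(X.shape, y.shape) # expected: torch.Size([10, 2]) torch.Size([10])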

4. Define the model

class LinearNet(nn.Module): # Defines a class LinearNet that inherits from nn.Module
    def __init__(self, n_feature):
        super(LinearNet, self).__init__()
        self.linear = nn.Linear(n_feature, 1) # nn.Linear defines a linear layer of a neural network
        # n_feature is the number of input features
        # 1 is the number of output features
    # forward defines forward propagation
    def forward(self, x): # Accept an input tensor x and return the output tensor y after passing through the linear layer
        y = self.linear(x)
        return y
    
net = LinearNet(num_inputs)
print(net) # Use print to print out the structure of the network
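For reference, print(net) for the class defined above should produce output along these lines (the module name and shapes come from the definitions above):

LinearNet(
  (linear): Linear(in_features=2, out_features=1, bias=True)
)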
The book also shows how to build the network with nn.Sequential. Sequential is an ordered container: the layers are added to the computation graph in the order they are passed in:
# Writing method one
net = nn.Sequential(
    nn.Linear(num_inputs, 1)
    # Other layers can also be passed in here
)

# Writing method two
net = nn.Sequential()
net.add_module('linear', nn.Linear(num_inputs, 1)) # linear is the name of the layer
# net.add_module... Other layers can also be passed in here

# Writing method three
from collections import OrderedDict
net = nn.Sequential(OrderedDict([
    ('linear', nn.Linear(num_inputs, 1)) # linear is the name of the layer
    #... Other layers can also be passed in here
]))

print(net)
print(net[0]) # Output the first layer of the network; each of the three approaches above creates only one layer
You can view all the learnable parameters of the model through net.parameters():
for param in net.parameters():
    print(param)
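The loop should print two learnable Parameter tensors: the weight with shape (1, 2) and the bias with shape (1,). The exact values will differ from run to run because they are randomly initialized; the printout looks roughly like:

Parameter containing:
tensor([[ 0.1254, -0.3412]], requires_grad=True)
Parameter containing:
tensor([0.2921], requires_grad=True)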

5. Initialize model parameters

Here we initialize the model parameters of net[0], that is, the first layer passed into the model; as noted above, the code only passes in a single layer.

# Use init.normal_ to initialize each element of the weight parameter by random sampling from a normal distribution with mean 0 and standard deviation 0.01; the bias is initialized to 0
from torch.nn import init

init.normal_(net[0].weight, mean=0, std=0.01)
init.constant_(net[0].bias, val=0.0) # You can also directly modify the bias data: net[0].bias.data.fill_(0)
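Equivalently (a sketch of an alternative, not from the book), the underlying data tensors can be modified in place without torch.nn.init:

net[0].weight.data.normal_(0, 0.01) # resample the weights in place from N(0, 0.01)
net[0].bias.data.fill_(0) # zero the bias in place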

6. Define loss function

# Use the mean square error loss provided by the nn module as the loss function of the model
loss = nn.MSELoss() # Note the parentheses: nn.MSELoss is a class and must be instantiated (see section 9)
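nn.MSELoss is a class, so the parentheses instantiate it before it can be called on tensors; torch.nn.functional.mse_loss is the function form. A minimal sketch of both (my own example):

import torch.nn.functional as F

criterion = nn.MSELoss() # instantiate the loss module
l1 = criterion(torch.ones(3), torch.zeros(3)) # call the instance on tensors
l2 = F.mse_loss(torch.ones(3), torch.zeros(3)) # functional equivalent
print(l1.item(), l2.item()) # both print 1.0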

7. Define optimization algorithm

# Use the SGD optimization algorithm provided by torch.optim and set the learning rate to 0.03
import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.03) # net.parameters() is the parameters that need to be learned
print(optimizer)
Of course, we can also set different learning rates for different network layers:
optimizer = optim.SGD([
    # If a parameter group does not specify a learning rate, the outermost default learning rate is used
    {'params': net.subnet1.parameters()}, # lr=0.03
    {'params': net.subnet2.parameters(), 'lr': 0.01}
], lr=0.03)

Note: if the code above is typed into the notebook and run as-is, it raises AttributeError: 'Sequential' object has no attribute 'subnet1'. Here subnet1 and subnet2 are the names of network layers, just like 'linear' in the model-definition code earlier (see the comments in that section), but the network built above never defines layers with those names, so the attribute lookup fails. To make the rest of the notebook run, comment this block out; a sketch of a model that does define such layers follows.
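For the per-layer learning rates to work, the model must actually expose attributes named subnet1 and subnet2. A hedged sketch (my own example, not from the book) of such a model:

class TwoPartNet(nn.Module): # hypothetical model with two named sub-networks
    def __init__(self, n_feature):
        super(TwoPartNet, self).__init__()
        self.subnet1 = nn.Linear(n_feature, 4) # first sub-network
        self.subnet2 = nn.Linear(4, 1)         # second sub-network

    def forward(self, x):
        return self.subnet2(self.subnet1(x))

two_part = TwoPartNet(num_inputs)
optimizer = optim.SGD([
    {'params': two_part.subnet1.parameters()},            # uses the default lr=0.03
    {'params': two_part.subnet2.parameters(), 'lr': 0.01} # overrides with lr=0.01
], lr=0.03)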

If we do not want the learning rate to be a fixed constant, we can also use the following method to adjust the learning rate:
# Adjust learning rate
for param_group in optimizer.param_groups:
    param_group['lr'] *= 0.1 # The learning rate is 0.1 times the previous one

Thinking: Why is a for loop used here?

optimizer.param_groups contains one dictionary per parameter group passed to the constructor, and each group carries its own learning rate. When different layers are given different learning rates, looping over param_groups scales all of them, not just one.
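PyTorch also provides torch.optim.lr_scheduler for schedules like this; a sketch using StepLR, which multiplies every group's learning rate by gamma each step_size epochs (scheduler.step() is called once per epoch inside the training loop):

from torch.optim import lr_scheduler

scheduler = lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1) # lr *= 0.1 after every epoch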

8. Training model

num_epochs = 3 # Number of iteration cycles for training model
for epoch in range(1, num_epochs + 1): # range(1, num_epochs + 1) makes epoch run from 1 to num_epochs, so the printed epoch number starts at 1 instead of 0
    for X, y in data_iter: # X is the feature value, y is the label
        output = net(X) # Use the net model to predict the feature value X and get the output result
        l = loss(output, y.view(-1, 1)) # Compute the loss between the output and the label, reshaping y to a column vector so its shape matches the output
        optimizer.zero_grad() # Clear gradient
        l.backward() # Backpropagate to compute gradients
        optimizer.step() # Update the parameters with one SGD step
    print('epoch %d, loss: %f' % (epoch, l.item())) # Output the loss function of each iteration cycle
# Output the real parameters and learned parameters for comparison
dense = net[0]
print(true_w, dense.weight)
print(true_b, dense.bias)
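After three epochs the learned parameters should be close to true_w = [2, -3.4] and true_b = 4.2. A small sketch (my own addition) that extracts them as plain Python numbers for easier comparison:

print(dense.weight.data.view(-1).tolist()) # should be close to [2, -3.4]
print(dense.bias.data.item())              # should be close to 4.2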

9. Two errors

After typing all the code into the notebook, two errors occurred when running:

  1. RuntimeError: Boolean value of Tensor with more than one value is ambiguous
  2. AttributeError: 'Sequential' object has no attribute 'subnet1'

The second error has been analyzed, now let’s analyze the first one:

RuntimeError: Boolean value of Tensor with more than one value is ambiguous (the Boolean value of a Tensor with more than one element is ambiguous). I searched this error for a long time and finally went through the code line by line, where I found that when defining the loss function I had forgotten the trailing parentheses, leaving it as: loss = nn.MSELoss
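Without the parentheses, loss is the class itself, so loss(output, y.view(-1, 1)) actually calls the MSELoss constructor with tensors in place of its configuration arguments, and the constructor's internal truth-value checks on those tensors raise the ambiguous-boolean error. Side by side:

loss = nn.MSELoss   # wrong: loss is the class; calling it passes tensors to the constructor
loss = nn.MSELoss() # right: loss is an instance; calling it computes the mean squared error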