《Dive-into-DL-Pytorch》-Simple implementation of linear regression (linear regression pytorch)
I started learning deep learning on November 3rd, and there is still a lot I don't understand. Here I try my best to explain and analyze the code in the book in my own words.
Two places in the book cause errors when the code is run directly, and I briefly analyze them as well. Since I have only just started learning, my understanding may not be fully correct in places.
0. Model
Let the first feature be $x_1$, the second feature be $x_2$, and the true label be $y$:

$$\hat{y} = \omega_1 x_1 + \omega_2 x_2 + b$$

Here $\omega_1$ and $\omega_2$ are the weights and $b$ is the bias; together they are the parameters of the linear regression model. $\hat{y}$ is the linear regression model's prediction of the true label $y$.
1. Import library
import torch
import numpy as np
import random
from torch import nn
2. Generate data set
Given a randomly generated batch of sample features $\boldsymbol{X}\in\mathbb{R}^{1000\times 2}$, we use the true weights $\boldsymbol{\omega} = [2, -3.4]^\top$, the bias $b = 4.2$, and a random noise term $\epsilon$ to generate the labels:

$$\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\omega} + b + \epsilon$$
num_inputs = 2        # number of input features
num_examples = 1000   # number of training samples
true_w = [2, -3.4]    # true weights (two features, so two weights)
true_b = 4.2          # true bias

# Draw the two features of every sample from a standard normal distribution
features = torch.tensor(np.random.normal(0, 1, (num_examples, num_inputs)), dtype=torch.float)
# Compute the noise-free labels from the features, weights, and bias
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
# The labels above are an exact function of the features, so add a random noise term
# that represents meaningless interference in the data set. The noise follows a normal
# distribution with mean 0 and standard deviation 0.01.
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float)
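The label formula can also be written as a single matrix-vector product, which matches the equation y = Xω + b more directly. The following is only a small sketch to check that both forms agree; it uses `torch.randn` and a fixed seed instead of the NumPy call above, and the names `labels_columnwise` and `labels_matrix` are made up for this illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable (an assumption, not in the book)

true_w = torch.tensor([2.0, -3.4])
true_b = 4.2
features = torch.randn(1000, 2)

# Column-by-column form, as in the code above
labels_columnwise = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
# Matrix-vector form: (1000, 2) @ (2,) gives a tensor of shape (1000,)
labels_matrix = features @ true_w + true_b

print(labels_matrix.shape)
print(torch.allclose(labels_columnwise, labels_matrix, atol=1e-4))
```

The two results differ only by float32 rounding, which is why the comparison uses a tolerance.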
3. Read data
# Import the data utilities provided by PyTorch
import torch.utils.data as Data

# Number of samples in each mini-batch
batch_size = 10
# Combine the features and labels of the training data
dataset = Data.TensorDataset(features, labels)
# Wrap the dataset in a DataLoader:
#   first argument: the dataset
#   second argument: the number of samples read each time
#   shuffle: whether to shuffle the data
data_iter = Data.DataLoader(dataset, batch_size, shuffle=True)

# Print the first mini-batch of data
for X, y in data_iter:
    print(X, y)
    break
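To get a feel for what the DataLoader yields, it can help to look at sizes rather than values. The sketch below uses random stand-in data (not the features and labels generated earlier) with the same shapes, and checks how many mini-batches one pass over the data contains.

```python
import torch
import torch.utils.data as Data

# Stand-in data with the same shapes as in this post (values are arbitrary)
features = torch.randn(1000, 2)
labels = torch.randn(1000)

dataset = Data.TensorDataset(features, labels)
data_iter = Data.DataLoader(dataset, batch_size=10, shuffle=True)

print(len(dataset))    # number of samples
print(len(data_iter))  # number of mini-batches per epoch

X, y = next(iter(data_iter))
print(X.shape, y.shape)
```

With 1000 samples and a batch size of 10, one epoch consists of 100 mini-batches, each holding a (10, 2) feature tensor and a (10,) label tensor.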
4. Define the model
class LinearNet(nn.Module):  # LinearNet inherits from nn.Module
    def __init__(self, n_feature):
        super(LinearNet, self).__init__()
        # nn.Linear defines a linear layer of a neural network:
        # n_feature is the number of input features, 1 is the number of output features
        self.linear = nn.Linear(n_feature, 1)

    # forward defines the forward propagation
    def forward(self, x):
        # Take an input tensor x and return the output y of the linear layer
        y = self.linear(x)
        return y

net = LinearNet(num_inputs)
print(net)  # print the structure of the network
The book also shows how to build a network with nn.Sequential. Sequential is an ordered container: the layers are added to the computation graph in the order in which they are passed to Sequential:
# Method 1
net = nn.Sequential(
    nn.Linear(num_inputs, 1)
    # other layers can also be passed in here
)

# Method 2
net = nn.Sequential()
net.add_module('linear', nn.Linear(num_inputs, 1))  # 'linear' is the name of the layer
# net.add_module ...  (other layers can also be added here)

# Method 3
from collections import OrderedDict
net = nn.Sequential(OrderedDict([
    ('linear', nn.Linear(num_inputs, 1))  # 'linear' is the name of the layer
    # ... other layers can also be passed in here
]))

print(net)
print(net[0])  # the first layer; each of the three methods above creates a one-layer network
You can view all learnable parameters of the model through net.parameters():
for param in net.parameters():
    print(param)
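When it is not obvious which tensor is the weight and which is the bias, `named_parameters()` prints the names as well. This is a small sketch that is not taken from the book; it rebuilds a one-layer network with the layer name 'linear' used above.

```python
import torch
from torch import nn

net = nn.Sequential()
net.add_module('linear', nn.Linear(2, 1))  # same layer name as in method 2 above

# named_parameters() yields (name, parameter) pairs
for name, param in net.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

names = [name for name, _ in net.named_parameters()]
```

The names are prefixed with the layer name, so this network has the parameters linear.weight with shape (1, 2) and linear.bias with shape (1,).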
5. Initialize model parameters
Here we initialize the parameters of net[0], that is, the first layer passed into the model. In fact, the code above passes in only one layer. Indexing with net[0] works because net is now an nn.Sequential; for the LinearNet class defined earlier you would use net.linear instead.
# init.normal_ initializes each element of the weight to a random sample from a normal
# distribution with mean 0 and standard deviation 0.01; the bias is initialized to 0
from torch.nn import init

init.normal_(net[0].weight, mean=0, std=0.01)
init.constant_(net[0].bias, val=0.0)  # or modify the bias data directly: net[0].bias.data.fill_(0)
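A quick way to convince yourself that the initialization took effect is to print the parameters afterwards. The sketch below builds its own one-layer network so that it runs on its own; the check that the weights are small is only a rough sanity check, not something from the book.

```python
import torch
from torch import nn
from torch.nn import init

net = nn.Sequential(nn.Linear(2, 1))

init.normal_(net[0].weight, mean=0, std=0.01)
init.constant_(net[0].bias, val=0.0)

print(net[0].weight)  # two small values scattered around 0
print(net[0].bias)    # exactly 0
```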
6. Define loss function
# Use the mean squared error loss provided by the nn module as the loss function of the model
loss = nn.MSELoss()  # note the parentheses: this creates an instance of the loss
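To see what nn.MSELoss() actually computes, here is a tiny worked example with made-up numbers. With the default settings it averages the squared differences, so it should match the manual calculation.

```python
import torch
from torch import nn

loss = nn.MSELoss()

# Made-up predictions and labels, both of shape (3, 1)
output = torch.tensor([[1.0], [2.0], [3.0]])
target = torch.tensor([[1.5], [2.0], [1.0]])

l = loss(output, target)
manual = ((output - target) ** 2).mean()  # (0.25 + 0 + 4) / 3

print(l.item(), manual.item())
```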
7. Define optimization algorithm
# Use the SGD optimizer provided by torch.optim and set the learning rate to 0.03
import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.03)  # net.parameters() are the parameters to be learned
print(optimizer)
Of course, we can also set different learning rates for different network layers:
optimizer = optim.SGD([
    # If no learning rate is specified for a parameter group, the outermost default is used
    {'params': net.subnet1.parameters()},  # lr=0.03
    {'params': net.subnet2.parameters(), 'lr': 0.01}
], lr=0.03)
Note: If the code above is typed into the notebook and run as-is, it raises AttributeError: 'Sequential' object has no attribute 'subnet1'. Here subnet1 and subnet2 are names of network layers, just like linear in the code that defines the model (see the comments in that code). Because the code above never defines layers named subnet1 and subnet2, the error is raised. If you just want the code to run, comment out this block. The detailed reason is explained here: AttributeError: 'Sequential' object has no attribute 'subnet1'
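For the per-layer learning rate code to run, the network needs attributes with those names. The following is a minimal sketch, assuming a made-up two-layer model `TwoPartNet` whose layers are called subnet1 and subnet2; the layer sizes are arbitrary and nothing here comes from the book.

```python
import torch
from torch import nn
import torch.optim as optim

class TwoPartNet(nn.Module):  # hypothetical model, only for illustration
    def __init__(self):
        super().__init__()
        self.subnet1 = nn.Linear(2, 4)  # layer sizes are arbitrary
        self.subnet2 = nn.Linear(4, 1)

    def forward(self, x):
        return self.subnet2(self.subnet1(x))

net = TwoPartNet()
optimizer = optim.SGD([
    {'params': net.subnet1.parameters()},             # falls back to the default lr=0.03
    {'params': net.subnet2.parameters(), 'lr': 0.01}  # its own learning rate
], lr=0.03)

lrs = [group['lr'] for group in optimizer.param_groups]
print(lrs)
```

Each dictionary becomes one parameter group in the optimizer, and the group without an explicit 'lr' picks up the outer default.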
If we do not want the learning rate to be a fixed constant, we can also use the following method to adjust the learning rate:
# Adjust the learning rate
for param_group in optimizer.param_groups:
    param_group['lr'] *= 0.1  # the learning rate becomes 0.1 times its previous value
Thinking: Why is a for loop used here?
optimizer.param_groups is a list with one entry per parameter group, and each group carries its own learning rate. When different layers have been given different learning rates, the loop updates all of them in one pass.
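The effect of the loop is easiest to see with two parameter groups. This sketch uses two stand-alone layers, `layer_a` and `layer_b`, which are names invented for this example, and records the learning rates before and after the loop.

```python
import torch
from torch import nn
import torch.optim as optim

layer_a = nn.Linear(2, 4)  # hypothetical layers, only for illustration
layer_b = nn.Linear(4, 1)

optimizer = optim.SGD([
    {'params': layer_a.parameters()},             # default lr=0.03
    {'params': layer_b.parameters(), 'lr': 0.01}
], lr=0.03)

before = [group['lr'] for group in optimizer.param_groups]

# Scale every group's learning rate by 0.1
for param_group in optimizer.param_groups:
    param_group['lr'] *= 0.1

after = [group['lr'] for group in optimizer.param_groups]
print(before, after)
```

Both learning rates shrink by the same factor, up to floating-point rounding, while keeping their ratio to each other.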
8. Training model
num_epochs = 3  # number of training epochs
# range(1, num_epochs + 1) instead of range(num_epochs): as far as I can tell, this only
# makes the epoch numbers printed below start at 1 rather than 0
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:  # X is a mini-batch of features, y the matching labels
        output = net(X)  # use the model to predict from the features X
        l = loss(output, y.view(-1, 1))  # loss between the predictions and the labels
        optimizer.zero_grad()  # clear the gradients
        l.backward()
        optimizer.step()
    print('epoch %d, loss: %f' % (epoch, l.item()))  # loss of the last mini-batch in this epoch
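One detail in the loop that is easy to overlook is y.view(-1, 1). The network output has shape (10, 1) while a batch of labels has shape (10,), and my understanding is that without the reshape the two would be broadcast against each other. The sketch below only looks at shapes, using placeholder tensors.

```python
import torch

output = torch.zeros(10, 1)  # placeholder with the shape of net(X) for one mini-batch
y = torch.ones(10)           # placeholder with the shape of one batch of labels

# Without reshaping, (10, 1) and (10,) broadcast to a (10, 10) grid of differences
diff_broadcast = output - y
# After reshaping y to (10, 1), the difference is taken element by element
diff_matched = output - y.view(-1, 1)

print(diff_broadcast.shape)
print(diff_matched.shape)
```

So the reshape makes sure each prediction is compared only with its own label.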
# Print the true parameters and the learned parameters for comparison
dense = net[0]
print(true_w, dense.weight)
print(true_b, dense.bias)
9. Two errors
After typing all the code into the notebook, two errors occurred when running:
- RuntimeError: Boolean value of Tensor with more than one value is ambiguous
- AttributeError: 'Sequential' object has no attribute 'subnet1'
The second error has been analyzed, now let’s analyze the first one:
RuntimeError: Boolean value of Tensor with more than one value is ambiguous. I searched for this error for a long time and finally went through my code line by line. It turned out that when I defined the loss function I had forgotten the trailing parentheses, so the line read loss = nn.MSELoss instead of loss = nn.MSELoss(). Without the parentheses, loss is the class itself rather than an instance, so, as far as I can tell, loss(output, y.view(-1, 1)) passes the two tensors to the constructor, where they end up being evaluated as Boolean values.
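The mistake can be reproduced in a few lines. This is a sketch with made-up tensors; it relies on the constructor of nn.MSELoss still accepting the old size_average and reduce arguments, which is the case in the PyTorch versions I know of but is an assumption about whichever version you run.

```python
import torch
from torch import nn

output = torch.tensor([[1.0], [2.0]])
target = torch.tensor([[1.5], [2.5]])

wrong_loss = nn.MSELoss  # parentheses forgotten: this is the class, not an instance
try:
    wrong_loss(output, target)  # the tensors are handed to the constructor
    message = None
except RuntimeError as err:
    message = str(err)
print(message)

loss = nn.MSELoss()  # correct: create the loss object first
value = loss(output, target).item()
print(value)
```

With the parentheses in place, the same two tensors give the expected mean squared error of 0.25.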