PyTorch example: training and backpropagation analysis of simple linear regression

Table of contents

  • Introduction
  • What is backpropagation?
  • Implementation of backpropagation (code)
  • Applications of backpropagation in deep learning
  • The chain rule
  • Summary

Introduction

In neural networks, the backpropagation algorithm is a key concept that plays a vital role in training. This article explores the principle, implementation, and applications of the backpropagation algorithm in deep learning.

What is backpropagation?

Backpropagation is a supervised learning algorithm used to train neural networks. Its basic idea is to continuously adjust the weights and biases in the neural network so that it can gradually adapt to the characteristics of the input data, thereby achieving modeling and prediction of complex problems.

The core idea of the backpropagation algorithm is to update the parameters in the neural network by calculating the gradient of the loss function to reduce the error between the predicted value and the actual value. This process involves two key steps: forward propagation and backpropagation.

  • Forward propagation (forward): During forward propagation, the input data passes through the neural network; each layer applies a linear transformation followed by a nonlinear activation function, finally producing a predicted value. This predicted value is compared with the actual label to obtain the value of the loss function.

  • Backward propagation (backward): During backpropagation, we calculate the gradient of the loss function with respect to each parameter in the network. This gradient tells us how to adjust each parameter to reduce the value of the loss function; the gradient descent algorithm is commonly used to update the weights and biases. A minimal sketch of one such cycle follows this list.
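
To make those two steps concrete, here is a minimal sketch of one forward/backward cycle in PyTorch, using the toy values x = 2, label y = 4, and initial weight w = 1 (the same values that appear later in this article):

import torch

x = torch.tensor(2.0)                       # input
y = torch.tensor(4.0)                       # target label
w = torch.tensor(1.0, requires_grad=True)   # trainable weight

y_pred = x * w                              # forward: compute the prediction
loss = (y_pred - y) ** 2                    # forward: compare with the label
loss.backward()                             # backward: compute d(loss)/dw

with torch.no_grad():
    w -= 0.01 * w.grad                      # gradient descent: step against the gradient
    w.grad.zero_()                          # clear the gradient for the next cycle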

Implementation of backpropagation (code)

To implement backpropagation, we need to choose a loss function, usually Mean Squared Error or Cross-Entropy. We then compute the partial derivatives (gradients) of the loss function with respect to each parameter. This can be done using the chain rule, passing layer by layer backward from the output layer.

Next, we update the weights and biases using gradient descent or a variant thereof. The core idea of gradient descent is to adjust parameters in the opposite direction of the gradient to reduce the value of the loss function. This process continues to iterate until the loss function converges to a smaller value or reaches a certain number of iterations.
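
As a small illustration (not part of the original lecture), the two loss functions mentioned above are available directly in torch.nn:

import torch

# Mean squared error, typically for regression
mse = torch.nn.MSELoss()
pred = torch.tensor([2.5, 4.1, 6.3])
target = torch.tensor([2.0, 4.0, 6.0])
print(mse(pred, target))                   # mean of (pred - target)^2

# Cross-entropy, typically for classification
ce = torch.nn.CrossEntropyLoss()
logits = torch.tensor([[1.2, 0.3, -0.5]])  # raw scores for 3 classes
label = torch.tensor([0])                  # index of the true class
print(ce(logits, label))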

Before writing the code, let us first build an intuition for what backpropagation is; the original lecture presents this mainly through figures and accompanying text.
We start by reviewing the gradient, assuming a simple linear model.
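
Since the figure itself is not reproduced here, the simple linear model the lecture assumes can be written out (it matches the code later in this article):

ŷ = x · w

loss = (ŷ − y)²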

Next, let's look at what forward propagation is (the name is quite literal). In the lecture's diagram, the input x is multiplied by the weight w to obtain the prediction ŷ (written y' in the figures).

The next figure in the lecture gives the core update formula of stochastic gradient descent and the derivative of the loss function.
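
The figure is not included here, but reconstructed from the linear model above, the update rule and the per-sample gradient are:

w := w − α · ∂loss/∂w, with ∂loss/∂w = 2 · x · (x · w − y)

where α is the learning rate (0.01 in the code below).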

The picture below is a two-layer neural network

If it is easier to understand in pictorial form, the figure below walks through the computation step by step.
First, as before, the input is matrix-multiplied by the weight (here Liu Er recommends the Matrix Cookbook as a reference).

Then the bias b is introduced; if that term is unfamiliar, you can think of it as an intercept.

What is framed in the figure is one layer of the neural network.

Stacking two such layers, we finally obtain y'.
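
As a rough sketch of that picture (the layer sizes here are made up purely for illustration), the two layers can be written directly with tensor operations:

import torch

x = torch.randn(1, 8)            # one sample with 8 input features (illustrative size)

W1 = torch.randn(8, 6)           # first-layer weight
b1 = torch.randn(6)              # first-layer bias (the "intercept" b)
W2 = torch.randn(6, 1)           # second-layer weight
b2 = torch.randn(1)              # second-layer bias

h = torch.sigmoid(x @ W1 + b1)   # layer 1: matrix multiply, add b, apply a nonlinear activation
y_hat = h @ W2 + b2              # layer 2: produces the final y'
print(y_hat.shape)               # torch.Size([1, 1])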

The description so far has been rather general, so next we look at forward and backward in more detail.
In the forward pass, the node f computes z = f(x, w) and, along the way, the local partial derivatives of z with respect to x and w.

In the backward pass, the partial derivative of the loss with respect to z arrives at the node, and after passing through f we obtain the partial derivatives of the loss with respect to x and w. Strictly speaking we only need the gradient with respect to the weight w, but if x is itself the output of a previous layer (as in a multi-layer network), then the gradient with respect to x is needed as well. How these partial derivatives are obtained is exactly the chain rule, covered at the end of this article.

Next, assume x = 2 and w = 3, and manually work out the partial derivatives of the loss with respect to x and w. Once the calculation is done, we can update the weight.

The forward and backward passes can also be seen clearly in the following computational graph.

With x = 2 and y = 4, I worked it out by hand (one version of the calculation is written out below); please correct me if anything is wrong.
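
For reference, here is one way that hand calculation works out, taking x = 2 and w = 3 from the text above, the label y = 4, and the squared-error loss of the linear model (the original figure may use slightly different numbers):

ŷ = x · w = 2 · 3 = 6
loss = (ŷ − y)² = (6 − 4)² = 4
∂loss/∂ŷ = 2 · (ŷ − y) = 4
∂loss/∂w = ∂loss/∂ŷ · ∂ŷ/∂w = 4 · x = 8
∂loss/∂x = ∂loss/∂ŷ · ∂ŷ/∂x = 4 · w = 12
w ← w − 0.01 · ∂loss/∂w = 3 − 0.08 = 2.92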

A quick note on PyTorch's Tensor: the important point is that a tensor carries both data, which stores the value, and grad, which stores the gradient.

w.requires_grad = True # gradients are not tracked by default; you have to enable this yourself
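
A tiny sketch of those two fields (the numbers are only for illustration):

import torch

w = torch.tensor([1.0], requires_grad=True)  # gradient tracking enabled explicitly
loss = (2.0 * w - 4.0) ** 2                  # a toy scalar computation involving w
loss.backward()                              # fills in w.grad

print(w.data)   # tensor([1.])  -> the stored value
print(w.grad)   # tensor([-8.]) -> d(loss)/dw = 2 * (2*1 - 4) * 2 = -8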

The following is the complete code (with comments)

import torch
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = torch.Tensor([1.0])
w.requires_grad = True

def forward(x):
    return x * w # w is a tensor, so this expression builds a computational graph
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) ** 2

print("predict (before training)", 4, forward(4).item())
for epoch in range(100):
    for x, y in zip(x_data, y_data):
        l = loss(x, y) # forward pass: build the graph and compute the loss
        l.backward() # backward pass: compute d(loss)/dw and release the graph
        print('\tgrad:', x, y, w.grad.item()) # .item() extracts the value, so no new graph is built
        w.data = w.data - 0.01 * w.grad.data # update via .data only; using grad directly would record the update in a new computational graph
        w.grad.data.zero_() # clear the gradient, otherwise d(loss)/dw keeps accumulating across iterations
    print("progress:", epoch, l.item())
print("predict (after training)", 4, forward(4).item())
  • Training runs in a loop; 100 epochs are set here.

  • In each epoch, the input data x_data and the corresponding target data y_data are traversed.

  • For each data point, the forward pass is computed and then backpropagation is performed to compute the gradient.

  • The gradient of the weight w is printed after each backward pass.

  • The weight w is updated with gradient descent to minimize the loss function.

  • .grad.data.zero_() clears the gradient after each update, preventing gradients from accumulating across iterations.

  • .item() extracts the value in a tensor as a Python scalar for printing.


The running results are as follows (the console output screenshot is omitted here): the printed gradients shrink over the epochs, the weight w converges to approximately 2.0, and the prediction for x = 4 approaches 8.0.

Applications of backpropagation in deep learning

The backpropagation algorithm is widely used in deep learning. It enables neural networks to learn complex features and patterns, achieving remarkable results on tasks such as image classification, natural language processing, and speech recognition.

Here are some applications of backpropagation in deep learning:

  • Image classification: Convolutional neural networks (CNNs) use backpropagation to learn image features for image classification tasks.

  • Natural language processing: Models such as recurrent neural networks (RNNs) and transformers use backpropagation to learn semantic representations of text data for tasks such as machine translation and sentiment analysis.

  • Reinforcement Learning: In reinforcement learning, backpropagation can be used to train an agent to make appropriate decisions in different environments.

  • Generative Adversarial Networks: Generative adversarial networks (GANs) use backpropagation to train generators and discriminators to produce realistic images, audio, or text.

The chain rule

In neural networks, the chain rule is the key concept used to calculate the gradients of the weight parameters for the backpropagation algorithm, which is the core of training neural networks. The following simple network illustrates how the chain rule is applied:

Suppose we have a simple neural network with an input layer, a hidden layer and an output layer. The output of the network can be expressed as:

y = f(g(h(x)))

where:

x is the input data.
h(x) is the hidden layer's activation applied to the input.
g(h(x)) is the output layer's activation applied to h(x).
f(g(h(x))) is the final output of the network.

We want to compute the gradient of the loss function with respect to the network output y in order to update the network's weight parameters and minimize the loss. Using the chain rule, we can break this down into steps:

  • First, calculate the gradient of the loss function with respect to the network output y, ∂L/∂y, where L is the loss function.

  • Next, calculate the gradient of the output layer's activation with respect to its input, ∂g(h(x))/∂h(x).

  • Then, calculate the gradient of the hidden layer's activation with respect to its input, ∂h(x)/∂x.

  • Finally, multiply these gradients together to obtain the gradient of the loss with respect to the input data x, ∂L/∂x; the gradients with respect to the weight parameters are obtained in exactly the same way and are what we use to update the network.

The chain rule allows us to break the whole computation into these steps and calculate a local gradient at each step. This is the key to the backpropagation algorithm in neural networks: it lets us efficiently update the parameters of the network so that it can learn complex mappings from inputs to outputs.
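
As a small sketch (the concrete functions below are arbitrary choices, since the article leaves h, g, and f abstract), PyTorch's autograd applies exactly this chain of multiplications:

import torch

x = torch.tensor(1.5, requires_grad=True)

h = 3 * x          # h(x) = 3x      -> dh/dx = 3
g = h ** 2         # g(h) = h^2     -> dg/dh = 2h
y = torch.sin(g)   # f(g) = sin(g)  -> df/dg = cos(g)

y.backward()       # autograd multiplies the local derivatives together

manual = torch.cos(g.detach()) * 2 * h.detach() * 3   # the chain rule by hand
print(x.grad, manual)                                 # the two values agree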

Summary

Backpropagation is one of the core algorithms in deep learning, which enables neural networks to automatically learn complex features and patterns, achieving great success in various tasks. Understanding the principles and implementation of backpropagation is very important for deep learning practitioners, as it is the basis for building and training neural networks. I hope this article was helpful and that a deeper understanding of backpropagation will help you better understand how deep learning works and its applications.

This article was organized after studying the "PyTorch Deep Learning Practice" course by Liu Er on Bilibili (station B). The pictures and text in the article are not my own.


Challenges and creations are painful, but very fulfilling.