PyTorch + LSTM: single-parameter and multi-parameter prediction (annotated code version)

Preparation before development:

Environment management: Anaconda
Python: 3.8
Graphics card: NVIDIA RTX 3060
PyTorch: install the conda build from the official website, built against CUDA 11.8
IDE: PyCharm

Brief description:

This article uses the flights dataset from the seaborn library. By learning the relationship between year/month and passenger numbers, the model predicts the number of air passengers in future months. (Note: most of the explanation lives in the code comments, so please read them carefully.)

Modules that need to be imported

import torch
import torch.nn as nn

import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Use from instead of import because we only need to import the MinMaxScaler class and do not need to access other functions or variables in the module
from sklearn.preprocessing import MinMaxScaler
# Custom module that defines the LSTM model and helper functions
import method

Getting and processing the dataset

There are two ways to get the dataset: pull it from the Internet, or load a local copy. Because of network problems I loaded it locally; the online method is kept in the commented-out code.

# List the seaborn example datasets
# dataset_names = sns.get_dataset_names()
# Print the dataset names
# for name in dataset_names:
#     print(name)
# Because the online download kept failing, load the dataset from a local copy
# The flights dataset is returned as a pandas DataFrame
flight_data = sns.load_dataset("flights", data_home='C:/Users/51699/Desktop/seaborn-data', cache=True)
# Print the first 5 rows to see what the data looks like
print("Approximate structure----")
print(flight_data.head())
# Print the data shape. The result is (144, 3); 144 / 12 = 12, so there are 12 years of monthly data.
print("Data shape---")
print(flight_data.shape)



The printed output shows the structure of the data: year, month, and passenger count. flight_data is a pandas DataFrame with 144 rows and 3 columns; 144 / 12 = 12, so there are 12 years, i.e. 144 months, of data.
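As a quick sanity check (a minimal snippet that uses only the DataFrame loaded above), you can confirm the 12-year span directly:

# 12 distinct years, 12 monthly rows per year
print(flight_data['year'].nunique())                          # expected: 12
print(flight_data.shape[0] // flight_data['year'].nunique())  # expected: 12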

Next, get the column names from the dataset and extract the passenger data from the passengers column.

print("Column name information----------------------------------------- ----------------------------------")
print(columns)
# Get all the data under the passengers column and convert it into floating point numbers
all_data = flight_data['passengers'].values.astype(float)
print("Passenger number data-------------------------------------------------- ----------------------------------")
print(all_data)

Next, the last 12 of the 144 data points are held out as test data, leaving 144 - 12 = 132 points for training. The training data is then normalized so that features with different scales become comparable; as the printed output below shows, all values now lie between -1 and 1.

# The total amount of data is 144. We use the first 132 items for training and the last 12 items for testing, so the data needs to be divided into a training set and a test set.
test_data_size = 12
# Use all elements in all_data except the last test_data_size elements as the training set and assign them to the variable train_data
train_data = all_data[:-test_data_size]
print("Training set length---")
print(len(train_data))
# Use the last test_data_size elements in all_data as the test set and assign them to the variable test_data
test_data = all_data[-test_data_size:]
print("The test set is long---")
print(len(test_data))
# Normalization processing reduces the number of passengers to between -1 and 1. The purpose is to unify the data dimensions of different features, eliminate the dimensional influence between features, and make different features comparable.
scaler = MinMaxScaler(feature_range=(-1, 1))
train_data_normalized = scaler.fit_transform(train_data.reshape(-1, 1))
print("The first 5 and last 5 data after normalization--------------------------------- --------------------------------------------------")
print(train_data_normalized[:5])
print(train_data_normalized[-5:])


Convert the normalized passenger data into a tensor, since PyTorch models (and GPU operations) work on tensors.

# Convert the normalized passenger data into PyTorch tensors, since PyTorch models are trained on tensors.
# The argument -1 tells PyTorch to infer the size of that dimension, so .view(-1) flattens the data into a one-dimensional tensor.
train_data_normalized = torch.FloatTensor(train_data_normalized).view(-1)
print("Passenger data converted from NumPy to a PyTorch tensor----")
print(train_data_normalized)

So far we have a one-dimensional tensor. Next we build the training set: each training example consists of a sequence and its corresponding label. Since there are 12 months in a year, the first example uses values 1 through 12 as the sequence and value 13 as the label; the second example uses values 2 through 13 as the sequence and value 14 as the label, and so on. With 132 training values and a window of 12, this yields 132 - 12 = 120 sequence/label pairs.

# Convert the training data into sequences and corresponding labels. Any sequence length can be used, depending on domain knowledge;
# since we have monthly data and there are 12 months in a year, a sequence length of 12 is convenient
train_window = 12
# As the print below shows, the first pair holds the training sequence (months 1 to 12) and its label (month 13);
# the second pair holds months 2 to 13 as the sequence and month 14 as the label
# The window slides forward one step at a time, so the 132 training values produce 120 sequence/label pairs
train_inout_seq = method.create_inout_sequences(train_data_normalized, train_window)
print("Training sequence and corresponding label:----")
print(train_inout_seq)

Here, method.create_inout_sequences is a function defined in the custom method module:

# Convert our training data into sequences and corresponding labels
def create_inout_sequences(input_data, tw):
    inout_seq = []
    L = len(input_data)
    for i in range(L - tw):
        train_seq = input_data[i:i + tw]
        train_label = input_data[i + tw:i + tw + 1]
        inout_seq.append((train_seq, train_label))
    return inout_seq

In the printed output, each pair contains the training sequence (highlighted with a red box in the original screenshot) followed by its label (blue box).
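To see how the pairing works, here is a minimal sketch on a toy tensor (the toy values are purely illustrative and not part of the original code):

toy = torch.FloatTensor([1, 2, 3, 4, 5])
print(method.create_inout_sequences(toy, 2))
# [(tensor([1., 2.]), tensor([3.])),
#  (tensor([2., 3.]), tensor([4.])),
#  (tensor([3., 4.]), tensor([5.]))]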

Defining the model class

Next, we define the LSTM class in the method module. The line lstm_out, self.hidden_cell = self.lstm(input_seq.view(len(input_seq), 1, -1), self.hidden_cell) invokes the forward pass of PyTorch's nn.LSTM; the input sequence must first be reshaped into a three-dimensional tensor, because nn.LSTM expects input of shape (sequence length, batch size, number of features).

# Define the LSTM model
class LSTM(nn.Module):
    # Constructor, initializes the network
    # input_size: the number of features per time step. Although our sequence length is 12, each month carries only 1 value (the passenger count), so input_size is 1
    # hidden_layer_size: the number of neurons in the hidden layer; we use 100
    # output_size: the number of output items; since we predict the passenger count for the next 1 month, output_size is 1
    def __init__(self, input_size=1, hidden_layer_size=100, output_size=1):
        super().__init__()
        self.hidden_layer_size = hidden_layer_size
        self.lstm = nn.LSTM(input_size, hidden_layer_size)
        # The fully connected output layer (named 'linear', mapping hidden_layer_size to output_size) is attached to the model after construction via add_module (see below)
        # Initial hidden state and cell state, each of shape (num_layers, batch, hidden_size)
        self.hidden_cell = (torch.zeros(1, 1, self.hidden_layer_size),
                            torch.zeros(1, 1, self.hidden_layer_size))

    def forward(self, input_seq):
        # self.lstm is the instantiated LSTM layer. The first argument is the input sequence and the second is the (hidden state, cell state) tuple;
        # calling the layer runs its forward pass.
        # The return value lstm_out is the output at every time step; hidden_cell is the final (hidden state, cell state)
        # print(input_seq.view(len(input_seq), 1, -1))
        # input_seq.view(len(input_seq), 1, -1) reshapes the input into a 3-dimensional tensor of shape (seq_len, batch, input_size), which nn.LSTM requires
        lstm_out, self.hidden_cell = self.lstm(input_seq.view(len(input_seq), 1, -1), self.hidden_cell)
        # self.linear is the fully connected (linear) output layer
        predictions = self.linear(lstm_out.view(len(input_seq), -1))
        # Return the last element of the linear layer's output as the final predicted value
        return predictions[-1]

Initializing the model

Next, we instantiate the LSTM class and do some setup. The loss function and the optimizer that updates the parameters are explained in the comments.

# Create an LSTM model object for processing sequence data
model = method.LSTM()
# Attach a fully connected (linear) output layer to the model, mapping the 100 hidden units to 1 output value
# (this must happen before the optimizer is created, so that the linear layer's parameters are included in the optimization)
model.add_module('linear', nn.Linear(100, 1))
# Create a mean squared error loss function object to measure the difference between predicted and true values
loss_function = nn.MSELoss()
# Create an Adam optimizer object to update the model parameters and minimize the loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
print("Model information: ---")
print(model)
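With this setup, print(model) should show roughly the following structure (the exact formatting depends on the PyTorch version):

LSTM(
  (lstm): LSTM(1, 100)
  (linear): Linear(in_features=100, out_features=1, bias=True)
)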

Training the model

When training, the hidden state and cell state left over from the previous sequence must be reset before each new sequence, and the gradients must be zeroed so that gradients from previous iterations do not accumulate. y_pred = model(seq) calls the forward method of the LSTM class above.

epochs = 150
for i in range(epochs):
    for seq, labels in train_inout_seq: # Traverse the training data
        optimizer.zero_grad() # Clear gradient
        model.hidden_cell = (torch.zeros(1, 1, model.hidden_layer_size),
                             torch.zeros(1, 1, model.hidden_layer_size)) # Reset the hidden state and cell state (these are not learnable parameters)
        # print(seq)
        # print(labels)
        y_pred = model(seq) # Model forward propagation

        single_loss = loss_function(y_pred, labels) # Calculate the loss function
        single_loss.backward() # Reverse propagation to find the gradient
        optimizer.step() # Update parameters

    if i % 25 == 1: # Print the loss every 25 epochs
        print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')
print(f'epoch: {i:3} loss: {single_loss.item():10.10f}') # Print the final loss

The input_seq.view(len(input_seq), 1, -1) call in the forward method reshapes each 12-element training sequence into a tensor of shape (12, 1, 1) before it enters the LSTM, so the network consumes the 12 time steps one at a time. Forward propagation produces a prediction, the mean squared error between that prediction and the label is computed, the gradients are obtained by back-propagation, and the optimizer updates the weights of the hidden layer. The loss printed every 25 epochs shows the error shrinking as training proceeds.
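To make the reshape concrete, here is a minimal sketch (the values are illustrative) of what view does to one 12-element sequence:

seq_demo = torch.arange(12, dtype=torch.float32)   # shape: (12,)
reshaped = seq_demo.view(len(seq_demo), 1, -1)     # shape: (12, 1, 1) = (seq_len, batch, input_size)
print(reshaped.shape)                              # torch.Size([12, 1, 1])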

Testing predictions

Below is the test-set code; the comments explain each step.

# Predict the number of passengers in the test set
fut_pred = 12
# Get the last 12 months of data
test_inputs = train_data_normalized[-train_window:].tolist()
print(test_inputs)
# Set the model to evaluation mode
model.eval()
for i in range(fut_pred):
    # Convert input data to PyTorch tensors
    seq = torch.FloatTensor(test_inputs[-train_window:])
    # Disable gradient tracking during inference
    with torch.no_grad():
        # Reset the hidden state and cell state before each prediction
        model.hidden_cell = (torch.zeros(1, 1, model.hidden_layer_size),
                             torch.zeros(1, 1, model.hidden_layer_size))
        print(seq)
        # In the first iteration, the model predicts the passenger count for the 13th month
        # In the second iteration, that prediction has been appended to the input, so the window slides forward one month,
        # as the two printed sequences below show:
        # tensor([0.1253, 0.0462, 0.3275, 0.2835, 0.3890, 0.6176, 0.9516, 1.0000, 0.5780,
        #         0.3319, 0.1341, 0.3231])
        # tensor([0.0462, 0.3275, 0.2835, 0.3890, 0.6176, 0.9516, 1.0000, 0.5780, 0.3319,
        #         0.1341, 0.3231, 0.2997])
        test_inputs.append(model(seq).item())
# Restore the prediction results to the original data range
actual_predictions = scaler.inverse_transform(np.array(test_inputs[train_window:]).reshape(-1, 1))
print(actual_predictions)

The printed actual_predictions are the forecast passenger counts for the next 12 months.
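Since matplotlib is imported above, plotting the forecast against the actual series is a natural next step. A minimal sketch, assuming the variables defined earlier:

# Plot the full actual series and overlay the 12 predicted months (indices 132-143)
x = np.arange(132, 144, 1)
plt.title('Month vs Passengers')
plt.ylabel('Total Passengers')
plt.grid(True)
plt.plot(flight_data['passengers'], label='actual')
plt.plot(x, actual_predictions, label='predicted')
plt.legend()
plt.show()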

Multi-parameter prediction

The example above uses a single feature. To let the model fit better, multiple features can be used; this requires changing input_size (and the __init__ method) in the LSTM class above. For example, suppose we want to predict the number of people flying each day, and each day has three features: ticket price, weather, and humidity, with one year of data. Since there are three feature values, input_size is 3. If we use 3 days as the sequence length, the training set looks like this (a code sketch follows at the end of this section):
Training values x = [
  [ [0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9] ],      # day 1, day 2, day 3
  [ [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [0.11, 0.12, 0.13] ],   # day 2, day 3, day 4
  ...
  # and so on, until the sequence covering days 362 to 364; a sequence starting at day 363 would need day 366 as its label, and that day does not exist
]
Label values y = [0.8, 0.7, 0.2, ... up to day 365]
Here 0.8 is the label for the first row of x, i.e. the number of people flying on day 4.

The network structure is roughly as described above (the original post includes a diagram: a red arrow marks the hidden-state input carried over from the previous step, and a red box marks the input-layer features). The key point is that the LSTM still receives a three-dimensional input tensor; at each step it takes the current input together with the hidden state and cell state passed along from the previous time step.
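A minimal sketch of the multi-feature version (the random data, variable names, and 3-day window are illustrative assumptions, not code from the original post):

import torch
import torch.nn as nn

# 365 days, 3 normalized features per day (ticket price, weather, humidity)
data = torch.rand(365, 3)
labels = torch.rand(365)   # normalized daily passenger counts
train_window = 3

# Build (sequence, label) pairs: days i..i+2 predict the passenger count on day i+3
train_seqs = [(data[i:i + train_window], labels[i + train_window])
              for i in range(len(data) - train_window)]   # 362 pairs

class MultiFeatureLSTM(nn.Module):
    def __init__(self, input_size=3, hidden_layer_size=100, output_size=1):
        super().__init__()
        self.hidden_layer_size = hidden_layer_size
        self.lstm = nn.LSTM(input_size, hidden_layer_size)
        self.linear = nn.Linear(hidden_layer_size, output_size)
        self.hidden_cell = (torch.zeros(1, 1, hidden_layer_size),
                            torch.zeros(1, 1, hidden_layer_size))

    def forward(self, input_seq):
        # input_seq has shape (seq_len, input_size); reshape to (seq_len, batch=1, input_size)
        lstm_out, self.hidden_cell = self.lstm(input_seq.view(len(input_seq), 1, -1), self.hidden_cell)
        predictions = self.linear(lstm_out.view(len(input_seq), -1))
        return predictions[-1]

mf_model = MultiFeatureLSTM()
seq, label = train_seqs[0]
mf_model.hidden_cell = (torch.zeros(1, 1, mf_model.hidden_layer_size),
                        torch.zeros(1, 1, mf_model.hidden_layer_size))
print(mf_model(seq))   # one predicted value: the passenger count for day 4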