Weather Forecasting Based on LSTM – Time Series Forecasting Computer Competition

0 Preface

This series shares high-quality competition projects. Today's topic is

a machine learning and big data analysis project: weather forecasting based on LSTM.

The topic is relatively new and well suited to competitions, and it comes highly recommended by senior students.

More information, project sharing:

https://gitee.com/dancheng-senior/postgraduate

1 Introduction to data sets

import numpy as np
import pandas as pd

df = pd.read_csv('/home/kesci/input/jena1246/jena_climate_2009_2016.csv')
df.head()

As the first rows show, an observation is recorded every 10 minutes, so there are 6 observations per hour and 144 (6×24) observations per day.

Suppose that, at a given time, you want to predict the temperature 6 hours ahead. To make this prediction, the model looks back over a 5-day observation period, so each training window contains the last 720 (5×144) observations.
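The window arithmetic above can be written out as a short sketch. The variable names are chosen for this illustration only, and the 36-step horizon is simply the 6-hour forecast expressed in 10-minute steps.

```python
# Sampling facts taken from the text: one observation every 10 minutes.
MINUTES_PER_OBSERVATION = 10

obs_per_hour = 60 // MINUTES_PER_OBSERVATION   # observations in one hour
obs_per_day = 24 * obs_per_hour                # observations in one day

history_days = 5                               # look-back period from the text
history_size = history_days * obs_per_day      # past observations per window

forecast_hours = 6                             # forecast horizon from the text
target_size = forecast_hours * obs_per_hour    # horizon in 10-minute steps

print(obs_per_hour, obs_per_day, history_size, target_size)
```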

The function below builds these time windows for model training. The parameter history_size is the size of the sliding window of past observations. The parameter target_size is how far into the future the model has to predict, that is, the offset of the label to be predicted.

The first 300,000 rows of the data are used as the training set, and the remaining rows are used as the validation set. This gives about 2,100 days of training data (300,000 / 144 ≈ 2,083).

def univariate_data(dataset, start_index, end_index, history_size, target_size):
    data = []
    labels = []

    # The first window needs history_size past observations
    start_index = start_index + history_size
    if end_index is None:
        # Leave room at the end of the series for the label
        end_index = len(dataset) - target_size

    for i in range(start_index, end_index):
        indices = range(i - history_size, i)
        # Reshape data from (history_size,) to (history_size, 1)
        data.append(np.reshape(dataset[indices], (history_size, 1)))
        labels.append(dataset[i + target_size])
    return np.array(data), np.array(labels)
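To make the windowing concrete, here is a minimal sketch that runs the same function on a toy series. The series 0 to 9 and the values history_size=3 and target_size=2 are made up for illustration and are not the settings used later in this post.

```python
import numpy as np

def univariate_data(dataset, start_index, end_index, history_size, target_size):
    data = []
    labels = []

    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size

    for i in range(start_index, end_index):
        indices = range(i - history_size, i)
        data.append(np.reshape(dataset[indices], (history_size, 1)))
        labels.append(dataset[i + target_size])
    return np.array(data), np.array(labels)

# Toy series: the value at each position equals its index, so windows are easy to read
series = np.arange(10, dtype=float)

x, y = univariate_data(series, 0, None, history_size=3, target_size=2)

print(x.shape, y.shape)   # number of windows, window length, feature count
print(x[0].ravel(), y[0]) # first window and its label
```

Note that the label is taken at index i + target_size, where i is the position right after the window, so target_size = 0 means "the very next observation".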

2 Start analysis

2.1 Univariate analysis

First, a model is trained on a single feature (temperature) and then used to make predictions.

2.1.1 Temperature variable

Extract the temperature column from the data set

uni_data = df['T (degC)']
uni_data.index = df['Date Time']
uni_data.head()

Observe changes in data over time

Standardize

TRAIN_SPLIT = 300000  # the first 300,000 rows form the training set

# Work on the raw NumPy array so that positional indexing in univariate_data works
uni_data = uni_data.values

# Standardize with the mean and standard deviation of the training rows only
uni_train_mean = uni_data[:TRAIN_SPLIT].mean()
uni_train_std = uni_data[:TRAIN_SPLIT].std()

uni_data = (uni_data - uni_train_mean) / uni_train_std

# Split the series into features (past windows) and labels
univariate_past_history = 20
univariate_future_target = 0
x_train_uni, y_train_uni = univariate_data(uni_data, 0, TRAIN_SPLIT,  # start and end of the training interval
                                           univariate_past_history,
                                           univariate_future_target)
x_val_uni, y_val_uni = univariate_data(uni_data, TRAIN_SPLIT, None,
                                       univariate_past_history,
                                       univariate_future_target)

With these settings, the features of the first sample are the temperatures at the first 20 time points, and its label is the temperature at the 21st time point. By the same rule, the features of the second sample are the temperatures at the 2nd through 21st time points, and its label is the temperature at the 22nd time point, and so on.
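On the standardization step in this subsection: the mean and standard deviation come from the training rows only, which keeps information from the validation period out of the scaling. The sketch below illustrates this on synthetic data. The random series, the split index of 700, and the helper to_degc are all made up for this example. It also shows how a standardized value or error could be mapped back to degrees Celsius.

```python
import numpy as np

rng = np.random.default_rng(0)
temps = 10.0 + 8.0 * rng.standard_normal(1000)  # synthetic "temperatures" in degC
split = 700                                      # hypothetical train/validation boundary

# Statistics come from the training part only
train_mean = temps[:split].mean()
train_std = temps[:split].std()

scaled = (temps - train_mean) / train_std

def to_degc(z):
    """Map standardized values back to degrees Celsius."""
    return z * train_std + train_mean

restored = to_degc(scaled)

# An error measured in standardized units scales by the training std only
error_scaled = 0.1
error_degc = error_scaled * train_std

print(scaled[:split].mean(), scaled[:split].std(), error_degc)
```

The training part of the scaled series has mean 0 and standard deviation 1 by construction. The validation part generally does not, which is expected.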

2.2 Slice features and labels

import tensorflow as tf

BATCH_SIZE = 256
BUFFER_SIZE = 10000

# Training set: cache in memory, shuffle, batch, and repeat indefinitely
train_univariate = tf.data.Dataset.from_tensor_slices((x_train_uni, y_train_uni))
train_univariate = train_univariate.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()

# Validation set: no shuffling needed
val_univariate = tf.data.Dataset.from_tensor_slices((x_val_uni, y_val_uni))
val_univariate = val_univariate.batch(BATCH_SIZE).repeat()

2.3 Modeling

simple_lstm_model = tf.keras.models.Sequential([
    # input_shape=(20, 1): 20 time steps, 1 feature; the batch dimension is not included
    tf.keras.layers.LSTM(8, input_shape=x_train_uni.shape[-2:]),
    tf.keras.layers.Dense(1)
])

# Adam optimizer with mean absolute error as the loss
simple_lstm_model.compile(optimizer='adam', loss='mae')
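To get a feel for how small this model is, the number of trainable parameters can be counted by hand. The sketch below assumes the standard LSTM layout with four gates, each with an input kernel, a recurrent kernel, and one bias vector. The helper names are made up for this illustration and are not part of Keras.

```python
def lstm_param_count(units, input_dim):
    # Four gates (input, forget, cell candidate, output), each with:
    #   input kernel:     input_dim * units
    #   recurrent kernel: units * units
    #   bias:             units
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(units, input_dim):
    # Weight matrix plus one bias per output unit
    return input_dim * units + units

lstm_params = lstm_param_count(units=8, input_dim=1)    # LSTM(8) on 1 feature
dense_params = dense_param_count(units=1, input_dim=8)  # Dense(1) on 8 LSTM outputs
total_params = lstm_params + dense_params

print(lstm_params, dense_params, total_params)
```

Under these assumptions the whole network has only a few hundred weights, which is why it trains quickly. Calling simple_lstm_model.summary() is the way to check the count on the real model.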

2.4 Training model

EVALUATION_INTERVAL = 200
EPOCHS = 10

# Both datasets repeat indefinitely, so steps_per_epoch and validation_steps set the epoch length
simple_lstm_model.fit(train_univariate, epochs=EPOCHS,
                      steps_per_epoch=EVALUATION_INTERVAL,
                      validation_data=val_univariate, validation_steps=50)
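Because the training dataset repeats indefinitely, an "epoch" here is not a full pass over the data but a fixed number of batches. A rough sketch of what the settings above imply, using the 300,000-row training split and the 20-step history from earlier in this post (the variable names are for illustration only, and the one smaller final batch of each pass is ignored):

```python
TRAIN_SPLIT = 300_000        # training rows, from the text
HISTORY = 20                 # univariate_past_history
BATCH_SIZE = 256
EVALUATION_INTERVAL = 200    # steps_per_epoch
EPOCHS = 10
VALIDATION_STEPS = 50

# univariate_data starts at index HISTORY, so this many windows are produced
train_windows = TRAIN_SPLIT - HISTORY

samples_per_epoch = BATCH_SIZE * EVALUATION_INTERVAL   # windows seen in one "epoch"
samples_total = samples_per_epoch * EPOCHS             # windows seen over the whole run
passes_over_data = samples_total / train_windows       # approximate full passes

val_samples_per_epoch = BATCH_SIZE * VALIDATION_STEPS  # windows used per validation run

print(train_windows, samples_per_epoch, round(passes_over_data, 2), val_samples_per_epoch)
```

In other words, each epoch covers roughly a sixth of the training windows, and the full run amounts to fewer than two passes over the training set.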

Training process

Training results – temperature prediction results

2.5 Multivariate analysis

Here, past pressure, temperature, and air density observations are used to predict the temperature at a future point in time. The data set for the model therefore has to include the pressure, temperature, and density columns.

2.5.1 Plot of pressure, temperature and density changing with time

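2.5.2 Convert the data set to an array and standardize

# Pressure, temperature, and air density columns of the Jena data set
features_considered = ['p (mbar)', 'T (degC)', 'rho (g/m**3)']
features = df[features_considered]
features.index = df['Date Time']

dataset = features.values
# Standardize each column with the training-set mean and standard deviation
data_mean = dataset[:TRAIN_SPLIT].mean(axis=0)
data_std = dataset[:TRAIN_SPLIT].std(axis=0)

dataset = (dataset - data_mean) / data_std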

def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size, step, single_step=False):
    data = []
    labels = []

    start_index = start_index + history_size
    
    if end_index is None:
        end_index = len(dataset) - target_size

    for i in range(start_index, end_index):
        indices = range(i-history_size, i, step) # step represents the sliding step size
        data.append(dataset[indices])

        if single_step:
            labels.append(target[i + target_size])
        else:
            labels.append(target[i:i + target_size])

    return np.array(data), np.array(labels)
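The effect of step and single_step is easiest to see from the output shapes. The sketch below runs the function on random stand-in data. The 2,000-row array, the end index of 1,500, the 36-step target (6 hours), STEP = 6 (one observation per hour), and the choice of column 1 as the temperature are all assumptions made for this illustration.

```python
import numpy as np

def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size, step, single_step=False):
    data = []
    labels = []

    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size

    for i in range(start_index, end_index):
        indices = range(i - history_size, i, step)  # keep every step-th past observation
        data.append(dataset[indices])

        if single_step:
            labels.append(target[i + target_size])   # one value in the future
        else:
            labels.append(target[i:i + target_size]) # the whole future interval

    return np.array(data), np.array(labels)

rng = np.random.default_rng(0)
dataset = rng.standard_normal((2000, 3))  # stand-in for the standardized features
target = dataset[:, 1]                    # assumed: column 1 holds the temperature

past_history = 720   # 5 days of 10-minute observations
future_target = 36   # assumed: 6 hours ahead
STEP = 6             # assumed: subsample to one observation per hour

x_single, y_single = multivariate_data(dataset, target, 0, 1500, past_history,
                                       future_target, STEP, single_step=True)
x_multi, y_multi = multivariate_data(dataset, target, 0, 1500, past_history,
                                     future_target, STEP)

print(x_single.shape, y_single.shape, y_multi.shape)
```

With step = 6, the 720 past observations shrink to 120 time steps per window, which keeps the LSTM input short. With single_step=True each label is a single value, otherwise it is a sequence of target_size values.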

2.5.3 Multivariate model training


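    # x_train_single, train_data_single, and val_data_single are not defined in this post;
    # they are assumed to be built with multivariate_data(..., single_step=True) and
    # batched with tf.data in the same way as in section 2.2
    single_step_model = tf.keras.models.Sequential()
    single_step_model.add(tf.keras.layers.LSTM(32,
                                               input_shape=x_train_single.shape[-2:]))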
    single_step_model.add(tf.keras.layers.Dense(1))
    
    single_step_model.compile(optimizer=tf.keras.optimizers.RMSprop(), loss='mae')
    
    single_step_history = single_step_model.fit(train_data_single, epochs=EPOCHS,
                                                steps_per_epoch=EVALUATION_INTERVAL,
                                                validation_data=val_data_single,
                                                validation_steps=50)


    import matplotlib.pyplot as plt

    def plot_train_history(history, title):
        # Plot the training and validation loss recorded for each epoch
        loss = history.history['loss']
        val_loss = history.history['val_loss']
    
        epochs = range(len(loss))
    
        plt.figure()
    
        plt.plot(epochs, loss, 'b', label='Training loss')
        plt.plot(epochs, val_loss, 'r', label='Validation loss')
        plt.title(title)
        plt.legend()
    
        plt.show()

    plot_train_history(single_step_history,
                       'Single Step Training and validation loss')
