Recurrent Neural Networks (2): Simple Recurrent Neural Network (implemented with Keras)

Article directory

  • Simple Recurrent Neural Network
    • Model definition
    • Model features
  • Understanding simple recurrent neural networks
    • NumPy implementation
    • Keras implementation
  • References

Recurrent neural network is a general term for a family of neural networks. Their main feature is that the same structure is reused over sequence data in order to model the dependencies within the sequence and to make predictions on it. The simple recurrent neural network is the most basic model; most other recurrent neural networks are extensions of it, and its learning algorithm is backpropagation through time.

Simple Recurrent Neural Network

When you are reading this sentence, you read it word by word (or rather, eye saccade by eye saccade), while keeping in memory what came before. This allows you to dynamically understand the meaning conveyed by the sentence. Biological intelligence processes information incrementally, maintaining an internal model of what it is processing, built from past information and constantly updated as new information arrives.

A recurrent neural network (RNN) adopts the same principle, albeit in an extremely simplified version: it processes a sequence by traversing its elements while maintaining a state that contains information about what it has seen so far. In effect, an RNN is a type of neural network with an internal loop.

Model definition

Consider the problem of prediction on sequence data. Given an input sequence of real vectors $\bm{x_1},\bm{x_2},\cdots,\bm{x_T}$, at each position $t=1,2,\cdots,T$ a prediction is made for the real vector $\bm{x_t}$, giving a probability distribution $\bm{p_t}$; the overall output is the sequence of probability vectors $\bm{p_1},\bm{p_2},\cdots,\bm{p_T}$.

A naive approach is to use a feed-forward neural network for this task. Assuming the length of the sequence is fixed, the input vectors are concatenated to form the input of the feed-forward network, and the output probability vectors are concatenated to form its output. However, the length of sequence data is usually variable, while the width of a feed-forward network's input layer is fixed, so the data must be truncated or padded. In addition, sequence data usually exhibit similar local features at different positions, whereas a feed-forward network represents and learns the local features at each position separately, which introduces redundancy and reduces the efficiency of representation and learning.
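As an illustration, here is a minimal sketch of such a fixed-length feed-forward model; the layer sizes and the specific Keras layers used are assumptions made for illustration only, not part of the original text:

# Minimal sketch of the naive feed-forward approach (hypothetical sizes).
# The sequence length T must be fixed in advance, because the flattened
# input width is hard-wired into the first Dense layer.
from keras import Input
from keras.models import Sequential
from keras.layers import Dense, Flatten, Reshape, Softmax

T, input_features, num_classes = 10, 32, 5   # assumed fixed length and sizes

model = Sequential([
    Input(shape=(T, input_features)),
    Flatten(),                       # concatenate the T input vectors into one wide vector
    Dense(64, activation='relu'),
    Dense(T * num_classes),          # scores for every position, each learned separately
    Reshape((T, num_classes)),
    Softmax(axis=-1),                # per-position probability distribution p_t
])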

The basic idea of the RNN is to reuse the same feed-forward neural network at each position of the sequence and to connect the networks at adjacent positions, using the output of the hidden layer of the feed-forward network to represent the state at the current position. The definition of the simple recurrent neural network is given below.

The following neural network is called a simple recurrent neural network. The network takes the sequence data $\bm{x_1},\bm{x_2},\cdots,\bm{x_T}$ as input, where each term is a real vector, and the same network structure is reused at every position. At position $t$, the hidden (middle) layer of the network takes $\bm{x_t}$ and $\bm{h_{t-1}}$ as input and $\bm{h_t}$ as output, and the following relation holds:

$$\bm{h_t}=\text{tanh}(\bm{U}\cdot \bm{h_{t-1}} + \bm{W}\cdot \bm{x_t} + \bm{b})$$

where $\bm{h_{t-1}}$ is the state at position $t-1$, a real vector; $\bm{h_t}$ is the state at position $t$, also a real vector; $\bm{U}$ and $\bm{W}$ are weight matrices; and $\bm{b}$ is a bias vector. The output layer of the network takes $\bm{h_t}$ as input and $\bm{p_t}$ as output, and the following relation holds:

$$\bm{p_t}=\text{softmax}(\bm{V}\cdot \bm{h_t} + \bm{c})$$

where $\bm{V}$ is a weight matrix and $\bm{c}$ is a bias vector.
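To make the two relations concrete, here is a minimal single-step sketch in NumPy; the vector and matrix sizes are arbitrary assumptions for illustration:

import numpy as np

input_size, state_size, output_size = 4, 3, 2   # assumed sizes, for illustration only

x_t = np.random.random((input_size,))   # input at position t
h_prev = np.zeros((state_size,))        # state at position t-1

W = np.random.random((state_size, input_size))
U = np.random.random((state_size, state_size))
b = np.random.random((state_size,))
V = np.random.random((output_size, state_size))
c = np.random.random((output_size,))

h_t = np.tanh(U @ h_prev + W @ x_t + b)   # state update: h_t = tanh(U·h_{t-1} + W·x_t + b)
z = V @ h_t + c
p_t = np.exp(z) / np.sum(np.exp(z))       # output: p_t = softmax(V·h_t + c)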

The figure below shows the architecture of a simple recurrent neural network. It can be seen as a feed-forward neural network unfolded over the sequence data, with parameters shared across positions.

Model features

The definition involves not only the input space and the output space but also the state space, and the state space plays an important role. Given the input sequence of real vectors $\bm{x_1},\bm{x_2},\cdots,\bm{x_T}$, the simple recurrent neural network first generates, in turn, the sequence of state vectors $\bm{h_1},\bm{h_2},\cdots,\bm{h_T}$, and then generates, in turn, the sequence of output probability vectors $\bm{p_1},\bm{p_2},\cdots,\bm{p_T}$. The core of this process is the nonlinear transformation

$$\bm{h_t}=\text{tanh}(\bm{U}\cdot \bm{h_{t-1}} + \bm{W}\cdot \bm{x_t} + \bm{b})$$

which means that the state of the current position $\bm{h_t}$ is determined by the input of the current position $\bm{x_t}$ and the state of the previous position $\bm{h_{t-1}}$. The state at each position represents the local and global features of the sequence data up to that position, also known as short-range and long-range dependencies.

The computation of a recurrent neural network has to proceed sequentially over the sequence data. The advantage of a recurrent neural network is that it can process sequences of arbitrary length; the disadvantage is that the computation cannot be parallelized across positions to improve efficiency.

Understanding simple recurrent neural networks

NumPy implementation

We use NumPy to implement the forward pass of a simple RNN. The input to this RNN is a sequence of vectors, which we encode as a 2D tensor of shape (timesteps, input_features). The loop traverses the timesteps; at each timestep t, it combines the current state with the input at time t (of shape (input_features,)) to compute the output at time t. The state for the next timestep is then set to this output. For the first timestep, the output of the previous timestep is undefined, so there is no current state; we therefore initialize the state as an all-zero vector, called the initial state of the network.

import numpy as np

timesteps = 100        # number of timesteps in the input sequence
input_features = 32    # dimensionality of the input feature space
output_features = 64   # dimensionality of the output (state) space

inputs = np.random.random((timesteps, input_features))  # input data: random noise for the example

state_t = np.zeros((output_features,))  # initial state: an all-zero vector

W = np.random.random((output_features, input_features))
U = np.random.random((output_features, output_features))
b = np.random.random((output_features,))

successive_outputs = []
for input_t in inputs:
    # combine the input with the current state (the previous output) to obtain the current output
    output_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)
    successive_outputs.append(output_t)
    state_t = output_t  # update the state of the network for the next timestep

# the final output is a 2D tensor of shape (timesteps, output_features)
final_output_sequence = np.stack(successive_outputs, axis=0)

In this example, the final output is a 2D tensor of shape (timesteps, output_features), where each row is the output of the loop at time t. Each timestep t in the output tensor contains information about timesteps 0 to t of the input sequence, i.e. about the entire past. For this reason, in many cases we do not need the full sequence of outputs but only the last one (output_t at the end of the loop), because it already contains information about the entire sequence.
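Continuing the example above, keeping only the final output amounts to taking the last element produced by the loop:

final_output = successive_outputs[-1]   # shape (output_features,): summarizes the whole sequence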

Keras implementation

The simple NumPy implementation above corresponds to an actual Keras layer: the SimpleRNN layer.

from keras.layers import SimpleRNN

There is a small difference between the two: like all other Keras layers, the SimpleRNN layer processes batches of sequences rather than a single sequence as in the NumPy example. Therefore, it receives input of shape (batch_size, timesteps, input_features) instead of (timesteps, input_features).
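As a quick shape check (the batch size, timestep count, and unit count here are arbitrary assumptions), a SimpleRNN layer applied to such a batched input behaves as follows:

import numpy as np
from keras.layers import SimpleRNN

batch = np.random.random((8, 100, 32)).astype("float32")  # (batch_size, timesteps, input_features)
layer = SimpleRNN(16)       # by default the layer returns only the last output
print(layer(batch).shape)   # (8, 16)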

Like all recurrent layers in Keras, SimpleRNN can run in two different modes: it can return either the full sequence of successive outputs for each timestep (a 3D tensor of shape (batch_size, timesteps, output_features)) or only the final output for each input sequence (a 2D tensor of shape (batch_size, output_features)). These two modes are controlled by the return_sequences constructor argument. Let's look at an example that uses SimpleRNN and returns the full state sequence.

from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN

model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32, return_sequences=True))  # return the complete state sequence
model.summary()
"""
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 embedding (Embedding)       (None, None, 32)          320000

 simple_rnn (SimpleRNN)      (None, None, 32)          2080

=================================================================
Total params: 322,080
Trainable params: 322,080
Non-trainable params: 0
_________________________________________________________________
"""

It is also sometimes useful to stack multiple recurrent layers one on top of the other in order to increase the representational power of the network. In this case, it is necessary to have all intermediate layers return the full output sequence.

model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32))  # the last layer only returns the final output

Next, we apply this model to the IMDB movie review classification problem. First, preprocess the data.

from keras.datasets import imdb
from keras.utils import pad_sequences

max_features = 10000  # number of words to consider as features
maxlen = 500          # cut reviews after this number of words
batch_size = 32

(input_train, y_train), (input_test, y_test) = imdb.load_data(
    num_words=max_features)

print(len(input_train), 'train sequences')
print(len(input_test), 'test sequences')

input_train = pad_sequences(input_train, maxlen=maxlen)
input_test = pad_sequences(input_test, maxlen=maxlen)
print('input_train shape:', input_train.shape)
print('input_test shape:', input_test.shape)
"""
25000 train sequences
25000 test sequences
input_train shape: (25000, 500)
input_test shape: (25000, 500)
"""

We train a simple recurrent network with an Embedding layer and a SimpleRNN layer.

from keras.layers import Dense

model = Sequential()
model.add(Embedding(max_features, 32))
model.add(SimpleRNN(32))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

history = model.fit(input_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2)

import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

References

[1] François Chollet. Deep Learning with Python.
[2] Li Hang. Machine Learning Methods. Tsinghua University Press.