LSTM and BiLSTM in NLP

Article directory

  • Code display
  • Code interpretation
  • Introduction to Bidirectional LSTM (BiLSTM)

Code display

import pandas as pd
import tensorflow as tf
tf.random.set_seed(1)
df = pd.read_csv("../data/Clothing Reviews.csv")
print(df.info())

df['Review Text'] = df['Review Text'].astype(str)
x_train = df['Review Text']
y_train = df['Rating']
print(y_train.unique())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23486 entries, 0 to 23485
Data columns (total 11 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Unnamed: 0               23486 non-null  int64
 1   Clothing ID              23486 non-null  int64
 2   Age                      23486 non-null  int64
 3   Title                    19676 non-null  object
 4   Review Text              22641 non-null  object
 5   Rating                   23486 non-null  int64
 6   Recommended IND          23486 non-null  int64
 7   Positive Feedback Count  23486 non-null  int64
 8   Division Name            23472 non-null  object
 9   Department Name          23472 non-null  object
 10  Class Name               23472 non-null  object
[4 5 3 2 1]
from tensorflow.keras.preprocessing.text import Tokenizer

dict_size = 14848
tokenizer = Tokenizer(num_words=dict_size)

tokenizer.fit_on_texts(x_train)
print(len(tokenizer.word_index),tokenizer.index_word)

x_train_tokenized = tokenizer.texts_to_sequences(x_train)
from tensorflow.keras.preprocessing.sequence import pad_sequences
max_comment_length = 120
x_train = pad_sequences(x_train_tokenized,maxlen=max_comment_length)

for v in x_train[:10]:
    print(v,len(v))
# Build RNN neural network
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,SimpleRNN,Embedding,LSTM,Bidirectional
import tensorflow as tf

rnn = Sequential()
# First layer: map word indices to dense word vectors (embedding)
rnn.add(Embedding(input_dim=dict_size,output_dim=60,input_length=max_comment_length))
# Parameter counts for 100 units on this embedding, for comparison:
# SimpleRNN: simple_rnn (SimpleRNN)  (None, 100)  16100
# LSTM:      lstm (LSTM)             (None, 100)  64400
rnn.add(Bidirectional(LSTM(units=100)))  # Second layer: bidirectional LSTM with 100 units per direction
rnn.add(Dense(units=10,activation=tf.nn.relu))
rnn.add(Dense(units=6,activation=tf.nn.softmax)) # Output the classification results
rnn.compile(loss='sparse_categorical_crossentropy',optimizer="adam",metrics=['accuracy'])
print(rnn.summary())
result = rnn.fit(x_train,y_train,batch_size=64,validation_split=0.3,epochs=10)
print(result)
print(result.history)

Code interpretation

First, let’s summarize the flow of this code:

  1. Import the necessary TensorFlow Keras modules.
  2. Initialize a Sequential model, meaning the model stacks layers in order.
  3. Add an Embedding layer that converts integer indices (word IDs) into dense vectors.
  4. Add a bidirectional LSTM layer with 100 units.
  5. Add two Dense (fully connected) layers with 10 and 6 neurons respectively.
  6. Compile the model with the sparse_categorical_crossentropy loss function.
  7. Print a summary of the model.
  8. Train the model on the given data, holding out part of it for validation.
  9. Print the training results.

Now, let’s unpack the code line by line:

  1. Import dependencies:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,SimpleRNN,Embedding,LSTM,Bidirectional
import tensorflow as tf

These imports bring in the TensorFlow Keras modules required to build and train the RNN model.

  2. Initialize model:
rnn = Sequential()

A Sequential model is used, which means layers are simply stacked in the order they are added.

  3. Add Embedding layer:
rnn.add(Embedding(input_dim=dict_size,output_dim=60,input_length=max_comment_length))

This layer converts integer word indices into fixed-size dense vectors. dict_size is the size of the vocabulary, output_dim=60 is the dimensionality of each word vector, and max_comment_length is the (padded) length of each input comment.
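As a quick sanity check, here is a minimal sketch (assuming the same dict_size=14848, output_dim=60 and max_comment_length=120 as above) showing that the Embedding layer maps a batch of padded index sequences of shape (batch, 120) to a tensor of shape (batch, 120, 60):

import tensorflow as tf
emb = tf.keras.layers.Embedding(input_dim=14848, output_dim=60, input_length=120)
sample = tf.constant([[0] * 115 + [12, 45, 7, 300, 8]])  # one padded comment: 120 token indices
print(emb(sample).shape)  # (1, 120, 60): one 60-dimensional vector per token position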

  4. Add LSTM layer:
rnn.add(Bidirectional(LSTM(units=100)))

A bidirectional LSTM is used, which means each position takes both past and future context into account. It has 100 units per direction, so its output is 200-dimensional.
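A minimal sketch (toy input, with the article's shapes assumed: 120 time steps, 60-dimensional embeddings) showing that the Bidirectional wrapper runs one LSTM left to right and a second copy right to left, then concatenates the two 100-dimensional final states:

import tensorflow as tf
x = tf.random.normal((1, 120, 60))  # (batch, time steps, embedding dim)
bilstm = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=100))
print(bilstm(x).shape)  # (1, 200): forward and backward final states concatenated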

  5. Add fully connected layers:
rnn.add(Dense(units=10,activation=tf.nn.relu))
rnn.add(Dense(units=6,activation=tf.nn.softmax))

These two Dense layers produce the model's output; the last layer uses the softmax activation for 6-way classification. Six output units are needed because sparse_categorical_crossentropy treats the Rating values (1-5) as integer class indices starting at 0, so classes 0-5 must all be covered even though class 0 never occurs in the data.
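Once the model has been trained (step 8 below), a hypothetical usage sketch for turning the 6-way softmax output back into a rating is to take the argmax over the class axis:

import numpy as np
probs = rnn.predict(x_train[:3])  # shape (3, 6): one probability per class 0-5 for three comments
print(np.argmax(probs, axis=1))   # predicted rating (class index) for each comment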

  6. Compile model:
rnn.compile(loss='sparse_categorical_crossentropy',optimizer="adam",metrics=['accuracy'])

sparse_categorical_crossentropy is the appropriate loss for a multi-class problem with integer labels, and the adam optimizer is used.
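The key property of sparse_categorical_crossentropy is that it takes the integer labels in the Rating column directly, with no one-hot encoding needed. A small sketch (the probabilities below are made up for illustration, not taken from the article):

import tensorflow as tf
y_true = tf.constant([5, 3])  # integer class labels, as in the Rating column
y_pred = tf.constant([[0.05, 0.05, 0.05, 0.05, 0.1, 0.7],
                      [0.1, 0.1, 0.1, 0.5, 0.1, 0.1]])  # softmax outputs over 6 classes
print(tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred).numpy())  # per-example loss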

  7. Show model summary:
print(rnn.summary())

This will show the structure and number of parameters of the model.

Model: "sequential"
______________________________________________________________
 Layer (type) Output Shape Param #
================================================== ===============
 embedding (Embedding) (None, 120, 60) 890880
                                                                 
 bidirectional (Bidirectiona (None, 200) 128800
 l)
                                                                 
 dense (Dense) (None, 10) 2010
                                                                 
 dense_1 (Dense) (None, 6) 66
                                                                 
================================================== ===============
Total params: 1,021,756
Trainable params: 1,021,756
Non-trainable params: 0
______________________________________________________________
None
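The parameter counts in this summary can be reproduced by hand; a back-of-the-envelope sketch using the layer sizes above:

embedding_params = 14848 * 60                   # vocab size x embedding dim = 890,880
lstm_params = 4 * ((60 + 100) * 100 + 100)      # 4 gates x ((input dim + units) x units + bias) = 64,400
bidirectional_params = 2 * lstm_params          # forward + backward copies = 128,800
dense_params = 200 * 10 + 10                    # 2,010
dense_1_params = 10 * 6 + 6                     # 66
print(embedding_params + bidirectional_params + dense_params + dense_1_params)  # 1,021,756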
  8. Train model:
result = rnn.fit(x_train,y_train,batch_size=64,validation_split=0.3,epochs=10)

The model is trained on the training set with a batch size of 64, holding out 30% of it for validation, for 10 epochs.

Epoch 1/10
257/257 [==============================] - 74s 258ms/step - loss: 1.2142 - accuracy: 0.5470 - val_loss: 1.0998 - val_accuracy: 0.5521
Epoch 2/10
257/257 [==============================] - 57s 221ms/step - loss: 0.9335 - accuracy: 0.6293 - val_loss: 0.9554 - val_accuracy: 0.6094
Epoch 3/10
257/257 [==============================] - 59s 229ms/step - loss: 0.8363 - accuracy: 0.6616 - val_loss: 0.9321 - val_accuracy: 0.6168
Epoch 4/10
257/257 [==============================] - 61s 236ms/step - loss: 0.7795 - accuracy: 0.6833 - val_loss: 0.9812 - val_accuracy: 0.6089
Epoch 5/10
257/257 [==============================] - 56s 217ms/step - loss: 0.7281 - accuracy: 0.7010 - val_loss: 0.9559 - val_accuracy: 0.6043
Epoch 6/10
257/257 [==============================] - 56s 219ms/step - loss: 0.6934 - accuracy: 0.7156 - val_loss: 1.0197 - val_accuracy: 0.5999
Epoch 7/10
257/257 [==============================] - 57s 220ms/step - loss: 0.6514 - accuracy: 0.7364 - val_loss: 1.1192 - val_accuracy: 0.6080
Epoch 8/10
257/257 [==============================] - 57s 222ms/step - loss: 0.6258 - accuracy: 0.7486 - val_loss: 1.1350 - val_accuracy: 0.6100
Epoch 9/10
257/257 [==============================] - 57s 220ms/step - loss: 0.5839 - accuracy: 0.7749 - val_loss: 1.1537 - val_accuracy: 0.6019
Epoch 10/10
257/257 [==============================] - 57s 222ms/step - loss: 0.5424 - accuracy: 0.7945 - val_loss: 1.1715 - val_accuracy: 0.5744
<keras.callbacks.History object at 0x00000244DCE06D90>
  9. Show training results:
print(result)
<keras.callbacks.History object at 0x0000013AEAAE1A30>
print(result.history)
{'loss': [1.2142471075057983, 0.9334620833396912, 0.8363043069839478, 0.7795010805130005, 0.7280740141868591, 0.693393349647522, 0.6514003872871399, 0.6257606744766235, 0.5839114189147949, 0.5423741340637207],
'accuracy': [0.5469586253166199, 0.6292579174041748, 0.6616179943084717, 0.6833333373069763, 0.7010340690612793, 0.7156326174736023, 0.7363746762275696, 0.748600959777832, 0.7748783230781555, 0.7944647073745728],
'val_loss': [1.0997602939605713, 0.9553984999656677, 0.932131290435791, 0.9812102317810059, 0.9558586478233337, 1.019730806350708, 1.11918044090271, 1.1349923610687256, 1.1536787748336792, 1.1715185642242432],
'val_accuracy': [0.5520862936973572, 0.609423816204071, 0.6168038845062256, 0.6088560819625854, 0.6043145060539246, 0.5999148488044739, 0.6080045700073242, 0.6099914908409119, 0.6019017696380615, 0.574368417263031]
}

This will show information such as loss and accuracy during training.
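An optional sketch (assuming matplotlib is installed) for visualizing these curves. Note that in the run above training accuracy keeps rising while validation accuracy plateaus around 0.60 and validation loss grows after epoch 3, a typical sign of overfitting:

import matplotlib.pyplot as plt
plt.plot(result.history['accuracy'], label='train accuracy')
plt.plot(result.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()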

Introduction to Bidirectional LSTM (BiLSTM)




A BiLSTM runs two LSTMs over each sequence, one left to right and one right to left, and concatenates their outputs, so every position has access to both past and future context (this is why the bidirectional layer above outputs 200 values for units=100).

Example:
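A minimal sketch (toy shapes assumed for illustration, not taken from the article): with return_sequences=True the forward and backward outputs are concatenated at every time step; without it, only the two final states are concatenated.

import tensorflow as tf
x = tf.random.normal((2, 4, 3))  # 2 sequences, 4 time steps, 3 features per step
per_step = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=5, return_sequences=True))
print(per_step(x).shape)    # (2, 4, 10): 5 forward + 5 backward values at each time step
final_only = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=5))
print(final_only(x).shape)  # (2, 10): concatenated final states only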