LSTM and BiLSTM in NLP

Article directory

  • Code display
  • Code interpretation
  • Introduction to Bidirectional LSTM (BiLSTM)

Code display

import pandas as pd
import tensorflow as tf
tf.random.set_seed(1)
df = pd.read_csv("../data/Clothing Reviews.csv")
print(df.info())

df['Review Text'] = df['Review Text'].astype(str)
x_train = df['Review Text']
y_train = df['Rating']
print(y_train.unique())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23486 entries, 0 to 23485
Data columns (total 11 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Unnamed: 0               23486 non-null  int64
 1   Clothing ID              23486 non-null  int64
 2   Age                      23486 non-null  int64
 3   Title                    19676 non-null  object
 4   Review Text              22641 non-null  object
 5   Rating                   23486 non-null  int64
 6   Recommended IND          23486 non-null  int64
 7   Positive Feedback Count  23486 non-null  int64
 8   Division Name            23472 non-null  object
 9   Department Name          23472 non-null  object
 10  Class Name               23472 non-null  object
[4 5 3 2 1]
from tensorflow.keras.preprocessing.text import Tokenizer

dict_size = 14848
tokenizer = Tokenizer(num_words=dict_size)

tokenizer.fit_on_texts(x_train)
print(len(tokenizer.word_index),tokenizer.index_word)

x_train_tokenized = tokenizer.texts_to_sequences(x_train)
from tensorflow.keras.preprocessing.sequence import pad_sequences
max_comment_length = 120
x_train = pad_sequences(x_train_tokenized,maxlen=max_comment_length)

for v in x_train[:10]:
    print(v,len(v))
# Build RNN neural network
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,SimpleRNN,Embedding,LSTM,Bidirectional
import tensorflow as tf

rnn = Sequential()
# First layer: map word indices to dense word vectors (embedding)
rnn.add(Embedding(input_dim=dict_size,output_dim=60,input_length=max_comment_length))
# Parameter counts for 100 units on this embedding, for comparison:
# SimpleRNN: simple_rnn (SimpleRNN)  (None, 100)  16100
# LSTM:      lstm (LSTM)             (None, 100)  64400
rnn.add(Bidirectional(LSTM(units=100)))  # Second layer: bidirectional LSTM with 100 units per direction
rnn.add(Dense(units=10,activation=tf.nn.relu))
rnn.add(Dense(units=6,activation=tf.nn.softmax)) # Output the classification results
rnn.compile(loss='sparse_categorical_crossentropy',optimizer="adam",metrics=['accuracy'])
print(rnn.summary())
result = rnn.fit(x_train,y_train,batch_size=64,validation_split=0.3,epochs=10)
print(result)
print(result.history)

Code interpretation

First, let’s summarize the flow of this code:

  1. Import the necessary TensorFlow Keras modules.
  2. Initialize a Sequential model, meaning the model stacks layers in order.
  3. Add an Embedding layer that converts integer indices (word IDs) into dense vectors.
  4. Add a bidirectional LSTM layer with 100 units.
  5. Add two Dense (fully connected) layers with 10 and 6 neurons respectively.
  6. Compile the model with the sparse_categorical_crossentropy loss function.
  7. Print a summary of the model.
  8. Train the model on the given data, holding out part of it for validation.
  9. Print the training results.

Now, let’s unpack the code line by line:

  1. Import dependencies:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,SimpleRNN,Embedding,LSTM,Bidirectional
import tensorflow as tf

These imports bring in the TensorFlow Keras modules required to build and train the RNN model.

  2. Initialize model:
rnn = Sequential()

A Sequential model is used, which means layers are simply stacked in the order they are added.

  3. Add Embedding layer:
rnn.add(Embedding(input_dim=dict_size,output_dim=60,input_length=max_comment_length))

This layer converts integer word indices into fixed-size dense vectors. dict_size is the size of the vocabulary, output_dim=60 is the dimensionality of each word vector, and max_comment_length is the (padded) length of each input comment.
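As a quick sanity check, here is a minimal sketch (assuming the same dict_size=14848, output_dim=60 and max_comment_length=120 as above) showing that the Embedding layer maps a batch of padded index sequences of shape (batch, 120) to a tensor of shape (batch, 120, 60):

import tensorflow as tf
emb = tf.keras.layers.Embedding(input_dim=14848, output_dim=60, input_length=120)
sample = tf.constant([[0] * 115 + [12, 45, 7, 300, 8]])  # one padded comment: 120 token indices
print(emb(sample).shape)  # (1, 120, 60): one 60-dimensional vector per token position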

  4. Add LSTM layer:
rnn.add(Bidirectional(LSTM(units=100)))

A bidirectional LSTM is used, which means each position takes both past and future context into account. It has 100 units per direction, so its output is 200-dimensional.
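A minimal sketch (toy input, with the article's shapes assumed: 120 time steps, 60-dimensional embeddings) showing that the Bidirectional wrapper runs one LSTM left to right and a second copy right to left, then concatenates the two 100-dimensional final states:

import tensorflow as tf
x = tf.random.normal((1, 120, 60))  # (batch, time steps, embedding dim)
bilstm = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=100))
print(bilstm(x).shape)  # (1, 200): forward and backward final states concatenated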

  5. Add fully connected layers:
rnn.add(Dense(units=10,activation=tf.nn.relu))
rnn.add(Dense(units=6,activation=tf.nn.softmax))

These two Dense layers produce the model's output; the last layer uses the softmax activation for 6-way classification. Six output units are needed because sparse_categorical_crossentropy treats the Rating values (1-5) as integer class indices starting at 0, so classes 0-5 must all be covered even though class 0 never occurs in the data.
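Once the model has been trained (step 8 below), a hypothetical usage sketch for turning the 6-way softmax output back into a rating is to take the argmax over the class axis:

import numpy as np
probs = rnn.predict(x_train[:3])  # shape (3, 6): one probability per class 0-5 for three comments
print(np.argmax(probs, axis=1))   # predicted rating (class index) for each comment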

  6. Compile model:
rnn.compile(loss='sparse_categorical_crossentropy',optimizer="adam",metrics=['accuracy'])

sparse_categorical_crossentropy is the appropriate loss for a multi-class problem with integer labels, and the adam optimizer is used.
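The key property of sparse_categorical_crossentropy is that it takes the integer labels in the Rating column directly, with no one-hot encoding needed. A small sketch (the probabilities below are made up for illustration, not taken from the article):

import tensorflow as tf
y_true = tf.constant([5, 3])  # integer class labels, as in the Rating column
y_pred = tf.constant([[0.05, 0.05, 0.05, 0.05, 0.1, 0.7],
                      [0.1, 0.1, 0.1, 0.5, 0.1, 0.1]])  # softmax outputs over 6 classes
print(tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred).numpy())  # per-example loss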

  7. Show model summary:
print(rnn.summary())

This will show the structure and number of parameters of the model.

Model: "sequential"
______________________________________________________________
 Layer (type) Output Shape Param #
================================================== ===============
 embedding (Embedding) (None, 120, 60) 890880
                                                                 
 bidirectional (Bidirectiona (None, 200) 128800
 l)
                                                                 
 dense (Dense) (None, 10) 2010
                                                                 
 dense_1 (Dense) (None, 6) 66
                                                                 
================================================== ===============
Total params: 1,021,756
Trainable params: 1,021,756
Non-trainable params: 0
______________________________________________________________
None
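The parameter counts in this summary can be reproduced by hand; a back-of-the-envelope sketch using the layer sizes above:

embedding_params = 14848 * 60                   # vocab size x embedding dim = 890,880
lstm_params = 4 * ((60 + 100) * 100 + 100)      # 4 gates x ((input dim + units) x units + bias) = 64,400
bidirectional_params = 2 * lstm_params          # forward + backward copies = 128,800
dense_params = 200 * 10 + 10                    # 2,010
dense_1_params = 10 * 6 + 6                     # 66
print(embedding_params + bidirectional_params + dense_params + dense_1_params)  # 1,021,756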
  8. Train model:
result = rnn.fit(x_train,y_train,batch_size=64,validation_split=0.3,epochs=10)

The model is trained on the training set with a batch size of 64, holding out 30% of it for validation, for 10 epochs.

Epoch 1/10
257/257 [==============================] - 74s 258ms/step - loss: 1.2142 - accuracy: 0.5470 - val_loss: 1.0998 - val_accuracy: 0.5521
Epoch 2/10
257/257 [==============================] - 57s 221ms/step - loss: 0.9335 - accuracy: 0.6293 - val_loss: 0.9554 - val_accuracy: 0.6094
Epoch 3/10
257/257 [==============================] - 59s 229ms/step - loss: 0.8363 - accuracy: 0.6616 - val_loss: 0.9321 - val_accuracy: 0.6168
Epoch 4/10
257/257 [==============================] - 61s 236ms/step - loss: 0.7795 - accuracy: 0.6833 - val_loss: 0.9812 - val_accuracy: 0.6089
Epoch 5/10
257/257 [==============================] - 56s 217ms/step - loss: 0.7281 - accuracy: 0.7010 - val_loss: 0.9559 - val_accuracy: 0.6043
Epoch 6/10
257/257 [==============================] - 56s 219ms/step - loss: 0.6934 - accuracy: 0.7156 - val_loss: 1.0197 - val_accuracy: 0.5999
Epoch 7/10
257/257 [==============================] - 57s 220ms/step - loss: 0.6514 - accuracy: 0.7364 - val_loss: 1.1192 - val_accuracy: 0.6080
Epoch 8/10
257/257 [==============================] - 57s 222ms/step - loss: 0.6258 - accuracy: 0.7486 - val_loss: 1.1350 - val_accuracy: 0.6100
Epoch 9/10
257/257 [==============================] - 57s 220ms/step - loss: 0.5839 - accuracy: 0.7749 - val_loss: 1.1537 - val_accuracy: 0.6019
Epoch 10/10
257/257 [==============================] - 57s 222ms/step - loss: 0.5424 - accuracy: 0.7945 - val_loss: 1.1715 - val_accuracy: 0.5744
<keras.callbacks.History object at 0x00000244DCE06D90>
  9. Show training results:
print(result)
<keras.callbacks.History object at 0x0000013AEAAE1A30>
print(result.history)
{'loss': [1.2142471075057983, 0.9334620833396912, 0.8363043069839478, 0.7795010805130005, 0.7280740141868591, 0.693393349647522, 0.6514003872871399, 0.6257606744766235, 0.5839114189147949, 0.5423741340637207],
'accuracy': [0.5469586253166199, 0.6292579174041748, 0.6616179943084717, 0.6833333373069763, 0.7010340690612793, 0.7156326174736023, 0.7363746762275696, 0.748600959777832, 0.7748783230781555, 0.7944647073745728],
'val_loss': [1.0997602939605713, 0.9553984999656677, 0.932131290435791, 0.9812102317810059, 0.9558586478233337, 1.019730806350708, 1.11918044090271, 1.1349923610687256, 1.1536787748336792, 1.1715185642242432],
'val_accuracy': [0.5520862936973572, 0.609423816204071, 0.6168038845062256, 0.6088560819625854, 0.6043145060539246, 0.5999148488044739, 0.6080045700073242, 0.6099914908409119, 0.6019017696380615, 0.574368417263031]
}

This will show information such as loss and accuracy during training.
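An optional sketch (assuming matplotlib is installed) for visualizing these curves. Note that in the run above training accuracy keeps rising while validation accuracy plateaus around 0.60 and validation loss grows after epoch 3, a typical sign of overfitting:

import matplotlib.pyplot as plt
plt.plot(result.history['accuracy'], label='train accuracy')
plt.plot(result.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()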

Introduction to Bidirectional LSTM (BiLSTM)




A BiLSTM runs two LSTMs over each sequence, one left to right and one right to left, and concatenates their outputs, so every position has access to both past and future context (this is why the bidirectional layer above outputs 200 values for units=100).

Example:
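A minimal sketch (toy shapes assumed for illustration, not taken from the article): with return_sequences=True the forward and backward outputs are concatenated at every time step; without it, only the two final states are concatenated.

import tensorflow as tf
x = tf.random.normal((2, 4, 3))  # 2 sequences, 4 time steps, 3 features per step
per_step = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=5, return_sequences=True))
print(per_step(x).shape)    # (2, 4, 10): 5 forward + 5 backward values at each time step
final_only = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=5))
print(final_only(x).shape)  # (2, 10): concatenated final states only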