14-NLP: BERT for Multi-Label Text Classification

Article directory

  • Code
  • Interpretation of the overall code process
  • Debug the above code

Code

from pypro.chapters03.demo03_data_acquisition_and_processing import train_list, label_list, val_train_list, val_label_list
import tensorflow as tf
from transformers import TFBertForSequenceClassification

bert_model = "bert-base-chinese"

# Initialize a BERT sequence-classification model with 32 output labels
model = TFBertForSequenceClassification.from_pretrained(bert_model, num_labels=32)
# Sigmoid cross-entropy treats each of the 32 labels as an independent binary decision
model.compile(metrics=['accuracy'], loss=tf.nn.sigmoid_cross_entropy_with_logits)
model.summary()
# Train on the first 24 samples only: batch size 12, a single epoch
result = model.fit(x=train_list[:24], y=label_list[:24], batch_size=12, epochs=1)
print(result.history)
# Save the model (the essence of model saving is to save the training parameters, and for deep learning, it also saves the neural network structure)
model.save_weights('../data/model.h5')

# Rebuild the architecture, then restore the saved weights
model = TFBertForSequenceClassification.from_pretrained(bert_model, num_labels=32)
model.load_weights('../data/model.h5')
result = model.predict(val_train_list[:12]) # Predicted value
print(result)
result = tf.nn.sigmoid(result)  # squash logits to per-label probabilities in (0, 1)
print(result)
result = tf.cast(tf.greater_equal(result, 0.5), tf.float32)  # threshold at 0.5 -> multi-hot predictions
print(result)
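
The data-preparation module itself is not shown above. The following is a rough, hypothetical sketch of what it might produce, assuming the text is tokenized with BertTokenizer and each label is a 32-dimensional multi-hot vector (all names and values here are illustrative, not the author's actual module):

# Hypothetical sketch of the data module -- not the author's actual code
import numpy as np
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

texts = ["今天天气很好", "股市大幅下跌"]  # example raw sentences
encoded = tokenizer(texts, padding="max_length", truncation=True,
                    max_length=128, return_tensors="np")
train_list = encoded["input_ids"]  # token IDs fed to the model

# Multi-hot labels: a length-32 float vector per sample, 1.0 for each active class
label_list = np.zeros((len(texts), 32), dtype=np.float32)
label_list[0, 3] = 1.0          # sample 0 belongs to class 3
label_list[1, [7, 12]] = 1.0    # sample 1 belongs to classes 7 and 12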

Interpretation of the overall code process

The purpose of this code is to use the TensorFlow and transformers libraries to perform a text sequence classification task. Here is a step-by-step overview of the overall process:

  1. Import necessary libraries and data:

    • Four lists are imported from a module called pypro.chapters03.demo03_data_acquisition_and_processing: train_list, label_list, val_train_list, and val_label_list. These contain the training data, training labels, validation data, and validation labels respectively.
    • Import the TensorFlow and transformers libraries.
  2. Initialize the pre-trained BERT model:

    • Use the bert-base-chinese model to initialize a BERT model for sequence classification.
    • The model is configured to classify 32 different labels.
  3. Compile model:

    • Use sigmoid cross-entropy as the loss function and track accuracy as the performance metric.
  4. Model summary:

    • Outputs summary information about the model, including the name, type, output shape, and number of parameters for each layer.
  5. Training model:

    • Train the model using the provided training data and labels (only the first 24 samples are taken).
    • The batch size is set to 12 and training only takes place for 1 epoch, which means the data will be passed through the model once.
  6. Output training results:

    • Print historical data recorded during training, usually including loss values and accuracy.
  7. Save model weights:

    • Save the trained model weights to the local file model.h5.
  8. Load model weights:

    • Initialize a new model structure and load previously saved weights.
  9. Model prediction:

    • Use the validation data (only the first 12 samples) to make predictions.
  10. Apply the activation function:

    • The raw predictions are passed through the sigmoid function, converting them into values between 0 and 1.
  11. Convert predictions:

    • Convert the probabilities into binary classification results by checking whether each predicted value is greater than or equal to 0.5 (a small numeric demo follows this list).
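
Steps 10 and 11 can be illustrated with a tiny numeric example (the logits below are invented for illustration):

import tensorflow as tf

logits = tf.constant([[2.0, -1.0, 0.3, -3.0]])  # fabricated raw outputs for 4 of the 32 labels
probs = tf.nn.sigmoid(logits)                   # approx. [[0.88, 0.27, 0.57, 0.05]]
preds = tf.cast(tf.greater_equal(probs, 0.5), tf.float32)
print(preds.numpy())                            # [[1. 0. 1. 0.]]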

Debug the above code

The above code is explained line by line below:

  1. from pypro.chapters03.demo03_data_acquisition_and_processing import train_list, label_list, val_train_list, val_label_list

    This line imports four lists from the demo03_data_acquisition_and_processing module: training data and labels (train_list, label_list) and validation data and labels (val_train_list, val_label_list). This is part of the data preparation step.

  2. import tensorflow as tf

    This line of code imports the TensorFlow library, which is an open source library widely used for machine learning and deep learning tasks.

  3. from transformers import TFBertForSequenceClassification

    This imports the TFBertForSequenceClassification class from the transformers library, which contains many pre-trained models for NLP tasks. Specifically, this is the TensorFlow version of the BERT model for sequence classification tasks.

  4. bert_model = "bert-base-chinese"

    Define a string variable bert_model, which holds the name of the pre-trained model. Here, we will use the Chinese BERT base model.

  5. model = TFBertForSequenceClassification.from_pretrained(bert_model, num_labels=32)

    Create a new sequence classification model instance using the bert-base-chinese model and the TFBertForSequenceClassification class. num_labels=32 indicates that there are 32 different categories for classification.

  6. model.compile(metrics=['accuracy'], loss=tf.nn.sigmoid_cross_entropy_with_logits)

    Compile the model, setting accuracy as the metric and sigmoid_cross_entropy_with_logits as the loss function. This loss is usually associated with binary classification, but for multi-label classification over 32 labels it treats each label as an independent binary decision (a worked example appears after this list).

  7. model.summary()

    Output summary information of the model, including details such as the layers in the model, the output shape and number of parameters for each layer.

  8. result = model.fit(x=train_list[:24], y=label_list[:24], batch_size=12, epochs=1)

    Start training the model, using only the first 24 samples as training data and labels. The batch size is set to 12, meaning each gradient update is based on 12 samples. epochs=1 means the model passes over the data set only once.

  9. print(result.history)

    Print out historical data during training, such as loss and accuracy.

  10. model.save_weights('../data/model.h5')

    Save the trained model weights to the local file model.h5.

  11. model = TFBertForSequenceClassification.from_pretrained(bert_model, num_labels=32)

    Initialize a fresh model instance to demonstrate loading a saved model from scratch.

  12. model.load_weights('../data/model.h5')

    Load previously saved model weights.

  13. result = model.predict(val_train_list[:12]) # Predicted value

    Use the first 12 samples of the validation set to make predictions and obtain the model's raw output (see the caveat after this list).

  14. print(result)

    Print out the prediction results.

  15. result = tf.nn.sigmoid(result)

    The model's raw output is passed through the sigmoid function, yielding values between 0 and 1 that indicate the probability of belonging to each category.

  16. print(result)

    Print the prediction results processed by the sigmoid activation function again.

  17. result = tf.cast(tf.greater_equal(result, 0.5), tf.float32)

    Convert the probability output by sigmoid into a binary classification result. For each label, if the probability is greater than or equal to 0.5, the sample is considered to belong to that label (converted to 1), otherwise it does not belong (converted to 0).

  18. print(result)

    Finally, print out the converted classification results.
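
To make item 6 concrete, here is a small self-contained check of how sigmoid cross-entropy scores multi-hot labels against raw logits (all values are invented):

import tensorflow as tf

labels = tf.constant([[1.0, 0.0, 1.0, 0.0]])    # multi-hot ground truth
logits = tf.constant([[2.0, -1.0, 0.3, -3.0]])  # raw model outputs, before any sigmoid
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
print(loss.numpy())  # one independent binary cross-entropy value per label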
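
One caveat for item 13: depending on the transformers version, predict on a TFBertForSequenceClassification may return an output object that wraps the logits rather than a bare array, so the logits may need to be unpacked before applying sigmoid. A hedged sketch:

result = model.predict(val_train_list[:12])
# Newer transformers versions wrap the logits in an output object; older ones return them directly
logits = result.logits if hasattr(result, "logits") else result
probs = tf.nn.sigmoid(logits)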

Overall, this code shows the complete process of training, saving, loading, and predicting with a pre-trained BERT model on a multi-label text classification task.