[NNLM] Paper implementation: A Neural Probabilistic Language Model [Yoshua Bengio, Rejean Ducharme, Pascal Vincent]

A Neural Probabilistic Language Model

    • 1. Complete code
      • 1.1 Python complete program
    • 2. Interpretation of the paper
      • 2.1 Objectives
    • 3. Process implementation
      • 3.1 Tensorflow model
      • 3.2 Data preparation
      • 3.3 Data training and prediction
    • 4. Overall summary

Paper: A Neural Probabilistic Language Model
Authors: Yoshua Bengio, Rejean Ducharme, Pascal Vincent
Year: 2000

1. Complete code

This appears to be one of the first papers to learn word embeddings with a neural network. Since the paper is early and the architecture is relatively simple, here is a brief introduction and an implementation in TensorFlow.

1.1 Python complete program

# tf.__version__ == 2.10.1
import tensorflow as tf
import numpy as np
import pandas as pd

## Create vocabulary
s = 'There is a Flower and Fruit Mountain on the seaside of the Aolai Kingdom of Dongsheng Shenzhou. A stone on the top of the mountain received the essence of the sun and moon and gave birth to a stone monkey. The stone monkey bravely explored the waterfalls and springs, discovered the Water Curtain Cave, and was honored by all the monkeys as the Monkey King. The Monkey King led his group of monkeys to live freely in the mountains for hundreds of years. He occasionally heard that immortals, Buddhas, and gods could escape reincarnation and live as long as the heaven, earth, mountains, and rivers, so he sailed across the sea alone on a raft, traveled through the southern Fanbu continent to Xiniu Hezhou, and was finally taken in by the Bodhi Patriarch at Xieyue Sanxing Cave in Fangcun Mountain, Lingtai, where he was given the Buddhist name Sun Wukong. Wukong fully understood the wonderful principles of Bodhi in Sanxing Cave, learned the seventy-two transformations and the somersault cloud, and returned to Huaguo Mountain, where he destroyed the demon king in one fell swoop. The demon kings of the seventy-two caves of Huaguo Mountain, including wolves, insects, tigers, and leopards, all came to worship him.'

vocabulary = list(set(s))   # character-level vocabulary
n = 5                       # context window length
m = len(vocabulary)         # vocabulary size (one-hot dimension)

data_list = []
for i in range(len(s)-n):
    data_list.append([s[i:i + n], s[i + n]])

## Prepare data: each entry of data_list pairs an n-character window with the
## character that follows it, i.e. [[context_0, next_0], [context_1, next_1], ...]

x_train = np.array(data_list)[:,0]
y_train = np.array(data_list)[:,1]

def get_one_hot(lst):
    ## Map each character to a one-hot vector over the vocabulary
    one_hot_list = []
    for item in lst:
        one_hot = [0] * len(vocabulary)
        ix = vocabulary.index(item)
        one_hot[ix] = 1
        one_hot_list.append(one_hot)
    return one_hot_list

x_train = np.array([get_one_hot(item) for item in x_train])  # shape (samples, n, m)
y_train = np.array([vocabulary.index(item) for item in y_train])

## Modeling
class Embedding(tf.keras.layers.Layer):
    """Linear projection of one-hot vectors: equivalent to an embedding lookup."""

    def __init__(self, out_shape, **kwargs):
        super().__init__(**kwargs)
        self.out_shape = out_shape

    def build(self, input_shape):
        ## H has one row per vocabulary entry; multiplying by a one-hot
        ## vector selects the corresponding row
        self.H = self.add_weight(
                shape=[input_shape[-1], self.out_shape],
                initializer=tf.initializers.glorot_normal(),
                )

    def call(self, inputs):
        return tf.matmul(inputs, self.H)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n, m)),
    Embedding(200),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(200, activation='tanh'),
    tf.keras.layers.Dense(m, activation='softmax'),
])

model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics=['accuracy'])
history = model.fit(x=x_train, y=y_train, epochs=100, verbose=0)
pd.DataFrame(history.history).plot()


## Predict the next character
## The model is character-level, so the seed must be exactly n = 5 characters
## that all occur in the vocabulary (in the original post both the corpus and
## this seed are Chinese)
s = 'There is a flower and fruit on the side'
vocabulary[model.predict(np.array([get_one_hot(s)]))[0].argmax()]
# 'Mountain'

2. Interpretation of the paper

2.1 Objectives

The goal of the paper is: given a text sequence, estimate the probability of the next word in that sequence. This naturally suggests the conditional probability

P(x_n | x_{n-1}, x_{n-2}, \dots, x_1)

Conditioning on the entire history this way raises many practical problems by today's standards, but bear in mind that the paper was written in 2000.
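In the paper, this full history is approximated by a fixed window of the previous words. Each word w is mapped to a learned feature vector C(w), the window's vectors are concatenated into a single input x, and the next-word distribution comes from a one-hidden-layer network with a softmax output. Following the paper's architecture (W is the optional direct input-to-output connection):

x = \big( C(w_{t-1}), C(w_{t-2}), \dots, C(w_{t-n+1}) \big)

y = b + Wx + U \tanh(d + Hx)

P(w_t = i \mid w_{t-1}, \dots, w_{t-n+1}) = \frac{e^{y_i}}{\sum_j e^{y_j}}

The TensorFlow model below mirrors this shape: the custom Embedding layer plays the role of C, the tanh Dense layer is the hidden layer, and the softmax Dense layer produces the distribution over the vocabulary (the direct connection W is omitted).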

3. Process implementation

3.1 Tensorflow model

## n = context window length (characters fed to the model)
## m = vocabulary size (one-hot dimension)
class Embedding(tf.keras.layers.Layer):
    """Linear projection of one-hot vectors: equivalent to an embedding lookup."""

    def __init__(self, out_shape, **kwargs):
        super().__init__(**kwargs)
        self.out_shape = out_shape

    def build(self, input_shape):
        ## H has one row per vocabulary entry; multiplying by a one-hot
        ## vector selects the corresponding row
        self.H = self.add_weight(
                shape=[input_shape[-1], self.out_shape],
                initializer=tf.initializers.glorot_normal(),
                )

    def call(self, inputs):
        return tf.matmul(inputs, self.H)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n, m)),
    Embedding(200),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(200, activation='tanh'),
    tf.keras.layers.Dense(m, activation='softmax'),
])
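Since the inputs are one-hot rows, multiplying by H simply selects rows of H, so this custom Embedding layer is equivalent to a standard embedding lookup. A minimal sketch of that equivalence, using the Embedding class above with purely illustrative sizes and indices:

import numpy as np
import tensorflow as tf

m_, d_ = 100, 200                      # illustrative vocabulary and embedding sizes
layer = Embedding(d_)
ids = [3, 7, 42, 0, 9]                 # five arbitrary vocabulary indices
one_hot = tf.one_hot(ids, depth=m_)    # shape (5, 100)
projected = layer(one_hot)             # shape (5, 200); builds layer.H on first call

## Matmul against one-hot rows picks out rows of H, i.e. an embedding lookup
looked_up = tf.gather(layer.H, ids)
print(np.allclose(projected.numpy(), looked_up.numpy()))  # True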

3.2 Data preparation

Select a passage from Journey to the West and prepare inputs of shape [n, m], one one-hot vector of dimension m per character in the window:

s = 'There is a Flower and Fruit Mountain on the seaside of the Aolai Kingdom of Dongsheng Shenzhou. A stone on the top of the mountain received the essence of the sun and moon and gave birth to a stone monkey. The stone monkey bravely explored the waterfalls and springs, discovered the Water Curtain Cave, and was honored by all the monkeys as the Monkey King. The Monkey King led a group of monkeys to live freely in the mountains for hundreds of years. Occasionally he heard that immortals, Buddhas, and gods could escape reincarnation and live as long as the heaven, earth, mountains, and rivers, so he sailed across the sea alone on a raft, passed through the southern Fanbu continent, arrived at Xiniu Hezhou, and finally came to Xieyue Sanxing Cave in Fangcun Mountain, Lingtai, where he was taken in by the Bodhi Patriarch and given the Buddhist name Sun Wukong. Wukong fully understood the wonderful principles of Bodhi in Sanxing Cave, learned the seventy-two transformations and the somersault cloud, and returned to Huaguo Mountain, where he destroyed the demon king in one fell swoop. The demon kings of the seventy-two caves of Huaguo Mountain, including wolves, insects, tigers, and leopards, all came to worship him.'

vocabulary = list(set(s))   # character-level vocabulary
n = 5                       # context window length
m = len(vocabulary)         # vocabulary size (one-hot dimension)

data_list = []
for i in range(len(s)-n):
    data_list.append([s[i:i + n], s[i + n]])

x_train = np.array(data_list)[:,0]
y_train = np.array(data_list)[:,1]

def get_one_hot(lst):
    ## Map each character to a one-hot vector over the vocabulary
    one_hot_list = []
    for item in lst:
        one_hot = [0] * len(vocabulary)
        ix = vocabulary.index(item)
        one_hot[ix] = 1
        one_hot_list.append(one_hot)
    return one_hot_list

x_train = np.array([get_one_hot(item) for item in x_train])  # shape (samples, n, m)
y_train = np.array([vocabulary.index(item) for item in y_train])
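To make the shapes concrete, here is the same windowing and one-hot encoding on a toy string (the string 'abcabcabc' and window length 2 are purely illustrative):

import numpy as np

toy = 'abcabcabc'
vocab = sorted(set(toy))   # ['a', 'b', 'c']
win = 2                    # context length, the n of the sections above

## Slide a window over the string: [context, next character] pairs
pairs = [[toy[i:i + win], toy[i + win]] for i in range(len(toy) - win)]
# pairs[0] == ['ab', 'c']

def one_hot(chars):
    eye = np.eye(len(vocab))
    return np.array([eye[vocab.index(c)] for c in chars])

X = np.array([one_hot(ctx) for ctx, _ in pairs])  # shape (7, 2, 3)
y = np.array([vocab.index(nxt) for _, nxt in pairs])
print(X.shape, y.shape)  # (7, 2, 3) (7,)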

3.3 Data training and prediction

model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics=['accuracy'])
history = model.fit(x=x_train, y=y_train, epochs=100, verbose=0)
pd.DataFrame(history.history).plot()

## The seed must again be exactly n = 5 in-vocabulary characters (five Chinese
## characters in the original post)
s = 'There is a flower and fruit on the side'
vocabulary[model.predict(np.array([get_one_hot(s)]))[0].argmax()]
# 'Mountain'

The next character in the source text is indeed 'mountain', so the prediction matches the corpus.

The training loss and accuracy curves come from the pd.DataFrame(history.history).plot() call above.

The corpus is tiny, so the model fits it quickly.
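A natural extension (not in the original post) is to generate longer text by feeding the model its own prediction. A minimal greedy-decoding sketch, assuming the trained model, vocabulary, get_one_hot, and n defined above:

import numpy as np

def generate(seed, steps=20):
    ## Greedily extend an n-character seed one character at a time
    text = seed
    for _ in range(steps):
        context = text[-n:]  # the model only ever sees the last n characters
        probs = model.predict(np.array([get_one_hot(context)]), verbose=0)[0]
        text += vocabulary[int(np.argmax(probs))]
    return text

print(generate(s))  # s: the n-character seed from the prediction step above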

4. Overall summary

The paper is early and the model is simple, so it is not difficult to implement!