Implementing a BP neural network without high-level frameworks such as PyTorch, TensorFlow, or Keras – using the iris dataset

The code in this article is taken from:

https://github.com/YeanRoot/BPnetwork

As the title states, the course assignment requires implementing a BP neural network without high-level packages such as PyTorch, TensorFlow, Keras, etc.
This article uses the iris dataset, which can be found at:

https://github.com/YeanRoot/BPnetwork/tree/main/dataset

  • The iris dataset is also known as Anderson’s Iris dataset. It contains 150 samples, one per row of the data file; each row holds the four features of a sample plus the sample’s category, so the iris dataset is a two-dimensional table with 150 rows and 5 columns (a quick inspection sketch is shown below).
  • In plain terms, the iris dataset is used to classify flowers. Each sample contains four features (the first 4 columns): sepal length, sepal width, petal length, and petal width. We need to build a classifier that uses these four features to decide whether a sample belongs to Iris setosa, Iris versicolor, or Iris virginica (the three species).
  • Each sample also carries its species, i.e. the target attribute (column 5, also called the target or label).

iris data set
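A quick way to check the file before building anything is a small inspection sketch like the one below (not part of the original code; it assumes the CSV from the repository uses the SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm and Species columns referenced later in the training loop):

import pandas as pd

a = pd.read_csv("./dataset/iris.csv", sep=',')
print(a.shape)                # expected (150, 5) if the file matches the description above
print(list(a.columns))        # the four feature columns plus the Species label
print(a['Species'].unique())  # the three iris species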
With the dataset understood, we can start building the BP neural network.

Importing the dataset:

First, read the file iris.csv into a pandas.DataFrame. Then replace the string values in the Species column with 0, 1, or 2 and convert the column to an integer type.
Next, randomly select 80% of the rows as training data and the remaining 20% as test data, and save them to train_data.csv and test_data.csv respectively.

import pandas as pd

path = "./dataset/iris.csv"
a = pd.read_csv(path, sep=',')
a['Species'] = a['Species'].replace('Iris-setosa', '0').replace('Iris-versicolor', '1') \
    .replace('Iris-virginica', '2').astype('int32')

train_data = a.sample(frac=0.8, random_state=0)
test_data = a[~a.index.isin(train_data.index)]
# Save to ./dataset/ so the file names match the paths read back in the training script below
train_data.to_csv("./dataset/train_data.csv", index=False)
test_data.to_csv("./dataset/test_data.csv", index=False)

After importing the dataset, define the neuron class Neure used by the BP neural network model:

class Neure:  # Neuron class

    def __init__(self, input_data=None, input_data_weight=None, offset=None, activate_function='tanh',
                 layer=0):  # Parameters: 1) input signals, 2) weight of each input signal,
        if input_data_weight is None:  # 3) neuron bias, 4) activation function type, 5) layer index
            input_data_weight = []
        if input_data is None:
            input_data = []
        self.input_data = input_data  # list
        self.input_data_weight = input_data_weight  # list
        self.offset = offset  # number
        self.activate_function = activate_function  # str
        self.layer = layer  # int

    def output(self):  # Output of the neuron: getz() passed through the activation function
        output = self.getz()
        if self.activate_function == 'tanh':
            output = Function(output).tanh()
        elif self.activate_function == 'self':
            output = output  # identity activation
        return output

    def getz(self):  # Weighted sum of the inputs plus the bias: z = sum(input * weight) + offset
        z = 0
        for i in range(len(self.input_data)):
            z += self.input_data[i] * self.input_data_weight[i]
        z += self.offset
        return z

Here, the constructor __init__ accepts 5 parameters: the inputs, the input weights, the bias, the activation function type, and the layer index. Two methods are defined: output returns the neuron's output (essentially getz passed through the activation function), while getz returns the weighted sum of the inputs plus the bias.
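As a quick, hypothetical usage sketch (not part of the original code), here is a single neuron with two inputs using the identity ('self') activation, so that it does not depend on the Function class defined in the next section:

n = Neure(input_data=[1.0, 2.0],           # two input signals
          input_data_weight=[0.5, -0.25],  # one weight per input
          offset=0.1,                      # bias
          activate_function='self',        # identity activation
          layer=1)
print(n.getz())    # 1.0*0.5 + 2.0*(-0.25) + 0.1 = 0.1
print(n.output())  # same value, since 'self' applies no non-linearity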

Define Function class:

Next, we define a helper class with the functions this project needs:

import math

class Function:  # Helper functions used in this project. A leading d in a name means the derivative of that function.

    def __init__(self, x=None, x_=None):
        self.x = x    # number or list (the predicted value or values)
        self.x_ = x_  # true value: class index or one-hot label, depending on the method

    def tanh(self):
        return math.tanh(self.x)

    def dtanh(self):
        return 1 - pow(math.tanh(self.x), 2)

    def softmax(self):
        sum = 0
        for i in range(len(self.x)):
            sum += math.exp(self.x[i])
        output = [math.exp(self.x[i]) / sum for i in range(len(self.x))]
        return output

    def dsoftmax(self):  # x: predicted probabilities, x_: true class index
        doutput = []
        for i in range(len(self.x)):
            if self.x_ == i:
                doutput.append(self.x[i] * (1 - self.x[i]))
            else:
                doutput.append(-self.x[self.x_] * self.x[i])
        return doutput

    def cross_shang(self):  # Cross entropy ("shang" = entropy). x: predicted probabilities, x_: one-hot true label
        sum = 0
        for i in range(len(self.x_)):
            sum += self.x_[i] * math.log(self.x[i])
        sum = -sum
        return sum

    def dcross_shang(self):  # Cross-entropy derivative. Returns [dL/dp_i, i] for the true class index i
        for i in range(len(self.x_)):
            if self.x_[i] != 0:
                return [-1 / self.x[i], i]

The Function class contains the functions used for calculations in this project. Its constructor __init__ accepts two parameters, x and x_: x is a number or list (the predicted value or values), and x_ is the true value (a class index or a one-hot label, depending on the method). The class provides the following methods:

tanh: Returns the tanh activation value of x.
dtanh: Returns the derivative of tanh at x.
softmax: Returns the softmax of the list x.
dsoftmax: Returns the derivative of the softmax output for the true class x_ with respect to each input, given the softmax outputs x.
cross_shang: Returns the cross entropy between the true label x_ and the prediction x.
dcross_shang: Returns the derivative of the cross entropy between x_ and x, together with the index of the true class.
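A small sanity-check sketch (hypothetical numbers, not part of the original code) showing what these methods return:

f = Function([1.0, 2.0, 3.0])
probs = f.softmax()                              # roughly [0.090, 0.245, 0.665], sums to 1
print(probs, sum(probs))
print(Function(probs, [0, 0, 1]).cross_shang())  # -log(0.665), about 0.41, when the true class is the third one
print(Function(0.5).tanh(), Function(0.5).dtanh())  # tanh(0.5) is about 0.462, its derivative about 0.786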

Define forward propagation method:

def forward_prop(layer_num, last_num, networklayer, lastlayer):  # Forward propagation. Parameters: number of neurons in this layer, number in the previous layer, this layer, the previous layer
    inputdata = [lastlayer[j].output() for j in range(last_num)]  # Collect the previous layer's outputs into a list
    for i in range(layer_num):
        networklayer[i].input_data = inputdata
    return networklayer

This function accepts four parameters: the number of neurons in this layer, the number of neurons in the previous layer, the current layer, and the previous layer. It collects the outputs of the previous layer's neurons into a list and assigns it as the input data of every neuron in the current layer, then returns the updated layer.
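To illustrate, here is a minimal sketch (hypothetical sizes and weights, not from the original code) that wires an input layer of two neurons into a hidden layer with a single tanh neuron:

inp = [Neure(input_data=[x], input_data_weight=[1], offset=0,
             activate_function='self', layer=0) for x in (0.5, -1.0)]
hid = [Neure(input_data_weight=[0.3, 0.7], offset=0.1, layer=1)]  # tanh activation by default
hid = forward_prop(1, 2, hid, inp)
print(hid[0].input_data)  # [0.5, -1.0]: the outputs of the input layer
print(hid[0].output())    # tanh(0.5*0.3 + (-1.0)*0.7 + 0.1) = tanh(-0.45), about -0.422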

Define the gradient function:

def get_grad(layer_num, last_num, delta, networklayer):  # Compute gradients. Parameters: number of neurons in this layer, number in the previous layer, error terms of this layer, the previous layer (whose outputs feed this layer)
    grad = []
    for i in range(layer_num):
        grad_single = []
        for j in range(last_num):
            grad_single.append(delta[i] * networklayer[j].output())  # Gradient of an input weight = error term of this neuron * output of the corresponding previous-layer neuron
        grad.append(grad_single)
    return grad
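In other words, the gradient of the weight connecting previous-layer neuron j to current-layer neuron i is delta[i] times the output of neuron j; note that the networklayer argument passed here is the previous layer, which is how the function is called in the main loop below. A tiny sketch with made-up numbers:

prev = [Neure(input_data=[0.5], input_data_weight=[1], offset=0, activate_function='self'),
        Neure(input_data=[-1.0], input_data_weight=[1], offset=0, activate_function='self')]
delta = [0.2, -0.1]                 # made-up error terms for a two-neuron current layer
print(get_grad(2, 2, delta, prev))  # [[0.1, -0.2], [-0.05, 0.1]]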

Define the weight update functions:

def update_weight(layer_num, last_num, lr, networklayer, grad, delta):  # Weight update (SGD). Parameters: number of neurons in this layer, number in the previous layer, learning rate, this layer, gradients, error terms
    for i in range(layer_num):
        update_weight = []
        for j in range(last_num):
            update_weight.append(networklayer[i].input_data_weight[j] + grad[i][j] * lr)  # New weight = weight + gradient * lr (the error term already carries the descent sign)
        update_offset = networklayer[i].offset + delta[i] * lr  # New bias = bias + error term * lr
        networklayer[i].input_data_weight = update_weight
        networklayer[i].offset = update_offset
    return networklayer

def update_weight_mobp(layer_num, last_num, lr, networklayer, grad, delta, vdm, vdm_offset,
                       eta):  # Weight update with momentum (mobp). Parameters: as above, plus momentum gradients, bias momentum gradients, momentum coefficient eta
    for i in range(layer_num):
        update_weight = []
        for j in range(last_num):
            vdm[i][j] = eta * vdm[i][j] + (1 - eta) * grad[i][j]  # Momentum gradient = eta * momentum gradient + (1 - eta) * gradient
            update_weight.append(networklayer[i].input_data_weight[j] + vdm[i][j] * lr)
        vdm_offset[i] = eta * vdm_offset[i] + (1 - eta) * delta[i]
        update_offset = networklayer[i].offset + vdm_offset[i] * lr
        networklayer[i].input_data_weight = update_weight
        networklayer[i].offset = update_offset
    return networklayer, vdm, vdm_offset  # Return the updated momentum gradients so they can be passed in again on the next call

The update_weight function implements the SGD weight update. It accepts six parameters: the number of neurons in this layer, the number of neurons in the previous layer, the learning rate, the layer itself, the gradients, and the error terms. It updates the weights and biases of the layer and returns the updated layer. Conceptually the update rule is:
New weight = weight - gradient × learning rate
New bias = bias - error term × learning rate
(In the code the step is written with a plus sign because the error term delta, as computed in the main loop below, already carries the sign of the descent direction.)

Alternatively, momentum gradient descent can be used; see the update_weight_mobp function for details. A single-weight sketch of the momentum update follows.
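For intuition, here is a single-weight sketch (made-up numbers, not part of the original code) of the exponential averaging that update_weight_mobp applies to the raw gradient before taking a step:

eta, lr = 0.8, 0.02
w, vdm = 0.5, 0.0
for grad in [1.0, -0.5, 0.8]:            # made-up gradients over three updates
    vdm = eta * vdm + (1 - eta) * grad   # momentum gradient smooths the raw gradient
    w = w + vdm * lr                     # same sign convention as the article's code
    print(round(vdm, 4), round(w, 4))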

Get the index of the largest output-layer value:

def get_max_index(final_out): # Get the predicted maximum index (that is, which neuron in the output layer has the largest output value)
    max_index = 0
    max = final_out[0]
    for i in range(len(final_out) - 1):
        if final_out[i + 1] > max:
            max = final_out[i + 1]
            max_index = i + 1
    return max_index

It returns the index of the output-layer neuron with the largest value, i.e. the predicted class; this is essentially a hand-rolled argmax.
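For example, calling it on a hypothetical softmax output:

print(get_max_index([0.1, 0.7, 0.2]))  # 1, the index of the largest value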

Main function:

import random
import pandas as pd
import matplotlib.pyplot as plt

if __name__ == "__main__":
    inputLayer_num = 4  # Number of neurons in the input layer
    hiddenLayer_num = 5  # Number of neurons in the hidden layer
    outputLayer_num = 3  # Number of neurons in the output layer
    lr = 0.02  # Learning rate
    eta = 0.8  # Momentum coefficient
    epoch = 50  # Number of training epochs
    update_way = "SGD"  # Weight update method (overwritten by the loop below)

    all_best_acc = []
    xdata = []

    fig = plt.figure()
    ax1 = fig.add_subplot(2, 2, 1)
    ax1.grid()
    ax1.set_xlabel("epoch")
    ax1.set_ylabel("train_acc")
    ax2 = fig.add_subplot(2, 2, 2)
    ax2.grid()
    ax2.set_xlabel("epoch")
    ax2.set_ylabel("test_acc")
    ax3 = fig.add_subplot(2, 2, 3)
    ax3.grid()
    ax3.set_xlabel("epoch")
    ax3.set_ylabel("train_loss")
    ax4 = fig.add_subplot(2, 2, 4)
    ax4.grid()
    ax4.set_xlabel("epoch")
    ax4.set_ylabel("test_loss")

    for update_way in ["SGD", "mobp"]: # Explore the impact of different optimization methods
        print(update_way)
        vdm_hidden = [] # Hidden layer momentum gradient
        vdm_output = [] # Output layer momentum gradient
        vdm_offset_hidden = [] # Momentum gradient of hidden layer offset
        vdm_offset_output = [] # Momentum gradient of output layer bias

        random.seed(1)  # Fix the random seed so the generated weights and biases are reproducible
        inputLayer = [Neure(input_data_weight=[1],  # Input layer: input weight 1, bias 0, identity activation, inputLayer_num neurons
                            offset=0,
                            layer=0,
                            activate_function='self') for i in range(inputLayer_num)]

        hiddenLayer = [Neure(input_data_weight=[random.uniform(-1, 1) for i in range(inputLayer_num)],
                             # Generate a hidden layer. The input weights and biases are pseudo-randomly generated in the range of -1 to 1. Each neuron has inputLayer_num number of input weights.
                             offset=random.uniform(-1, 1), # There are hiddenLayer_num neurons
                             layer=1) for j in range(hiddenLayer_num)]

        outputLayer = [Neure(input_data_weight=[random.uniform(-1, 1) for i in range(hiddenLayer_num)], # Generate the output layer, the method is the same as the hidden layer
                             offset=random.uniform(-1, 1),
                             layer=2,
                             activate_function='self') for j in range(outputLayer_num)]

        train_reader = pd.read_csv("./dataset/train_data.csv", sep=',') # Read the training set
        test_reader = pd.read_csv("./dataset/test_data.csv", sep=',') # Read the test set
        train_len = len(train_reader) # Training set length
        test_len = len(test_reader) # Test set length
        train_acc = []  # Training set accuracy for all epochs
        train_loss = []  # Training set loss for all epochs
        test_acc = []  # Test set accuracy for all epochs
        test_loss = []  # Test set loss for all epochs
        best_acc = 0  # Best test-set accuracy across all epochs

        for t in range(epoch):  # Each training epoch
            acc_sum = 0
            loss_sum = 0

            if update_way == "mobp": # Initialize momentum gradient
                vdm_hidden = [[0 for j in range(inputLayer_num)] for i in range(hiddenLayer_num)]
                vdm_output = [[0 for j in range(hiddenLayer_num)] for i in range(outputLayer_num)]
                vdm_offset_hidden = [0 for i in range(hiddenLayer_num)]
                vdm_offset_output = [0 for i in range(outputLayer_num)]

            for s in range(train_len):  # Loop over each training sample
                inputdata = train_reader.loc[
                    s, ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].tolist()  # List of input features
                trueclass = []  # One-hot encoding of the true class
                fclass = train_reader.loc[s, ['Species']].tolist()[0]  # Flower category
                if fclass == 0:
                    trueclass = [1, 0, 0]
                elif fclass == 1:
                    trueclass = [0, 1, 0]
                elif fclass == 2:
                    trueclass = [0, 0, 1]
                # ------------Start forward propagation------------#
                for i in range(inputLayer_num):  # Pass the sample's features into the input layer
                    inputLayer[i].input_data = inputdata[i:i + 1]
                hiddenLayer = forward_prop(hiddenLayer_num, inputLayer_num, hiddenLayer, inputLayer)  # Propagate to the hidden layer
                outputLayer = forward_prop(outputLayer_num, hiddenLayer_num, outputLayer, hiddenLayer)  # Propagate to the output layer
                temp_out = [outputLayer[i].output() for i in range(outputLayer_num)]  # Raw output values of the output layer
                final_out = Function(temp_out).softmax()  # Apply softmax to the raw outputs to obtain the final output

                max_index = get_max_index(final_out)  # Index of the output neuron with the largest value
                if max_index == fclass:  # Count correct predictions
                    acc_sum += 1
                loss_sum += Function(final_out, trueclass).cross_shang()  # Accumulate the loss value

                # ------------Start backpropagation--------------#
                dshang = Function(final_out, trueclass).dcross_shang()  # Derivative of the cross entropy; dshang[1] is the true class index
                delta_output = Function(final_out, dshang[1]).dsoftmax()  # Error term of the output layer, from the softmax derivative at the true class
                delta_hidden = []
                for i in range(hiddenLayer_num):
                    delta = 0
                    for j in range(outputLayer_num):
                        delta += delta_output[j] * outputLayer[j].input_data_weight[i]
                    delta *= Function(hiddenLayer[i].getz()).dtanh()  # Recursion relation gives the error term of the hidden layer
                    delta_hidden.append(delta)

                grad_output = get_grad(outputLayer_num, hiddenLayer_num, delta_output, hiddenLayer)  # Output layer gradients
                grad_hidden = get_grad(hiddenLayer_num, inputLayer_num, delta_hidden, inputLayer)  # Hidden layer gradients

                if update_way == "SGD":
                    outputLayer = update_weight(outputLayer_num, hiddenLayer_num, lr, outputLayer, grad_output,
                                                delta_output)
                    hiddenLayer = update_weight(hiddenLayer_num, inputLayer_num, lr, hiddenLayer, grad_hidden,
                                                delta_hidden)
                elif update_way == "mobp":
                    outputLayer, vdm_output, vdm_offset_output = update_weight_mobp(outputLayer_num, hiddenLayer_num,
                                                                                    lr, outputLayer, grad_output,
                                                                                    delta_output, vdm_output,
                                                                                    vdm_offset_output, eta)
                    hiddenLayer, vdm_hidden, vdm_offset_hidden = update_weight_mobp(hiddenLayer_num, inputLayer_num, lr,
                                                                                    hiddenLayer, grad_hidden,
                                                                                    delta_hidden, vdm_hidden,
                                                                                    vdm_offset_hidden, eta)

            train_acc.append(acc_sum / train_len)
            train_loss.append(loss_sum / train_len)
            acc_sum = 0
            loss_sum = 0
            for v in range(test_len):
                inputdata = test_reader.loc[
                    v, ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].tolist()
                trueclass = []
                fclass = test_reader.loc[v, ['Species']].tolist()[0]
                if fclass == 0:
                    trueclass = [1, 0, 0]
                elif fclass == 1:
                    trueclass = [0, 1, 0]
                elif fclass == 2:
                    trueclass = [0, 0, 1]
                for i in range(inputLayer_num):
                    inputLayer[i].input_data = inputdata[i:i + 1]
                hiddenLayer = forward_prop(hiddenLayer_num, inputLayer_num, hiddenLayer, inputLayer)
                outputLayer = forward_prop(outputLayer_num, hiddenLayer_num, outputLayer, hiddenLayer)
                temp_out = [outputLayer[i].output() for i in range(outputLayer_num)]
                final_out = Function(temp_out).softmax()

                max_index = get_max_index(final_out)
                if fclass == max_index:
                    acc_sum += 1
                loss_sum += Function(final_out, trueclass).cross_shang()

            test_acc.append(acc_sum / test_len)
            test_loss.append(loss_sum / test_len)
            if test_acc[t] > best_acc:
                best_acc = test_acc[t]
            print('epoch:%d' % t)
            print('train_acc:%f test_acc:%f best_acc%f' % (train_acc[t], test_acc[t], best_acc))
            print('train_loss:%f test_loss:%f' % (train_loss[t], test_loss[t]))

        all_best_acc.append(best_acc)
        epochx = [i + 1 for i in range(epoch)]
        ax1.plot(epochx, train_acc)
        ax2.plot(epochx, test_acc)
        ax3.plot(epochx, train_loss)
        ax4.plot(epochx, test_loss)

    plt.show()

Output: