This article is quoted from:
https://github.com/YeanRoot/BPnetwork
As the title states, the course assignment requires implementing a BP (backpropagation) neural network without using high-level packages such as PyTorch, TensorFlow, or Keras.
This article uses the iris data set, which is available at:
https://github.com/YeanRoot/BPnetwork/tree/main/dataset
- The iris data set is also known as Anderson’s Iris data set. It contains 150 samples, one per row. Each row holds the four features of a sample plus its category, so the iris data set is a two-dimensional table with 150 rows and 5 columns.
- In layman’s terms, the iris data set is used to classify flowers. Each sample contains four features (the first 4 columns): sepal length, sepal width, petal length, and petal width. We need to build a classifier that uses these four features to decide whether a sample is Iris setosa, Iris versicolor, or Iris virginica (three iris species).
- Each sample also carries its species, that is, the target attribute (column 5, also called the target or label).
With the data set understood, we can start building the BP neural network:
Importing the data set:
First, read the file named iris.csv and convert it into a pandas.DataFrame object. Then, replace the string value in the category attribute column with the number 0, 1, or 2 and convert it to an integer type.
Next, randomly select 80% of the data from the dataset as training data and the remaining 20% as test data, and save them to files named train_data.csv and test_data.csv respectively.
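
    import pandas as pd

    data_dir = "./dataset/"  # Directory that holds iris.csv and receives the two split files
    a = pd.read_csv(data_dir + "iris.csv", sep=',')
    # Replace the species names with the integer labels 0, 1, 2
    a['Species'] = a['Species'].map({'Iris-setosa': 0,
                                     'Iris-versicolor': 1,
                                     'Iris-virginica': 2}).astype('int32')
    # Randomly pick 80% of the rows for training; the remaining 20% form the test set
    train_data = a.sample(frac=0.8, random_state=0)
    test_data = a[~a.index.isin(train_data.index)]
    # Write into the dataset directory, where the main function reads them back
    train_data.to_csv(data_dir + 'train_data.csv', index=False)
    test_data.to_csv(data_dir + 'test_data.csv', index=False)
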
After importing the data set, define a neuron Neure in the BP neural network model:

    class Neure:  # Define neuron class
        def __init__(self, input_data=None, input_data_weight=None, offset=None,
                     activate_function='tanh', layer=0):
            # Parameters: 1 the input signals, 2 the weight of each input signal,
            # 3 the neuron bias, 4 the activation function type, 5 the layer number
            if input_data_weight is None:
                input_data_weight = []
            if input_data is None:
                input_data = []
            self.input_data = input_data  # list
            self.input_data_weight = input_data_weight  # list
            self.offset = offset  # number
            self.activate_function = activate_function  # str: 'tanh' or 'self' (identity)
            self.layer = layer  # int

        def output(self):  # Get the output of the neuron
            output = self.getz()
            if self.activate_function == 'tanh':
                output = Function(output).tanh()
            # 'self' means no activation: the output is z itself
            return output

        def getz(self):  # Get the value of input * input weight + bias
            z = 0
            for i in range(len(self.input_data)):
                z += self.input_data[i] * self.input_data_weight[i]
            z += self.offset
            return z

The constructor __init__ accepts five parameters: the input signals, the input weights, the bias, the activation function type and the layer number. The class also defines two methods, output and getz. getz returns the weighted sum of the inputs plus the bias (input × weight + bias), and output returns the neuron's output, which is the getz value passed through the activation function.
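As a quick worked example of what getz and output compute, here is a minimal standalone sketch. The helper name neuron_output and the input values are made up for illustration and are not part of the original project.

```python
import math


def neuron_output(inputs, weights, bias):
    """Return (z, tanh(z)) for one neuron, where z = sum(input * weight) + bias."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return z, math.tanh(z)


# Hypothetical values: two inputs, two weights, one bias
z, out = neuron_output([1.0, 2.0], [0.5, -0.25], 0.1)
print(z, out)
```

Here z plays the role of getz() and the second value the role of output() with the 'tanh' activation.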
Define Function class:
Next, we define a helper class that holds the mathematical functions the network needs:

    import math


    class Function:
        # The function class used in this project.
        # A d in front of the name indicates the derivative of the function.
        def __init__(self, x=None, x_=None):
            self.x = x  # number or list
            self.x_ = x_  # true category index or one-hot list

        def tanh(self):
            return math.tanh(self.x)

        def dtanh(self):
            return 1 - pow(math.tanh(self.x), 2)

        def softmax(self):
            total = 0
            for i in range(len(self.x)):
                total += math.exp(self.x[i])
            output = [math.exp(self.x[i]) / total for i in range(len(self.x))]
            return output

        def dsoftmax(self):  # x: softmax outputs, x_: index of the true category
            doutput = []
            for i in range(len(self.x)):
                if self.x_ == i:
                    doutput.append(self.x[i] * (1 - self.x[i]))
                else:
                    doutput.append(-self.x[self.x_] * self.x[i])
            return doutput

        def cross_shang(self):  # x: predicted values, x_: one-hot actual values
            total = 0
            for i in range(len(self.x_)):
                total += self.x_[i] * math.log(self.x[i])
            return -total

        def dcross_shang(self):  # x: predicted values, x_: one-hot actual values
            # Returns [derivative at the true category, index of the true category]
            for i in range(len(self.x_)):
                if self.x_[i] != 0:
                    return [-1 / self.x[i], i]

The Function class contains the functions used for calculations in this project. Its constructor __init__ accepts two parameters, x and x_. x is a number or a list of predicted values; x_ holds the ground truth, either the index of the true category (for dsoftmax) or its one-hot list (for cross_shang and dcross_shang). This class contains the following functions:
tanh: Returns the tanh activation function value of parameter x.
dtanh: Returns the tanh derivative of parameter x.
softmax: Returns the softmax value of parameter x.
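dsoftmax: Returns the derivative of the softmax probability of the true category x_ with respect to each softmax input. Here x must already hold the softmax outputs.
cross_shang: Returns the cross entropy between the one-hot label x_ and the prediction x ("shang" is the pinyin of the Chinese word for entropy).
dcross_shang: Returns the derivative of the cross entropy with respect to the predicted probability of the true category, together with the index of that category.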
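To see how these pieces fit together, the sketch below checks numerically that chaining the cross-entropy derivative (-1/p for the true category) with the softmax derivative gives the well-known result p - y. It uses standalone functions rather than the Function class, and it subtracts the maximum inside softmax for numerical stability, which the original code does not do. All names and values are illustrative assumptions.

```python
import math


def softmax(z):
    # Subtracting max(z) does not change the result but avoids overflow (an addition to the original)
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, y):
    return -sum(yi * math.log(pi) for pi, yi in zip(p, y))


def loss(z, y):
    return cross_entropy(softmax(z), y)


z = [0.2, -1.0, 0.5]  # hypothetical output-layer values before softmax
y = [0, 0, 1]         # one-hot label, true category index 2
p = softmax(z)
t = y.index(1)

# Analytic gradient of the loss with respect to z
analytic = [pi - yi for pi, yi in zip(p, y)]

# The same gradient via the chain rule: (-1 / p_t) * d p_t / d z_i
dp_t = [p[t] * (1 - p[t]) if i == t else -p[t] * p[i] for i in range(len(p))]
via_chain = [(-1 / p[t]) * d for d in dp_t]

# Central finite differences as an independent check
eps = 1e-6
numeric = []
for i in range(len(z)):
    z_plus, z_minus = z[:], z[:]
    z_plus[i] += eps
    z_minus[i] -= eps
    numeric.append((loss(z_plus, y) - loss(z_minus, y)) / (2 * eps))

print(analytic)
print(numeric)
```

The list dp_t has the same form as the value returned by dsoftmax, and -1 / p[t] is the first element returned by dcross_shang.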
Define forward propagation method:

    def forward_prop(layer_num, last_num, networklayer, lastlayer):
        # Forward propagation. Parameters: number of neurons in this layer,
        # number of neurons in the previous layer, this layer, the previous layer
        # Collect the outputs of the previous layer's neurons into a list
        inputdata = [lastlayer[j].output() for j in range(last_num)]
        for i in range(layer_num):
            # Every neuron of this layer receives its own copy of that list as input
            networklayer[i].input_data = list(inputdata)
        return networklayer

This function accepts four parameters: the number of neurons in this layer, the number of neurons in the previous layer, the current layer and the previous layer. It collects the outputs of the neurons in the previous layer into a list and passes that list as input data to each neuron in the current layer. It returns the updated current layer.
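The same forward pass can be sketched with plain lists instead of neuron objects. This is only an illustration under assumed values: layer_forward, the weights and the biases below are hypothetical, and the layer sizes are smaller than in the article.

```python
import math


def layer_forward(inputs, weights, biases, activation=math.tanh):
    """Compute one layer: each row of weights belongs to one neuron."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(x * w for x, w in zip(inputs, w_row)) + b
        outputs.append(activation(z))
    return outputs


features = [5.1, 3.5, 1.4, 0.2]            # illustrative four-feature sample
hidden_w = [[0.1, -0.2, 0.3, 0.0],         # 2 hidden neurons x 4 inputs (made-up weights)
            [0.0, 0.1, -0.1, 0.2]]
hidden_b = [0.0, 0.1]
output_w = [[0.5, -0.5], [0.2, 0.3], [-0.1, 0.4]]  # 3 output neurons x 2 hidden neurons
output_b = [0.0, 0.0, 0.0]

hidden = layer_forward(features, hidden_w, hidden_b)                        # tanh layer
logits = layer_forward(hidden, output_w, output_b, activation=lambda z: z)  # identity layer
print(hidden, logits)
```

The identity activation on the last layer mirrors the 'self' activation type; softmax would then be applied to logits.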
Define the gradient function:

    def get_grad(layer_num, last_num, delta, networklayer):
        # Get the gradient values. Parameters: number of neurons in this layer,
        # number of neurons in the previous layer, neural network error of this layer,
        # the previous layer (its outputs are this layer's inputs)
        grad = []
        for i in range(layer_num):
            grad_single = []
            for j in range(last_num):
                # Gradient of an input weight = neural network error
                # * output value of the corresponding previous-layer neuron
                grad_single.append(delta[i] * networklayer[j].output())
            grad.append(grad_single)
        return grad

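The rule used in get_grad follows from the chain rule: since z_i = sum_j(w_ij * o_j) + b_i, the derivative of z_i with respect to w_ij is o_j, so the weight gradient is delta_i * o_j. A minimal list-based sketch, with a hypothetical helper name and made-up numbers:

```python
def weight_gradients(delta, prev_outputs):
    """grad[i][j] = delta[i] * prev_outputs[j], one row per neuron of the current layer."""
    return [[d * o for o in prev_outputs] for d in delta]


delta = [0.5, -0.2]             # hypothetical errors of a 2-neuron layer
prev_outputs = [1.0, 2.0, 3.0]  # hypothetical outputs of a 3-neuron previous layer
grad = weight_gradients(delta, prev_outputs)
print(grad)
```

The result has the same shape as the layer's weights: one row per neuron, one column per input.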
Define the update weight function:

    def update_weight(layer_num, last_num, lr, networklayer, grad, delta):
        # Update weights (SGD). Parameters: number of neurons in this layer, number of neurons
        # in the previous layer, learning rate, neural network layer, gradient, neural network error
        for i in range(layer_num):
            update_weight = []
            for j in range(last_num):
                # Weight = weight + gradient * learning rate (the step is added, as in the code)
                update_weight.append(networklayer[i].input_data_weight[j] + grad[i][j] * lr)
            # Bias = bias + neural network error * learning rate
            update_offset = networklayer[i].offset + delta[i] * lr
            networklayer[i].input_data_weight = update_weight
            networklayer[i].offset = update_offset
        return networklayer


    def update_weight_mobp(layer_num, last_num, lr, networklayer, grad, delta, vdm, vdm_offset, eta):
        # Update weights (mobp, momentum). Parameters: number of neurons in this layer, number of
        # neurons in the previous layer, learning rate, neural network layer, gradient, neural
        # network error, momentum gradient, bias momentum gradient, momentum coefficient
        for i in range(layer_num):
            update_weight = []
            for j in range(last_num):
                # Momentum gradient = momentum coefficient * momentum gradient
                #                     + (1 - momentum coefficient) * gradient
                vdm[i][j] = eta * vdm[i][j] + (1 - eta) * grad[i][j]
                update_weight.append(networklayer[i].input_data_weight[j] + vdm[i][j] * lr)
            vdm_offset[i] = eta * vdm_offset[i] + (1 - eta) * delta[i]
            update_offset = networklayer[i].offset + vdm_offset[i] * lr
            networklayer[i].input_data_weight = update_weight
            networklayer[i].offset = update_offset
        # Return the updated momentum gradients so they can be passed in again next time
        return networklayer, vdm, vdm_offset

The update_weight function implements the plain SGD weight update. It accepts six parameters: the number of neurons in this layer, the number of neurons in the previous layer, the learning rate, the neural network layer, the gradient and the neural network error. It updates the weights and biases of every neuron in the layer and returns the updated layer.
Weight = weight + gradient × learning rate
Bias = bias + neural network error × learning rate
The step is added rather than subtracted because the neural network error computed in the main function is the derivative of the predicted probability of the true category, not the derivative of the loss. Increasing that probability lowers the cross entropy, so the weights move in the same direction as with the usual rule weight = weight - loss gradient × learning rate.
Alternatively, the momentum gradient descent algorithm can be used. See the update_weight_mobp function for details.
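The momentum rule in update_weight_mobp can be traced on a single scalar weight. The sketch below follows the sign used in that function (the step is added) and borrows lr = 0.02 and eta = 0.8 from the main function; the constant gradient of 1.0 and the helper name momentum_step are assumptions made for illustration.

```python
def momentum_step(weight, velocity, grad, lr, eta):
    """One momentum update: smooth the gradient, then step along the smoothed value."""
    velocity = eta * velocity + (1 - eta) * grad
    weight = weight + lr * velocity
    return weight, velocity


w, v = 0.0, 0.0
history = []
for _ in range(3):
    # A constant gradient makes the build-up of the velocity easy to follow
    w, v = momentum_step(w, v, grad=1.0, lr=0.02, eta=0.8)
    history.append((w, v))
print(history)
```

With a constant gradient the velocity grows from roughly 0.2 to 0.36 to 0.488, approaching the gradient itself, so early steps are damped and later steps are close to plain SGD steps.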
Get the index of the maximum output of the output layer:

    def get_max_index(final_out):
        # Get the index of the largest predicted value
        # (that is, which neuron in the output layer has the largest output)
        max_index = 0
        max_value = final_out[0]
        for i in range(len(final_out) - 1):
            if final_out[i + 1] > max_value:
                max_value = final_out[i + 1]
                max_index = i + 1
        return max_index

The returned index is the final prediction of the network: the category whose output neuron has the largest value.
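For comparison, the same index can be found with the built-in max. This is an alternative sketch, not the author's code; the helper name argmax and the sample values are made up.

```python
def argmax(values):
    """Index of the largest element; the first one wins on ties."""
    return max(range(len(values)), key=lambda i: values[i])


probs = [0.1, 0.7, 0.2]  # hypothetical softmax outputs
predicted = argmax(probs)
print(predicted)  # prints 1
```

On ties both versions return the first of the equal maxima, because get_max_index only replaces the current maximum on a strictly larger value.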
Main function:

    import random

    import matplotlib.pyplot as plt
    import pandas as pd

    if __name__ == "__main__":
        inputLayer_num = 4  # Number of neurons in the input layer
        hiddenLayer_num = 5  # Number of neurons in the hidden layer
        outputLayer_num = 3  # Number of neurons in the output layer
        lr = 0.02  # Learning rate
        eta = 0.8  # Momentum coefficient
        epoch = 50  # Number of training epochs
        feature_cols = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']
        all_best_acc = []

        # One subplot per metric
        fig = plt.figure()
        axes = []
        for k, name in enumerate(["train_acc", "test_acc", "train_loss", "test_loss"]):
            ax = fig.add_subplot(2, 2, k + 1)
            ax.grid()
            ax.set_xlabel("epoch")
            ax.set_ylabel(name)
            axes.append(ax)

        for update_way in ["SGD", "mobp"]:  # Explore the impact of different optimization methods
            print(update_way)
            random.seed(1)  # Fix the seed so that both runs start from the same weights and biases
            # Input layer: inputLayer_num neurons, input weight 1, bias 0, no activation
            inputLayer = [Neure(input_data_weight=[1], offset=0, layer=0, activate_function='self')
                          for i in range(inputLayer_num)]
            # Hidden layer: hiddenLayer_num neurons, each with inputLayer_num input weights;
            # weights and biases are drawn pseudo-randomly from the range -1 to 1
            hiddenLayer = [Neure(input_data_weight=[random.uniform(-1, 1) for i in range(inputLayer_num)],
                                 offset=random.uniform(-1, 1), layer=1)
                           for j in range(hiddenLayer_num)]
            # Output layer: generated the same way as the hidden layer, but without activation
            outputLayer = [Neure(input_data_weight=[random.uniform(-1, 1) for i in range(hiddenLayer_num)],
                                 offset=random.uniform(-1, 1), layer=2, activate_function='self')
                           for j in range(outputLayer_num)]
            train_reader = pd.read_csv("./dataset/train_data.csv", sep=',')  # Read the training set
            test_reader = pd.read_csv("./dataset/test_data.csv", sep=',')  # Read the test set
            train_len = len(train_reader)  # Training set length
            test_len = len(test_reader)  # Test set length
            train_acc = []  # Training set accuracy of all epochs
            train_loss = []  # Training set loss of all epochs
            test_acc = []  # Test set accuracy of all epochs
            test_loss = []  # Test set loss of all epochs
            best_acc = 0  # Best test set accuracy over all epochs
            for t in range(epoch):  # Each round of training
                acc_sum = 0
                loss_sum = 0
                if update_way == "mobp":  # Reset the momentum gradients at the start of every epoch
                    vdm_hidden = [[0 for j in range(inputLayer_num)] for i in range(hiddenLayer_num)]
                    vdm_output = [[0 for j in range(hiddenLayer_num)] for i in range(outputLayer_num)]
                    vdm_offset_hidden = [0 for i in range(hiddenLayer_num)]
                    vdm_offset_output = [0 for i in range(outputLayer_num)]
                for s in range(train_len):  # Go through every training sample
                    inputdata = train_reader.loc[s, feature_cols].tolist()  # List of input features
                    fclass = train_reader.loc[s, ['Species']].tolist()[0]  # Flower category
                    # One-hot vector of the correct output
                    trueclass = [1 if k == fclass else 0 for k in range(outputLayer_num)]
                    # ------------Start forward propagation------------#
                    for i in range(inputLayer_num):  # Pass the sample features into the input layer
                        inputLayer[i].input_data = inputdata[i:i + 1]
                    hiddenLayer = forward_prop(hiddenLayer_num, inputLayer_num, hiddenLayer, inputLayer)
                    outputLayer = forward_prop(outputLayer_num, hiddenLayer_num, outputLayer, hiddenLayer)
                    # Raw output values of the output layer
                    temp_out = [outputLayer[i].output() for i in range(outputLayer_num)]
                    # Apply softmax to the raw values to obtain the final output
                    final_out = Function(temp_out).softmax()
                    max_index = get_max_index(final_out)  # Index of the output neuron with the largest output
                    if max_index == fclass:  # Count the correct predictions
                        acc_sum += 1
                    loss_sum += Function(final_out, trueclass).cross_shang()  # Accumulate the loss
                    # ------------Start backpropagation--------------#
                    dshang = Function(final_out, trueclass).dcross_shang()  # Cross-entropy derivative and true index
                    # Neural network error of the output layer: derivative of the true-category
                    # probability with respect to each output neuron (the factor dshang[0] is not applied)
                    delta_output = Function(final_out, dshang[1]).dsoftmax()
                    delta_hidden = []
                    for i in range(hiddenLayer_num):
                        delta = 0
                        for j in range(outputLayer_num):
                            delta += delta_output[j] * outputLayer[j].input_data_weight[i]
                        # Recursion: propagate the error back through the tanh activation
                        delta *= Function(hiddenLayer[i].getz()).dtanh()
                        delta_hidden.append(delta)
                    # Output layer gradient
                    grad_output = get_grad(outputLayer_num, hiddenLayer_num, delta_output, hiddenLayer)
                    # Hidden layer gradient
                    grad_hidden = get_grad(hiddenLayer_num, inputLayer_num, delta_hidden, inputLayer)
                    if update_way == "SGD":
                        outputLayer = update_weight(outputLayer_num, hiddenLayer_num, lr, outputLayer,
                                                    grad_output, delta_output)
                        hiddenLayer = update_weight(hiddenLayer_num, inputLayer_num, lr, hiddenLayer,
                                                    grad_hidden, delta_hidden)
                    elif update_way == "mobp":
                        outputLayer, vdm_output, vdm_offset_output = update_weight_mobp(
                            outputLayer_num, hiddenLayer_num, lr, outputLayer, grad_output,
                            delta_output, vdm_output, vdm_offset_output, eta)
                        hiddenLayer, vdm_hidden, vdm_offset_hidden = update_weight_mobp(
                            hiddenLayer_num, inputLayer_num, lr, hiddenLayer, grad_hidden,
                            delta_hidden, vdm_hidden, vdm_offset_hidden, eta)
                train_acc.append(acc_sum / train_len)
                train_loss.append(loss_sum / train_len)
                # ------------Evaluate on the test set (forward propagation only)------------#
                acc_sum = 0
                loss_sum = 0
                for v in range(test_len):
                    inputdata = test_reader.loc[v, feature_cols].tolist()
                    fclass = test_reader.loc[v, ['Species']].tolist()[0]
                    trueclass = [1 if k == fclass else 0 for k in range(outputLayer_num)]
                    for i in range(inputLayer_num):
                        inputLayer[i].input_data = inputdata[i:i + 1]
                    hiddenLayer = forward_prop(hiddenLayer_num, inputLayer_num, hiddenLayer, inputLayer)
                    outputLayer = forward_prop(outputLayer_num, hiddenLayer_num, outputLayer, hiddenLayer)
                    temp_out = [outputLayer[i].output() for i in range(outputLayer_num)]
                    final_out = Function(temp_out).softmax()
                    max_index = get_max_index(final_out)
                    if fclass == max_index:
                        acc_sum += 1
                    loss_sum += Function(final_out, trueclass).cross_shang()
                test_acc.append(acc_sum / test_len)
                test_loss.append(loss_sum / test_len)
                if test_acc[t] > best_acc:
                    best_acc = test_acc[t]
                print('epoch:%d' % t)
                print('train_acc:%f test_acc:%f best_acc:%f' % (train_acc[t], test_acc[t], best_acc))
                print('train_loss:%f test_loss:%f' % (train_loss[t], test_loss[t]))
            all_best_acc.append(best_acc)
            epochx = [i + 1 for i in range(epoch)]
            # One curve per optimization method in each subplot
            for ax, series in zip(axes, [train_acc, test_acc, train_loss, test_loss]):
                ax.plot(epochx, series, label=update_way)
        for ax in axes:
            ax.legend()
        plt.show()

Output: