Handwritten digit recognition written entirely by hand, with no framework
- Background knowledge introduction
- Complete code
- MNIST handwriting dataset
- Problems with the code
- References
Background knowledge introduction
The code has four main parts: preprocessing, training, prediction, and visualization. Each is introduced in turn below.
- Preprocessing
Before training, the contents of the MNIST dataset must be read, and each 28 x 28 image is flattened into a row of 784 pixel values. The model weights and biases are given random initial values: each element is sampled from a normal distribution with a mean of 0 and a standard deviation of 0.01. Reading a file takes four steps; reading the training images is used as the example.
1). Open and read the binary file.
    binfile = open(filename, 'rb')
    buf = binfile.read()
2). Parse the header: the magic number (used to check that the file format is correct), the number of images, the number of rows, and the number of columns. index is the offset that records how far parsing has progressed.
    magic, self.train_img_num, self.numRows, self.numColums = struct.unpack_from('>IIII', buf, index)
3). Parse the pixel values. Each image is 28 x 28 pixels, so 784 unsigned bytes are read at a time.
    im = struct.unpack_from('>784B', buf, index)
    # After parsing 784 bytes, advance the offset so that the next image can be parsed.
    index += struct.calcsize('>784B')
4). Convert to a NumPy array and reshape.
    im = np.array(im)
    im = im.reshape(1, 28 * 28)
    # Put the parsed image into the training image list, which is actually a two-dimensional array.
    self.train_img_list[i, :] = im
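The four steps above can also be collapsed into a single vectorized read. The following is only a minimal sketch, not the author's method: the helper name parse_idx_images is hypothetical, the magic number 2051 is the value used by the MNIST image files, and a small synthetic buffer stands in for the real file.

```python
import struct
import numpy as np

def parse_idx_images(buf):
    """Parse an IDX image buffer into an (n_images, rows * cols) float array."""
    magic, n_images, n_rows, n_cols = struct.unpack_from('>IIII', buf, 0)
    if magic != 2051:  # 0x00000803 marks an MNIST image file
        raise ValueError(f"unexpected magic number {magic}")
    offset = struct.calcsize('>IIII')  # the header is 16 bytes
    pixels = np.frombuffer(buf, dtype=np.uint8,
                           count=n_images * n_rows * n_cols, offset=offset)
    # One row per image, converted to float like the arrays in the article
    return pixels.reshape(n_images, n_rows * n_cols).astype(np.float64)

# Synthetic stand-in for a real file: a header plus two 28 x 28 images
fake_buf = struct.pack('>IIII', 2051, 2, 28, 28) + bytes(i % 256 for i in range(2 * 784))
images = parse_idx_images(fake_buf)
print(images.shape)
```

Reading all pixels at once avoids the Python loop over 60000 images, at the cost of assuming the whole file fits in memory (which the original code assumes as well).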
- Training
Training runs for 1000 iterations here; you can adjust this number yourself. Training uses mini-batches: each iteration feeds BATCHSIZE images into the network, and one such group of images is a batch.
    for i in range(1000):
        # Randomly shuffle the data
        np.random.shuffle(self.train_data)
        # Select BATCHSIZE images and their labels
        img_list = self.train_data[:self.BATCHSIZE, :-1]
        label_list = self.train_data[:self.BATCHSIZE, -1:]
        print("Train Time: ", i)
        self.train_network(img_list, label_list)
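Shuffling the whole 60000-row array on every iteration only to take the first BATCHSIZE rows is more work than needed. A possible alternative, shown here as a sketch with a hypothetical helper sample_batch and a tiny made-up array in place of train_data, is to draw random row indices instead:

```python
import numpy as np

def sample_batch(train_data, batch_size, rng):
    """Pick batch_size distinct rows; the last column is assumed to hold the label."""
    idx = rng.choice(train_data.shape[0], size=batch_size, replace=False)
    batch = train_data[idx]
    return batch[:, :-1], batch[:, -1:]

rng = np.random.default_rng(0)
# Stand-in for train_data: 10 rows with 4 "pixel" columns and 1 label column
demo_data = np.arange(50, dtype=np.float64).reshape(10, 5)
imgs, labels = sample_batch(demo_data, 4, rng)
print(imgs.shape, labels.shape)
```

The slices keep the same shapes as img_list and label_list in the loop above, so the result could be passed to train_network unchanged.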
The network itself uses two hidden layers, with 30 and 60 units respectively. The hidden layers use the ReLU activation function, the output layer uses softmax, and cross-entropy is used to calculate the loss.
1).Forward propagation
    hidden_layer1 = np.maximum(0, np.matmul(img_batch_list, self.W1) + self.b1)
    hidden_layer2 = np.maximum(0, np.matmul(hidden_layer1, self.W2) + self.b2)
    scores = np.matmul(hidden_layer2, self.W3) + self.b3
    # Apply the softmax activation function. It is written out step by step here,
    # but a ready-made function could be called instead.
    # axis=1 sums along each row, i.e. over the 10 class scores of one sample.
    # keepdims=True keeps the result two-dimensional so that the division broadcasts row by row.
    scores_e = np.exp(scores)
    scores_e_sum = np.sum(scores_e, axis=1, keepdims=True)
    probs = scores_e / scores_e_sum
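One caveat with computing np.exp(scores) directly is that large scores overflow. A common remedy, sketched below with a hypothetical function name softmax_rows, is to subtract each row's maximum first; this leaves the softmax result unchanged because the shift cancels in the ratio.

```python
import numpy as np

def softmax_rows(scores):
    """Row-wise softmax that stays finite for large scores."""
    shifted = scores - np.max(scores, axis=1, keepdims=True)  # largest entry per row becomes 0
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

demo_scores = np.array([[1.0, 2.0, 3.0],
                        [1000.0, 1001.0, 1002.0]])  # np.exp(1000.0) alone would overflow
probs = softmax_rows(demo_scores)
print(probs.sum(axis=1))
```

Both rows differ only by a constant shift, so they produce the same probabilities.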
2).Calculate losses
    # Initialize the per-sample loss list
    loss_list_tmp = np.zeros((train_example_num, 1))
    # Compute the loss for each sample of this batch. Cross-entropy concentrates on raising the
    # predicted probability of the correct label rather than on lowering the probability of a wrong label.
    # For example, suppose the correct label is 2, and the prediction gives 0.3 for digit 1 and 0.6 for digit 2.
    # The loss and its back propagation then focus on how to increase the 0.6 for digit 2,
    # not on how to reduce the 0.3 for digit 1.
    # So only the softmax probability of the correct label is calculated here.
    for i in range(train_example_num):
        loss_list_tmp[i] = scores_e[i][int(label_batch_list[i])] / scores_e_sum[i]
    # Loss of each sample. Only the correct class is used, because the one-hot encoding is 0
    # for every wrong class, so those terms are multiplied by 0 and contribute nothing.
    loss_list = -np.log(loss_list_tmp)
    # The loss is calculated over one batch, so the mean is taken to represent the loss of this batch.
    # The L2 penalty 0.5 * reg_factor * sum(W * W) is added for each weight matrix to reduce overfitting.
    loss = np.mean(loss_list, axis=0)[0] + \
        0.5 * self.reg_factor * np.sum(self.W1 * self.W1) + \
        0.5 * self.reg_factor * np.sum(self.W2 * self.W2) + \
        0.5 * self.reg_factor * np.sum(self.W3 * self.W3)
    # Add the batch loss to the overall loss list
    self.loss_list.append(loss)
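The per-sample loop can be replaced by NumPy integer indexing, which picks the probability of the correct class for every row at once. This is a sketch only: mean_cross_entropy is a hypothetical name, the numbers are made up, and the regularization term is left out so that just the data loss is shown.

```python
import numpy as np

def mean_cross_entropy(probs, labels):
    """Mean of -log(p_correct) over a batch; labels has shape (n, 1) like label_batch_list."""
    n = probs.shape[0]
    label_idx = labels.reshape(-1).astype(int)
    correct = probs[np.arange(n), label_idx]  # one probability per sample
    return float(np.mean(-np.log(correct)))

demo_probs = np.array([[0.1, 0.3, 0.6],
                       [0.7, 0.2, 0.1]])
demo_labels = np.array([[2.0], [0.0]])
loss = mean_cross_entropy(demo_probs, demo_labels)
print(loss)
```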
3).Backpropagation
    dscore = np.zeros((train_example_num, self.K))
    # Gradient of softmax + cross-entropy with respect to the scores = prediction - actual (one-hot)
    for i in range(train_example_num):
        dscore[i][:] = probs[i][:]
        dscore[i][int(label_batch_list[i])] -= 1
    # Normalize the gradient by dividing by the number of training samples in the batch
    dscore /= train_example_num
    dW3 = np.dot(hidden_layer2.T, dscore)
    # For the bias, simply sum the partial derivatives of the loss over the samples of the batch
    db3 = np.sum(dscore, axis=0, keepdims=True)
    dh2 = np.dot(dscore, self.W3.T)
    # ReLU: wherever the output of hidden layer 2 is 0, its gradient is 0
    dh2[hidden_layer2 <= 0] = 0
    dW2 = np.dot(hidden_layer1.T, dh2)
    db2 = np.sum(dh2, axis=0, keepdims=True)
    # The same steps are repeated for hidden layer 1
    dh1 = np.dot(dh2, self.W2.T)
    dh1[hidden_layer1 <= 0] = 0
    dW1 = np.dot(img_batch_list.T, dh1)
    db1 = np.sum(dh1, axis=0, keepdims=True)
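The loop that fills dscore can likewise be written without a Python loop. The sketch below uses a hypothetical function softmax_ce_grad and made-up inputs, and compares its result against a loop-based version of the same computation.

```python
import numpy as np

def softmax_ce_grad(probs, labels):
    """Vectorized (prediction - one_hot) / n for a whole batch."""
    n = probs.shape[0]
    dscore = probs.copy()  # do not modify the caller's probabilities
    dscore[np.arange(n), labels.reshape(-1).astype(int)] -= 1.0
    return dscore / n

demo_probs = np.array([[0.1, 0.3, 0.6],
                       [0.7, 0.2, 0.1]])
demo_labels = np.array([[2.0], [0.0]])
vectorized = softmax_ce_grad(demo_probs, demo_labels)

# Loop-based reference, following the structure of the code above
looped = np.zeros_like(demo_probs)
for i in range(demo_probs.shape[0]):
    looped[i, :] = demo_probs[i, :]
    looped[i, int(demo_labels[i, 0])] -= 1
looped /= demo_probs.shape[0]
print(np.allclose(vectorized, looped))
```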
For the derivation of the gradient of softmax combined with cross-entropy loss, see this video:
[Simple explanation of cross entropy softmax derivation] https://www.bilibili.com/video/BV1NU4y1w7C9/?share_source=copy_web&vd_source=1adc4d4a0c37a44e9bda94a95f3f9032
The calculation of the partial derivative of the weight can be seen in the picture below:
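In symbols, the chain-rule results that the code relies on can be summarized as follows. The notation is an assumption of this summary and is chosen to mirror the code: X is the batch of N images, Y the one-hot labels, lambda is reg_factor, and delta_3, delta_2, delta_1 correspond to dscore, dh2 and dh1.

```latex
\begin{aligned}
H_1 &= \max(0,\; X W_1 + b_1), \qquad H_2 = \max(0,\; H_1 W_2 + b_2), \\
S &= H_2 W_3 + b_3, \qquad P = \operatorname{softmax}(S), \\
L &= -\frac{1}{N}\sum_{i=1}^{N} \log P_{i,y_i}
     + \frac{\lambda}{2}\sum_{k=1}^{3} \lVert W_k \rVert_F^2, \\[4pt]
\delta_3 &= \frac{\partial L}{\partial S} = \frac{1}{N}\,(P - Y), \\
\delta_2 &= \left(\delta_3 W_3^{\top}\right) \odot \mathbf{1}[H_2 > 0], \\
\delta_1 &= \left(\delta_2 W_2^{\top}\right) \odot \mathbf{1}[H_1 > 0], \\[4pt]
\frac{\partial L}{\partial W_3} &= H_2^{\top}\delta_3 + \lambda W_3, \qquad
\frac{\partial L}{\partial b_3} = \sum_{i=1}^{N} (\delta_3)_{i,:}, \\
\frac{\partial L}{\partial W_2} &= H_1^{\top}\delta_2 + \lambda W_2, \qquad
\frac{\partial L}{\partial b_2} = \sum_{i=1}^{N} (\delta_2)_{i,:}, \\
\frac{\partial L}{\partial W_1} &= X^{\top}\delta_1 + \lambda W_1, \qquad
\frac{\partial L}{\partial b_1} = \sum_{i=1}^{N} (\delta_1)_{i,:}.
\end{aligned}
```

The lambda W_k terms are the gradients of the regularization penalty; in the code they are added to dW3, dW2 and dW1 just before the parameter update.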
    # Add the gradient of the regularization term, which keeps the weights w from becoming too concentrated.
    dW3 += self.reg_factor * self.W3
    dW2 += self.reg_factor * self.W2
    dW1 += self.reg_factor * self.W1
    # Scale each update by the learning rate (stepsize) to control the training speed and keep the steps from being too large
    self.W3 += -self.stepsize * dW3
    self.W2 += -self.stepsize * dW2
    self.W1 += -self.stepsize * dW1
    self.b3 += -self.stepsize * db3
    self.b2 += -self.stepsize * db2
    self.b1 += -self.stepsize * db1
- Prediction
    # Forward propagation
    hidden_layer1 = np.maximum(0, np.matmul(self.test_img_list, self.W1) + self.b1)
    hidden_layer2 = np.maximum(0, np.matmul(hidden_layer1, self.W2) + self.b2)
    scores = np.matmul(hidden_layer2, self.W3) + self.b3
    # For each image, pick the class with the highest score (and therefore the highest softmax probability)
    prediction = np.argmax(scores, axis=1)
    # Reshape the predictions to the same shape as the test label array
    prediction = np.reshape(prediction, (10000, 1))
    print(prediction.shape)
    print(self.test_label_list.shape)
    accuracy = np.mean(prediction == self.test_label_list)
    print('The accuracy is: ', accuracy)
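The reshape before the comparison matters because of NumPy broadcasting. The small sketch below, using made-up scores and labels, shows what happens when a one-dimensional prediction array is compared with a column of labels, and how reshaping fixes it.

```python
import numpy as np

scores = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1],
                   [0.0, 0.1, 3.0],
                   [0.9, 0.8, 0.1]])
labels = np.array([[1.0], [0.0], [2.0], [1.0]])  # shape (4, 1), like test_label_list

prediction = np.argmax(scores, axis=1)           # shape (4,)

# Comparing (4,) with (4, 1) broadcasts to a (4, 4) matrix of all pairs
pairwise = (prediction == labels)

# Reshaping to a column gives the intended element-wise comparison
elementwise = (prediction.reshape(-1, 1) == labels)
accuracy = np.mean(elementwise)
print(pairwise.shape, elementwise.shape, accuracy)
```

Using reshape(-1, 1) instead of a hard-coded (10000, 1) keeps the comparison correct for any number of test images.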
- Visualization
    # Select the first ten images of the test set for visualization
    for i in range(10):
        # The query function is also just a simple forward propagation
        outputs = data.query(data.test_img_list[i])
        label = np.argmax(outputs)
        print(label)
        # Draw the image
        image_array = data.test_img_list[i].reshape(28, 28)
        plt.imshow(image_array, cmap="Greys", interpolation='None')
        plt.pause(0.000001)
        plt.show()
    print('done')
Complete code
    # -*- coding: utf-8 -*-
    import numpy as np
    import struct
    import matplotlib.pyplot as plt
    import random
    import pickle


    class Data:
        def __init__(self):
            self.K = 10
            self.N = 60000
            self.M = 10000
            self.BATCHSIZE = 2000
            self.reg_factor = 1e-3
            self.stepsize = 1e-2
            self.train_img_list = np.zeros((self.N, 28 * 28))
            self.train_label_list = np.zeros((self.N, 1))
            self.test_img_list = np.zeros((self.M, 28 * 28))
            self.test_label_list = np.zeros((self.M, 1))
            self.loss_list = []
            self.init_network()
            # You need to change these paths to the path where you store MNIST.
            self.read_train_images('D://software//PycharmProjects//pythonProject//train-images.idx3-ubyte')
            self.read_train_labels('D://software//PycharmProjects//pythonProject//train-labels.idx1-ubyte')
            # Append the label as the last column so images and labels are shuffled together
            self.train_data = np.append(self.train_img_list, self.train_label_list, axis=1)
            self.read_test_images('D://software//PycharmProjects//pythonProject//t10k-images.idx3-ubyte')
            self.read_test_labels('D://software//PycharmProjects//pythonProject//t10k-labels.idx1-ubyte')

        def predict(self):
            hidden_layer1 = np.maximum(0, np.matmul(self.test_img_list, self.W1) + self.b1)
            hidden_layer2 = np.maximum(0, np.matmul(hidden_layer1, self.W2) + self.b2)
            scores = np.matmul(hidden_layer2, self.W3) + self.b3
            prediction = np.argmax(scores, axis=1)
            prediction = np.reshape(prediction, (10000, 1))
            print(prediction.shape)
            print(self.test_label_list.shape)
            accuracy = np.mean(prediction == self.test_label_list)
            print('The accuracy is: ', accuracy)
            return

        def query(self, inputs_list):
            hidden_layer1 = np.maximum(0, np.matmul(inputs_list, self.W1) + self.b1)
            hidden_layer2 = np.maximum(0, np.matmul(hidden_layer1, self.W2) + self.b2)
            scores = np.matmul(hidden_layer2, self.W3) + self.b3
            return scores

        def train(self):
            for i in range(1000):
                np.random.shuffle(self.train_data)
                img_list = self.train_data[:self.BATCHSIZE, :-1]
                label_list = self.train_data[:self.BATCHSIZE, -1:]
                print("Train Time: ", i)
                self.train_network(img_list, label_list)

        def train_network(self, img_batch_list, label_batch_list):
            train_example_num = img_batch_list.shape[0]
            # Forward propagation
            hidden_layer1 = np.maximum(0, np.matmul(img_batch_list, self.W1) + self.b1)
            hidden_layer2 = np.maximum(0, np.matmul(hidden_layer1, self.W2) + self.b2)
            scores = np.matmul(hidden_layer2, self.W3) + self.b3
            scores_e = np.exp(scores)
            scores_e_sum = np.sum(scores_e, axis=1, keepdims=True)
            probs = scores_e / scores_e_sum
            # Loss: mean cross-entropy plus L2 regularization
            loss_list_tmp = np.zeros((train_example_num, 1))
            for i in range(train_example_num):
                loss_list_tmp[i] = scores_e[i][int(label_batch_list[i])] / scores_e_sum[i]
            loss_list = -np.log(loss_list_tmp)
            loss = np.mean(loss_list, axis=0)[0] + \
                0.5 * self.reg_factor * np.sum(self.W1 * self.W1) + \
                0.5 * self.reg_factor * np.sum(self.W2 * self.W2) + \
                0.5 * self.reg_factor * np.sum(self.W3 * self.W3)
            self.loss_list.append(loss)
            print(loss, " ", len(self.loss_list))
            # Backpropagation
            dscore = np.zeros((train_example_num, self.K))
            for i in range(train_example_num):
                dscore[i][:] = probs[i][:]
                dscore[i][int(label_batch_list[i])] -= 1
            dscore /= train_example_num
            dW3 = np.dot(hidden_layer2.T, dscore)
            db3 = np.sum(dscore, axis=0, keepdims=True)
            dh2 = np.dot(dscore, self.W3.T)
            dh2[hidden_layer2 <= 0] = 0
            dW2 = np.dot(hidden_layer1.T, dh2)
            db2 = np.sum(dh2, axis=0, keepdims=True)
            dh1 = np.dot(dh2, self.W2.T)
            dh1[hidden_layer1 <= 0] = 0
            dW1 = np.dot(img_batch_list.T, dh1)
            db1 = np.sum(dh1, axis=0, keepdims=True)
            # Regularization gradient and parameter update
            dW3 += self.reg_factor * self.W3
            dW2 += self.reg_factor * self.W2
            dW1 += self.reg_factor * self.W1
            self.W3 += -self.stepsize * dW3
            self.W2 += -self.stepsize * dW2
            self.W1 += -self.stepsize * dW1
            self.b3 += -self.stepsize * db3
            self.b2 += -self.stepsize * db2
            self.b1 += -self.stepsize * db1
            return

        def init_network(self):
            self.W1 = 0.01 * np.random.randn(28 * 28, 30)
            self.b1 = 0.01 * np.random.randn(1, 30)
            self.W2 = 0.01 * np.random.randn(30, 60)
            self.b2 = 0.01 * np.random.randn(1, 60)
            self.W3 = 0.01 * np.random.randn(60, self.K)
            self.b3 = 0.01 * np.random.randn(1, self.K)

        def read_train_images(self, filename):
            binfile = open(filename, 'rb')
            buf = binfile.read()
            binfile.close()
            index = 0
            magic, self.train_img_num, self.numRows, self.numColums = struct.unpack_from('>IIII', buf, index)
            print(magic, ' ', self.train_img_num, ' ', self.numRows, ' ', self.numColums)
            index += struct.calcsize('>IIII')
            for i in range(self.train_img_num):
                im = struct.unpack_from('>784B', buf, index)
                index += struct.calcsize('>784B')
                im = np.array(im)
                im = im.reshape(1, 28 * 28)
                self.train_img_list[i, :] = im
            print("train_img_list.shape:")
            print(self.train_img_list.shape)

        def read_train_labels(self, filename):
            binfile = open(filename, 'rb')
            index = 0
            buf = binfile.read()
            binfile.close()
            magic, self.train_label_num = struct.unpack_from('>II', buf, index)
            index += struct.calcsize('>II')
            for i in range(self.train_label_num):
                # for x in xrange(2000):
                label_item = int(struct.unpack_from('>B', buf, index)[0])
                self.train_label_list[i, :] = label_item
                index += struct.calcsize('>B')

        def read_test_images(self, filename):
            binfile = open(filename, 'rb')
            buf = binfile.read()
            binfile.close()
            index = 0
            magic, self.test_img_num, self.numRows, self.numColums = struct.unpack_from('>IIII', buf, index)
            print(magic, ' ', self.test_img_num, ' ', self.numRows, ' ', self.numColums)
            index += struct.calcsize('>IIII')
            for i in range(self.test_img_num):
                im = struct.unpack_from('>784B', buf, index)
                index += struct.calcsize('>784B')
                im = np.array(im)
                im = im.reshape(1, 28 * 28)
                self.test_img_list[i, :] = im

        def read_test_labels(self, filename):
            binfile = open(filename, 'rb')
            index = 0
            buf = binfile.read()
            binfile.close()
            magic, self.test_label_num = struct.unpack_from('>II', buf, index)
            index += struct.calcsize('>II')
            for i in range(self.test_label_num):
                label_item = int(struct.unpack_from('>B', buf, index)[0])
                self.test_label_list[i, :] = label_item
                index += struct.calcsize('>B')


    def main():
        data = Data()
        data.train()
        data.predict()
        for i in range(10):
            outputs = data.query(data.test_img_list[i])
            label = np.argmax(outputs)
            print(label)
            image_array = data.test_img_list[i].reshape(28, 28)
            plt.imshow(image_array, cmap="Greys", interpolation='None')
            plt.pause(0.000001)
            plt.show()
        print('done')


    if __name__ == '__main__':
        main()
MNIST handwriting dataset
Handwritten digits dataset
Problems with the code
- Some steps are written out by hand where an existing function could have been called instead.
- The per-sample loss and the gradient used for back propagation are still calculated one sample at a time in a loop, instead of being computed for the whole batch at once.
- The code structure is confusing. The neural network and Data should be separated, and the training process should be kept together instead of calling the training function in a for loop. Some variables do not need to be defined: train_example_num, for example, could simply be replaced by the batch size.
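One possible way to address these points is sketched below. It is an illustration under stated assumptions rather than a rewrite of the article's code: the class name TwoHiddenLayerNet and its method names are hypothetical, the loss and gradient are vectorized over the batch, and a small synthetic dataset stands in for MNIST so that the network is fully separated from data loading.

```python
import numpy as np

class TwoHiddenLayerNet:
    """ReLU-ReLU-softmax network that holds only parameters, no dataset."""

    def __init__(self, n_in, n_h1, n_h2, n_out, reg=1e-3, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [n_in, n_h1, n_h2, n_out]
        # Same initialization scale as the article: normal with std 0.01
        self.W = [0.01 * rng.standard_normal((a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
        self.b = [0.01 * rng.standard_normal((1, b)) for b in sizes[1:]]
        self.reg, self.lr = reg, lr

    def forward(self, x):
        h1 = np.maximum(0, x @ self.W[0] + self.b[0])
        h2 = np.maximum(0, h1 @ self.W[1] + self.b[1])
        return h1, h2, h2 @ self.W[2] + self.b[2]

    def train_step(self, x, y):
        """One gradient step on a batch; y holds integer class labels, shape (n,)."""
        n = x.shape[0]
        h1, h2, scores = self.forward(x)
        shifted = scores - scores.max(axis=1, keepdims=True)  # avoid overflow in exp
        probs = np.exp(shifted)
        probs /= probs.sum(axis=1, keepdims=True)
        loss = -np.log(probs[np.arange(n), y]).mean()
        loss += 0.5 * self.reg * sum(np.sum(w * w) for w in self.W)
        # Backpropagation, vectorized over the batch
        d3 = probs
        d3[np.arange(n), y] -= 1.0
        d3 /= n
        d2 = d3 @ self.W[2].T
        d2[h2 <= 0] = 0
        d1 = d2 @ self.W[1].T
        d1[h1 <= 0] = 0
        grads_W = [x.T @ d1, h1.T @ d2, h2.T @ d3]
        grads_b = [d.sum(axis=0, keepdims=True) for d in (d1, d2, d3)]
        for i in range(3):
            self.W[i] -= self.lr * (grads_W[i] + self.reg * self.W[i])
            self.b[i] -= self.lr * grads_b[i]
        return float(loss)

    def predict(self, x):
        return np.argmax(self.forward(x)[2], axis=1)

# Synthetic stand-in for MNIST: 90 samples, 4 features, 3 unevenly sized classes
rng = np.random.default_rng(1)
y = np.array([0] * 50 + [1] * 25 + [2] * 15)
X = rng.standard_normal((90, 4))
X[:, 0] += 3.0 * y  # make the first feature depend on the class

net = TwoHiddenLayerNet(n_in=4, n_h1=30, n_h2=60, n_out=3, lr=0.1)
losses = [net.train_step(X, y) for _ in range(50)]
print(losses[0], losses[-1])
```

With this split, the training loop lives outside the network class and any data source that yields (images, labels) batches can be plugged in, including the MNIST arrays built by the reading functions in the complete code.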
References
https://blog.csdn.net/superCally/article/details/54312625?fromshare=blogdetail