Handwritten digit recognition written from scratch, without a framework

  • Background knowledge introduction
  • Complete code
  • MNIST handwritten digit dataset
  • Problems with the code
  • References

Background knowledge introduction

The code has four parts: preprocessing, training, prediction, and visualization. Each is described in turn below.

  1. preprocessing

    Before training, read the contents of the MNIST dataset and store each 28×28 image as a flattened row of 784 pixel values. The model weights and biases are given random initial values: each element is sampled from a normal distribution with a mean of 0 and a standard deviation of 0.01. Reading the data takes four steps; reading the training set is used as the example.
    1). Open and read the binary file

    binfile = open(filename, 'rb')
    buf = binfile.read()
    

    2). Parse the header: the magic number (used to check that the file format is correct), the number of images, the number of rows, and the number of columns. index is the byte offset that tracks where parsing currently stands.

    magic, self.train_img_num, self.numRows, self.numColums = struct.unpack_from('>IIII', buf, index)
    

    3). Parse the pixel values. Each image is 28×28 pixels, so 784 unsigned bytes are read per image.

    im = struct.unpack_from('>784B', buf, index)
    # After parsing 784 bytes, advance the offset so the next image can be parsed.
    index += struct.calcsize('>784B')
    

    4). Convert to a NumPy array and reshape.

    im = np.array(im)
    im = im.reshape(1, 28 * 28)
    # Store the parsed image as row i of the training image array, which is two-dimensional.
    self.train_img_list[ i , : ] = im
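
The four steps above can also be collapsed into a single vectorized read. The sketch below is one possible alternative, not the code used in this article: `parse_idx_images` is a hypothetical helper name, and the tiny in-memory buffer stands in for a real MNIST file. It assumes the standard IDX image layout, a 16-byte big-endian header followed by one unsigned byte per pixel.

```python
import struct
import numpy as np

def parse_idx_images(buf):
    """Parse an IDX3 image buffer into an array of shape (num_images, rows * cols)."""
    # Header: magic number, image count, rows, columns (four big-endian uint32).
    magic, num, rows, cols = struct.unpack_from('>IIII', buf, 0)
    if magic != 2051:  # 2051 (0x00000803) marks an IDX file of unsigned-byte images
        raise ValueError('unexpected magic number: %d' % magic)
    offset = struct.calcsize('>IIII')
    # Read every pixel at once instead of unpacking 784 bytes per loop iteration.
    pixels = np.frombuffer(buf, dtype=np.uint8, count=num * rows * cols, offset=offset)
    return pixels.reshape(num, rows * cols).astype(np.float64)

# Tiny synthetic buffer: 2 images of 2x2 pixels, standing in for a real file.
demo_buf = struct.pack('>IIII', 2051, 2, 2, 2) + bytes(range(8))
demo_images = parse_idx_images(demo_buf)
print(demo_images.shape)
```

Checking the magic number up front turns a wrong or corrupted file into an immediate error instead of silently producing garbage pixels.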
    
  2. train

    Training runs for 1000 iterations; adjust this number as needed. Each iteration trains on one batch, meaning that BATCHSIZE images are fed to the network at a time.

    for i in range( 1000 ):
        # Randomly shuffle the data
        np.random.shuffle( self.train_data )
        # Take the first BATCHSIZE rows: the pixels are all but the last column, the label is the last column
        img_list = self.train_data[:self.BATCHSIZE,:-1]
        label_list = self.train_data[:self.BATCHSIZE, -1:]
        print("Train Time: ",i)
        self.train_network(img_list, label_list)
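
The loop above reorders all 60000 rows on every iteration only to keep the first BATCHSIZE of them. A lighter option, sketched here under assumptions, is to draw random row indices instead. `sample_batch` and the stand-in array `fake_data` are hypothetical names for illustration; the column layout (pixels first, label last) follows `self.train_data`.

```python
import numpy as np

def sample_batch(train_data, batch_size, rng):
    """Return (images, labels) for one random batch without reordering train_data."""
    # Pick batch_size distinct row indices; the full array is left untouched.
    idx = rng.choice(train_data.shape[0], size=batch_size, replace=False)
    batch = train_data[idx]
    # The last column holds the label, all other columns hold the pixels.
    return batch[:, :-1], batch[:, -1:]

rng = np.random.default_rng(0)
# Stand-in for self.train_data: 100 samples, 784 pixel columns plus 1 label column.
fake_data = np.hstack([np.zeros((100, 784)), np.arange(100).reshape(-1, 1) % 10])
imgs, labels = sample_batch(fake_data, 20, rng)
print(imgs.shape, labels.shape)
```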
    

    The network has two hidden layers with 30 and 60 units respectively. The hidden layers use the ReLU activation function, and the output layer uses softmax. Cross-entropy is used as the loss.

    1). Forward propagation

    hidden_layer1 = np.maximum(0, np.matmul( img_batch_list, self.W1 ) + self.b1 )
    
    hidden_layer2 = np.maximum(0, np.matmul( hidden_layer1, self.W2 ) + self.b2 )
    
    scores = np.matmul( hidden_layer2, self.W3 ) + self.b3
    
    # Apply softmax, written out step by step here (a library function could be called instead).
    # axis=1 sums across each row, that is, over the class scores of one sample.
    # keepdims=True keeps the result two-dimensional so the division broadcasts row by row.
    scores_e = np.exp( scores )
    scores_e_sum = np.sum( scores_e, axis = 1, keepdims= True )
    probs = scores_e / scores_e_sum
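
One caveat with the softmax above: `np.exp` overflows once a score gets large, which can happen because the pixel values are fed in unscaled (0 to 255). A common remedy, shown here as a sketch rather than as part of the original code, is to subtract each row's maximum before exponentiating. `stable_softmax` is a hypothetical name.

```python
import numpy as np

def stable_softmax(scores):
    """Row-wise softmax that avoids overflow in np.exp."""
    # Subtracting each row's maximum leaves the result unchanged mathematically,
    # but keeps every exponent <= 0 so np.exp cannot overflow.
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

demo_scores = np.array([[1.0, 2.0, 3.0],
                        [1000.0, 1001.0, 1002.0]])  # a naive exp would overflow on row 2
demo_probs = stable_softmax(demo_scores)
print(demo_probs.sum(axis=1))
```

Both rows differ only by a constant shift, so they produce the same probabilities.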
    

    2). Calculate the loss

    # Initialize the per-sample loss array
    loss_list_tmp = np.zeros((train_example_num, 1))
    # With a one-hot target, cross-entropy only involves the predicted probability of the correct label.
    # For example, if the correct label is 2 and the predicted probabilities of 1 and 2 are 0.3 and 0.6,
    # only the 0.6 enters the loss, so training focuses on raising the probability of 2.
    # (Softmax probabilities sum to 1, so raising it also lowers the others.)
    # So only the softmax probability of the correct label is computed here.
    for i in range(train_example_num):
        loss_list_tmp[i] = scores_e[i][int(label_batch_list[i])] / scores_e_sum[ i ]
    # Take the negative log for each sample. Wrong classes have a one-hot value of 0, so their terms vanish and can be omitted.
    loss_list = -np.log(loss_list_tmp)
    
    # The loss is computed over a whole batch, so the mean represents the loss of this batch.
    # 0.5 * reg_factor * sum(W * W) is an L2 regularization penalty that helps avoid overfitting.
    loss = np.mean(loss_list, axis=0)[0] + \
               0.5 * self.reg_factor * np.sum( self.W1 * self.W1 ) + \
               0.5 * self.reg_factor * np.sum( self.W2 * self.W2 ) + \
               0.5 * self.reg_factor * np.sum( self.W3 * self.W3 )
    
    # Append the batch loss to the overall loss list
    self.loss_list.append( loss )
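
The per-sample loop in the loss calculation can be replaced by one indexing operation. The following is a minimal sketch, assuming labels arrive as an (n, 1) float column as they do in this article's `label_batch_list`. `cross_entropy_loss` is a hypothetical helper, and the regularization term is left out for brevity.

```python
import numpy as np

def cross_entropy_loss(probs, labels):
    """Mean cross-entropy of a batch, given softmax outputs and integer labels."""
    n = probs.shape[0]
    # labels has shape (n, 1); flatten it to a vector of integer class indices.
    label_idx = labels.reshape(-1).astype(int)
    # probs[i, label_idx[i]] for every row i, selected in one indexing operation.
    correct_probs = probs[np.arange(n), label_idx]
    return float(np.mean(-np.log(correct_probs)))

demo_probs = np.array([[0.1, 0.6, 0.3],
                       [0.8, 0.1, 0.1]])
demo_labels = np.array([[1.0], [0.0]])
demo_loss = cross_entropy_loss(demo_probs, demo_labels)
print(demo_loss)
```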
    

    3). Backpropagation

    dscore = np.zeros((train_example_num, self.K))
    # Gradient of softmax + cross-entropy with respect to the scores: prediction minus one-hot target
    for i in range(train_example_num):
        dscore[i][:] = probs[i][:]
        dscore[i][int(label_batch_list[i])] -= 1
    
    # The loss is a mean over the batch, so divide the gradient by the number of training samples
    dscore /= train_example_num
    
    dW3 = np.dot(hidden_layer2.T, dscore)
    # The bias gradient is the sum of the per-sample gradients over the batch
    db3 = np.sum(dscore, axis = 0, keepdims= True)
    
    dh2 = np.dot(dscore, self.W3.T)
    # ReLU: wherever hidden_layer2 is 0, no gradient flows back
    dh2[ hidden_layer2 <= 0 ] = 0
    
    dW2 = np.dot(hidden_layer1.T, dh2)
    db2 = np.sum( dh2, axis = 0, keepdims= True )
    
    # The first layer follows the same pattern
    dh1 = np.dot(dh2, self.W2.T)
    dh1[ hidden_layer1 <= 0 ] = 0
    
    dW1 = np.dot( img_batch_list.T, dh1 )
    db1 = np.sum( dh1, axis = 0, keepdims= True )
    

    For the derivation of the gradient of softmax + cross-entropy loss, see this video:
    [Simple explanation of cross entropy softmax derivation] https://www.bilibili.com/video/BV1NU4y1w7C9/?share_source=copy_web&vd_source=1adc4d4a0c37a44e9bda94a95f3f9032

    The calculation of the partial derivative of the weight can be seen in the picture below:
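
In matrix form, the gradients computed by the backpropagation code can be written as follows. The notation is assumed here for illustration and does not come from the source: X is the batch of N input rows, H_1 and H_2 are the hidden-layer activations, P is the matrix of softmax probabilities, Y is the one-hot label matrix, λ stands for reg_factor, ⊙ is the elementwise product, and 1[·] is the indicator function.

```latex
\begin{aligned}
\delta_3 &= \tfrac{1}{N}\,(P - Y), &
\frac{\partial L}{\partial W_3} &= H_2^{\top}\delta_3 + \lambda W_3, &
\frac{\partial L}{\partial b_3} &= \sum_{i=1}^{N} (\delta_3)_{i,:} \\
\delta_2 &= (\delta_3 W_3^{\top}) \odot \mathbb{1}[H_2 > 0], &
\frac{\partial L}{\partial W_2} &= H_1^{\top}\delta_2 + \lambda W_2, &
\frac{\partial L}{\partial b_2} &= \sum_{i=1}^{N} (\delta_2)_{i,:} \\
\delta_1 &= (\delta_2 W_2^{\top}) \odot \mathbb{1}[H_1 > 0], &
\frac{\partial L}{\partial W_1} &= X^{\top}\delta_1 + \lambda W_1, &
\frac{\partial L}{\partial b_1} &= \sum_{i=1}^{N} (\delta_1)_{i,:}
\end{aligned}
```

The λW terms are the derivatives of the 0.5 · reg_factor · sum(W · W) penalties in the loss; the biases are not regularized.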

    # Add the gradient of the L2 regularization term, which discourages large weights.
    dW3 += self.reg_factor * self.W3
    dW2 += self.reg_factor * self.W2
    dW1 += self.reg_factor * self.W1

    # Scale each step by the learning rate (stepsize) so that the updates are not too large
    self.W3 += -self.stepsize * dW3
    self.W2 += -self.stepsize * dW2
    self.W1 += -self.stepsize * dW1

    self.b3 += -self.stepsize * db3
    self.b2 += -self.stepsize * db2
    self.b1 += -self.stepsize * db1
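
The "prediction minus actual" gradient that backpropagation starts from can be checked numerically. The sketch below compares it with central finite differences on a small random batch. It is an illustrative check under assumptions, not part of the original program: `loss_and_grad` is a hypothetical helper, the scores are random numbers instead of network outputs, and regularization is omitted.

```python
import numpy as np

def loss_and_grad(scores, labels):
    """Mean softmax cross-entropy and its gradient with respect to scores."""
    n, k = scores.shape
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_s = np.exp(shifted)
    probs = exp_s / exp_s.sum(axis=1, keepdims=True)
    one_hot = np.eye(k)[labels]              # row i is the one-hot target of sample i
    loss = -np.mean(np.sum(one_hot * np.log(probs), axis=1))
    grad = (probs - one_hot) / n             # prediction minus actual, averaged over the batch
    return loss, grad

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 10))
labels = rng.integers(0, 10, size=4)
_, analytic = loss_and_grad(scores, labels)

# Central finite differences, perturbing one score entry at a time.
numeric = np.zeros_like(scores)
eps = 1e-5
for idx in np.ndindex(*scores.shape):
    bumped = scores.copy()
    bumped[idx] += eps
    loss_plus, _ = loss_and_grad(bumped, labels)
    bumped[idx] -= 2 * eps
    loss_minus, _ = loss_and_grad(bumped, labels)
    numeric[idx] = (loss_plus - loss_minus) / (2 * eps)

max_diff = np.max(np.abs(numeric - analytic))
print(max_diff)
```

A very small `max_diff` indicates that the analytic formula and the numerical estimate agree.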
    
  3. predict

    # Forward propagation
    hidden_layer1 = np.maximum(0, np.matmul(self.test_img_list, self.W1) + self.b1)

    hidden_layer2 = np.maximum(0, np.matmul(hidden_layer1, self.W2) + self.b2)

    scores = np.matmul(hidden_layer2, self.W3) + self.b3

    # Pick the class with the highest score for each image
    prediction = np.argmax( scores, axis = 1 )
    # Reshape the prediction to match the shape of the test label array
    prediction = np.reshape( prediction, ( 10000,1 ) )
    print(prediction.shape)
    print(self.test_label_list.shape)
    accuracy = np.mean( prediction == self.test_label_list )
    print('The accuracy is: ',accuracy)
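
The prediction step hard-codes the test set size (10000) in the reshape. A small helper can avoid that by flattening the labels instead. This is a sketch with a hypothetical function name, `accuracy`, and made-up demo values.

```python
import numpy as np

def accuracy(scores, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    # argmax over raw scores gives the same class as argmax over softmax
    # probabilities, because softmax preserves the ordering within a row.
    predicted = np.argmax(scores, axis=1)
    # Flatten the (n, 1) label column so both arrays have shape (n,).
    return float(np.mean(predicted == labels.reshape(-1)))

demo_scores = np.array([[0.1, 2.0, 0.3],
                        [1.5, 0.2, 0.1],
                        [0.0, 0.1, 0.9],
                        [0.3, 0.2, 0.1]])
demo_labels = np.array([[1.0], [0.0], [2.0], [2.0]])
demo_acc = accuracy(demo_scores, demo_labels)
print(demo_acc)
```

Three of the four demo rows are classified correctly, so the helper reports 0.75.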
    
  4. Visualization

    # Select the first ten images of the test set for visualization
    for i in range(10):
        # query is also just a simple forward pass
        outputs = data.query(data.test_img_list[i])
        label = np.argmax(outputs)
        print(label)
        # Draw the image
        image_array = data.test_img_list[i].reshape(28, 28)
        plt.imshow(image_array, cmap="Greys", interpolation='None')
        plt.pause(0.000001)
        plt.show()
    print('done')
    

Complete code

# -*- coding: utf-8 -*-
import numpy as np
import struct
import matplotlib.pyplot as plt
import random
import pickle
class Data:
    def __init__(self):

        self.K = 10
        self.N = 60000
        self.M = 10000
        self.BATCHSIZE = 2000
        self.reg_factor = 1e-3
        self.stepsize = 1e-2
        self.train_img_list = np.zeros((self.N, 28 * 28))
        self.train_label_list = np.zeros((self.N, 1))

        self.test_img_list = np.zeros((self.M, 28 * 28))
        self.test_label_list = np.zeros((self.M, 1))

        self.loss_list = []
        self.init_network()

        # Change these paths to wherever you store the MNIST files.
        self.read_train_images( 'D://software//PycharmProjects//pythonProject//train-images.idx3-ubyte')
        self.read_train_labels( 'D://software//PycharmProjects//pythonProject//train-labels.idx1-ubyte')


        self.train_data = np.append( self.train_img_list, self.train_label_list, axis = 1 )

        self.read_test_images('D://software//PycharmProjects//pythonProject//t10k-images.idx3-ubyte')
        self.read_test_labels('D://software//PycharmProjects//pythonProject//t10k-labels.idx1-ubyte')

    def predict(self):

        hidden_layer1 = np.maximum(0, np.matmul(self.test_img_list, self.W1) + self.b1)

        hidden_layer2 = np.maximum(0, np.matmul(hidden_layer1, self.W2) + self.b2)

        scores = np.matmul(hidden_layer2, self.W3) + self.b3


        prediction = np.argmax( scores, axis = 1 )
        prediction = np.reshape( prediction, ( 10000,1 ) )
        print(prediction.shape)
        print(self.test_label_list.shape)
        accuracy = np.mean( prediction == self.test_label_list )
        print('The accuracy is: ',accuracy)


        return

    def query(self,inputs_list):
        hidden_layer1 = np.maximum(0, np.matmul(inputs_list, self.W1) + self.b1)

        hidden_layer2 = np.maximum(0, np.matmul(hidden_layer1, self.W2) + self.b2)

        scores = np.matmul(hidden_layer2, self.W3) + self.b3
        return scores

    def train(self):

        for i in range( 1000 ):
            np.random.shuffle( self.train_data )
            img_list = self.train_data[:self.BATCHSIZE,:-1]
            label_list = self.train_data[:self.BATCHSIZE, -1:]
            print("Train Time: ",i)
            self.train_network(img_list, label_list)

    def train_network(self, img_batch_list, label_batch_list):


        train_example_num = img_batch_list.shape[0]
        hidden_layer1 = np.maximum(0, np.matmul( img_batch_list, self.W1 ) + self.b1 )

        hidden_layer2 = np.maximum(0, np.matmul( hidden_layer1, self.W2 ) + self.b2 )

        scores = np.matmul( hidden_layer2, self.W3 ) + self.b3
        scores_e = np.exp( scores )
        scores_e_sum = np.sum( scores_e, axis = 1, keepdims= True )

        probs = scores_e / scores_e_sum
        loss_list_tmp = np.zeros((train_example_num, 1))
        for i in range(train_example_num):
            loss_list_tmp[i] = scores_e[i][int(label_batch_list[i])] / scores_e_sum[ i ]
        loss_list = -np.log(loss_list_tmp)

        loss = np.mean(loss_list, axis=0)[0] + \
               0.5 * self.reg_factor * np.sum( self.W1 * self.W1 ) + \
               0.5 * self.reg_factor * np.sum( self.W2 * self.W2 ) + \
               0.5 * self.reg_factor * np.sum( self.W3 * self.W3 )

        self.loss_list.append( loss )

        print(loss, " ", len(self.loss_list))

        dscore = np.zeros((train_example_num, self.K))
        for i in range(train_example_num):
            dscore[i][:] = probs[i][:]
            dscore[i][int(label_batch_list[i])] -= 1

        dscore /= train_example_num

        dW3 = np.dot(hidden_layer2.T, dscore)
        db3 = np.sum(dscore, axis = 0, keepdims= True)

        dh2 = np.dot(dscore, self.W3.T)
        dh2[ hidden_layer2 <= 0 ] = 0

        dW2 = np.dot(hidden_layer1.T, dh2)
        db2 = np.sum( dh2, axis = 0, keepdims= True )

        dh1 = np.dot(dh2, self.W2.T)
        dh1[ hidden_layer1 <= 0 ] = 0

        dW1 = np.dot( img_batch_list.T, dh1 )
        db1 = np.sum( dh1, axis = 0, keepdims= True )


        dW3 += self.reg_factor * self.W3
        dW2 += self.reg_factor * self.W2
        dW1 += self.reg_factor * self.W1

        self.W3 += -self.stepsize * dW3
        self.W2 += -self.stepsize * dW2
        self.W1 += -self.stepsize * dW1

        self.b3 += -self.stepsize * db3
        self.b2 += -self.stepsize * db2
        self.b1 += -self.stepsize * db1

        return

    def init_network(self):

        self.W1 = 0.01 * np.random.randn( 28 * 28, 30 )
        self.b1 = 0.01 * np.random.randn(1, 30)

        self.W2 = 0.01 * np.random.randn(30, 60)
        self.b2 = 0.01 * np.random.randn(1, 60)

        self.W3 = 0.01 * np.random.randn( 60, self.K )
        self.b3 = 0.01 * np.random.randn( 1, self.K )

    def read_train_images(self,filename):
        binfile = open(filename, 'rb')
        buf = binfile.read()
        binfile.close()
        index = 0
        magic, self.train_img_num, self.numRows, self.numColums = struct.unpack_from('>IIII', buf, index)
        print(magic, ' ', self.train_img_num, ' ', self.numRows, ' ', self.numColums)
        index += struct.calcsize('>IIII')
        for i in range(self.train_img_num):
            im = struct.unpack_from('>784B', buf, index)
            index += struct.calcsize('>784B')
            im = np.array(im)
            im = im.reshape(1, 28 * 28)
            self.train_img_list[ i , : ] = im
        print("train_img_list.shape:")
        print(self.train_img_list.shape)


    def read_train_labels(self,filename):
        binfile = open(filename, 'rb')
        index = 0
        buf = binfile.read()
        binfile.close()

        magic, self.train_label_num = struct.unpack_from('>II', buf, index)
        index += struct.calcsize('>II')

        for i in range(self.train_label_num):
            label_item = int(struct.unpack_from('>B', buf, index)[0])
            self.train_label_list[ i , : ] = label_item
            index += struct.calcsize('>B')

    def read_test_images(self, filename):
        binfile = open(filename, 'rb')
        buf = binfile.read()
        binfile.close()
        index = 0
        magic, self.test_img_num, self.numRows, self.numColums = struct.unpack_from('>IIII', buf, index)
        print(magic, ' ', self.test_img_num, ' ', self.numRows, ' ', self.numColums)
        index += struct.calcsize('>IIII')
        for i in range(self.test_img_num):
            im = struct.unpack_from('>784B', buf, index)
            index += struct.calcsize('>784B')
            im = np.array(im)
            im = im.reshape(1, 28 * 28)
            self.test_img_list[i, :] = im
    def read_test_labels(self,filename):
        binfile = open(filename, 'rb')
        index = 0
        buf = binfile.read()
        binfile.close()

        magic, self.test_label_num = struct.unpack_from('>II', buf, index)
        index += struct.calcsize('>II')

        for i in range(self.test_label_num):
            label_item = int(struct.unpack_from('>B', buf, index)[0])
            self.test_label_list[i, :] = label_item
            index += struct.calcsize('>B')

def main():
    data = Data()
    data.train()
    data.predict()

    for i in range(10):
        outputs = data.query(data.test_img_list[i])
        label = np.argmax(outputs)
        print(label)
        image_array = data.test_img_list[i].reshape(28, 28)
        plt.imshow(image_array, cmap="Greys", interpolation='None')
        plt.pause(0.000001)
        plt.show()
    print('done')

if __name__ == '__main__':
    main()

MNIST handwritten digit dataset

Handwritten digits dataset

Problems with the code

  1. Code that could be a function call is written out inline; for example, the forward pass is repeated in predict, query, and train_network.
  2. The loss and its gradient are still computed sample by sample in a loop instead of with whole-batch array operations.
  3. The code structure is confusing: the neural network and Data should be separated, and the training process should live in one place instead of calling the training function from inside a for loop. Some variables do not need to be defined; for example, train_example_num could be replaced by the batch size directly.
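
For the second point, the gradient with respect to the scores can be computed for the whole batch at once. The sketch below places a loop version that mirrors `train_network` next to a batch version and checks that they agree. `dscore_loop`, `dscore_batch`, and the random demo data are hypothetical names and values used only for illustration.

```python
import numpy as np

def dscore_loop(probs, labels):
    """Per-sample version, mirroring the loop in train_network."""
    n = probs.shape[0]
    dscore = np.zeros_like(probs)
    for i in range(n):
        dscore[i][:] = probs[i][:]
        dscore[i][int(labels[i, 0])] -= 1
    return dscore / n

def dscore_batch(probs, labels):
    """Whole-batch version using integer array indexing."""
    n = probs.shape[0]
    dscore = probs.copy()
    # Subtract 1 at the correct-label column of every row in one operation.
    dscore[np.arange(n), labels.reshape(-1).astype(int)] -= 1
    return dscore / n

rng = np.random.default_rng(1)
raw = rng.random((5, 10))
demo_probs = raw / raw.sum(axis=1, keepdims=True)   # rows sum to 1, like softmax output
demo_labels = rng.integers(0, 10, size=(5, 1)).astype(float)
same = np.allclose(dscore_loop(demo_probs, demo_labels),
                   dscore_batch(demo_probs, demo_labels))
print(same)
```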

References

https://blog.csdn.net/superCally/article/details/54312625?fromshare=blogdetail