[Neural Network and Deep Learning] Using the MNIST dataset to train a handwritten digit recognition model – [complete training code attached]

  • 1. Introduction to the MNIST data set
    • MNIST data set structure
  • 2. Model training ideas
    • ①Load data
    • ②Data preprocessing
    • ③Build a model
    • ④Configure model training method
    • ⑤Train the model
    • ⑥Evaluate the model
    • ⑦Save the model
    • ⑧Result visualization
    • ⑨Use the model
  • 3. Code implementation – use directly
    • 3.1 Train the model and save the model
      • Convolutional neural network training model
    • 3.2 Load the trained model
      • 3.2.1 Model saved using model.save_weights() method
      • 3.2.2 Model saved using model.save() method
  • 4. Trained model files – use them directly
    • Fully connected neural network model
    • Convolutional neural network model

1. Introduction to MNIST data set

I noticed that there does not seem to be a single post on the CSDN platform that combines a complete introduction to the MNIST dataset, complete training code, and trained model files, so I wrote this article as a detailed, integrated MNIST reference.
The MNIST dataset (a handwritten digits dataset) is an open, public dataset that anyone can obtain for free. It is currently one of the most widely used datasets for getting started with machine learning, so anyone who wants to learn machine learning classification, deep neural network classification, or image recognition and processing can start with MNIST.

MNIST data set structure

The MNIST dataset contains 70,000 (60,000 + 10,000) samples: 60,000 training samples and 10,000 test samples, each of which is a 28×28-pixel image.

There are many ways to download the MNIST data set. This article only introduces two.
Method 1:
Download address: http://yann.lecun.com/exdb/mnist/

You can directly download these four files, which are (a short parsing sketch follows the list):
① The images of the training samples (60,000)
② The labels (0~9) corresponding to the digit in each training image (60,000)
③ The images of the test samples (10,000)
④ The labels (0~9) corresponding to the digit in each test image (10,000)
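
These four files are stored in the IDX binary format rather than as ordinary image files. Below is a minimal sketch (my addition, not part of the original code) showing how they can be read with NumPy, assuming the official archives have been decompressed into the working directory:

import struct
import numpy as np

def read_idx_images(path):
    with open(path, 'rb') as f:
        # header: magic number, image count, rows, cols (big-endian uint32)
        magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(num, rows, cols)

def read_idx_labels(path):
    with open(path, 'rb') as f:
        # header: magic number, label count (big-endian uint32)
        magic, num = struct.unpack('>II', f.read(8))
        return np.frombuffer(f.read(), dtype=np.uint8)

train_images = read_idx_images('train-images-idx3-ubyte')
train_labels = read_idx_labels('train-labels-idx1-ubyte')
print(train_images.shape, train_labels.shape)   #expected: (60000, 28, 28) (60000,)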

Method 2:
A variety of public data sets have been built into Keras, including the MNIST data set, as shown in the figure.

Therefore, you can simply call tf.keras.datasets.mnist to download the dataset directly.
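
As a small illustration (my addition), the Keras loader downloads the data as a single mnist.npz file on first use and caches it locally (by default under ~/.keras/datasets/), so later runs do not need an internet connection:

import tensorflow as tf

(train_x, train_y), (test_x, test_y) = tf.keras.datasets.mnist.load_data()
print(train_x.shape, test_x.shape)   #(60000, 28, 28) (10000, 28, 28)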

2. Model training ideas

This is the neural network structure of the digit recognition model, as shown in the figure.
·Each sample is a 28×28-pixel image, so each input sample has 28×28 = 784 pixel values;
·Therefore, the input layer is set to 784 nodes;
·The hidden layer size is usually chosen as a power of 2, so 128 nodes are used here together with the ReLU activation function;
·The output layer outputs the recognition result for the digits 0~9, so it is set to 10 nodes (a quick parameter-count check follows this list).
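
As a quick sanity check (my addition), this structure has 784 × 128 + 128 = 100,480 weights and biases in the hidden layer and 128 × 10 + 10 = 1,290 in the output layer, i.e. 101,770 trainable parameters in total, which should match the number reported by model.summary() below.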

We use a neural network to train the handwritten digit model, divided into the following steps:
Note: these code chunks are shown for step-by-step explanation and cannot be run on their own. The complete code is attached at the end and can be copied and used directly.

①Load data

mnist = tf.keras.datasets.mnist
(train_x,train_y),(test_x,test_y) = mnist.load_data()
print('\n train_x:%s, train_y:%s, test_x:%s, test_y:%s'%(train_x.shape,train_y.shape,test_x.shape,test_y.shape))

Run results:

train_x:(60000, 28, 28), train_y:(60000,), test_x:(10000, 28, 28), test_y:(10000,)

From this we can see the shape of the dataset: train_x holds 60,000 digit images of 28×28 pixels and train_y holds the 60,000 corresponding label values; together they are used to train the model. test_x and test_y are organized the same way and are used to test the model.
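
To get a feel for the raw data, here is a short sketch (my addition, assuming matplotlib has been imported as plt) that displays the first training image together with its label:

plt.imshow(train_x[0], cmap='gray')        #first training image, 28x28 grayscale
plt.title('label: ' + str(train_y[0]))     #its ground-truth digit
plt.axis('off')
plt.show()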

②Data preprocessing

#Normalize and convert to tensor, the data type is float32.
X_train,X_test = tf.cast(train_x/255.0,tf.float32),tf.cast(test_x/255.0,tf.float32)
y_train,y_test = tf.cast(train_y,tf.int16),tf.cast(test_y,tf.int16)
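
A quick sanity check (my addition) shows what the preprocessing does: the pixel values go from the 0~255 integer range to the 0~1 float range.

print(train_x.min(), train_x.max())                                    #0 255 (uint8 pixels)
print(float(tf.reduce_min(X_train)), float(tf.reduce_max(X_train)))    #0.0 1.0 after normalization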

③Build a model

Since the data set is relatively simple, a neural network using a single hidden layer can achieve sufficiently low loss and high accuracy.

model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28,28))) #Add a Flatten layer to describe the shape of the input data
model.add(tf.keras.layers.Dense(128,activation='relu')) #Add a hidden layer, a fully connected layer, 128 nodes, relu activation function
model.add(tf.keras.layers.Dense(10,activation='softmax')) #Add an output layer, which is a fully connected layer, 10 nodes, softmax activation function
print('\n',model.summary()) #View network structure and parameter information

④Configure model training method

#The Adam optimizer uses Keras's default parameters; the loss is the sparse categorical cross-entropy and the metric is the sparse categorical accuracy
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['sparse_categorical_accuracy'])
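
The 'sparse' variants are used because train_y holds plain integer class labels (0~9) rather than one-hot vectors. As a sketch of the alternative (my addition), the non-sparse setup would one-hot encode the labels first:

#Alternative (sketch): one-hot encode the labels and use the non-sparse loss/metric
y_train_onehot = tf.keras.utils.to_categorical(train_y, num_classes=10)
#model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['categorical_accuracy'])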

⑤Training model

#Batch size 64, 5 epochs, validation split 0.2 (48,000 training samples, 12,000 validation samples)
history = model.fit(X_train,y_train,batch_size=64,epochs=5,validation_split=0.2)

⑥Evaluate the model

model.evaluate(X_test,y_test,verbose=2) #Evaluate on the test set to check how well the model generalizes
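
model.evaluate() also returns the loss and metric values, so they can be captured and printed explicitly if you prefer, e.g. (my addition):

test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
print('test loss: %.4f, test accuracy: %.4f' % (test_loss, test_acc))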

⑦Save the model

There are two formats for saving models:
·HDF5 format
·SavedModel format

I will introduce the difference between the two in a subsequent article. Here I use the HDF5 format, so I won't go into details.

There are two ways to save models:
·Save only model parameters model.save_weights()
·Save the entire model model.save()

I will also introduce the difference between the two in subsequent articles. You can try both methods. The code is as follows.

#Save only the model parameters
#model.save_weights(r'C:\Users\xuyansong\Desktop\Deep Learning\python\MNIST\Model parameters\mnist_weights.h5')
#Save the entire model
model.save(r'C:\Users\xuyansong\Desktop\Deep Learning\python\MNIST\Entire model\mnist_weights.h5')

The path in the parentheses can be modified as needed. If only a file name is given, the file is saved to the current working directory by default.

#Save model parameters
#model.save_weights('mnist_weights.h5')
#Save the entire model
model.save('mnist_weights.h5')
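
For reference (my addition), the SavedModel format mentioned above is used automatically in TF 2.x when the path passed to model.save() does not end in .h5; loading works the same way:

#Sketch: save/load in the SavedModel format instead of HDF5
model.save('mnist_savedmodel')                               #writes a directory
restored = tf.keras.models.load_model('mnist_savedmodel')    #load it back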

⑧Result visualization

#Result visualization
print(history.history)
loss = history.history['loss'] #Training set loss
val_loss = history.history['val_loss'] #Validation set loss
acc = history.history['sparse_categorical_accuracy'] #Training set accuracy
val_acc = history.history['val_sparse_categorical_accuracy'] #Validation set accuracy

plt.figure(figsize=(10,3))

plt.subplot(121)
plt.plot(loss,color='b',label='train')
plt.plot(val_loss,color='r',label='test')
plt.ylabel('loss')
plt.legend()

plt.subplot(122)
plt.plot(acc,color='b',label='train')
plt.plot(val_acc,color='r',label='test')
plt.ylabel('Accuracy')
plt.legend()

#Pause for 5 seconds to close the canvas, otherwise the canvas will continue to occupy GPU memory while it is open.
#plt.ion() #Open interactive operation mode
#plt.show()
#plt.pause(5)
#plt.close()

plt.show()

Run results:

⑨Use the model

Randomly select 10 images from the test set samples and display the results:

plt.figure()
for i in range(10):
    num = np.random.randint(1,10000) #Random integer in [1, 10000)

    plt.subplot(2,5,i + 1)
    plt.axis('off')
    plt.imshow(test_x[num],cmap='gray')
    demo = tf.reshape(X_test[num],(1,28,28))
    y_pred = np.argmax(model.predict(demo))
    plt.title('Label value:' + str(test_y[num]) + '\nPredicted value:' + str(y_pred))

If you don’t understand the demo statement and y_pred statement in this code, you can read this article I posted:
URL: https://blog.csdn.net/weixin_45954454/article/details/114437165?spm=1001.2014.3001.5501 (Using the Sequential framework of Keras to build a neural network model, an error occurs when using the model for classification…)

3. Code implementation – use directly

I am using the TensorFlow 2.2.0 GPU build. Judging from the versions released so far, version 2.0.0 and above can be used directly.
If your GPU is not configured properly, the code will raise an error on the lines that configure the GPU. If you don't know how to fix that error, simply run the program on the CPU by commenting out these two lines of code:

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0],True)

After commenting them out, the code should run through without problems.
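
Alternatively, instead of commenting the two lines out, a slightly more robust sketch (my addition) only enables memory growth when a GPU is actually present, so the same script also runs on CPU-only machines:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:                                                   #only configure memory growth if a GPU exists
    tf.config.experimental.set_memory_growth(gpus[0], True)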

3.1 Train the model and save the model

Without further ado, here is the complete code:
To emphasize again: the path in model.save() can be modified as needed. If only a file name is given, the model is saved to the current working directory by default.

########Handwritten digit data set##########
###########Save model############
########1 hidden layer (fully connected layer)##########
#60000 training data and 10000 test data, 28x28 pixel grayscale image
#Hidden layer activation function: ReLU function
#Output layer activation function: softmax function (to achieve multi-classification)
#Loss function: sparse cross entropy loss function
#The input layer has 784 nodes, the hidden layer has 128 neurons, and the output layer has 10 nodes
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

import time
print('-------------')
nowtime = time.strftime('%Y-%m-%d %H:%M:%S')
print(nowtime)

#Specify GPU
#import os
#os.environ["CUDA_VISIBLE_DEVICES"] = "0"
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0],True)
#initialization
plt.rcParams['font.sans-serif'] = ['SimHei']

#Download Data
mnist = tf.keras.datasets.mnist
(train_x,train_y),(test_x,test_y) = mnist.load_data()
print('\n train_x:%s, train_y:%s, test_x:%s, test_y:%s'%(train_x.shape,train_y.shape,test_x.shape,test_y.shape))

#Data preprocessing
#X_train = train_x.reshape((60000,28*28))
#Y_train = train_y.reshape((60000,28*28)) #Later, use tf.keras.layers.Flatten() to change the array shape
X_train,X_test = tf.cast(train_x/255.0,tf.float32),tf.cast(test_x/255.0,tf.float32) #Normalization
y_train,y_test = tf.cast(train_y,tf.int16),tf.cast(test_y,tf.int16)

#Modeling
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28,28))) #Add a Flatten layer to describe the shape of the input data
model.add(tf.keras.layers.Dense(128,activation='relu')) #Add a hidden layer, a fully connected layer, 128 nodes, relu activation function
model.add(tf.keras.layers.Dense(10,activation='softmax')) #Add an output layer, which is a fully connected layer, 10 nodes, softmax activation function
print('\n',model.summary()) #View network structure and parameter information

#Configure model training method
#adam algorithm parameters use the default public parameters of keras, the loss function uses the sparse cross entropy loss function, and the accuracy uses the sparse classification accuracy function
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['sparse_categorical_accuracy'])

#Training model
#Batch size 64, 5 epochs, validation split 0.2 (48,000 training samples, 12,000 validation samples)
print('-------------')
nowtime = time.strftime('%Y-%m-%d %H:%M:%S')
print('Pre-training time:' + str(nowtime))

history = model.fit(X_train,y_train,batch_size=64,epochs=5,validation_split=0.2)

print('-------------')
nowtime = time.strftime('%Y-%m-%d %H:%M:%S')
print('Time after training:' + str(nowtime))
#evaluate model
model.evaluate(X_test,y_test,verbose=2) #Evaluate on the test set to check how well the model generalizes

#Save model parameters
#model.save_weights(r'C:\Users\xuyansong\Desktop\Deep Learning\python\MNIST\Model Parameters\mnist_weights.h5')
#Save the entire model
model.save('mnist_weights.h5')


#Result visualization
print(history.history)
loss = history.history['loss'] #Training set loss
val_loss = history.history['val_loss'] #Validation set loss
acc = history.history['sparse_categorical_accuracy'] #Training set accuracy
val_acc = history.history['val_sparse_categorical_accuracy'] #Validation set accuracy

plt.figure(figsize=(10,3))

plt.subplot(121)
plt.plot(loss,color='b',label='train')
plt.plot(val_loss,color='r',label='test')
plt.ylabel('loss')
plt.legend()

plt.subplot(122)
plt.plot(acc,color='b',label='train')
plt.plot(val_acc,color='r',label='test')
plt.ylabel('Accuracy')
plt.legend()

#Pause for 5 seconds to close the canvas, otherwise the canvas will continue to occupy GPU memory while it is open.
#Choose according to your needs
#plt.ion() #Open interactive operation mode
#plt.show()
#plt.pause(5)
#plt.close()

#Use model
plt.figure()
for i in range(10):
    num = np.random.randint(1,10000)

    plt.subplot(2,5,i + 1)
    plt.axis('off')
    plt.imshow(test_x[num],cmap='gray')
    demo = tf.reshape(X_test[num],(1,28,28))
    y_pred = np.argmax(model.predict(demo))
    plt.title('Label value:' + str(test_y[num]) + '\nPredicted value:' + str(y_pred))
#y_pred = np.argmax(model.predict(X_test[0:5]),axis=1)
#print('X_test[0:5]: %s'%(X_test[0:5].shape))
#print('y_pred: %s'%(y_pred))

#plt.ion() #Open interactive operation mode
plt.show()
#plt.pause(5)
#plt.close()

Run results:

Convolutional neural network training model

Regarding the introduction and use of convolutional neural networks, I have written an article [Neural Networks and Deep Learning] Introduction to the CIFAR10 data set, and used convolutional neural networks to train image classification models – Attach complete code And trained model files – use directly: https://blog.csdn.net/weixin_45954454/article/details/114519299?spm=1001.2014.3001.5501
The usage of the MNIST data set is similar to the above. If you need the source code of the convolutional neural network training of the MNIST data set, see the end of this article (Chapter 4) [Four. Trained model files – download directly using].
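
For orientation only, a minimal convolutional model for MNIST could look like the sketch below (my own illustration, not the exact network from the linked article); note that the images need an explicit channel dimension before the Conv2D layers:

#Minimal CNN sketch for MNIST (illustrative only)
cnn = tf.keras.Sequential([
    tf.keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)),   #add the channel dimension
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
            metrics=['sparse_categorical_accuracy'])
#cnn.fit(X_train, y_train, batch_size=64, epochs=5, validation_split=0.2)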

3.2 Load the trained model

3.2.1 Model saved using model.save_weights()

For models saved using model.save_weights(), use the following complete code to load the model:

########Handwritten digit data set##########
###########Load model parameter method############
########1 hidden layer (fully connected layer)##########
#60000 training data and 10000 test data, 28x28 pixel grayscale image
#Hidden layer activation function: ReLU function
#Output layer activation function: softmax function (to achieve multi-classification)
#Loss function: sparse cross entropy loss function
#The input layer has 784 nodes, the hidden layer has 128 neurons, and the output layer has 10 nodes
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

import time
print('-------------')
nowtime = time.strftime('%Y-%m-%d %H:%M:%S')
print(nowtime)

#Specify GPU
#import os
#os.environ["CUDA_VISIBLE_DEVICES"] = "0"
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0],True)
#initialization
plt.rcParams['font.sans-serif'] = ['SimHei']

#Download Data
mnist = tf.keras.datasets.mnist
(train_x,train_y),(test_x,test_y) = mnist.load_data()
print('\n train_x:%s, train_y:%s, test_x:%s, test_y:%s'%(train_x.shape,train_y.shape,test_x.shape,test_y.shape))

#Data preprocessing
#X_train = train_x.reshape((60000,28*28))
#Y_train = train_y.reshape((60000,28*28)) #Later, use tf.keras.layers.Flatten() to change the array shape
X_train,X_test = tf.cast(train_x/255.0,tf.float32),tf.cast(test_x/255.0,tf.float32) #Normalization
y_train,y_test = tf.cast(train_y,tf.int16),tf.cast(test_y,tf.int16)

#Modeling
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28,28))) #Add a Flatten layer to describe the shape of the input data
model.add(tf.keras.layers.Dense(128,activation='relu')) #Add a hidden layer, a fully connected layer, 128 nodes, relu activation function
model.add(tf.keras.layers.Dense(10,activation='softmax')) #Add an output layer, which is a fully connected layer, 10 nodes, softmax activation function
print('\n',model.summary()) #View network structure and parameter information

#Configure model training method
#adam algorithm parameters use the default public parameters of keras, the loss function uses the sparse cross entropy loss function, and the accuracy uses the sparse classification accuracy function
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['sparse_categorical_accuracy'])

#Load model parameters
model.load_weights(r'C:\Users\xuyansong\Desktop\Deep Learning\python\MNIST\Model Parameters\mnist_weights.h5') #Change the path to the actual location of the file, otherwise an error will be reported


#evaluate model
model.evaluate(X_test,y_test,verbose=2) #Evaluate on the test set to check how well the model generalizes

#Use model
plt.figure()
for i in range(10):
    num = np.random.randint(1,10000)

    plt.subplot(2,5,i + 1)
    plt.axis('off')
    plt.imshow(test_x[num],cmap='gray')
    demo = tf.reshape(X_test[num],(1,28,28))
    y_pred = np.argmax(model.predict(demo))
    plt.title('Label value:' + str(test_y[num]) + '\nPredicted value:' + str(y_pred))
#y_pred = np.argmax(model.predict(X_test[0:5]),axis=1)
#print('X_test[0:5]: %s'%(X_test[0:5].shape))
#print('y_pred: %s'%(y_pred))

plt.ion() #Open interactive operation mode
plt.show()
plt.pause(5)
plt.close()

Run results:

3.2.2 Model saved using model.save()

For models saved using model.save(), use the following complete code to load the model:

########Handwritten digit data set##########
###########Load the entire model method############
########1 hidden layer (fully connected layer)##########
#60000 training data and 10000 test data, 28x28 pixel grayscale image
#Hidden layer activation function: ReLU function
#Output layer activation function: softmax function (to achieve multi-classification)
#Loss function: sparse cross entropy loss function
#The input layer has 784 nodes, the hidden layer has 128 neurons, and the output layer has 10 nodes
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

import time
print('-------------')
nowtime = time.strftime('%Y-%m-%d %H:%M:%S')
print(nowtime)

#Specify GPU
#import os
#os.environ["CUDA_VISIBLE_DEVICES"] = "0"
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0],True)
#initialization
plt.rcParams['font.sans-serif'] = ['SimHei']

#Download Data
mnist = tf.keras.datasets.mnist
(train_x,train_y),(test_x,test_y) = mnist.load_data()
print('\n train_x:%s, train_y:%s, test_x:%s, test_y:%s'%(train_x.shape,train_y.shape,test_x.shape,test_y.shape))

#Data preprocessing
#X_train = train_x.reshape((60000,28*28))
#Y_train = train_y.reshape((60000,28*28)) #Later, use tf.keras.layers.Flatten() to change the array shape
X_train,X_test = tf.cast(train_x/255.0,tf.float32),tf.cast(test_x/255.0,tf.float32) #Normalization
y_train,y_test = tf.cast(train_y,tf.int16),tf.cast(test_y,tf.int16)


#Load the entire model
model = tf.keras.models.load_model(r'C:\Users\xuyansong\Desktop\Deep Learning\python\MNIST\Entire Model\mnist_weights.h5') #Change the path to the actual location of the file, otherwise an error will be reported
model.summary() #View summary

#evaluate model
model.evaluate(X_test,y_test,verbose=2) #Evaluate on the test set to check how well the model generalizes

#Use model
plt.figure()
for i in range(10):
    num = np.random.randint(1,10000)

    plt.subplot(2,5,i + 1)
    plt.axis('off')
    plt.imshow(test_x[num],cmap='gray')
    demo = tf.reshape(X_test[num],(1,28,28))
    y_pred = np.argmax(model.predict(demo))
    plt.title('Label value:' + str(test_y[num]) + '\nPredicted value:' + str(y_pred))
#y_pred = np.argmax(model.predict(X_test[0:5]),axis=1)
#print('X_test[0:5]: %s'%(X_test[0:5].shape))
#print('y_pred: %s'%(y_pred))

plt.ion() #Open interactive operation mode
plt.show()
plt.pause(5)
plt.close()

Run results:

Note: if you compare the results of these three runs carefully, you may notice something: why are the loss and sparse_categorical_accuracy values I show not identical?

This is because the neural network's parameters are randomly initialized every time the model is trained, so even though the network learns in exactly the same way, the learning process and results will differ from run to run. If you don't believe it, try it yourself: the results of each training run will be slightly different (in other words, training does not produce the same model every time). This is part of the charm of deep learning.
The two models here were not saved from the same training run, so their results differ. But if you save both from the same trained model (that is, once the model parameters are saved, the model is fixed), then as long as the test set stays the same, the test results will be identical every time and will never change.
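
If you want runs to be reproducible, the random sources can be seeded before building and training the model, e.g. (my addition; results may still differ slightly across hardware and library versions):

import numpy as np
import tensorflow as tf

np.random.seed(0)         #seed NumPy's random number generator
tf.random.set_seed(0)     #seed TensorFlow's global RNG (affects weight initialization)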

4. Trained model files – use them directly

Fully connected neural network model

CSDN resource download link: Use the MNIST dataset to train a handwritten digit recognition model – complete code and trained model file attached – use directly: https://download.csdn.net/download/weixin_45954454/15621509?spm=1001.2014.3001.5501

Convolutional neural network model

CSDN resource download link: [Neural Network and Deep Learning] Introduction to the MNIST dataset and training a handwritten digit recognition model with a convolutional neural network – complete code and trained model files attached – use directly: https://download.csdn.net/download/weixin_45954454/15650033?spm=1001.2014.3001.5503

If you run into problems, you can send me a private message; I check messages from time to time.

If anything here is wrong, criticism and corrections are welcome ^^
