A complete example of applying a CNN model trained on MNIST to recognize pictures of handwritten digits (pictures taken from the Internet)

1 Thinking about how to apply the trained model

How can an MNIST model trained with a CNN be applied to recognize pictures of handwritten digits (pictures taken from the Internet)?

This problem bothered me for two days. Much of the code I found online trains the model and calls it in the same .py file, so the model has to be retrained every time it is called. That approach is obviously inefficient.

I thought of separating the .py file that trains the model from the .py file that calls the model for prediction, but how should the calling .py file be written? Many answers look like this:

init = tf.global_variables_initializer()
saver = tf.train.Saver()  # define saver
with tf.Session() as sess:
    sess.run(init)
    # Load the model
    saver.restore(sess, "./save/model.ckpt")

This was not the answer I wanted. I reasoned that for the loaded model to work, it should at least take input and produce output, so I tried passing parameters between the two .py files:

from xxx import parameters  # import the parameters from the xxx.py file

But as soon as I wrote this, the training file ran again in full, because Python executes a module's top-level code when it is imported. This was not the effect I wanted, and the final image recognition also reported an error that interrupted the program; a sketch of the usual guard against this follows.
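A minimal sketch of the standard Python idiom that prevents this (the file name and the train() function are hypothetical, and this is not the approach I ended up using):

# train_model.py (hypothetical file name)
# ... model definition and a hypothetical train() function ...

if __name__ == '__main__':
    # Runs only when this file is executed directly,
    # not when another script does "from train_model import ...".
    train()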

Finally I stumbled upon the following article:

Multi-layer neural network modeling and model preservation and restoration
https://www.cnblogs.com/HuangYJ/p/11681357.html

Simply put, saver.restore() loads a model's parameters: you must first define a model with exactly the same structure as the one that was saved. Only when the structures match can the saved variables be paired up with the new ones, so that their values can be read out and assigned to the variables waiting to receive them.
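A minimal sketch of the resulting two-file workflow (the file names and the save path are illustrative):

# train.py -- save the variable values after training
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training loop ...
    saver.save(sess, "./save/model.ckpt")

# predict.py -- rebuild the SAME graph structure, then restore the values
# ... identical model-definition code as in train.py ...
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, "./save/model.ckpt")
    # ... run inference with the restored variables ...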

2 Main graph of the training model

Looking at the main graph of the training model (the graph visualized in TensorBoard), we can see intuitively that ten structures in total need to be defined:

input
image
conv_layer1
pooling_layer1
conv_layer2
pooling_layer2
fc_layer3
dropout
output_fc_layer4
softmax

Code for the ten structures (the function definitions are omitted here; they appear in the complete script in section 4):

with tf.name_scope('input'):
    x=tf.placeholder(tf.float32,[None,784])
    y_=tf.placeholder(tf.float32,[None,10])
    
with tf.name_scope('image'):
    x_image=tf.reshape(x,[-1,28,28,1])
    tf.summary.image('input_image',x_image,8)
  
with tf.name_scope('conv_layer1'):
    W_conv1=weight_variable([5,5,1,32])
    b_conv1=bias_variable([32])
    h_conv1=tf.nn.relu(conv2d(x_image,W_conv1) + b_conv1)

with tf.name_scope('pooling_layer1'):
    h_pool1=max_pool_2x2(h_conv1)

with tf.name_scope('conv_layer2'):
    W_conv2=weight_variable([5,5,32,64])
    b_conv2=bias_variable([64])
    h_conv2=tf.nn.relu(conv2d(h_pool1,W_conv2) + b_conv2)
    
with tf.name_scope('pooling_layer2'):
    h_pool2=max_pool_2x2(h_conv2)
    
with tf.name_scope('fc_layer3'):
    W_fc1=weight_variable([7*7*64,1024])
    b_fc1=bias_variable([1024])
    h_pool2_flat=tf.reshape(h_pool2,[-1,7*7*64])
    h_fc1=tf.nn.relu(tf.matmul(h_pool2_flat,W_fc1) + b_fc1)
  
with tf.name_scope('dropout'):
    keep_prob=tf.placeholder(tf.float32)
    h_fc1_drop=tf.nn.dropout(h_fc1,keep_prob)

with tf.name_scope('output_fc_layer4'):
    W_fc2=weight_variable([1024,10])
    b_fc2=bias_variable([10])
  
with tf.name_scope('softmax'):
    y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop,W_fc2) + b_fc2)

3 Constructing the input and output of the model

In other words, when we call this trained model, we hope that when we feed in a picture of a handwritten digit, the model automatically recognizes the digit in the picture and prints it out. That is the effect we want; but the essence of the trained model is a mathematical operation, so the image input and the recognized-digit output must be prepared according to what the model expects.

The model's input must be a one-dimensional tensor (a vector) of 784 values: a 28×28 image, flattened from a two-dimensional tensor (a matrix) into one dimension. The following code implements this:

text = Image.open('./images/text3.png')  # load the image (assumed 28x28 grayscale)
data = list(text.getdata())              # 784 pixel values
picture = [(255 - x) * 1.0 / 255.0 for x in data]  # invert and normalize; used as the model input
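Note that this assumes the picture is already a 28×28 grayscale image. For an arbitrary picture from the Internet, a minimal preprocessing sketch with standard PIL calls (the conversion and resizing steps are my addition, not part of the original code) might look like:

from PIL import Image

text = Image.open('./images/text3.png')
text = text.convert('L')      # force single-channel grayscale
text = text.resize((28, 28))  # match the 28x28 input the model expects
data = list(text.getdata())   # 784 pixel values in [0, 255]
picture = [(255 - p) / 255.0 for p in data]  # invert (MNIST is white-on-black) and scale to [0, 1]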

The model's output is the result of the softmax operation: an array of ten probabilities. We need to find the digit corresponding to the maximum probability; that digit is the prediction of the loaded model. The following code implements this:

# Make predictions
prediction = tf.argmax(y_conv, 1)  # index of the maximum probability = predicted digit
predict_result = prediction.eval(feed_dict={x: [picture], keep_prob: 1.0}, session=sess)
print("The picture you imported is:", predict_result[0])
    

4 Complete .py code for applying the model for recognition

from PIL import Image
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# ---Define the model's helper functions---
def weight_variable(shape):
    # Weights drawn from a truncated normal distribution
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # Small positive bias to avoid dead ReLU units
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    # 2-D convolution, stride 1, zero-padded to keep the spatial size
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # 2x2 max pooling halves each spatial dimension
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

with tf.name_scope('input'):
    x=tf.placeholder(tf.float32,[None,784])
    y_=tf.placeholder(tf.float32,[None,10])
    
with tf.name_scope('image'):
    x_image=tf.reshape(x,[-1,28,28,1])
    tf.summary.image('input_image',x_image,8)
  
with tf.name_scope('conv_layer1'):
    W_conv1=weight_variable([5,5,1,32])
    b_conv1=bias_variable([32])
    h_conv1=tf.nn.relu(conv2d(x_image,W_conv1) + b_conv1)

with tf.name_scope('pooling_layer1'):
    h_pool1=max_pool_2x2(h_conv1)

with tf.name_scope('conv_layer2'):
    W_conv2=weight_variable([5,5,32,64])
    b_conv2=bias_variable([64])
    h_conv2=tf.nn.relu(conv2d(h_pool1,W_conv2) + b_conv2)
    
with tf.name_scope('pooling_layer2'):
    h_pool2=max_pool_2x2(h_conv2)
    
with tf.name_scope('fc_layer3'):
    W_fc1=weight_variable([7*7*64,1024])
    b_fc1=bias_variable([1024])
    h_pool2_flat=tf.reshape(h_pool2,[-1,7*7*64])
    h_fc1=tf.nn.relu(tf.matmul(h_pool2_flat,W_fc1) + b_fc1)
  
with tf.name_scope('dropout'):
    keep_prob=tf.placeholder(tf.float32)
    h_fc1_drop=tf.nn.dropout(h_fc1,keep_prob)

with tf.name_scope('output_fc_layer4'):
    W_fc2=weight_variable([1024,10])
    b_fc2=bias_variable([10])
  
with tf.name_scope('softmax'):
    y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop,W_fc2) + b_fc2)

# ---Load the model and test with an imported image---
text = Image.open('./images/text2.png')  # load the image (assumed 28x28 grayscale)
data = list(text.getdata())
picture = [(255 - x) * 1.0 / 255.0 for x in data]  # invert and normalize

init = tf.global_variables_initializer()
saver = tf.train.Saver()  # define saver

with tf.Session() as sess:
    sess.run(init)

    # Load the saved model parameters (restore overwrites the initialized values)
    saver.restore(sess, "./save/model.ckpt")

    # Make predictions
    prediction = tf.argmax(y_conv, 1)
    predict_result = prediction.eval(feed_dict={x: [picture], keep_prob: 1.0}, session=sess)

    print("The picture you imported is:", predict_result[0])

[Image: text2.png, the imported test picture]

[Image: recognition result printed in the Spyder console]

Model and image download link:
https://download.csdn.net/download/weixin_42899627/12672965

5 Running Tips

Each time the script is run, the kernel needs to be restarted before it can be run again, otherwise an error is reported. The likely reason is that every run adds the same nodes to TensorFlow's default graph again, so the second run creates duplicate variables that no longer match the checkpoint.
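A small sketch of a workaround, assuming the error does come from the growing default graph: clear the graph at the top of the script instead of restarting the kernel.

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Clearing the default graph lets the script be re-run in the same
# kernel without accumulating duplicate nodes and variables.
tf.reset_default_graph()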

Reference articles:
1. [Python] MNIST handwritten digit recognition based on CNN - Dongdan - Blog Park
2. Using an MNIST-trained model to recognize handwritten digits under TensorFlow - qiuhlee - Blog Park
3. Multi-layer neural network modeling and model preservation and restoration - https://www.cnblogs.com/HuangYJ/p/11681357.html
4. TensorFlow in Practice (3) Introduction to classification: MNIST handwritten digit recognition

The above is my personal understanding. If anything is wrong, I hope you will point it out and correct me.