If you are bored and don’t want to study, just have some fun. The following is the original article.
Reference article address: 100 examples of deep learning – convolutional neural network (CNN) to realize mnist handwritten digit recognition | Day 1_CNN-based case of deep learning_K’s blog-CSDN blog
This walkthrough assumes the preliminary setup (a working installation of TensorFlow and matplotlib) is already in place.
1. MNIST data set
The MNIST handwritten digit data set comes from the National Institute of Standards and Technology (NIST) and is one of the best-known public data sets. The digit images in the data set are all handwritten, drawn by 250 people from different professions. The data set can be obtained from http://yann.lecun.com/exdb/mnist/ (you need to unzip it after downloading).
In practice it is simpler to load the data directly with a single line of code: (train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data().
Code:
from keras import datasets

# Import the MNIST data: training set images, training set labels, test set images, and test set labels, in that order.
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
The MNIST handwritten digit data set contains 70,000 images, of which 60,000 are training data and 10,000 are test data (the split is already done and bundled with keras, which is convenient). All 70,000 images are 28*28 pixels.
If we flatten the pixels of each image into a vector, we get a vector of length 28*28=784. We can therefore think of the training set as a [60000, 784] tensor: the first dimension indexes the images, and the second dimension indexes the pixels within each image. The raw value of each pixel is an integer from 0 to 255, so a simple normalization is needed to bring the values into the range 0 to 1.
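To make the [60000, 784] view concrete, here is a minimal sketch using NumPy. The array fake_train_images is a hypothetical stand-in (all zeros) with the same shape and dtype as the real training images, so the snippet runs without downloading anything.

```python
import numpy as np

# Hypothetical stand-in for train_images: same shape and dtype as the
# MNIST training set, filled with zeros so no download is needed.
fake_train_images = np.zeros((60000, 28, 28), dtype=np.uint8)

# Flatten each 28*28 image into a single row of 784 pixels.
# -1 asks NumPy to infer the second dimension (28 * 28 = 784).
flat = fake_train_images.reshape(fake_train_images.shape[0], -1)

print(flat.shape)  # (60000, 784)
```

Note that the CNN later in this article does not use this flattened form; it keeps the 2D layout of each image.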
1.5 Set up the GPU (optional, as needed)
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    gpu0 = gpus[0]  # If there are multiple GPUs, only use the 0th GPU
    tf.config.experimental.set_memory_growth(gpu0, True)  # Allocate GPU memory on demand
    tf.config.set_visible_devices([gpu0], "GPU")
2. Normalization
The data set was already imported in the step above.
# Normalize pixel values to the range from 0 to 1. (For grayscale images, the maximum value of each pixel
# is 255 and the minimum value is 0, so normalization can be completed by directly dividing by 255.)
train_images, test_images = train_images / 255.0, test_images / 255.0

# View data dimension information
print(train_images.shape, test_images.shape, train_labels.shape, test_labels.shape)
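As a small illustration of what dividing by 255.0 does, the sketch below uses a made-up 2*2 array in place of a real image. The names raw and scaled are hypothetical and only serve the example.

```python
import numpy as np

# Hypothetical 2*2 "image" with raw uint8 pixel values, standing in for one MNIST image.
raw = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Dividing by a float converts the integers to floating point
# and maps the range 0..255 onto 0..1.
scaled = raw / 255.0

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```

The division also changes the dtype from uint8 to float64, which is what the network expects as input anyway.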
3. Data visualization (a look at what is in the data set)
import matplotlib.pyplot as plt

# Visually display the first 50 images of the data set
# Draw a figure 20 wide and 10 high (units are inches)
plt.figure(figsize=(20, 10))
# Traverse the MNIST data set indices 0~49
for i in range(50):
    # Divide the entire figure into 5 rows and 10 columns, and draw the (i + 1)th subplot
    plt.subplot(5, 10, i + 1)
    # Do not display x-axis ticks
    plt.xticks([])
    # Do not display y-axis ticks
    plt.yticks([])
    # Do not display subplot grid lines
    plt.grid(False)
    # Display the image; cmap is the color map, "plt.cm.binary" is a color table in matplotlib.cm
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    # Set the x-axis label to the digit corresponding to the image
    plt.xlabel(train_labels[i])
# Display the figure
plt.show()
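The resulting figure is the one shown in the data set section above.
Because the images will be fed into a CNN, which expects an explicit channel dimension, we need to reshape the tensors to (number of images, height, width, channels), as follows: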
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))

print(train_images.shape, test_images.shape, train_labels.shape, test_labels.shape)
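The reshape above hard-codes the number of images. A slightly more flexible variant, sketched here on a hypothetical array of 10 blank images, lets NumPy infer that number. np.expand_dims is shown as an equivalent alternative; neither is what the original article uses.

```python
import numpy as np

# Hypothetical stand-in for the normalized images (10 images instead of 10,000).
fake_images = np.zeros((10, 28, 28), dtype=np.float64)

# -1 lets NumPy infer the number of images, so the same line works for any split size.
with_channel = fake_images.reshape((-1, 28, 28, 1))

# Adding a new axis at the end gives the same shape.
also_with_channel = np.expand_dims(fake_images, axis=-1)

print(with_channel.shape)       # (10, 28, 28, 1)
print(also_with_channel.shape)  # (10, 28, 28, 1)
```

The trailing 1 is the channel count: MNIST images are grayscale, so there is a single channel.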
4. Construction of CNN model
Next comes the most important part, building the CNN, here using the Sequential API. (The network structure diagram is borrowed from the original article.)
from keras import layers, models

model = models.Sequential([
    # Convolution layer 1: 32 convolution kernels of size 3*3; the input_shape parameter
    # sets the input shape of the layer to (28, 28, 1)
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    # After this, w = (w1 - kernel) / s + 1 = 26, so the output is 26*26*32 (32 channels)
    # Max pooling layer 1, 2*2 downsampling (shrinking the feature maps does not seem to help
    # accuracy by itself, but it greatly speeds up computation)
    layers.MaxPooling2D((2, 2)),
    # Convolution layer 2: 64 convolution kernels of size 3*3
    layers.Conv2D(64, (3, 3), activation='relu'),
    # Max pooling layer 2, 2*2 downsampling; the output becomes 5*5*64 (11 to 5)
    layers.MaxPooling2D((2, 2)),
    # Flatten layer: unrolls the multi-dimensional input into a one-dimensional array,
    # connecting the convolutional layers to the fully connected layers
    layers.Flatten(),
    # Fully connected layer: further feature extraction; 64 is the dimension of the output space,
    # and the activation parameter sets the activation function to ReLU
    layers.Dense(64, activation='relu'),
    # Output layer: 10 is the dimension of the output space (one raw score per digit)
    layers.Dense(10)
])

# Print network structure
model.summary()
The printed structure is as follows:
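As a cross-check on the summary, the layer output sizes and parameter counts can also be worked out by hand. The sketch below does the arithmetic in plain Python. The helper names (conv_out, pool_out, conv_params, dense_params) are hypothetical, and the formulas assume no padding and a pooling stride equal to the pool size, which I believe are the Keras defaults used by the model above.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of a convolution without padding."""
    return (size - kernel) // stride + 1

def pool_out(size, pool):
    """Output width/height of max pooling with stride equal to the pool size."""
    return size // pool

def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

s1 = conv_out(28, 3)   # 26 after convolution layer 1
s2 = pool_out(s1, 2)   # 13 after pooling layer 1
s3 = conv_out(s2, 3)   # 11 after convolution layer 2
s4 = pool_out(s3, 2)   # 5 after pooling layer 2
flat_size = s4 * s4 * 64  # 1600 values after Flatten

param_counts = [
    conv_params(3, 1, 32),        # 320
    conv_params(3, 32, 64),       # 18496
    dense_params(flat_size, 64),  # 102464
    dense_params(64, 10),         # 650
]
total_params = sum(param_counts)

print(s1, s2, s3, s4, flat_size)  # 26 13 11 5 1600
print(param_counts, total_params) # [320, 18496, 102464, 650] 121930
```

Almost all of the parameters sit in the first fully connected layer, not in the convolutions.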
No problem, let’s compile and train the model.
5. Model compilation and training
The code is as follows:
# model.compile() configures training: it specifies the optimizer, the loss function,
# and the metrics used to evaluate accuracy.
model.compile(
    # Set the optimizer to Adam
    optimizer='adam',
    # Set the loss function to cross entropy (tf.keras.losses.SparseCategoricalCrossentropy()).
    # When from_logits is True, y_pred is treated as raw scores and converted into probabilities
    # (using softmax) inside the loss; otherwise no conversion is performed.
    # Using True is usually more numerically stable.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    # List of metrics to monitor during training
    metrics=['accuracy'])

history = model.fit(
    # Training set images
    train_images,
    # Training set labels
    train_labels,
    # Train for 10 epochs; each epoch passes all training data through the model once
    epochs=10,
    # Validation set (the test set is used here)
    validation_data=(test_images, test_labels))
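To show roughly what from_logits=True means, here is a small NumPy sketch of cross entropy computed directly from raw scores. It is an illustration under my own assumptions, not the Keras implementation. The function name sparse_ce_from_logits and the sample values are made up.

```python
import numpy as np

def sparse_ce_from_logits(logits, labels):
    """Mean cross entropy for integer labels, computed directly from raw logits."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Subtract the row maximum before exponentiating (log-sum-exp trick) to avoid overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class in each row.
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

# Made-up logits for two samples with three classes each.
logits = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])
labels = np.array([0, 2])
loss = sparse_ce_from_logits(logits, labels)

# Same value via an explicit softmax followed by -log(p_true).
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
naive = -np.log(probs[np.arange(len(labels)), labels]).mean()

print(loss, naive)
```

The two results agree for small logits. The first form stays finite even for very large logits, which is the usual argument for leaving the softmax out of the model and letting the loss handle it.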
6. Make predictions
plt.imshow(test_images[2])
pre = model.predict(test_images)  # Predict all test images
plt.show()
print(pre[2])  # Raw scores (logits) for the 10 digit classes of this image
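Since the output layer has no softmax, pre[2] is a vector of 10 raw scores rather than a digit. A minimal sketch of turning such a vector into a predicted digit and into probabilities follows. The values in example_logits are made up for illustration and are not real model output.

```python
import numpy as np

# Hypothetical logits for one image, standing in for pre[2]; the values are made up.
example_logits = np.array([-3.1, -1.2, 9.8, 0.4, -5.0, -2.2, -4.7, 1.1, -0.3, -6.5])

# The predicted digit is the index of the largest score.
predicted_digit = int(np.argmax(example_logits))

# Optional: convert the scores into probabilities with a numerically stable softmax.
shifted = example_logits - example_logits.max()
probabilities = np.exp(shifted) / np.exp(shifted).sum()

print(predicted_digit)      # 2
print(probabilities.sum())
```

On the real predictions, np.argmax(pre, axis=1) would give one digit per test image, which can then be compared with test_labels.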
The accuracy is already very high, so the training curves are not plotted here.
Okay, okay, that’s it for now, it would be rude to learn any more. The overall code is as follows:
import tensorflow as tf
from keras import datasets, layers, models
import matplotlib.pyplot as plt

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    gpu0 = gpus[0]  # If there are multiple GPUs, only use the 0th GPU
    tf.config.experimental.set_memory_growth(gpu0, True)  # Allocate GPU memory on demand
    tf.config.set_visible_devices([gpu0], "GPU")

# Import the MNIST data: training set images, training set labels, test set images,
# and test set labels, in that order.
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()

# Normalize pixel values to the range from 0 to 1. (For grayscale images, the maximum value of each pixel
# is 255 and the minimum value is 0, so normalization can be completed by directly dividing by 255.)
train_images, test_images = train_images / 255.0, test_images / 255.0

# View data dimension information
print(train_images.shape, test_images.shape, train_labels.shape, test_labels.shape)

# Visually display the first 50 images of the data set
# Draw a figure 20 wide and 10 high (units are inches)
plt.figure(figsize=(20, 10))
# Traverse the MNIST data set indices 0~49
for i in range(50):
    # Divide the entire figure into 5 rows and 10 columns, and draw the (i + 1)th subplot
    plt.subplot(5, 10, i + 1)
    # Do not display x-axis ticks
    plt.xticks([])
    # Do not display y-axis ticks
    plt.yticks([])
    # Do not display subplot grid lines
    plt.grid(False)
    # Display the image; cmap is the color map, "plt.cm.binary" is a color table in matplotlib.cm
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    # Set the x-axis label to the digit corresponding to the image
    plt.xlabel(train_labels[i])
# Display the figure
plt.show()

# Adjust the data to the format we need
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))

print(train_images.shape, test_images.shape, train_labels.shape, test_labels.shape)

# Create and set up a convolutional neural network
# Convolution layer: extracts features from the input image through convolution operations
# Pooling layer: a non-linear form of downsampling, mainly used to reduce feature dimensionality,
#   compress the amount of data, reduce overfitting, and improve the robustness of the model
# Fully connected layer: after several convolution and pooling layers, the high-level reasoning
#   of the network is done by fully connected layers
model = models.Sequential([
    # Convolution layer 1: 32 convolution kernels of size 3*3; the input_shape parameter
    # sets the input shape of the layer to (28, 28, 1)
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    # After this, w = (w1 - kernel) / s + 1 = 26, so the output is 26*26*32 (32 channels)
    # Max pooling layer 1, 2*2 downsampling (shrinking the feature maps does not seem to help
    # accuracy by itself, but it greatly speeds up computation)
    layers.MaxPooling2D((2, 2)),
    # Convolution layer 2: 64 convolution kernels of size 3*3
    layers.Conv2D(64, (3, 3), activation='relu'),
    # Max pooling layer 2, 2*2 downsampling; the output becomes 5*5*64
    layers.MaxPooling2D((2, 2)),
    # Flatten layer: unrolls the multi-dimensional input into a one-dimensional array,
    # connecting the convolutional layers to the fully connected layers
    layers.Flatten(),
    # Fully connected layer: further feature extraction; 64 is the dimension of the output space,
    # and the activation parameter sets the activation function to ReLU
    layers.Dense(64, activation='relu'),
    # Output layer: 10 is the dimension of the output space (one raw score per digit)
    layers.Dense(10)
])

# Print network structure
model.summary()

# model.compile() configures training: it specifies the optimizer, the loss function,
# and the metrics used to evaluate accuracy.
model.compile(
    # Set the optimizer to Adam
    optimizer='adam',
    # Set the loss function to cross entropy (tf.keras.losses.SparseCategoricalCrossentropy()).
    # When from_logits is True, y_pred is treated as raw scores and converted into probabilities
    # (using softmax) inside the loss; otherwise no conversion is performed.
    # Using True is usually more numerically stable.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    # List of metrics to monitor during training
    metrics=['accuracy'])

history = model.fit(
    # Training set images
    train_images,
    # Training set labels
    train_labels,
    # Train for 10 epochs; each epoch passes all training data through the model once
    epochs=10,
    # Validation set (the test set is used here)
    validation_data=(test_images, test_labels))

plt.imshow(test_images[2])
pre = model.predict(test_images)  # Predict all test images
plt.show()
print(pre[2])