1. Convolutional neural network (CNN) on the MNIST data set

If you are bored and don't feel like studying, just have some fun with this. The following is based on the original article.
Reference article: 100 Examples of Deep Learning – Convolutional Neural Network (CNN) for MNIST Handwritten Digit Recognition | Day 1, from K's blog on CSDN.

This assumes the preliminary setup (TensorFlow, Keras, and matplotlib installed) is already in place.

1. MNIST data set

The MNIST handwritten digit data set comes from the National Institute of Standards and Technology (NIST) and is one of the best-known public data sets. The digit images are purely handwritten, drawn by 250 people from different professions. The data set can be downloaded from http://yann.lecun.com/exdb/mnist/ (the files need to be unzipped after downloading).

In practice, we usually just call (train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data() to load it directly, which is much simpler.

Code:

from keras import datasets

# Load the MNIST data: training set images, training set labels, test set images, and test set labels, in that order
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
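
Incidentally, if you downloaded the archive by hand, load_data also accepts a file name; a minimal sketch, assuming you saved the archive as mnist.npz (per the Keras docs, the path is resolved relative to ~/.keras/datasets/):

# A sketch, not required: read a locally cached copy instead of downloading.
# "mnist.npz" is assumed to sit under ~/.keras/datasets/.
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data(path="mnist.npz")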

The MNIST handwritten digits data set contains 70,000 pictures: 60,000 training images and 10,000 test images (the split is already built into Keras, which is convenient). All 70,000 pictures are 28*28 grayscale images.

If we flatten the pixels of each image into a vector, we get a vector of length 28*28=784. The training set can therefore be viewed as a [60000, 784] tensor: the first dimension indexes the images and the second indexes the pixels within each image. Each pixel value should lie between 0 and 1 (pixel values are originally 0-255, so a simple normalization is needed).
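
To make that concrete, a minimal sketch of the flattened view (purely for intuition; the CNN below keeps the 2D image shape):

# Flatten each 28*28 image into a length-784 vector; purely illustrative
flat_train = train_images.reshape(60000, 28 * 28)
print(flat_train.shape)  # (60000, 784)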

1.5 Set up the GPU (as needed)

import tensorflow as tf
gpus = tf.config.list_physical_devices("GPU")

if gpus:
    gpu0 = gpus[0]  # If there are multiple GPUs, use only the 0th GPU
    tf.config.experimental.set_memory_growth(gpu0, True)  # Let GPU memory grow as needed
    tf.config.set_visible_devices([gpu0], "GPU")
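
To confirm the setting took effect, you can print the devices TensorFlow will actually use:

# Should list exactly one GPU after the restriction above
print(tf.config.get_visible_devices("GPU"))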

2. Normalization

The data set was already loaded above.

# Normalize pixel values to the range 0-1. (For grayscale images each pixel lies between 0 and 255, so dividing by 255 completes the normalization.)
train_images, test_images = train_images / 255.0, test_images / 255.0
# View data dimension information
print(train_images.shape, test_images.shape, train_labels.shape, test_labels.shape)

3. Data visualization (a look at what is in the data set)

# Visualize the first 50 images of the data set
# Draw a figure 20 inches wide and 10 inches tall
plt.figure(figsize=(20, 10))
# Iterate over data set indices 0-49
for i in range(50):
    # Divide the figure into 5 rows and 10 columns and draw the (i+1)-th subplot
    plt.subplot(5, 10, i + 1)
    # Hide the x-axis ticks
    plt.xticks([])
    # Hide the y-axis ticks
    plt.yticks([])
    # Hide the subplot grid lines
    plt.grid(False)
    # Show the image; cmap is the color map, "plt.cm.binary" is a color table in matplotlib.cm
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    # Label the x-axis with the digit the picture represents
    plt.xlabel(train_labels[i])
# Display the figure
plt.show()

The resulting figure shows the sample digits described in the data set section above.

Because the images will be fed into a CNN, we need to adjust the tensor dimensions by adding a channel axis, as follows:

train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))

print(train_images.shape, test_images.shape, train_labels.shape, test_labels.shape)
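
Equivalently, the channel axis can be added without hard-coding the sample counts; a minimal alternative sketch (run it instead of, not after, the reshape above):

import numpy as np

# np.expand_dims appends a trailing channel axis regardless of the number of samples
train_images = np.expand_dims(train_images, axis=-1)  # (60000, 28, 28, 1)
test_images = np.expand_dims(test_images, axis=-1)    # (10000, 28, 28, 1)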

4. Construction of CNN model

Next comes the most important part: building the CNN, here with the Sequential API. (The network structure diagram is borrowed from the original article.)

model = models.Sequential([
    # 2D convolution layer 1: 32 kernels of size 3*3; input_shape sets the layer's input shape to (28, 28, 1)
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    # Output size: w_out = (w_in - kernel)/stride + 1 = (28 - 3)/1 + 1 = 26, so 26*26*32 (32 channels)
    # Max pooling layer 1: 2*2 downsampling (may cost a little accuracy but greatly speeds up training), giving 13*13*32
    layers.MaxPooling2D((2, 2)),
    # 2D convolution layer 2: 64 kernels of size 3*3, giving 11*11*64
    layers.Conv2D(64, (3, 3), activation='relu'),
    # Max pooling layer 2: 2*2 downsampling, giving 5*5*64 (11 halved and floored to 5)
    layers.MaxPooling2D((2, 2)),
    # Flatten layer: unrolls each multi-dimensional input into a one-dimensional array, connecting the convolutional layers to the fully connected layers
    layers.Flatten(),
    # Fully connected layer for further feature extraction; 64 is the output dimension, activation sets the activation function to ReLU
    layers.Dense(64, activation='relu'),
    # Output layer; 10 is the output dimension (one logit per digit)
    layers.Dense(10)
])
# Print the network structure
model.summary()

The printed structure should look roughly like the following (the screenshot from the original article is omitted; the parameter counts follow from the layer shapes, and exact layer names may vary by run and TensorFlow version):
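
Model: "sequential"
_________________________________________________________________
 Layer (type)                 Output Shape              Param #
=================================================================
 conv2d (Conv2D)              (None, 26, 26, 32)        320
 max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0
 conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496
 max_pooling2d_1 (MaxPooling) (None, 5, 5, 64)          0
 flatten (Flatten)            (None, 1600)              0
 dense (Dense)                (None, 64)                102464
 dense_1 (Dense)              (None, 10)                650
=================================================================
Total params: 121,930
Trainable params: 121,930
Non-trainable params: 0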

No problem, let’s compile and train the model. 

5. Model compilation and training

The code is shown below:

# model.compile() configures training: it tells the model which optimizer, loss function, and evaluation metrics to use.
model.compile(
    # Use the Adam optimizer
    optimizer='adam',
    # Use the cross-entropy loss function (tf.keras.losses.SparseCategoricalCrossentropy())
    # With from_logits=True, y_pred is treated as raw logits and softmax is applied inside the loss; this is usually more numerically stable than applying softmax in the model.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    # List of metrics to monitor during training
    metrics=['accuracy'])

history = model.fit(
    # Training set images
    train_images,
    # Training set labels
    train_labels,
    # Train for 10 epochs; each epoch feeds all the training data through the model once
    epochs=10,
    # Validation set
    validation_data=(test_images, test_labels))
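
After training, the final test accuracy can also be read off explicitly; a minimal sketch using model.evaluate:

# Evaluate on the test set; returns the loss plus the metrics listed in compile()
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(test_acc)  # typically around 0.99 for this architecture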

6. Make predictions

# Show the third test image; squeeze drops the channel axis, since imshow expects a (28, 28) array
plt.imshow(test_images[2].squeeze(), cmap=plt.cm.binary)
plt.show()

pre = model.predict(test_images)  # Predict all test images
print(pre[2])
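
Since the last Dense layer has no softmax, pre[2] is a vector of 10 raw logits. A minimal sketch to turn it into a predicted digit (and, optionally, probabilities):

import numpy as np

print(np.argmax(pre[2]))              # index of the largest logit = predicted digit
print(tf.nn.softmax(pre[2]).numpy())  # optional: convert the logits to probabilities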

The accuracy ends up so high that the training curve is not very interesting, so it is not drawn here.
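
If you do want the curves, history.history holds the per-epoch metrics; a minimal sketch:

# history.history is a dict with keys 'accuracy', 'val_accuracy', 'loss', 'val_loss'
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.show()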

Okay, okay, that's it for now; learning any more would be rude. The complete code is as follows:

import tensorflow as tf
from keras import datasets, layers, models
import matplotlib.pyplot as plt


gpus = tf.config.list_physical_devices("GPU")

if gpus:
    gpu0 = gpus[0]  # If there are multiple GPUs, use only the 0th GPU
    tf.config.experimental.set_memory_growth(gpu0, True)  # Let GPU memory grow as needed
    tf.config.set_visible_devices([gpu0], "GPU")

# Import mnist data, which are training set images, training set labels, test set images, and test set labels in order.
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()

# Normalize pixel values to the range 0-1. (For grayscale images each pixel lies between 0 and 255, so dividing by 255 completes the normalization.)
train_images, test_images = train_images / 255.0, test_images / 255.0
# View data dimension information
print(train_images.shape, test_images.shape, train_labels.shape, test_labels.shape)

# Visualize the first 50 images of the data set
# Draw a figure 20 inches wide and 10 inches tall
plt.figure(figsize=(20, 10))
# Iterate over data set indices 0-49
for i in range(50):
    # Divide the figure into 5 rows and 10 columns and draw the (i+1)-th subplot
    plt.subplot(5, 10, i + 1)
    # Hide the x-axis ticks
    plt.xticks([])
    # Hide the y-axis ticks
    plt.yticks([])
    # Hide the subplot grid lines
    plt.grid(False)
    # Show the image; cmap is the color map, "plt.cm.binary" is a color table in matplotlib.cm
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    # Label the x-axis with the digit the picture represents
    plt.xlabel(train_labels[i])
# Display the figure
plt.show()

# Adjust the data to the format we need
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))

print(train_images.shape, test_images.shape, train_labels.shape, test_labels.shape)

# Create and configure the convolutional neural network
# Convolution layers: extract features from the input image via convolution operations
# Pooling layers: a non-linear form of downsampling, used mainly to reduce feature dimensions, compress the number of parameters, reduce overfitting, and improve the model's robustness
# Fully connected layers: after several convolution and pooling layers, the high-level reasoning in the network is done by the fully connected layers
model = models.Sequential([
    # 2D convolution layer 1: 32 kernels of size 3*3; input_shape sets the layer's input shape to (28, 28, 1)
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    # Output size: w_out = (w_in - kernel)/stride + 1 = (28 - 3)/1 + 1 = 26, so 26*26*32 (32 channels)
    # Max pooling layer 1: 2*2 downsampling (may cost a little accuracy but greatly speeds up training), giving 13*13*32
    layers.MaxPooling2D((2, 2)),
    # 2D convolution layer 2: 64 kernels of size 3*3, giving 11*11*64
    layers.Conv2D(64, (3, 3), activation='relu'),
    # Max pooling layer 2: 2*2 downsampling, giving 5*5*64
    layers.MaxPooling2D((2, 2)),
    # Flatten layer: unrolls each multi-dimensional input into a one-dimensional array, connecting the convolutional layers to the fully connected layers
    layers.Flatten(),
    # Fully connected layer for further feature extraction; 64 is the output dimension, activation sets the activation function to ReLU
    layers.Dense(64, activation='relu'),
    # Output layer; 10 is the output dimension (one logit per digit)
    layers.Dense(10)
])
# Print the network structure
model.summary()


# model.compile() configures training: it tells the model which optimizer, loss function, and evaluation metrics to use.
model.compile(
    # Use the Adam optimizer
    optimizer='adam',
    # Use the cross-entropy loss function (tf.keras.losses.SparseCategoricalCrossentropy())
    # With from_logits=True, y_pred is treated as raw logits and softmax is applied inside the loss; this is usually more numerically stable than applying softmax in the model.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    # List of metrics to monitor during training
    metrics=['accuracy'])

history = model.fit(
    # Training set images
    train_images,
    # Training set labels
    train_labels,
    # Train for 10 epochs; each epoch feeds all the training data through the model once
    epochs=10,
    # Validation set
    validation_data=(test_images, test_labels))

# Show the third test image; squeeze drops the channel axis, since imshow expects a (28, 28) array
plt.imshow(test_images[2].squeeze(), cmap=plt.cm.binary)
plt.show()

pre = model.predict(test_images)  # Predict all test images
print(pre[2])