Week T1: Implementing MNIST handwritten digit recognition

  • This article is a learning-record blog from the 365-day deep learning training camp
  • Original author: Classmate K | Tutoring, project customization
  • Article source: Student K’s study circle

Learning experience

While learning TensorFlow, its similarities to and differences from PyTorch stand out. In practice, the strongest impression is that the two differ mainly in code style; the core ideas are essentially the same. However, TensorFlow is sensitive to version compatibility, and incompatibility errors are easy to trigger. It is recommended to install only the packages you actually need, since redundant packages easily cause version conflicts. Also, when studying the code in the lesson plan, try to understand the meaning and usage of each function involved; this makes the learning more systematic and scientific.

1. Preliminary work

1. Set up GPU

import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
# tf.config.list_physical_devices("GPU") returns the list of available physical GPU devices;
# the "GPU" argument means we only care about GPUs and not other device types (such as CPUs).

if gpus:
    gpu0 = gpus[0]  # If there are multiple GPUs, use only the 0th one.
    # Enable memory growth on gpu0: TensorFlow then allocates and frees GPU memory on demand
    # instead of occupying all available GPU memory at the start.
    tf.config.experimental.set_memory_growth(gpu0, True)
    # Restrict TensorFlow's visible GPU devices to gpu0.
    tf.config.set_visible_devices([gpu0], "GPU")

2. Import data

import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

(train_images, train_labels),(test_images, test_labels) = datasets.mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] – 4s 0us/step

3. Normalization

Reference article: Classmate K: Normalization and standardization
The purpose of normalization is to bring features of different scales to the same order of magnitude, reduce the impact of variance, and speed up convergence. Larger feature values produce larger gradients, but during back-propagation every weight is updated with the same learning rate: a learning rate small enough for the large-gradient dimensions makes the others converge slowly, while one that is too large makes the model hard to converge. Normalizing the image therefore brings features of different scales into similar ranges, so the model can be trained efficiently with a single learning rate.

Normalization: rescales values into the interval (0, 1); the scaling depends only on the maximum and minimum values.
Standardization: rescales the data so that it falls into a specific range; the scaling depends on every value.

Standardization converts the values into a distribution with a mean of 0 and a standard deviation of 1 (not necessarily a normal distribution).
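
As a quick illustration of the two transforms (a minimal NumPy sketch; the x_norm and x_std names are only for this example), applied to the raw pixel values:

import numpy as np

x = train_images.astype("float32")             # raw pixels in [0, 255]

# Normalization (min-max scaling): depends only on the minimum and maximum values.
x_norm = (x - x.min()) / (x.max() - x.min())   # for MNIST this reduces to x / 255.0

# Standardization (z-score): depends on every value through the mean and standard deviation.
x_std = (x - x.mean()) / x.std()               # mean 0, std 1 afterwards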

# For grayscale images, the maximum value of each pixel is 255, and the minimum value of each pixel is 0. That is, normalization can be completed by directly dividing by 255.
train_images, test_images = train_images / 255.0, test_images / 255.0
train_images.shape, test_images.shape, train_labels.shape, test_labels.shape

((60000, 28, 28), (10000, 28, 28), (60000,), (10000,))

4. Visualize the images

plt.figure(figsize = (20,5))

for i in range(20):
    plt.subplot(2,10,i + 1) # subplot (number of subplot rows, number of subplot columns, subplot index (starting from 1))
    # Do not display axis scales
    plt.xticks([])
    plt.yticks([])
    # Do not display subgraph grid lines
    plt.grid(False)
    # cmap is the colormap; plt.cm.binary maps pixel values to black and white.
    plt.imshow(train_images[i], cmap = plt.cm.binary)
    #Set the x-axis label to display as the corresponding number
    plt.xlabel(train_labels[i])
    
plt.show()


5. Adjust image format

train_images = train_images.reshape((60000,28,28,1)) # For grayscale images, the number of channels is 1
test_images = test_images.reshape((10000,28,28,1))

train_images.shape, test_images.shape, train_labels.shape, test_labels.shape

((60000, 28, 28, 1), (10000, 28, 28, 1), (60000,), (10000,))

2. Building the CNN model

The ReLU (rectified linear unit) activation function is widely used in neural networks for the following reasons:
Alleviating the vanishing gradient problem: in deep networks ReLU helps relieve vanishing gradients. For positive inputs the output equals the input and the gradient is a constant 1, so the gradient does not shrink as it is propagated back through many layers (for negative inputs the output, and hence the gradient, is 0).
Simple and efficient computation: ReLU only requires a max() operation, which makes model training more efficient.
Nonlinear fitting ability: each side of the ReLU function is linear, but taken as a whole it is a piecewise, and therefore nonlinear, function; this gives the network its nonlinear fitting power and effectively lets neurons switch on and off.

No significant impact on generalization: ReLU keeps training fast without a noticeable negative effect on the model's generalization performance.
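
A minimal sketch of ReLU and its gradient (plain NumPy, independent of the model below):

import numpy as np

def relu(x):
    # ReLU: 0 for negative inputs, identity for positive inputs
    return np.maximum(0, x)

def relu_grad(x):
    # Gradient: 0 to the left of zero, a constant 1 to the right,
    # so active (positive) units never see a shrinking gradient.
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]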

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10)  # 10 output units, one per digit; no activation, so the outputs are raw logits
])
model.summary()


The syntax for using layers.Dense() is as follows:

layers.Dense(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)

The units parameter is the dimensionality of the output space;
The activation parameter specifies the activation function, such as 'relu', 'sigmoid', etc.;
The use_bias parameter is a Boolean that determines whether a bias term is used;
The kernel_initializer and bias_initializer parameters set how the weights and the bias are initialized;
The kernel_regularizer and bias_regularizer parameters set the regularization applied to the weights and the bias;
The activity_regularizer parameter sets the regularization applied to the layer's output;
The kernel_constraint and bias_constraint parameters set constraints on the weights and the bias.
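
For instance (an illustrative call, not taken from the original notebook), a Dense layer with an explicit initializer and L2 weight regularization could be declared like this:

from tensorflow.keras import layers, regularizers

dense = layers.Dense(
    64,                                       # units: dimensionality of the output space
    activation='relu',
    use_bias=True,
    kernel_initializer='he_normal',           # weight initialization scheme
    kernel_regularizer=regularizers.l2(1e-4)  # L2 penalty on the weights
)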

3. Compile model

model.compile() is the Keras method used to configure the model for training.

model.compile(loss=None, optimizer=None, metrics=None, loss_weights=None, sample_weight_mode=None, weighted_metrics=None, target_tensors=None)

Parameter description:
1.loss: Loss function, used to measure the difference between the model prediction results and the true label. Commonly used loss functions include mean square error (MSE), cross entropy (categorical_crossentropy), etc.
2.optimizer: Optimizer, used to update the weights of the model to minimize the loss function. Commonly used optimizers include stochastic gradient descent (SGD), Adam, etc.
3.metrics: Evaluation indicators, used to measure the performance of the model. Commonly used evaluation indicators include accuracy, precision, recall, etc. A list containing multiple evaluation metrics can be passed in.
4.loss_weights: used to specify weights for the losses of the model's different outputs (relevant for multi-output models). If provided, the total loss is the weighted sum of the individual output losses.
5.sample_weight_mode: specifies how sample weights are applied. Set it to "temporal" for timestep-wise weighting of sequence data; the default (None) applies sample-wise weights.
6.weighted_metrics: Used to specify which evaluation indicators need to be weighted and averaged using weights. You can pass in a list containing the names of the evaluation metrics that need to be weighted.
7.target_tensors: used to specify target tensors. In some cases, such as multiple input models or custom output layers, it may be necessary to manually specify the target tensor.

model.compile(
     optimizer='adam',
     loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
     metrics=['accuracy'])

SparseCategoricalCrossentropy() is the Keras loss for multi-class classification when the labels are integer class indices (like the MNIST labels 0-9) rather than one-hot vectors. Setting from_logits=True tells it that the model outputs raw logits instead of probabilities, so the softmax is applied inside the loss.
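
A small sketch of the difference (the logits here are made-up numbers purely for illustration):

import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0]])   # raw model outputs, no softmax applied

# Sparse version: the label is an integer class index.
sparse_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(sparse_loss(tf.constant([0]), logits).numpy())

# Non-sparse version: the same label as a one-hot vector gives the same loss value.
onehot_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(onehot_loss(tf.constant([[1.0, 0.0, 0.0]]), logits).numpy())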

4. Training model

model.fit() is the Keras method used to train a model. It takes the training data, validation data, batch size, number of epochs, and other parameters, and uses them to update the model's weights so as to minimize the loss function.
Its general usage is:

model.fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0)

Parameter Description:

x: the input features of the training data. Can be a single input tensor or a list of input tensors.
y: the target values of the training data. Can be a single output tensor or a list of output tensors.
batch_size: the number of training samples in each batch.
epochs: the number of epochs, i.e. full passes over the training data.
verbose: log display level. Possible values are 0 (silent), 1 (progress bar), and 2 (one line per epoch).
callbacks: a list of callbacks that run specific operations during training. Commonly used callbacks include ModelCheckpoint (save the model) and EarlyStopping (stop training early).
validation_split: the fraction of the training data reserved for validation. For example, 0.2 means 20% of the data is used for validation.
validation_data: the validation inputs and targets, usually passed as a tuple (x_val, y_val). Overrides validation_split if both are given.
shuffle: whether to shuffle the training data before each epoch. Default is True.
class_weight: class weights, used for imbalanced data sets. A dictionary mapping class indices to weights.
sample_weight: sample weights, used to weight individual samples. An array with the same length as the training data.
initial_epoch: the epoch at which to start training (useful for resuming a previous run). Default is 0.
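
As an illustration of the callbacks parameter (an optional sketch, not used in the training run below), early stopping could be attached like this:

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor='val_loss',         # quantity to watch
    patience=3,                 # stop after 3 epochs without improvement
    restore_best_weights=True   # roll back to the best weights seen
)

# history = model.fit(train_images, train_labels, epochs=10,
#                     validation_data=(test_images, test_labels),
#                     callbacks=[early_stop])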

history = model.fit(
        train_images,
        train_labels,
        epochs = 10,
        validation_data = (test_images, test_labels))

Epoch 1/10
1875/1875 – 23s 12ms/step – loss: 0.1465 – accuracy: 0.9550 – val_loss: 0.0578 – val_accuracy: 0.9806
Epoch 2/10
1875/1875 – 25s 13ms/step – loss: 0.0491 – accuracy: 0.9850 – val_loss: 0.0362 – val_accuracy: 0.9882
Epoch 3/10
1875/1875 – 21s 11ms/step – loss: 0.0334 – accuracy: 0.9895 – val_loss: 0.0451 – val_accuracy: 0.9866
Epoch 4/10
1875/1875 – 20s 11ms/step – loss: 0.0112 – accuracy: 0.9966 – val_loss: 0.0364 – val_accuracy: 0.9899
Epoch 8/10
1875/1875 – 23s 12ms/step – loss: 0.0104 – accuracy: 0.9967 – val_loss: 0.0366 – val_accuracy: 0.9899
Epoch 9/10
1875/1875 – 22s 11ms/step – loss: 0.0078 – accuracy: 0.9974 – val_loss: 0.0501 – val_accuracy: 0.9877
Epoch 10/10
1875/1875 – 22s 12ms/step – loss: 0.0065 – accuracy: 0.9977 – val_loss: 0.0412 – val_accuracy: 0.9905

5. Prediction

plt.imshow(test_images[1].reshape(28, 28))  # drop the channel dimension added earlier before plotting

pre = model.predict(test_images)
pre[1]

313/313 [==============================] – 2s 6ms/step
array([ 1.8541143, -3.2595177, 29.620302, -11.129237, -4.489938,
-20.663198 , 2.7213254, -6.5041213, -1.9739916, -24.11983 ],
dtype=float32)
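The row printed above holds the 10 raw logits for the second test image; the predicted digit is simply the index of the largest logit (here index 2, with the value 29.62). A short follow-up sketch using NumPy's argmax:

import numpy as np

print(np.argmax(pre[1]))   # index of the largest logit, i.e. the predicted digit (2)
print(test_labels[1])      # ground-truth label for comparison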