Fully connected neural network clothing image classification (Fashion-MNIST)

1. Use Keras and TensorFlow to load and initially understand the Fashion-MNIST dataset

Import the Fashion-MNIST dataset and set TensorFlow's log level.
Fashion-MNIST is a dataset of 28×28-pixel grayscale clothing images, with a training set of 60,000 images and a test set of 10,000 images.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics
# datasets: dataset management; layers: network layers; optimizers: optimizers;
# Sequential: layer container; metrics: evaluation metrics
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# '2': only error messages are printed; other log noise is suppressed
 
# Import the dataset (NumPy arrays)
(x,y),(x_test,y_test) = datasets.fashion_mnist.load_data()
# View dataset information
print(f'x.shape={x.shape}, y.shape={y.shape}')  # size of the training set
print(f'x_test.shape={x_test.shape}, y_test.shape={y_test.shape}')  # size of the test set
print(f'y[:5]={y[:5]}')  # first 5 labels of y

First, import the required libraries, including TensorFlow, Keras, and os.

Next, we use the datasets.fashion_mnist.load_data() function to load the Fashion-MNIST dataset. This function returns two tuples of NumPy arrays, (x, y) and (x_test, y_test), where x and x_test contain the image data and y and y_test contain the corresponding labels.

We then print the shapes of these arrays to get an idea of the size of the dataset. x has shape (60000, 28, 28), representing 60,000 images of 28×28 pixels, and x_test has shape (10000, 28, 28). Correspondingly, y has shape (60000,) and y_test has shape (10000,).

Finally, we view the first five labels by printing y[:5]. Each label is an integer indicating the clothing category of the corresponding image, e.g. 0 for T-shirt, 1 for trouser, and so on.
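As an optional sanity check (an addition, not part of the original listing), the label values and per-class counts can be inspected with NumPy; Fashion-MNIST is balanced, with 6,000 training images per class.

import numpy as np

# Optional sanity check: label values and their counts in the training set
labels, counts = np.unique(y, return_counts=True)
print(labels)   # [0 1 2 3 4 5 6 7 8 9]
print(counts)   # 6000 images per class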

*View the dataset

# Dataset display
import matplotlib.pyplot as plt
import numpy as np
# Names of the 10 categories
class_names = ['Tshirt','Trouser','Pullover','Dress','Coat','Sandal','Shirt','Sneaker','Bag','Ankle boot']
# Draw the first 10 images
for i in range(10):
    plt.subplot(2,5,i + 1)  # draw the current image at position i + 1 of a 2-row, 5-column grid
    plt.imshow(x[i])
    plt.xlabel(class_names[y[i]])  # y[i] is the integer category label of image i
    plt.xticks([])  # hide the x and y axis ticks
    plt.yticks([])
plt.show()
*Image dataset display

A class_names list is defined, containing the names of the 10 categories in the dataset.

A for loop then iterates over the first 10 samples (i from 0 to 9). For each sample, the code creates a subplot (using plt.subplot) and draws the image with imshow.

After each image is drawn, xlabel adds a caption showing the category it belongs to; the name is looked up as class_names[y[i]], where y[i] is the integer category label of image i. The axis ticks are hidden so that only the images and their captions are shown.

This display makes the content and format of the dataset easier to understand at a glance.

2. Use TensorFlow for image dataset preprocessing and loading

# Data preprocessing function: convert data types
def processing(x,y):
    x = tf.cast(x, tf.float32)/255.0  # convert x to float32 and normalize to [0,1]
    y = tf.cast(y, tf.int32)          # convert the target y to int32
    return (x,y)

# Preprocess the training set
ds_train = tf.data.Dataset.from_tensor_slices((x,y))  # automatically converts x and y to tensors
ds_train = ds_train.map(processing)  # apply the preprocessing function to every sample
ds_train = ds_train.map(lambda x, y: (x, tf.one_hot(y, depth=10)))  # one-hot encoding: the label becomes a length-10 vector with a 1 at the corresponding index
ds_train = ds_train.shuffle(10000).batch(128)  # shuffle the samples first, then group them into batches of 128
# Preprocess the test set
ds_test = tf.data.Dataset.from_tensor_slices((x_test,y_test))
ds_test = ds_test.map(processing).batch(128)  # the test set needs no shuffling

# Generate an iterator to check whether the data is loaded correctly
sample = next(iter(ds_train))  # fetch one batch, i.e. 128 samples
print('x_batch:',sample[0].shape,'y_batch:',sample[1].shape)  # check how many samples are taken at once

  1. processing function: This function preprocesses the input data. It converts x to tf.float32 and divides by 255.0 to normalize the pixel values to [0,1], and it converts the target y to tf.int32.

  2. Data loading: The code uses tf.data.Dataset.from_tensor_slices to create a dataset from tensor slices, pairing x with y. The map function applies the processing function to every element; for the training set, a second map then one-hot encodes the integer label into a vector of length 10. The shuffle function shuffles the samples (with a buffer of 10,000) and the batch function groups them into batches of 128. The test set gets the same preprocessing and batching, but no one-hot encoding (the integer labels are compared directly during evaluation) and no shuffling.

  3. Check data loading: The iter function creates an iterator and next retrieves one batch from it. The print function then shows the shapes to confirm the data is loaded correctly. A further check of the preprocessed values is sketched below.
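A quick check of one batch (an optional addition, not in the original listing) confirms that the pixel values are normalized and the labels are one-hot encoded:

# Verify preprocessing: pixel values should lie in [0,1] and each
# label should be a length-10 one-hot vector
x_batch, y_batch = next(iter(ds_train))
print(float(tf.reduce_min(x_batch)), float(tf.reduce_max(x_batch)))  # expect values in [0, 1]
print(y_batch[0])  # a length-10 one-hot vector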

3. Build a simple TensorFlow fully connected neural network

# ==1== Set up the fully connected layers
# [b,784]=>[b,256]=>[b,128]=>[b,64]=>[b,32]=>[b,10]; hidden dimensions typically decrease layer by layer
model = Sequential([
    layers.Dense(256, activation=tf.nn.relu),  # first fully connected layer, outputs 256 features
    layers.Dense(128, activation=tf.nn.relu),  # second layer
    layers.Dense(64, activation=tf.nn.relu),   # third layer
    layers.Dense(32, activation=tf.nn.relu),   # fourth layer
    layers.Dense(10),  # output layer: no activation needed, outputs 10 class logits
    ])
 
# ==2== Set input layer dimensions
model.build(input_shape=[None, 28*28])
# ==3== View network structure
model.summary()
# ==4== Optimizer
# Performs the weight update w = w - lr * grad
optimizer = optimizers.Adam(learning_rate=1e-3)  # note: the lr argument is deprecated in favor of learning_rate
  1. Building the network: The code uses the Sequential class to build a neural network composed of multiple fully connected layers. Each fully connected layer is created with the layers.Dense class, which specifies the number of neurons, the activation function and other parameters. The output layer (the fifth layer) uses no activation function: it outputs one raw logit per class (10 in total), which the loss function later converts to probabilities via softmax.
  2. Set input layer dimensions: The model.build function sets the input dimensions. Since each input is a flattened 28×28 image, the input shape is [None, 28*28], where None stands for an arbitrary batch size.
  3. View the network structure: The model.summary function shows the network structure, including the output shape and number of parameters of each layer.
  4. Choosing an optimizer: An Adam optimizer is created using optimizers.Adam with a learning rate of 0.001. The optimizer updates the model's weights during gradient descent.

When training the neural network, the optimizer updates the model weights (w) using the gradients (grad) computed from the loss function and the configured learning rate (lr), so that the model gradually converges toward an optimum.
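As a rough sketch (plain gradient descent; Adam additionally maintains running estimates of the gradient moments), the update that optimizer.apply_gradients performs is conceptually:

# Conceptual equivalent of optimizer.apply_gradients for plain SGD;
# grads is assumed to come from tape.gradient, as in the training loop below
lr = 1e-3
for w, grad in zip(model.trainable_variables, grads):
    w.assign_sub(lr * grad)  # w = w - lr * grad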

*Output results


The summary shows the Sequential model's architecture and parameter counts. It is a simple feedforward network with 5 fully connected (Dense) layers; each layer takes the previous layer's output as input and applies a transformation to it. "Fully connected" means every node in a layer is connected to all nodes in the previous layer.
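For example, the parameter count reported for the first Dense layer can be verified by hand: each of its 256 units has 784 input weights plus one bias.

# Parameter count of the first Dense layer: weights + biases
params_layer1 = 784 * 256 + 256
print(params_layer1)  # 200960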

4. Training and optimization

for epoch in range(20):  # train for 20 epochs
    # Run over each batch
    for step,(x,y) in enumerate(ds_train):
        # x in ds_train has shape [b, 28, 28]; the input layer expects [b, 28*28], so reshape it
        x = tf.reshape(x, [-1, 28*28])  # -1 infers the 0th (batch) dimension automatically
        # Gradient calculation
        with tf.GradientTape() as tape:
            # Forward pass through the network: [b,784]=>[b,10]
            logits = model(x)  # output of the last layer
            # Mean squared error between the true y (one-hot encoded) and the output
            loss1 = tf.reduce_mean(tf.losses.MSE(y, logits))
            # Cross-entropy loss; from_logits=True applies softmax to the logits internally
            loss2 = tf.reduce_mean(tf.losses.categorical_crossentropy(y, logits, from_logits=True))

        # Compute gradients of the loss (first argument) w.r.t. the variables (second argument);
        # model.trainable_variables holds all weight and bias parameters
        grads = tape.gradient(loss2, model.trainable_variables)
        # Update the weights; zip pairs each gradient with its variable
        optimizer.apply_gradients(zip(grads, model.trainable_variables))  # effectively: w.assign_sub(lr * grad)

        # Print the losses every 100 batches
        if step % 100 == 0:
            print(f'epochs:{epoch}, step:{step}, loss_MSE:{loss1}, loss_CE:{loss2}')

This loop performs forward propagation, loss computation, and backpropagation in TensorFlow.

  1. Looping over epochs and batches: A nested for loop iterates over the dataset; each epoch traverses the entire training set batch by batch, so the model gradually learns from the full data and optimizes its parameters.
  2. Reshaping the input: At each iteration, the input data x is reshaped from [b, 28, 28] to [b, 28*28] to match the input layer of the model.
  3. Calculating losses: Both the mean squared error (MSE) and the cross-entropy (CE) loss are computed; only the cross-entropy loss drives the gradient update, while the MSE is printed alongside it for comparison.
  4. Gradient calculation and update: tf.GradientTape records the computation so the gradients can be calculated. The Adam optimizer then updates the model's weights based on those gradients.
  5. Printing results: Every 100 batches (steps), the code prints the current epoch, step number and loss values for monitoring and debugging. A high-level equivalent of this loop is sketched below.
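For comparison (a sketch, not part of the original code), the same training can be expressed with the high-level Keras API; the manual GradientTape loop above simply makes these steps explicit:

# High-level equivalent of the manual training loop (a sketch)
ds_train_flat = ds_train.map(lambda x, y: (tf.reshape(x, [-1, 28*28]), y))  # flatten the images
model.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
              loss=tf.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(ds_train_flat, epochs=20)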

5. Conduct testing and evaluation

    # This block is indented inside the `for epoch` loop above, so the model is evaluated on the test set after every epoch
    total_correct = 0  # total number of correct predictions
    total_sum = 0      # total number of samples seen

    for (x,y) in ds_test:  # iterate over the test set batches

        # Change the shape of x from [b,28,28] to [b,28*28]
        x = tf.reshape(x, [-1,28*28])
        # Compute the output layer [b,10]
        logits = model(x)
        # Find the index of the class with the highest probability
        # Convert logits to probabilities
        prob = tf.nn.softmax(logits, axis=1)  # softmax over the last dimension; the probabilities sum to 1
        predict = tf.argmax(prob, axis=1)  # position of the maximum value, one index per sample

        # y is int32 with shape [b]; predict is int64 with shape [b]
        predict = tf.cast(predict, dtype=tf.int32)

        # y and predict are both vectors of class indices,
        # so simply check whether they are equal element-wise
        correct = tf.equal(y, predict)  # returns True/False per sample
        # Convert True/False to 1/0 and count the 1s, i.e. the number of correct predictions
        correct = tf.reduce_sum(tf.cast(correct, dtype=tf.int32))

        # Accumulate the number of correct predictions; correct is a tensor, int() converts it to a Python number
        total_correct += int(correct)
        total_sum += x.shape[0]  # size of the 0th dimension: the number of images in this batch

    # Compute the model accuracy after each full pass over the test set
    acc = total_correct/total_sum
    print(f'acc: {acc}')

The purpose of this block is to test the trained model and compute its accuracy on the test set; because it is indented inside the epoch loop, the accuracy is reported after every epoch.

First, the code defines two counters: the total number of correct predictions and the total number of samples. It then traverses the test set, runs forward propagation on each batch to obtain predictions, compares the predictions with the actual labels, and accumulates the number of correct predictions along with the total sample count.

The for loop traverses the test dataset (ds_test); for each input batch x and corresponding labels y:

  1. Reshape x from [b,28,28] to [b,28*28].
  2. Run the model to obtain the output logits.
  3. Convert the logits into probability values with softmax.
  4. Use tf.argmax to find the index of the highest probability, which is the predicted label.
  5. Cast the prediction from int64 to int32 to match y.
  6. Compare the predictions with the actual labels to obtain the boolean tensor correct.
  7. Convert correct from boolean to integer and sum it, accumulating into total_correct.
  8. Accumulate x.shape[0] into total_sum.

Finally, the accuracy on the test set is total_correct / total_sum: the fraction of test images whose predicted label matches the true label. An equivalent computation using a built-in Keras metric is sketched below.
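An equivalent check (a sketch using the metrics module imported at the top, not part of the original code) reaches the same number with less bookkeeping:

# Equivalent accuracy computation with a Keras metric (a sketch);
# SparseCategoricalAccuracy accepts integer labels and raw logits directly
acc_metric = metrics.SparseCategoricalAccuracy()
for x, y in ds_test:
    x = tf.reshape(x, [-1, 28*28])
    acc_metric.update_state(y, model(x))
print(f'acc: {acc_metric.result().numpy()}')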

*Model accuracy

In each epoch, the model traverses the entire training dataset once.
"acc: 0.8878" is the accuracy of the model on the test set in the 19th epoch (the final one, counting from 0).

As the number of epochs increases, the loss value generally decreases and the accuracy increases, which indicates that the model is gradually being optimized.