Fully Connected Neural Network: Clothing Image Classification

Introduction:
The code in this tutorial trains an image classification model on the Fashion-MNIST dataset, which contains pictures of clothing in 10 categories. It does this by building a neural network that takes as input the pixel values of a 28 × 28 image and outputs a list of 10 probabilities, one for each clothing category. The images themselves are displayed in Section 1 below.

1. Data acquisition

Build the neural network using a dataset that ships with the framework. First import the required libraries, then store the training images and labels in x and y, and the test images and labels in x_test and y_test. (x, y) and (x_test, y_test) are all arrays.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# '2' prints only error messages; set this before importing TensorFlow so it takes effect

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics
# dataset loaders, layer classes, optimizers, the Sequential container, and evaluation metrics

# Load the dataset (returns arrays)
(x, y), (x_test, y_test) = datasets.fashion_mnist.load_data()
# View dataset information
print(f'x.shape={x.shape}, y.shape={y.shape}')                      # size of the training set
print(f'x_test.shape={x_test.shape}, y_test.shape={y_test.shape}')  # size of the test set
print(f'y[:5]={y[:5]}')                                             # first 5 labels of y

The dataset information is shown below. For example, y[:5]=[9 0 0 3 0] means the first picture belongs to category 9 (the 10th class) and the second picture belongs to category 0 (the 1st class).

x.shape=(60000, 28, 28), y.shape=(60000,)
x_test.shape=(10000, 28, 28), y_test.shape=(10000,)
y[:5]=[9 0 0 3 0]

There are 60,000 pictures in the variable x, each of size 28 × 28, and the variable y stores the category each picture belongs to.

We view the first five labels by printing y[:5]. Each label is an integer indicating the clothing category of the corresponding image, e.g. 0 for a T-shirt, 1 for trousers, and so on.

Drawing the images

# Dataset display
import matplotlib.pyplot as plt

# Name of each category
class_names = ['Tshirt','Trouser','Pullover','Dress','Coat','Sandal','Shirt','Sneaker','Bag','Ankle boot']
# Draw the first 10 images in a 2 x 5 grid
for i in range(10):
    plt.subplot(2, 5, i + 1)       # draw the current image at position i + 1 of the 2-row, 5-column grid
    plt.imshow(x[i])
    plt.xlabel(class_names[y[i]])  # y[i] is the integer label of image i
    plt.xticks([])                 # hide the x- and y-axis ticks
    plt.yticks([])
plt.show()

2. Image data set preprocessing and loading

# Preprocessing function: convert data types and normalize
def processing(x, y):
    x = tf.cast(x, tf.float32) / 255.0  # cast pixels to float32 and scale to [0, 1]
    y = tf.cast(y, tf.int32)            # cast targets to int32
    return x, y

# Preprocess the training set
y = tf.one_hot(y, depth=10)  # one-hot encode: each label becomes a length-10 vector with a 1 at its index
ds_train = tf.data.Dataset.from_tensor_slices((x, y))  # automatically converts x and y to tensors
ds_train = ds_train.map(processing).shuffle(10000).batch(128)  # preprocess, shuffle the samples, then batch
# Preprocess the test set (no shuffling needed)
ds_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
ds_test = ds_test.map(processing).batch(128)  # batch the test set the same way

# Pull one batch from an iterator to check that the data loads correctly
sample = next(iter(ds_train))  # one batch at a time, i.e. 128 samples
print('x_batch:', sample[0].shape, 'y_batch:', sample[1].shape)  # shapes of one batch

When loading the data, we first one-hot encode the training targets y, which makes it possible to compute the loss against the network's 10 outputs. One-hot encoding converts a scalar into a vector whose entry at the label's index is 1. For example, before encoding y[0]=9, meaning the first picture belongs to category 9; after encoding, y[0]=[0., 0., 0., 0., 0., 0., 0., 0., 0., 1.], i.e. the value at index 9 becomes 1 and all other values are 0.
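
As a quick sanity check, the encoding can be tried on a few labels (a minimal, self-contained sketch; the example labels are made up):

import tensorflow as tf

labels = tf.constant([9, 0, 3])         # three example labels
encoded = tf.one_hot(labels, depth=10)  # shape [3, 10]
print(encoded.numpy())
# [[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
#  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]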

tf.data.Dataset.from_tensor_slices creates the ds_train dataset and automatically converts the input x and y into tensors.

The map() function applies the preprocessing function to every element of the dataset; batch() sets how many samples are drawn per step; shuffle() randomizes the order of the samples without breaking the correspondence between x and y, so the results do not depend on the accidental ordering of the data. Note that shuffle() is applied before batch() so that individual samples, rather than whole batches, are mixed.

When preprocessing the test set, there is no need to one-hot encode y_test: at test time the model's output is reduced to a single predicted class index, which can be compared directly with the integer labels in y_test.
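
Concretely, the prediction is collapsed with argmax and compared against the raw integer label (a small self-contained sketch with made-up logits; the full test loop appears in Section 5):

import tensorflow as tf

logits = tf.constant([[2.0, 0.1, -1.0], [0.0, 3.0, 0.5]])  # made-up outputs for 2 samples, 3 classes
y = tf.constant([0, 2], dtype=tf.int32)                    # raw integer labels, no one-hot needed
predict = tf.argmax(logits, axis=1, output_type=tf.int32)  # [0, 1]
print(tf.equal(y, predict).numpy())                        # [ True False]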

3. Build a simple fully connected neural network

# ==1== Define the fully connected layers
# [b,784]=>[b,256]=>[b,128]=>[b,64]=>[b,32]=>[b,10]; hidden dimensions usually shrink layer by layer
model = Sequential([
    layers.Dense(256, activation=tf.nn.relu),  # first hidden layer, 256 output features
    layers.Dense(128, activation=tf.nn.relu),  # second hidden layer
    layers.Dense(64, activation=tf.nn.relu),   # third hidden layer
    layers.Dense(32, activation=tf.nn.relu),   # fourth hidden layer
    layers.Dense(10),                          # output layer: 10 classes, no activation (raw logits)
    ])

# ==2== Set the input dimensions
model.build(input_shape=[None, 28*28])
# ==3== View the network structure
model.summary()
# ==4== Optimizer
# Performs the weight update w = w - lr * grad
optimizer = optimizers.Adam(learning_rate=1e-3)

1. Build the network: the Sequential class stacks multiple fully connected layers into one model. Each layer is created with layers.Dense, which specifies the number of neurons, the activation function, and other parameters. The output layer (layer 5) uses no activation function: it outputs raw logits for the 10 categories, and the softmax is applied inside the loss during training (from_logits=True).
2. Set the input dimensions: model.build sets the input shape. Since each input is a flattened 28×28 image, the shape is [None, 28*28], which stands for any number of 784-dimensional vectors.
3. View the network structure: model.summary() prints the output shape and parameter count of each layer.
4. Select the optimizer: optimizers.Adam creates an Adam optimizer with a learning rate of 0.001. When training the network, the optimizer updates the model weights (w) from the gradients (grad) computed on the loss, scaled by the learning rate (lr), so the model gradually converges toward a good solution (a small sketch of one such update step follows below).
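
To make the update rule concrete, here is a minimal sketch of a single optimizer step on a toy variable (the variable, loss, and values are made up for illustration):

import tensorflow as tf

w = tf.Variable(2.0)  # a toy weight
opt = tf.keras.optimizers.Adam(learning_rate=1e-3)

with tf.GradientTape() as tape:
    loss = (w - 1.0) ** 2  # a toy loss with its minimum at w = 1
grads = tape.gradient(loss, [w])
opt.apply_gradients(zip(grads, [w]))  # roughly w <- w - lr * grad (Adam also rescales by moment estimates)
print(w.numpy())  # slightly below 2.0 after one step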

The Sequential model's architecture and parameter information: this is a simple feedforward neural network containing 5 fully connected (Dense) layers. Each layer takes the output of the previous layer as input and transforms it; "fully connected" means every hidden node is connected to all nodes in the previous layer. A sketch of what one Dense layer computes, and of the parameter counts model.summary() reports, follows below.
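
A Dense layer computes activation(x @ W + b), so it holds in_features × out_features weights plus out_features biases. A minimal sketch (random data; the dimensions are taken from the model above):

import tensorflow as tf

# What layers.Dense(256, activation=tf.nn.relu) does internally:
x = tf.random.normal([128, 784])  # a batch of flattened images
W = tf.Variable(tf.random.normal([784, 256], stddev=0.05))
b = tf.Variable(tf.zeros([256]))
out = tf.nn.relu(tf.matmul(x, W) + b)  # shape (128, 256)

# Parameter count per layer: in_features * out_features + out_features
dims = [784, 256, 128, 64, 32, 10]
params = [dims[i] * dims[i + 1] + dims[i + 1] for i in range(5)]
print(params)       # [200960, 32896, 8256, 2080, 330]
print(sum(params))  # 244522 trainable parameters in total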

4. Training and optimization

for epoch in range(20):  # 20 epochs
    # iterate over the batches
    for step, (x, y) in enumerate(ds_train):
        # x in ds_train has shape [b, 28, 28]; the input layer expects [b, 28*28], so reshape
        x = tf.reshape(x, [-1, 28*28])  # -1 infers the 0th dimension automatically
        # record operations for gradient computation
        with tf.GradientTape() as tape:
            # forward pass through the network: [b,784] => [b,10]
            logits = model(x)  # output of the last layer (raw logits)
            # mean squared error between the one-hot targets y and the outputs
            loss1 = tf.reduce_mean(tf.losses.MSE(y, logits))
            # cross-entropy loss; from_logits=True applies softmax to the logits internally
            loss2 = tf.reduce_mean(tf.losses.categorical_crossentropy(y, logits, from_logits=True))

        # compute gradients of loss2 with respect to all weights and biases (model.trainable_variables)
        grads = tape.gradient(loss2, model.trainable_variables)
        # update the weights; zip pairs each gradient with its variable
        optimizer.apply_gradients(zip(grads, model.trainable_variables))  # effectively w.assign_sub(lr * grad)

        # print progress every 100 batches
        if step % 100 == 0:
            print(f'epochs:{epoch}, step:{step}, loss_MSE:{loss1}, loss_CE:{loss2}')

This loop performs forward propagation, loss computation, and backpropagation in TensorFlow.

1. Loop over epochs and batches: a for loop traverses the dataset; each epoch walks through the whole training set batch by batch, so the model learns gradually from all of the data.
2. Reshape the input: in each iteration, the input x is reshaped from [b, 28, 28] to [b, 28*28] to match the model's input layer.
3. Calculate the losses: both the mean squared error (MSE) and the cross-entropy (CE) are computed for monitoring; the cross-entropy is the one actually optimized.
4. Compute gradients and update: tf.GradientTape records the forward computation so the gradients can be derived; the Adam optimizer then updates the model's weights from those gradients.
5. Print results: every 100 batches (steps), the current epoch, step number, and loss values are printed for monitoring and debugging. (An equivalent high-level version of this loop is sketched after this list.)
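
For comparison, the same training can be written with Keras's high-level compile/fit API instead of a manual GradientTape loop (a sketch under the assumption that model and ds_train are defined as above; details such as metric resolution can differ slightly between Keras versions):

# reshape each batch from [b, 28, 28] to [b, 784] inside the pipeline
ds_train_flat = ds_train.map(lambda a, b: (tf.reshape(a, [-1, 28*28]), b))

model.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),  # y is one-hot encoded
              metrics=['accuracy'])
model.fit(ds_train_flat, epochs=20)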

5. Conduct testing and evaluation

    # This block is indented because it runs inside the `for epoch` loop above,
    # evaluating the model on the test set after each epoch of training.
    total_correct = 0  # number of correct predictions
    total_sum = 0      # total number of samples seen

    for (x, y) in ds_test:  # x and y of the test set

        # reshape x from [b,28,28] => [b,28*28]
        x = tf.reshape(x, [-1, 28*28])
        # compute the output layer [b,10]
        logits = model(x)
        # find the index of the highest-probability class:
        # first convert the logits to probabilities
        prob = tf.nn.softmax(logits, axis=1)  # softmax over the last dimension; each row sums to 1
        predict = tf.argmax(prob, axis=1)     # position of the maximum value in each row

        # y is int32 with shape [b]; predict is int64 with shape [b], so cast it
        predict = tf.cast(predict, dtype=tf.int32)

        # y holds the true class index of each sample; predict holds the predicted index
        # simply check whether the two agree element-wise
        correct = tf.equal(y, predict)  # tensor of True/False
        # cast True/False to 1/0 and count the 1s, i.e. the number of correct predictions
        correct = tf.reduce_sum(tf.cast(correct, dtype=tf.int32))

        # accumulate; correct is a tensor, so convert it to a Python int
        total_correct += int(correct)
        total_sum += x.shape[0]  # size of the 0th dimension: number of images in this batch

    # accuracy over the whole test set for this epoch
    acc = total_correct / total_sum
    print(f'acc: {acc}')

The main purpose here is to evaluate the trained model and compute its accuracy on the test set.

First, the code defines two counters: the total number of correct predictions and the total number of samples. It then traverses the test set, runs a forward pass on each batch to obtain predictions, compares the predictions with the true labels, and accumulates the number of correct predictions along with the total sample count.

The for loop traverses the test dataset (ds_test); for each input batch x and its corresponding labels y:

1. Reshape x from [b,28,28] to [b,28*28].
2. Run the model to obtain the output logits.
3. Convert the logits into probabilities with softmax.
4. Use tf.argmax to find the index of the highest probability; this index is the predicted label.
5. Cast the prediction from int64 to int32.
6. Compare the predictions with the true labels element-wise (tf.equal) to obtain the boolean tensor correct.
7. Cast correct from boolean to integer and sum it, accumulating the result into total_correct.
8. Accumulate x.shape[0] into total_sum.
Finally, dividing total_correct by total_sum gives the model's accuracy on the test set. The same bookkeeping can also be done with a built-in metric, as sketched below.
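
A minimal sketch using a built-in Keras metric instead of manual counters (assuming model and ds_test are defined as above; SparseCategoricalAccuracy takes integer labels and argmaxes the predictions, so the raw logits can be passed in directly):

acc_metric = tf.keras.metrics.SparseCategoricalAccuracy()
for x, y in ds_test:
    logits = model(tf.reshape(x, [-1, 28*28]))
    acc_metric.update_state(y, logits)  # y holds integer labels; predictions are compared by argmax
print('acc:', acc_metric.result().numpy())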

*Model accuracy
In each epoch, the model traverses the entire training set once and is then evaluated on the test set.
"acc: 0.8878" is the model's accuracy on the test set after epoch 19 (the 20th and final pass, counting from 0).

As the number of epochs increases, the loss decreases and the accuracy increases, which shows that the model is gradually being optimized.
