Train AlexNet on the CIFAR-10 dataset

Summary

We trained an AlexNet-based classifier to sort CIFAR-10 images into 10 object categories, and compared a version with dropout against one without it.
![Insert picture description here](https://img-blog.csdnimg.cn/608e84a2f1fb4972b2f292c1c9fb0729.png#pic_center)

Introduction to AlexNet

AlexNet is a deep convolutional neural network architecture developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. It was submitted to the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in 2012 and outperformed previous state-of-the-art models by a large margin, revolutionizing the field of computer vision. AlexNet has 8 learned layers: 5 convolutional layers followed by 3 fully connected layers, with 3 max pooling layers interleaved among the convolutional layers. Convolutional layers learn feature maps from the input data, while max pooling layers downsample those feature maps. The fully connected layers take the flattened feature map and output class scores.

Some of the key features behind AlexNet's success are as follows:

1. ReLU activation function: AlexNet uses the rectified linear unit (ReLU) activation function, which remains one of the most widely used activation functions in deep learning models. It is computationally efficient and helps alleviate the vanishing gradient problem.
2. Dropout regularization: AlexNet uses dropout, a regularization technique that randomly drops units during training to prevent overfitting.
3. Data augmentation: AlexNet uses data augmentation techniques, such as random cropping and horizontal flipping, to artificially enlarge the training dataset and improve generalization performance.

The architecture of AlexNet became the basis of many subsequent deep learning models in computer vision and paved the way for many breakthroughs in the field.

Introduction to the training dataset

The CIFAR-10 dataset consists of 60,000 32×32 color images in 10 categories, with 6,000 images in each category. The classes are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. The dataset is divided into 50,000 training images and 10,000 test images.

Overall workflow

1. Import the required libraries

# Import necessary libraries
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models
from torchvision.transforms.functional import InterpolationMode
import matplotlib.pyplot as plt
import random

2. Image preprocessing

# Define the image transformation pipeline
transform = transforms.Compose([transforms.Resize((224, 224), interpolation=InterpolationMode.BICUBIC),
                                transforms.ToTensor(),
                                transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.247, 0.2435, 0.2616])
                                ])

This code preprocesses each image before it is fed to the network, using three transformations. The first resizes the image from 32×32 to 224×224 pixels with bicubic interpolation (BICUBIC), so that it matches the input size the AlexNet layers expect. The second converts the image to a tensor and scales the pixel values to the range [0, 1]. The third normalizes each channel by subtracting its mean and dividing by its standard deviation (std), which tends to make training more stable.
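
As a rough illustration of what the last two transformations do to a single pixel, the arithmetic can be sketched in plain Python. The mean and std values are the ones from the pipeline above; the helper names `to_unit_range` and `normalize` and the example pixel are assumptions made for this sketch, not part of torchvision.

```python
# Per-channel statistics copied from the transforms.Normalize call above.
MEAN = [0.4914, 0.4822, 0.4465]
STD = [0.247, 0.2435, 0.2616]


def to_unit_range(pixel):
    """Mimic the scaling part of ToTensor: map 0..255 integers to 0.0..1.0."""
    return [value / 255.0 for value in pixel]


def normalize(pixel):
    """Apply (x - mean) / std to each channel of an RGB pixel in 0.0..1.0."""
    return [(x - m) / s for x, m, s in zip(pixel, MEAN, STD)]


# Hypothetical example pixel (R, G, B); not taken from the dataset.
raw_pixel = (128, 64, 255)
normalized = normalize(to_unit_range(raw_pixel))
print([round(v, 3) for v in normalized])
```

A pixel whose channels equal the channel means maps to zero, so after normalization the data is roughly centered around zero with unit spread per channel.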

3. Load the CIFAR-10 dataset

# Load the CIFAR-10 dataset for training and testing
train_images = datasets.CIFAR10('/public/home/lab70432/dataset', train=True, download=True, transform=transform)
test_images = datasets.CIFAR10('/public/home/lab70432/dataset', train=False, download=True, transform=transform)
# Create data loaders for training and testing
train_data = DataLoader(train_images, batch_size=256, shuffle=True, num_workers=2)
test_data = DataLoader(test_images, batch_size=256, num_workers=2)

4. Build the AlexNet model

# Define the neural network model
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
                                 nn.MaxPool2d(kernel_size=3, stride=2),
                                 nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
                                 nn.MaxPool2d(kernel_size=3, stride=2),
                                 nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
                                 nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
                                 nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
                                 nn.MaxPool2d(kernel_size=3, stride=2),
                                 nn.Flatten(), nn.Linear(256 * 5 * 5, 4096), nn.ReLU(),
                                 nn.Dropout(0.5),
                                 nn.Linear(4096, 4096), nn.ReLU(),
                                 nn.Dropout(0.5),
                                 nn.Linear(4096, 10))

    def forward(self, X):
        return self.net(X)
# Same architecture without dropout, used as the comparison model
class ModelWithoutDropout(nn.Module):
   def __init__(self):
       super().__init__()
       self.net = nn.Sequential(nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
                                nn.MaxPool2d(kernel_size=3, stride=2),
                                nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
                                nn.MaxPool2d(kernel_size=3, stride=2),
                                nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
                                nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
                                nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
                                nn.MaxPool2d(kernel_size=3, stride=2),
                                nn.Flatten(), nn.Linear(256 * 5 * 5, 4096), nn.ReLU(),
                                # No dropout layer here
                                nn.Linear(4096, 4096), nn.ReLU(),
                                # No dropout layer here
                                nn.Linear(4096, 10))

   def forward(self, X):
       return self.net(X)
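
The first fully connected layer expects 256 * 5 * 5 input features. A short sketch of where that number comes from, assuming the standard output-size formula for convolution and pooling layers (dilation 1, floor rounding) and a 224×224 input; the helper `out_size` and the layer names in `LAYERS` are made up for this illustration:

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output width/height of a Conv2d or MaxPool2d layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for every spatial layer in the model above.
LAYERS = [
    ("conv1", 11, 4, 1),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]

size = 224  # side length after transforms.Resize
trace = [size]
for name, kernel, stride, padding in LAYERS:
    size = out_size(size, kernel, stride, padding)
    trace.append(size)
    print(f"{name}: {size} x {size}")

# The last convolution has 256 output channels.
flattened = 256 * size * size
print("features entering the first Linear layer:", flattened)
```

The spatial size shrinks from 224 to 5, which is why the images are resized before training: a raw 32×32 input would be reduced to nothing by the stride-4 first convolution and the pooling layers.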

5. Initialize the weights of the neural network model

# Function to initialize the weights of the model
def initial(layer):
  if isinstance(layer, nn.Linear) or isinstance(layer, nn.Conv2d):
    nn.init.xavier_normal_(layer.weight.data)

This code initializes the weights of the model. It defines a function called initial that takes a layer as its argument. If the layer is a linear layer (nn.Linear) or a convolutional layer (nn.Conv2d), its weights are drawn from the Xavier normal distribution (xavier_normal_); biases keep PyTorch's default initialization. Xavier initialization chooses the weight variance from the layer's fan-in and fan-out so that activations and gradients keep a similar scale from layer to layer, which generally helps training converge.
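
To make the scale of these initial weights concrete, here is a small sketch of the standard deviation that Xavier normal initialization uses, std = gain * sqrt(2 / (fan_in + fan_out)), assuming the default gain of 1. The helper names `xavier_normal_std` and `conv_fans` are hypothetical and exist only for this example.

```python
import math


def xavier_normal_std(fan_in, fan_out, gain=1.0):
    """Standard deviation used by Xavier (Glorot) normal initialization."""
    return gain * math.sqrt(2.0 / (fan_in + fan_out))


def conv_fans(in_channels, out_channels, kernel_size):
    """Fan-in and fan-out of a square 2-D convolution kernel."""
    receptive_field = kernel_size * kernel_size
    return in_channels * receptive_field, out_channels * receptive_field


# First convolution of the model: Conv2d(3, 96, kernel_size=11)
conv1_std = xavier_normal_std(*conv_fans(3, 96, 11))
# First fully connected layer: Linear(256 * 5 * 5, 4096)
fc1_std = xavier_normal_std(256 * 5 * 5, 4096)

print(f"conv1 std: {conv1_std:.4f}")
print(f"fc1 std:   {fc1_std:.4f}")
```

Layers with more incoming and outgoing connections get a smaller standard deviation, which is what keeps the signal from growing or shrinking as it passes through the network.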

6. Select the device for model loading

# Set the device to GPU if available, otherwise use CPU
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
net = ModelWithoutDropout().to(device)
net.apply(initial)
print(device)

This code moves the model (ModelWithoutDropout) to the GPU if one is available, or to the CPU otherwise, and initializes its weights. It first checks torch.cuda.is_available() and sets the device to cuda or cpu accordingly. It then moves the model to that device and calls apply() to run the initial function on every submodule. Finally, it prints the device, which is a quick way to confirm where the model will run. To train the dropout variant instead, instantiate Model() in place of ModelWithoutDropout().

7. Set the training parameters

# Set training parameters
epochs = 17
lr = 0.01
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=0.0005)
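
For readers curious how lr, momentum and weight_decay interact, the following is a minimal sketch of one SGD update on a single scalar parameter. It follows the update rule documented for torch.optim.SGD without dampening or Nesterov momentum. The function `sgd_step` and the example gradients are hypothetical, written only to illustrate the arithmetic.

```python
def sgd_step(param, grad, velocity, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD update for a single scalar parameter.

    Weight decay is added to the gradient, the velocity accumulates
    gradients across steps, and the parameter moves against the velocity.
    """
    grad = grad + weight_decay * param
    # On the first step there is no previous velocity to decay.
    velocity = grad if velocity is None else momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# Hypothetical scalar parameter and gradients, chosen only for illustration.
param, velocity = 1.0, None
for grad in (0.5, 0.5, 0.5):
    param, velocity = sgd_step(param, grad, velocity)
    print(f"param={param:.6f} velocity={velocity:.6f}")
```

With a constant gradient the velocity keeps growing over the first steps, so momentum effectively enlarges the step size along directions where the gradient is consistent.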

7. Training

# Initialize lists to store loss and accuracy values
train_loss, test_loss, train_acc, test_acc = [], [], [], []

# Train the model
for i in range(epochs):
  net.train()
  temp_loss, temp_correct = 0.0, 0
  for X, y in train_data:
    X = X.to(device)
    y = y.to(device)
    y_hat = net(X)
    loss = criterion(y_hat, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    label_hat = torch.argmax(y_hat, dim=1)
    # .item() converts to plain Python numbers so no autograd graph is kept alive
    temp_correct += (label_hat == y).sum().item()
    temp_loss += loss.item()

  print(f'epoch:{i + 1} train loss:{temp_loss/len(train_data):.3f}, train acc:{temp_correct/len(train_images)*100:.2f}%', end='\t')
  train_loss.append(temp_loss/len(train_data))
  train_acc.append(temp_correct/len(train_images))

  # Evaluate on the test set without tracking gradients
  temp_loss, temp_correct = 0.0, 0
  net.eval()
  with torch.no_grad():
    for X, y in test_data:
      X = X.to(device)
      y = y.to(device)
      y_hat = net(X)
      loss = criterion(y_hat, y)

      label_hat = torch.argmax(y_hat, dim=1)
      temp_correct += (label_hat == y).sum().item()
      temp_loss += loss.item()

  print(f'test loss:{temp_loss/len(test_data):.3f}, test acc:{temp_correct/len(test_images)*100:.2f}%')
  test_loss.append(temp_loss/len(test_data))
  test_acc.append(temp_correct/len(test_images))

8. Training results

Figure: training results without dropout

Figure: training results with dropout

9. Plot the loss and accuracy curves

# Function to plot loss and accuracy values
def plot_loss_acc(train_loss, test_loss, train_acc, test_acc, save_path=None):
    fig, axs = plt.subplots(2, 1, figsize=(12, 8))

    axs[0].plot(train_loss, label='Train Loss')
    axs[0].plot(test_loss, label='Test Loss')
    axs[0].set_xlabel('Epoch')
    axs[0].set_ylabel('Loss (without dropout)')
    axs[0].legend()

    axs[1].plot(train_acc, label='Train Acc')
    axs[1].plot(test_acc, label='Test Acc')
    axs[1].set_xlabel('Epoch')
    axs[1].set_ylabel('Accuracy (without dropout)')
    axs[1].legend()

    if save_path:
        plt.savefig(save_path)

    plt.show()

# Save the loss and accuracy plot
save_path = './loss_acc_without_dropout.png'
plot_loss_acc(train_loss, test_loss, train_acc, test_acc, save_path)

Figure: loss curves without dropout

Figure: loss curves with dropout

The loss curves show that dropout improves the test loss and effectively reduces the risk of overfitting.

Figure: test-set accuracy curve without dropout

Figure: test-set accuracy curve with dropout

10. Visualize the model's predictions on random test images

# Set the figure size
plt.figure(figsize=(16, 14))

net.eval()  # make sure dropout (if present) is disabled for inference
for i in range(12):
    # Select a random test image and its label
    idx = random.randrange(len(test_images))
    img_data, label_id = test_images.data[idx], test_images.targets[idx]

    # Convert the raw array to a PIL image and predict its class
    img = transforms.ToPILImage()(img_data)
    with torch.no_grad():
        predict_id = torch.argmax(net(transform(img).unsqueeze(0).to(device))).item()
    predict = test_images.classes[predict_id]
    label = test_images.classes[label_id]

    # Plot the image and set the title to the true and predicted labels
    plt.subplot(3, 4, i + 1)
    plt.imshow(img)
    plt.title(f'truth:{label}\npredict:{predict}')

# Save the figure to a file
plt.savefig('output.png')

Complete code

# Import necessary libraries
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models
from torchvision.transforms.functional import InterpolationMode
import matplotlib.pyplot as plt
import random

# Define the image transformation pipeline
transform = transforms.Compose([transforms.Resize((224, 224), interpolation=InterpolationMode.BICUBIC),
                                transforms.ToTensor(),
                                transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.247, 0.2435, 0.2616])
                                ])

# Load the CIFAR-10 dataset for training and testing
train_images = datasets.CIFAR10('/public/home/lab70432/dataset', train=True, download=True, transform=transform)
test_images = datasets.CIFAR10('/public/home/lab70432/dataset', train=False, download=True, transform=transform)

# Create data loaders for training and testing
train_data = DataLoader(train_images, batch_size=256, shuffle=True, num_workers=2)
test_data = DataLoader(test_images, batch_size=256, num_workers=2)

# Define the neural network model
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
                                 nn.MaxPool2d(kernel_size=3, stride=2),
                                 nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
                                 nn.MaxPool2d(kernel_size=3, stride=2),
                                 nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
                                 nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
                                 nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
                                 nn.MaxPool2d(kernel_size=3, stride=2),
                                 nn.Flatten(), nn.Linear(256 * 5 * 5, 4096), nn.ReLU(),
                                 nn.Dropout(0.5),
                                 nn.Linear(4096, 4096), nn.ReLU(),
                                 nn.Dropout(0.5),
                                 nn.Linear(4096, 10))

    def forward(self, X):
        return self.net(X)

class ModelWithoutDropout(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
                                 nn.MaxPool2d(kernel_size=3, stride=2),
                                 nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
                                 nn.MaxPool2d(kernel_size=3, stride=2),
                                 nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
                                 nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
                                 nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
                                 nn.MaxPool2d(kernel_size=3, stride=2),
                                 nn.Flatten(), nn.Linear(256 * 5 * 5, 4096), nn.ReLU(),
                                 # No dropout layer here
                                 nn.Linear(4096, 4096), nn.ReLU(),
                                 # No dropout layer here
                                 nn.Linear(4096, 10))

    def forward(self, X):
        return self.net(X)

# Function to initialize the weights of the model
def initial(layer):
  if isinstance(layer, nn.Linear) or isinstance(layer, nn.Conv2d):
    nn.init.xavier_normal_(layer.weight.data)

# Set the device to GPU if available, otherwise use CPU
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
net = ModelWithoutDropout().to(device)
net.apply(initial)
print(device)

# Set training parameters
epochs = 17
lr = 0.01
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=0.0005)

# Initialize lists to store loss and accuracy values
train_loss, test_loss, train_acc, test_acc = [], [], [], []

# Train the model
for i in range(epochs):
  net.train()
  temp_loss, temp_correct = 0.0, 0
  for X, y in train_data:
    X = X.to(device)
    y = y.to(device)
    y_hat = net(X)
    loss = criterion(y_hat, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    label_hat = torch.argmax(y_hat, dim=1)
    # .item() converts to plain Python numbers so no autograd graph is kept alive
    temp_correct += (label_hat == y).sum().item()
    temp_loss += loss.item()

  print(f'epoch:{i + 1} train loss:{temp_loss/len(train_data):.3f}, train acc:{temp_correct/len(train_images)*100:.2f}%', end='\t')
  train_loss.append(temp_loss/len(train_data))
  train_acc.append(temp_correct/len(train_images))

  # Evaluate on the test set without tracking gradients
  temp_loss, temp_correct = 0.0, 0
  net.eval()
  with torch.no_grad():
    for X, y in test_data:
      X = X.to(device)
      y = y.to(device)
      y_hat = net(X)
      loss = criterion(y_hat, y)

      label_hat = torch.argmax(y_hat, dim=1)
      temp_correct += (label_hat == y).sum().item()
      temp_loss += loss.item()

  print(f'test loss:{temp_loss/len(test_data):.3f}, test acc:{temp_correct/len(test_images)*100:.2f}%')
  test_loss.append(temp_loss/len(test_data))
  test_acc.append(temp_correct/len(test_images))

# Function to plot loss and accuracy values
def plot_loss_acc(train_loss, test_loss, train_acc, test_acc, save_path=None):
    fig, axs = plt.subplots(2, 1, figsize=(12, 8))

    axs[0].plot(train_loss, label='Train Loss')
    axs[0].plot(test_loss, label='Test Loss')
    axs[0].set_xlabel('Epoch')
    axs[0].set_ylabel('Loss (without dropout)')
    axs[0].legend()

    axs[1].plot(train_acc, label='Train Acc')
    axs[1].plot(test_acc, label='Test Acc')
    axs[1].set_xlabel('Epoch')
    axs[1].set_ylabel('Accuracy (without dropout)')
    axs[1].legend()

    if save_path:
        plt.savefig(save_path)

    plt.show()

# Save the loss and accuracy plot
save_path = './loss_acc_without_dropout.png'
plot_loss_acc(train_loss, test_loss, train_acc, test_acc, save_path)

# Set the figure size
plt.figure(figsize=(16, 14))

net.eval()  # make sure dropout (if present) is disabled for inference
for i in range(12):
    # Select a random test image and its label
    idx = random.randrange(len(test_images))
    img_data, label_id = test_images.data[idx], test_images.targets[idx]

    # Convert the raw array to a PIL image and predict its class
    img = transforms.ToPILImage()(img_data)
    with torch.no_grad():
        predict_id = torch.argmax(net(transform(img).unsqueeze(0).to(device))).item()
    predict = test_images.classes[predict_id]
    label = test_images.classes[label_id]

    # Plot the image and set the title to the true and predicted labels
    plt.subplot(3, 4, i + 1)
    plt.imshow(img)
    plt.title(f'truth:{label}\npredict:{predict}')

# Save the figure to a file
plt.savefig('output.png')

Summary

The goal of this experiment was to train an AlexNet model on the CIFAR-10 dataset, which consists of 60,000 small 32×32-pixel images belonging to 10 classes. We first preprocessed the images by resizing them to 224×224 and normalizing the pixel values per channel. We then trained the model with the settings shown in the code above (17 epochs, batch size 256, learning rate 0.01), recorded the training and test loss and accuracy, and compared a model with dropout against one without it to verify that dropout helps prevent overfitting.

The experimental results show that dropout effectively reduces overfitting and increases accuracy by about 3%. During training, dropout randomly sets the outputs of a fixed proportion of neurons to 0, so the network cannot rely too heavily on any single neuron and learns more robust features that generalize better. During testing, dropout is turned off and all neurons take part in the prediction.

After training, we randomly selected 12 images from the test set to inspect the model's predictions. In summary, the AlexNet model achieved 80% accuracy on the CIFAR-10 test set. While this is below the state of the art, the experiment demonstrates the importance of data preprocessing and of dropout for improving model performance.
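
The train/test behaviour of dropout described above can be sketched in a few lines of plain Python. This is an illustrative version of "inverted" dropout, where surviving activations are scaled by 1 / (1 - p) during training so that nothing needs to change at test time; nn.Dropout applies the same scaling. The function `dropout` below and the sample activations are assumptions made for this example, not PyTorch code.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout on a list of activations (requires 0 <= p < 1).

    In training mode each value is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p), so the expected activation is
    unchanged. In evaluation mode the input is returned untouched.
    """
    if not training or p == 0.0:
        return list(values)
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in values]


# Hypothetical activations, used only to illustrate the two modes.
activations = [1.0, 2.0, 3.0, 4.0]
rng = random.Random(0)  # fixed seed so repeated runs drop the same units
print("train:", dropout(activations, p=0.5, training=True, rng=rng))
print("eval: ", dropout(activations, p=0.5, training=False))
```

This mirrors what net.train() and net.eval() switch between in the training loop: the same layers are used in both modes, but units are only dropped while training.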