[Pytorch] Computer Vision Project – Image Classification with a Convolutional Neural Network (CNN)

Directory

  • 1. Preface
  • 2. CNN visual interpreter
    • 1. Working principle of convolutional layer
  • 3. Detailed step-by-step instructions
    • 1. Data set preparation
    • 2. DataLoader
    • 3. Build the CNN model
      • 3.1 Setting up the device
      • 3.2 Build the CNN model
      • 3.3 Set loss and optimizer
      • 3.4 Training and testing loop
    • 4. Model evaluation and result output

1. Preface

The previous note, “[Pytorch] Detailed Explanation of the Overall Workflow Code (Getting Started)”, introduced the overall Pytorch workflow. This article continues from there and explains how to use Pytorch to build a convolutional neural network (CNN) to classify images.

Other related articles:
Introductory Notes on Deep Learning: Summarizes some basic concepts of neural networks.
TensorFlow column: the “Computer Vision Introduction Series” introduces how to use the TensorFlow framework to implement a convolutional classifier.

2. CNN visual interpreter

A convolutional classifier filters the input images layer by layer, learning the features of the images, and finally classifies, identifies, or predicts the input data.

Below is an interactive CNN visual explainer on GitHub (link here).


As the overall picture above shows, a CNN model consists of convolutional layers and pooling layers stacked alternately. Each layer processes the image differently and extracts different features; finally, the features are summarized and the classification is output.

1. Working principle of convolutional layer

It is equivalent to a filter grid (called a convolution kernel or filter) scanning the entire input layer from left to right and top to bottom, generating a new layer (a feature map).

The data is compressed during this process: as shown in the figure below, a 3×3 region of the input is condensed into a single output cell.

Parameter description (see the sketch after this list):
Input: the input data; here the 4×4 grid in the middle
Padding: a ring of cells added around the input; here 1 unit
Kernel Size: the size of the convolution kernel; here 3×3 (the red grid on the left)
Stride: how many cells the convolution kernel moves at each step
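
To make these parameters concrete, here is a minimal, self-contained sketch (my own addition, not part of the original figure) that applies a 3×3 kernel with stride 1 to a 4×4 input, with and without padding:

import torch
from torch import nn

# A 4x4 single-channel input, shaped [batch=1, channels=1, height=4, width=4]
x = torch.randn(1, 1, 4, 4)

# 3x3 kernel, stride=1, padding=1 -> the 4x4 spatial size is preserved
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=1)
print(conv(x).shape)  # torch.Size([1, 1, 4, 4])

# Without padding the output shrinks: (4 - 3) / 1 + 1 = 2
conv_no_pad = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0)
print(conv_no_pad(x).shape)  # torch.Size([1, 1, 2, 2])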

3. Detailed step-by-step instructions

1. Data set preparation

import torch
from torch import nn
import torchvision
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
print(f"Pytorch version:{<!-- -->torch.__version__}\
 torchvision version:{<!-- -->torchvision.__version__}")


Dataset introduction:
FashionMNIST is an image dataset that ships with torchvision and is used for training and testing machine learning and computer vision models. It contains grayscale images of clothing items in 10 categories: T-shirts/tops, trousers, pullovers, dresses, coats, sandals, shirts, sneakers, bags, and ankle boots. Each image has a resolution of 28×28 pixels.

train_data=datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
    target_transform=None
)

test_data=datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
    target_transform=None
)

# Inspect the dataset
image, label = train_data[0]
# image, label  # view the first training sample
image.shape  # view the shape of the image tensor


The shape of the image tensor is [1, 28, 28], i.e. [color_channels=1, height=28, width=28].

# View categories
class_names = train_data.classes
class_names

# Visualize the image
import matplotlib.pyplot as plt
image, label = train_data[0]
print(f"Image shape: {image.shape}")
plt.imshow(image.squeeze())
plt.title(label);

2. DataLoader

from torch.utils.data import DataLoader

# Set batch size hyperparameters
BATCH_SIZE = 32

# Convert the dataset to iterable (batch processing)
train_dataloader = DataLoader(train_data,
    batch_size=BATCH_SIZE, # How many samples are there in each batch?
    shuffle=True #Shuffle randomly?
)

test_dataloader = DataLoader(test_data,
    batch_size=BATCH_SIZE,
    shuffle=False # The test data set does not necessarily need to be shuffled
)

# Print results
print(f"Dataloaders: {train_dataloader, test_dataloader}")
print(f"Length of train dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}")
print(f"Length of test dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}")

Parameter introduction:
shuffle: randomly shuffles the dataset so that samples are presented in a random order while training the model. This helps improve the model's generalization ability and reduces its dependence on the order of the input data. For the test dataset it is usually set to False, since evaluation does not benefit from shuffling and a fixed order keeps the results reproducible.
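
As a quick illustration (my own sketch, not part of the original post): drawing the first batch twice from the shuffled training loader generally yields the labels in a different order, while the test loader always yields the same order.

# Each call to iter() on a shuffle=True DataLoader re-shuffles the data,
# so the first batch of labels usually differs between the two calls
labels_a = next(iter(train_dataloader))[1][:10]
labels_b = next(iter(train_dataloader))[1][:10]
print(labels_a)
print(labels_b)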

train_features_batch, train_labels_batch = next(iter(train_dataloader))
train_features_batch.shape, train_labels_batch.shape
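
Optionally, one sample from the batch can be visualized to confirm the data still looks reasonable; this is a small sketch of my own that reuses class_names from earlier:

import torch
import matplotlib.pyplot as plt

# Pick a random sample from the first training batch and display it
torch.manual_seed(42)
random_idx = torch.randint(0, len(train_features_batch), size=[1]).item()
img, label = train_features_batch[random_idx], train_labels_batch[random_idx]
plt.imshow(img.squeeze(), cmap="gray")
plt.title(class_names[label])
plt.axis("off")
print(f"Image shape: {img.shape}, label: {label} ({class_names[label]})")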

3. Build the CNN model

3.1 Setting up the device

import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
device

If a GPU is available, the code runs on it; otherwise it falls back to the CPU.

3.2 Build the CNN model

Review the main parameters of nn.Conv2d:

  1. in_channels: The number of channels of the input data. For two-dimensional convolution, it indicates the depth or number of channels of the input image or feature map.
  2. out_channels: The number of output channels, that is, the number of convolution kernels. Each convolution kernel generates an output channel.
  3. kernel_size: The size of the convolution kernel or the size of the filter, expressed as an integer or tuple, specifying the height and width of the convolution kernel. kernel_size=3 means that the height and width of the convolution kernel are both 3.
  4. stride: The step size of the convolution kernel sliding, which determines the distance the convolution kernel slides on the input data. stride=1 means that the convolution kernel slides one step on the input each time.
  5. padding: the number of layers of zeros padded around the input data. Padding helps keep the input and output sizes the same, especially when passing information between convolutional layers. Here padding=1 means one layer of zeros is padded around the input so that the size is unchanged after the convolution. (A quick output-size check follows this list.)
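
As a sanity check (my own sketch, not from the original post), the formula output_size = floor((input_size + 2*padding - kernel_size) / stride) + 1, combined with the two MaxPool2d layers, explains why the classifier in the model below uses in_features=hidden_units*7*7:

# Output size of a conv layer: floor((input + 2*padding - kernel_size) / stride) + 1
def conv_out(size, kernel_size=3, stride=1, padding=1):
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                        # FashionMNIST images are 28x28
size = conv_out(conv_out(size))  # block_1: two 3x3 convs with padding=1 keep 28
size = size // 2                 # MaxPool2d(kernel_size=2) halves it -> 14
size = conv_out(conv_out(size))  # block_2: convs keep 14
size = size // 2                 # second MaxPool2d -> 7
print(size)                      # 7, hence hidden_units * 7 * 7 in the classifier
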
# Create a convolutional neural network
class FashionMNISTModelV2(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2,
                         stride=2)
        )
        self.block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=hidden_units*7*7,
                      out_features=output_shape)
        )

    def forward(self, x: torch.Tensor):
        x = self.block_1(x)
        # print(x.shape)
        x = self.block_2(x)
        # print(x.shape)
        x = self.classifier(x)
        # print(x.shape)
        return x

# Instantiate the model with the required parameters
torch.manual_seed(42)
model_2 = FashionMNISTModelV2(input_shape=1,
    hidden_units=10,
    output_shape=len(class_names)).to(device)
model_2
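
To confirm the shapes line up, one option (my own addition) is to pass a dummy image-sized tensor through the model:

# A single dummy "image" shaped [batch=1, channels=1, height=28, width=28]
dummy = torch.randn(1, 1, 28, 28).to(device)
print(model_2(dummy).shape)  # expected: torch.Size([1, 10]), one logit per class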

3.3 Set loss and optimizer

Import the accuracy_fn helper function file

import requests
from pathlib import Path

# Download helper functions from the Learn PyTorch repository (if not already downloaded)
if Path("helper_functions.py").is_file():
  print("helper_functions.py already exists, skip downloading")
else:
  print("Downloading helper_functions.py")
  # NOTE: You need to use the "raw" GitHub URL for this to work
  request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
  with open("helper_functions.py", "wb") as f:
    f.write(request.content)
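
If you prefer not to download the helper file, accuracy_fn can be approximated by hand. This is a minimal sketch under that assumption: it returns the percentage of predictions that match the true labels, which is how it is used in the loops below.

import torch

def accuracy_fn(y_true: torch.Tensor, y_pred: torch.Tensor) -> float:
    """Return the accuracy (in percent) of predicted labels against true labels."""
    correct = torch.eq(y_true, y_pred).sum().item()
    return (correct / len(y_pred)) * 100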

Create loss, accuracy and optimizer

from helper_functions import accuracy_fn

# Set loss and optimizer
loss_fn = nn.CrossEntropyLoss() # this is also called "criterion"/"cost function" in some places
optimizer = torch.optim.SGD(params=model_2.parameters(), lr=0.1)

Create a timer

from timeit import default_timer as timer

def print_train_time(start: float, end: float, device: torch.device = None):
    total_time = end - start
    print(f"Train time on {device}: {total_time:.3f} seconds")
    return total_time

3.4 Training and testing loop

def train_step(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               accuracy_fn,
               device: torch.device = device):
    train_loss, train_acc = 0, 0
    model.to(device)
    model.train()
    for batch, (X, y) in enumerate(data_loader):
        # Move the data to the target device
        X, y = X.to(device), y.to(device)
        # 1. Forward pass
        y_pred = model(X)
        # 2. Calculate loss and accuracy (accumulated per batch)
        loss = loss_fn(y_pred, y)
        train_loss += loss
        train_acc += accuracy_fn(y_true=y,
                                 y_pred=y_pred.argmax(dim=1))
        # 3. Zero gradients, backpropagate, update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Average the loss and accuracy over all batches
    train_loss /= len(data_loader)
    train_acc /= len(data_loader)
    print(f"Train loss: {train_loss:.5f} | Train accuracy: {train_acc:.2f}%")

def test_step(data_loader: torch.utils.data.DataLoader,
              model: torch.nn.Module,
              loss_fn: torch.nn.Module,
              accuracy_fn,
              device: torch.device = device):
    test_loss, test_acc = 0, 0
    model.to(device)
    model.eval() # put model in eval mode
    # Turn on inference context manager
    with torch.inference_mode():
        for X, y in data_loader:
            X, y = X.to(device), y.to(device)
            test_pred = model(X)

            test_loss += loss_fn(test_pred, y)
            test_acc += accuracy_fn(y_true=y,
                y_pred=test_pred.argmax(dim=1) # Go from logits -> pred labels
            )
        # Average the loss and accuracy over all batches
        test_loss /= len(data_loader)
        test_acc /= len(data_loader)
        print(f"Test loss: {test_loss:.5f} | Test accuracy: {test_acc:.2f}%\n")
torch.manual_seed(42)

from tqdm.auto import tqdm
from timeit import default_timer as timer
train_time_start_model_2 = timer()

epochs = 3
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n---------")
    train_step(data_loader=train_dataloader,
        model=model_2,
        loss_fn=loss_fn,
        optimizer=optimizer,
        accuracy_fn=accuracy_fn,
        device=device
    )
    test_step(data_loader=test_dataloader,
        model=model_2,
        loss_fn=loss_fn,
        accuracy_fn=accuracy_fn,
        device=device
    )

train_time_end_model_2 = timer()
total_train_time_model_2 = print_train_time(start=train_time_start_model_2,
                                           end=train_time_end_model_2,
                                           device=device)

4. Model evaluation and result output

torch.manual_seed(42)
def eval_model(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               accuracy_fn,
               device: torch.device = device):
    loss, acc = 0, 0
    model.eval()
    with torch.inference_mode():
        for X, y in data_loader:
            # Move the data to the target device
            X, y = X.to(device), y.to(device)
            y_pred = model(X)
            loss += loss_fn(y_pred, y)
            acc += accuracy_fn(y_true=y, y_pred=y_pred.argmax(dim=1))

        # Average the loss and accuracy over all batches
        loss /= len(data_loader)
        acc /= len(data_loader)
    return {"model_name": model.__class__.__name__,
            "model_loss": loss.item(),
            "model_acc": acc}


model_2_results = eval_model(
    model=model_2,
    data_loader=test_dataloader,
    loss_fn=loss_fn,
    accuracy_fn=accuracy_fn
)
model_2_results

Model output results: