Directory
- 1. Preface
- 2. CNN visual interpreter
  - 1. Working principle of convolutional layer
- 3. Detailed step-by-step instructions
  - 1. Data set preparation
  - 2. DataLoader
  - 3. Build model CNN
    - 3.1 Setting up the device
    - 3.2 Build CNN model
    - 3.3 Set loss and optimizer
    - 3.4 Training and testing loop
  - 4. Model evaluation and result output
1. Preface
The overall PyTorch workflow was introduced in the previous note “[Pytorch] Detailed Explanation of the Overall Workflow Code (Getting Started)”. This article goes on to explain how to use PyTorch to build a convolutional neural network (CNN) that classifies images.
Other related articles:
Introductory Notes on Deep Learning: Summarizes some basic concepts of neural networks.
TensorFlow column: “Computer Vision Introduction Series” introduces how to use the TensorFlow framework to implement a convolution classifier.
2. CNN visual interpreter
A convolutional classifier filters the input images layer by layer, learns the features of the images, and finally classifies, identifies, or makes predictions from the input data.
Below is an interactive CNN visual interpreter on GitHub (link here).
As the overall picture shows, a CNN model is made of convolutional layers and pooling layers stacked alternately. Each layer processes the image differently and extracts different features. Finally, the features from all units are aggregated and the classification is output.
1. Working principle of convolutional layer
A convolutional layer works like a filter grid (called a convolution kernel or filter) that scans the whole input from left to right and top to bottom and produces a new layer.
The data is compressed along the way: as shown in the figure below, each 3×3 region is compressed into a single cell.
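The scanning described above can be sketched in plain Python, without PyTorch. This is a minimal illustration only: the function name `conv2d_valid` and the sample `image` and `kernel` values are made up for the example, and there is no padding, stride, or bias. Like `nn.Conv2d`, it multiplies the kernel with each window directly, without flipping the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over a 2D image (lists of lists),
    left to right and top to bottom, without padding."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            # Each k x k window is reduced to a single number.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 input and 3x3 kernel, chosen only for illustration.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[-6, 12], [-6, 8]]
```

Each 3×3 window of the 4×4 input collapses into one cell, so the output here is 2×2.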
Parameter description:
Input: the input data, the 4×4 grid in the middle.
Padding: a ring of cells added around the input; here it adds one cell on each side.
Kernel size: the size of the convolution kernel, here 3×3 (the red grid on the left).
Stride: how many cells the convolution kernel moves at each step.
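These parameters determine the size of the output along each dimension through the standard formula floor((n + 2p - k) / s) + 1, where n is the input size, p the padding, k the kernel size, and s the stride. A small sketch follows; the helper name `conv_output_size` is hypothetical, and the stride values are assumptions, since the figure's stride is not stated in the text.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension of a convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# 4x4 input, 3x3 kernel, one cell of padding, stride 1: size is preserved.
print(conv_output_size(4, 3, stride=1, padding=1))  # 4

# Same input without padding: the output shrinks to 2x2.
print(conv_output_size(4, 3, stride=1, padding=0))  # 2

# A stride of 2 skips every other position.
print(conv_output_size(4, 3, stride=2, padding=1))  # 2
```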
3. Detailed step-by-step instructions
1. Data set preparation
import torch
from torch import nn

import torchvision
from torchvision import datasets
from torchvision.transforms import ToTensor

import matplotlib.pyplot as plt

print(f"PyTorch version: {torch.__version__}\ntorchvision version: {torchvision.__version__}")
Dataset introduction:
FashionMNIST is an image data set available through torchvision and is used for training and testing machine learning and computer vision models. It contains grayscale images of 10 categories of clothing items: T-shirts/tops, trousers, pullovers, dresses, coats, sandals, shirts, sneakers, bags, and ankle boots. Each image has a resolution of 28×28 pixels.
# Training split
train_data = datasets.FashionMNIST(
    root="data",            # where to store the downloaded files
    train=True,
    download=True,
    transform=ToTensor(),   # convert images to tensors
    target_transform=None   # leave labels unchanged
)

# Test split
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
    target_transform=None
)

# Look at the first training sample
image, label = train_data[0]
image.shape  # shape of the image tensor
The shape of the image tensor is [1, 28, 28], or: [color=1, height=28, width=28]
# View the categories
class_names = train_data.classes
class_names

# Visualize one image
import matplotlib.pyplot as plt

image, label = train_data[0]
print(f"Image shape: {image.shape}")
plt.imshow(image.squeeze())  # drop the channel dimension: [1, 28, 28] -> [28, 28]
plt.title(label);
2. DataLoader
from torch.utils.data import DataLoader

# Batch size hyperparameter
BATCH_SIZE = 32

# Turn the datasets into iterables of batches
train_dataloader = DataLoader(train_data,
                              batch_size=BATCH_SIZE,  # samples per batch
                              shuffle=True            # shuffle every epoch
                              )

test_dataloader = DataLoader(test_data,
                             batch_size=BATCH_SIZE,
                             shuffle=False  # the test set does not need to be shuffled
                             )

# Print the results
print(f"Dataloaders: {train_dataloader, test_dataloader}")
print(f"Length of train dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}")
print(f"Length of test dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}")
Parameter introduction:
shuffle: randomly shuffles the data set so that the samples are presented in a random order during training. This helps improve the model’s generalization ability and reduces its dependence on the order of the input data. For the test data set it is usually set to False: the order of the samples does not change the evaluation metrics, and keeping a fixed order makes the results easier to reproduce and compare.
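What batching and shuffling amount to can be sketched in plain Python. This is an illustration only, not how `DataLoader` is implemented: `make_batches` is a hypothetical helper that works on sample indices, and the sizes 60,000 and 10,000 are the FashionMNIST training and test split sizes.

```python
import random


def make_batches(num_samples, batch_size, shuffle, seed=0):
    """Split sample indices into consecutive batches, optionally shuffled first."""
    indices = list(range(num_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, num_samples, batch_size)]


train_batches = make_batches(60_000, 32, shuffle=True)
test_batches = make_batches(10_000, 32, shuffle=False)

print(len(train_batches))     # 1875
print(len(test_batches))      # 313
print(len(test_batches[-1]))  # 16, the last batch is only partly filled
print(test_batches[0][:5])    # [0, 1, 2, 3, 4], original order is kept
```

The last test batch is smaller because 10,000 is not a multiple of 32; `DataLoader` keeps such a partial batch by default (`drop_last=False`).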
# Take one batch and check its shapes
train_features_batch, train_labels_batch = next(iter(train_dataloader))
train_features_batch.shape, train_labels_batch.shape
3. Build model CNN
3.1 Setting up the device
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
device
The code runs on the GPU when CUDA is available and falls back to the CPU otherwise.
3.2 Build CNN model
Review the parameter settings of the convolutional layer:
in_channels: The number of channels of the input data. For two-dimensional convolution, this is the depth (number of channels) of the input image or feature map.
out_channels: The number of output channels, that is, the number of convolution kernels. Each kernel produces one output channel.
kernel_size: The size of the convolution kernel (filter), given as an integer or a tuple that specifies the height and width of the kernel. kernel_size=3 means that the height and width of the kernel are both 3.
stride: The step size of the sliding kernel, which determines how far the kernel moves across the input at each step. stride=1 means that the kernel moves one position at a time.
padding: The number of layers of zeros padded around the input data. Padding helps keep the input and output dimensions the same, which matters when stacking convolutional layers. padding=1 here pads one layer of zeros around the input so that, with kernel_size=3 and stride=1, the size is unchanged after the convolution.
# Create a convolutional neural network
class FashionMNISTModelV2(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=hidden_units*7*7,
                      out_features=output_shape)
        )

    def forward(self, x: torch.Tensor):
        x = self.block_1(x)
        # print(x.shape)
        x = self.block_2(x)
        # print(x.shape)
        x = self.classifier(x)
        # print(x.shape)
        return x

# Instantiate the model
torch.manual_seed(42)
model_2 = FashionMNISTModelV2(input_shape=1,
                              hidden_units=10,
                              output_shape=len(class_names)).to(device)
model_2
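The `hidden_units*7*7` in the classifier follows from how the two blocks change the spatial size of a 28×28 input. The trace below is a plain-Python sketch under the layer settings used in the model; the helper names `conv_out` and `pool_out` are hypothetical, and `nn.MaxPool2d` is assumed to use its default stride, which equals its kernel size.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size, stride=None):
    """Spatial size after max pooling; stride defaults to the kernel size."""
    stride = kernel_size if stride is None else stride
    return (size - kernel_size) // stride + 1


hidden_units = 10
size = 28  # FashionMNIST images are 28x28

for block in (1, 2):
    size = conv_out(size, kernel_size=3, stride=1, padding=1)  # size unchanged
    size = conv_out(size, kernel_size=3, stride=1, padding=1)  # size unchanged
    size = pool_out(size, kernel_size=2)                       # size halved
    print(f"after block_{block}: {hidden_units} x {size} x {size}")

flattened_features = hidden_units * size * size
print(flattened_features)  # 490
```

The convolutions keep the size and each pooling layer halves it, so 28 becomes 14 and then 7, giving 10 × 7 × 7 = 490 input features for the linear layer.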
3.3 Set loss and optimizer
Import the helper function file that provides accuracy_fn:
import requests
from pathlib import Path

# Download helper functions from the Learn PyTorch repository (if not already downloaded)
if Path("helper_functions.py").is_file():
    print("helper_functions.py already exists, skip downloading")
else:
    print("Downloading helper_functions.py")
    # NOTE: You need to use the "raw" GitHub URL for this to work
    request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
    with open("helper_functions.py", "wb") as f:
        f.write(request.content)
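The downloaded file is used here only for its accuracy function. Assuming that function returns the percentage of predicted labels that match the true labels (which is consistent with the results later being printed with a % sign), the idea can be sketched in plain Python. The name `accuracy_percent` is hypothetical, and this version works on lists rather than tensors.

```python
def accuracy_percent(y_true, y_pred):
    """Percentage of predictions that equal the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_pred) * 100


# Three of four predictions match.
print(accuracy_percent([1, 2, 3, 4], [1, 2, 0, 4]))  # 75.0
```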
Create loss, accuracy and optimizer
from helper_functions import accuracy_fn

# Set loss and optimizer
loss_fn = nn.CrossEntropyLoss()  # this is also called "criterion"/"cost function" in some places
# Optimize the parameters of the CNN built above (model_2)
optimizer = torch.optim.SGD(params=model_2.parameters(), lr=0.1)
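To make these two objects less opaque, here is a plain-Python sketch of the arithmetic behind them for a single sample: cross-entropy computed from raw logits, and one plain SGD update (no momentum or weight decay, matching the optimizer settings above). The function names `cross_entropy` and `sgd_step` and all numbers are hypothetical; `nn.CrossEntropyLoss` additionally averages the loss over the batch by default.

```python
import math


def cross_entropy(logits, target):
    """Negative log of the softmax probability assigned to the target class."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


def sgd_step(params, grads, lr):
    """Plain SGD: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


# With 10 equal logits every class has probability 1/10.
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(round(uniform_loss, 4))  # 2.3026, i.e. ln(10)

# A large logit on the correct class gives a much smaller loss.
confident_loss = cross_entropy([0.0, 0.0, 0.0, 5.0] + [0.0] * 6, target=3)
print(confident_loss < uniform_loss)  # True

# One update with lr=0.1 on made-up parameters and gradients.
print(sgd_step([1.0, -2.0], [0.5, -1.0], lr=0.1))  # [0.95, -1.9]
```

An untrained 10-class model that guesses uniformly therefore starts near a loss of about 2.3.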
Create a timer
from timeit import default_timer as timer

def print_train_time(start: float, end: float, device: torch.device = None):
    # Print and return the elapsed time between start and end
    total_time = end - start
    print(f"Train time on {device}: {total_time:.3f} seconds")
    return total_time
3.4 Training and testing loop
def train_step(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               accuracy_fn,
               device: torch.device = device):
    train_loss, train_acc = 0, 0
    model.to(device)
    model.train()  # put model in training mode (test_step switches it to eval mode)
    for batch, (X, y) in enumerate(data_loader):
        # Send the batch to the same device as the model
        X, y = X.to(device), y.to(device)

        # Forward pass and loss
        y_pred = model(X)
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()  # accumulate as a plain float
        train_acc += accuracy_fn(y_true=y,
                                 y_pred=y_pred.argmax(dim=1))  # logits -> pred labels

        # Backward pass and parameter update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Average over the number of batches
    train_loss /= len(data_loader)
    train_acc /= len(data_loader)
    print(f"Train loss: {train_loss:.5f} | Train accuracy: {train_acc:.2f}%")

def test_step(data_loader: torch.utils.data.DataLoader,
              model: torch.nn.Module,
              loss_fn: torch.nn.Module,
              accuracy_fn,
              device: torch.device = device):
    test_loss, test_acc = 0, 0
    model.to(device)
    model.eval()  # put model in eval mode
    # Turn on inference context manager
    with torch.inference_mode():
        for X, y in data_loader:
            X, y = X.to(device), y.to(device)
            test_pred = model(X)
            test_loss += loss_fn(test_pred, y)
            test_acc += accuracy_fn(y_true=y,
                                    y_pred=test_pred.argmax(dim=1)  # Go from logits -> pred labels
                                    )

        # Average over the number of batches
        test_loss /= len(data_loader)
        test_acc /= len(data_loader)
        print(f"Test loss: {test_loss:.5f} | Test accuracy: {test_acc:.2f}%\n")
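Both steps call `argmax(dim=1)` to turn the model's raw outputs (logits) into predicted labels: for each sample, the index of the largest logit is the predicted class. A plain-Python sketch of that step follows. The logits are made-up numbers, `argmax` is a hypothetical helper, and the class names are the ones FashionMNIST exposes through `train_data.classes`.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Hypothetical logits for a batch of two images, one row per image.
batch_logits = [
    [-1.2, 0.3, -0.5, 0.1, -2.0, 1.1, -0.7, 0.9, 0.2, 3.4],
    [0.4, 2.8, -1.0, 0.6, -0.3, -2.2, 0.0, -1.5, 0.7, -0.9],
]

pred_labels = [argmax(row) for row in batch_logits]
print(pred_labels)                            # [9, 1]
print([class_names[i] for i in pred_labels])  # ['Ankle boot', 'Trouser']
```

No softmax is needed for this step, because softmax does not change which logit is the largest.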
from timeit import default_timer as timer
from tqdm.auto import tqdm  # progress bar used in the epoch loop

torch.manual_seed(42)

train_time_start_model_2 = timer()

epochs = 3
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n---------")
    train_step(data_loader=train_dataloader,
               model=model_2,
               loss_fn=loss_fn,
               optimizer=optimizer,
               accuracy_fn=accuracy_fn,
               device=device
               )
    test_step(data_loader=test_dataloader,
              model=model_2,
              loss_fn=loss_fn,
              accuracy_fn=accuracy_fn,
              device=device
              )

train_time_end_model_2 = timer()
total_train_time_model_2 = print_train_time(start=train_time_start_model_2,
                                            end=train_time_end_model_2,
                                            device=device)
4. Model evaluation and result output
torch.manual_seed(42)

def eval_model(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               accuracy_fn,
               device: torch.device = device):
    # Return the model name, average loss, and average accuracy on data_loader
    loss, acc = 0, 0
    model.eval()
    with torch.inference_mode():
        for X, y in data_loader:
            # Note: move the data to the same device as the model
            X, y = X.to(device), y.to(device)
            y_pred = model(X)
            loss += loss_fn(y_pred, y)
            acc += accuracy_fn(y_true=y, y_pred=y_pred.argmax(dim=1))

        # Average over the number of batches
        loss /= len(data_loader)
        acc /= len(data_loader)
    return {"model_name": model.__class__.__name__,
            "model_loss": loss.item(),
            "model_acc": acc}

model_2_results = eval_model(
    model=model_2,
    data_loader=test_dataloader,
    loss_fn=loss_fn,
    accuracy_fn=accuracy_fn
)
model_2_results
Model output results: