In-Depth Analysis of the ResNet152 Residual Network: Code Walkthrough and Practical Application

Table of Contents

1. Background Introduction

2. ResNet152 Residual Network Application Practice

   1. Define the ResNet152 model
   2. Input image preprocessing
   3. Define a custom dataset class
   4. Detect available computing devices and define the optimizer
   5. Model training
   6. Complete code and result display

3. Summary


1. Background Introduction

ResNet152 is a deep residual network and a very powerful image classification model, proposed by Microsoft Research. Its core idea is to introduce residual modules and bottleneck structures so that the model can effectively learn image features at much greater depth, mitigating the vanishing-gradient problem and making optimization less likely to stall at poor local solutions.

The design idea of ResNet is to pass the input features through a series of convolutional layers, pooling layers, and other operations, and then sum the result with the original input features. In this way, more of the original information is retained rather than being lost as it passes through the many layers of the network. This residual (skip) connection lets the network bypass transformations that are not needed, which eases training and improves the performance of very deep models.
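To make the idea concrete, here is a minimal sketch of a bottleneck residual block in PyTorch, modeled on the kind of block ResNet152 stacks. The class and channel numbers are illustrative, not the torchvision implementation:

import torch
from torch import nn

class Bottleneck(nn.Module):
    """Illustrative bottleneck residual block: 1x1 reduce -> 3x3 -> 1x1 expand."""
    def __init__(self, in_channels, mid_channels, out_channels):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        # 1x1 projection so the skip path matches the branch's channel count
        self.shortcut = (nn.Identity() if in_channels == out_channels
                         else nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.branch(x) + self.shortcut(x))  # the residual sum

x = torch.randn(1, 64, 56, 56)
print(Bottleneck(64, 64, 256)(x).shape)  # torch.Size([1, 256, 56, 56])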

2. ResNet152 Residual Network Application Practice

The ResNet152 residual network has performed well in practice and is widely used in computer vision tasks such as image classification, object detection, and semantic segmentation. Here, image classification is used as the demonstration case.

1. Define the ResNet152 model
resnet_model = models.resnet152(weights=models.ResNet152_Weights.DEFAULT)  # create a ResNet152 instance with pretrained weights
for param in resnet_model.parameters():  # freeze all parameters; only the new fully connected layer will train
    param.requires_grad = False
in_features = resnet_model.fc.in_features
resnet_model.fc = nn.Linear(in_features, 20)  # replace the classifier head with a 20-class layer
params_to_update = []
for param in resnet_model.parameters():  # collect the parameters that still require gradients
    if param.requires_grad:
        params_to_update.append(param)

The specific parameters are explained in detail as follows:

  • resnet_model = models.resnet152(weights=models.ResNet152_Weights.DEFAULT): creates a ResNet152 model instance. models.resnet152 is the predefined ResNet152 constructor in PyTorch, and weights=models.ResNet152_Weights.DEFAULT loads the default pretrained weights.
  • for param in resnet_model.parameters(): param.requires_grad = False: sets the requires_grad attribute of every parameter in the model to False, so these parameters receive no updates during backpropagation. This is the standard way to freeze parts of a model so that their parameters do not change during training.
  • in_features = resnet_model.fc.in_features: reads the number of input features of the model's final fully connected layer.
  • resnet_model.fc = nn.Linear(in_features, 20): replaces the final fully connected layer with a new one that keeps the same number of input features (in_features) but outputs 20 classes.
  • params_to_update = []; for param in resnet_model.parameters(): if param.requires_grad == True: params_to_update.append(param): creates an empty list params_to_update, then iterates over all model parameters and appends each one that still requires gradients. In this setup only the new fully connected layer's weights and bias qualify.

Through the above steps, a ResNet152 model is created, all of its parameters are frozen (so they receive no updates), and the final fully connected layer is replaced so that it outputs a 20-dimensional vector instead of the original 1000-class ImageNet output. A list of the parameters that still need updating is prepared for the optimizer to use during training.
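A quick sanity check (a sketch, assuming the code above has already run) confirms that only the new head is trainable:

# Count trainable vs. total parameters; only fc.weight and fc.bias should require grad.
trainable = sum(p.numel() for p in resnet_model.parameters() if p.requires_grad)
total = sum(p.numel() for p in resnet_model.parameters())
print(f"trainable: {trainable} / total: {total}")  # trainable = in_features*20 + 20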

2. Input image preprocessing
data_transforms = { # data could also be augmented by hand with PIL, or rebalanced with SMOTE-style resampling
    'train':
        transforms.Compose([
        transforms.Resize([300,300]), # resize the image to 300x300
        transforms.RandomRotation(45),# random rotation, angle drawn from [-45, 45] degrees
        transforms.CenterCrop(256),# crop a 256x256 patch from the center
        transforms.RandomHorizontalFlip(p=0.5),# random horizontal flip with probability 0.5
        transforms.RandomVerticalFlip(p=0.5),# random vertical flip with probability 0.5
        transforms.ColorJitter(brightness=0.2, contrast=0.1, saturation=0.1, hue=0.1), # jitter brightness, contrast, saturation, and hue
        transforms.RandomGrayscale(p=0.1),# convert to grayscale with probability 0.1 (still 3 channels, R=G=B)
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])# normalize with the ImageNet per-channel mean and std
    ]),
    'valid':
        transforms.Compose([
        transforms.Resize([256,256]),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

Some parameters are explained in detail below:

  • 'train' and 'valid': these two keys define two transform pipelines, one for the training set and one for the validation set. The two sets are often transformed differently; for example, the validation set usually does not need data augmentation.
  • transforms.Compose([...]): PyTorch's transform composition utility. It accepts a list of transform operations and chains them so that each transform runs on the output of the previous one.
  • transforms.ToTensor(): converts the input image from a PIL Image or numpy.ndarray into a PyTorch Tensor.
  • transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]): normalizes the input Tensor channel-wise with the given means and standard deviations (the ImageNet statistics). This is standard practice when training neural networks, as it helps the network learn features more easily.

The code above defines two transform pipelines for image preprocessing: one for the training set and one for the validation set. The transformations include resizing, rotation, cropping, flipping, color jitter, random grayscale, and normalization.
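As a quick illustration (a sketch; the file example.jpg is a hypothetical sample image), applying the training pipeline to a single PIL image yields a normalized tensor:

from PIL import Image

img = Image.open('example.jpg').convert('RGB')   # hypothetical sample image
x = data_transforms['train'](img)                # apply the augmentation pipeline
print(x.shape)  # torch.Size([3, 256, 256]) after the 256x256 center crop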

3. Define a custom dataset class
class food_dataset(Dataset):  # the class name is arbitrary; rename it as needed
    def __init__(self, file_path, transform=None):  # class initialization
        self.file_path = file_path
        self.imgs = []
        self.labels = []
        self.transform = transform
        with open(self.file_path) as f:
            # each line of the annotation file is "image_path label"
            samples = [x.strip().split(' ') for x in f.readlines()]
            for img_path, label in samples:
                self.imgs.append(img_path)
                self.labels.append(label)

    def __len__(self):  # lets len() report the number of samples
        return len(self.imgs)

    def __getitem__(self, idx):  # key method: fetch one image and its label by index
        image = Image.open(self.imgs[idx])
        if self.transform:
            image = self.transform(image)

        label = self.labels[idx]
        label = torch.from_numpy(np.array(label, dtype=np.int64))  # string label -> int64 tensor
        return image, label

training_data = food_dataset(file_path='train.txt', transform=data_transforms['train'])
test_data = food_dataset(file_path='test.txt', transform=data_transforms['valid'])

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)  # batches of 64 images
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)

This defines a custom dataset class, food_dataset, which reads image paths and their corresponding labels from an annotation file and returns individual samples by index. The class works with PyTorch's DataLoader to batch-load and transform data in the training loop.

Note: generating the train.txt and test.txt annotation files was covered in an earlier article in this series and is not repeated here; a sketch of the expected format is shown below.
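For reference, a minimal sketch of what train.txt is assumed to look like, with one "image_path label" pair per line (the paths and label indices here are made up for illustration):

# Write a tiny illustrative annotation file; real paths and labels come from your dataset.
lines = [
    'food_images/ramen_001.jpg 0',
    'food_images/sushi_014.jpg 7',
    'food_images/curry_102.jpg 12',
]
with open('train.txt', 'w') as f:
    f.write('\n'.join(lines))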

4. Detect available computing devices and define the optimizer
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"#Detect whether GPU is supported
print(f"Using {device} device")
model = resnet_model.to(device)
loss_fn = nn.CrossEntropyLoss()#Define cross entropy loss function
optimizer = torch.optim.Adam(params_to_update,lr=0.001)#Define optimizer

Some parameters are explained in detail below:

  • model = resnet_model.to(device): moves the predefined ResNet model to the device selected above. If the device is the CPU this is effectively a no-op, since a freshly created model already lives in CPU memory; if the device is CUDA or MPS, the model's parameters are copied to the accelerator.
  • optimizer = torch.optim.Adam(params_to_update, lr=0.001): defines an Adam optimizer with a learning rate (lr) of 0.001. Here params_to_update holds references to the model parameters that still need updating, which in this setup are just the weights and bias of the new fully connected layer (an optional learning-rate schedule is sketched below).
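If the loss plateaus, a learning-rate schedule can help. Here is a sketch using the StepLR scheduler that the full listing below leaves commented out; the step size and decay factor are illustrative:

# Halve the learning rate every 5 epochs (values are illustrative).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
# After each epoch's train/test pass, advance the schedule with scheduler.step().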

5. Model training
epochs = 20  # number of training epochs
for t in range(epochs):
    start_time = time.time()
    print(f"Epoch {t + 1}\n----------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)  # train, then evaluate
    test(test_dataloader, model, loss_fn)
    end_time = time.time()
    time_diff = end_time - start_time
    print("Time taken:", time_diff)  # per-epoch wall-clock time
print()

By training for 20 epochs and recording the loss and accuracy after each one, we can judge whether the image classification results meet our requirements.
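The test function in the full listing tracks the best accuracy in best_acc; to actually keep the best weights, a common pattern (a sketch, not part of the original listing; the filename is illustrative) is to save a checkpoint whenever the accuracy improves:

# Inside test(), after computing `correct`:
if correct > best_acc:
    best_acc = correct
    torch.save(model.state_dict(), 'best_resnet152.pth')  # illustrative filename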

6. Complete code and result display
import torch
from torch.utils.data import Dataset,DataLoader
import numpy as np
from torch import nn
from PIL import Image
from torchvision import transforms
import torchvision.models as models
import time

resnet_model = models.resnet152(weights=models.ResNet152_Weights.DEFAULT)
for param in resnet_model.parameters():
    param.requires_grad = False
in_features = resnet_model.fc.in_features
resnet_model.fc = nn.Linear(in_features,20)
params_to_update = []
for param in resnet_model.parameters():  # collect the parameters that still require gradients
    if param.requires_grad:
        params_to_update.append(param)


data_transforms = { # data could also be augmented by hand with PIL, or rebalanced with SMOTE-style resampling
    'train':
        transforms.Compose([
        transforms.Resize([300,300]), # resize the image to 300x300
        transforms.RandomRotation(45),# random rotation, angle drawn from [-45, 45] degrees
        transforms.CenterCrop(256),# crop a 256x256 patch from the center
        transforms.RandomHorizontalFlip(p=0.5),# random horizontal flip with probability 0.5
        transforms.RandomVerticalFlip(p=0.5),# random vertical flip with probability 0.5
        transforms.ColorJitter(brightness=0.2, contrast=0.1, saturation=0.1, hue=0.1), # jitter brightness, contrast, saturation, and hue
        transforms.RandomGrayscale(p=0.1),# convert to grayscale with probability 0.1 (still 3 channels, R=G=B)
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])# normalize with the ImageNet per-channel mean and std
    ]),
    'valid':
        transforms.Compose([
        transforms.Resize([256,256]),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
# Data augmentation does not guarantee better training results; it only makes improvement more likely.
class food_dataset(Dataset):  # the class name is arbitrary; rename it as needed
    def __init__(self, file_path, transform=None):  # class initialization
        self.file_path = file_path
        self.imgs = []
        self.labels = []
        self.transform = transform
        with open(self.file_path) as f:
            # each line of the annotation file is "image_path label"
            samples = [x.strip().split(' ') for x in f.readlines()]
            for img_path, label in samples:
                self.imgs.append(img_path)
                self.labels.append(label)

    def __len__(self):  # lets len() report the number of samples
        return len(self.imgs)

    def __getitem__(self, idx):  # key method: fetch one image and its label by index
        image = Image.open(self.imgs[idx])
        if self.transform:
            image = self.transform(image)

        label = self.labels[idx]
        label = torch.from_numpy(np.array(label, dtype=np.int64))  # string label -> int64 tensor
        return image, label

training_data = food_dataset(file_path='train.txt', transform=data_transforms['train'])
test_data = food_dataset(file_path='test.txt', transform=data_transforms['valid'])

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)  # batches of 64 images
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)

'''Display an image from the training dataset'''
# from matplotlib import pyplot as plt
# image, label = next(iter(train_dataloader))  # grab one batch from the loader
# sample = image[2]
# sample = sample.permute((1, 2, 0)).numpy()  # CHW tensor -> HWC array for plotting
# plt.imshow(sample)
# plt.show()
# print('Label is: {}'.format(label[2].numpy()))


'''------------- CNN convolutional neural network part ----------------------'''
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using {device} device")
model = resnet_model.to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params_to_update,lr=0.001)
#optimizer = torch.optim.Adam(model.parameters(), lr=0.01)#Create an optimizer. SGD is the stochastic gradient descent algorithm? ?
#scheduler = torch.optim.lr_scheduler.StepLR(optimizer,step_size=25,gamma=0.5)

''' Define a plain CNN (kept for comparison; not used in this run) '''
from torch import nn

class CNN(nn.Module):
    def __init__(self):  # input size (3, 256, 256)
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(  # bundle several layers together
            nn.Conv2d(  # Conv2d for images; Conv3d adds a time axis for video; Conv1d for sequence data
                in_channels=3,    # number of input channels (1 for grayscale, 3 for RGB)
                out_channels=16,  # number of feature maps, i.e. number of convolution kernels
                kernel_size=5,    # 5x5 convolution kernel
                stride=1,         # step size
                padding=2,        # with stride 1, kernel_size = 2*padding + 1 keeps the spatial size unchanged
            ),  # output feature map: (16, 256, 256)
            nn.ReLU(),                    # ReLU activation
            nn.MaxPool2d(kernel_size=2),  # 2x2 pooling; output: (16, 128, 128)
        )
        self.conv2 = nn.Sequential(  # input (16, 128, 128)
            nn.Conv2d(16, 32, 5, 1, 2),  # output (32, 128, 128)
            nn.ReLU(),
            nn.Conv2d(32, 32, 5, 1, 2),  # output (32, 128, 128)
            nn.ReLU(),
            nn.MaxPool2d(2),             # output (32, 64, 64)
        )

        self.conv3 = nn.Sequential(  # input (32, 64, 64)
            nn.Conv2d(32, 64, 5, 1, 2),
            nn.ReLU(),               # output (64, 64, 64)
        )

        self.out = nn.Linear(64 * 64 * 64, 20)  # fully connected classifier head

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)          # output (64, 64, 64)
        x = x.view(x.size(0), -1)  # flatten to (batch_size, 64 * 64 * 64)
        output = self.out(x)
        return output

#model = CNN().to(device)
#print(model)

def train(dataloader, model, loss_fn, optimizer):
    model.train()
# PyTorch provides two mode switches: model.train() and model.eval().
# Typical usage: call model.train() before training starts and model.eval() before testing.
#batch_size_num = 1
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)  # move the batch of data and labels to the CPU or GPU
        pred = model.forward(X)            # forward pass
        loss = loss_fn(pred, y)            # compute the loss via the cross-entropy loss function
        # each incoming batch triggers one gradient computation and one parameter update
        optimizer.zero_grad()  # reset the gradients to zero
        loss.backward()        # backpropagate to compute each parameter's gradient
        optimizer.step()       # update the network parameters from the gradients

        # loss = loss.item()  # read out the loss value
        # print(f"loss: {loss:>7f} [number:{batch_size_num}]")
        # batch_size_num += 1
best_acc = 0
def test(dataloader, model, loss_fn):
    global best_acc
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()  # switch to evaluation mode
    test_loss, correct = 0, 0
    with torch.no_grad():  # context manager that disables gradient tracking; saves memory when backward() will not be called
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model.forward(X)
            test_loss += loss_fn(pred, y).item()  # accumulate the batch loss
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()  # count correct predictions
            # pred.argmax(1): index of the max value in each row (dim=1); dim=0 would be per column
    test_loss /= num_batches
    correct /= size
    print(f"Test result:\n Accuracy: {(100*correct)}%, Avg loss: {test_loss}")
    acc_s.append(correct)
    loss_s.append(test_loss)

    if correct > best_acc:
        best_acc = correct
# loss_fn = nn.CrossEntropyLoss()  # already defined above
# optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # alternative: optimize all parameters
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.5)

'''Train the model'''
epochs = 20
acc_s = []
loss_s = []
for t in range(epochs):
    start_time = time.time()
    # train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
    # test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
    print(f"Epoch {t + 1}\n----------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
    #scheduler.step()
    end_time = time.time()
    time_diff = end_time - start_time
    print("Time taken:", time_diff)
print()

The test results are the per-epoch accuracy and average loss printed above (result screenshots omitted here). The specific numbers depend on your dataset and requirements, so treat them as reference only.
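Once training is done, the model can be used for single-image prediction. A minimal sketch, assuming a saved checkpoint named best_resnet152.pth and a sample image example.jpg (both names are illustrative):

# Load saved weights and classify one image (filenames are illustrative).
model.load_state_dict(torch.load('best_resnet152.pth', map_location=device))
model.eval()
img = Image.open('example.jpg').convert('RGB')
x = data_transforms['valid'](img).unsqueeze(0).to(device)  # add a batch dimension
with torch.no_grad():
    pred_class = model(x).argmax(1).item()
print('Predicted class index:', pred_class)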

3. Summary

As a representative deep residual network, ResNet152 achieves efficient feature learning and stable optimization through residual blocks, batch normalization, and a multi-stage architecture, delivering significant performance gains across a wide range of computer vision tasks.
