This article participates in the new star project artificial intelligence (Pytorch) track: https://bbs.csdn.net/topics/613989052
Foreword: This article is a step-by-step, beginner-level classification tutorial. It aims to help readers with no prior background master the basic elements, the general template and the module implementation of a classification system. The project code is commented in detail, so the article only explains the steps needed to build the classification system; see the code for the specifics.
-
Download cat and dog dataset:
Link: https://pan.baidu.com/s/18LxKlQV4HAcGxdqYxRhUtg
Extraction code: 3l6q
-
Divide the data into train, val, test
The pictures under the train folder include both cats and dogs. The data set contains 25,000 pictures in total, which I split into train, val and test at a ratio of 7:2:1. Note that the number of cats and dogs in each split should be balanced; otherwise the trained model may be biased towards predicting one category.
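A 7:2:1 split that keeps cats and dogs balanced can be sketched as follows. This is only an illustration, not the script used for this project: split_balanced is a hypothetical helper, and it assumes every file name starts with its class (for example cat.0.jpg or dog.12.jpg).

```python
import random
from collections import defaultdict


def split_balanced(filenames, ratios=(7, 2, 1), seed=0):
    """Split file names into train/val/test, splitting each class separately.

    Assumption: the class is the part of the name before the first '.'.
    """
    # Group the file names by class prefix.
    by_class = defaultdict(list)
    for name in filenames:
        by_class[name.split('.')[0]].append(name)

    rng = random.Random(seed)
    total = sum(ratios)
    splits = {'train': [], 'val': [], 'test': []}
    for names in by_class.values():
        names = sorted(names)
        rng.shuffle(names)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(names) * ratios[0] // total
        n_val = len(names) * ratios[1] // total
        splits['train'] += names[:n_train]
        splits['val'] += names[n_train:n_train + n_val]
        # Whatever remains goes to the test split.
        splits['test'] += names[n_train + n_val:]
    return splits


if __name__ == '__main__':
    demo = [f'{c}.{i}.jpg' for c in ('cat', 'dog') for i in range(10)]
    result = split_balanced(demo)
    print({k: len(v) for k, v in result.items()})
```

Because each class is shuffled and cut separately, every split receives the same proportion of cats and dogs.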
-
Project directory
--dog_cat_classify
    --data_process.py
    --model.py
    --my_loader.py
    --predict.py
    --train.py
--data_process.py: used to generate train.txt and val.txt
--my_loader.py: the custom Dataset
--model.py: builds the cat and dog classification model
--predict.py: uses the trained model to predict images
--train.py: the training script, which trains and validates the model
-
Write the path and label of every image in train and val, separated by a space, to train.txt and val.txt respectively, for later use by the custom Dataset.
The initialization function of the custom Dataset class takes the path of train.txt or val.txt as its parameter and reads the contents of the txt file into a list:
from torch.utils.data import Dataset


class dog_dataset(Dataset):
    def __init__(self, path):
        super(dog_dataset, self).__init__()
        # Each line of the txt file has the form "image_path label"
        with open(path, 'r') as f:
            self.data_li = f.readlines()
import os


# Write one line per image: the image path and its category, separated by a space
def create_data_txt(path, txt_p):
    data_li = os.listdir(path)
    with open(txt_p, 'w') as f:
        for ele in data_li:
            # The part of the file name before the first '.' is the class:
            # dogs are labelled 0, everything else (cats) is labelled 1
            if ele.split('.')[0] == 'dog':
                f.write('%s %s\n' % (os.path.join(path, ele), str(0)))
            else:
                f.write('%s %s\n' % (os.path.join(path, ele), str(1)))


# Generate train.txt and val.txt, which are used later to build the custom data set
path = r'G:/data_dog_cat/'
txt_path = './'
li = ['train', 'val']
for ele in li:
    txt_p = os.path.join(txt_path, ele + '.txt')
    create_data_txt(os.path.join(path, ele), txt_p)
After executing the above script, train.txt and val.txt will be generated, the content of which is shown in the picture below:
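Each line of these files holds an image path, one space, and the label (0 for dog, 1 for cat). As a minimal sketch of how such a line can be read back, assuming exactly that format: parse_line is a hypothetical helper and the sample paths are illustrative only.

```python
def parse_line(line):
    """Split one 'path label' line into (path, int label).

    Splitting on the LAST space keeps paths that contain spaces intact.
    """
    path, label = line.strip().rsplit(' ', 1)
    return path, int(label)


if __name__ == '__main__':
    # Illustrative lines in the format written by create_data_txt.
    sample = [
        'G:/data_dog_cat/train/dog.0.jpg 0\n',
        'G:/data_dog_cat/train/cat.0.jpg 1\n',
    ]
    for line in sample:
        print(parse_line(line))
```

Splitting from the right is a small design choice: a plain split(' ') would break as soon as a folder name contains a space.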
-
Custom Dataset
from torch.utils.data import Dataset
from torchvision.transforms import Compose, ToTensor, Resize, Normalize, ColorJitter
from PIL import Image


# Implement your own dataset
class dog_dataset(Dataset):
    def __init__(self, path):
        super(dog_dataset, self).__init__()
        # Path of the txt file that lists the data set
        self.path = path
        # Transformations applied before the data is sent to the network;
        # you can combine the built-in transforms yourself or write custom ones
        self.transforms = Compose([
            ColorJitter(),
            Resize([224, 224]),
            ToTensor(),
            Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ])
        # Read the txt file to get a list containing all image paths and labels
        with open(path, 'r') as f:
            self.data_li = f.readlines()

    def __getitem__(self, index):
        # Get the path of the image from the list by index
        img_path = self.data_li[index].split(' ')[0]
        # Read the image with PIL, because the transforms used below operate on
        # PIL Images by default; convert to RGB so every image has three channels
        img = Image.open(img_path).convert('RGB')
        # Transform the image (resize, color jitter, cropping and so on).
        # ToTensor() and Normalize() are required, the rest can be combined freely.
        # ToTensor() must come before Normalize(): Normalize() expects a Tensor,
        # so swapping the two raises a type error
        img = self.transforms(img)
        # Get the label corresponding to the image
        label = int(self.data_li[index].split(' ')[1])
        return img, label

    def __len__(self):
        # Return the size of the dataset
        return len(self.data_li)


if __name__ == '__main__':
    txt_path = 'train.txt'
    mydata = dog_dataset(txt_path)
    img, lab = mydata[100]
    print(img.size())
    print(lab)
    print(len(mydata))
__init__: defines the transformations and stores the image paths and labels from the txt file as a list.
__getitem__: fetches one sample by index. It looks up the image path and label in the list, reads the image from that path, applies the transformations defined in __init__, and returns the image and the label.
__len__: returns the size of the train or val dataset.
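To make the ToTensor() and Normalize() steps concrete, the arithmetic for a single pixel can be sketched in plain Python. This is an illustration of the formula only, not of the torchvision implementation: to_tensor_value and normalize_pixel are hypothetical helpers, and the mean and std values are the ones used in the Dataset.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def to_tensor_value(pixel):
    """Mimic the ToTensor() scaling of one 8-bit channel value: [0, 255] -> [0.0, 1.0]."""
    return pixel / 255.0


def normalize_pixel(rgb):
    """Apply (x - mean) / std per channel to one RGB pixel given as 8-bit integers."""
    return [(to_tensor_value(v) - m) / s for v, m, s in zip(rgb, MEAN, STD)]


if __name__ == '__main__':
    print(normalize_pixel((255, 255, 255)))  # brightest pixel
    print(normalize_pixel((0, 0, 0)))        # darkest pixel
```

The mean and std are expressed on the 0 to 1 scale, which is why the scaling step has to run first.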
-
Build classification model
The model is built mainly with nn.Sequential. The advantage of nn.Sequential is that several operators can be combined into one Module, and the contents of that Module are executed in order.
import torch
import torch.nn as nn


# Customize a model to implement cat and dog classification
class myModel(nn.Module):
    def __init__(self):
        # The custom class inherits from nn.Module, so the initialization method
        # of the parent class must be called first
        super(myModel, self).__init__()
        # The task is binary classification: the final output passes through a
        # sigmoid to get a probability. A prediction >= 0.5 is a cat, otherwise
        # it is a dog. A two-output layer with softmax() would also work
        self.sigmoid = nn.Sigmoid()
        # The first convolution block: a convolution, an activation function
        # (adds nonlinearity, which increases the fitting ability of the model)
        # and a normalization layer (normalizes the data distribution, which
        # helps the model converge faster)
        self.conv1_1 = nn.Sequential(nn.Conv2d(3, 32, (3, 3), 1, 1), nn.ReLU(), nn.BatchNorm2d(32))
        # Same as above, except that the convolution stride is 2, which halves
        # the height and width of the feature map
        self.conv1_2 = nn.Sequential(nn.Conv2d(32, 64, (3, 3), 2, 1), nn.ReLU(), nn.BatchNorm2d(64))
        # Repeated twice more. Apart from the parameters the structure is the
        # same, so the block could be wrapped for reuse; it is written out here
        # for clarity
        self.conv2_1 = nn.Sequential(nn.Conv2d(64, 64, (3, 3), 1, 1), nn.ReLU(), nn.BatchNorm2d(64))
        self.conv2_2 = nn.Sequential(nn.Conv2d(64, 128, (3, 3), 2, 1), nn.ReLU(), nn.BatchNorm2d(128))
        self.conv3_1 = nn.Sequential(nn.Conv2d(128, 128, (3, 3), 1, 1), nn.ReLU(), nn.BatchNorm2d(128))
        self.conv3_2 = nn.Sequential(nn.Conv2d(128, 256, (3, 3), 2, 1), nn.ReLU(), nn.BatchNorm2d(256))
        # The output size of each layer must match the input size of the next
        # layer, otherwise an error is reported. Between the convolution and the
        # linear layers the tensor is flattened:
        # [batch_size, channel, height, width] --> [batch_size, channel*height*width]
        self.linear_1 = nn.Linear(28 * 28 * 256, 128)
        self.linear_2 = nn.Linear(128, 1)

    def forward(self, x):
        in_size = x.size(0)
        # To explain how the tensor shape changes, batch_size is assumed to be 2
        # [2,3,224,224] --> [2,32,224,224]
        x = self.conv1_1(x)
        # [2,32,224,224] --> [2,64,112,112]
        x = self.conv1_2(x)
        # [2,64,112,112] --> [2,64,112,112]
        x = self.conv2_1(x)
        # [2,64,112,112] --> [2,128,56,56]
        x = self.conv2_2(x)
        # [2,128,56,56] --> [2,128,56,56]
        x = self.conv3_1(x)
        # [2,128,56,56] --> [2,256,28,28]
        x = self.conv3_2(x)
        # [2,256,28,28] --> [2,200704]
        x = x.view(in_size, -1)
        x = self.linear_1(x)
        out = self.linear_2(x)
        out = self.sigmoid(out)
        return out


if __name__ == '__main__':
    x = torch.rand([2, 3, 224, 224])
    model = myModel()
    print(model)
    print(model(x).size())
    print(model(x))
__init__: defines and initializes the operators.
forward: the operators are actually called in this function. The input x passes through each operator in turn, and the final output is a probability.
Finally, print it to see the structure of the model:
myModel(
  (sigmoid): Sigmoid()
  (conv1_1): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv1_2): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): ReLU()
    (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv2_1): Sequential(
    (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv2_2): Sequential(
    (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): ReLU()
    (2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv3_1): Sequential(
    (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv3_2): Sequential(
    (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): ReLU()
    (2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (linear_1): Linear(in_features=200704, out_features=128, bias=True)
  (linear_2): Linear(in_features=128, out_features=1, bias=True)
)
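The value in_features=200704 of linear_1 follows from the convolution output-size formula floor((n + 2p - k) / s) + 1. A minimal sketch that traces the feature-map size through the six blocks, assuming a 224x224 input and the kernel size, stride and padding listed above (conv_out, LAYERS and trace_shapes are hypothetical names introduced for the illustration):

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output height/width of a Conv2d layer with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


# (out_channels, stride) of the six convolution blocks in myModel.
LAYERS = [(32, 1), (64, 2), (64, 1), (128, 2), (128, 1), (256, 2)]


def trace_shapes(size=224):
    """Return the (channels, height, width) after every convolution block."""
    shapes = []
    for channels, stride in LAYERS:
        size = conv_out(size, stride=stride)
        shapes.append((channels, size, size))
    return shapes


if __name__ == '__main__':
    shapes = trace_shapes()
    for shape in shapes:
        print(shape)
    c, h, w = shapes[-1]
    print('flattened features:', c * h * w)
```

The stride-1 blocks keep the spatial size and the three stride-2 blocks each halve it (224, 112, 56, 28), so the flattened vector has 256 * 28 * 28 = 200704 elements.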
-
Build training script
import torch
from torch.utils.data import DataLoader
from my_loader import dog_dataset
from model import myModel
from torch import nn
from torch.optim.lr_scheduler import StepLR


# Define the training process
def train(model, device, train_loader, optimizer, epoch, lr_scheduler, criterion):
    # Switch the model to training mode: batchnorm and dropout behave
    # differently during training and validation
    model.train()
    # A DataLoader is an iterable object, so it can be traversed directly
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device).float().unsqueeze(1)
        # Reset the gradients
        optimizer.zero_grad()
        # Forward pass
        output = model(data)
        # Calculate the loss
        loss = criterion(output, target)
        # Backpropagation
        loss.backward()
        # Parameter update
        optimizer.step()
        # Print the loss every 10 iterations
        if (batch_idx + 1) % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                100. * (batch_idx + 1) / len(train_loader), loss.item()))
    # Learning rate update, once per epoch: with step_size=10 the learning
    # rate is multiplied by gamma every 10 epochs
    lr_scheduler.step()


# Define the validation process
def val(model, device, test_loader, criterion):
    # Switch the model to evaluation mode
    model.eval()
    test_loss = 0
    correct = 0
    # Context manager: gradients are not computed in this scope
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device).float().unsqueeze(1)
            output = model(data)
            # nn.BCELoss averages over the batch by default (reduction='mean')
            test_loss += criterion(output, target).item()
            # A probability >= 0.5 is predicted as class 1 (cat), otherwise class 0 (dog)
            pred = (output >= 0.5).long()
            correct += pred.eq(target.long()).sum().item()
    # Average the per-batch losses
    test_loss /= len(test_loader)
    # Print the loss and accuracy once
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


def main():
    # Use cuda to accelerate training if it is available, otherwise use the cpu
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    # Define the number of training rounds
    epochs = 100
    # Instantiate the training dataset
    train_dataset = dog_dataset('C:/Users/86181/Desktop/tset_demo/dog_cat_classify/train.txt')
    # Instantiate the validation dataset
    val_dataset = dog_dataset('C:/Users/86181/Desktop/tset_demo/dog_cat_classify/val.txt')
    # Instantiate the training DataLoader
    train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
    # Instantiate the validation DataLoader (no shuffling needed for validation)
    val_loader = DataLoader(val_dataset, batch_size=2, shuffle=False)
    # Instantiate the model and move it to the same device as the data
    model = myModel().to(device)
    # Instantiate the loss
    criterion = nn.BCELoss()
    # Instantiate an optimizer with an initial learning rate of 1e-3
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Instantiate a learning rate decay policy
    lr_scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
    for epoch in range(1, epochs + 1):
        train(model, device, train_loader, optimizer, epoch, lr_scheduler, criterion)
        val(model, device, val_loader, criterion)
    # Save the whole model
    torch.save(model, 'G:/dog_cat_calssify/model.pth')


if __name__ == '__main__':
    main()
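Two formulas sit behind the training script: the StepLR decay and the binary cross-entropy loss. The sketch below reproduces both in plain Python so the numbers can be checked by hand. It is not the PyTorch implementation: step_lr and bce are hypothetical helpers, and bce assumes every probability lies strictly between 0 and 1.

```python
import math


def step_lr(base_lr, steps, step_size=10, gamma=0.1):
    """Learning rate after `steps` calls to the scheduler under a StepLR-style decay."""
    return base_lr * gamma ** (steps // step_size)


def bce(probs, targets):
    """Mean binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)], averaged over samples.

    Assumes 0 < p < 1 for every probability.
    """
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for p, y in zip(probs, targets)]
    return sum(losses) / len(losses)


if __name__ == '__main__':
    for steps in (0, 9, 10, 20):
        print(steps, step_lr(1e-3, steps))
    # A confident correct prediction gives a small loss, a wrong one a large loss.
    print(bce([0.9], [1]), bce([0.1], [1]))
```

With step_size=10 and gamma=0.1 the rate drops by a factor of 10 every 10 scheduler steps. If step() is called once per batch rather than once per epoch, that drop happens every 10 batches, so where the call is placed matters.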
The training situation is shown in the figure below:
-
Build prediction script
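import torch
from PIL import Image
from torchvision import transforms


def predict(model_save_path, device, img_path):
    # Label 0 is dog and label 1 is cat, as written to train.txt
    class_names = ['dog', 'cat']
    # train.py saved the whole model object, so torch.load returns the model itself.
    # On PyTorch 2.6 and later, torch.load defaults to weights_only=True;
    # pass weights_only=False there to load a fully pickled model
    model = torch.load(model_save_path, map_location=device)
    model.eval()
    image_PIL = Image.open(img_path).convert('RGB')
    # Use the same input size and normalization as in training
    transform_test = transforms.Compose([
        transforms.Resize([224, 224]),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    image_tensor = transform_test(image_PIL)
    # Add the batch dimension: [3,224,224] --> [1,3,224,224]
    image_tensor = torch.unsqueeze(image_tensor, 0)
    image_tensor = image_tensor.to(device)
    with torch.no_grad():
        out = model(image_tensor)
    # A probability >= 0.5 is a cat (index 1), otherwise a dog (index 0)
    pred = int(out.item() >= 0.5)
    return class_names[pred]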
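The last step of prediction, turning the sigmoid output into a class name, can be checked in isolation. A minimal sketch, assuming the label order used in train.txt (0 for dog, 1 for cat); prob_to_class is a hypothetical helper, not part of predict.py.

```python
CLASS_NAMES = ['dog', 'cat']  # index 0 = dog, index 1 = cat


def prob_to_class(prob, threshold=0.5):
    """Map a sigmoid output to a class name: >= threshold is cat, otherwise dog."""
    return CLASS_NAMES[int(prob >= threshold)]


if __name__ == '__main__':
    for p in (0.03, 0.49, 0.5, 0.97):
        print(p, prob_to_class(p))
```

Keeping the threshold as a parameter makes it easy to trade off the two kinds of error later, for example when metrics are added to the project.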
In the future, I will continue to improve this project with visualization, metric calculation and other functions. If you need the entire source code, please leave your email address in the comments.