Cat and dog classification system based on PyTorch

This article is an entry in the CSDN New Star Program, Artificial Intelligence (PyTorch) track: https://bbs.csdn.net/topics/613989052

Foreword: this article is a step-by-step classification tutorial written for readers with no prior background. It aims to teach the basic elements, a general template, and the module-by-module implementation of a classification system. The project code is commented in detail, so the article only walks through the steps for building the system; see the code comments for specifics.

  • Download cat and dog dataset:

Link: https://pan.baidu.com/s/18LxKlQV4HAcGxdqYxRhUtg

Extraction code: 3l6q

  • Divide the data into train, val, test

A sample of the pictures in the train folder, which contains both cats and dogs, is shown in the figure. The full dataset contains 25,000 pictures, which I split into train, val, and test at a 7:2:1 ratio. Note that the numbers of cats and dogs in each split should be balanced; otherwise the trained model may be biased toward predicting one class.
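A minimal sketch of a class-balanced 7:2:1 split follows. It assumes that file names start with the class name (for example `dog.123.jpg`), the same convention data_process.py relies on. The function name `split_balanced` and the fixed seed are illustrative choices, not part of the original project:

```python
import random
from collections import defaultdict


def split_balanced(filenames, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split file names into train/val/test class by class, so every split keeps the cat/dog ratio."""
    # Group files by the prefix before the first '.', e.g. 'dog.12.jpg' -> 'dog'
    by_class = defaultdict(list)
    for name in filenames:
        by_class[name.split('.')[0]].append(name)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    splits = {'train': [], 'val': [], 'test': []}
    for names in by_class.values():
        names = sorted(names)  # sort first so the result does not depend on directory order
        rng.shuffle(names)
        # round() avoids float artefacts such as 100 * 0.7 == 69.999...
        n_train = round(len(names) * ratios[0])
        n_val = round(len(names) * ratios[1])
        splits['train'] += names[:n_train]
        splits['val'] += names[n_train:n_train + n_val]
        splits['test'] += names[n_train + n_val:]  # the remainder goes to test
    return splits
```

Each class is shuffled and cut separately, so each split keeps the same cat/dog proportion as the whole dataset. The sketch does not copy the files into their train/val/test folders; you could do that with `shutil.copy`, for example.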

  • Project directory

--dog_cat_classify
    --data_process.py
    --model.py
    --my_loader.py
    --predict.py
    --train.py

--data_process.py: generates train.txt and val.txt

--my_loader.py: custom Dataset

--model.py: builds the cat and dog classification model

--predict.py: uses the trained model to predict on images

--train.py: training script that trains and validates the model

  • For train and val, write each image path and its label separated by a space, and save them to train.txt and val.txt respectively for the custom Dataset to use later.

When customizing the Dataset, the argument to the Dataset class's __init__ is the path to train.txt or val.txt. __init__ mainly reads the content of the txt file into a list:

from torch.utils.data import Dataset

class dog_dataset(Dataset):
    def __init__(self, path):
        super(dog_dataset, self).__init__()
        # One element per line of the txt file: 'image_path label'
        with open(path, 'r') as f:
            self.data_li = f.readlines()

# data_process.py
import os

# For each image, write its path and its label (dog = 0, cat = 1) on one line, separated by a space
def create_data_txt(path, txt_p):
    data_li = os.listdir(path)
    with open(txt_p, 'w') as f:
        for ele in data_li:
            # File names start with 'dog.' or 'cat.', so the prefix gives the class
            if ele.split('.')[0] == 'dog':
                f.write('%s %s\n' % (os.path.join(path, ele), str(0)))
            else:
                f.write('%s %s\n' % (os.path.join(path, ele), str(1)))

# Produces train.txt and val.txt, which are used to build the datasets later
path = r'G:/data_dog_cat/'
txt_path = './'
li = ['train', 'val']
for ele in li:
    txt_p = os.path.join(txt_path, ele + '.txt')
    create_data_txt(os.path.join(path, ele), txt_p)

Running the script above generates train.txt and val.txt. Their content is shown in the picture below:
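The splits should be balanced, so it helps to check the label counts in the generated files. The following is a minimal sketch of that check. The helper `label_counts` is hypothetical and not part of the project. It assumes the `path label` line format produced by data_process.py:

```python
from collections import Counter


def label_counts(lines):
    """Count how many lines of train.txt / val.txt carry each label (0 = dog, 1 = cat)."""
    # rsplit on the last space so a path containing spaces does not break the split;
    # int() ignores the trailing newline; blank lines are skipped
    return Counter(int(line.rsplit(' ', 1)[1]) for line in lines if line.strip())
```

For example, `with open('train.txt') as f: print(label_counts(f))` should report roughly equal counts for labels 0 and 1.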

  • Custom Dataset

from torch.utils.data import Dataset
from torchvision.transforms import Compose, ToTensor, Resize, Normalize, ColorJitter
from PIL import Image


# Implement your own dataset by subclassing torch.utils.data.Dataset
class dog_dataset(Dataset):
    def __init__(self, path):
        super(dog_dataset, self).__init__()
        # Path of the txt file (train.txt or val.txt)
        self.path = path

        # Transformations applied before the images are fed to the network; combine or customize them as needed.
        # Note: ColorJitter() with default arguments leaves the image unchanged; pass e.g. brightness=0.2 to jitter
        self.transforms = Compose([ColorJitter(),
                                   Resize([224, 224]),
                                   ToTensor(),
                                   Normalize(
                                       mean=[0.485, 0.456, 0.406],
                                       std=[0.229, 0.224, 0.225])
                                   ])

        # Read the txt file into a list with one 'image_path label' entry per image
        with open(path, 'r') as f:
            self.data_li = f.readlines()

    def __getitem__(self, index):
        # Split the line into image path and label; rsplit on the last space keeps paths containing spaces intact
        img_path, label = self.data_li[index].rstrip().rsplit(' ', 1)
        # Read the image with PIL, because the torchvision transforms above operate on PIL Images;
        # convert('RGB') guarantees 3 channels even for grayscale or CMYK files
        img = Image.open(img_path).convert('RGB')
        # Transform the image (resizing, color changes, cropping, ...). ToTensor() and Normalize() are required,
        # and ToTensor() must come before Normalize(), because Normalize() works on Tensors; swapping them raises a type error
        img = self.transforms(img)
        # Label of the image: 0 = dog, 1 = cat
        label = int(label)
        return img, label

    def __len__(self):
        # Size of the dataset
        return len(self.data_li)


if __name__ == '__main__':
    txt_path = 'train.txt'
    mydata = dog_dataset(txt_path)
    img, lab = mydata[100]
    print(img.size())
    print(lab)
    print(len(mydata))

__init__: defines the transformations and reads the image paths and labels in the txt file into a list.

__getitem__: fetches one sample by index. It takes the image path and label from the list, reads the image from that path, applies the transforms defined in __init__, and returns the image and label.

__len__: returns the size of the train or val dataset
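Each line of train.txt or val.txt has the form `path label`. Below is a minimal sketch of parsing one line. It splits on the last space, so a path that itself contains spaces still parses. `parse_line` is a hypothetical helper used for illustration only:

```python
def parse_line(line):
    """Split one 'path label' line from train.txt / val.txt into (path, int label)."""
    # rstrip() drops the trailing '\n' (or '\r\n'); rsplit(' ', 1) splits only at the last space
    path, label = line.rstrip().rsplit(' ', 1)
    return path, int(label)
```

With a plain `split(' ')`, a path such as `C:/my photos/cat.7.jpg` would be cut at the first space, and `int()` would then fail on part of the path.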

  • Build classification model

The model is built mainly with nn.Sequential. The advantage of nn.Sequential is that it combines several operators into one Module, and the operators inside it are executed in order.
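As the model code's own comment points out, the convolution, ReLU, BatchNorm pattern repeats and could be packaged for reuse. The following sketch assumes a helper named `conv_block` (hypothetical, not in the original project). The helper returns that nn.Sequential, and the six blocks of myModel are then stacked inside one outer nn.Sequential:

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch, stride):
    """Conv2d -> ReLU -> BatchNorm2d, the pattern myModel writes out six times."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.ReLU(),
        nn.BatchNorm2d(out_ch),
    )


# The six convolution blocks of myModel; nn.Sequential runs them in order
features = nn.Sequential(
    conv_block(3, 32, 1), conv_block(32, 64, 2),
    conv_block(64, 64, 1), conv_block(64, 128, 2),
    conv_block(128, 128, 1), conv_block(128, 256, 2),
)

x = torch.rand(2, 3, 224, 224)
print(features(x).shape)  # torch.Size([2, 256, 28, 28])
```

Because nn.Sequential is itself a Module, blocks can be nested: `features[0]` is the first block, and `features[0][0]` is its Conv2d.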

import torch
import torch.nn as nn

# A custom model for cat and dog classification
class myModel(nn.Module):
    def __init__(self):
        # The class inherits from nn.Module, so the parent class initializer must be called first
        super(myModel, self).__init__()
        # Binary classification: the final output passes through a sigmoid to give a probability.
        # With 0.5 as the threshold, >= 0.5 is predicted as cat (label 1), otherwise dog (label 0).
        # Alternatively, you could use two outputs with softmax()
        self.sigmoid = nn.Sigmoid()
        # First convolution block: convolution, activation function (adds nonlinearity, which increases the
        # fitting ability of the model) and normalization layer (normalizes the data distribution so training converges faster)
        self.conv1_1 = nn.Sequential(nn.Conv2d(3, 32, (3, 3), 1, 1),
                                     nn.ReLU(),
                                     nn.BatchNorm2d(32))
        # Same as above, except the stride is 2, which halves the height and width of the feature map
        self.conv1_2 = nn.Sequential(nn.Conv2d(32, 64, (3, 3), 2, 1),
                                     nn.ReLU(),
                                     nn.BatchNorm2d(64))
        # The same structure repeats twice more with different parameters, so the block could be packaged
        # for reuse; it is written out here for clarity
        self.conv2_1 = nn.Sequential(nn.Conv2d(64, 64, (3, 3), 1, 1),
                                     nn.ReLU(),
                                     nn.BatchNorm2d(64))
        self.conv2_2 = nn.Sequential(nn.Conv2d(64, 128, (3, 3), 2, 1),
                                     nn.ReLU(),
                                     nn.BatchNorm2d(128))

        self.conv3_1 = nn.Sequential(nn.Conv2d(128, 128, (3, 3), 1, 1),
                                     nn.ReLU(),
                                     nn.BatchNorm2d(128))
        self.conv3_2 = nn.Sequential(nn.Conv2d(128, 256, (3, 3), 2, 1),
                                     nn.ReLU(),
                                     nn.BatchNorm2d(256))

        # The input size of the first linear layer must match the flattened output of the last
        # convolution block, otherwise an error is raised:
        # [batch_size, channel, height, width] --> [batch_size, channel*height*width]
        self.linear_1 = nn.Linear(28*28*256, 128)
        self.linear_2 = nn.Linear(128, 1)

    def forward(self, x):
        in_size = x.size(0)
        # To illustrate how the tensor shape changes, assume batch_size is 2
        # [2, 3, 224, 224] --> [2, 32, 224, 224]
        x = self.conv1_1(x)
        # [2, 32, 224, 224] --> [2, 64, 112, 112]
        x = self.conv1_2(x)
        # [2, 64, 112, 112] --> [2, 64, 112, 112]
        x = self.conv2_1(x)
        # [2, 64, 112, 112] --> [2, 128, 56, 56]
        x = self.conv2_2(x)
        # [2, 128, 56, 56] --> [2, 128, 56, 56]
        x = self.conv3_1(x)
        # [2, 128, 56, 56] --> [2, 256, 28, 28]
        x = self.conv3_2(x)
        # [2, 256, 28, 28] --> [2, 200704]
        x = x.view(in_size, -1)
        # ReLU between the two linear layers; without it they collapse into a single linear map
        x = torch.relu(self.linear_1(x))
        out = self.linear_2(x)
        # [2, 1]: probability that each image is a cat
        out = self.sigmoid(out)
        return out


if __name__ == '__main__':
    x = torch.rand([2, 3, 224, 224])
    model = myModel()
    print(model)
    print(model(x).size())
    print(model(x))

__init__: defines (instantiates) the operators.

forward: where the operators are actually called. The input x passes through each operator in turn, and the final output is a probability.

Finally, print it to see the structure of the model:

myModel(
  (sigmoid): Sigmoid()
  (conv1_1): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv1_2): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): ReLU()
    (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv2_1): Sequential(
    (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv2_2): Sequential(
    (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): ReLU()
    (2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv3_1): Sequential(
    (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv3_2): Sequential(
    (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): ReLU()
    (2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (linear_1): Linear(in_features=200704, out_features=128, bias=True)
  (linear_2): Linear(in_features=128, out_features=1, bias=True)
)
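The `in_features=200704` of linear_1 can be checked by hand. A Conv2d with kernel k, stride s and padding p maps a spatial size n to floor((n + 2p - k) / s) + 1. The following minimal sketch applies that formula to the six blocks. The helper `conv_out` is hypothetical:

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a Conv2d: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
for stride in [1, 2, 1, 2, 1, 2]:  # strides of conv1_1, conv1_2, conv2_1, conv2_2, conv3_1, conv3_2
    size = conv_out(size, stride=stride)

in_features = size * size * 256  # 256 = out_channels of conv3_2
print(size, in_features)  # 28 200704
```

The same arithmetic shows why every input must be resized to exactly 224x224. Any other size changes `in_features`, and linear_1 then raises a shape mismatch error.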
  • Build training script

import torch
from torch.utils.data import DataLoader
from my_loader import dog_dataset
from model import myModel
from torch import nn
from torch.optim.lr_scheduler import StepLR

# Define the training process for one epoch
def train(model, device, train_loader, optimizer, epoch, criterion):
    # Put the model in training mode: batchnorm and dropout behave differently in training and evaluation
    model.train()
    # Iterate over the DataLoader (it is an iterable object, so it can be looped over)
    for batch_idx, (data, target) in enumerate(train_loader):
        # BCELoss expects float targets with the same shape as the output: [batch_size, 1]
        data, target = data.to(device), target.to(device).float().unsqueeze(1)
        # Reset the gradients
        optimizer.zero_grad()
        # Forward pass
        output = model(data)
        # Compute the loss
        loss = criterion(output, target)
        # Backpropagation
        loss.backward()
        # Parameter update
        optimizer.step()
        # Print the loss every 10 iterations
        if (batch_idx + 1) % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                100. * (batch_idx + 1) / len(train_loader), loss.item()))


# Define the validation process
def val(model, device, test_loader, criterion):
    # Put the model in evaluation mode
    model.eval()
    test_loss = 0
    correct = 0

    # Context manager: gradients are not computed in this scope
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device).float().unsqueeze(1)

            output = model(data)
            # criterion already returns the mean loss of the batch
            test_loss += criterion(output, target).item()
            # Probability >= 0.5 -> cat (1), otherwise dog (0)
            pred = (output >= 0.5).float()
            correct += pred.eq(target).sum().item()

    # Average the per-batch losses and print the accuracy once
    test_loss /= len(test_loader)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


def main():
    # Use CUDA to accelerate training if it is available, otherwise use the CPU
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    # Number of training epochs
    epochs = 100
    # Instantiate the training dataset
    train_dataset = dog_dataset('C:/Users/86181/Desktop/tset_demo/dog_cat_classify/train.txt')
    # Instantiate the validation dataset
    val_dataset = dog_dataset('C:/Users/86181/Desktop/tset_demo/dog_cat_classify/val.txt')

    # Instantiate the training DataLoader
    train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
    # Instantiate the validation DataLoader (no need to shuffle for evaluation)
    val_loader = DataLoader(val_dataset, batch_size=2, shuffle=False)

    # Instantiate the model and move it to the same device as the data
    model = myModel().to(device)

    # Instantiate the loss (binary cross-entropy on the sigmoid output)
    criterion = nn.BCELoss()
    # Instantiate an Adam optimizer with an initial learning rate of 1e-3
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Step decay: multiply the learning rate by 0.1 every 10 epochs
    lr_scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

    for epoch in range(1, epochs + 1):
        train(model, device, train_loader, optimizer, epoch, criterion)
        val(model, device, val_loader, criterion)
        # Update the learning rate once per epoch, not once per batch
        lr_scheduler.step()

    # Save the model
    torch.save(model, 'G:/dog_cat_calssify/model.pth')


if __name__ == '__main__':
    main()
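StepLR counts calls to `scheduler.step()`, not epochs. It multiplies the learning rate by gamma after every step_size calls, which gives lr = base_lr * gamma ** (calls // step_size). Below is a minimal sketch of the resulting schedule for the settings in main() (lr=1e-3, step_size=10, gamma=0.1). `step_lr` is a hypothetical helper, not a PyTorch function:

```python
def step_lr(base_lr, num_steps, step_size=10, gamma=0.1):
    """Learning rate StepLR gives after `num_steps` calls to scheduler.step()."""
    return base_lr * gamma ** (num_steps // step_size)


# One scheduler.step() per epoch: epoch e runs after e - 1 calls
for epoch in (1, 10, 11, 21):
    print(epoch, step_lr(1e-3, epoch - 1))
```

When step() is called once per epoch, the learning rate stays at 1e-3 for epochs 1 to 10, drops to 1e-4 for epochs 11 to 20, and so on. When it is called once per batch, a single epoch has 8,750 calls (17,500 training images, that is 70% of 25,000, at batch_size=2). The learning rate would then effectively reach zero within the first epoch.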

The training progress is shown in the figure below:

  • Build prediction script

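import torch
from PIL import Image
from torchvision import transforms
from model import myModel  # torch.load unpickles the saved myModel object, so model.py must be importable

def predict(model_save_path, device, img_path):
    # Label 0 = dog, 1 = cat, matching data_process.py
    class_names = ['dog', 'cat']
    # train.py saved the whole model object; PyTorch >= 2.6 defaults to weights_only=True, so disable it here
    model = torch.load(model_save_path, map_location=device, weights_only=False)
    model.eval()
    image_PIL = Image.open(img_path).convert('RGB')
    # Same preprocessing as training (apart from ColorJitter): a fixed 224x224 size, because the
    # linear layer expects 28*28*256 features, and the same normalization statistics
    transform_test = transforms.Compose([
        transforms.Resize([224, 224]),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    image_tensor = transform_test(image_PIL)
    # Add a batch dimension: [3, 224, 224] --> [1, 3, 224, 224]
    image_tensor = torch.unsqueeze(image_tensor, 0)
    image_tensor = image_tensor.to(device)
    with torch.no_grad():
        out = model(image_tensor)
    # Probability >= 0.5 -> cat (1), otherwise dog (0)
    pred = 1 if out.item() >= 0.5 else 0
    return class_names[pred]


if __name__ == '__main__':
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    # Hypothetical image path; replace it with your own
    print(predict('G:/dog_cat_calssify/model.pth', device, 'test/cat.1.jpg'))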
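predict() returns only the class name. Below is a minimal sketch of turning sigmoid outputs into class names with a confidence. It assumes the label order of data_process.py (0 = dog, 1 = cat) and the 0.5 threshold used throughout the article. `decode` and `CLASS_NAMES` are hypothetical names:

```python
CLASS_NAMES = ['dog', 'cat']  # index 0 = dog, 1 = cat


def decode(probs, threshold=0.5):
    """Turn sigmoid outputs (probability that the image is a cat) into (name, confidence) pairs."""
    results = []
    for p in probs:
        label = 1 if p >= threshold else 0
        # Confidence in the chosen class: p for cat, 1 - p for dog
        confidence = p if label == 1 else 1.0 - p
        results.append((CLASS_NAMES[label], confidence))
    return results
```

For a single image, you could pass `[out.item()]`, where `out` is the model output computed inside predict().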

In the future, I will continue to improve this project with visualization, metric calculation, and other features. If you need the full source code, please leave your email address in the comments.

