The most complete PyTorch-based cat and dog binary classification, with very high prediction accuracy

Free to share~
The download address for the cat and dog classification files will be given in the next chapter.

Cat and dog binary classification

- One: Data preparation
- Two: Training and model creation (reading the data is in here too)
- Three: Prediction (take any cat or dog picture and identify whether it is a cat or a dog)
- Four: Upgraded prediction

Cat and dog binary classification really troubled me for several days. I found plenty of material on classifying cats and dogs with TensorFlow, but our assignment requires PyTorch. At first I found a TensorFlow version, got it running, and thought I was done. Then I looked at the practical requirements and was dumbfounded: the teacher wanted PyTorch, and I had used TensorFlow. The main reason was that I had not realized the two are different frameworks; only after some searching did I slowly work out that they need different environments. So I searched again, for several more days, and it was hard work: most cat and dog classifiers on the Internet are written in TensorFlow, and very few in PyTorch. In the end I got it working, and I learned a lot along the way, so I am writing this article to record it. Mine runs on a GPU, and I suggest you set up the GPU version too. I used to think it was difficult, but it is actually simple: install CUDA first, then install the PyTorch build that matches your CUDA version, and that's it~ So easy! Ask me if anything is unclear.

Ok, on to the topic:

My software: PyCharm Professional. I started with the Community edition, but it offers too few options when creating files, so I switched straight to Professional, which is simply the best; it is a cracked copy I found online. (During these days of exploring I also found that Visual Studio Code can run Python. I thought it was pretty good and played with it for a while, but then it stopped running, apparently because the TensorBoard version was wrong, so I simply gave up and went with PyCharm Professional.)

I have two PyTorch installs, one for CPU and one for GPU (it turns out one computer can hold two PyTorch environments, as long as you give them different names).

Python in the PyTorch environment: 3.7; torch: 1.2.0.

CUDA: 10.0 (a bit old, but at least my versions match and work)
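
A quick way to confirm which build is actually in use is to print the versions and ask PyTorch whether it can see CUDA. This is only a minimal sketch; the helper name describe_env is made up for this example and is not part of any library.

```python
import sys
import torch


def describe_env():
    """Collect the interpreter, torch and CUDA details in one dict."""
    return {
        "python": sys.version.split()[0],
        "torch": torch.__version__,
        # CUDA version this torch build was compiled against; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are present at runtime
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_env().items():
        print("{}: {}".format(key, value))
```

If cuda_available prints False inside the GPU environment, the usual suspects are a CPU-only torch build or a CUDA version that does not match the installed driver.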

The code is below, with explanations in the comments. Go and work through it yourself, come on! Finding problems and solving them really is a good way to learn (it has certainly taught me a lot).

First, the directory layout:

data-predict holds the pictures to run predictions on, so I put two pictures in it~

One: Data preparation

Almost every data set used online comes from the Kaggle cat and dog competition of that year. After unpacking, it looks like the figure below.

The train folder contains 25,000 pictures, which I think is a bit too many, so I built a smaller data set, the Smalldata folder in the directory above. If you want the full 25,000-picture data set, you can download it from Baidu Netdisk.
data.py


import os, shutil
# Downloaded kaggle dataset path
original_dataset_dir = '/pythonProject3/cat and dog classification/Bigdata'
# New small dataset placement path
base_dir = '/pythonProject3/cat and dog classification/Smalldata'
os.mkdir(base_dir)
train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)

train_cats_dir = os.path.join(train_dir, 'cats')
os.mkdir(train_cats_dir)
train_dogs_dir = os.path.join(train_dir, 'dogs')
os.mkdir(train_dogs_dir)
test_cats_dir = os.path.join(test_dir, 'cats')
os.mkdir(test_cats_dir)
test_dogs_dir = os.path.join(test_dir, 'dogs')
os.mkdir(test_dogs_dir)

fnames = ['cat.{}.jpg'.format(i) for i in range(200)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)


fnames = ['cat.{}.jpg'.format(i) for i in range(300, 400)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_cats_dir, fname)
    shutil.copyfile(src, dst)

fnames = ['dog.{}.jpg'.format(i) for i in range(200)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_dogs_dir, fname)
    shutil.copyfile(src, dst)

fnames = ['dog.{}.jpg'.format(i) for i in range(300, 400)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_dogs_dir, fname)
    shutil.copyfile(src, dst)

print('total training cat images:', len(os.listdir(train_cats_dir)))
print('total training dog images:', len(os.listdir(train_dogs_dir)))
print('total test cat images:', len(os.listdir(test_cats_dir)))
print('total test dog images:', len(os.listdir(test_dogs_dir)))
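
The four copy loops in data.py differ only in the file prefix, the index range and the target folder, and os.mkdir raises an error if the script is run a second time. A possible way to fold them into one helper is sketched below. The function name copy_range is hypothetical, and the demo runs on throwaway folders with empty placeholder files rather than the real pictures.

```python
import os
import shutil
import tempfile


def copy_range(src_dir, dst_dir, prefix, start, stop):
    """Copy prefix.<start>.jpg ... prefix.<stop-1>.jpg from src_dir into dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)  # no error if the folder already exists
    copied = 0
    for i in range(start, stop):
        fname = "{}.{}.jpg".format(prefix, i)
        src = os.path.join(src_dir, fname)
        if not os.path.isfile(src):  # skip missing files instead of crashing
            continue
        shutil.copyfile(src, os.path.join(dst_dir, fname))
        copied += 1
    return copied


if __name__ == "__main__":
    # Self-contained demo: build a fake "Bigdata" folder with 5 empty cat files
    with tempfile.TemporaryDirectory() as tmp:
        big = os.path.join(tmp, "Bigdata")
        small = os.path.join(tmp, "Smalldata")
        os.makedirs(big)
        for i in range(5):
            open(os.path.join(big, "cat.{}.jpg".format(i)), "w").close()
        print(copy_range(big, os.path.join(small, "train", "cats"), "cat", 0, 3))
        print(copy_range(big, os.path.join(small, "test", "cats"), "cat", 3, 5))
```

With the real data, the same call would be repeated for the cat and dog prefixes with the ranges used above (0 to 200 for training, 300 to 400 for testing).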

Demo:

The more I say, the more likely I am to get something wrong, so let's move on~

Two: Training and model creation (reading the data is in here too)

train.py


import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # use the GPU if one is available
# Data preprocessing and data augmentation
transform = transforms.Compose([
    # Randomly crop the image, then resize the crop to a fixed size (224*224)
    transforms.RandomResizedCrop(224),
    # Randomly rotate by up to 20 degrees (clockwise or counterclockwise)
    transforms.RandomRotation(20),
    # Random horizontal flip
    transforms.RandomHorizontalFlip(p=0.5),
    # Convert the image to a tensor
    transforms.ToTensor()
])

# Read the data
root = 'Smalldata'  # root is the dataset directory
# Build the datasets from the folder paths and apply transform to every image
train_dataset = datasets.ImageFolder(root + '/train', transform)
test_dataset = datasets.ImageFolder(root + '/test', transform)
# Wrap the datasets in data loaders
# 8 images per batch, shuffled
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=8, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=8, shuffle=True)
# Class names
classes = train_dataset.classes
# Mapping from class name to index
classes_index = train_dataset.class_to_idx
print("category name",classes)
print("category number", classes_index)
# torchvision.models provides many pretrained models
model = models.vgg16(pretrained=True)
# We mainly want to reuse the convolutional layers of vgg16 and replace the original fully connected layers with our own
# Freeze all parameters so that only the new fully connected layers are trained (comment out this for loop if you want to train everything)
for param in model.parameters():
    param.requires_grad = False
# Build the new fully connected layers
# 25088 = 512 * 7 * 7 is the flattened output of the convolutional part; the hidden size 100 is our own choice; the output size is the number of classes, 2
model.classifier = torch.nn.Sequential(torch.nn.Linear(25088, 100),
                                       torch.nn.ReLU(),
                                       torch.nn.Dropout(p=0.5),
                                       torch.nn.Linear(100, 2)
                                       # A softmax layer is optional here: nn.CrossEntropyLoss already applies log-softmax to these outputs
                                       )
model=model.to(device) #Send the model to the GPU
print("Use GPU:",next(model.parameters()).device) # output: cuda:0
LR = 0.0001
# define the cost function
entropy_loss = nn.CrossEntropyLoss() #loss function
# define optimizer
optimizer = optim.SGD(model.parameters(), LR, momentum=0.9)
print("Start training~")
def train():
    model.train()
    for i, data in enumerate(train_loader):
        # Get a batch of images and the corresponding labels
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)  # send the data to the GPU
        # Model predictions, shape (batch, 2)
        out = model(inputs)
        # Cross-entropy loss: out is (batch, C), labels is (batch,)
        # The loss is computed on the same device as its inputs, so no extra .to(device) is needed
        loss = entropy_loss(out, labels)
        # Reset the gradients to 0
        optimizer.zero_grad()
        # Compute the gradients
        loss.backward()
        # Update the weights
        optimizer.step()
def test():
    model.eval()
    # No gradients are needed for evaluation
    with torch.no_grad():
        correct = 0
        for i, data in enumerate(test_loader):
            # Get a batch of images and the corresponding labels
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            # Get the model predictions
            out = model(inputs)
            # Get the maximum value and its position (the predicted class)
            _, predicted = torch.max(out, 1)
            # Count the correct predictions
            correct += (predicted == labels).sum()
        print("Test acc: {:.2f}".format(correct.item() / len(test_dataset)))
        # This "loss" is the error rate, not the cross-entropy: error rate + accuracy = 1
        print("Test loss:{:.2f}".format(1 - correct.item() / len(test_dataset)))

        correct = 0
        for i, data in enumerate(train_loader):
            # Get a batch of images and the corresponding labels
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            # Get the model predictions
            out = model(inputs)
            # Get the maximum value and its position (the predicted class)
            _, predicted = torch.max(out, 1)
            # Count the correct predictions
            correct += (predicted == labels).sum()
        print("Train acc: {:.2f}".format(correct.item() / len(train_dataset)))
        print("Train loss:{:.2f}".format(1 - correct.item() / len(train_dataset)))
for epoch in range(0,10):
    print('epoch:', epoch)
    train()
    test()
torch.save(model.state_dict(), 'model.pth')
print("~end training")
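
The "loss" printed by test() is really the error rate, 1 minus the accuracy. If the average cross-entropy on a data set is wanted as well, it can be accumulated in the same loop. The sketch below is one way to do that under stated assumptions: the evaluate helper is hypothetical, and the tiny linear model and random tensors are stand-ins so the snippet runs without the image folders.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return (accuracy, mean cross-entropy loss) over every sample in loader."""
    criterion = nn.CrossEntropyLoss(reduction="sum")  # sum per batch, divide at the end
    model.eval()
    correct, loss_sum, seen = 0, 0.0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            out = model(inputs)  # shape (batch, num_classes)
            loss_sum += criterion(out, labels).item()
            correct += (out.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    return correct / seen, loss_sum / seen


if __name__ == "__main__":
    torch.manual_seed(0)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Stand-in data: 16 random "images" with labels drawn from 2 classes
    data = TensorDataset(torch.randn(16, 3, 8, 8), torch.randint(0, 2, (16,)))
    loader = DataLoader(data, batch_size=8)
    toy = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2)).to(device)
    acc, loss = evaluate(toy, loader, device)
    print("acc: {:.2f}  loss: {:.4f}".format(acc, loss))
```

In train.py this would be called as evaluate(model, test_loader, device) after each epoch.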

Demo:

At the beginning my code ran on the CPU, not the GPU, and I changed it many times. I use VGG16, which is a CNN. It took me a whole day to get the code running on the GPU. The culprit turned out to be this line: model = model.to(device). I had written it wrong: the second model was not my VGG16 model, so nothing worked. Sigh, a whole day wasted.
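
The rule behind that fix is that the model and every input batch must live on the same device. For an nn.Module, .to(device) moves the parameters and returns the module itself; for a tensor, it returns a tensor on the target device, which has to be kept by reassigning it. A small sketch, with a tiny linear layer standing in for VGG16 so that it runs quickly:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = nn.Linear(4, 2)  # stand-in for the VGG16 model
net = net.to(device)   # module: parameters are moved, the same object is returned

x = torch.randn(3, 4)  # tensors are created on the CPU by default
x = x.to(device)       # tensor: keep the returned tensor, it is the one on `device`

out = net(x)           # works because model and input are on the same device
print(next(net.parameters()).device, x.device, out.shape)
```

If the model is on the GPU and the input is left on the CPU (or the other way round), the forward pass raises a device mismatch error.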

Three: Prediction (take any cat or dog picture and identify whether it is a cat or a dog)

predict.py


import torch
import numpy as np
from PIL import Image
from torchvision import transforms, models


model = models.vgg16(pretrained=True)
# Build a new fully connected layer
model.classifier = torch.nn.Sequential(torch.nn.Linear(25088, 100),
                                       torch.nn.ReLU(),
                                       torch.nn.Dropout(p=0.5),
                                       torch.nn.Linear(100, 2))

# Load the trained parameters saved by train.py
# map_location='cpu' lets weights saved on a GPU load on a machine without one
model.load_state_dict(torch.load('model.pth', map_location='cpu'))

# Prediction mode
model.eval()

label = np.array(['cat', 'dog'])

# Data preprocessing
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor()
])


def predict(image_path):
    # Open the image; convert to RGB so grayscale or RGBA files also give 3 channels
    img = Image.open(image_path).convert('RGB')
    # Preprocess, then add a batch dimension at position 0: (3, H, W) -> (1, 3, H, W)
    img = transform(img).unsqueeze(0)
    # Predict
    outputs = model(img)
    # Dimension 0 is the batch (1 image); dimension 1 holds the 2 class scores (cat 0, dog 1).
    # Take the index of the larger score
    _, predicted = torch.max(outputs, 1)
    # Convert to the class name
    print(label[predicted.item()])


predict('data-predict/cat.jpg')
predict('data-predict/dog.jpg')
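
The model outputs raw scores (logits), and torch.max simply picks the larger one. If a confidence value is wanted as well, softmax turns the scores into probabilities without changing which class wins. A sketch with invented logits; logits_to_prediction is a hypothetical helper and not part of the scripts in this article.

```python
import torch

LABELS = ["cat", "dog"]  # ImageFolder sorts class folders by name: cats -> 0, dogs -> 1


def logits_to_prediction(logits):
    """Map a (1, 2) tensor of raw scores to (label, probability)."""
    probs = torch.softmax(logits, dim=1)  # each row now sums to 1
    prob, idx = torch.max(probs, dim=1)   # same winning index as on the raw logits
    return LABELS[idx.item()], prob.item()


if __name__ == "__main__":
    fake_logits = torch.tensor([[0.3, 2.1]])  # invented numbers for illustration
    label, prob = logits_to_prediction(fake_logits)
    print("{} ({:.1%})".format(label, prob))
```

In predict.py the same function could be applied to outputs in place of the torch.max call.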

Demo:

The cat and dog pictures themselves are not displayed, which would mean I installed matplotlib for nothing. Absolutely not, so let's do the whole thing~ (maybe I am a little obsessive-compulsive)
I always felt that a prediction which does not show the picture being checked is not perfect enough, so I searched for more material and upgraded it.

Four: Upgraded prediction

Upgraded version predict.py

import os
os.environ['NLS_LANG'] = 'SIMPLIFIED CHINESE_CHINA.UTF8'
import time
import json
import torch
import torchvision.transforms as transforms
from PIL import Image
from matplotlib import pyplot as plt
import torchvision.models as models
import torchsummary
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def img_transform(img_rgb, transform=None):
    """
    Transform the data into a form that the model reads
    :param img_rgb: PIL Image
    :param transform: torchvision.transform
    :return: tensor
    """

    if transform is None:
        raise ValueError("Can't find transform! There must be transform to process img")

    img_t = transform(img_rgb)
    return img_t


def load_class_names(p_clsnames, p_clsnames_cn):
    """
    load tag name
    :param p_clsnames:
    :param p_clsnames_cn:
    :return:
    """
    with open(p_clsnames, "r") as f:
        class_names = json.load(f)
    with open(p_clsnames_cn, encoding='UTF-8') as f:  # read the label file as UTF-8
        class_names_cn = f.readlines()
    return class_names, class_names_cn


def get_model(path_state_dict, num_classes, vis_model=False):
    """
    Create a model, load parameters
    :param path_state_dict:
    :return:
    """

    model = models.vgg16(num_classes=num_classes)
    model.classifier = torch.nn.Sequential(torch.nn.Linear(25088, 100),
                                           torch.nn.ReLU(),
                                           torch.nn.Dropout(p=0.5),
                                           torch.nn.Linear(100, 2))

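    # map_location="cpu": load onto the CPU first; model.to(device) below moves the model
    pretrained_state_dict = torch.load(path_state_dict, map_location="cpu")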
    model.load_state_dict(pretrained_state_dict)
    model.eval()

    if vis_model:
        from torchsummary import summary
        summary(model, input_size=(3, 224, 224), device="cpu")

    model.to(device)
    return model


def process_img(path_img):

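    # ImageNet mean and std (hard-coded)
    # Note: train.py above does not apply Normalize; for consistent results,
    # use the same normalization in training and in prediction
    norm_mean = [0.485, 0.456, 0.406]
    norm_std = [0.229, 0.224, 0.225]
    inference_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(norm_mean, norm_std),
    ])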

    # path --> img
    img_rgb = Image.open(path_img).convert('RGB')

    # img --> tensor
    img_tensor = img_transform(img_rgb, inference_transform)
    img_tensor.unsqueeze_(0) # chw --> bchw
    img_tensor = img_tensor.to(device)

    return img_tensor, img_rgb


if __name__ == "__main__":
    num_classes=2
    #config
    path_state_dict = os.path.join(BASE_DIR, "model.pth")
    path_img = os.path.join(BASE_DIR, "data-predict", "dog.jpg")

    # 1/5 load img
    img_tensor, img_rgb = process_img(path_img)

    # 2/5 load model
    model = get_model(path_state_dict, num_classes, True)

    # 3/5 inference
    with torch.no_grad():
        time_tic = time.time()
        outputs = model(img_tensor)
        time_toc = time.time()

    # 4/5 index to class names
    _, pred_int = torch.max(outputs.data, 1)
    _, top1_idx = torch.topk(outputs.data, 1, dim=1)

    pred_idx = pred_int.item()
    # ImageFolder sorts class folders by name: cats -> 0, dogs -> 1
    pred_str = "cat" if pred_idx == 0 else "dog"
    print("img: {} is: {}".format(os.path.basename(path_img), pred_str))
    print("time consuming:{:.2f}s".format(time_toc - time_tic))

    # 5/5 visualization
    plt.imshow(img_rgb)
    plt.title("predict:{}".format(pred_str))
    plt.text(5, 45, "top {}:{}".format(1, pred_str), bbox=dict(fc='yellow'))
    plt.show()

Demo:

I tried several images. OK, perfect.
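
To run the prediction over every picture in a folder such as data-predict instead of one hard-coded path, the per-image steps can be wrapped in a loop. The sketch below is an assumption about how that could look: predict_folder is a hypothetical helper that takes any function mapping an image path to a result, and the demo uses a fake predictor so that no model file is needed.

```python
import os
import tempfile

IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def predict_folder(folder, predict_one):
    """Call predict_one(path) for each image in folder; return {filename: result}."""
    results = {}
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(IMAGE_EXTS):  # ignore non-image files
            results[name] = predict_one(os.path.join(folder, name))
    return results


if __name__ == "__main__":
    # Stand-in predictor that only looks at the file name, so the demo needs no model
    def fake_predict(path):
        return "cat" if "cat" in os.path.basename(path) else "dog"

    with tempfile.TemporaryDirectory() as tmp:
        for name in ("cat.jpg", "dog.jpg", "notes.txt"):
            open(os.path.join(tmp, name), "w").close()
        print(predict_folder(tmp, fake_predict))
```

With the upgraded predict.py, predict_one would be a small function that calls process_img on the path, runs the model, and returns the predicted class name.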

Finally, today is March 23, 2023. I wish you all a healthy, rich and happy year in 2023!

end~