The most complete PyTorch-based cat and dog binary classification, with very high prediction accuracy
Shared for free~
The download link for the cat and dog classification files will be given in the next post.
Cat and dog binary classification
- One: Data preparation
- Two: Training and model creation (including data loading)
- Three: Prediction (take any cat or dog picture and identify whether it is a cat or a dog)
- Four: Upgraded prediction
Binary classification of cats and dogs really troubled me for several days. I found a lot of material on cat and dog classification with TensorFlow, but our assignment required PyTorch. At the beginning I found a TensorFlow version, it ran successfully, and I thought I was done. Then I looked at the practical requirements and was dumbfounded: the teacher wanted PyTorch, and I had TensorFlow. The main reason was that I didn't know the two were different frameworks; after some searching I slowly realized that they need different environments. I searched again for several days, and it was hard work, because most of what is online is cat and dog classification in TensorFlow and very little is in PyTorch. In the end it worked out. I learned a lot in the process, so I am writing this article to record it.

Mine runs on a GPU. Try to set up GPU support yourselves; it is very simple. I used to think it was difficult, but it is not. Just remember the order: first install CUDA, then install the PyTorch build that matches your CUDA version, and then~~~ so easy! You can ask me if you don't understand.
Ok, on to the topic:
My software: PyCharm Professional Edition. I originally used the Community Edition, but it offers too few options when creating files, so I switched directly to the Professional Edition, yyds (this is a cracked version I found online). During these days of exploration I also found that Visual Studio Code can run Python. I thought it was pretty good at the time and tinkered with it, but later it would not run, apparently because the TensorBoard version was wrong, so I simply gave up and went straight to PyCharm Professional.
There are two PyTorch environments, one for CPU and one for GPU (it turns out that one computer can have two PyTorch environments, as long as they have different names).
Python in the PyTorch environment: 3.7; torch: 1.2.0.
CUDA: 10.0 (a bit old, but at least my versions match and work).
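Because the GPU setup only works when the Python, PyTorch, and CUDA versions match, a quick check like the one below can show what an environment actually contains. This is a minimal sketch and not part of the original project: the helper name `describe_environment` is made up for illustration, and the function simply reports that PyTorch is missing if it cannot be imported.

```python
import platform


def describe_environment():
    """Collect the versions that need to match for GPU training."""
    info = {"python": platform.python_version()}
    try:
        import torch  # only importable inside the PyTorch environment
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None for a CPU-only build
    info["gpu_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running it once in the CPU environment and once in the GPU environment is an easy way to tell the two apart.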
Below is the code, with explanations in the comments. Work through it yourself, come on! Finding problems and solving them really is a good way to learn (I have benefited a lot from it).
First, the directory layout:
In it, data-predict stores the pictures to predict on, so I put two pictures there~
One: Data preparation
The datasets used online all come from the Kaggle cat and dog competition of that year. After decompression, it looks like the figure below.
The train folder contains 25,000 pictures, which I think is a bit too many, so I created a smaller dataset, which is Smalldata in the directory above. If you want the full 25,000-picture dataset, you can download it from Baidu Netdisk.
data.py
```python
import os
import shutil

# Path of the downloaded Kaggle dataset
original_dataset_dir = '/pythonProject3/cat and dog classification/Bigdata'
# Path where the new small dataset is placed
base_dir = '/pythonProject3/cat and dog classification/Smalldata'
os.mkdir(base_dir)

train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)

train_cats_dir = os.path.join(train_dir, 'cats')
os.mkdir(train_cats_dir)
train_dogs_dir = os.path.join(train_dir, 'dogs')
os.mkdir(train_dogs_dir)

test_cats_dir = os.path.join(test_dir, 'cats')
os.mkdir(test_cats_dir)
test_dogs_dir = os.path.join(test_dir, 'dogs')
os.mkdir(test_dogs_dir)

# Cats 0-199 go to the training set
fnames = ['cat.{}.jpg'.format(i) for i in range(200)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)

# Cats 300-399 go to the test set
fnames = ['cat.{}.jpg'.format(i) for i in range(300, 400)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_cats_dir, fname)
    shutil.copyfile(src, dst)

# Dogs 0-199 go to the training set
fnames = ['dog.{}.jpg'.format(i) for i in range(200)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_dogs_dir, fname)
    shutil.copyfile(src, dst)

# Dogs 300-399 go to the test set
fnames = ['dog.{}.jpg'.format(i) for i in range(300, 400)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_dogs_dir, fname)
    shutil.copyfile(src, dst)

print('total training cat images:', len(os.listdir(train_cats_dir)))
print('total training dog images:', len(os.listdir(train_dogs_dir)))
print('total test cat images:', len(os.listdir(test_cats_dir)))
print('total test dog images:', len(os.listdir(test_dogs_dir)))
```
Demonstrate it:
The more you say, the more mistakes you make, so let's move on~
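The copy step in data.py repeats the same loop four times and stops with an error if Smalldata already exists. As a rough alternative, the same split could be written as one function. This is only a sketch under assumptions: the function name `build_subset` and the `splits` argument are invented here, and the demo runs on empty placeholder files in a temporary directory instead of the real Kaggle images.

```python
import shutil
import tempfile
from pathlib import Path


def build_subset(src_dir, dst_dir, splits):
    """Copy '<animal>.<i>.jpg' files from src_dir into dst_dir/<split>/<animal>s/.

    splits maps a split name to a range of image indices, for example
    {"train": range(200), "test": range(300, 400)}.
    Returns the number of files per (split, class folder).
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    counts = {}
    for split, indices in splits.items():
        for animal in ("cat", "dog"):
            target = dst_dir / split / f"{animal}s"
            # exist_ok=True lets the script be re-run without FileExistsError
            target.mkdir(parents=True, exist_ok=True)
            for i in indices:
                name = f"{animal}.{i}.jpg"
                shutil.copyfile(src_dir / name, target / name)
            counts[(split, target.name)] = len(list(target.iterdir()))
    return counts


if __name__ == "__main__":
    # Self-contained demo on empty placeholder files
    with tempfile.TemporaryDirectory() as tmp:
        big = Path(tmp) / "Bigdata"
        big.mkdir()
        for animal in ("cat", "dog"):
            for i in range(400):
                (big / f"{animal}.{i}.jpg").touch()
        result = build_subset(big, Path(tmp) / "Smalldata",
                              {"train": range(200), "test": range(300, 400)})
        print(result)
```

With the real data, `src_dir` would be the Bigdata folder and `dst_dir` the Smalldata folder used above.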
Two: Training and model creation (including data loading)
train.py
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader

# Use the GPU if it is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Data preprocessing and augmentation
transform = transforms.Compose([
    # Randomly crop the image, then resize it to a fixed size (224*224)
    transforms.RandomResizedCrop(224),
    # Randomly rotate by up to 20 degrees (clockwise or counterclockwise)
    transforms.RandomRotation(20),
    # Random horizontal flip
    transforms.RandomHorizontalFlip(p=0.5),
    # Convert the data to a tensor
    transforms.ToTensor()
])

# Read data
root = 'Smalldata'  # root is the dataset directory
# Build the datasets from the folders; note that the same random
# augmentation is applied to both the training and the test set
train_dataset = datasets.ImageFolder(root + '/train', transform)
test_dataset = datasets.ImageFolder(root + '/test', transform)

# Load data: 8 images per batch, shuffled
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=8, shuffle=True)

# Class names
classes = train_dataset.classes
# Class indices
classes_index = train_dataset.class_to_idx
print("category name", classes)
print("category number", classes_index)

# torchvision's models module provides many pretrained models
model = models.vgg16(pretrained=True)

# We mainly want to reuse the convolutional layers of vgg16 and define the
# fully connected layers ourselves, overwriting the original ones.
# Freeze all existing parameters so that only the new fully connected layers
# are trained (comment out this loop if you want to train the whole model)
for param in model.parameters():
    param.requires_grad = False

# Build the new fully connected layers
# 25088: number of features coming out of the convolutional part;
# 100: hidden size chosen by ourselves; 2: number of output classes
model.classifier = torch.nn.Sequential(
    torch.nn.Linear(25088, 100),
    torch.nn.ReLU(),
    torch.nn.Dropout(p=0.5),
    torch.nn.Linear(100, 2)
    # Softmax is optional here; nn.CrossEntropyLoss works on the raw scores
)

model = model.to(device)  # send the model to the GPU
print("Use GPU:", next(model.parameters()).device)  # output: cuda:0

LR = 0.0001
# Loss function
entropy_loss = nn.CrossEntropyLoss()
# Optimizer
optimizer = optim.SGD(model.parameters(), LR, momentum=0.9)

print("Start training~")


def train():
    model.train()
    for i, data in enumerate(train_loader):
        # Get the data and the corresponding labels
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)  # send data to the GPU
        # Model prediction, shape (batch, 2)
        out = model(inputs)
        # Cross-entropy loss: out is (batch, C), labels is (batch,)
        # The loss is already on the same device as out, so no .to(device) is needed
        loss = entropy_loss(out, labels)
        # Clear the gradients
        optimizer.zero_grad()
        # Compute the gradients
        loss.backward()
        # Update the weights
        optimizer.step()


def test():
    model.eval()
    # No gradients are needed for evaluation
    with torch.no_grad():
        correct = 0
        for i, data in enumerate(test_loader):
            # Get the data and the corresponding labels
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            # Model prediction
            out = model(inputs)
            # Maximum value and its position (the predicted class)
            _, predicted = torch.max(out, 1)
            # Count the correct predictions
            correct += (predicted == labels).sum()
        print("Test acc: {:.2f}".format(correct.item() / len(test_dataset)))
        # This "loss" is the error rate: error rate + accuracy = 1
        print("Test loss:{:.2f}".format(1 - correct.item() / len(test_dataset)))

        correct = 0
        for i, data in enumerate(train_loader):
            # Get the data and the corresponding labels
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            # Model prediction
            out = model(inputs)
            # Maximum value and its position (the predicted class)
            _, predicted = torch.max(out, 1)
            # Count the correct predictions
            correct += (predicted == labels).sum()
        print("Train acc: {:.2f}".format(correct.item() / len(train_dataset)))
        print("Train loss:{:.2f}".format(1 - correct.item() / len(train_dataset)))


for epoch in range(0, 10):
    print('epoch:', epoch)
    train()
    test()

torch.save(model.state_dict(), 'model.pth')
print("~end training")
```
Demonstrate it:
At the beginning the code ran on the CPU, not the GPU, and I changed it many times. I use VGG16, which is a CNN. It took me a whole day to get the code running on the GPU. The problem turned out to be this line: model = model.to(device). I had written it wrong: the second model was not the VGG16 model, so it did not work. Hey, I wasted my day.
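One detail of train.py is worth spelling out: the value it prints as "loss" is 1 minus the accuracy, that is, the error rate, while the quantity the optimizer actually minimizes is the cross-entropy. The plain-Python sketch below shows the difference on a few made-up scores. It does not use PyTorch, and the function names and numbers are invented for illustration only.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy for one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def evaluate(batch_logits, labels):
    """Return (accuracy, error_rate, mean_cross_entropy)."""
    correct = 0
    total_ce = 0.0
    for logits, label in zip(batch_logits, labels):
        # Predicted class = position of the largest score
        predicted = max(range(len(logits)), key=logits.__getitem__)
        correct += int(predicted == label)
        total_ce += cross_entropy(logits, label)
    accuracy = correct / len(labels)
    return accuracy, 1 - accuracy, total_ce / len(labels)


# Made-up scores for four images; class 0 = cat, class 1 = dog
logits = [[2.0, 0.1], [0.2, 0.3], [1.5, 1.4], [0.0, 3.0]]
labels = [0, 1, 1, 1]
acc, err, ce = evaluate(logits, labels)
print(f"accuracy={acc:.2f} error_rate={err:.2f} cross_entropy={ce:.2f}")
```

The error rate only counts wrong answers, while the cross-entropy also reflects how confident each answer was, so the two numbers generally differ.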
Three: Prediction (take any cat or dog picture and identify whether it is a cat or a dog)
predict.py
```python
import torch
import numpy as np
from PIL import Image
from torchvision import transforms, models

# The pretrained ImageNet weights are not needed here,
# because every weight is overwritten by model.pth below
model = models.vgg16(pretrained=False)
# Build the same fully connected layers as in train.py
model.classifier = torch.nn.Sequential(
    torch.nn.Linear(25088, 100),
    torch.nn.ReLU(),
    torch.nn.Dropout(p=0.5),
    torch.nn.Linear(100, 2)
)
# Load the trained parameters; map_location lets a model saved on the GPU
# be loaded on a machine that only has a CPU
model.load_state_dict(torch.load('model.pth', map_location='cpu'))
# Prediction mode
model.eval()

label = np.array(['cat', 'dog'])

# Data preprocessing
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor()
])


def predict(image_path):
    # Open the image and make sure it has 3 channels
    img = Image.open(image_path).convert('RGB')
    # Add a batch dimension at position 0, turning the 3-dimensional data
    # (channels, height, width) into 4-dimensional data
    img = transform(img).unsqueeze(0)
    # Predict the result
    with torch.no_grad():
        outputs = model(img)
    # Dimension 1 holds the 2 class scores (cat 0, dog 1); take the position
    # of the larger one. Dimension 0 is the number of pictures in the batch (1)
    _, predicted = torch.max(outputs, 1)
    # Convert to the category name
    print(label[predicted.item()])


predict('data-predict/cat.jpg')
predict('data-predict/dog.jpg')
```
Demonstrate it:
The cat and dog pictures themselves are not displayed. I did not install matplotlib for nothing, absolutely not, so let's do the whole thing~ (maybe I have a little obsessive-compulsive disorder)
I always felt it was not complete enough if the picture being checked is not shown, so I searched for information and upgraded it.
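predict.py only prints the name of the class with the larger score. If a confidence value is wanted as well, the two raw scores can be passed through softmax, which turns them into probabilities without changing which class is largest. The sketch below does this in plain Python on made-up scores. The function names are hypothetical, and in the real script the scores would come from `outputs`.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels=("cat", "dog")):
    """Return (label, confidence) for the raw scores of one image."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return labels[idx], probs[idx]


# Made-up scores for one image, not real model output
name, confidence = predict_label([0.3, 2.1])
print(f"{name} ({confidence:.1%})")
```

Because softmax keeps the order of the scores, the predicted class is the same with or without it, which is why the last layer in train.py can leave it out.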
Four: Upgraded prediction
Upgraded version of predict.py
```python
import os
os.environ['NLS_LANG'] = 'SIMPLIFIED CHINESE_CHINA.UTF8'
import time
import json

import torch
import torchvision.transforms as transforms
import torchvision.models as models
from PIL import Image
from matplotlib import pyplot as plt

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def img_transform(img_rgb, transform=None):
    """
    Transform the data into the form that the model reads
    :param img_rgb: PIL Image
    :param transform: torchvision.transform
    :return: tensor
    """
    if transform is None:
        raise ValueError("Can't find transform! There must be transform to process img")
    img_t = transform(img_rgb)
    return img_t


def load_class_names(p_clsnames, p_clsnames_cn):
    """
    Load the label names (not used in the main block below)
    :param p_clsnames: path of the JSON file with the class names
    :param p_clsnames_cn: path of the text file with the Chinese class names
    :return: class_names, class_names_cn
    """
    with open(p_clsnames, "r") as f:
        class_names = json.load(f)
    with open(p_clsnames_cn, encoding='UTF-8') as f:
        class_names_cn = f.readlines()
    return class_names, class_names_cn


def get_model(path_state_dict, num_classes, vis_model=False):
    """
    Create the model and load its parameters
    :param path_state_dict: path of the saved state dict
    :return: model in eval mode, on the selected device
    """
    model = models.vgg16(num_classes=num_classes)
    # Same fully connected layers as in train.py
    model.classifier = torch.nn.Sequential(
        torch.nn.Linear(25088, 100),
        torch.nn.ReLU(),
        torch.nn.Dropout(p=0.5),
        torch.nn.Linear(100, 2)
    )
    # map_location lets a model saved on the GPU be loaded on a CPU-only machine
    pretrained_state_dict = torch.load(path_state_dict, map_location="cpu")
    model.load_state_dict(pretrained_state_dict)
    model.eval()

    if vis_model:
        # Imported here so torchsummary is only needed when vis_model=True
        from torchsummary import summary
        summary(model, input_size=(3, 224, 224), device="cpu")

    model.to(device)
    return model


def process_img(path_img):
    # Hard-coded ImageNet mean and std.
    # Note: train.py above does not apply Normalize to its inputs.
    norm_mean = [0.485, 0.456, 0.406]
    norm_std = [0.229, 0.224, 0.225]
    inference_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(norm_mean, norm_std),
    ])

    # path --> img
    img_rgb = Image.open(path_img).convert('RGB')

    # img --> tensor
    img_tensor = img_transform(img_rgb, inference_transform)
    img_tensor.unsqueeze_(0)  # chw --> bchw
    img_tensor = img_tensor.to(device)

    return img_tensor, img_rgb


if __name__ == "__main__":
    num_classes = 2

    # config
    path_state_dict = os.path.join(BASE_DIR, "model.pth")
    path_img = os.path.join(BASE_DIR, "data-predict", "dog.jpg")

    # 1/5 load img
    img_tensor, img_rgb = process_img(path_img)

    # 2/5 load model
    model = get_model(path_state_dict, num_classes, True)

    # 3/5 inference
    with torch.no_grad():
        time_tic = time.time()
        outputs = model(img_tensor)
        time_toc = time.time()

    # 4/5 index to class names
    _, pred_int = torch.max(outputs.data, 1)
    _, top1_idx = torch.topk(outputs.data, 1, dim=1)
    pred_idx = pred_int.item()
    pred_str = "cat" if pred_idx == 0 else "dog"
    print("img: {} is: {}".format(os.path.basename(path_img), pred_str))
    print("time consuming:{:.2f}s".format(time_toc - time_tic))

    # 5/5 visualization
    plt.imshow(img_rgb)
    plt.title("predict:{}".format(pred_str))
    plt.text(5, 45, "top {}:{}".format(1, pred_str), bbox=dict(fc='yellow'))
    plt.show()
```
Demo:
I tried multiple images: OK, perfect.
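The upgraded script hard-codes index 0 as cat and 1 as dog. That agrees with the alphabetical folder order cats, dogs that `ImageFolder` uses, but it would silently go wrong if the folders changed. One possible way to avoid this, sketched here under assumptions, is to save the mapping printed by train.py as JSON and read it back at prediction time. The file name `class_index.json` and both function names are hypothetical and do not appear in the original project.

```python
import json
import tempfile
from pathlib import Path


def save_class_index(class_to_idx, path):
    """Write the name -> index mapping (e.g. train_dataset.class_to_idx) as JSON."""
    Path(path).write_text(json.dumps(class_to_idx), encoding="utf-8")


def load_index_to_class(path):
    """Read the mapping back and invert it to index -> name."""
    class_to_idx = json.loads(Path(path).read_text(encoding="utf-8"))
    return {idx: name for name, idx in class_to_idx.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "class_index.json"  # hypothetical file name
        # Folder names sorted alphabetically: cats -> 0, dogs -> 1
        save_class_index({"cats": 0, "dogs": 1}, path)
        idx_to_class = load_index_to_class(path)
        print(idx_to_class[1])
```

In the prediction script, `idx_to_class[pred_idx]` could then replace the hard-coded if/else.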
Finally, today is March 23, 2023. I wish you all a healthy, rich and happy year in 2023!
end~