Training SSD on your own data set (PyTorch), super detailed version!

I recently started learning the SSD algorithm. Going from total confusion when I first got the code to finally being able to train on my own data (my foundations are weak), I am very, very happy. This article walks through my modifications and the problems I ran into during training, so it can help you avoid the same pitfalls and serve as a reference for my future self. Let's start~

The articles referenced during training are listed below. If there is any infringement, please contact me and I will delete them:

[Dataset production] VOC2007-format dataset production and processing tutorial (standard input for the Faster-RCNN model), AI Xiaoyang's blog, CSDN

SSD pytorch trains its own data set (Windows + Colab), Programmer Sought

Summary of problems encountered in SSD's PyTorch implementation training, CSDN

1. Data set production

This part covers building the data set from zero to one. If you have already prepared your data set, skip ahead~

1.1 Create folder

Here I use a data set in VOC2007 format. Before starting, you need to create the following folders in the SSD working directory:

Here, Datacluster Fire and Smoke Sample is the name of the data set I used (yours does not need to match mine). A minimal script for creating this folder skeleton follows the list.

  • Annotations: the XML annotation files for the images
  • JPEGImages: stores the original images
  • ImageSets:
    • Main:
      • test.txt: test set
      • train.txt: training set
      • trainval.txt: training set and validation set
      • val.txt: validation set
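
For convenience, here is a minimal sketch that creates this skeleton (the root path is an assumption; adjust it to your machine):

import os

root = r"C:\Users\xxx\Desktop\VOC2007"  # assumption: your dataset root
for sub in ("Annotations", "JPEGImages", os.path.join("ImageSets", "Main")):
    os.makedirs(os.path.join(root, sub), exist_ok=True)  # create folders if missing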

1.2 Original image data renaming

After creating the folders, put the original images into the JPEGImages folder. Since the images I collected had messy names, you can rename them with the following script:

import os

path = r"C:\Users\xxx\Desktop\VOC2007\JPEGImages"  # path to the JPEGImages folder
filelist = os.listdir(path)  # every entry in this folder (including sub-folders)
count = 0
for file in filelist:  # traverse all files
    print(file)
    olddir = os.path.join(path, file)  # original file path
    if os.path.isdir(olddir):  # skip sub-folders
        continue
    filetype = '.jpg'  # target file extension
    # zfill pads the counter with leading zeros to six digits, e.g. 000042.jpg
    newdir = os.path.join(path, str(count).zfill(6) + filetype)
    os.rename(olddir, newdir)  # rename
    count += 1

1.3 Image annotation

There are many data set annotation tools. The tool used here is the same as in the original article: labelImg. Installation and usage tutorials abound online, so I will not repeat them; instead, here are a few tips that simplify the work (a snippet for inspecting the saved XML follows the list).

  • Default save folder: use the shortcut Ctrl + r to select Annotations as the default save folder, so you do not have to pick a folder for every save.
  • Default label: if there is only one class, you can set it as the default label value in the upper-right corner of the labeling tool.

  • Shortcuts: w draws a new box, Ctrl + s saves, and a / d jump to the previous / next image.
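
labelImg saves annotations as Pascal VOC XML. As a quick sanity check (the file path is an assumption), you can inspect one annotation like this; these are exactly the fields that the annotation transform in section 2.3 will parse:

import xml.etree.ElementTree as ET

root = ET.parse(r"C:\Users\xxx\Desktop\VOC2007\Annotations\000000.xml").getroot()
for obj in root.iter("object"):
    name = obj.find("name").text  # class label, e.g. 'fire'
    bndbox = obj.find("bndbox")
    box = [int(bndbox.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax")]
    print(name, box)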

1.4 Data set division

After annotation is complete, the files in Annotations need to be split proportionally into training, validation, and test sets.

import os
import random

trainval_percent = 0.9  # proportion of (training + validation) vs. test
train_percent = 0.7     # proportion of training within trainval
xmlfilepath = r'C:\Users\xxx\Desktop\VOC2007\Annotations'     # Annotations folder location
txtsavepath = r'C:\Users\xxx\Desktop\VOC2007\ImageSets\Main'  # Main folder under ImageSets
total_xml = os.listdir(xmlfilepath)

num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

# the four corresponding txt files under the Main folder
ftrainval = open(os.path.join(txtsavepath, 'trainval.txt'), 'w')
ftest = open(os.path.join(txtsavepath, 'test.txt'), 'w')
ftrain = open(os.path.join(txtsavepath, 'train.txt'), 'w')
fval = open(os.path.join(txtsavepath, 'val.txt'), 'w')

for i in indices:
    name = total_xml[i][:-4] + '\n'  # strip the .xml extension
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()

At this point, the names of the corresponding images are in the .txt files under the ImageSets/Main folder, and the data set is ready~ A small sanity check follows.
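
Before moving on, here is a minimal check (the root path is an assumption) that every name listed in the split files has a matching image:

import os

root = r'C:\Users\xxx\Desktop\VOC2007'  # assumption: your dataset root
for split in ('train', 'val', 'trainval', 'test'):
    with open(os.path.join(root, 'ImageSets', 'Main', split + '.txt')) as f:
        names = [line.strip() for line in f if line.strip()]
    missing = [n for n in names
               if not os.path.exists(os.path.join(root, 'JPEGImages', n + '.jpg'))]
    print(split, len(names), 'entries,', len(missing), 'missing images')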

2. Start training

2.1 Environment Configuration

The main task here is installing a torch build and OpenCV version that match your machine. Below are two torch/CUDA version-compatibility tables, which are not original!! Source: torch and cuda version, Driver Version, torchvision, torchaudio correspondence, noob_noob's blog, CSDN. A quick check of your installed versions follows the list.

  • Corresponding versions of CUDA driver and CUDAToolkit
  • CUDA and its available PyTorch corresponding version
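
To confirm what is actually installed, this quick check uses only standard PyTorch attributes:

import torch

print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # CUDA version PyTorch was built against (None on CPU-only builds)
print(torch.cuda.is_available())  # whether a usable GPU is visible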

2.2 config.py modification

Following the format of the VOC and COCO entries, copy in a dict for your own data set below. To quickly test whether training runs at all, I set the number of iterations to just 5.

fire = {
    'num_classes': 2,  # number of classes + 1 (the extra class is background)
    'lr_steps': (40000, 50000, 60000),  # iterations at which the learning rate is decayed
    'max_iter': 5,  # total iterations; set it small at first to test that training runs
    'feature_maps': [38, 19, 10, 5, 3, 1],
    'min_dim': 300,
    'steps': [8, 16, 32, 64, 100, 300],
    'min_sizes': [30, 60, 111, 162, 213, 264],
    'max_sizes': [60, 111, 162, 213, 264, 315],
    'aspect_ratios': [[2], [2, 3], [2, 3], [2, 3], [2], [2]],
    'variance': [0.1, 0.2],
    'clip': True,
    'name': 'FIRE',
}
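
As a side note, these values reproduce the standard SSD300 prior-box layout: layers with aspect_ratios [2] contribute 4 default boxes per feature-map cell and layers with [2, 3] contribute 6 (2 + 2 × the number of listed ratios), which gives the classic 8732 priors. A quick check:

# Default boxes per feature-map cell: 2 + 2 * len(ratios)
feature_maps = [38, 19, 10, 5, 3, 1]
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]
total = sum(f * f * (2 + 2 * len(r)) for f, r in zip(feature_maps, aspect_ratios))
print(total)  # 8732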

2.3 Create a new fire.py file in data

Because my data set is in VOC format, fire.py can be adapted from voc0712.py. The modifications are marked with #########:

from .config import HOME
import os.path as osp
import sys
import torch
import torch.utils.data as data
import cv2
import numpy as np
if sys.version_info[0] == 2:
    import xml.etree.cElementTree as ET
else:
    import xml.etree.ElementTree as ET

FIRE_CLASSES = ('fire',)  # note the trailing comma: without it this is a string, not a one-element tuple ############

# note: if you used our download scripts, this should be right
FIRE_ROOT = osp.join('D:/Document/Learning/ObjectDetection/SSD/ssd.pytorch-master', "data/Datacluster Fire and Smoke Sample/")  ################


class FIREAnnotationTransform(object):###############
    """Transforms a VOC annotation into a Tensor of bbox coords and label index
    Initialized with a dictionary lookup of classnames to indexes

    Arguments:
        class_to_ind (dict, optional): dictionary lookup of classnames -> indexes
            (default: alphabetic indexing of VOC's 20 classes)
        keep_difficult (bool, optional): keep difficult instances or not
            (default: False)
        height (int): height
        width (int): width
    """

    def __init__(self, class_to_ind=None, keep_difficult=False):
        # self.class_to_ind = class_to_ind or dict(
        # zip(VOC_CLASSES, range(len(VOC_CLASSES))))
        # self.keep_difficult = keep_difficult
        self.class_to_ind = class_to_ind or dict(fire=0)  ######## with only one category the dict can be written directly; otherwise build it as in the original
        self.keep_difficult = keep_difficult

    def __call__(self, target, width, height):
        """
        Arguments:
            target (annotation) : the target annotation to be made usable
                will be an ET.Element
        Returns:
            a list containing lists of bounding boxes [bbox coords, class name]
        """
        res = []
        for obj in target.iter('object'):
            difficult = int(obj.find('difficult').text) == 1
            if not self.keep_difficult and difficult:
                continue
            name = obj.find('name').text.lower().strip()
            bbox = obj.find('bndbox')

            pts = ['xmin', 'ymin', 'xmax', 'ymax']
            bndbox = []
            for i, pt in enumerate(pts):
                cur_pt = int(bbox.find(pt).text) - 1
                # scale height or width
                cur_pt = cur_pt / width if i % 2 == 0 else cur_pt / height
                bndbox.append(cur_pt)
            label_idx = self.class_to_ind[name]
            bndbox.append(label_idx)
            res += [bndbox]  # [xmin, ymin, xmax, ymax, label_ind]
            # img_id = target.find('filename').text[:-4]

        return res # [[xmin, ymin, xmax, ymax, label_ind], ... ]


class FIREDetection(data.Dataset):##############
    """VOC Detection Dataset Object

    input is image, target is annotation

    Arguments:
        root (string): filepath to VOCdevkit folder.
        image_set (string): imageset to use (eg. 'train', 'val', 'test')
        transform (callable, optional): transformation to perform on the
            input image
        target_transform (callable, optional): transformation to perform on the
            target `annotation`
            (eg: take in caption string, return tensor of word indices)
        dataset_name (string, optional): which dataset to load
            (default: 'VOC2007')
    """

    def __init__(self, root,
                 # image_sets=[('2007', 'trainval'), ('2012', 'trainval')],
                 image_sets='trainval',#################
                 transform=None, target_transform=FIREAnnotationTransform(),#############
                 dataset_name='FIRE'):################
        self.root = root
        self.image_set = image_sets
        self.transform = transform
        self.target_transform = target_transform
        self.name = dataset_name
        self._annopath = osp.join('%s', 'Annotations', '%s.xml')
        self._imgpath = osp.join('%s', 'JPEGImages', '%s.jpg')
        self.ids = list()
        # for (year, name) in image_sets:
        # rootpath = osp.join(self.root, 'VOC' + year)
        # for line in open(osp.join(rootpath, 'ImageSets', 'Main', name + '.txt')):
        # self.ids.append((rootpath, line.strip()))
        for line in open(FIRE_ROOT + '/ImageSets/Main/' + self.image_set + '.txt'):
            self.ids.append((FIRE_ROOT, line.strip()))###################

    def __getitem__(self, index):
        im, gt, h, w = self.pull_item(index)

        return im, gt

    def __len__(self):
        return len(self.ids)

    def pull_item(self, index):
        img_id = self.ids[index]

        target = ET.parse(self._annopath % img_id).getroot()
        img = cv2.imread(self._imgpath % img_id)
        height, width, channels = img.shape

        if self.target_transform is not None:
            target = self.target_transform(target, width, height)

        if self.transform is not None:
            target = np.array(target)
            img, boxes, labels = self.transform(img, target[:, :4], target[:, 4])
            # to rgb
            img = img[:, :, (2, 1, 0)]
            # img = img.transpose(2, 0, 1)
            target = np.hstack((boxes, np.expand_dims(labels, axis=1)))
        return torch.from_numpy(img).permute(2, 0, 1), target, height, width
        # return torch.from_numpy(img), target, height, width

    def pull_image(self, index):
        '''Returns the original image object at index in PIL form

        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.

        Argument:
            index (int): index of img to show
        Return:
            PIL img
        '''
        img_id = self.ids[index]
        return cv2.imread(self._imgpath % img_id, cv2.IMREAD_COLOR)

    def pull_anno(self, index):
        '''Returns the original annotation of image at index

        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.

        Argument:
            index (int): index of img to get annotation of
        Return:
            list: [img_id, [(label, bbox coords),...]]
                eg: ('001718', [('dog', (96, 13, 438, 332))])
        '''
        img_id = self.ids[index]
        anno = ET.parse(self._annopath % img_id).getroot()
        gt = self.target_transform(anno, 1, 1)
        return img_id[1], gt

    def pull_tensor(self, index):
        '''Returns the original image at an index in tensor form

        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.

        Argument:
            index (int): index of img to show
        Return:
            tensorized version of img, squeezed
        '''
        return torch.Tensor(self.pull_image(index)).unsqueeze_(0)
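
With fire.py in place, a quick smoke test of the dataset class (a sketch; it assumes the paths above are valid and at least one image is listed in trainval.txt):

from data.fire import FIREDetection, FIRE_ROOT

dataset = FIREDetection(root=FIRE_ROOT, image_sets='trainval')
print(len(dataset))   # number of samples listed in trainval.txt
img, gt = dataset[0]  # CHW image tensor and [[xmin, ymin, xmax, ymax, label], ...]
print(img.shape, gt)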

2.4 data/__init__.py

# from .voc0712 import VOCDetection, VOCAnnotationTransform, VOC_CLASSES, VOC_ROOT
from .fire import FIREDetection,FIREAnnotationTransform,FIRE_CLASSES,FIRE_ROOT
# from .coco import COCODetection, COCOAnnotationTransform, COCO_CLASSES, COCO_ROOT, get_label_map

2.5 ssd.py

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from layers import *
from data import voc, coco, fire ############
import os
class SSD(nn.Module):
    # ... (docstring and the rest of the class unchanged) ...
    def __init__(self, phase, size, base, extras, head, num_classes):
        super(SSD, self).__init__()
        self.phase = phase
        self.num_classes = num_classes
        # Note: the one-liner (coco, voc, fire)[num_classes == 2] is buggy, because a
        # bool indexes as 0 or 1 and thus selects voc, never fire. Branch explicitly: ###########
        if num_classes == 2:
            self.cfg = fire
        elif num_classes == 21:
            self.cfg = voc
        else:
            self.cfg = coco
        self.priorbox = PriorBox(self.cfg)
        # self.priors = Variable(self.priorbox.forward(), volatile=True)
        with torch.no_grad():  ############
            self.priors = self.priorbox.forward()  #############
        self.size = size

2.6 train.py

Hyperparameters such as batch_size can be modified according to your actual situation.

parser = argparse.ArgumentParser(
    description='Single Shot MultiBox Detector Training With Pytorch')
train_set = parser.add_mutually_exclusive_group()
parser.add_argument('--dataset', default='FIRE', choices=['VOC', 'COCO','FIRE'],########
                    type=str, help='VOC or COCO')
parser.add_argument('--dataset_root', default=FIRE_ROOT, ########
                    help='Dataset root directory path')

In the train function, add a branch for your own data set by imitating the existing ones:

    elif args.dataset == 'FIRE':
        # if args.dataset_root == FIRE_ROOT:
        #     parser.error('Must specify dataset if specifying dataset_root')
        cfg = fire
        # print(cfg)
        dataset = FIREDetection(root=args.dataset_root,
                                transform=SSDAugmentation(cfg['min_dim'],
                                                          MEANS))
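
For context, this dataset is then wrapped in a DataLoader further down in the unmodified train.py; the number of boxes per image varies, which is why the repository supplies its own detection_collate instead of the default collate function:

data_loader = data.DataLoader(dataset, args.batch_size,
                              num_workers=args.num_workers,
                              shuffle=True, collate_fn=detection_collate,
                              pin_memory=True)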

If, like me, you train on CPU, remember to remove or guard every .cuda() call in the file and set the GPU-related flag in the argument list (--cuda in ssd.pytorch) to False.

The modifications end here. Try running it; if it runs smoothly, congratulations! (Although the odds are not high.)

If you encounter problems, don’t worry, just read the third part~

2.7 Supplement: resuming training from a checkpoint

When training takes too long to finish in one sitting and we are too sleepy to wait, we can resume from a checkpoint! That way we can continue training the next day without letting the earlier effort go to waste. The specific steps are as follows:

First, set the resume parameter of the parameter list to True:

parser.add_argument('--resume', default=True, type=str,
                    help='Checkpoint state_dict file to resume training from')

Then load the weight path (here "weights/FIRE.pth") and modify as follows:

    if args.resume:
        print('Resuming training, loading {}...'.format(args.resume))
        # ssd_net.load_weights(args.resume)
        ssd_net.load_weights("weights/FIRE.pth")  #############
    else:
        vgg_weights = torch.load(args.save_folder + args.basenet)
        print('Loading base network...')
        ssd_net.vgg.load_state_dict(vgg_weights)

Now, when you start training again, it continues from the checkpoint!
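
For reference, load_weights in ssd.pytorch is essentially a thin wrapper around torch.load; a minimal equivalent, assuming the checkpoint was saved with torch.save(ssd_net.state_dict(), ...):

import torch

state_dict = torch.load("weights/FIRE.pth", map_location="cpu")  # checkpoint path from above
ssd_net.load_state_dict(state_dict)  # restore the model parameters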

3. Problem summary

3.1 StopIteration

Change images, targets = next(batch_iterator) in train.py to the following. The DataLoader iterator is exhausted after one pass over the data set, so when max_iter spans more than one epoch it has to be recreated:

        try:
            images, targets = next(batch_iterator)
        except StopIteration:
            batch_iterator = iter(data_loader)
            images, targets = next(batch_iterator)

3.2 IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

Change all data[0] in train.py to data.item()

        loc_loss += loss_l.data.item()  #############
        conf_loss += loss_c.data.item()  ###########

        # if iteration % 10 == 0:
        #     print('timer: %.4f sec.' % (t1 - t0))
        #     print('iter ' + repr(iteration) + ' || Loss: %.4f ||' % (loss.data.item()), end=' ')

        print('timer: %.4f sec.' % (t1 - t0))  # print once per iteration
        print('iter ' + repr(iteration) + ' || Loss: %.4f ||' % (loss.data.item()), end=' ')  ##############

        if args.visdom:
            update_vis_plot(iteration, loss_l.data.item(), loss_c.data.item(),  ##########
                            iter_plot, epoch_plot, 'append')

        if iteration != 0 and iteration % 5000 == 0:
            print('Saving state, iter:', iteration)
            # I missed this earlier: also change the checkpoint prefix here to your own data set name
            torch.save(ssd_net.state_dict(), 'weights/ssd300_FIRE_' +  ###########
                       repr(iteration) + '.pth')

3.3 xavier_uniform has been deprecated

Change init.xavier_uniform(param) in train.py to the in-place (trailing underscore) variant:

def xavier(param):
    init.xavier_uniform_(param)

3.4 UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead

Remove volatile=True in train.py

images = Variable(images)
targets = [Variable(ann) for ann in targets]
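
Going a step further: Variable has been a no-op wrapper since PyTorch 0.4, so these lines can be dropped and the tensors moved to the target device directly (a sketch; the device setup is an assumption about the rest of the script):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
images = images.to(device)
targets = [ann.to(device) for ann in targets]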

3.5 UserWarning: size_average and reduce are now deprecated, please use reduction='sum' instead

Modify in multibox_loss.py to

loss_l = F.smooth_l1_loss(loc_p, loc_t, reduction='sum')
loss_c = F.cross_entropy(conf_p, targets_weighted, reduction='sum')

3.6 UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1

In train.py, modify torch.set_default_tensor_type('torch.FloatTensor') to

torch.set_default_dtype(torch.float32)

3.7 ValueError: setting an array element with a sequence.

Modify in the augmentations.py file

 def __init__(self):
        self.sample_options = np.array([
            # using entire original input image
            None,
            # sample a patch s.t. MIN jaccard w/ obj in .1,.3,.4,.7,.9
            (0.1, None),
            (0.3, None),
            (0.7, None),
            (0.9, None),
            # randomly sample a patch
            (None, None),
        ], dtype=object)  # when the sampled elements have inconsistent shapes, dtype=object is required (an issue introduced by newer NumPy versions)
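
The underlying cause: NumPy ≥ 1.24 refuses to build a ragged array implicitly. A minimal reproduction:

import numpy as np

# Raises ValueError on NumPy >= 1.24: elements have inconsistent shapes (a ragged array)
# np.array([None, (0.1, None), (None, None)])

# Works: explicitly request an object array
options = np.array([None, (0.1, None), (None, None)], dtype=object)
print(options.shape)  # (3,)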

3.8 Others

The article "Summary of problems encountered in SSD's PyTorch implementation training" (linked at the top) gives suggestions for further problems, though I do not seem to have encountered them myself. If you hit other errors, you can refer to it.

4. Running results

After this series of operations, it finally ran successfully! (I modified it to print information once per iteration)~~~
