YOLOv3 SPP (ultralytics) Section 1: Converting VOC annotation files (xml) to YOLO annotation format (txt), and how to build custom YOLO data samples

Directory

1. Introduction

2. About PASCAL VOC dataset xml –> YOLO txt format

2.1 Path setting

2.2 Function to read xml file

2.3 xml —> yolo txt

2.4 yolo’s label file

2.6 Results

2.7 Code

3. Custom YOLO dataset

3.1 Preparatory work

3.2 Open labelimg

3.3 Drawing


The code is based on the Bilibili video tutorial: 3.2 YOLOv3 SPP source code analysis (PyTorch version)

Link to PASCAL VOC dataset: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/

The converted YOLO-format dataset is split into two parts; as a single file it is too large to upload.

Training set: PASCAL VOC object-detection training set in YOLO format

Validation set: PASCAL VOC object-detection validation set in YOLO format

1. Introduction

The label files for object detection differ from those for classification and segmentation. In classification tasks, images of the same category are usually placed in the same directory, and the directory name indicates the category. In segmentation tasks, each training image corresponds to a mask image; that is, the input is an image and the label is also an image.

An object-detection label carries two kinds of information: the category of the target to be detected, such as cat or dog, and the position of the target, marked with a bounding box, usually a rectangle given by xmin, xmax, ymin, ymax.

Object-detection labels are usually annotated in xml files.

For example, in the annotation below there are two categories, horse and person, and the four parameters under each category are the bounding-box information.

However, YOLO does not read such xml files directly, so the xml annotations must be converted to YOLO's own format.

In the converted txt below, 12 is the class id of the detected object, and the next four parameters are the x, y, w, h information of the bounding box.

A YOLO bounding box is described by the center coordinates of the box and its width and height, all relative to the size of the whole image.

2. About PASCAL VOC dataset xml –> YOLO txt format

This chapter only covers the data-conversion work.

Initially, my_yolo_dataset and my_data_label.names do not exist; they are generated by trans_voc2yolo.py from the data in VOCdevkit.

2.1 Path Setting

The VOC dataset is organized for several different tasks; here we only use the parts needed for object detection:

  • Annotations holds the xml label files for object detection
  • train.txt and val.txt hold the file names of the training and validation sets
  • JPEGImages holds all VOC images
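
Based on the path settings used later in the full script (section 2.7), the expected VOC directory layout is:

```
VOCdevkit
└── VOC2012
    ├── Annotations          # xml label files
    ├── ImageSets
    │   └── Main
    │       ├── train.txt    # file names of the training set
    │       └── val.txt      # file names of the validation set
    └── JPEGImages           # all VOC images
```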

2.2 Function to read xml file

as follows:

The code here is implemented recursively; you don't need to understand it in depth, just know how to use it.

The following is the dictionary returned after reading one xml file:

{'annotation': {'folder': 'VOC2012', 'filename': '2008_000008.jpg', 'source': {'database': 'The VOC2008 Database', 'annotation': 'PASCAL VOC2008', 'image': 'flickr'}, 'size': {'width': '500', 'height': '442', 'depth': '3'}, 'segmented': '0', 'object': [{'name': 'horse', 'pose': 'Left', 'truncated': '0', 'occluded': '1', 'bndbox': {'xmin': '53', 'ymin': '87', 'xmax': '471', 'ymax': '420'}, 'difficult': '0'}, {'name': 'person', 'pose': 'Unspecified', 'truncated': '1', 'occluded': '0', 'bndbox': {'xmin': '158', 'ymin': '44', 'xmax': '289', 'ymax': '167'}, 'difficult': '0'}]}}
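
As a minimal, self-contained sketch of how this recursive parsing works (using the stdlib `xml.etree` here instead of the `lxml` used by the full script, and a tiny hand-written annotation):

```python
import xml.etree.ElementTree as ET

def parse_xml_to_dict(xml):
    # Leaf node: return {tag: text} directly
    if len(xml) == 0:
        return {xml.tag: xml.text}
    result = {}
    for child in xml:
        child_result = parse_xml_to_dict(child)  # recurse into children
        if child.tag != 'object':
            result[child.tag] = child_result[child.tag]
        else:
            # several objects share the 'object' tag, so collect them in a list
            result.setdefault(child.tag, []).append(child_result[child.tag])
    return {xml.tag: result}

xml_str = """<annotation>
  <filename>2008_000008.jpg</filename>
  <object><name>horse</name></object>
  <object><name>person</name></object>
</annotation>"""

data = parse_xml_to_dict(ET.fromstring(xml_str))["annotation"]
print(data["filename"])                      # 2008_000008.jpg
print([o["name"] for o in data["object"]])   # ['horse', 'person']
```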

2.3 xml —> yolo txt

Then traverse the bounding boxes under the object key.

Note that index is a 0-based index; below are the values of the first index and obj.

Finally, convert each bounding box to center-point coordinates plus width and height, then normalize them by the size of the whole image.
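
This conversion can be sketched as a small helper (the name `voc_to_yolo` is mine, not from the original script). As a check, it is applied to the horse box from the dictionary above (xmin=53, ymin=87, xmax=471, ymax=420 in a 500x442 image):

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    # center coordinates of the box, in pixels
    xcenter = xmin + (xmax - xmin) / 2
    ycenter = ymin + (ymax - ymin) / 2
    # box width/height in pixels
    w = xmax - xmin
    h = ymax - ymin
    # normalize by image size and keep 6 decimal places, as the script does
    return (round(xcenter / img_w, 6), round(ycenter / img_h, 6),
            round(w / img_w, 6), round(h / img_h, 6))

print(voc_to_yolo(53, 87, 471, 420, 500, 442))
# (0.524, 0.573529, 0.836, 0.753394)
```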

2.4 yolo’s label file

The implementation code is as follows:

This part is also very simple: just take the keys of the VOC class dictionary and write them out.

2.6 Results

The operation process is as follows

The generated yolo dataset directory is as follows:
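
As produced by the script in section 2.7, the layout is:

```
my_yolo_dataset
├── train
│   ├── images   # training images (.jpg)
│   └── labels   # yolo label files (.txt)
└── val
    ├── images
    └── labels
```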

yolo’s label information:
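
Each line of a label file has the form `class_id x_center y_center w h`, with the box values relative to the image size. A small sketch of reading one such line back into absolute corner coordinates (the helper name `yolo_line_to_box` is mine):

```python
def yolo_line_to_box(line, img_w, img_h):
    # line format: "class_id x_center y_center w h" (box values are relative)
    cls, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    # back to absolute corner coordinates (xmin, ymin, xmax, ymax)
    return int(cls), (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)

print(yolo_line_to_box("12 0.5 0.5 0.4 0.2", 100, 100))
# (12, (30.0, 40.0, 70.0, 60.0))
```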

2.7 Code

The converted code is as follows:

"""
This script has two functions:
1. Convert the voc dataset annotation information (.xml) to yolo annotation format (.txt), and copy the image file to the corresponding folder
2. According to the json label file, generate the corresponding names label (my_data_label.names)
"""

import os
from tqdm import tqdm
from lxml import etree
import json
import shutil


# Read the xml file information and return it in dictionary form
def parse_xml_to_dict(xml):
    """
    Parse the xml file into a dictionary, refer to recursive_parse_xml_to_dict of tensorflow
    Args:
        xml: xml tree obtained by parsing XML file contents using lxml.etree

    Returns:
        Python dictionary holding XML contents.
    """

    if len(xml) == 0: # Reached the bottom layer: directly return the tag's information
        return {xml.tag: xml.text}

    result = {}
    for child in xml:
        child_result = parse_xml_to_dict(child) # Recursively traverse label information
        if child.tag != 'object':
            result[child.tag] = child_result[child.tag]
        else:
            if child.tag not in result: # There may be multiple objects, so they need to be put in a list
                result[child.tag] = []
            result[child.tag].append(child_result[child.tag])
    return {xml.tag: result}


# Convert the xml file to yolo's txt file
def translate_info(file_names: list, save_root: str, class_dict: dict, train_val='train'):
    """
    :param file_names: file names of all training/validation set images
    :param save_root: save directory for the converted yolo files
    :param class_dict: class name -> id dict from the voc json label file
    :param train_val: whether the input is the training set or the validation set
    """

    save_txt_path = os.path.join(save_root, train_val, "labels") # save yolo's txt label file
    if os.path.exists(save_txt_path) is False:
        os.makedirs(save_txt_path)

    save_images_path = os.path.join(save_root, train_val, "images") # save the training image file of yolo
    if os.path.exists(save_images_path) is False:
        os.makedirs(save_images_path)

    for file in tqdm(file_names, desc="translate {} file...".format(train_val)):
        # Check if the image file exists
        img_path = os.path.join(voc_images_path, file + ".jpg")
        assert os.path.exists(img_path), "file:{} not exist...".format(img_path)

        # Check if the xml file exists
        xml_path = os.path.join(voc_xml_path, file + ".xml")
        assert os.path.exists(xml_path), "file:{} not exist...".format(xml_path)

        # read xml
        with open(xml_path) as fid:
            xml_str = fid.read()
        xml = etree.fromstring(xml_str)
        data = parse_xml_to_dict(xml)["annotation"] # read xml file information
        img_height = int(data["size"]["height"]) # read in the h of the image
        img_width = int(data["size"]["width"]) # read in the w of the image

        # Determine whether the xml has ground truth
        assert "object" in data.keys(), "file: '{}' lack of object key.".format(xml_path)
        if len(data["object"]) == 0:
            # If there is no target in the xml file, return the image path and ignore the sample
            print("Warning: in '{}' xml, there are no objects.".format(xml_path))
            continue

        # Create a new yolo txt annotation file corresponding to xml, and write it
        with open(os.path.join(save_txt_path, file + ".txt"), "w") as f:
            for index, obj in enumerate(data["object"]): # index is the index starting from 0, obj is the dictionary file of object
                # Get the box information of each object
                xmin = float(obj["bndbox"]["xmin"])
                xmax = float(obj["bndbox"]["xmax"])
                ymin = float(obj["bndbox"]["ymin"])
                ymax = float(obj["bndbox"]["ymax"])
                class_name = obj["name"] # Get the classification of the bounding box
                class_index = class_dict[class_name] - 1 # target id starts from 0

                # Further check the data, some label information may have w or h as 0, such data will cause the calculation regression loss to be nan
                if xmax <= xmin or ymax <= ymin:
                    print("Warning: in '{}' xml, there are some bbox w/h <=0".format(xml_path))
                    continue

                # Convert box information to yolo format
                xcenter = xmin + (xmax - xmin) / 2 # center point coordinates
                ycenter = ymin + (ymax - ymin) / 2
                w = xmax - xmin # w and h of the bounding box
                h = ymax - ymin

                # Convert absolute coordinates to relative coordinates, save 6 decimal places
                xcenter = round(xcenter / img_width, 6)
                ycenter = round(ycenter / img_height, 6)
                w = round(w / img_width, 6)
                h = round(h / img_height, 6)

                info = [str(i) for i in [class_index, xcenter, ycenter, w, h]]

                if index == 0:
                    f.write(" ".join(info))
                else: # newline before every subsequent object
                    f.write("\n" + " ".join(info))

        # Copy the image to the corresponding set
        path_copy_to = os.path.join(save_images_path, img_path.split(os.sep)[-1])
        if os.path.exists(path_copy_to) is False:
            shutil.copyfile(img_path, path_copy_to)


# Create a label file for yolo
def create_class_names(class_dict: dict):
    keys = class_dict.keys()
    with open("./data/my_data_label.names", "w") as w:
        for index, k in enumerate(keys):
            if index + 1 == len(keys):
                w.write(k)
            else:
                w.write(k + "\n")


def main():
    # Read the json label file of the original voc data
    with open(label_json_path, 'r') as json_file:
        class_dict = json.load(json_file)

    # Read all the line information in the training set path file train.txt of the voc dataset, and delete the blank line
    with open(train_txt_path, "r") as r:
        train_file_names = [i for i in r.read().splitlines() if len(i.strip()) > 0]

    # voc information to yolo, and copy the image file to the corresponding folder
    translate_info(train_file_names, save_file_root, class_dict, "train")

    # Read all the line information in the voc dataset path file val.txt and delete the blank lines
    with open(val_txt_path, "r") as r:
        val_file_names = [i for i in r.read().splitlines() if len(i.strip()) > 0]
    # voc information to yolo, and copy the image file to the corresponding folder
    translate_info(val_file_names, save_file_root, class_dict, "val")

    # Create my_data_label.names file
    create_class_names(class_dict)


if __name__ == "__main__":
    # voc dataset root directory and version
    voc_root = "VOCdevkit"
    voc_version = "VOC2012"

    # Converted training set and validation set correspond to txt files
    train_txt = "train.txt"
    val_txt = "val.txt"

    # Converted file save directory, yolo format
    save_file_root = "./my_yolo_dataset"
    if os.path.exists(save_file_root) is False:
        os.makedirs(save_file_root)

    # The label tag corresponds to the json file
    label_json_path = './data/pascal_voc_classes.json'

    voc_images_path = os.path.join(voc_root, voc_version, "JPEGImages") # voc training image path
    voc_xml_path = os.path.join(voc_root, voc_version, "Annotations") # xml tag file path of voc
    train_txt_path = os.path.join(voc_root, voc_version, "ImageSets", "Main", train_txt) # voc training set path file
    val_txt_path = os.path.join(voc_root, voc_version, "ImageSets", "Main", val_txt) # voc validation set path file

    # Check if the file/folder exists
    assert os.path.exists(voc_images_path), "VOC images path not exist..."
    assert os.path.exists(voc_xml_path), "VOC xml path not exist..."
    assert os.path.exists(train_txt_path), "VOC train txt file not exist..."
    assert os.path.exists(val_txt_path), "VOC val txt file not exist..."
    assert os.path.exists(label_json_path), "label_json_path does not exist..."

    # start conversion
    main()

3. Custom YOLO dataset

labelimg is used here; install it as follows:

pip install labelimg

Run labelimg in the terminal to launch it; the interface looks like this:

3.1 Preparatory work

Create a new demo folder and place these three items in it:

  • annotation holds the saved YOLO bounding-box files
  • img holds the images
  • labels.txt is the class label file

The labels are stored as follows:
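
The screenshot from the original post is not reproduced here; as an illustration, labels.txt simply lists one class name per line (the class names below are examples, not from the post):

```
cat
dog
```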

3.2 Open labelimg

Open a terminal in the demo folder; the first argument is the image folder and the second is the path to the labels file.
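
With the folder names above, the launch command would look like this (a sketch based on the description; check `labelimg --help` for the exact arguments of your installed version):

```shell
cd demo
labelimg img labels.txt
```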

3.3 Drawing

After opening, it looks like this. First change the save format to YOLO, then set the save directory to the annotation folder.

On the right is the img file list, where the two images have been loaded.

When drawing a box, simply select its category.

The final result looks like this.