Building a custom dataset for yolov5 target detection (bamboo stick example)

Preface:

To complete a work task, I recently taught myself some artificial intelligence. I came across a program that automatically counts objects in photos taken with a mobile phone, thought it was quite neat, and started researching it myself. Here I would like to share a summary of my experience and some insights. (I wrote this up long after the fact, so I have forgotten some of the details I ran into along the way.)

1. Download and install yolov5

I am using version 5.0. There are two ways to get it. First, you can download it from the official repository, which is the recommended route, although the download speed can be slow. Yolov5 official: https://github.com/ultralytics/yolov5

Second, I have shared it on Baidu Netdisk, and anyone who needs it can download it from there. Link: https://pan.baidu.com/s/1AlcE–5ape7kRxBO1RCokQ?pwd=8888
Extraction code: 8888
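If you take the official route, the download and dependency install look roughly like this (a sketch; the v5.0 tag and the mirror choice are assumptions, adjust to your own setup):

git clone https://github.com/ultralytics/yolov5
cd yolov5
git checkout v5.0
pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple/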

I won’t say much about the installation here, since many bloggers have already covered it. The process has some pitfalls and may take a few attempts. Work through it yourself first; if you get stuck, leave a comment and we can discuss it together.

2. Bamboo stick dataset preparation

Since the label set in the original detect.py does not include a ‘bamboo stick’ category, you need to take photos and train a model yourself. Prepare some bamboo sticks and photograph them; one or two hundred pictures are enough. One tip: batch-rename the photos you take, otherwise the file names get confusing and hard to keep track of. The renaming code is shared below.

In the script: change path to your own folder; change ‘.xml’ to ‘.jpg’ or ‘.jpeg’ to match your photo format; i is the starting number for the new names (set it to 0 or whatever you like). I have tested it myself and can recommend it.

import os


class BatchRename():
    """Batch-rename files in a folder to sequential numeric names."""

    def rename(self):
        path = "E:/D drive backup/desktop/ss"  # change to your own folder
        filelist = os.listdir(path)
        total_num = len(filelist)
        i = 88  # starting number for the new names; set it to 0 or any value you like
        renamed = 0
        for item in filelist:
            # change '.xml' (here and in dst below) to '.jpg' or '.jpeg' to rename photos
            if item.endswith('.xml'):
                src = os.path.join(os.path.abspath(path), item)
                dst = os.path.join(os.path.abspath(path), str(i) + '.xml')
                try:
                    os.rename(src, dst)
                    i += 1
                    renamed += 1
                except OSError:
                    continue
        print('total %d files, renamed %d' % (total_num, renamed))


if __name__ == '__main__':
    demo = BatchRename()
    demo.rename()

3. Manual labeling

For installing labelimg I won’t go into detail here either; I also hit plenty of pitfalls during the installation. Be sure to install it inside the Anaconda environment you configured yourself, otherwise it tends to fail. When installing the requirements, it is recommended to use a domestic open source mirror, for example: pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple/. For the labeling itself I followed the uploader “Brother Pao Takes You to Learn” on Bilibili, who explains it in detail; just follow along step by step. It is suitable for beginners.

Some people will surely ask whether labeling more than a hundred photos one by one is exhausting. This is where semi-automatic labeling helps. Before semi-automatic annotation, first manually label about 10 photos. Be careful not to use the yolov5 format when labeling; use the pascal voc format instead, which produces .xml labels. The generated .xml files then need to be converted to .txt, because yolov5 only reads the txt format. Next, apply imgaug data augmentation to the 10 labeled photos so that the pre-training set is larger. My augmentation followed the CSDN post “Offline Data Enhancement of Target Detection Data Set” by Xiao Cen, which is fairly easy to use. Note that some labels in the augmented dataset may end up misaligned; you can delete those photos or fix the boxes manually.
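For reference, a minimal imgaug sketch that augments an image together with its boxes might look like the following (the file names and box coordinates are made-up placeholders; the augmentations I actually used follow Xiao Cen’s post):

import imageio
import imgaug.augmenters as iaa
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage

image = imageio.imread("1.jpg")
# boxes read from your pascal voc xml (pixel coordinates); values here are placeholders
bbs = BoundingBoxesOnImage([
    BoundingBox(x1=120, y1=80, x2=340, y2=560),
], shape=image.shape)

seq = iaa.Sequential([
    iaa.Fliplr(0.5),               # horizontal flip half the time
    iaa.Affine(rotate=(-15, 15)),  # small random rotation
    iaa.Multiply((0.8, 1.2)),      # brightness jitter
])

image_aug, bbs_aug = seq(image=image, bounding_boxes=bbs)
# drop boxes pushed outside the image and clip the rest to its border
bbs_aug = bbs_aug.remove_out_of_image().clip_out_of_image()
imageio.imwrite("1_aug.jpg", image_aug)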

For the augmented training set, convert the xml labels into txt format. The conversion code is shared below; it also splits the data into train and val following the folder layout from the Bilibili tutorial. (Note that the folder names must match the video’s requirements exactly.)

import xml.etree.ElementTree as ET
import os
import random
from shutil import copyfile

classes = ["zhuqian"]  # the label name(s) used when annotating; "zhuqian" is the bamboo stick class
# classes = ["ball"]

TRAIN_RATIO = 90  # percentage of images that go into the train split; the rest go to val


def clear_hidden_files(path):
    dir_list = os.listdir(path)
    for i in dir_list:
        abspath = os.path.join(os.path.abspath(path), i)
        if os.path.isfile(abspath):
            if i.startswith("._"):
                os.remove(abspath)
        else:
            clear_hidden_files(abspath)


def convert(size, box):
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


def convert_annotation(image_id):
    in_file = open('VOCdevkit/VOC2007/Annotations/%s.xml' % image_id)
    out_file = open('VOCdevkit/VOC2007/YOLOLabels/%s.txt' % image_id, 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text),
             float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
    in_file.close()
    out_file.close()


wd = os.getcwd()
data_base_dir = os.path.join(wd, "VOCdevkit/")
if not os.path.isdir(data_base_dir):
    os.mkdir(data_base_dir)
work_space_dir = os.path.join(data_base_dir, "VOC2007/")
if not os.path.isdir(work_space_dir):
    os.mkdir(work_space_dir)
annotation_dir = os.path.join(work_space_dir, "Annotations/")
if not os.path.isdir(annotation_dir):
    os.mkdir(annotation_dir)
clear_hidden_files(annotation_dir)
image_dir = os.path.join(work_space_dir, "JPEGImages/")
if not os.path.isdir(image_dir):
    os.mkdir(image_dir)
clear_hidden_files(image_dir)
yolo_labels_dir = os.path.join(work_space_dir, "YOLOLabels/")
if not os.path.isdir(yolo_labels_dir):
    os.mkdir(yolo_labels_dir)
clear_hidden_files(yolo_labels_dir)
yolov5_images_dir = os.path.join(data_base_dir, "images/")
if not os.path.isdir(yolov5_images_dir):
    os.mkdir(yolov5_images_dir)
clear_hidden_files(yolov5_images_dir)
yolov5_labels_dir = os.path.join(data_base_dir, "labels/")
if not os.path.isdir(yolov5_labels_dir):
    os.mkdir(yolov5_labels_dir)
clear_hidden_files(yolov5_labels_dir)
yolov5_images_train_dir = os.path.join(yolov5_images_dir, "train/")
if not os.path.isdir(yolov5_images_train_dir):
    os.mkdir(yolov5_images_train_dir)
clear_hidden_files(yolov5_images_train_dir)
yolov5_images_test_dir = os.path.join(yolov5_images_dir, "val/")
if not os.path.isdir(yolov5_images_test_dir):
    os.mkdir(yolov5_images_test_dir)
clear_hidden_files(yolov5_images_test_dir)
yolov5_labels_train_dir = os.path.join(yolov5_labels_dir, "train/")
if not os.path.isdir(yolov5_labels_train_dir):
    os.mkdir(yolov5_labels_train_dir)
clear_hidden_files(yolov5_labels_train_dir)
yolov5_labels_test_dir = os.path.join(yolov5_labels_dir, "val/")
if not os.path.isdir(yolov5_labels_test_dir):
    os.mkdir(yolov5_labels_test_dir)
clear_hidden_files(yolov5_labels_test_dir)

train_file = open(os.path.join(wd, "yolov5_train.txt"), 'w')
test_file = open(os.path.join(wd, "yolov5_val.txt"), 'w')
train_file.close()
test_file.close()
train_file = open(os.path.join(wd, "yolov5_train.txt"), 'a')
test_file = open(os.path.join(wd, "yolov5_val.txt"), 'a')
list_imgs = os.listdir(image_dir) # list image files
for i in range(0, len(list_imgs)):
    path = os.path.join(image_dir, list_imgs[i])
    if os.path.isfile(path):
        image_path = image_dir + list_imgs[i]
        voc_path = list_imgs[i]
        (nameWithoutExtention, extention) = os.path.splitext(os.path.basename(image_path))
        annotation_name = nameWithoutExtention + '.xml'
        annotation_path = os.path.join(annotation_dir, annotation_name)
        label_name = nameWithoutExtention + '.txt'
        label_path = os.path.join(yolo_labels_dir, label_name)
        prob = random.randint(1, 100)  # random draw decides the split for this image
        if prob < TRAIN_RATIO:  # train split
            if os.path.exists(annotation_path):
                train_file.write(image_path + '\n')
                convert_annotation(nameWithoutExtention)  # convert label
                copyfile(image_path, yolov5_images_train_dir + voc_path)
                copyfile(label_path, yolov5_labels_train_dir + label_name)
        else:  # val split
            if os.path.exists(annotation_path):
                test_file.write(image_path + '\n')
                convert_annotation(nameWithoutExtention)  # convert label
                copyfile(image_path, yolov5_images_test_dir + voc_path)
                copyfile(label_path, yolov5_labels_test_dir + label_name)
train_file.close()
test_file.close()
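For reference, the script above expects you to place your files like this before running it from the folder that contains VOCdevkit (this layout is my reading of the tutorial’s requirements):

VOCdevkit/
└── VOC2007/
    ├── Annotations/   # the pascal voc .xml labels from labelimg
    └── JPEGImages/    # the renamed photos

The script then creates YOLOLabels/ plus images/train, images/val, labels/train and labels/val under VOCdevkit, sending roughly 90% of the images to train according to TRAIN_RATIO.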

Then use train.py for pre-training. (How to use it, including which parameters need changing, is well covered in other blogs.) After a long training run, the file runs/train/exp/weights/best.pt is produced.
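For reference, a typical training command looks roughly like this (the yaml file name and the hyperparameters here are assumptions; pick your own values):

python train.py --img 640 --batch 16 --epochs 100 --data zhuqian.yaml --weights yolov5s.pt

where zhuqian.yaml is a small dataset description file along these lines (adjust the paths to your own layout):

train: ../VOCdevkit/images/train
val: ../VOCdevkit/images/val
nc: 1            # one class
names: ['zhuqian']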

4. Semi-automatic annotation

Open detect.py and use the best.pt produced by pre-training as the weights for detection. The photos to detect here are the hundred-plus photos you took, minus the ones already used for training. Pay attention to the --save-txt argument in detect.py: add default=True to it (code below). That way a .txt file with the bounding boxes is generated for each image, which can easily be converted to .xml or edited later.

parser.add_argument('--save-txt', action='store_true', default=True, help='save results to *.txt')
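With that change, a detection run over the remaining unlabeled photos looks roughly like this (the source path is a placeholder; you can also pass --save-txt on the command line instead of editing the default):

python detect.py --weights runs/train/exp/weights/best.pt --source path/to/remaining_photos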

Looking at the detection results, they are generally quite good, but you must still fine-tune the boxes manually in labelimg. Even so, semi-automatic labeling like this greatly reduces the manual labeling effort.
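To turn the generated txt files back into something labelimg can fine-tune, the normalized YOLO boxes first have to be converted back to pixel coordinates. A minimal sketch (yolo_txt_to_boxes is a hypothetical helper name; writing the actual .xml around these boxes is left out):

from PIL import Image

def yolo_txt_to_boxes(txt_path, img_path):
    # each line in a detect.py txt is: class cx cy w h (all normalized to 0..1)
    W, H = Image.open(img_path).size
    boxes = []
    with open(txt_path) as f:
        for line in f:
            cls, cx, cy, w, h = line.split()[:5]
            cx, cy, w, h = float(cx) * W, float(cy) * H, float(w) * W, float(h) * H
            boxes.append((int(cls), int(cx - w / 2), int(cy - h / 2),
                          int(cx + w / 2), int(cy + h / 2)))  # xmin, ymin, xmax, ymax
    return boxes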

5. Training the manually fine-tuned dataset

Before training, if you feel the dataset is still too small, you can apply data augmentation again. Then train basically following the same steps as above. This training run uses many more photos, so of course the training time also multiplies.

That’s all for now. The training results still need to be sorted out before I can show them to everyone. If anything is unclear, or you need the photos, like and bookmark this post and leave a message, and I will send them to you for free.