Train your own data set based on the darknet framework + yolov3

Contents

1. Prepare annotation tools

2. Dataset preparation

(1) Download the dataset

(2) Processing datasets

3. Modify the corresponding configuration file

(1) Modify voc.data and voc.names

(2) Modify the yolov3.cfg configuration file

4. Download the weight file

5. Training dataset

(1) Training starts

(2) Training results

(3) Test results

The process of compiling and installing DarkNet (detailed explanation without GPU)

Two IP Cameras + YOLOV3 for target detection (the phone camera is used as a computer camera)

The windows platform uses the CMake tool to compile and install darknet + yolov3 + image detection + camera detection + video detection + mobile phone as a camera detection (detailed explanation)

Tip:

If you have not yet compiled the darknet source code with the CMake tool, please first read the article listed above, “The windows platform uses the CMake tool to compile and install darknet + yolov3 + image detection + camera detection + video detection + mobile phone as a camera detection (detailed explanation)”.

If darknet has already been compiled on your Windows machine, you can go straight to the steps below.

The official tutorial for training your own dataset

Structure diagram of the whole process:

1. Prepare annotation tools

(1) Install the image labeling tool: pip install labelimg

(2) Open the labeling tool: run labelimg in a Windows command prompt, after activating the virtual environment in which it was installed

(3) Select the folder that contains the images to be annotated.

2. Dataset preparation

(1) Download Dataset

Link: https://pan.baidu.com/s/18R30A4NtFJ2vpLEIk8I1-w
Extraction code: b61k

Reminder: The data set downloaded above has already been processed. If you want to annotate your own data set, it is recommended to store the files in the following structure while annotating:

VOCdevkit
    VOC2007
        Annotations (the XML files produced by annotating the images)
        ImageSets (.txt files listing the image names of each split)
            Main
                train.txt
                test.txt
                val.txt
        JPEGImages (the .jpg images that were annotated)
        labels (generated by the data set processing program; one .txt file per image, each line of which contains [class, cx, cy, w, h])

2007_test.txt (contains the absolute paths of the images in the test set)

2007_train.txt (contains the absolute paths of the images in the training set)

2007_val.txt (contains the absolute paths of the images in the validation set)

train.all.txt (contains the absolute paths of all images)

train.txt (contains the absolute paths of the images in the training and validation sets)

Reminder: This structure is recommended for two reasons: the paths hard-coded in the data set processing program follow it, and the layout is easy to read.
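If you are starting from an empty project, the folder structure above can be created in one go. The following is only a minimal sketch: the function name make_voc_skeleton and the constant VOC_SUBDIRS are made up for illustration and are not part of darknet or of the processing script.

```python
import os

# Sub-folders of VOCdevkit/VOC2007 described in the structure above.
VOC_SUBDIRS = (
    "Annotations",
    os.path.join("ImageSets", "Main"),
    "JPEGImages",
    "labels",
)


def make_voc_skeleton(root="."):
    """Create the empty VOCdevkit/VOC2007 folders under `root`.

    Returns the path of the VOC2007 folder. Existing folders are left untouched.
    """
    base = os.path.join(root, "VOCdevkit", "VOC2007")
    for sub in VOC_SUBDIRS:
        os.makedirs(os.path.join(base, sub), exist_ok=True)
    return base
```

Calling make_voc_skeleton() from the project directory creates the folders next to the processing script; the annotated images then go into JPEGImages and the XML files into Annotations.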

(2) Processing Dataset

Tip: The code looks a bit long, but don't worry: the key points are commented and it is easy to follow. Part of the following code is already provided; the path is:

import os
from os import getcwd
import xml.etree.ElementTree as ET

import cv2
from PIL import Image

# Settings for the full Pascal VOC data set (20 classes), kept for reference:
# sets = [('2012', 'train'), ('2012', 'val'), ('2007', 'train'), ('2007', 'val'), ('2007', 'test')]
# classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

# Settings used in this article: VOC2007-style splits and a single class
sets = [('2007', 'train'), ('2007', 'val'), ('2007', 'test')]
classes = ['face']

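# Convert a box given as (xmin, xmax, ymin, ymax) in pixels into [cx, cy, w, h]:
# the centre of the box and its width and height, each divided by the image size
def convert(size, box):
    dw = 1./(size[0])   # 1 / image width
    dh = 1./(size[1])   # 1 / image height
    x = (box[0] + box[1])/2.0 - 1   # box centre x in pixels (shifted by one pixel)
    y = (box[2] + box[3])/2.0 - 1   # box centre y in pixels (shifted by one pixel)
    w = box[1] - box[0]             # box width in pixels
    h = box[3] - box[2]             # box height in pixels
    # Normalise everything to the range [0, 1]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)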

"""
Read the relevant information from the XML file: the size of the image and the bounding box of each annotated object.
Note: a box in the XML file is stored as [xmin, ymin, xmax, ymax], i.e. the upper left corner (xmin, ymin)
and the lower right corner (xmax, ymax) of the object in the image. It has to be converted to the
center coordinates and the width and height of the box, [cx, cy, w, h].
"""
def convert_annotation(year, image_id):
    in_path = 'VOCdevkit/VOC%s/Annotations/%s.xml' % (year, image_id)
    out_path = 'VOCdevkit/VOC%s/labels/%s.txt' % (year, image_id)
    root = ET.parse(in_path).getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    # "with" closes the label file even if an annotation is malformed
    with open(out_path, 'w') as out_file:
        for obj in root.iter('object'):
            difficult = obj.find('difficult').text
            cls = obj.find('name').text
            # Skip classes that are not trained and objects marked as difficult
            if cls not in classes or int(difficult) == 1:
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                 float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w, h), b)
            # One line per object: class_id cx cy w h
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')


"""
Read the image names listed in ImageSets/Main/train.txt, ImageSets/Main/val.txt and ImageSets/Main/test.txt.
The image directory plus the image name gives the full path of each image; the paths are written
to the list files and the annotation of each image is converted.
"""
def VOC2007():
    wd = getcwd()

    for year, image_set in sets:
        os.makedirs('VOCdevkit/VOC%s/labels/' % year, exist_ok=True)
        with open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt' % (year, image_set)) as f:
            image_ids = f.read().strip().split()
        # For example 2007_train.txt: one absolute image path per line
        with open('%s_%s.txt' % (year, image_set), 'w') as list_file:
            for image_id in image_ids:
                list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg\n' % (wd, year, image_id))
                convert_annotation(year, image_id)

    # Merge the lists in Python: the "cat" command is not available on Windows,
    # and this data set has no 2012 lists
    def merge(sources, target):
        with open(target, 'w') as out:
            for name in sources:
                with open(name) as src:
                    out.write(src.read())

    merge(['2007_train.txt', '2007_val.txt'], 'train.txt')
    merge(['2007_train.txt', '2007_val.txt', '2007_test.txt'], 'train.all.txt')

"""
Resize the images to the specified size
"""
def resizeImage():
    imgPath = r"E:\tempImage"
    outPath = '../myDataset/VOC2007/JPEGImages/'
    os.makedirs(outPath, exist_ok=True)
    imgs_list = os.listdir(imgPath)
    # Only the first 200 files are processed
    for img_name in imgs_list[:200]:
        img_path = os.path.join(imgPath, img_name)
        img = cv2.imread(img_path)
        if img is None:   # not an image, or the file could not be read
            continue
        newimg = cv2.resize(src=img, dsize=(416, 416))
        cv2.imwrite(filename=os.path.join(outPath, img_name), img=newimg)

"""
ImageSets/Main does not yet contain the files
ImageSets/Main/train.txt, ImageSets/Main/val.txt and ImageSets/Main/test.txt.
Therefore the image data set has to be divided into a training set, a validation set and a test set,
and the image names of each set are stored in train.txt, val.txt and test.txt.
"""
def ImageSets():
    train_ratio = 0.8
    val_ratio = 0.1   # the remaining images (about 10%) form the test set
    xmlfilepath = 'VOCdevkit/VOC2007/Annotations'
    mainPath = 'VOCdevkit/VOC2007/ImageSets/Main'
    os.makedirs(mainPath, exist_ok=True)

    xml_list = os.listdir(xmlfilepath)
    # Sort numerically by file name (this assumes names such as 1.xml, 2.xml, ...)
    xml_list.sort(key=lambda x: int(x.split('.')[0]))
    names = [x.split('.')[0] for x in xml_list]
    total_num = len(names)
    n_train = int(total_num * train_ratio)
    n_val = int(total_num * val_ratio)

    splits = {
        'train': names[:n_train],
        'val': names[n_train:n_train + n_val],
        'test': names[n_train + n_val:],
    }
    # Write one image name per line into train.txt, val.txt and test.txt
    for split_name, split_names in splits.items():
        with open(os.path.join(mainPath, split_name + '.txt'), 'w') as f:
            for name in split_names:
                f.write(name + '\n')

"""
An error was encountered during training because some images are 8-bit deep, so every image is converted to a 24-bit (RGB) image
"""
def changeDepthBit():
    path8 = r'VOCdevkit/VOC2007/JPEGImages'
    newpath24 = r'VOCdevkit/VOC2007/ImageDepth24'
    # The output folder must exist before images can be saved into it
    os.makedirs(newpath24, exist_ok=True)

    files8 = os.listdir(path8)
    files8.sort(key=lambda x: int(x.split('.')[0]))
    for img_name in files8:
        imgpath = os.path.join(path8, img_name)
        img = Image.open(imgpath).convert('RGB')
        file_name, file_extend = os.path.splitext(img_name)
        dst = os.path.join(newpath24, file_name + '.jpg')
        img.save(dst)


if __name__ == '__main__':
    # Run one step at a time by uncommenting the corresponding call
    # resizeImage()
    # ImageSets()
    # VOC2007()
    changeDepthBit()
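As a worked example of the [cx, cy, w, h] conversion performed by convert() above, the following sketch converts one box to the normalised format and back to pixel corners. The function names voc_to_yolo and yolo_to_voc are made up for illustration, and the sketch deliberately leaves out the one-pixel shift that convert() applies to the center, so its numbers differ very slightly from the label files written by the script.

```python
def voc_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Pixel corners -> normalised (cx, cy, w, h), all values in [0, 1]."""
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h


def yolo_to_voc(img_w, img_h, cx, cy, w, h):
    """Normalised (cx, cy, w, h) -> pixel corners (xmin, ymin, xmax, ymax)."""
    xmin = (cx - w / 2.0) * img_w
    ymin = (cy - h / 2.0) * img_h
    xmax = (cx + w / 2.0) * img_w
    ymax = (cy + h / 2.0) * img_h
    return xmin, ymin, xmax, ymax
```

For a 416x416 image and a box with corners (104, 52) and (312, 260), voc_to_yolo(416, 416, 104, 52, 312, 260) gives (0.5, 0.375, 0.5, 0.5): the box is centered horizontally, sits slightly above the middle, and covers half of the image in each direction. Feeding these values into yolo_to_voc returns the original corners, which is a convenient way to check a label file by eye.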

3. Modify the corresponding configuration file

(1) Modify voc.data and voc.names

Tip: Copy voc.data and voc.names from the darknet-master-yolov4\darknet-master\build\darknet\x64\data directory into your own project directory.

Modify voc.data:

Modify voc.names:
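For readers who prefer to generate the two files instead of editing them by hand, here is a minimal sketch. It assumes the usual layout of a darknet .data file (the keys classes, train, valid, names and backup) and one class name per line in the .names file. The function name write_darknet_data_files and all default paths are illustrative assumptions, so adjust them to your own project.

```python
import os


def write_darknet_data_files(classes, train_list, valid_list,
                             names_path="data/voc.names",
                             data_path="data/voc.data",
                             backup_dir="backup"):
    """Write a .names file (one class per line) and a matching .data file.

    All paths are assumptions for illustration. Returns the lines written
    to the .data file.
    """
    for path in (names_path, data_path):
        folder = os.path.dirname(path)
        if folder:
            os.makedirs(folder, exist_ok=True)

    with open(names_path, "w") as f:
        f.write("\n".join(classes) + "\n")

    lines = [
        "classes = %d" % len(classes),   # number of object classes
        "train = %s" % train_list,       # list of training image paths
        "valid = %s" % valid_list,       # list of validation image paths
        "names = %s" % names_path,       # class names, one per line
        "backup = %s" % backup_dir,      # folder where weights are saved
    ]
    with open(data_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return lines
```

With the single class used in this article, a call such as write_darknet_data_files(['face'], 'train.txt', '2007_test.txt') would produce a voc.names containing the one line face and a voc.data whose first line is classes = 1. Which list files to use for train and valid is your choice; the two named here are only an example.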

(2) Modify yolov3.cfg configuration file

Detailed explanation of the meaning of each parameter in the yolo configuration file

Rule for setting max_batches: set max_batches to classes*2000, but not less than the number of training images and not less than 6000. For example, if you train 3 classes, use max_batches=6000.

Tip: A modification like the one above has to be made in three places in the file; the one shown above is only the first. The lines to change are located under the [yolo] sections and under the [convolutional] sections.
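The numbers that go into yolov3.cfg can be worked out with a few lines of arithmetic. The sketch below applies the max_batches rule stated above. The other two values are not taken from this article but from the training guidance in the darknet README, as far as I recall it: steps is set to 80% and 90% of max_batches, and filters in the [convolutional] section directly before each [yolo] section is set to (classes + 5) * 3. The function name yolov3_cfg_values is made up for illustration.

```python
def yolov3_cfg_values(num_classes, num_train_images=0):
    """Return the values to enter in yolov3.cfg for a custom data set."""
    # classes * 2000, but not less than the number of training images
    # and not less than 6000
    max_batches = max(num_classes * 2000, num_train_images, 6000)
    # Learning-rate steps at 80% and 90% of max_batches (integer arithmetic)
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)
    # Three anchors per [yolo] layer, each predicting 4 box values,
    # 1 objectness score and num_classes class scores
    filters = (num_classes + 5) * 3
    return {
        "classes": num_classes,
        "max_batches": max_batches,
        "steps": steps,
        "filters": filters,
    }
```

For the single face class used in this article, yolov3_cfg_values(1) gives max_batches=6000, steps=(4800, 5400) and filters=18. For 3 classes the result is max_batches=6000 and filters=24, which matches the example given above.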

4. Download the weight file

Darknet’s official homepage

weight file download link

5. Training data set

(1) Training start

darknet.exe detector train data/voc.data cfg/yolov3.cfg preTrain/darknet53.conv.74

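The three arguments of the command are:

data/voc.data: the data file modified in step 3

cfg/yolov3.cfg: the network configuration file modified in step 3

preTrain/darknet53.conv.74: the pre-trained weight file (see step 4)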
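Before starting a long training run, it can save time to check that every image in the training list exists and has a label file. The sketch below is built on one assumption about darknet's behaviour: that it finds the label of an image by replacing the JPEGImages folder with labels and the file extension with .txt, which is consistent with the folder structure used in this article. The function names label_path_for and check_image_list are made up for illustration.

```python
import os


def label_path_for(image_path):
    """Guess the label file of an image: JPEGImages -> labels, extension -> .txt."""
    folder, name = os.path.split(image_path)
    parent, last = os.path.split(folder)
    if last == "JPEGImages":
        folder = os.path.join(parent, "labels")
    return os.path.join(folder, os.path.splitext(name)[0] + ".txt")


def check_image_list(list_file):
    """Return (image_path, problem) tuples for entries that cannot be used."""
    problems = []
    with open(list_file) as f:
        for line in f:
            image_path = line.strip()
            if not image_path:
                continue   # ignore empty lines
            if not os.path.isfile(image_path):
                problems.append((image_path, "image missing"))
            elif not os.path.isfile(label_path_for(image_path)):
                problems.append((image_path, "label missing"))
    return problems
```

Running check_image_list('2007_train.txt') in the project directory returns an empty list when every entry is usable; otherwise each returned tuple names the image and says whether the image itself or its label file is missing.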

(2) Training result

(3) Test result

Tip: Readers can use the following link for testing:

https://mydreamambitious.blog.csdn.net/article/details/125520487

Alternatively, test with the commands given in the article on compiling darknet mentioned at the beginning.

"""
@Author : Keep_Trying_Go
@Major : Computer Science and Technology
@Hobby : Computer Vision
@Time : 2023/5/24 8:40
"""

import os
import cv2
import numpy as np

#create window
# cv2.namedWindow(winname='detect',flags=cv2.WINDOW_AUTOSIZE)
# cv2.resizeWindow(winname='detect',width=750,height=600)

#Read YOLO-V3 weight file and network configuration file
net=cv2.dnn.readNet(model='backup/yolov3_final.weights',config='cfg/yolov3.cfg')

# Set the confidence threshold and the non-maximum suppression threshold
Confidence_thresh=0.2
Nms_thresh=0.35
# Read the class names from the voc.names file
with open('data/voc.names','r') as fp:
    classes = fp.read().splitlines()

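# Wrap the network in a detection model once, instead of on every frame
model = cv2.dnn_DetectionModel(net)
# Scale pixels to [0, 1], resize the input to 416x416 and convert
# OpenCV's BGR channel order to the RGB order that darknet models expect
model.setInputParams(scale=1/255, size=(416, 416), swapRB=True)

# yolo-v3 detection
def detect(frame):
    # Make predictions; non-maximum suppression is applied by model.detect
    class_ids, scores, boxes = model.detect(frame, confThreshold=Confidence_thresh,
                                            nmsThreshold=Nms_thresh)
    # Flatten ids and scores, because some OpenCV versions return them with shape (N, 1)
    class_ids = np.asarray(class_ids).reshape(-1)
    scores = np.asarray(scores).reshape(-1)
    # Return the predicted classes, confidences and boxes (x, y, w, h)
    return class_ids, scores, boxes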


# Real-time detection
def detect_time():
    # Open the default camera (index 0); the path of a video file such as
    # 'video/los_angeles.mp4' or 'video/soccer.mp4' can be passed instead
    cap = cv2.VideoCapture(0)

    while cap.isOpened():
        OK, frame = cap.read()
        if not OK:
            break

        # Mirror the frame horizontally (any positive flipCode flips around the y-axis)
        frame = cv2.flip(src=frame, flipCode=1)
        # frame=cv2.resize(src=frame,dsize=(416,416))
        # Make predictions
        class_ids, scores, boxes = detect(frame)
        # Draw one rectangle and one label per detection; i is the detection index
        for i, box in enumerate(boxes):
            x, y, w, h = [int(v) for v in box]
            class_name = classes[int(class_ids[i])]
            confidence = str(round(float(scores[i]), 2))
            cv2.rectangle(img=frame, pt1=(x, y), pt2=(x + w, y + h),
                          color=(0, 255, 0), thickness=2)
            text = class_name + ' ' + confidence
            cv2.putText(img=frame, text=text,
                        org=(x, y - 10), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                        fontScale=1.0, color=(0, 255, 0), thickness=2)

        cv2.imshow('detect', frame)
        # Press Esc (key code 27) to quit
        key = cv2.waitKey(1)
        if key == 27:
            break

    cap.release()
    cv2.destroyAllWindows()

# Detection of a single image
def signal_detect(image_path='data/2141.png'):
    frame = cv2.imread(image_path)
    if frame is None:
        raise FileNotFoundError('Cannot read image: ' + image_path)
    frame = cv2.resize(src=frame, dsize=(416, 416))
    # Make predictions
    class_ids, scores, boxes = detect(frame)
    # Draw one rectangle and one label per detection; i is the detection index
    for i, box in enumerate(boxes):
        x, y, w, h = [int(v) for v in box]
        class_name = classes[int(class_ids[i])]
        # The score is looked up by detection index, not by class id
        confidence = str(round(float(scores[i]), 2))
        cv2.rectangle(img=frame, pt1=(x, y), pt2=(x + w, y + h),
                      color=(0, 255, 0), thickness=2)
        text = class_name + ' ' + confidence
        cv2.putText(img=frame, text=text,
                    org=(x, y - 10), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                    fontScale=1.0, color=(0, 255, 0), thickness=2)

    cv2.imshow('detect', frame)
    # Wait for any key, then close the window
    cv2.waitKey(0)
    cv2.destroyAllWindows()


if __name__ == '__main__':
    print('Pycharm')
    #signal_detect()
    detect_time()