Use yolov5 to train your own dataset (detailed process) and deploy it through Flask

GitHub project address (please give it a star):

My project address

1. Prepare the dataset

PASCAL VOC

This article uses the PASCAL VOC dataset (extraction code: 07wp) as an example. Place the dataset in the project's dataset directory. The dataset structure is as follows:

---VOC2012
--------Annotations
---------------xml0
---------------xml1
--------JPEGImages
---------------img0
---------------img1
--------pascal_voc_classes.txt

Annotations contains the XML annotation files, JPEGImages contains the image files, and pascal_voc_classes.txt is the class file.

Generate the label files

The YOLO label file format is as follows:

102 0.682813 0.415278 0.237500 0.502778
102 0.914844 0.396528 0.168750 0.451389

The first number is the class of the object in the image.
The last four numbers give the object's position as (x_center, y_center, w, h): the relative coordinates of the object's center followed by its relative width and height.
The label file above contains two targets; the sketch below shows how these values are computed.
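As a quick sanity check, the snippet below shows how one VOC-style corner box (in pixels) maps to one YOLO label line; the voc_to_yolo helper is just for illustration:

# How one VOC-style corner box (in pixels) becomes one YOLO label line.
def voc_to_yolo(cls_id, xmin, ymin, xmax, ymax, img_w, img_h):
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{cls_id} {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}"

# e.g. a 500x375 image with a box spanning (100, 80) to (300, 260):
print(voc_to_yolo(0, 100, 80, 300, 260, 500, 375))
# -> 0 0.400000 0.453333 0.400000 0.480000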

If you already have label files in this format, you can skip straight to the next step.
If not, you can use labelimg (extraction code: dbi2) to annotate your images. It generates label files in XML format, which you then convert to YOLO format. labelimg is very simple to use, so I won't go into detail here.
To convert the XML label files to YOLO format, run the xml_yolo.py file in the project (a minimal sketch of this conversion appears at the end of this section):

python xml_yolo.py

pascal_voc_classes.txt holds a JSON-formatted list of your classes. The following is the class list for the VOC dataset:

["aeroplane","bicycle", "bird","boat","bottle","bus","car","cat\ ","chair","cow","diningtable","dog","horse","motorbike","person","pottedplant", "sheep","sofa","train", "tvmonitor"]
The path structure after running the above code:
---VOC2012
--------Annotations
--------JPEGImages
--------pascal_voc_classes.json
---yolodata
--------images
--------labels
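The project's xml_yolo.py is not reproduced in this article; a minimal version of the XML-to-YOLO step might look like the sketch below, assuming standard VOC annotation fields and that pascal_voc_classes.txt holds the JSON class list shown above:

# Hypothetical minimal XML-to-YOLO converter (not the project's xml_yolo.py).
import json
import xml.etree.ElementTree as ET

with open("VOC2012/pascal_voc_classes.txt") as f:
    classes = json.load(f)  # the JSON list of class names shown above

def xml_to_yolo(xml_path):
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        cls_id = classes.index(obj.find("name").text)
        b = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(b.find(k).text)
                                  for k in ("xmin", "ymin", "xmax", "ymax"))
        # Same corner-to-center conversion as the earlier sketch.
        lines.append(f"{cls_id} "
                     f"{(xmin + xmax) / 2 / img_w:.6f} "
                     f"{(ymin + ymax) / 2 / img_h:.6f} "
                     f"{(xmax - xmin) / img_w:.6f} "
                     f"{(ymax - ymin) / img_h:.6f}")
    return lines  # one "class x_center y_center w h" line per object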

2. Split the training and validation sets

Splitting the training and validation sets is very simple: shuffle the original data, then divide it 9:1 into a training set and a validation set (a sketch follows the directory tree below).
Run the get_train_val.py file in the project:

python get_train_val.py
Running the above code generates the following path structure:
---VOC2012
--------Annotations
--------JPEGImages
--------pascal_voc_classes.json
---yolodata
--------images
--------labels
---traindata
--------images
----------------train
----------------val
--------labels
----------------train
----------------val
traindata is the final directory the training step needs.
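get_train_val.py itself is not reproduced here; a minimal sketch of the 9:1 shuffle-and-split might look like this (file layout assumed from the directory trees above):

# Minimal sketch of the 9:1 shuffle-and-split done by get_train_val.py
# (paths are taken from the trees above and may need adjusting).
import os
import random
import shutil

random.seed(0)
images = sorted(os.listdir("yolodata/images"))
random.shuffle(images)

n_train = int(len(images) * 0.9)  # 9:1 train/val split
splits = {"train": images[:n_train], "val": images[n_train:]}

for split, names in splits.items():
    os.makedirs(f"traindata/images/{split}", exist_ok=True)
    os.makedirs(f"traindata/labels/{split}", exist_ok=True)
    for name in names:
        label = os.path.splitext(name)[0] + ".txt"
        shutil.copy(f"yolodata/images/{name}", f"traindata/images/{split}/{name}")
        shutil.copy(f"yolodata/labels/{label}", f"traindata/labels/{split}/{label}")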

3. Train the model

Training yolov5 is very simple. This article uses a simplified version of the code, with the following structure:

dataset             # dataset directory
------traindata     # training data
inference           # input and output interface
------inputs        # input data
------outputs       # output data
config              # configuration files
------score.yaml    # training configuration
------yolov5l.yaml  # model configuration
models              # model code
runs                # log files
utils               # utility code
weights             # model save path (last.pt, best.pt)
train.py            # training code
detect.py           # testing code

score.yaml is explained as follows:

# train and val datasets (image directory)
train: ./datasets/traindata/images/train/
val: ./datasets/traindata/images/val/
# number of classes
nc: 2
# class names
names: ['apple','banana']

train: the path to the training images
val: the path to the validation images
nc: the number of classes
names: the names corresponding to the classes

yolov5l.yaml is explained as follows:

nc: 2 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23] # P3/8
  - [30,61, 62,45, 59,119] # P4/16
  - [116,90, 156,198, 373,326] # P5/32
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]], # 1-P1/2
   [-1, 1, Conv, [128, 3, 2]], # 2-P2/4
   [-1, 3, Bottleneck, [128]],
   [-1, 1, Conv, [256, 3, 2]], # 4-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]], # 6-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]], # 8-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 6, BottleneckCSP, [1024]], # 10
  ]
head:
  [[-1, 3, BottleneckCSP, [1024, False]], # 11
   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 12 (P5/32-large)
   [-2, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]], # cat backbone P4
   [-1, 1, Conv, [512, 1, 1]],
   [-1, 3, BottleneckCSP, [512, False]],
   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 17 (P4/16-medium)
   [-2, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]], # cat backbone P3
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 3, BottleneckCSP, [256, False]],
   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 22 (P3/8-small)
   [[], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
  ]

nc: the number of target classes.
depth_multiple and width_multiple: control the model's depth and width; different values correspond to the s, m, l, and x models.
anchors: the base boxes obtained by k-means clustering on the ground-truth boxes in the training data; the network predicts target boxes relative to these base boxes.
yolov5 generates anchors automatically: it runs k-means with Euclidean distance and then refines the result with a genetic algorithm. However, the anchors I get from Euclidean-distance k-means are not as good as those from k-means with a 1-IoU distance, although in practice the difference is small; a generic sketch of the 1-IoU approach follows this list. If you want my source code for it, message me privately.
backbone: the network structure of the feature-extraction part.
head: the network structure of the final prediction part.
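As promised above, here is a generic sketch of k-means clustering under a 1 - IoU distance. It illustrates the technique only; it is not the author's private implementation:

# Generic sketch of k-means anchor clustering with a 1 - IoU distance.
import numpy as np

def wh_iou(boxes, anchors):
    # IoU between (w, h) pairs, treating all boxes as sharing one center.
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_a = (anchors[:, 0] * anchors[:, 1])[None, :]
    return inter / (area_b + area_a - inter)

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each box to the nearest anchor under the 1 - IoU distance.
        assign = np.argmin(1 - wh_iou(boxes, anchors), axis=1)
        for i in range(k):
            if np.any(assign == i):  # keep old anchor if its cluster is empty
                anchors[i] = np.median(boxes[assign == i], axis=0)
    return anchors[np.argsort(anchors.prod(axis=1))]  # sort small to large

# boxes: an N x 2 array of ground-truth (w, h) values taken from your labels
boxes = np.array([[23., 31.], [46., 52.], [120., 80.], [15., 20.],
                  [300., 260.], [60., 45.], [90., 110.], [33., 23.],
                  [150., 200.]])
print(kmeans_anchors(boxes, k=3))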

The train.py configuration is very simple. We only need to modify the following parameters (an example invocation follows the run command below):

epochs: the number of training epochs
batch_size: the number of images per training batch
cfg: the path to the model configuration file
data: the path to the training configuration file
weights: the path to a weights file to load, e.g. to resume training from a checkpoint

Run in the terminal:

 python train.py

You can start training.
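This article's simplified train.py is not reproduced here, but if these settings are exposed as command-line flags, as they are in upstream yolov5, a full invocation might look like this (paths follow the project structure above):

 python train.py --epochs 100 --batch-size 16 --cfg config/yolov5l.yaml --data config/score.yaml --weights weights/last.pt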

Training process


Training results

4. Test the model


Three parameters need to be modified (an example call follows below):

source: the path to the images/videos to detect
out: the path where results are saved
weights: the path to the model weights file produced by training

Run in the terminal:

 python detect.py

You can start testing.
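Likewise, if detect.py keeps the upstream yolov5 flags for these settings, a full call might look like this (paths assumed from the project structure above):

 python detect.py --source inference/inputs --output inference/outputs --weights weights/best.pt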

You can also test with weights pretrained on the COCO dataset (extraction code: hhbb); place them in the weights folder.

Test results


5. Deploy via Flask

Deploying with Flask is very simple. If anything is unclear, you can refer to my previous blog posts or leave a comment below.

Deploying Python and Flask projects on Alibaba Cloud ECS, simple and easy to understand, without nginx or uwsgi

An object detection and multi-object tracking web platform based on yolov3-deepsort-flask

Run in the terminal:

 python app.py

You can then open the web page and upload images or videos for detection.
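For reference, a self-contained sketch of such an upload-and-detect app is below. It loads the trained weights through torch.hub rather than the project's detect.py, so treat it as an illustration of the idea, not the repo's actual app.py (the hub Detections API also varies slightly across yolov5 releases):

# Sketch of an upload-and-detect Flask app using torch.hub weights loading.
import os
from flask import Flask, request, redirect, url_for, send_from_directory
from werkzeug.utils import secure_filename
import torch

UPLOAD_DIR = "inference/inputs"
OUTPUT_DIR = "inference/outputs"
os.makedirs(UPLOAD_DIR, exist_ok=True)

# Load the custom weights once at startup.
model = torch.hub.load("ultralytics/yolov5", "custom", path="weights/best.pt")

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def upload():
    if request.method == "POST":
        f = request.files["file"]
        name = secure_filename(f.filename)
        src = os.path.join(UPLOAD_DIR, name)
        f.save(src)
        results = model(src)  # run inference on the uploaded image
        results.save(save_dir=OUTPUT_DIR, exist_ok=True)  # annotated copy
        return redirect(url_for("result", filename=name))
    return ('<form method="post" enctype="multipart/form-data">'
            '<input type="file" name="file">'
            '<input type="submit" value="Detect"></form>')

@app.route("/outputs/<filename>")
def result(filename):
    # Serve the annotated image written by results.save().
    return send_from_directory(OUTPUT_DIR, filename)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)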