1. Project description
With advances in science and technology, image recognition and object detection are now widely used in fields such as autonomous driving and intelligent transportation. Under complex environmental conditions such as fog, however, existing detectors can suffer from reduced recognition rates. To address this, we propose a YOLOv5-based project for detecting pedestrians and vehicles in foggy weather, with the aim of improving detection accuracy in complex environments. YOLOv5 is a widely used object detection algorithm that combines high speed with good accuracy, which makes it suitable for real-time scenarios. Here we use the RTTS data set as an example. The challenges of this project are:
- The targets are complex
  - The environment is complex: the model must adapt to daytime, cloudy, foggy, hazy and other weather conditions at various levels of visibility;
  - The scenes are complex: urban roads, rural areas, and highways differ greatly from one another;
- Unbalanced samples
  - There are many categories, including pedestrians, cyclists, cars, buses, motorcycles, and bicycles;
  - Each image contains multiple types of targets, with various degrees of occlusion and truncation;
Figure 1 – RTTS dataset example
2. Environment description
This example is based on the YOLOv5 network and is trained on the RTTS data set.
- PaddlePaddle 2.2
- OS: 64-bit operating system
- Deep learning framework: PyTorch 1.7 or higher, used to build and train the YOLOv5 model
- Python 3 (3.6/3.7/3.8/3.9), 64-bit version
- pip/pip3 (9.0.1+), 64-bit version
- CUDA >= 10.1
- cuDNN >= 7.6
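The minimum versions listed above can be checked programmatically. The following is a minimal sketch, assuming purely numeric dotted version strings (for example "10.1"); the helper names `parse_version`, `meets_minimum`, and `python_supported` are hypothetical and not part of any of the listed tools.

```python
import sys


def parse_version(text):
    """Turn a dotted version string such as '10.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Return True when `installed` is at least `minimum` (e.g. CUDA >= 10.1)."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '9.0' and '9.0.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def python_supported(version_info=sys.version_info):
    """Check an interpreter version against the Python 3.6-3.9 range listed above."""
    return (3, 6) <= tuple(version_info[:2]) <= (3, 9)
```

Comparing tuples of integers rather than raw strings matters here: as strings, "9.0" would sort after "10.1". Version strings with suffixes (such as a local build tag) would need extra parsing that this sketch does not attempt.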
3. Data preparation
3.1 Data Introduction
Dataset: "RTTS: Pedestrian and vehicle target detection in foggy weather" (PaddlePaddle AI Studio Galaxy Community)
The RTTS data set is derived from the RESIDE data set, a public data set used for research on foggy image processing and computer vision. It contains 4322 real foggy images used as the training set for this project, plus 100 real-scene images used as the validation set. The distribution of images is shown in the following table:
Dataset | train | val |
---|---|---|
Number of images | 4322 | 100 |
Data preprocessing: the collected image data is cleaned and preprocessed, including image enhancement such as contrast enhancement and image defogging.
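To make "contrast enhancement" concrete, here is a minimal sketch of linear contrast stretching on a grayscale image stored as nested lists. It is a simple stand-in chosen for illustration, not the preprocessing actually used in this project, and the function name `stretch_contrast` is hypothetical.

```python
def stretch_contrast(image, low=0, high=255):
    """Linearly rescale pixels so the darkest maps to `low` and the brightest to `high`.

    `image` is a list of rows, each row a list of grayscale values.
    """
    flat = [pixel for row in image for pixel in row]
    darkest, brightest = min(flat), max(flat)
    if brightest == darkest:
        # A perfectly flat image has no contrast to stretch; return a copy.
        return [list(row) for row in image]
    scale = (high - low) / (brightest - darkest)
    return [[round(low + (pixel - darkest) * scale) for pixel in row] for row in image]


# A hazy patch typically occupies a narrow band of intensities.
hazy_patch = [[100, 120], [140, 160]]
enhanced = stretch_contrast(hazy_patch)
```

Fog compresses pixel intensities into a narrow range, so spreading that range back over 0 to 255 makes edges easier to see. Real pipelines usually work on arrays and per channel; nested lists are used here only to keep the sketch dependency-free.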
3.2 Data structure
The files are organized as follows (modeled on COCO):
The YOLO format annotation data files are as follows:
The VOC format annotation data files are as follows:
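The two annotation formats describe the same box differently: VOC stores pixel corners (xmin, ymin, xmax, ymax), while a YOLO label line stores a class index followed by the box center and size normalized by the image dimensions. The conversion can be sketched as follows; the function names and the class index used in the example are assumptions for illustration only.

```python
def voc_to_yolo(box, image_width, image_height):
    """Convert (xmin, ymin, xmax, ymax) in pixels to normalized (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2.0 / image_width
    cy = (ymin + ymax) / 2.0 / image_height
    w = (xmax - xmin) / image_width
    h = (ymax - ymin) / image_height
    return cx, cy, w, h


def yolo_label_line(class_id, box, image_width, image_height):
    """Format one object as a YOLO label line: 'class cx cy w h'."""
    cx, cy, w, h = voc_to_yolo(box, image_width, image_height)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Hypothetical example: class index 2, a 200x200 box in a 400x500 image.
line = yolo_label_line(2, (100, 50, 300, 250), 400, 500)
```

Because YOLO coordinates are normalized, the labels stay valid when the image is resized, whereas VOC pixel coordinates would have to be rescaled.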
4. Model selection
Joseph Redmon et al. proposed the YOLO (You Only Look Once) algorithm in 2015, now commonly known as YOLOv1. They improved the algorithm and proposed YOLOv2 in 2016, followed by YOLOv3 in 2018. YOLOv5 was released in 2020 and is currently the most widely used of these versions.
- YOLOv5 is a deep-learning-based object detection algorithm. Compared with traditional two-stage detection methods, it uses a single-stage detection pipeline, which gives it a clear speed advantage [2]. In foggy conditions, where low visibility means a large number of images must be processed, a faster algorithm improves detection efficiency.
- YOLOv5 divides the input image into a fixed-size grid and predicts object bounding boxes and categories for each grid cell. This grid-based design helps YOLOv5 detect small targets, so it can effectively detect pedestrians and vehicles in foggy environments.
- YOLOv5 predicts at multiple scales and combines feature information across scales to improve detection accuracy. In fog, light attenuation and scattering can make targets in the image appear blurred or noisy; multi-scale prediction improves the ability to distinguish them.
In summary, YOLOv5 offers fast inference, good detection of small targets, and the ability to use multi-scale information, which makes it well suited to detecting pedestrians and vehicles in foggy weather.
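The grid division and multi-scale prediction described above can be illustrated with a small sketch that maps a box center to its grid cell at each detection scale. It assumes a 640x640 input and the strides 8, 16, and 32 of the three detection heads; it deliberately ignores anchors and the neighboring-cell matching that the real assignment also uses, and the function name `responsible_cells` is hypothetical.

```python
STRIDES = (8, 16, 32)  # downsampling factors of the three detection heads


def responsible_cells(cx, cy, image_size=640, strides=STRIDES):
    """Map a box center in pixels to its (column, row, grid_size) at each scale."""
    cells = {}
    for stride in strides:
        grid = image_size // stride  # e.g. 640 // 8 = 80 cells per side
        # Clamp so a center on the right or bottom edge stays inside the grid.
        col = min(int(cx // stride), grid - 1)
        row = min(int(cy // stride), grid - 1)
        cells[stride] = (col, row, grid)
    return cells


# A target centered at pixel (100, 200) falls into one cell per scale.
cells = responsible_cells(100, 200)
```

The finest grid (stride 8) has the smallest cells, which is why it is the scale best placed to localize small or distant targets, while the coarser grids cover large, nearby vehicles.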
5. Model training
By default, an 8-card (8-GPU) configuration is used. To train on a single card in AI Studio, modify the train.sh file as follows:
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch --gpus 0 tools/train.py -c configs-hazedet/ppyoloe/ppyoloe_crn_m_100e_hazedet.yml --eval
! bash train.sh
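The single-card and multi-card variants of the launch command differ only in the GPU list. As a rough sketch, the command can be assembled from a list of GPU indices; the flags are the ones shown in the single-card command, while the helper name `build_train_command` and the 8-card GPU list are assumptions, since the original train.sh is not reproduced here.

```python
import shlex


def build_train_command(gpu_ids, config_path, evaluate=True):
    """Return the environment setting and launch command for the given GPUs."""
    gpus = ",".join(str(gpu) for gpu in gpu_ids)
    args = [
        "python", "-m", "paddle.distributed.launch",
        "--gpus", gpus,
        "tools/train.py",
        "-c", config_path,
    ]
    if evaluate:
        args.append("--eval")  # evaluate on the validation set during training
    env = f"CUDA_VISIBLE_DEVICES={gpus}"
    return env, " ".join(shlex.quote(arg) for arg in args)


CONFIG = "configs-hazedet/ppyoloe/ppyoloe_crn_m_100e_hazedet.yml"
single_env, single_cmd = build_train_command([0], CONFIG)
multi_env, multi_cmd = build_train_command(range(8), CONFIG)  # assumed 8-card setup
```

Keeping `CUDA_VISIBLE_DEVICES` and `--gpus` in sync avoids launching workers on devices that the process cannot see.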
6. Model evaluation
Based on the PaddleDetection library, we provide a variety of prediction methods to choose from.
Model location: output/ppyoloe_crn_m_100e_hazedet
!bash eval.sh
7. Model optimization
This section shows how accuracy was improved over successive model iterations:
- Baseline: the backbone network loads CSPResNetb_m parameters pre-trained on ImageNet; evaluation after 100 training epochs gives:
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.260
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.499
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.237
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.180
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.319
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.447
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.208
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.413
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.428
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.318
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.550
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.574
- COCO pre-trained model: load the COCO pre-trained ppyoloe_crn_m model and fine-tune it on the RTTS data set. The final detection mAP improved by 14.7 percentage points (0.260 to 0.407).
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.407
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.672
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.416
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.283
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.489
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.716
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.282
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.492
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.510
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.390
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.618
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.769
- Offline data augmentation: use a defogging algorithm to augment the training set offline. Common defogging models include MSBDN, the Trident Dehazing Network, and FFA-Net.
Here we chose the MSBDN model, dehazed the training set offline, and trained on the dehazed images together with the original training set. The aim was to enrich the training set with images of different fog concentrations while making dense-fog samples easier to recognize, which speeds up model convergence. The final detection mAP improved by 0.6 percentage points (0.407 to 0.413).
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.413
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.672
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.415
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.277
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.519
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.726
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.282
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.499
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.517
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.387
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.641
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.780
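To compare the three runs without reading the logs by eye, the COCO-style summary lines can be parsed into a dictionary and the mAP gains computed from them. This is a minimal sketch using a regular expression written for the line format shown above; the names `parse_coco_summary` and `gain_in_points` are hypothetical, and only the first line of each reported run is reproduced in the example.

```python
import re

# Matches one COCO summary entry, whether entries are on separate lines or not.
SUMMARY = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\)\s+@\[\s*IoU=([\d.:]+)\s*\|"
    r"\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)\s*\]\s*=\s*(-?\d+\.\d+)"
)


def parse_coco_summary(text):
    """Return {(kind, iou, area, max_dets): value} for every entry in `text`."""
    metrics = {}
    for kind, iou, area, max_dets, value in SUMMARY.findall(text):
        metrics[(kind, iou, area, int(max_dets))] = float(value)
    return metrics


def gain_in_points(before, after):
    """Difference between two AP values, in percentage points."""
    return round((after - before) * 100, 1)


MAP_KEY = ("AP", "0.50:0.95", "all", 100)
TEMPLATE = "Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = "

baseline = parse_coco_summary(TEMPLATE + "0.260")[MAP_KEY]
coco_pretrained = parse_coco_summary(TEMPLATE + "0.407")[MAP_KEY]
with_dehazing = parse_coco_summary(TEMPLATE + "0.413")[MAP_KEY]

pretrain_gain = gain_in_points(baseline, coco_pretrained)
dehazing_gain = gain_in_points(coco_pretrained, with_dehazing)
```

The gains are differences in percentage points, not relative improvements: loading COCO pre-trained weights accounts for nearly all of the improvement, while offline dehazing adds a much smaller increment.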
8. Inference visualization
Refer to infer.sh; the final output files are saved in the output directory.
!python tools/infer.py \
    -c configs-hazedet/ppyoloe/ppyoloe_crn_m_100e_hazedet.yml \
    --infer_img=dataset/hazedet/val/HR/59.png \
    -o weights=output/ppyoloe_crn_m_100e_hazedet/best_model
Figure 2 – Comparison before and after image detection (taken from the output directory)
By default, training runs for 100 epochs; the detailed results are as follows:
Label visualization:
F1 curve:
PR curve:
Batch prediction example:
Visual inference example:
9. Model export
Export inference model
The weight files saved by the PaddlePaddle framework come in two types: training models, which support both forward inference and backward gradient computation, and inference models, which support forward inference only. The inference model is optimized for inference speed and GPU memory: it prunes tensors that are needed only during training to reduce memory usage, and it applies speed optimizations such as layer fusion and kernel selection. You can run the following command to export the inference model.
By default, it is exported to the inference_model directory.
# Model export
!bash export_model.sh
10. Model deployment
Use Paddle Inference, PaddlePaddle's native inference library, for server-side model deployment.
Deployment generally involves three steps:
- Create PaddlePredictor and set the exported model path
- Create a PaddleTensor for input and pass it into PaddlePredictor
- Get the output PaddleTensor and take out the result
#include <vector>
#include "paddle_inference_api.h"

// Create a config and modify related settings
paddle::NativeConfig config;
config.model_dir = "xxx";
config.use_gpu = false;
// Create a native PaddlePredictor
auto predictor = paddle::CreatePaddlePredictor<paddle::NativeConfig>(config);
// Create the input tensor
int64_t data[4] = {1, 2, 3, 4};
paddle::PaddleTensor tensor;
tensor.shape = std::vector<int>({4, 1});
tensor.data.Reset(data, sizeof(data));
tensor.dtype = paddle::PaddleDType::INT64;
// Collect the input tensors into the list passed to Run
std::vector<paddle::PaddleTensor> slots = {tensor};
// Create the output tensors. Their memory can be reused across runs.
std::vector<paddle::PaddleTensor> outputs;
// Run prediction
CHECK(predictor->Run(slots, &outputs));
// Get outputs ...
For more details, see the C++ Prediction API Introduction.
The following uses Paddle Inference's Python deployment as an example:
Use the deploy/python/infer.py script provided by PaddleDetection to run inference on images. In this project we use TensorRT FP16 for inference, and the inference speed can reach 208 fps on a single V100 GPU.
# Inference on a single image
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=inference_model/ppyoloe_crn_m_100e_hazedet --image_file=dataset/hazedet/val/HR/0.png --device=gpu --run_mode=trt_fp16
# Inference on all images in a folder
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=inference_model/ppyoloe_crn_m_100e_hazedet --image_dir=dataset/hazedet/val/ --device=gpu --run_mode=trt_fp16
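To put the reported 208 fps in perspective, throughput can be converted into per-image latency and into the time needed for a batch of images. This is simple arithmetic under the assumption that the reported figure is sustained and that image loading and pre/post-processing overhead are ignored; the function names are hypothetical.

```python
def latency_ms(fps):
    """Average time per image, in milliseconds, at a given throughput."""
    return 1000.0 / fps


def seconds_for(num_images, fps):
    """Time to process `num_images` at a sustained throughput of `fps`."""
    return num_images / fps


per_image = latency_ms(208)          # roughly 4.8 ms per image
val_time = seconds_for(100, 208)     # the 100 validation images
train_time = seconds_for(4322, 208)  # all 4322 training images
```

At roughly 4.8 ms per image the detector leaves ample headroom for a 30 fps camera stream (about 33 ms per frame), although end-to-end latency in a real deployment will be higher once decoding and post-processing are included.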
# Inference on a single image
!python deploy/python/infer.py --model_dir=inference_model/ppyoloe_crn_m_100e_hazedet --image_file=dataset/h