Human behavior detection based on YOLOv5

Table of Contents

One. Project background and project purpose

Two. Introduction to the YOLOv5 algorithm

Three. Implementation of human behavior detection based on YOLOv5

Four. Functional design

Five. Loss function, hyperparameter adjustment, and fitting: detailed explanation

Six. Changes in loss

Seven. Model display

Eight. Summary


One. Project background and project purpose

Background:

With the rapid development of computer vision technology, human behavior detection has become one of the research hotspots in the field of computer vision. Human behavior detection has broad application prospects in fields such as intelligent monitoring, human-computer interaction, smart home, and smart medical care. Therefore, accurate and real-time detection of human behavior is of great significance to promote technological progress and application innovation in related fields.

Traditional behavior detection methods usually rely on hand-designed feature extractors, which often perform poorly on complex backgrounds and dynamic scenes. In recent years, deep learning has provided new solutions for human behavior detection. Target detection methods based on deep learning, such as the YOLO (You Only Look Once) series of algorithms, have achieved significant success in target detection tasks. Among them, YOLOv5 combines high accuracy with fast inference speed, which makes it a practical choice for human behavior detection.

Purpose:

The purpose of this project is to detect five behaviors (falling, standing, squatting, sitting, and running) based on the YOLOv5 algorithm. By building efficient models, then training and tuning them, we aim to improve the accuracy and real-time performance of human behavior detection. At the same time, this project also aims to provide a useful reference for research and application in related fields.

In order to achieve this goal, we will conduct a series of research work, including data set preparation, data preprocessing, network structure design, loss function selection, hyperparameter adjustment, etc. We will make full use of existing computing resources and draw on the latest research results in related fields in order to achieve breakthrough progress. Finally, we will verify the performance of the model through experiments and evaluate its application potential in different scenarios.

Two. Introduction to the YOLOv5 algorithm

YOLOv5 is a real-time target detection algorithm with the advantages of fast speed, high accuracy, and easy deployment. Compared with previous versions, YOLOv5 makes some structural optimizations, such as the SPP module and PANet path aggregation, to further improve detection performance. This makes YOLOv5 well suited to human behavior detection. Its network has three main parts:

  • Backbone network: A lightweight backbone network is used as the feature extractor to obtain feature representations from images. A common choice is CSPDarknet, which uses a Cross Stage Partial (CSP) connection structure to improve the efficiency and accuracy of the network.
  • Neck network: PANet (Path Aggregation Network) is used as the neck to fuse feature maps at different scales. PANet builds a feature pyramid through a top-down path and a bottom-up path, allowing the network to handle targets at different scales at the same time.
  • Head network: The head network is responsible for predicting the location and category of each target. It consists of convolutional layers that process the fused feature maps at each scale and decode them into bounding boxes, objectness scores, and class scores.
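As a rough illustration of what the head produces, the sketch below counts the grid cells and candidate boxes for a square input image. It assumes the usual YOLOv5 setup of three detection scales with strides 8, 16, and 32 and three anchors per grid cell, together with the five behavior classes of this project. The function names are hypothetical and the code is not taken from the YOLOv5 source.

```python
# Assumptions for illustration: three scales, three anchors per cell, five classes.
STRIDES = (8, 16, 32)      # downsampling factor of each detection scale
ANCHORS_PER_SCALE = 3      # anchor boxes predicted per grid cell
NUM_CLASSES = 5            # fall, stand, squat, sit, run


def head_output_shapes(img_size=640):
    """Return (grid_h, grid_w, anchors, values_per_anchor) for each scale."""
    values = 5 + NUM_CLASSES  # x, y, w, h, objectness + one score per class
    return [(img_size // s, img_size // s, ANCHORS_PER_SCALE, values)
            for s in STRIDES]


def total_predictions(img_size=640):
    """Count the candidate boxes produced before any filtering."""
    return sum(h * w * a for h, w, a, _ in head_output_shapes(img_size))


print(head_output_shapes(640))  # [(80, 80, 3, 10), (40, 40, 3, 10), (20, 20, 3, 10)]
print(total_predictions(640))   # 25200
```

Under these assumptions, the fine 80×80 grid is the one that helps with small or distant people, while the coarse 20×20 grid covers people who fill most of the frame.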

  • Figure: YOLOv5 network structure diagram

Three. Implementation of human behavior detection based on YOLOv5

1. Data set preparation

In order to detect the five behaviors of falling, standing, squatting, sitting and running, we first need to prepare a data set containing the annotation information of these behaviors. The data set should include pictures or videos in various scenarios, as well as corresponding behavioral labels.
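To show what such an annotation can look like on disk, here is a minimal sketch that converts one pixel-coordinate box into the normalized text line used by YOLO-style label files (class id, x center, y center, width, height, all relative to the image size). The class order in CLASS_NAMES, the function name, and the sample box are assumptions for illustration only.

```python
# Assumed class order; the real order must match the training configuration.
CLASS_NAMES = ["fall", "stand", "squat", "sit", "run"]


def to_yolo_line(label, box, img_w, img_h):
    """Convert a pixel box (xmin, ymin, xmax, ymax) to a YOLO label line."""
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2 / img_w   # normalized box center, x
    y_center = (ymin + ymax) / 2 / img_h   # normalized box center, y
    width = (xmax - xmin) / img_w          # normalized box width
    height = (ymax - ymin) / img_h         # normalized box height
    class_id = CLASS_NAMES.index(label)
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical annotation: a sitting person in a 640x640 image
print(to_yolo_line("sit", (100, 200, 300, 600), img_w=640, img_h=640))
# 3 0.312500 0.625000 0.312500 0.625000
```

Each image gets one text file with one such line per person, which is what allows several people with different behaviors to be labeled in the same picture.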

The data set I use is the Human Behavior Image Dataset.

Data size: 12400
Data format: JPG/mv
Data category: Human action behavior
  • Reading method:
  1. Use LabelImg to label the data set, marking each human body position and the corresponding behavior label so that every sample has accurate behavior annotations.
  2. During the data set preprocessing phase, images are resized to the same size for input into the model.
  3. Use data augmentation techniques to expand the data set, including random cropping, rotation, flipping, and color transformation, to increase the diversity of the data and improve the robustness and generalization ability of the model.
  • Part of the data set is shown below

2. Data preprocessing

Before inputting data into the model, we need to perform some preprocessing operations on the data, such as normalization and augmentation, to improve the generalization ability of the model.

import glob
import os

import cv2
import numpy as np
import imgaug.augmenters as iaa

# Dataset path and annotation tool path
dataset_dir = 'path/to/dataset'
annotation_tool = 'path/to/annotation/tool'  # not used in this snippet

# Preprocessing parameters
scale_size = 416  # Image scaling size (the training runs described later use 640)
# ImageNet mean and standard deviation, in RGB order
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# Data augmentation operations, which can be adjusted according to needs.
# Geometric operations also move the targets, so the box labels must be transformed in the same way.
augmenter = iaa.Sequential([
    iaa.Fliplr(0.5),  # Horizontal flip with probability 0.5
    iaa.Flipud(0.5),  # Vertical flip with probability 0.5
    iaa.Affine(rotate=(-10, 10)),  # Random rotation
    iaa.Affine(scale={"x": (0.8, 1.2), "y": (0.8, 1.2)}),  # Random scaling
])

# Traverse all image files in the dataset directory (assumes jpg format)
image_files = glob.glob(os.path.join(dataset_dir, '*.jpg'))

for img_file in image_files:
    # Read image (OpenCV returns BGR); skip files that cannot be read
    img = cv2.imread(img_file)
    if img is None:
        continue

    # Image scaling
    img = cv2.resize(img, (scale_size, scale_size))

    # Data augmentation
    img_aug = augmenter.augment_image(img)

    # Normalization: convert BGR to RGB, scale to [0, 1], then standardize each channel
    img_rgb = cv2.cvtColor(img_aug, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    img_norm = (img_rgb - mean) / std

    # Save the augmented image (optional); img_norm is a float array for the model, not for saving as JPG
    # cv2.imwrite('path/to/save/preprocessed_image.jpg', img_aug)
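Geometric augmentations such as the horizontal and vertical flips above also move the people in the image, so the bounding-box labels need the same transformation or they will no longer match. The sketch below shows the idea for flips only, assuming labels are stored as normalized (class_id, x_center, y_center, width, height) tuples. The function names are hypothetical, and rotation or scaling would need a fuller treatment.

```python
def hflip_yolo_box(box):
    """Mirror a normalized YOLO box left-to-right."""
    class_id, xc, yc, w, h = box
    # Only the x center changes; width and height are unaffected by a flip.
    return (class_id, 1.0 - xc, yc, w, h)


def vflip_yolo_box(box):
    """Mirror a normalized YOLO box top-to-bottom."""
    class_id, xc, yc, w, h = box
    return (class_id, xc, 1.0 - yc, w, h)


# Hypothetical label: class 0, centered in the left quarter of the image
box = (0, 0.25, 0.5, 0.2, 0.4)
print(hflip_yolo_box(box))  # (0, 0.75, 0.5, 0.2, 0.4)
print(vflip_yolo_box(box))  # (0, 0.25, 0.5, 0.2, 0.4)
```

In practice the flip decision for an image and for its labels has to come from the same random draw, which is why image and box augmentation are usually applied together in one call.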

Four. Functional design

  • Image recognition: Detect human behaviors based on human images: fall, stand, squat, sit, and run.
  • Multi-person image recognition: Human behavior can still be detected and recognized when the image contains multiple people.
  • Model training and selection: YOLOv5
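For the multi-person case, a detector usually outputs several overlapping candidate boxes for each person, and non-maximum suppression (NMS) is the standard way to keep one box per person. The following is a minimal, class-agnostic sketch in plain Python. The corner box format, the 0.45 IoU threshold, and the sample detections are assumptions for illustration, not values or code from this project.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_thres=0.45):
    """Keep the highest-scoring box of each overlapping group.

    detections: list of (box, score, label) tuples.
    """
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        remaining = [d for d in remaining if iou(best[0], d[0]) <= iou_thres]
    return kept


# Hypothetical detections for an image with two people
detections = [
    ((10, 10, 110, 210), 0.90, "stand"),   # person A
    ((12, 14, 112, 208), 0.60, "stand"),   # duplicate box for person A
    ((300, 50, 420, 200), 0.80, "sit"),    # person B
]
for box, score, label in nms(detections):
    print(label, score, box)
```

With these sample values the duplicate box for person A is removed and one box per person remains, each with its own behavior label.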

Five. Loss function, hyperparameter adjustment, and fitting: detailed explanation

  1. YOLOv5 uses the CIoU (Complete Intersection over Union) loss to optimize bounding-box regression. CIoU takes into account the overlap area, the distance between box centers, and the aspect ratio, so it measures the difference between the predicted box and the ground-truth box more accurately than plain IoU.
  2. To improve detection accuracy and match the input required by the network, all input images are resized to 640×640 pixels. Limited by the laboratory computer hardware, the batch size is set to 16. In our experiments, the mAP curve gradually flattens out after 80 training iterations (epochs), so the number of training iterations is set to 100 to obtain the optimal model, and the network parameters are optimized with the SGD optimizer. Other hyperparameters, such as momentum, learning rate, and weight decay, are the best values obtained after 10 rounds of hyperparameter evolution. After the training parameters are determined, model training begins.
  3. To speed up convergence, the network is first pre-trained on the COCO data set to obtain a weight file. COCO (Common Objects in Context) is a commonly used data set for target detection, object recognition, and image segmentation.
  4. Because COCO images contain many targets, the data set places higher demands on the accuracy, robustness, and generalization ability of a target detection algorithm, which has in turn promoted the continuous development and improvement of such algorithms.
  5. The pre-trained weights are provided by the author of YOLOv5. They were trained on an Amazon EC2 cloud server with a batch size of 32 and 300 iterations. After downloading the weights, we use transfer learning and retrain the network on our training set.
  6. During training, after each iteration the average precision, loss, and recall are evaluated on the validation set and the model is saved. The best model is then used as the test object for the subsequent recognition experiments.
The training parameters are shown below:

Input image size: 640×640
Batch size: 16
Initial learning rate: 0.0032
Weight decay: 0.00036
Number of iterations: 500
Momentum: 0.843
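To make the CIoU loss from item 1 more concrete, the sketch below computes it for a single pair of boxes in plain Python. It follows the commonly published form, loss = 1 - (IoU - rho^2 / c^2 - alpha * v), where rho is the distance between the box centers, c is the diagonal of the smallest enclosing box, and v measures the aspect-ratio mismatch. The function name, the corner box format, and the eps constant are assumptions, and this is an illustration rather than the YOLOv5 source code.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for two boxes given as (xmin, ymin, xmax, ymax)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Overlap term: plain IoU
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Distance term: squared center distance over squared enclosing-box diagonal
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4

    # Aspect-ratio term
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)


# Hypothetical boxes: identical, partly overlapping, and completely separate
print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))
print(ciou_loss((0, 0, 10, 10), (5, 0, 15, 10)))
print(ciou_loss((0, 0, 10, 10), (20, 20, 30, 30)))
```

The loss is close to 0 for identical boxes and grows above 1 for boxes that do not overlap at all, because the center-distance term still provides a signal when IoU is already 0.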

Six. Changes in loss

The changes are shown in the figures below:

  • Figure 1: P-R curves of five types of human behavior detection

  • Figure 2: F1 curves of five human behavior detections

  • Figure 3: Evaluation metrics during the network training process
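P-R and F1 curves like those in Figures 1 and 2 are built by sweeping the confidence threshold and recomputing precision, recall, and F1 at each step. The sketch below shows that calculation for one class in plain Python. The input format (a list of confidence scores, each marked as a correct or incorrect detection) and the sample values are assumptions for illustration, not results from this project.

```python
def pr_curve(scored, num_gt):
    """Compute (threshold, precision, recall, f1) points for one class.

    scored: list of (confidence, is_correct) pairs, one per detection.
    num_gt: number of ground-truth boxes of this class.
    """
    points = []
    tp = fp = 0
    # Lowering the threshold admits detections in order of decreasing confidence.
    for conf, correct in sorted(scored, key=lambda s: s[0], reverse=True):
        if correct:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_gt if num_gt else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        points.append((conf, precision, recall, f1))
    return points


# Hypothetical detections for one behavior class with 4 ground-truth boxes
scored = [(0.9, True), (0.8, True), (0.7, False), (0.6, True)]
for conf, p, r, f1 in pr_curve(scored, num_gt=4):
    print(f"conf>={conf:.1f}  precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}")
```

Plotting precision against recall gives the P-R curve, and plotting F1 against the threshold gives the F1 curve, whose peak suggests a reasonable confidence threshold for deployment.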

Seven. Model display

  • Code block:

  • Model nuggets:

Eight. Summary

This article presents a study of human behavior detection based on the YOLOv5 algorithm. The study uses a suitable data set and, after data reading and preprocessing, trains the adjusted YOLOv5 network. Through reasonable hyperparameter adjustment, problems such as underfitting, overfitting, vanishing gradients, and exploding gradients are effectively avoided, and good training results are achieved. The evaluation results on the test set show that the model achieves high accuracy and precision in detecting five behaviors: falling, standing, squatting, sitting, and running. In addition, this article provides detailed comments on the source code and a brief introduction to and analysis of classic algorithms.

Overall, the human behavior detection method based on YOLOv5 shows good accuracy and real-time performance in target detection tasks, providing new ideas and directions for research in the field of human behavior detection. This research has positive significance for promoting the development of computer vision and improving people’s quality of life.