Vehicle driving direction detection system (turning, lane changing, straight driving) based on DeepSort and STAM-LSTM


1. Research background and significance

Project reference: AAAI (Association for the Advancement of Artificial Intelligence)


With the growing number of vehicles and increasingly busy road traffic, accurate detection of vehicle driving direction is crucial for traffic management and the development of intelligent transportation systems. Accurate detection of vehicle driving direction can help traffic management departments plan roads better, optimize traffic flow, and provide real-time traffic information to drivers and passengers, thereby improving road safety and traffic efficiency.

However, traditional vehicle driving direction detection methods have several problems. First, they are usually based on a combination of hand-crafted feature extraction and classifiers, which requires manually designing features and training classifiers; such methods often struggle to achieve accurate detection in complex traffic scenes. Second, traditional methods often cannot handle occlusion, deformation, and illumination changes of vehicles in complex scenes, which leads to unstable detection results. In addition, traditional methods face challenges in meeting real-time requirements and cannot satisfy the needs of real-time traffic monitoring and early warning.

In recent years, deep learning has achieved great success in the field of computer vision and offers a way to address these problems. Deep learning can automatically learn features and, through training on large-scale data, improve detection accuracy and robustness. Deep-learning-based methods have achieved remarkable results in many fields, such as face recognition and object detection. Therefore, applying deep learning to vehicle driving direction detection has important research significance and practical application value.

This research builds a vehicle driving direction detection system based on DeepSort and STAM-LSTM to achieve accurate detection of the driving direction through deep learning. Specifically, DeepSort is a deep-learning-based multi-target tracking algorithm that can track and identify vehicles in real time, while STAM-LSTM is a long short-term memory network with a spatio-temporal attention mechanism that can model and predict the movement trajectory of a vehicle. By combining these two methods, we can accurately detect the vehicle driving direction and provide real-time traffic information.

The significance of this study is mainly reflected in the following aspects:

  1. Improve the accuracy of vehicle driving direction detection: Through deep learning technology, the characteristics of vehicle driving direction can be automatically learned, and the accuracy of detection can be improved through large-scale data training. Compared with traditional methods, deep learning-based methods can better handle complex traffic scenes and improve detection accuracy and robustness.

  2. Improve the real-time performance of vehicle driving direction detection: Deep learning technology can achieve efficient calculation through methods such as GPU acceleration, thereby meeting the needs of real-time traffic monitoring and early warning. The vehicle driving direction detection system based on DeepSort and STAM-LSTM can track and predict the movement trajectory of vehicles in real time and provide real-time traffic information.

  3. Promote the development of intelligent transportation systems: Accurate detection of vehicle driving direction is crucial to the development of intelligent transportation systems. By providing accurate driving direction information, it can help traffic management departments better plan roads, optimize traffic flow, and provide real-time traffic information to drivers and passengers, thereby improving road safety and traffic efficiency.

In short, the vehicle driving direction detection system based on DeepSort and STAM-LSTM has important research significance and practical application value. Through the application of deep learning, the accuracy and real-time performance of vehicle driving direction detection can be improved and the development of intelligent transportation systems can be promoted. It is hoped that this study provides a useful reference for research and applications in the field of vehicle driving direction detection.

2. Picture demonstration



3. Video demonstration

Vehicle driving direction detection system (turning, lane changing, straight driving) based on DeepSort and STAM-LSTM

4. Collection, labeling and organization of data sets

Collection of pictures

First, we need to collect the required images. This can be done in different ways, for example by using existing public datasets or by collecting our own traffic images and video frames.

Use labelImg for labeling

labelImg is a graphical image annotation tool that supports VOC and YOLO formats. The following are the steps to use labelImg to label images in VOC format:

(1) Download and install labelImg.
(2) Open labelImg and select “Open Dir” to select your image directory.
(3) Set a label name for your target object.
(4) Draw a rectangular frame on the picture and select the corresponding label.
(5) Save the annotation information, which will generate an XML file with the same name as the picture in the picture directory.
(6) Repeat this process until all pictures are labeled.

Convert to YOLO format

Since YOLO uses plain-text (txt) annotations, we need to convert the VOC XML annotations into txt format. This can be done with various conversion tools or scripts.

Here’s a simple way to do it using a Python script that reads the XML file and then converts it to txt format.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import xml.etree.ElementTree as ET
import os

classes = []  # collected class names, filled in while parsing the XML files

CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))

def convert(size, box):
    # Convert a VOC box (xmin, xmax, ymin, ymax) to normalized YOLO (x_center, y_center, w, h)
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)

def convert_annotation(image_id):
    in_path = os.path.join(CURRENT_DIR, 'label_xml', '%s.xml' % image_id)
    out_path = os.path.join(CURRENT_DIR, 'label_txt', '%s.txt' % image_id)  # generated txt-format file
    with open(in_path, encoding='UTF-8') as in_file, open(out_path, 'w') as out_file:
        tree = ET.parse(in_file)
        root = tree.getroot()
        size = root.find('size')
        w = int(size.find('width').text)
        h = int(size.find('height').text)

        for obj in root.iter('object'):
            cls = obj.find('name').text
            if cls not in classes:
                classes.append(cls)  # if the category has not been seen before, add it to the classes list
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                 float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w, h), b)
            out_file.write(str(cls_id) + " " + " ".join(str(a) for a in bb) + '\n')

xml_path = os.path.join(CURRENT_DIR, 'label_xml')
os.makedirs(os.path.join(CURRENT_DIR, 'label_txt'), exist_ok=True)  # make sure the output directory exists

# iterate over all XML annotation files
for img_xml in os.listdir(xml_path):
    label_name = os.path.splitext(img_xml)[0]
    print(label_name)
    convert_annotation(label_name)

print("Classes:")  # print the final classes list
print(classes)
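
Because classes is filled in the order class names are first encountered, it helps to persist this list so the same class-to-index mapping can be reused later for training. A minimal sketch, assuming a classes.txt file name that is not part of the original project:

# Sketch: save the collected class list so the index mapping stays consistent across runs.
# The file name "classes.txt" is an assumed convention, not taken from the project.
with open(os.path.join(CURRENT_DIR, 'classes.txt'), 'w', encoding='UTF-8') as f:
    for name in classes:
        f.write(name + '\n')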

Organize the data folder structure

We need to organize the dataset into the following structure:

-----data
   |-----train
   | |-----images
   | |-----labels
   |
   |-----valid
   | |-----images
   | |-----labels
   |
   |-----test
       |-----images
       |-----labels

Make sure of the following:

All training images are located in the data/train/images directory, and the corresponding label files are located in the data/train/labels directory.
All validation images are located in the data/valid/images directory, and the corresponding label files are located in the data/valid/labels directory.
All test images are located in the data/test/images directory, and the corresponding label files are located in the data/test/labels directory.
Such a structure makes data management and model training, validation, and testing very convenient.
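
Before training, it can also help to run a quick sanity check that every image has a matching label file. This is only a convenience sketch; it assumes the directory layout above and a data root named data.

import os

# Sketch: verify that every image under data/<split>/images has a matching label file.
# Assumes the directory layout described above; adjust DATA_ROOT as needed.
DATA_ROOT = 'data'

for split in ('train', 'valid', 'test'):
    img_dir = os.path.join(DATA_ROOT, split, 'images')
    lbl_dir = os.path.join(DATA_ROOT, split, 'labels')
    for img_name in os.listdir(img_dir):
        stem = os.path.splitext(img_name)[0]
        label_path = os.path.join(lbl_dir, stem + '.txt')
        if not os.path.exists(label_path):
            print('Missing label for', os.path.join(img_dir, img_name))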

5. Core code explanation

5.1 angel.py

Here is the core part of encapsulating the code into a class:


import numpy as np

class AngleCalculator:
    def __init__(self, p1, p2):
        self.p1 = p1
        self.p2 = p2

    def angle_between(self):
        # Angle of each point (treated as a vector) with respect to the x-axis,
        # with the difference mapped into the [0, 360) degree range.
        ang1 = np.arctan2(*self.p1[::-1])
        ang2 = np.arctan2(*self.p2[::-1])
        return np.rad2deg((ang1 - ang2) % (2 * np.pi))

# Example output for p1 = (1, -1), p2 = (1, 0): 315.0

In this class, we encapsulate the angle_between function as a method of the class. We can calculate the angle between two points by creating an instance of AngleCalculator and passing in two points p1 and p2. We can then use the angle_between method to calculate the angle and do subsequent processing as needed.

The file name of this program is angel.py. Its main function is to calculate the angle between two vectors.

The program first imports the numpy library and then defines a function named angle_between, which accepts two parameters p1 and p2 representing the coordinates of the two vectors. Inside the function, numpy's arctan2 is used to compute the angle of each vector with respect to the x-axis, the difference is converted from radians to degrees with rad2deg, and the result is returned.

Next, the program defines the coordinates of two vectors A and B, which are (1, 0) and (1, -1) respectively.

Then the program comments out a line of code that calls angle_between to compute the angle between vector A and vector B and prints the result; the calculated angle is 45 degrees.

Next, the program computes the angle from vector B to vector A, which is 315 degrees, and assigns it to the variable ang. If the angle is greater than 180 degrees, 360 degrees is subtracted from it, otherwise it remains unchanged; the program then prints the resulting value of ang.
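
A short usage sketch of the class with the same vectors (the variable names are illustrative):

import numpy as np

# Usage sketch for AngleCalculator with the vectors discussed above.
A = (1, 0)
B = (1, -1)

ang = AngleCalculator(B, A).angle_between()  # 315.0 degrees
if ang > 180:
    ang -= 360                               # map into (-180, 180]: -45.0 degrees
print(ang)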

5.2 detector_CPU.py

import numpy as np
import torch

# These helpers come from the YOLOv5 code bundled with the project;
# the exact import paths may differ depending on the YOLOv5 version used.
from models.experimental import attempt_load
from utils.datasets import letterbox
from utils.general import non_max_suppression, scale_coords
from utils.torch_utils import select_device


class Detector:
    def __init__(self):
        self.img_size = 640
        self.threshold = 0.02
        self.stride = 1
        self.weights = './weights/output_of_small_target_detection.pt'
        self.device = '0' if torch.cuda.is_available() else 'cpu'
        self.device = select_device(self.device)
        model = attempt_load(self.weights, map_location=self.device)
        model.to(self.device).eval()
        model.float()
        self.m = model
        self.names = model.module.names if hasattr(model, 'module') else model.names

    def preprocess(self, img):
        img0 = img.copy()
        img = letterbox(img, new_shape=self.img_size)[0]
        img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, HWC to CHW
        img = np.ascontiguousarray(img)
        img = torch.from_numpy(img).to(self.device)
        img = img.float()
        img /= 255.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)
        return img0, img

    def detect(self, im):
        im0, img = self.preprocess(im)
        pred = self.m(img, augment=False)[0]
        pred = pred.float()
        pred = non_max_suppression(pred, self.threshold, 0.4)
        boxes = []
        for det in pred:
            if det is not None and len(det):
                det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()
                for *x, conf, cls_id in det:
                    lbl = self.names[int(cls_id)]
                    if lbl not in ['car', 'bus', 'truck']:
                        continue
                    x1, y1 = int(x[0]), int(x[1])
                    x2, y2 = int(x[2]), int(x[3])
                    boxes.append((x1, y1, x2, y2, lbl, conf))
        return boxes

This program file, detector_CPU.py, implements a target detector using the PyTorch and OpenCV libraries. The initialization function defines parameters such as the image size, confidence threshold, and stride, specifies the path to the model weights file, selects the device depending on whether a GPU is available, and loads the model. The preprocessing function resizes the input image with letterbox padding, reorders its channels, and converts it into a PyTorch tensor. The detection function feeds the preprocessed image into the model to obtain predictions, filters them with the confidence threshold and non-maximum suppression, and finally returns the coordinates, category, and confidence of each target box.
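
A brief usage sketch of the Detector class (the video path is a placeholder; it assumes the YOLOv5 weights and helper modules referenced above are available):

import cv2

# Usage sketch: run the detector on a single video frame.
# "traffic.mp4" is a placeholder path, not a file shipped with the project.
detector = Detector()
cap = cv2.VideoCapture('traffic.mp4')
ok, frame = cap.read()
if ok:
    for (x1, y1, x2, y2, lbl, conf) in detector.detect(frame):
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, '%s %.2f' % (lbl, float(conf)), (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
cap.release()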

5.3 LSTM.py
import torch
import torch.nn as nn
import torch.nn.functional as F

# The sub-modules below are defined in model.py (see section 5.4).
from model import FeatureExtraction, SpatialAttention, TemporalAttention, Encoder, Decoder


class STAM_LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(STAM_LSTM, self).__init__()
        self.feature_extraction = FeatureExtraction(input_dim, hidden_dim)
        self.spatial_attention = SpatialAttention(hidden_dim)
        self.temporal_attention = TemporalAttention(hidden_dim)
        self.encoder = Encoder(hidden_dim, hidden_dim)
        self.decoder = Decoder(hidden_dim, hidden_dim, output_dim)  # Decoder expects (input_dim, hidden_dim, output_dim)

    def forward(self, x):
        x = self.feature_extraction(x)               # per-time-step feature extraction
        x = self.spatial_attention(x)                # spatially weighted feature vector
        x = x.unsqueeze(1).repeat(1, x.size(1), 1)   # expand into a sequence for the encoder
        encoder_out, (hn, cn) = self.encoder(x)
        context = self.temporal_attention(encoder_out)  # temporal context vector
        decoder_in = context.unsqueeze(1)
        return self.decoder(decoder_in)

The code encapsulated into a class is as shown above.

This is a program file named LSTM.py, which defines a class named STAM_LSTM, which inherits from nn.Module. This class contains several sub-modules, including FeatureExtraction, SpatialAttention, TemporalAttention, Encoder and Decoder. In the initialization function, it accepts input dimensions, hidden dimensions, and output dimensions as parameters and creates instances of these submodules.

In the forward propagation function, the input data first passes through the feature_extraction and spatial_attention sub-modules, producing a fused feature representation, which is then expanded into a sequence and passed to the encoder sub-module to obtain the encoder output and hidden state. The temporal_attention sub-module then weights the encoder outputs to obtain a context vector, which is expanded into a three-dimensional tensor and passed to the decoder sub-module for decoding, finally producing the output result.

In summary, this program file defines a neural network model that uses an LSTM model for sequence processing. It processes input data through steps such as feature extraction, spatial attention, temporal attention, encoding and decoding, and generates corresponding output results.
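
A minimal sketch of a forward pass with random data, assuming the model predicts 2-D trajectory offsets (all dimensions below are illustrative only):

import torch

# Illustrative dimensions only: 8 historical time steps of (x, y) positions per sample.
batch_size, seq_len, input_dim, hidden_dim, output_dim = 4, 8, 2, 64, 2

model = STAM_LSTM(input_dim, hidden_dim, output_dim)
history = torch.randn(batch_size, seq_len, input_dim)  # past trajectory points
pred = model(history)                                  # prediction produced by the decoder
print(pred.shape)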

5.4 model.py
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    def __init__(self, input_dim):
        super(SpatialAttention, self).__init__()
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x):
        attention_weights = F.softmax(self.linear(x), dim=1)
        return torch.sum(attention_weights * x, dim=1)

class TemporalAttention(nn.Module):
    def __init__(self, input_dim):
        super(TemporalAttention, self).__init__()
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x):
        attention_weights = F.softmax(self.linear(x), dim=1)
        return torch.sum(attention_weights * x, dim=1)

class FeatureExtraction(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super(FeatureExtraction, self).__init__()
        self.global_feature = nn.Linear(input_dim, hidden_dim)
        self.local_feature = nn.Linear(input_dim, hidden_dim)
        self.feature_fusion = nn.Linear(hidden_dim * 2, hidden_dim)

    def forward(self, x):
        global_feature = self.global_feature(x)
        local_feature = self.local_feature(x)
        fusion_feature = torch.cat((global_feature, local_feature), dim=-1)
        return self.feature_fusion(fusion_feature)

class Encoder(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super(Encoder, self).__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)

    def forward(self, x):
        return self.lstm(x)

class Decoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(Decoder, self).__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.linear = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        lstm_out, _ = self.lstm(x)
        return self.linear(lstm_out)

This program file is a model for sequence data processing. It contains several different modules:

  1. SpatialAttention: The spatial attention module is used to calculate the attention weight and weighted sum operation of the input data in the spatial dimension.

  2. TemporalAttention: The temporal attention module is used to calculate the attention weight and weighted summation operation of the input data in the time dimension.

  3. FeatureExtraction: Feature extraction module is used to extract global features and local features from the input data and fuse them into a feature vector.

  4. Encoder: The encoder module uses the LSTM network to encode the input data.

  5. Decoder: The decoder module uses the LSTM network to decode the encoded data and output the final prediction result.

The input dimension of this model is input_dim, the hidden layer dimension is hidden_dim, and the output dimension is output_dim. The specific calculation process of the model is implemented in the forward method of each module.
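
To make the tensor shapes concrete, the following sketch pushes a dummy batch through the individual modules (the dimensions are illustrative and the modules are instantiated with random weights):

import torch

batch_size, seq_len, input_dim, hidden_dim, output_dim = 4, 8, 2, 64, 2
x = torch.randn(batch_size, seq_len, input_dim)

features = FeatureExtraction(input_dim, hidden_dim)(x)                    # (4, 8, 64)
pooled = SpatialAttention(hidden_dim)(features)                           # (4, 64) after the weighted sum over dim 1
enc_out, (hn, cn) = Encoder(hidden_dim, hidden_dim)(pooled.unsqueeze(1))  # (4, 1, 64)
context = TemporalAttention(hidden_dim)(enc_out)                          # (4, 64)
pred = Decoder(hidden_dim, hidden_dim, output_dim)(context.unsqueeze(1))  # (4, 1, 2)
print(pred.shape)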

6. Overall system structure

Based on the above analysis, the overall functions and architecture can be summarized as follows:

This project is a vehicle driving direction detection system based on DeepSort and STAM-LSTM. It uses a deep learning model for target detection and tracking, and an LSTM model to predict the target's driving direction. The system contains multiple program files, each responsible for a different functional module.

Here is an overview of the functionality of each file:

File path and function (all files are located in the project's code directory, E:\Visual Project\shop\Vehicle driving direction detection system (turning, lane changing, going straight) based on DeepSort and STAM-LSTM\code):

angel.py: calculates the angle between two vectors
detector_CPU.py: program file for target detection using the CPU
detector_CPU2.py: program file for target detection using the CPU
detector_GPU.py: program file for target detection using the GPU
LSTM.py: defines an LSTM model composed of multiple sub-modules for sequence data processing and direction prediction
model.py: defines multiple modules, including spatial attention, temporal attention, feature extraction, encoder and decoder, for sequence data processing and direction prediction

Note: Due to the large number of files, it is impossible to list the functions of all files one by one.

7. Vehicle trajectory prediction model

Framework overview

The overall framework of the STAM-LSTM model is shown in the figure. Following the AAAI reference work, the model introduces spatial attention (SA) and temporal attention (TA) mechanisms into the vehicle trajectory prediction task. The SA layer captures the relative importance of surrounding vehicles to the target vehicle, while the TA layer computes how strongly the feature vectors at each historical moment influence the generation of future trajectories. In addition, the model integrates the motion features of the target vehicle, which serve as supplementary information to better represent the vehicle's movement behavior in real scenes.

The model mainly contains the following two modules:
(1) Feature Extraction Module. This module extracts multi-scale feature state information of the vehicle in the road network environment, including the global spatial features and the local movement features of the target vehicle, and then combines them through a Feature Fusion Module (FFM) into a comprehensive feature representation of the vehicle; the comprehensive feature vector of each historical moment is then used as the input of the encoder module.
(2) Attention-based Encoder-Decoder Module. This module can be further decomposed into three sub-parts: the encoding module, the temporal attention layer and the decoding module. The encoding module converts the feature vectors of each historical moment into a high-dimensional tensor representation and at the same time captures the shallow temporal correlations among these local features; the temporal attention layer assigns a different weight coefficient to the comprehensive feature vector of each historical moment and then computes the weighted temporal context vector c that is fed into the decoder; the decoding module takes this temporal context vector c and random noise r as input, and the hidden state vector produced by the decoder passes through a prediction layer to generate the target vehicle's future trajectory. A small data-flow sketch follows this list.
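
Since the exact equations are not reproduced in this excerpt, the sketch below only illustrates the data flow just described: a weighted temporal context vector c is computed from the encoder outputs and combined with random noise r before decoding. The tensor dimensions and the use of concatenation are assumptions.

import torch
import torch.nn.functional as F

# Illustrative sketch of the attention-based encoder-decoder data flow described above.
# Dimensions and the concatenation of noise are assumptions, not the model's exact formulation.
batch_size, seq_len, hidden_dim, noise_dim = 4, 8, 64, 16

encoder_out = torch.randn(batch_size, seq_len, hidden_dim)  # encoder output per historical moment
scores = torch.randn(batch_size, seq_len, 1)                # unnormalized temporal attention scores
alpha = F.softmax(scores, dim=1)                            # weight coefficient for each moment
c = (alpha * encoder_out).sum(dim=1)                        # weighted temporal context vector c
r = torch.randn(batch_size, noise_dim)                      # random noise r
decoder_in = torch.cat([c, r], dim=-1)                      # input handed to the decoder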

Spatial attention module

Vehicles traveling in a road network environment do not move in isolation; their behavior is also affected by the surrounding vehicles. Therefore, drawing on the graph attention network, a spatial attention module is designed in this section to extract the different social interaction relationships between the target vehicle and its neighboring vehicles, and then to compute the target vehicle's global spatial feature at the current sampling moment; its structure is shown in the figure.

First, the set of local position features {e_1, e_2, …, e_n} of the vehicles, produced by the position feature extraction module, is used as the input of this module, where n denotes the total number of vehicles in the scene at the sampling moment. The spatial correlation between the feature vectors of a vehicle pair (i, j) is then computed according to Eq. (3-6), and the attention weight coefficient α_ij, which represents the relative importance of the pair (i, j), is computed via Eq. (3-7).
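
Equations (3-6) and (3-7) are not reproduced here, so the following sketch only assumes a standard graph-attention-style formulation (a learnable scoring function followed by a softmax over neighbors) to illustrate how coefficients like α_ij could be computed:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttentionSketch(nn.Module):
    """Assumed graph-attention-style weighting between a target vehicle and its neighbors."""
    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Linear(2 * feat_dim, 1)  # scoring function over concatenated pair features

    def forward(self, e):
        # e: (n, feat_dim) local position features of the n vehicles in the scene
        n = e.size(0)
        pairs = torch.cat([e.unsqueeze(1).expand(n, n, -1),
                           e.unsqueeze(0).expand(n, n, -1)], dim=-1)  # features of each pair (i, j)
        s = F.leaky_relu(self.score(pairs)).squeeze(-1)               # pairwise correlation (stand-in for Eq. (3-6))
        alpha = F.softmax(s, dim=1)                                   # weight coefficients (stand-in for Eq. (3-7))
        return alpha @ e                                              # weighted aggregation of neighbor features

# Example: 5 vehicles with 16-dimensional local position features
global_feats = SpatialAttentionSketch(16)(torch.randn(5, 16))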

8. Vehicle driving direction detection

Vehicle driving direction

Based on the vehicle tracking information obtained above, we can calculate the driving angle of each vehicle. Following the AAAI reference method, we implement this with OpenCV: for each vehicle, we compute its position change over two consecutive frames to obtain the direction vector of its motion, and then compute the angle between this direction vector and the horizontal direction to obtain the vehicle's driving angle.

import numpy as np

def calculate_angle(pt1, pt2):
    # Calculate the angle between two points
    x_diff = pt2[0] - pt1[0]
    y_diff = pt2[1] - pt1[1]
    return np.degrees(np.arctan2(y_diff, x_diff))

# Calculate the vehicle driving angle in the loop
for track in tracker.tracks:
    if len(track.locations) > 1:
        angle = calculate_angle(track.locations[-2], track.locations[-1])
        #...

Vehicle driving direction

Finally, we determine the driving direction of the vehicle from the calculated driving angle. For example, if the angle is close to 0 degrees or ±180 degrees, we can consider the vehicle to be going straight; if the angle deviates from these values by a large amount, we can consider the vehicle to be turning or changing lanes.

def judge_direction(angle):
    # angle comes from np.degrees(np.arctan2(...)) and lies in the range (-180, 180]
    if abs(angle) < 15 or abs(abs(angle) - 180) < 15:
        return "go straight"
    elif 15 <= angle <= 165:
        return "turn left or change lanes left"
    elif -165 <= angle <= -15:
        return "turn right or change lanes right"
    return "unknown"

# Determine the vehicle's driving direction in the loop
for track in tracker.tracks:
    if len(track.locations) > 1:
        angle = calculate_angle(track.locations[-2], track.locations[-1])
        direction = judge_direction(angle)
        #...

Combining STAM-LSTM for behavioral prediction

In order to more accurately predict the future driving direction of the vehicle, we can also combine the STAM-LSTM model to predict spatiotemporal behavior. By training the STAM-LSTM model, we can predict the future driving direction of the vehicle based on its historical driving trajectory.
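
As a rough sketch of how these pieces could fit together (the track.locations attribute, the trained model instance, and HIST_LEN are assumptions carried over from the snippets above), the tracked history of a vehicle can be fed to the STAM-LSTM model and the predicted displacement converted into a direction with the same angle-based rule:

import torch

# Sketch: predict a vehicle's future direction from its tracked history.
# Assumes a trained STAM_LSTM instance named model and the track.locations list used above.
HIST_LEN = 8  # illustrative history length

for track in tracker.tracks:
    if len(track.locations) >= HIST_LEN:
        history = torch.tensor(track.locations[-HIST_LEN:], dtype=torch.float32).unsqueeze(0)  # (1, HIST_LEN, 2)
        with torch.no_grad():
            future = model(history)  # predicted future point(s), e.g. shape (1, T, 2)
        predicted_angle = calculate_angle(track.locations[-1], future[0, -1].tolist())
        predicted_direction = judge_direction(predicted_angle)
        #...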

Overall, the vehicle driving direction detection system based on OpenCV, DeepSort and STAM-LSTM can effectively identify and predict the vehicle’s turning, lane changing and straight driving behaviors. By accurately calculating the vehicle’s driving angle and combining it with a spatiotemporal behavior prediction model, the system can achieve stable and accurate vehicle tracking and driving direction judgment in complex traffic scenarios.

9. System integration

The complete source code, data set, environment deployment video tutorial, and custom UI interface are shown below:

Reference blog “Vehicle driving direction detection system (turning, lane changing, going straight) based on DeepSort and STAM-LSTM”

10. References

[1] Song Yafei. Research on Recommendation Methods Based on Graph Neural Networks [D]. 2021.

[2] Kaouther Messaoud, Itheri Yahiaoui, Anne Verroust-Blondet, et al. Attention Based Vehicle Trajectory Prediction [J]. IEEE Transactions on Intelligent Vehicles, 2020, 6(1): 175-185. DOI: 10.1109/TIV.2020.2991952.

[3] James J. Q. Yu. Travel Mode Identification With GPS Trajectories Using Wavelet Transform and Deep Learning [J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 22(2): 1093-1103. DOI: 10.1109/TITS.2019.2962741.

[4] Ling Zhao, Yujiao Song, Chao Zhang, et al. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction [J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 21(9): 3848-3858. DOI: 10.1109/TITS.2019.2935152.

[5] Shuai Zhang, Lina Yao, Aixin Sun, et al. Deep Learning Based Recommender System [J]. ACM Computing Surveys, 2019, 52(1): 1-38. DOI: 10.1145/3285029.

[6] Nachiket Deo, Akshay Rangesh, Mohan M. Trivedi. How Would Surround Vehicles Move? A Unified Framework for Maneuver Classification and Motion Prediction [J]. IEEE Transactions on Intelligent Vehicles, 2018, 3(2): 129-140. DOI: 10.1109/TIV.2018.2804159.

[7] Lefèvre Stéphanie, Vasquez Dizan, Laugier Christian. A Survey on Motion Prediction and Risk Assessment for Intelligent Vehicles [J]. ROBOMECH Journal, 2014, 1(1). DOI: 10.1186/s40648-014-0001-z.

[8] Toledo-Moreo R., Zamora-Izquierdo M. A. IMM-Based Lane-Change Prediction in Highways With Low-Cost GPS/INS [J]. IEEE Transactions on Intelligent Transportation Systems, 2009, 10(1).

[9] Chrysanthou Y., Lerner A., Lischinski D. Crowds by Example [J]. Computer Graphics Forum, 2007, 26(3).

[10] Polychronopoulos A., Tsogas M., Amditis A. J., et al. Sensor Fusion for Predicting Vehicles' Path for Collision Avoidance Systems [J]. IEEE Transactions on Intelligent Transportation Systems, 2007, 8(3): 549-562.