[Visual Algorithm Series 1] Using KerasCV YOLOv8 for traffic light detection (Part 1)


A comprehensive guide to traffic light detection using the latest “KerasCV YOLOv8” model

YOLO object detection models have found their way into countless applications, from surveillance systems to self-driving cars. So what happens when you pair YOLOv8 with the KerasCV framework?

Recently, KerasCV integrated the famous YOLOv8 detection model into its library. Let’s discuss how to fine-tune YOLOv8 on a custom dataset, which will involve the following points:

· Fine-tuning YOLOv8 on the traffic light detection dataset

· Running inference on validation images

· Analyzing the results

Figure 1. KerasCV YOLOv8 output for traffic light detection

Contents

1. Traffic light detection data set

2. Use KerasCV YOLOv8 for object detection

3. Inference on validation images

4. Video inference with the trained KerasCV YOLOv8 model

5. Summary and conclusion

Traffic light detection data set

We will train the KerasCV YOLOv8 model on a traffic light detection dataset: the Small Traffic Light Dataset (S2TLD), provided by Thinklab. The collection of images and annotations is available through the download link in the notebook. The dataset contains 4,564 images with annotations in XML format, and the sample images below clearly depict the different scenarios in which they were collected.

Figure 2. Image samples in the S2TLD dataset

The version of the dataset used here contains four classes:

· red
· yellow
· green
· off

Using KerasCV YOLOv8 for object detection

Start by installing the necessary libraries:

!pip install keras-cv==0.5.1
!pip install keras-core

In the initial steps, set up the environment to take advantage of the capabilities of “KerasCV YOLOv8” for object detection. Installing keras-cv and keras-core ensures the availability of all modules required when starting object detection. It is important to maintain the correct versions to prevent compatibility issues. In this tutorial, we use keras-cv version 0.5.1 to get the best results from YOLOv8.
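
As a quick optional check (these lines are not part of the original tutorial), you can confirm that the expected versions are installed before moving on:

import keras_cv
import keras_core
 
# Print the installed versions; keras-cv should report 0.5.1.
print("keras-cv:", keras_cv.__version__)
print("keras-core:", keras_core.__version__)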

Package and library imports

Import the required packages and libraries:

import os
import xml.etree.ElementTree as ET
import tensorflow as tf
import keras_cv
import requests
import zipfile
 
from tqdm.auto import tqdm
from tensorflow import keras
from keras_cv import bounding_box
from keras_cv import visualization

Before we delve into the core functionality of “KerasCV YOLOv8” for object detection, let’s lay the foundation by importing the necessary libraries and modules:

os: Helps interact with the underlying operating system that Python runs on, for directory operations;

xml.etree.ElementTree (ET): Helps parse the XML annotation files that store object labels and locations;

tensorflow & keras: KerasCV YOLOv8 “is the basis for realizing deep learning functions;

keras_cv: The key library that provides the YOLOv8 model and the tools our project uses around it;

requests: Allows sending HTTP requests, which are essential for obtaining online datasets or model weights;

zipfile: Convenient for extracting compressed files, which may be useful when processing compressed data sets or model files

tqdm: Adds progress bars to the code, making long-running operations such as downloads easier to follow;

bounding_box and visualization from keras_cv: These are crucial for handling bounding box operations and visualizing the results after detecting objects using KerasCV YOLOv8.

After ensuring that these modules are imported, we can proceed with the rest of the object detection process efficiently.

Downloading the dataset

First, download the traffic light detection dataset from its direct source:

# Download dataset.
def download_file(url, save_name):
    if not os.path.exists(save_name):
        print(f"Downloading file")
        file = requests.get(url, stream=True)
        total_size = int(file.headers.get('content-length', 0))
        block_size = 1024
        progress_bar = tqdm(
            total=total_size,
            unit='iB',
            unit_scale=True
        )
        with open(os.path.join(save_name), 'wb') as f:
            for data in file.iter_content(block_size):
                progress_bar.update(len(data))
                f.write(data)
        progress_bar.close()
    else:
        print('File already present')
         
download_file(
    'https://www.dropbox.com/scl/fi/suext2oyjxa0v4p78bj3o/S2TLD_720x1280.zip?rlkey=iequuynn54uib0uhsc7eqfci4&dl=1',
    'S2TLD_720x1280.zip'
)

Unzip the dataset

# Unzip the data file
def unzip(zip_file=None):
    try:
        with zipfile.ZipFile(zip_file) as z:
            z.extractall("./")
            print("Extracted all")
    except:
        print("Invalid file")
 
unzip('S2TLD_720x1280.zip')

The dataset will be extracted into the S2TLD_720x1280 directory.
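
As a quick check that the extraction succeeded (optional, and assuming the directory layout used later in this article), you can count the extracted files:

# The paths below match the ones used in the dataset preparation code.
print(len(os.listdir("S2TLD_720x1280/images")), "images")
print(len(os.listdir("S2TLD_720x1280/annotations")), "annotation files")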

Dataset and training parameters

Appropriate dataset and training parameters need to be defined, including dataset splits for training and validation, batch size, learning rate, and the number of epochs for which the KerasCV YOLOv8 model needs to be trained.

SPLIT_RATIO = 0.2
BATCH_SIZE = 8
LEARNING_RATE = 0.001
EPOCH = 75
GLOBAL_CLIPNORM = 10.0

20% of the data is used for validation and the remaining data is used for training. Considering the model and image size used for training, the batch size is 8, the learning rate is set to 0.001, and the model is trained for 75 epochs.

Dataset preparation

This is one of the most important parts of training a deep learning model. We first define the class names and collect the paths to all image and annotation files.

class_ids = [
    "red",
    "yellow",
    "green",
    "off",
]
class_mapping = dict(zip(range(len(class_ids)), class_ids))
 
# Path to images and annotations
path_images = "S2TLD_720x1280/images/"
path_annot = "S2TLD_720x1280/annotations/"
 
# Get all XML file paths in path_annot and sort them
xml_files = sorted(
    [
        os.path.join(path_annot, file_name)
        for file_name in os.listdir(path_annot)
        if file_name.endswith(".xml")
    ]
)
 
# Get all JPEG image file paths in path_images and sort them
jpg_files = sorted(
    [
        os.path.join(path_images, file_name)
        for file_name in os.listdir(path_images)
        if file_name.endswith(".jpg")
    ]
)

The class_mapping dictionary provides easy lookup from numeric IDs to the corresponding class names, and the annotation and image file paths are stored in xml_files and jpg_files, respectively.
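
For illustration (these lines are not part of the original code), you can print the mapping and verify that the images and annotations line up:

# class_mapping maps numeric IDs to class names:
# {0: 'red', 1: 'yellow', 2: 'green', 3: 'off'}
print(class_mapping)
 
# Every image should have a matching annotation file.
assert len(xml_files) == len(jpg_files)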

Next, parse the XML annotation files to collect the labels and bounding box annotations required for training.

def parse_annotation(xml_file):
    tree = ET.parse(xml_file)
    root = tree.getroot()
 
    image_name = root.find("filename").text
    image_path = os.path.join(path_images, image_name)
 
    boxes = []
    classes = []
    for obj in root.iter("object"):
        cls = obj.find("name").text
        classes.append(cls)
 
        bbox = obj.find("bndbox")
        xmin = float(bbox.find("xmin").text)
        ymin = float(bbox.find("ymin").text)
        xmax = float(bbox.find("xmax").text)
        ymax = float(bbox.find("ymax").text)
        boxes.append([xmin, ymin, xmax, ymax])
 
    class_ids = [
        list(class_mapping.keys())[list(class_mapping.values()).index(cls)]
        for cls in classes
    ]
    return image_path, boxes, class_ids
 
 
image_paths = []
bbox = []
classes = []
for xml_file in tqdm(xml_files):
    image_path, boxes, class_ids = parse_annotation(xml_file)
    image_paths.append(image_path)
    bbox.append(boxes)
    classes.append(class_ids)

The parse_annotation(xml_file) function digs into each XML file, extracting the file name, the object classes, and their respective bounding box coordinates. With the help of class_mapping, it converts the class names into numeric class IDs for easier downstream use.
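
To see what the function returns, you can parse a single annotation file; the printed values are illustrative, since they depend on the actual dataset:

# Peek at the first parsed sample.
sample_path, sample_boxes, sample_ids = parse_annotation(xml_files[0])
print(sample_path)   # path to the corresponding image
print(sample_boxes)  # [[xmin, ymin, xmax, ymax], ...]
print(sample_ids)    # numeric class IDs, e.g. 0 for "red"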

After parsing all the XML files, we collect all image paths, bounding boxes and class ids in separate lists and then combine them into a TensorFlow dataset using tf.data.Dataset.from_tensor_slices.

bbox = tf.ragged.constant(bbox)
classes = tf.ragged.constant(classes)
image_paths = tf.ragged.constant(image_paths)
 
data = tf.data.Dataset.from_tensor_slices((image_paths, classes, bbox))

All of the data now lives in a single tf.data.Dataset object, which still needs to be split into a training set and a validation set using SPLIT_RATIO.

# Determine the number of validation samples
num_val = int(len(xml_files) * SPLIT_RATIO)
 
# Split the dataset into train and validation sets
val_data = data.take(num_val)
train_data = data.skip(num_val)
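
If you want to confirm the split (an optional check, not part of the original code), the sizes of the two subsets follow directly from SPLIT_RATIO:

# With SPLIT_RATIO = 0.2, roughly 20% of the samples go to validation.
print("Validation samples:", num_val)
print("Training samples:", len(xml_files) - num_val)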

Load images and annotations and apply required preprocessing.

def load_image(image_path):
    image = tf.io.read_file(image_path)
    image = tf.image.decode_jpeg(image, channels=3)
    return image
 
def load_dataset(image_path, classes, bbox):
    # Read Image
    image = load_image(image_path)
    bounding_boxes = {
        "classes": tf.cast(classes, dtype=tf.float32),
        "boxes": bbox,
    }
    return {"images": tf.cast(image, tf.float32), "bounding_boxes": bounding_boxes}
 
augmenter = keras.Sequential(
    layers=[
        keras_cv.layers.RandomFlip(mode="horizontal", bounding_box_format="xyxy"),
        keras_cv.layers.JitteredResize(
            target_size=(640, 640),
            scale_factor=(1.0, 1.0),
            bounding_box_format="xyxy",
        ),
    ]
)
 
train_ds = train_data.map(load_dataset, num_parallel_calls=tf.data.AUTOTUNE)
train_ds = train_ds.shuffle(BATCH_SIZE * 4)
train_ds = train_ds.ragged_batch(BATCH_SIZE, drop_remainder=True)
train_ds = train_ds.map(augmenter, num_parallel_calls=tf.data.AUTOTUNE)

For the training set, the images are resized to 640×640 resolution and a random horizontal flip augmentation is applied. The augmentation helps keep the model from overfitting too early.

For the validation set, no augmentation is required; resizing the images is enough.

resizing = keras_cv.layers.JitteredResize(
    target_size=(640, 640),
    scale_factor=(1.0, 1.0),
    bounding_box_format="xyxy",
)
 
val_ds = val_data.map(load_dataset, num_parallel_calls=tf.data.AUTOTUNE)
val_ds = val_ds.shuffle(BATCH_SIZE * 4)
val_ds = val_ds.ragged_batch(BATCH_SIZE, drop_remainder=True)
val_ds = val_ds.map(resizing, num_parallel_calls=tf.data.AUTOTUNE)

Before moving on to the next stage, visualize some samples using the training and validation datasets created above.

def visualize_dataset(inputs, value_range, rows, cols, bounding_box_format):
    inputs = next(iter(inputs.take(1)))
    images, bounding_boxes = inputs["images"], inputs["bounding_boxes"]
    visualization.plot_bounding_box_gallery(
        images,
        value_range=value_range,
        rows=rows,
        cols=cols,
        y_true=bounding_boxes,
        scale=5,
        font_scale=0.7,
        bounding_box_format=bounding_box_format,
        class_mapping=class_mapping,
    )
 
 
visualize_dataset(
    train_ds, bounding_box_format="xyxy", value_range=(0, 255), rows=2, cols=2
)
 
visualize_dataset(
    val_ds, bounding_box_format="xyxy", value_range=(0, 255), rows=2, cols=2
)

Here are some outputs from the above visualization function

Figure 3. Traffic light image annotated by KerasCV visualization module

Finally, convert the datasets into their final format:

def dict_to_tuple(inputs):
    return inputs["images"], inputs["bounding_boxes"]
 
 
train_ds = train_ds.map(dict_to_tuple, num_parallel_calls=tf.data.AUTOTUNE)
train_ds = train_ds.prefetch(tf.data.AUTOTUNE)
 
val_ds = val_ds.map(dict_to_tuple, num_parallel_calls=tf.data.AUTOTUNE)
val_ds = val_ds.prefetch(tf.data.AUTOTUNE)

To prepare the data for model training, the dict_to_tuple function converts each dataset element into an (image, bounding_boxes) tuple, and prefetch lets the pipeline prepare upcoming batches in the background for better performance.
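
As a final optional sanity check (not in the original code), pull one batch to confirm the pipeline produces what the model expects:

# Fetch a single batch: images are resized to 640x640, while the
# boxes and classes remain ragged across the batch.
images, bounding_boxes = next(iter(train_ds.take(1)))
print(images.shape)
print(bounding_boxes["boxes"].shape)
print(bounding_boxes["classes"].shape)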