Classic Convolutional Model Review 33: Garbage Detection Using YOLOv3 (TensorFlow 2.0)

YOLOv3 (You Only Look Once, version 3) is an object detection algorithm and the third version in the YOLO series. It was released in 2018 by Joseph Redmon and Ali Farhadi.

Compared with the previous two versions, YOLOv3 offers higher detection accuracy with better overall performance. It introduces techniques such as residual blocks, cross-layer (skip) connections, and multi-scale prediction to improve feature extraction and, in turn, detection accuracy.

The YOLOv3 model is divided into two parts: a feature extraction network and a detection network. The feature extraction network uses the Darknet-53 architecture, which extracts features from the input image quickly and accurately. Building on those features, the detection network predicts bounding boxes, confidence scores, and class probabilities at different scales through multiple detection layers, completing the object detection task.
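
As a rough illustration of the backbone, the residual block used in Darknet-53 can be sketched in Keras as follows. This is only a minimal sketch of the block structure (1x1 bottleneck, 3x3 convolution, shortcut addition), not the full Darknet-53 definition; the filter count is a parameter you would set per stage.

import tensorflow as tf

def darknet_residual_block(x, filters):
    # Darknet-53 style residual block: a 1x1 bottleneck followed by a 3x3
    # convolution, with the block input added back through a shortcut.
    # The input tensor x is expected to already have `filters` channels.
    shortcut = x
    x = tf.keras.layers.Conv2D(filters // 2, (1, 1), padding='same', use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.1)(x)
    x = tf.keras.layers.Conv2D(filters, (3, 3), padding='same', use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.1)(x)
    return tf.keras.layers.Add()([shortcut, x])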

The advantages of YOLOv3 are that it is fast enough for real-time detection and can detect multiple objects in a single forward pass, without a separate region-proposal stage. Compared with traditional region-based object detection methods, YOLOv3 is also robust and offers a strong balance of speed and accuracy, so it can be applied to a wider range of scenarios.

Here are the steps to use YOLOv3 for garbage detection:

1. Prepare the garbage image dataset and divide it into training set, validation set and test set (a simple split helper is sketched after this list).
2. Train the YOLOv3 model. You can use the already implemented YOLOv3 code base and adapt it to your dataset. During training, you can monitor the performance of the model by viewing the training loss and prediction results.
3. Perform model optimization and testing. Use the validation set to refine the model and the test set to test the performance of the model.
4. Use the model for garbage detection. Apply the model to predict garbage objects in new images and compare the predictions with the labels in the garbage image dataset.
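
For step 1, a minimal sketch of splitting an image folder into training, validation and test sets is shown below; the directory name garbage_images and the 8:1:1 ratio are assumptions, and in practice the corresponding annotation files must be split the same way.

import os
import random

def split_dataset(image_dir, train_ratio=0.8, val_ratio=0.1, seed=42):
    # Collect image file names and shuffle them reproducibly
    images = [f for f in os.listdir(image_dir) if f.lower().endswith(('.jpg', '.jpeg', '.png'))]
    random.Random(seed).shuffle(images)

    n_train = int(len(images) * train_ratio)
    n_val = int(len(images) * val_ratio)

    train_set = images[:n_train]
    val_set = images[n_train:n_train + n_val]
    test_set = images[n_train + n_val:]
    return train_set, val_set, test_set

# Example: train_set, val_set, test_set = split_dataset('garbage_images')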

The following is an example of YOLOv3 garbage detection based on TensorFlow 2.0:

1. Install TensorFlow 2.0 and related dependencies:

pip install tensorflow==2.0.0
pip install opencv-python
pip install matplotlib

2. Download the YOLOv3 weight file yolov3.weights and related configuration files:

wget https://pjreddie.com/media/files/yolov3.weights
wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
wget https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names

3. Use OpenCV to load the image and preprocess it:

import cv2
import numpy as np

def load_image(img_path):
    # Read the image, convert from BGR (OpenCV default) to RGB, resize to the
    # 416x416 network input size, scale pixel values to [0, 1] and add a batch dimension
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (416, 416))
    img = img.astype(np.float32)
    img = img / 255.0
    img = np.expand_dims(img, axis=0)
    return img
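
A quick check of the preprocessing helper (the file name garbage.jpg is just a placeholder):

# The result is a batch of one normalized 416x416 RGB image
image_data = load_image('garbage.jpg')
print(image_data.shape)  # (1, 416, 416, 3)
print(image_data.dtype)  # float32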

4. Create YOLOv3 model:

import tensorflow as tf

def create_yolo_model():
    input_layer = tf.keras.layers.Input([416, 416, 3])
    conv_layer_1 = tf.keras.layers.Conv2D(32, (3, 3), strides=(1, 1), padding='same', activation='relu')(input_layer)
    pool_layer_1 = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(conv_layer_1)
    conv_layer_2 = tf.keras.layers.Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu')(pool_layer_1)
    pool_layer_2 = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(conv_layer_2)
    conv_layer_3 = tf.keras.layers.Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu')(pool_layer_2)
    conv_layer_4 = tf.keras.layers.Conv2D(64, (1, 1), strides=(1, 1), padding='same', activation='relu')(conv_layer_3)
    conv_layer_5 = tf.keras.layers.Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu')(conv_layer_4)
    pool_layer_3 = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(conv_layer_5)
    conv_layer_6 = tf.keras.layers.Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu')(pool_layer_3)
    conv_layer_7 = tf.keras.layers.Conv2D(128, (1, 1), strides=(1, 1), padding='same', activation='relu')(conv_layer_6)
    conv_layer_8 = tf.keras.layers.Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu')(conv_layer_7)
    pool_layer_4 = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(conv_layer_8)
    conv_layer_9 = tf.keras.layers.Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu')(pool_layer_4)
    conv_layer_10 = tf.keras.layers.Conv2D(256, (1, 1), strides=(1, 1), padding='same', activation='relu')(conv_layer_9)
    conv_layer_11 = tf.keras.layers.Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu')(conv_layer_10)
    conv_layer_12 = tf.keras.layers.Conv2D(256, (1, 1), strides=(1, 1), padding='same', activation='relu')(conv_layer_11)
    conv_layer_13 = tf.keras.layers.Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu')(conv_layer_12)
    skip_connection = conv_layer_13
    pool_layer_5 = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(conv_layer_13)
    conv_layer_14 = tf.keras.layers.Conv2D(1024, (3, 3), strides=(1, 1), padding='same', activation='relu')(pool_layer_5)
    conv_layer_15 = tf.keras.layers.Conv2D(512, (1, 1), strides=(1, 1), padding='same', activation='relu')(conv_layer_14)
    conv_layer_16 = tf.keras.layers.Conv2D(1024, (3, 3), strides=(1, 1), padding='same', activation='relu')(conv_layer_15)
    conv_layer_17 = tf.keras.layers.Conv2D(512, (1, 1), strides=(1, 1), padding='same', activation='relu')(conv_layer_16)
    conv_layer_18 = tf.keras.layers.Conv2D(1024, (3, 3), strides=(1, 1), padding='same', activation='relu')(conv_layer_17)
    conv_layer_19 = tf.keras.layers.Conv2D(1024, (3, 3), strides=(1, 1), padding='same', activation='relu')(conv_layer_18)
    # Downsample the 26x26 skip connection to 13x13 so it matches conv_layer_19 before concatenation
    conv_layer_20 = tf.keras.layers.Conv2D(1024, (3, 3), strides=(2, 2), padding='same', activation='relu')(skip_connection)
    concatenate_layer = tf.keras.layers.concatenate([conv_layer_20, conv_layer_19], axis=-1)
    conv_layer_21 = tf.keras.layers.Conv2D(1024, (3, 3), strides=(1, 1), padding='same', activation='relu')(concatenate_layer)
    flatten_layer = tf.keras.layers.Flatten()(conv_layer_21)
    dense_layer_1 = tf.keras.layers.Dense(4096, activation='relu')(flatten_layer)
    dropout_layer_1 = tf.keras.layers.Dropout(0.5)(dense_layer_1)
    output_layer = tf.keras.layers.Dense(2535, activation='softmax')(dropout_layer_1)
    return tf.keras.Model(inputs=input_layer, outputs=output_layer)
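
To sanity-check the definition, build the model and print a summary. Note that this network is a simplified stand-in for illustration; the full YOLOv3 uses batch normalization, leaky ReLU activations and three multi-scale detection heads rather than a flattened dense output.

model = create_yolo_model()
model.summary()
print('Total parameters:', model.count_params())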

5. Load YOLOv3 weight file:

import struct

def load_weights(model, weights_file):
    with open(weights_file, "rb") as f:
        # Skip the Darknet header (major, minor, revision and the images-seen counter)
        np.fromfile(f, dtype=np.int32, count=5)

        for layer in model.layers:
            if not layer.name.startswith('conv2d'):
                continue

            filters = layer.filters
            k_size = layer.kernel_size[0]
            in_dim = layer.input_shape[-1]

            if layer.activation == tf.keras.activations.linear:
                # Convolution followed by batch normalization: Darknet stores
                # [beta, gamma, mean, variance] and then the conv weights (no bias)
                bn_weights = np.fromfile(f, dtype=np.float32, count=4 * filters)
                # Reorder to the Keras BatchNormalization order [gamma, beta, mean, variance]
                bn_weights = bn_weights.reshape((4, filters))[[1, 0, 2, 3]]

                # Darknet stores conv weights as (out_dim, in_dim, height, width);
                # Keras expects (height, width, in_dim, out_dim)
                darknet_shape = (filters, in_dim, k_size, k_size)
                conv_weights = np.fromfile(f, dtype=np.float32, count=np.prod(darknet_shape))
                conv_weights = conv_weights.reshape(darknet_shape).transpose([2, 3, 1, 0])

                # The batch norm parameters belong to the BatchNormalization layer
                # that follows this convolution in a full YOLOv3 definition
                layer.set_weights([conv_weights])
            else:
                # Convolution without batch normalization: Darknet stores the
                # bias first, then the conv weights
                conv_bias = np.fromfile(f, dtype=np.float32, count=filters)

                darknet_shape = (filters, in_dim, k_size, k_size)
                conv_weights = np.fromfile(f, dtype=np.float32, count=np.prod(darknet_shape))
                conv_weights = conv_weights.reshape(darknet_shape).transpose([2, 3, 1, 0])

                layer.set_weights([conv_weights, conv_bias])
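
With the model built and yolov3.weights downloaded, loading might look like the snippet below. Keep in mind that the weight file will only map cleanly onto a network whose convolutional layers match the order and shapes defined in yolov3.cfg, which the simplified network above only approximates.

# Assumes yolov3.weights is in the current working directory
model = create_yolo_model()
load_weights(model, 'yolov3.weights')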

6. Load the class names:

def load_class_names(names_file):
    with open(names_file, "r") as f:
        class_names = f.read().splitlines()
    return class_names
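
For example, with the coco.names file downloaded earlier (one class name per line, 80 COCO classes):

class_names = load_class_names('coco.names')
print(len(class_names))   # 80
print(class_names[:3])    # ['person', 'bicycle', 'car']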

7. Make predictions:

def predict(image_file, model, class_names):
    # Load image
    image_data = load_image(image_file)

    # Predict
    pred = model.predict(image_data)

    # Decode prediction
    boxes, objectness, classes = tf.split(pred, (4, 1, -1), axis=-1)
    boxes = decode_boxes(boxes)
    scores = objectness * classes
    boxes, scores, classes, valid_detections = tf.image.combined_non_max_suppression(
        boxes=tf.reshape(boxes, (tf.shape(boxes)[0], -1, 1, 4)),
        scores=tf.reshape(scores, (tf.shape(scores)[0], -1, tf.shape(scores)[-1])),
        max_output_size_per_class=50,
        max_total_size=50,
        iou_threshold=0.5,
        score_threshold=0.5
    )
    pred_bbox = [boxes.numpy(), scores.numpy(), classes.numpy(), valid_detections.numpy()]

    # Visualize prediction
    visualize_prediction(image_file, pred_bbox, class_names)
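
The predict function relies on two helpers, decode_boxes and visualize_prediction, that are not defined above. A minimal sketch of what they might look like is given below; the box layout (center x, center y, width, height in coordinates relative to the image size) is an assumption and should be adapted to the decoding logic of the trained model.

import cv2
import tensorflow as tf
import matplotlib.pyplot as plt

def decode_boxes(boxes):
    # Assumed layout: (center_x, center_y, width, height) in relative coordinates;
    # convert to the (y1, x1, y2, x2) corner format expected by the NMS call
    x, y, w, h = tf.split(boxes, 4, axis=-1)
    return tf.concat([y - h / 2, x - w / 2, y + h / 2, x + w / 2], axis=-1)

def visualize_prediction(image_file, pred_bbox, class_names):
    # Draw the surviving detections on the original image with matplotlib
    boxes, scores, classes, valid_detections = pred_bbox
    img = cv2.cvtColor(cv2.imread(image_file), cv2.COLOR_BGR2RGB)
    h, w = img.shape[:2]
    plt.imshow(img)
    for i in range(int(valid_detections[0])):
        y1, x1, y2, x2 = boxes[0][i]
        label = class_names[int(classes[0][i])]
        plt.gca().add_patch(plt.Rectangle((x1 * w, y1 * h), (x2 - x1) * w, (y2 - y1) * h,
                                          fill=False, color='red'))
        plt.text(x1 * w, y1 * h, '%s %.2f' % (label, scores[0][i]), color='red')
    plt.show()

# Example end-to-end usage:
# model = create_yolo_model()
# load_weights(model, 'yolov3.weights')
# class_names = load_class_names('coco.names')
# predict('garbage.jpg', model, class_names)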