[2023 MathorCup College Mathematical Modeling Challenge-Big Data Competition] Track A: Pothole road detection and identification based on computer vision python code analysis

1 Question

Pothole detection and identification is a computer vision task that aims to identify roads with potholes from digital images, usually images of the road surface. This matters for studying the Earth: it is of great significance to research and applications in fields such as geophysical exploration, aerospace science, and natural disasters. For example, it can help identify potholes from Earth orbit, as well as analyze and model the morphology of the Earth’s surface.

In the task of pothole road detection, traditional classification algorithms often cannot achieve good results because the features of pothole images are often very complex and changeable. However, the development of deep learning technology in recent years has provided new solutions for pothole road detection.

Deep learning has strong feature extraction and representation capabilities and can automatically extract the most important features from images. In the pothole image classification task, deep learning can be used to extract features such as the contour, texture, and morphology of potholes and convert them into a representation that is easier to classify. Classification performance can be further improved through techniques such as transfer learning and knowledge distillation. For example, some researchers use deep learning-based methods to classify road images into normal and pothole categories; others use transfer learning to obtain features of pothole images from general pre-trained models and then use these features to classify the images.

This competition question aims to automatically identify potholes in new road images through analysis, feature extraction, and modeling of labeled road images. The specific tasks are as follows:
Preliminary Questions
Question 1: Using the given image files, extract image features and build a model with a high recognition rate, fast speed, and accurate classification that identifies whether the road in an image is normal or potholed.
Question 2: Train the model built in Question 1 and evaluate the model from different dimensions.
Question 3: Use the trained model to identify pothole images in the test set, and put the recognition results in “test_result.csv”. (Note: The test set will be released 48 hours before the end of the competition. The download link will be announced on the registration website, so please check it in time.)

Attachment description:
Attachment 1: data.zip;
The training data set in this file contains a total of 301 images.
A file name containing the string “normal” indicates a normal road; any other file name indicates a potholed road.

Figure 1: Example of normal road

Figure 2: Example of potholed road

Attachment 2: test_result.csv;
Submit the test result file. The header in the file remains unchanged. The data is only an example. Delete it and refill it when submitting. See the table below for field descriptions.
Table 1: test_result table field description

Field	Description
fnames	File name of the test image
label	Class label: 1 represents a normal road; 0 represents a pothole road

Attachment 3: test_data.zip

The test data set contains thousands of images in the file. The specific number is subject to the published data.

The download link for the test data set will be announced 48 hours before the end of the competition. Please pay attention to the registration website in time.

2 Idea analysis

First, the training set has only 301 images, so this is a small-sample problem. Establish a baseline with the following process, then gradually optimize each part.

(1) Data preprocessing:

Resize the image: Since the deep learning model has strict requirements on the size of the input image, you can use an image processing algorithm (such as the resize function in the OpenCV library) to uniformly scale the image to a fixed size. In the following example, the unified size is 224*224.
Data enhancement: Image enhancement algorithms (such as translation, rotation, flip, etc. functions in the OpenCV library) can be used to enhance the image to expand the number of samples and increase data diversity.
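
To make the transformations above concrete, here is a minimal sketch that applies a horizontal flip, a 90° clockwise rotation, and a translation to a tiny image stored as nested lists. It uses only the standard library so it runs anywhere. The helper names (`hflip`, `rot90_cw`, `shift_right`) are illustrative and not part of the original code; on real images the same operations would normally be done with OpenCV or Keras.

```python
def hflip(img):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in img]


def rot90_cw(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def shift_right(img, dx, fill=0):
    """Translate an image dx >= 0 pixels to the right, padding with `fill`."""
    width = len(img[0])
    dx = min(dx, width)
    return [[fill] * dx + row[:width - dx] for row in img]


# A 2x3 toy "image"; every transformed copy keeps the label of the original
image = [[1, 2, 3],
         [4, 5, 6]]
augmented = [image, hflip(image), rot90_cw(image), shift_right(image, 1)]
```

Each original image yields several training samples here, which is the sense in which augmentation expands a 301-image data set.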

(2) Feature extraction:

Feature extraction based on traditional computer vision algorithms: Traditional image feature extraction algorithms (such as SIFT, HOG, LBP, etc.) can be used to extract local or global features of the image, which are then used to train a classifier.
Feature extraction based on deep learning models: You can use pre-trained convolutional neural networks (such as VGG, ResNet, Inception, etc.) to extract high-level features of images and use these features as input to train deep learning models. For an example of feature extraction with VGG, see Section 3.4.

(3) Visual analysis data set:

Use an image processing algorithm (such as the imshow function in the OpenCV library) to display the image: You can randomly select some sample images of normal roads and pothole roads, and use an image processing algorithm to visually display them to understand the characteristics and difficulties of the data set.
Draw statistical charts such as histograms and scatter plots: You can use statistical methods, such as drawing histograms of pixel values for normal and pothole road images or scatter plots of color features, to observe the distribution of the data set and judge whether the image features distinguish the two classes.
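
As a rough illustration of the histogram comparison, the sketch below bins grayscale pixel values for one image of each class and normalizes the counts so they can be compared. The pixel values are made up for the example, and the function names are hypothetical; a real analysis would read the pixels from the image files and plot the result.

```python
def intensity_histogram(pixels, bins=4, max_value=256):
    """Count grayscale pixel values (0..max_value-1) into equal-width bins."""
    width = max_value // bins
    counts = [0] * bins
    for p in pixels:
        counts[min(p // width, bins - 1)] += 1
    return counts


def normalize(counts):
    """Turn raw counts into fractions so images of different sizes compare."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical flattened grayscale pixels for one image of each class
normal_pixels = [120, 130, 125, 128, 135, 122]
pothole_pixels = [30, 40, 200, 35, 210, 45]

normal_hist = intensity_histogram(normal_pixels)
pothole_hist = intensity_histogram(pothole_pixels)
```

If the two histograms differ consistently across many images, simple intensity statistics already carry some class information; if they overlap, learned features are more likely to be needed.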

(4) Establish a deep learning model:

The baseline uses convolutional neural networks (such as VGG, ResNet, Inception, etc.), autoencoders, recurrent neural networks, etc., and performs fine-tuning or transfer learning according to the characteristics of the data set.
Other cutting-edge image classification techniques include:
- Transfer learning: Transfer models trained on large-scale data sets (such as ImageNet) to small sample problems, and solve classification problems through fine-tuning or feature extraction.
- Data enhancement: Use image enhancement algorithms (such as rotation, translation, flipping, cropping, etc.) to expand samples and increase the number and diversity of samples.
- Generative Adversarial Network (GAN): Increase the number of samples by synthesizing sample data, and use the GAN generator to generate realistic samples to expand the data.
- Meta Learning: Learn how to learn and generalize quickly from limited samples, and optimize the utilization efficiency of samples through learned prior knowledge.
- Semi-supervised learning: Use a small number of labeled samples and a large number of unlabeled samples for training to improve classification accuracy.
- Active Learning: Use active selection and labeling of key samples to reduce labeling costs and improve model performance.
- Small sample learning methods: Special algorithms and methods are proposed for small sample problems, such as Few-shot Learning, One-shot Learning, Zero-shot Learning, etc.
- Incremental Learning: Gradually learn and incrementally update the model to adapt to the introduction of new samples and the forgetting of old samples.
- Model compression and quantization: Through techniques such as model pruning, quantization, and distillation, model parameters and computation are reduced to adapt to small sample problems.
- Ensemble learning: combine the results of multiple classifiers to improve classification accuracy and robustness, such as bagging, boosting, etc.
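
Of the techniques listed above, ensembling is the simplest to sketch. The example below averages the pothole probabilities predicted by several models and thresholds the mean (soft voting). The probabilities are invented for illustration, `soft_vote` is a hypothetical helper, and the 0.5 threshold is an assumption.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-model probabilities image by image, then threshold the mean."""
    n_models = len(prob_lists)
    averaged = [sum(ps) / n_models for ps in zip(*prob_lists)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return labels, averaged


# Hypothetical outputs of three models on the same four images
model_probs = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.4, 0.7],
    [0.7, 0.3, 0.7, 0.3],
]
labels, averaged = soft_vote(model_probs)
```

On the third and fourth images the individual models disagree, and the averaged probability settles the decision.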

(5) Model evaluation and optimization:

Use cross-validation methods to evaluate the model: You can use methods such as k-fold cross-validation to evaluate the model and obtain indicators such as accuracy and recall to judge the performance of the model.
Adjust parameters and optimize the model: You can try different loss functions, optimizers, learning rates and other hyperparameters, as well as increase the size of the data set and reduce the complexity of the model to optimize the deep learning model.
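
The evaluation step can be sketched without any framework. The example below builds k-fold index splits and computes accuracy and recall from true and predicted labels. It is a minimal sketch: the indices are assumed to be shuffled beforehand, the function names are illustrative, and in practice scikit-learn provides equivalent utilities.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def accuracy_and_recall(y_true, y_pred, positive=1):
    """Accuracy over all samples and recall for the `positive` class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = correct / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# 301 images split into 5 folds
folds = list(k_fold_indices(301, 5))
fold_sizes = [len(val_idx) for _, val_idx in folds]

# Toy labels: 1 = pothole, 0 = normal
acc, rec = accuracy_and_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

With so few images, averaging a metric over all folds gives a steadier estimate than a single 30-image validation split.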

3 Python code implementation

3.1 Data preprocessing

import os
import numpy as np
import pandas as pd
from PIL import Image
from sklearn.model_selection import train_test_split
from tensorflow.keras import applications
from tensorflow.keras import optimizers
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img

# Unify the pixel format of the images and store them in folders respectively

# Create the output folders
processed_normal_dir = "data/processed_normal"
processed_wavy_dir = "data/processed_wavy"
os.makedirs(processed_normal_dir, exist_ok=True)
os.makedirs(processed_wavy_dir, exist_ok=True)

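# Process images
data_dir = "data"
for filename in os.listdir(data_dir):
    img_path = os.path.join(data_dir, filename)
    # Skip sub-folders, including the two output folders created above
    if not os.path.isfile(img_path):
        continue
    # Force three channels so every saved image has the same pixel format
    img = Image.open(img_path).convert("RGB")

    # Scale the image to the fixed input size
    img = img.resize((224, 224))

    # Decide in which folder the image should be stored
    if "normal" in filename:
        save_dir = processed_normal_dir
    else:
        save_dir = processed_wavy_dir
    # Save the image
    save_path = os.path.join(save_dir, filename)
    img.save(save_path)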

(2) Data loading

There are 301 pictures in total. One picture (normal1.jpg) is set aside for a separate spot check, which leaves 300 pictures that divide evenly: 270 for training and 30 for validation.

img_width, img_height = 224, 224
num_classes = 2
batch_size = 10

datagen = ImageDataGenerator(rescale=1./255)

X = []
y = []
normal_dir = "data/processed_normal"
wavy_dir = "data/processed_wavy"

for img_name in os.listdir(normal_dir):
    img_path = os.path.join(normal_dir, img_name)
    X.append(img_path)
    y.append('0')
for img_name in os.listdir(wavy_dir):
    img_path = os.path.join(wavy_dir, img_name)
    X.append(img_path)
    y.append('1')

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1, random_state=42)
train_df = pd.DataFrame(data={'filename': X_train, 'class': y_train})
val_df = pd.DataFrame(data={'filename': X_val, 'class': y_val})

train_generator = datagen.flow_from_dataframe(
        ...  # arguments omitted in the original

validation_generator = datagen.flow_from_dataframe(
        ...  # arguments omitted in the original

Found 270 validated image filenames belonging to 2 classes.
Found 30 validated image filenames belonging to 2 classes.
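
Note that the loading code labels normal roads as '0' and potholed roads as '1', while Table 1 asks for the opposite convention in test_result.csv (1 for normal, 0 for pothole). The sketch below shows one way to flip the label when writing the submission file. The helper names, the sample file names, and the 0.5 threshold are assumptions for illustration; only the header fields `fnames` and `label` come from the problem statement.

```python
import csv
import io


def to_submission_label(pothole_probability, threshold=0.5):
    """Map a sigmoid output (class '1' = pothole in the loading code)
    to the submission convention (1 = normal road, 0 = pothole road)."""
    predicted_class = 1 if pothole_probability >= threshold else 0
    return 1 - predicted_class


def write_test_result(rows, out_file):
    """Write (file name, pothole probability) pairs in the test_result.csv layout."""
    writer = csv.writer(out_file)
    writer.writerow(["fnames", "label"])
    for fname, prob in rows:
        writer.writerow([fname, to_submission_label(prob)])


# Hypothetical predictions; an in-memory buffer stands in for the real file
buffer = io.StringIO()
write_test_result([("a.jpg", 0.93), ("b.jpg", 0.08)], buffer)
csv_text = buffer.getvalue()
```

To write the real file, the same function can be given a handle opened with `open("test_result.csv", "w", newline="")`.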

3.2 Convolution model training

(1) Define convolutional network

model = Sequential()
model.add(Convolution2D(32, (3, 3), input_shape=(img_width, img_height,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
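
It can help to check how large this network is relative to 270 training images. The sketch below recomputes the feature-map sizes and parameter counts by hand, assuming the Keras defaults used above: 3×3 convolutions with 'valid' padding and stride 1, and 2×2 max pooling that rounds down.

```python
def conv_out(size, kernel=3):
    """Output width/height of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output width/height of non-overlapping max pooling (rounds down)."""
    return size // pool


size, channels, total_params = 224, 3, 0
for filters in (32, 32, 64):
    # Each filter has 3*3*channels weights plus one bias
    total_params += (3 * 3 * channels + 1) * filters
    size = pool_out(conv_out(size))
    channels = filters

flat_features = size * size * channels      # input to Flatten -> Dense(64)
dense1_params = (flat_features + 1) * 64
dense2_params = (64 + 1) * 1
total_params += dense1_params + dense2_params
```

Almost all of the roughly 2.8 million parameters sit in the first dense layer, which is one reason dropout and data augmentation matter for such a small data set.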

(2) Model training

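epochs = 20
train_samples = 270
validation_samples = 30
batch_size = 10
# Model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x
model.fit(
        train_generator,
        steps_per_epoch=train_samples // batch_size,
        epochs=epochs,
        validation_data=validation_generator,
        validation_steps=validation_samples // batch_size)

os.makedirs('models', exist_ok=True)
model.save_weights('models/basic_cnn_20_epochs.h5')
# Load the provided 30-epoch weights; this replaces the weights just trained
model.load_weights('models_trained/basic_cnn_30_epochs.h5')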
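
The sample counts and step counts used in training follow from simple arithmetic, sketched below. Integer arithmetic is used for the 10% split to avoid floating-point rounding. The variable `leftover_without_holdout` is only there to show why one image is held out.

```python
total_images = 301
held_out = 1          # the single image kept aside for a spot check
batch_size = 10

usable = total_images - held_out                 # 300
validation_samples = usable * 10 // 100          # 10% validation split -> 30
train_samples = usable - validation_samples      # 270

steps_per_epoch = train_samples // batch_size        # 27 batches per epoch
validation_steps = validation_samples // batch_size  # 3 batches

# Without the hold-out, 271 training images would leave one image
# outside the 27 full batches in every epoch
leftover_without_holdout = (total_images - validation_samples) % batch_size
```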

(3) Model verification

# Predict the one picture that was held out of the 300-image split
img = load_img('data/normal1.jpg', target_size=(img_width, img_height))
x = img_to_array(img) / 255.0  # apply the same rescaling as the generators
prediction = model.predict(np.expand_dims(x, axis=0), verbose=0)
print(prediction)

0

# The second argument is a number of batches, not a number of samples
model.evaluate(validation_generator, steps=validation_samples // batch_size)

[0.7280968427658081, 0.8999999761581421]

3.3 Data Enhancement Training

(1) Data enhancement

Data enhancement (augmentation) artificially enlarges the data set with new, unseen images by applying random transformations to the training set. This reduces overfitting and gives the network better generalization capabilities.

train_datagen_augmented = ImageDataGenerator(
        rescale=1./255, # normalize pixel values to [0,1]
        shear_range=0.2, # randomly apply a shearing transformation
        zoom_range=0.2, # randomly zoom in or out
        horizontal_flip=True) # randomly flip the images horizontally

train_generator_augmented = train_datagen_augmented.flow_from_dataframe(
       ...  # arguments omitted in the original

(2) Model training

# Model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x
model.fit(
        train_generator_augmented,
        steps_per_epoch=train_samples // batch_size,
        epochs=epochs,
        validation_data=validation_generator,
        validation_steps=validation_samples // batch_size)

(3) Model evaluation

model.save_weights('models/augmented_20_epochs.h5')
#model.load_weights('models_trained/augmented_30_epochs.h5')

# The second argument is a number of batches, not a number of samples
model.evaluate(validation_generator, steps=validation_samples // batch_size)

[0.2453145980834961, 0.8666666746139526]
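
With only 30 validation images, accuracy moves in steps of 1/30, so small differences between runs should be read with care. The sketch below converts the two accuracies reported so far back into counts of correctly classified images, assuming every validation image is weighted equally.

```python
validation_samples = 30

# Accuracies reported in Sections 3.2 and 3.3
reported = {
    "basic CNN": 0.8999999761581421,
    "augmented CNN": 0.8666666746139526,
}

correct_counts = {
    name: round(acc * validation_samples) for name, acc in reported.items()
}
step = 1 / validation_samples  # accuracy change caused by a single image
```

The two models differ by a single validation image, so this comparison alone does not show which model is better; the lower loss of the augmented model (0.245 versus 0.728) is the more informative signal here.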

3.4 Pre-trained model

By using a general-purpose, pre-trained image classifier, it is possible to surpass previous models in terms of performance and efficiency. This example uses VGG16, a model trained on the ImageNet dataset, which contains millions of images classified into 1000 categories.

(1) Load the weights of the VGG model

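# Convolutional base only (include_top=False), using the same definition as in Section 3.5
weights_path = 'weight/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5'
model_vgg = applications.VGG16(include_top=False, weights=weights_path, input_shape=(224, 224, 3))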

# shuffle=False keeps the feature rows in the same order as the dataframe rows,
# so the labels can be read from the dataframes afterwards
train_generator_bottleneck = datagen.flow_from_dataframe(
        dataframe=train_df,
        directory=None,
        x_col='filename',
        y_col='class',
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='binary',
        shuffle=False)

validation_generator_bottleneck = datagen.flow_from_dataframe(
        dataframe=val_df,
        directory=None,
        x_col='filename',
        y_col='class',
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='binary',
        shuffle=False)

(2) Use the model to extract features

os.makedirs('models', exist_ok=True)
bottleneck_features_train = model_vgg.predict(train_generator_bottleneck, steps=train_samples // batch_size)
np.save('models/bottleneck_features_train.npy', bottleneck_features_train)

bottleneck_features_validation = model_vgg.predict(validation_generator_bottleneck, steps=validation_samples // batch_size)
np.save('models/bottleneck_features_validation.npy', bottleneck_features_validation)

(3) Read preprocessed data

train_data = np.load('models/bottleneck_features_train.npy')
# Labels follow the dataframe order; the two classes are not two equal, sorted halves
train_labels = train_df['class'].astype(int).to_numpy()[:len(train_data)]

validation_data = np.load('models/bottleneck_features_validation.npy')
validation_labels = val_df['class'].astype(int).to_numpy()[:len(validation_data)]

(4) Fully connected network model training

model_top = Sequential()
model_top.add(Flatten(input_shape=train_data.shape[1:]))
model_top.add(Dense(256, activation='relu'))
model_top.add(Dropout(0.5))
model_top.add(Dense(1, activation='sigmoid'))

model_top.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

model_top.fit(train_data, train_labels,
        epochs=epochs,
        batch_size=batch_size,
        validation_data=(validation_data, validation_labels))

model_top.save_weights('models/bottleneck_20_epochs.h5')

(5) Model evaluation

model_top.evaluate(validation_data, validation_labels)

[2.3494818210601807, 0.4333333373069763]
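
An accuracy of about 0.43 on a two-class problem is close to chance. One common cause in bottleneck-feature pipelines is that the order of the extracted features and the order of the labels disagree, for example when images are yielded in a shuffled order but the labels are built as "first half class 0, second half class 1". The toy sketch below illustrates the effect; the label sequence is invented, and this is offered as a likely explanation rather than a confirmed diagnosis.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the two label sequences agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# True classes in the order the features were extracted
# (interleaved here to stand in for a shuffled order)
true_in_feature_order = [0, 1, 0, 1, 0, 1, 0, 1]

# Labels built under the assumption "first half class 0, second half class 1"
n = len(true_in_feature_order)
assumed_labels = [0] * (n // 2) + [1] * (n // 2)

label_agreement = accuracy(true_in_feature_order, assumed_labels)
```

Only half of the assumed labels match the true classes, so a classifier trained and evaluated on them is effectively fitting noise.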

3.5 Fine-tuning the pre-trained model

Build a classifier model on top of the convolutional base. Fine-tuning should start from a fully trained classifier, so the weights of the top model trained in Section 3.4 are loaded first, and this classifier is then attached to the convolutional base.

weights_path = 'weight/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5'
model_vgg = applications.VGG16(include_top=False, weights=weights_path, input_shape=(224, 224, 3))

top_model = Sequential()
top_model.add(Flatten(input_shape=model_vgg.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(1, activation='sigmoid'))

top_model.load_weights('models/bottleneck_20_epochs.h5')

model = Model(inputs=model_vgg.input, outputs=top_model(model_vgg.output))

# Fine-tuning only requires training a few layers. Freeze the first 15 layers
# (everything before the last convolutional block) so they are not trained.
for layer in model_vgg.layers[:15]:
    layer.trainable = False

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(learning_rate=1e-4, momentum=0.9),
              metrics=['accuracy'])
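
To see what freezing the first 15 layers means, the sketch below lists the layers of the VGG16 convolutional base and splits them at index 15. The block names follow the Keras VGG16 naming; the input layer's actual name varies between versions, so "input" is a placeholder.

```python
# Layers of VGG16 with include_top=False, in order
vgg16_base_layers = [
    "input",
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]

frozen = vgg16_base_layers[:15]     # kept at their ImageNet weights
trainable = vgg16_base_layers[15:]  # updated during fine-tuning
```

Only the last convolutional block and the classifier on top are trained. Freezing more or fewer blocks is one of the settings worth experimenting with.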

Data augmentation

# Data enhancement
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

# Use the augmenting generator for training and the plain one for validation
train_generator = train_datagen.flow_from_dataframe(
      ...  # arguments omitted in the original

validation_generator = test_datagen.flow_from_dataframe(
        ...  # arguments omitted in the original

Model fine-tuning

# Fine-tune the model
# Model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x
model.fit(
    train_generator,
    steps_per_epoch=train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_samples // batch_size)

model.save_weights('models/finetuning_20epochs_vgg.h5')
model.load_weights('models/finetuning_20epochs_vgg.h5')

Model evaluation

# The second argument is a number of batches, not a number of samples
model.evaluate(validation_generator, steps=validation_samples // batch_size)

[nan, 0.8666666746139526]

In the end, the model does not converge when trained this way (the loss is NaN). This suggests that some of the network settings are unreasonable: the number of frozen layers, the choice of network, and whether data enhancement is used can all affect the outcome. This approach is left for students to improve.
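
When debugging a run like this, a useful first step is to find the epoch at which the loss stops being a finite number. The sketch below does this for a list of per-epoch losses. The loss values are invented and `first_non_finite` is a hypothetical helper; Keras also provides the `tf.keras.callbacks.TerminateOnNaN` callback, which stops training as soon as the loss becomes NaN.

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN or infinite entry, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


# Hypothetical per-epoch training losses from a diverging run
losses = [0.71, 0.64, 0.93, float("nan"), float("nan")]
diverged_at = first_non_finite(losses)
```

If the loss rises just before it turns into NaN, lowering the learning rate or checking the input scaling are reasonable things to try first.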

4 Download the complete program

The code above is incomplete. For the complete code, please download the source files, which include the trained models and weight files.