[2023 MathorCup College Mathematical Modeling Challenge-Big Data Competition] Track A: Pothole road detection and identification based on computer vision python code analysis
1 Question
Pothole detection and identification is a computer vision task that aims to identify roads with potholes from digital images, usually images of the ground surface. It is of great significance to research and applications in fields such as geophysical exploration, aerospace science, and natural disasters. For example, it can help identify potholes from Earth orbit, as well as analyze and model the morphology of the Earth's surface.
In the task of pothole road detection, traditional classification algorithms often cannot achieve good results because the features of pothole images are often very complex and changeable. However, the development of deep learning technology in recent years has provided new solutions for pothole road detection.
Deep learning has strong feature extraction and representation capabilities and can automatically extract the most important features from images. In the pothole image classification task, deep learning can be used to extract features such as the contour, texture, and morphology of potholes and convert them into a representation that is easier to classify. Classification performance can be further improved through techniques such as transfer learning and knowledge distillation. For example, some researchers use deep learning-based methods to classify road images into normal and pothole categories; others use transfer learning to learn features of pothole images from general pre-trained models and then use these features to classify the images.
This competition question aims to automatically identify potholes in new road images by analyzing labeled road images, extracting features, and building a model. The specific tasks are as follows:
Preliminary Questions
Question 1: Combine the given image files, extract image features, and build a model with high recognition rate, fast speed, and accurate classification, which can be used to identify whether the roads in the image are normal or potholes.
Question 2: Train the model built in Question 1 and evaluate the model from different dimensions.
Question 3: Use the trained model to identify pothole images in the test set, and put the recognition results in “test_result.csv”. (Note: The test set will be released 48 hours before the end of the competition. The download link will be announced, please pay attention to the registration website in time)
Attachment description:
Attachment 1: data.zip;
The training data set contains a total of 301 images in the file.
A file name containing the string "normal" indicates a normal road; any other file name indicates a potholed road.
Figure 1: Example of normal road
Figure 2: Example of potholed road
Attachment 2: test_result.csv;
Submit the test result file. The header in the file remains unchanged. The data is only an example. Delete it and refill it when submitting. See the table below for field descriptions.
Table 1: test_result table field description
Field | Description |
---|---|
fnames | File name of the test image |
label | Classification identification: fill in 1 and 0, 1 represents normal road; 0 represents pothole road |
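The result file described in Table 1 can be produced with Python's standard `csv` module. The following is a minimal sketch, not part of the official materials: the helper name `write_test_result` and the sample file names are hypothetical, and only the two column names and the 1 = normal / 0 = pothole convention come from the task description.

```python
import csv
import io


def write_test_result(predictions, out_file):
    """Write (file name, label) pairs in the test_result.csv layout.

    predictions: iterable of (fname, label), with label 1 = normal, 0 = pothole.
    out_file: any writable text file object.
    """
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["fnames", "label"])  # header from Table 1, kept unchanged
    for fname, label in predictions:
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        writer.writerow([fname, label])


# Hypothetical file names, for illustration only.
buffer = io.StringIO()
write_test_result([("img_a.jpg", 1), ("img_b.jpg", 0)], buffer)
csv_text = buffer.getvalue()
```

To write a real file, the same function can be called with `open("test_result.csv", "w", newline="")` in place of the in-memory buffer.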
Attachment 3: test_data.zip
The test data set contains thousands of images in the file. The specific number is subject to the published data.
The download link for the test data set will be announced 48 hours before the end of the competition. Please pay attention to the registration website in time.
2 Idea analysis
The training set contains only 301 images, so this is a small-sample problem. The following process establishes a baseline, after which each part can be optimized step by step.
(1) Data preprocessing:
- Resize the image: Since the deep learning model requires input images of a fixed size, an image processing routine (such as the resize function in the OpenCV library) can be used to scale all images to the same size. In the following example, the unified size is 224×224.
- Data augmentation: Image transformations (such as translation, rotation, and flipping, for example with the OpenCV library) can be applied to the images to expand the number of samples and increase data diversity.
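The two preprocessing steps above can be sketched without any image library. This example deliberately swaps OpenCV for plain NumPy so that it stays self-contained: it uses a nearest-neighbour resize and a flip plus wrap-around shift as the augmentation. The function names, the shift range, and the random stand-in image are assumptions made for illustration.

```python
import numpy as np


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array to (out_h, out_w, C)."""
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return image[rows[:, None], cols[None, :]]


def augment(image, rng):
    """Return a randomly flipped and horizontally shifted copy of the image."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    shift = int(rng.integers(-10, 11))  # translation in pixels, -10..10
    return np.roll(out, shift, axis=1)  # pixels wrap around at the border


rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)  # stand-in image
resized = resize_nearest(raw, 224, 224)
augmented = augment(resized, rng)
```

In practice an interpolating resize (as provided by OpenCV or PIL) usually gives smoother results than nearest-neighbour sampling; the sketch only shows the shape of the pipeline.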
(2) Feature extraction:
- Feature extraction based on traditional computer vision algorithms: Traditional image feature extraction algorithms (such as SIFT, HOG, LBP, etc.) can be used to extract local or global features of the image for training the model.
- Feature extraction based on deep learning models: Pre-trained convolutional neural networks (such as VGG, ResNet, Inception, etc.) can be used to extract high-level features of images, and these features can then serve as input for training a classifier. An example of feature extraction with VGG is given in Section 3.4.
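As an illustration of the traditional route, the sketch below computes a basic 3×3 local binary pattern (LBP) histogram with NumPy. It is a simplified variant written for this explanation (no rotation invariance, no uniform patterns), and the function name `lbp_histogram` is an assumption; a library implementation would normally be preferred for real experiments.

```python
import numpy as np


def lbp_histogram(gray):
    """Basic 3x3 local binary pattern histogram of a 2-D grayscale array."""
    gray = np.asarray(gray, dtype=np.int32)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # Offsets of the 8 neighbours, clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the center.
        codes |= (neighbour >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()  # normalise so that image size does not matter


flat_hist = lbp_histogram(np.full((8, 8), 100))
peak = np.zeros((3, 3), dtype=np.int32)
peak[1, 1] = 9  # one bright pixel surrounded by darker neighbours
peak_hist = lbp_histogram(peak)
```

The resulting 256-bin vector can be fed to any classifier as a texture descriptor.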
(3) Visual analysis data set:
- Use an image processing algorithm (such as the imshow function in the OpenCV library) to display the image: You can randomly select some sample images of normal roads and pothole roads, and use an image processing algorithm to visually display them to understand the characteristics and difficulties of the data set.
- Draw statistical charts such as histograms and scatter plots: Statistical plots, such as pixel histograms of normal and pothole road images or scatter plots of color features, can be used to observe the distribution of the data set and to judge whether the image characteristics separate the two classes.
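A quick numeric version of the histogram comparison described above might look as follows. This is only a sketch on synthetic stand-in images (the real images are not reproduced here); the function name, the bin count, and the use of histogram intersection as an overlap measure are assumptions.

```python
import numpy as np


def mean_gray_histogram(images, bins=32):
    """Average normalised grayscale histogram over (H, W, 3) uint8 images."""
    total = np.zeros(bins)
    for img in images:
        gray = img.mean(axis=2)  # simple channel average as grayscale
        hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
        total += hist / hist.sum()
    return total / len(images)


rng = np.random.default_rng(0)
# Synthetic stand-ins: "normal" images are brighter than "pothole" images.
normal_imgs = [rng.integers(120, 256, (64, 64, 3), dtype=np.uint8) for _ in range(5)]
pothole_imgs = [rng.integers(0, 136, (64, 64, 3), dtype=np.uint8) for _ in range(5)]

h_normal = mean_gray_histogram(normal_imgs)
h_pothole = mean_gray_histogram(pothole_imgs)
# Histogram intersection: 1.0 means identical distributions, 0.0 means no overlap.
overlap = np.minimum(h_normal, h_pothole).sum()
```

A small overlap suggests that raw intensity already carries some class information; a value near 1.0 suggests that texture or learned features are needed. The two histograms can also be passed to a plotting library for visual inspection.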
(4) Establish a deep learning model:
- The baseline uses convolutional neural networks (such as VGG, ResNet, Inception, etc.), autoencoders, recurrent neural networks, etc., and performs fine-tuning or transfer learning according to the characteristics of the data set.
- Other cutting-edge image classification techniques include:
  - Transfer learning: Transfer models trained on large-scale data sets (such as ImageNet) to small-sample problems, and solve the classification problem through fine-tuning or feature extraction.
  - Data augmentation: Use image transformations (such as rotation, translation, flipping, cropping, etc.) to increase the number and diversity of samples.
  - Generative Adversarial Network (GAN): Use a GAN generator to synthesize realistic samples and expand the data set.
  - Meta learning: Learn how to learn and generalize quickly from limited samples, and use the learned prior knowledge to make more efficient use of the samples.
  - Semi-supervised learning: Train on a small number of labeled samples together with a large number of unlabeled samples to improve classification accuracy.
  - Active learning: Actively select and label the most informative samples to reduce labeling cost and improve model performance.
  - Few-shot learning methods: Use algorithms designed specifically for small-sample problems, such as few-shot, one-shot, and zero-shot learning.
  - Incremental learning: Update the model incrementally to adapt to the introduction of new samples and the forgetting of old samples.
  - Model compression and quantization: Through techniques such as pruning, quantization, and distillation, reduce model parameters and computation to adapt to small-sample problems.
  - Ensemble learning: Combine the results of multiple classifiers, for example with bagging or boosting, to improve classification accuracy and robustness.
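Of the techniques listed above, ensemble learning is the easiest to illustrate without a training framework. The sketch below shows soft voting, which is one simple way to combine classifiers and is used here in place of bagging or boosting: the pothole probabilities of several models are averaged and then thresholded. The three models and their scores are invented for the example.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-model probabilities and threshold the mean."""
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    if any(len(p) != n_samples for p in prob_lists):
        raise ValueError("every model must score the same samples")
    mean_probs = [sum(p[i] for p in prob_lists) / n_models for i in range(n_samples)]
    labels = [1 if m >= threshold else 0 for m in mean_probs]
    return mean_probs, labels


# Made-up probabilities from three hypothetical models for four images.
model_a = [0.9, 0.2, 0.6, 0.4]
model_b = [0.8, 0.1, 0.4, 0.7]
model_c = [0.7, 0.3, 0.2, 0.6]
mean_probs, labels = soft_vote([model_a, model_b, model_c])
```

Averaging tends to help most when the individual models make different kinds of mistakes, for instance a small CNN and a fine-tuned pre-trained network.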
(5) Model evaluation and optimization:
- Use cross-validation methods to evaluate the model: You can use methods such as k-fold cross-validation to evaluate the model and obtain indicators such as accuracy and recall to judge the performance of the model.
- Adjust parameters and optimize the model: You can try different loss functions, optimizers, learning rates and other hyperparameters, as well as increase the size of the data set and reduce the complexity of the model to optimize the deep learning model.
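The evaluation step can be sketched in plain Python. The example below generates k-fold index splits and computes accuracy and recall from true and predicted labels; the function names, the fold count, and the toy labels are assumptions chosen for illustration, and a library such as scikit-learn offers equivalent utilities.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def accuracy_and_recall(y_true, y_pred, positive=1):
    """Accuracy over all samples and recall for the positive class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return correct / len(y_true), recall


splits = list(k_fold_indices(300, k=5))
acc, rec = accuracy_and_recall([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
```

With only about 300 images, averaging the metrics over all folds gives a steadier estimate than a single 270/30 split.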
3 Python code implementation
3.1 Data preprocessing
    import os

    import numpy as np
    import pandas as pd
    from PIL import Image
    from sklearn.model_selection import train_test_split
    from tensorflow.keras import applications, optimizers
    from tensorflow.keras.layers import (Activation, Convolution2D, Dense, Dropout,
                                         Flatten, MaxPooling2D, ZeroPadding2D)
    from tensorflow.keras.models import Model, Sequential
    from tensorflow.keras.preprocessing.image import (ImageDataGenerator, array_to_img,
                                                      img_to_array, load_img)
    # Resize every image to a uniform size and store it in a per-class folder.
    data_dir = "data"
    processed_normal_dir = "data/processed_normal"
    processed_wavy_dir = "data/processed_wavy"
    os.makedirs(processed_normal_dir, exist_ok=True)
    os.makedirs(processed_wavy_dir, exist_ok=True)

    for filename in os.listdir(data_dir):
        img_path = os.path.join(data_dir, filename)
        # Skip the output folders created above, which also live under data/.
        if not os.path.isfile(img_path):
            continue
        img = Image.open(img_path)
        img = img.resize((224, 224))  # scale to the network input size
        # File names containing "normal" are normal roads; all others are potholes.
        if "normal" in filename:
            save_dir = processed_normal_dir
        else:
            save_dir = processed_wavy_dir
        img.save(os.path.join(save_dir, filename))
(2) Data loading
There are 301 images in total. One image is set aside for a separate single-image test so that the remaining 300 divide evenly: 270 are used for training and 30 are held out as the validation set.
    img_width, img_height = 224, 224
    num_classes = 2
    batch_size = 10

    datagen = ImageDataGenerator(rescale=1./255)

    # Collect image paths and string labels: '0' = normal, '1' = pothole.
    # Note that Table 1 uses the opposite convention (1 = normal, 0 = pothole).
    X = []
    y = []
    normal_dir = "data/processed_normal"
    wavy_dir = "data/processed_wavy"
    for img_name in os.listdir(normal_dir):
        X.append(os.path.join(normal_dir, img_name))
        y.append('0')
    for img_name in os.listdir(wavy_dir):
        X.append(os.path.join(wavy_dir, img_name))
        y.append('1')

    # Hold out 10% of the images for validation.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1, random_state=42)
    train_df = pd.DataFrame(data={'filename': X_train, 'class': y_train})
    val_df = pd.DataFrame(data={'filename': X_val, 'class': y_val})

    # The arguments of the two calls below are omitted in the source.
    train_generator = datagen.flow_from_dataframe(...)
    validation_generator = datagen.flow_from_dataframe(...)
Found 270 validated image filenames belonging to 2 classes.
Found 30 validated image filenames belonging to 2 classes.
3.2 Convolution model training
(1) Define convolutional network
    model = Sequential()
    # Three convolution + ReLU + max-pooling stages
    model.add(Convolution2D(32, (3, 3), input_shape=(img_width, img_height, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Convolution2D(32, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Convolution2D(64, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Fully connected classifier with a single sigmoid output
    model.add(Flatten())
    model.add(Dense(64))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    model.compile(loss='binary_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
(2) Model training
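    epochs = 20
    train_samples = 270
    validation_samples = 30
    batch_size = 10

    # model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x.
    model.fit(
        train_generator,
        steps_per_epoch=train_samples // batch_size,
        epochs=epochs,
        validation_data=validation_generator,
        validation_steps=validation_samples // batch_size,
    )
    model.save_weights('models/basic_cnn_20_epochs.h5')
    # The source then loads separately trained 30-epoch weights,
    # which replace the weights that were just trained.
    model.load_weights('models_trained/basic_cnn_30_epochs.h5')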
(3) Model verification
    # Predict the single image that was set aside.
    # Resize and rescale it in the same way as the training data.
    img = load_img('data/normal1.jpg', target_size=(img_width, img_height))
    x = img_to_array(img) / 255.0
    prediction = model.predict(np.expand_dims(x, axis=0), verbose=0)
    print(prediction)
0
    # Returns [loss, accuracy] on the validation set.
    model.evaluate(validation_generator, steps=validation_samples // batch_size)
[0.7280968427658081, 0.8999999761581421]
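Because the network ends in a single sigmoid unit, its raw output is a probability rather than a class label. The loading code in Section 3.1 stores normal roads as class '0' and potholes as class '1', while Table 1 expects 1 for normal and 0 for pothole, so predictions need to be thresholded and flipped before submission. The helper below is a hypothetical sketch of that mapping; it assumes that the model output is the probability of the pothole class and that 0.5 is an acceptable threshold.

```python
def to_submission_label(pothole_prob, threshold=0.5):
    """Map a sigmoid output to the Table 1 convention (1 = normal, 0 = pothole).

    Assumes the network output is the probability of the pothole class,
    i.e. that potholes were encoded as class 1 during training.
    """
    if not 0.0 <= pothole_prob <= 1.0:
        raise ValueError("expected a probability in [0, 1]")
    is_pothole = pothole_prob >= threshold
    return 0 if is_pothole else 1


submission = [to_submission_label(p) for p in (0.03, 0.97, 0.5)]
```

If missing a pothole is considered worse than a false alarm, the threshold can be lowered and checked against the validation set.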
3.3 Data Augmentation Training
(1) Data augmentation
Data augmentation artificially expands the data set with new, unseen images by applying random transformations to the training set. This reduces overfitting and gives the network better generalization capabilities.
    train_datagen_augmented = ImageDataGenerator(
        rescale=1./255,         # normalize pixel values to [0, 1]
        shear_range=0.2,        # randomly apply a shearing transformation
        zoom_range=0.2,         # randomly zoom in or out
        horizontal_flip=True)   # randomly flip the images horizontally

    # The arguments of this call are omitted in the source.
    train_generator_augmented = train_datagen_augmented.flow_from_dataframe(...)
(2) Model training
    # model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x.
    model.fit(
        train_generator_augmented,
        steps_per_epoch=train_samples // batch_size,
        epochs=epochs,
        validation_data=validation_generator,
        validation_steps=validation_samples // batch_size,
    )
(3) Model evaluation
    model.save_weights('models/augmented_20_epochs.h5')
    # model.load_weights('models_trained/augmented_30_epochs.h5')
    # Returns [loss, accuracy] on the validation set.
    model.evaluate(validation_generator, steps=validation_samples // batch_size)
[0.2453145980834961, 0.8666666746139526]
3.4 Pre-trained model
By using a general-purpose, pre-trained image classifier, it is possible to surpass previous models in terms of performance and efficiency. This example uses VGG16, a model trained on the ImageNet dataset, which contains millions of images classified into 1000 categories.
(1) Load the weights of the VGG model
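    model_vgg = ...  # truncated in the source; Section 3.5 shows a VGG16 loading call without the top layers

    # shuffle=False keeps the extracted features in the same order as the data frames,
    # so that labels can later be matched to features.
    train_generator_bottleneck = datagen.flow_from_dataframe(
        dataframe=train_df,
        directory=None,
        x_col='filename',
        y_col='class',
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='binary',
        shuffle=False)
    validation_generator_bottleneck = datagen.flow_from_dataframe(
        dataframe=val_df,
        directory=None,
        x_col='filename',
        y_col='class',
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='binary',
        shuffle=False)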
(2) Use the model to extract features
    # Run the images through the convolutional base once and cache the outputs.
    bottleneck_features_train = model_vgg.predict(
        train_generator_bottleneck, steps=train_samples // batch_size)
    np.save('models/bottleneck_features_train.npy', bottleneck_features_train)

    bottleneck_features_validation = model_vgg.predict(
        validation_generator_bottleneck, steps=validation_samples // batch_size)
    np.save('models/bottleneck_features_validation.npy', bottleneck_features_validation)
(3) Read preprocessed data
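    train_data = np.load('models/bottleneck_features_train.npy')
    validation_data = np.load('models/bottleneck_features_validation.npy')

    # Take the labels from the generators instead of assuming a half-and-half split:
    # the random train/validation split is neither balanced nor sorted by class.
    # This only matches the features if the generators do not shuffle (shuffle=False).
    train_labels = np.array(train_generator_bottleneck.classes)[:len(train_data)]
    validation_labels = np.array(validation_generator_bottleneck.classes)[:len(validation_data)]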
(4) Fully connected network model training
    # Small fully connected classifier trained on the cached VGG features
    model_top = Sequential()
    model_top.add(Flatten(input_shape=train_data.shape[1:]))
    model_top.add(Dense(256, activation='relu'))
    model_top.add(Dropout(0.5))
    model_top.add(Dense(1, activation='sigmoid'))
    model_top.compile(optimizer='rmsprop',
                      loss='binary_crossentropy',
                      metrics=['accuracy'])
    model_top.fit(train_data, train_labels,
                  epochs=epochs,
                  batch_size=batch_size,
                  validation_data=(validation_data, validation_labels))
    model_top.save_weights('models/bottleneck_20_epochs.h5')
(5) Model evaluation
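    model_top.evaluate(validation_data, validation_labels)

[2.3494818210601807, 0.4333333373069763]
This result was reported with labels assumed to be half class 0 followed by half class 1. An accuracy below 50% on a two-class problem suggests that such labels do not line up with the extracted features, because the random split is not ordered by class and the generators shuffle by default.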
3.5 Fine-tuning the pre-trained model
Build a classifier model on top of the convolutional model. Fine-tuning should start from a classifier that has already been trained, so the weights of the fully connected model from Section 3.4 are loaded first. This classifier is then attached to the convolutional base.
    weights_path = 'weight/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5'
    model_vgg = applications.VGG16(include_top=False, weights=weights_path, input_shape=(224, 224, 3))

    # Classifier head with the same architecture as in Section 3.4,
    # initialised from the weights trained there.
    top_model = Sequential()
    top_model.add(Flatten(input_shape=model_vgg.output_shape[1:]))
    top_model.add(Dense(256, activation='relu'))
    top_model.add(Dropout(0.5))
    top_model.add(Dense(1, activation='sigmoid'))
    top_model.load_weights('models/bottleneck_20_epochs.h5')

    # Stack the head on top of the convolutional base.
    model = Model(inputs=model_vgg.input, outputs=top_model(model_vgg.output))

    # Fine-tuning only trains a few layers: freeze the first 15 layers
    # (everything before the last convolutional block).
    for layer in model_vgg.layers[:15]:
        layer.trainable = False

    model.compile(loss='binary_crossentropy',
                  optimizer=optimizers.SGD(learning_rate=1e-4, momentum=0.9),
                  metrics=['accuracy'])
Data augmentation

    train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
    test_datagen = ImageDataGenerator(rescale=1./255)

    # Use the augmenting generator for training and the plain one for validation.
    # The arguments of the two calls below are omitted in the source.
    train_generator = train_datagen.flow_from_dataframe(...)
    validation_generator = test_datagen.flow_from_dataframe(...)
Model fine-tuning
    # Fine-tune the model.
    # model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x.
    model.fit(
        train_generator,
        steps_per_epoch=train_samples // batch_size,
        epochs=epochs,
        validation_data=validation_generator,
        validation_steps=validation_samples // batch_size)
    model.save_weights('models/finetuning_20epochs_vgg.h5')
    model.load_weights('models/finetuning_20epochs_vgg.h5')
Model evaluation
    # Returns [loss, accuracy] on the validation set.
    model.evaluate(validation_generator, steps=validation_samples // batch_size)
[nan, 0.8666666746139526]
In the end, the model does not converge with this setup (the loss is nan), which indicates that some of the network settings are unreasonable. Factors such as the number of frozen layers, the network model used, and whether data augmentation is needed all affect the result. This method is left for students to improve.
4 Download the complete program
The above code is incomplete. If you need the complete code, please download the source file.
Includes trained model and weight files