Exploring the impact of image resolution on model performance: developing a crack and defect detection and identification system for concrete structures such as bridges, tunnels, and culverts based on YOLOv5x

In several previous articles we have already done a good deal of hands-on project development in this area. If you are interested, you can read them for yourself:

“Intelligent safety inspection of tunnel scenes: development and construction of an infrastructure defect detection and identification system based on yolov7/yolov7x/yolov7e6e”

“Facilitating intelligent highway maintenance: development and construction of a highway crack detection and identification system based on YOLOv5s with integrated SPD-BIFPN-SE”

“Development and construction of a detection system for cracks, spalling, and other defects in tunnel infrastructure based on lightweight YOLOv5s”

These are only the projects completed just before the holiday; earlier work is not listed here.

In past project development, most work was based on existing datasets, and little attention was paid to problems in the data itself. In other words, improvements and optimizations were mostly pursued at the algorithm and model level, while the quality of the data source received less attention. During the recent holiday I met some old friends back in my hometown, and over dinner we talked about work. One friend, drawing on his own project experience, argued that improving the quality of the data source is often more effective at improving the final results than tweaking the model. The main purpose of this article is therefore to train the same model on datasets of different quality and compare the resulting performance.

Before getting into the main content, let's review some concepts that are easily confused.

The file size of an image depends on several factors, including:

  1. Resolution: Resolution refers to the number of pixels per inch in an image, usually expressed in PPI (Pixels Per Inch). At the same print or display size, a high-resolution image shows more detail and finer quality than a low-resolution one.
  2. Image depth: Image depth refers to the number of bits used per pixel, which determines the color resolution of the image. The greater the image depth, the more colors can be represented and the more realistic the image looks.
  3. Image format: Different image formats affect file size differently. Some formats compress the image data, making the file smaller but possibly losing quality; conversely, high-quality formats retain more image information at the cost of a larger file.
  4. Color mode: Color mode also affects file size. For example, an RGB image uses 24 bits per pixel to store color information, while a grayscale or black-and-white image uses fewer bits.
  5. Compression algorithm: Some compression algorithms (such as those used by JPEG and PNG) can shrink the file while largely preserving image quality.
  6. Dimensions: The pixel dimensions (width × height) also matter; the larger the dimensions, the more pixels the image contains and the larger the file naturally becomes.
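Putting factors 2, 4, and 6 together, the raw (uncompressed) size of an image can be estimated directly from its pixel dimensions and bit depth. A minimal sketch in Python (the 24-bit RGB and 8-bit grayscale figures come from the color-mode discussion above; the example dimensions are illustrative):

```python
def raw_image_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Estimate the uncompressed size of an image in bytes."""
    total_bits = width_px * height_px * bits_per_pixel
    return total_bits // 8

# A 1920x1080 image in 24-bit RGB occupies about 6.2 MB uncompressed,
# while the same image in 8-bit grayscale needs only a third of that.
rgb = raw_image_bytes(1920, 1080, 24)   # 6_220_800 bytes
gray = raw_image_bytes(1920, 1080, 8)   # 2_073_600 bytes
```

This is only the in-memory upper bound; formats like JPEG or PNG (factor 5) compress well below it.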

Image resolution refers to the amount of information stored in an image, usually measured as the number of pixels per inch, with the unit PPI (Pixels Per Inch). At the same print or display size, a high-resolution image shows more detail and finer quality than a low-resolution one. In general, the higher the resolution, the better the image quality, making it suitable for large-format printing or high-definition display; lower-resolution images take up less storage space and are suitable for network transmission or small-format printing.

Image resolution and image size are two different concepts, and they affect the quality and size of an image in different ways.

Image resolution is the number of pixels per inch, usually expressed in PPI (Pixels Per Inch). It is an important indicator of image clarity and determines the detail and display quality of the image. A high-resolution image contains more pixels and can therefore show more detail and richer color. In general, the higher the resolution, the better the image quality, making it suitable for large-format printing or high-definition display.

Image size usually refers to the width and height of the image, expressed in pixels, in length units (centimeters, millimeters, inches, etc.), or in other units. Size describes how large the image is and determines how much space it occupies on the display device. With the resolution held fixed, changing the size of an image directly affects its displayed detail.

Although both affect the quality and footprint of an image, they describe different aspects of it: resolution concerns clarity and detail, while size concerns how large the image appears on screen or paper. When processing images, choose the resolution and size appropriate to the actual requirements to get the best quality and display effect.
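The relationship between pixel dimensions, PPI, and physical print size described above reduces to a simple division. A minimal sketch (the example numbers are illustrative):

```python
def print_size_inches(width_px: int, height_px: int, ppi: int) -> tuple:
    """Physical print size in inches for given pixel dimensions and resolution (PPI)."""
    return width_px / ppi, height_px / ppi

# The same 3000x2000 image prints at 10 x ~6.67 in at 300 PPI (high quality),
# but would stretch to ~41.7 x ~27.8 in at 72 PPI, with visibly coarser detail.
w_hi, h_hi = print_size_inches(3000, 2000, 300)
w_lo, h_lo = print_size_inches(3000, 2000, 72)
```

The pixel count is unchanged in both cases; only how densely those pixels are laid onto paper differs.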

The same dataset as before is used here, but it is manually split into two versions: a low-quality set with reduced resolution and dimensions, and a high-quality set consisting of the original camera captures.
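To build the low-quality version, each original image can be downscaled so that its longer side does not exceed a fixed limit, preserving the aspect ratio. A minimal sketch of the size calculation (the 640-pixel limit is an assumption for illustration, not the value used in the experiment; the actual resizing would be done with a library such as Pillow or OpenCV):

```python
def downscale_dims(width: int, height: int, max_side: int = 640) -> tuple:
    """Target dimensions after shrinking so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:  # already small enough, keep as-is
        return width, height
    scale = max_side / longest
    return round(width * scale), round(height * scale)

# A 4000x3000 camera capture shrinks to 640x480 for the low-quality set.
print(downscale_dims(4000, 3000))
```

Keeping the aspect ratio fixed means only pixel density is degraded, so the comparison isolates resolution rather than introducing distortion.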

At the model level, the x variant is used, the model with the largest number of parameters in the YOLOv5 series. Its configuration is as follows:

# YOLOv5  by Ultralytics, GPL-3.0 license

# Parameters
nc: 7 # number of classes
depth_multiple: 1.33 # model depth multiple
width_multiple: 1.25 # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23] # P3/8
  - [30,61, 62,45, 59,119] # P4/16
  - [116,90, 156,198, 373,326] # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]], # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]], # cat backbone P4
   [-1, 3, C3, [512, False]], # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]], # cat backbone P3
   [-1, 3, C3, [256, False]], # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]], # cat head P4
   [-1, 3, C3, [512, False]], # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]], # cat head P5
   [-1, 3, C3, [1024, False]], # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
  ]

Training runs for the default 100 epochs. Progress on the high-quality dataset is visibly very slow: a single epoch takes roughly 30 times longer than on the low-quality dataset.

After a long wait, all training finally completed. Let's look at the overall comparison and analysis:

【Precision Curve】
The precision curve is a visual tool for evaluating the precision of a binary classification model under different thresholds. By plotting precision (together with the corresponding recall) as the threshold varies, it helps us understand how the model performs at each threshold.
Precision is the ratio of samples correctly predicted as positive to all samples predicted as positive. Recall is the ratio of samples correctly predicted as positive to all samples that are actually positive.
The steps to draw the precision curve are as follows:
  1. Convert predicted probabilities into binary class labels using different thresholds: when the predicted probability exceeds the threshold, the sample is classified as positive, otherwise as negative.
  2. For each threshold, compute the corresponding precision and recall.
  3. Plot the precision and recall at each threshold on the same graph to form the precision curve.
Based on the shape and trend of the precision curve, an appropriate threshold can be chosen to meet the required performance.
By observing the precision curve, we can pick the threshold that best balances precision and recall for our needs: higher precision means fewer false positives, while higher recall means fewer false negatives. Depending on the specific business requirements and cost trade-offs, a suitable operating point can be selected on the curve.
Precision curves are often used together with recall curves to give a more comprehensive analysis of classifier performance and to help evaluate and compare different models.
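The steps above can be sketched in pure Python; the scores and labels here are made-up values for illustration only:

```python
def precision_recall(scores, labels, threshold):
    """Binarize scores at the threshold, then compute precision and recall."""
    preds = [s > threshold for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))       # true positives
    fp = sum(p and not y for p, y in zip(preds, labels))   # false positives
    fn = sum((not p) and y for p, y in zip(preds, labels)) # false negatives
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention: 1.0 if nothing predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.9, 0.8, 0.7, 0.4, 0.3]  # predicted probabilities
labels = [1, 1, 0, 1, 0]            # ground truth
# Sweeping the threshold traces out the curve point by point.
points = [precision_recall(scores, labels, t) for t in (0.2, 0.5, 0.75)]
```

A low threshold drives recall up at the cost of precision; a high threshold does the reverse, which is exactly the trade-off the curve visualizes.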

【Recall Curve】
The recall curve is a visualization tool for evaluating the recall of a binary classification model under different thresholds. By plotting recall (together with the corresponding precision) as the threshold varies, it helps us understand the model's behavior at each threshold.
Recall is the ratio of samples correctly predicted as positive to all samples that are actually positive; it is also called sensitivity or the true positive rate.
The steps to draw the recall curve are as follows:
  1. Convert predicted probabilities into binary class labels using different thresholds: when the predicted probability exceeds the threshold, the sample is classified as positive, otherwise as negative.
  2. For each threshold, compute the corresponding recall and precision.
  3. Plot the recall and precision at each threshold on the same graph to form the recall curve.
Based on the shape and trend of the recall curve, an appropriate threshold can be chosen to meet the required performance.
By observing the recall curve, we can pick the threshold that best balances recall and precision for our needs: higher recall means fewer false negatives, while higher precision means fewer false positives. Depending on the specific business requirements and cost trade-offs, a suitable operating point can be selected on the curve.
Recall curves are often used together with precision curves to give a more comprehensive analysis of classifier performance and to help evaluate and compare different models.

【F1 value curve】
The F1 value curve is a visual tool for evaluating the performance of a binary classification model under different thresholds. By plotting precision, recall, and the F1 score as the threshold varies, it helps us understand the overall performance of the model.
The F1 score is the harmonic mean of precision and recall, taking both indicators into account. The F1 value curve helps us find the balance point between precision and recall and thus choose the best threshold.
The steps to draw the F1 value curve are as follows:
  1. Convert predicted probabilities into binary class labels using different thresholds: when the predicted probability exceeds the threshold, the sample is classified as positive, otherwise as negative.
  2. For each threshold, compute the corresponding precision, recall, and F1 score.
  3. Plot the precision, recall, and F1 score at each threshold on the same graph to form the F1 value curve.
Based on the shape and trend of the F1 value curve, an appropriate threshold can be chosen to meet the required performance.
F1 value curves are often used together with receiver operating characteristic (ROC) curves to help evaluate and compare different models; together they provide a more comprehensive analysis of classifier performance, allowing suitable models and thresholds to be chosen for a given application.
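As noted above, the F1 score is the harmonic mean of precision and recall; a minimal sketch:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The harmonic mean punishes imbalance: high precision cannot mask poor recall.
f1_score(0.5, 0.5)   # 0.5
f1_score(0.9, 0.1)   # ~0.18, far below the arithmetic mean of 0.5
```

This is why maximizing F1 tends to pick a threshold where precision and recall are of similar magnitude.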

【Full training loss curve】

From the overall comparison, it is not hard to see that the high-quality dataset achieves better results across the board under the exact same training configuration. However, behind this result lies a dozens-fold increase in computation, so whether the trade-off is worthwhile in real projects remains open to question. Still, the experiment shows that if you want to squeeze out extra accuracy, the data source is a point worth focusing on.