This article develops a system for detecting and segmenting defects and deterioration in cement tunnel walls for tunnel-inspection scenarios, built on the full YOLOv5 series (n/s/m/l/x) across its different parameter scales.

In a previous article, we built a wall-defect detection and segmentation system for culvert scenarios based on the most lightweight model in the series, YOLOv5n. If you are interested, you can read it here:

“Development and construction of a cave wall defect detection and segmentation system in culvert scenarios based on lightweight yolov5n”

The core idea here is the same as in that article: to help automate manual operation-and-maintenance work in tunnel inspection and quality-inspection scenarios. In addition, this article trains and evaluates models at every parameter scale of the YOLOv5 series, to show how to choose the most appropriate model when developing a business application.

First, a look at the detection results:

Next, the dataset:

The data was collected from real scenes and annotated with labelme.
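YOLOv5 expects each labelme JSON annotation to be flattened into a plain-text label file. A minimal sketch of that conversion step (the class names below are hypothetical placeholders, not the project's actual label set; real files would be loaded with `json.load` first):

```python
# Hypothetical class list -- the article does not show the project's real label names.
CLASSES = ["crack", "spall", "seepage"]

def labelme_to_yolo(labelme: dict) -> list[str]:
    """Convert one labelme annotation dict into YOLO-seg label lines."""
    w, h = labelme["imageWidth"], labelme["imageHeight"]
    lines = []
    for shape in labelme["shapes"]:
        cls = CLASSES.index(shape["label"])
        # Normalize pixel coordinates to [0, 1] by image width/height.
        coords = " ".join(f"{x / w:.6f} {y / h:.6f}" for x, y in shape["points"])
        lines.append(f"{cls} {coords}")
    return lines

demo = {"imageWidth": 100, "imageHeight": 50,
        "shapes": [{"label": "crack", "points": [[10, 5], [20, 25], [40, 45]]}]}
print(labelme_to_yolo(demo))
```

Each polygon vertex becomes a normalized (x, y) pair, giving one label line per instance.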

The instance annotations look like this; each line stores one instance as a class index followed by the normalized (x, y) coordinates of its polygon mask:

2 0.9898554752640356 0.45831017231795446 0.6580739299610895 0.48533135692668766 0.6459143968871596 0.4644864430856649 0.5891699092088197 0.4629423753937373 0.5387947007596813 0.4637144092397011 0.48262923846581435 0.4783830523130134 0.45773114693348155 0.47297881539126674 0.44383453770613307 0.4698906800074115 0.4241476746340559 0.47915508615897723 0.4212525477116917 0.4930516953863258 0.49845593230807245 0.5231610153789142 0.5046322030757829 0.5579025384472855 0.49613983077018103 0.5909069853622383 0.49845593230807245 0.6349129145821752 0.4930516953863258 0.6517046507318881 0.4899635600024705 0.6499675745784695 0.5108084738434934 0.659811006114508 0.5146686430733124 0.6621271076523995 0.5262491507627695 0.69107837687 0.5463220307578284 0.9475866221975172 0.5687110122907788 0.9950667037242914 0.5748872830584892 0.9956457291087643 0.4629423753937373
4 0.6730859010270775 1.2249844382197324 0.6781921101774043 1.1919156551509493 0.6913223622782446 1.1666277622159975 0.68110 0.7073704481792717 1.0275443510737627 0.7044526143790849 0.9954481792717087 0.6956991129785247 0.9652972300031123 0.6646628641017992 0.9040328727176327 0.6582941892832289 0.8916061901448123 0.6577672735760971 0.878734827264239 0.6665207749766573 0.8359399315281668 0.6688568694701262 0.8141465354408987 0.6628734827264239 0.7766106442577031 0.6665207749766573 0.7338157485216309 0.6723564425770308 0.6598972922502335 0.6723564425770308 0.6151571739807035 0.6691675365344467 0.6004075951883885 0.671808206581171 0.5948155880306194 0.6738153594771242 0.5383208839091193 0.6767331932773109 0.4799642079053844 0.6701680672268908 0.4157718643012761 0.6657913165266107 0.3739495798319328 0.6716528730490108 0.3566375053848958 0.6722742071776517 0.34586771382178483 0.6727402077741326 0.3359263677635285 0.6760037348272642 0.31851073762838467 0.6744488766278953 0.30755210922225534 0.6746042101600556 0.29636809490671706 0.6792642161248632 0.2907760877489479 0.6794195496570234 0.283112966829042 0.6801962173178248 0.27669251416641816 0.6803804855275444 0.2630718954248366 0.684700889750472 0.2580524903071876 0.6878075603936772 0.2466613646154356 0.6898268963117605 0.24024091195281175 0.6913223622782446 0.21346872082166202 0.7066409897292251 0.1638655462184874 0.7219596171802054 0.13760504201680673 0.7358193277310925 0.23875661375661378 0.741654995331466 0.2436196700902583 0.7394666199813259 0.15219421101774044 0.7489495798319328 0.11523498288204173 0.7606209150326797 0.10064581388110802 0.7744806255835668 0.07049486461251168 0.7701038748832867 0.01700124494242143 0.7102882819794585 0.013110799875505775 0.7212301587301587 0.04131652661064428 0.7190417833800187 0.08022097727980083 0.7146650326797386 0.11426237161531282 0.708829365079365 0.1337145969498911 0.7098458235753318 0.14307398733628246 0.7000878220140516 0.16746899123948306 0.690736403851158 0.19891144071471942 0.6826047358834244 0.24173822534478273 0.6716269841269842 0.2834807875791483 0.6683743169398907 0.3165495706479314 0.6675611501431175 0.34636568652962096 0.6682355353414852 0.3556019485038275 0.6606492323705438 0.37347124642206614 0.6683743169398907 0.44231936854887677 0.6691874837366641 0.4694249284413219 0.6464188134270101 0.44828259172521473 0.6220238095238096 0.4260560326134097 0.6008814728077023 0.4087084742822448 0.5878708040593287 0.4087084742822448 0.5785193858964351 0.4087084742822448 0.5642889669529014 0.40816636308439597 0.559816549570648 0.4059979182930003 0.5439597970335676 0.41141903027148935 0.5403005464480876 0.4179243646456761 0.5248503773093938 0.4309350333940498 0.5142792089513402 0.4325613669875965 0.4951697892271663 0.43852459016393447 0.4858183710642727 0.44665625813166804 0.467928701535259 0.45424581490155264 0.45695094977881867 0.4607511492757395 0.46955503512880564 0.4618353716714373 0.48378545407233936 0.4574984820886461 0.4996422066094198 0.4488247029230636 0.5110265417642468 0.4396088125596323 0.520784543325527 0.4390667013617834 0.5350149622690606 0.42876658860265426 0.5468058808222743 0.4206349206349207 0.5569704657819413 0.4146716974585828 0.5695745511319282 0.42117703183276956 0.5866510538641686 0.41738225344782726 0.5980353890189957 0.41684014224997834 0.6122658079625294 0.4276823662069564 0.6155184751496228 0.43418770058114325 0.6285291438979964 0.441235146153179 0.6334081446786366 0.4471983693295169 0.6533307311995836 0.4661722612542285 0.6671545667447307 0.485688264376789 0.6691874837366641 0.5073727122907451 0.6671545667447307 0.563752276867031 0.6667479833463441 0.5897736143637783 0.6622755659640907 0.5984473935293608 0.6648181976339596 0.604549822712662 0.6665268664877223 0.6107631639990722 0.6667479833463441 0.625010842223957 0.6671545667447307 0.6575375140948911 0.6639018995576373 0.7274698586173995 0.6594294821753839 0.7730071992367075 0.6614623991673172 0.792523202359268 0.6641968635053185 0.8044123007588561 0.6643521970374787 0.821602544984591 0.662643528183716 0.8379643437054711 0.6561195198329853 0.8520479172880008 0.6536341833184212 0.8698594956423766 0.6527021821254597 0.8845644033535475 0.6561195198329853 0.8965768631739405 0.6742935430957351 0.9305431288729827 0.6887395615866387 0.9701014017297942 0.6864095586042349 1.0117307883487423 0.6860988915399144 1.0471468336812804 0.6778662143354209 1.0732428670842031 0.6746042101600556 1.0962322298439209 0.6758468784173376 1.1266776021473306 0.6808175514464657 1.1507025217881168 0.6834582214931901 0.6753808778208569 1.188396792259005 0.6721188736454915 1.2022732544653212 0.6699442041952479 1.221327501076979 0.6699442041952479 1.2273337309871755
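Reading these labels back is symmetric. A minimal sketch of parsing one label line into a class index and a polygon:

```python
def parse_instance(line: str):
    """Parse one YOLO-seg label line: 'class x1 y1 x2 y2 ...' (normalized coords)."""
    parts = line.split()
    cls = int(parts[0])
    vals = [float(v) for v in parts[1:]]
    if len(vals) % 2 != 0:
        raise ValueError("coordinates must come in (x, y) pairs")
    polygon = list(zip(vals[0::2], vals[1::2]))  # [(x1, y1), (x2, y2), ...]
    return cls, polygon

cls, poly = parse_instance("2 0.9898 0.4583 0.6580 0.4853 0.6459 0.4644")
print(cls, len(poly))  # prints: 2 3
```

Multiplying each pair by the image width and height recovers the pixel-space mask outline.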

All five models are trained with exactly the same default training configuration. Next, let's look at the results of each model in turn:
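All five runs can be launched the same way, with only the model yaml and pretrained weights changing per scale. A sketch that just assembles the commands (the dataset file name `wall_defects.yaml` and the epoch count are assumptions; `segment/train.py` is the segmentation training entry point in the ultralytics/yolov5 repository):

```python
# Build one training command per model scale; only cfg/weights differ.
scales = ["n", "s", "m", "l", "x"]
commands = [
    f"python segment/train.py --data wall_defects.yaml "
    f"--cfg models/segment/yolov5{s}-seg.yaml "
    f"--weights yolov5{s}-seg.pt --img 640 --epochs 100"
    for s in scales
]
for cmd in commands:
    print(cmd)
```

Keeping every flag identical across scales is what makes the later comparison fair.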

【n】

The model file looks like this:

# YOLOv5 by Ultralytics, GPL-3.0 license

# Parameters
nc: 3 # number of classes
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.25 # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23] # P3/8
  - [30,61, 62,45, 59,119] # P4/16
  - [116,90, 156,198, 373,326] # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]], # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]], # cat backbone P4
   [-1, 3, C3, [512, False]], # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]], # cat backbone P3
   [-1, 3, C3, [256, False]], # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]], # cat head P4
   [-1, 3, C3, [512, False]], # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]], # cat head P5
   [-1, 3, C3, [1024, False]], # 23 (P5/32-large)

   [[17, 20, 23], 1, Segment, [nc, anchors, 32, 256]], # Detect(P3, P4, P5)
  ]
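The only values that change between the five yaml files are depth_multiple and width_multiple; the layer lists are identical. A sketch of how YOLOv5 applies the two multiples when building the network (mirroring the rounding logic in models/yolo.py):

```python
import math

def scaled_depth(n: int, depth_multiple: float) -> int:
    """Repeat count for a block listed with 'number' n in the yaml."""
    return max(round(n * depth_multiple), 1) if n > 1 else n

def scaled_width(channels: int, width_multiple: float, divisor: int = 8) -> int:
    """Output channels, rounded up to a multiple of 8 (make_divisible)."""
    return math.ceil(channels * width_multiple / divisor) * divisor

# yolov5n (0.33 / 0.25): the 9-repeat C3 shrinks to 3 repeats,
# and the nominal 1024 channels shrink to 256.
print(scaled_depth(9, 0.33), scaled_width(1024, 0.25))  # prints: 3 256
```

For yolov5x (1.33 / 1.25) the same block instead gets round(9 * 1.33) = 12 repeats and 1280 channels, which is where the parameter-count gap between the scales comes from.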

The result details are as follows:

【s】

The model file looks like this:

# YOLOv5 by Ultralytics, GPL-3.0 license

# Parameters
nc: 3 # number of classes
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.5 # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23] # P3/8
  - [30,61, 62,45, 59,119] # P4/16
  - [116,90, 156,198, 373,326] # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]], # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]], # cat backbone P4
   [-1, 3, C3, [512, False]], # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]], # cat backbone P3
   [-1, 3, C3, [256, False]], # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]], # cat head P4
   [-1, 3, C3, [512, False]], # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]], # cat head P5
   [-1, 3, C3, [1024, False]], # 23 (P5/32-large)

   [[17, 20, 23], 1, Segment, [nc, anchors, 32, 256]], # Detect(P3, P4, P5)
  ]

The result details are as follows:

【m】

The model file looks like this:

# YOLOv5 by Ultralytics, GPL-3.0 license

# Parameters
nc: 3 # number of classes
depth_multiple: 0.67 # model depth multiple
width_multiple: 0.75 # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23] # P3/8
  - [30,61, 62,45, 59,119] # P4/16
  - [116,90, 156,198, 373,326] # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]], # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]], # cat backbone P4
   [-1, 3, C3, [512, False]], # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]], # cat backbone P3
   [-1, 3, C3, [256, False]], # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]], # cat head P4
   [-1, 3, C3, [512, False]], # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]], # cat head P5
   [-1, 3, C3, [1024, False]], # 23 (P5/32-large)

   [[17, 20, 23], 1, Segment, [nc, anchors, 32, 256]], # Detect(P3, P4, P5)
  ]

The result details are as follows:

【l】

The model file looks like this:

# YOLOv5 by Ultralytics, GPL-3.0 license

# Parameters
nc: 3 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23] # P3/8
  - [30,61, 62,45, 59,119] # P4/16
  - [116,90, 156,198, 373,326] # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]], # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]], # cat backbone P4
   [-1, 3, C3, [512, False]], # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]], # cat backbone P3
   [-1, 3, C3, [256, False]], # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]], # cat head P4
   [-1, 3, C3, [512, False]], # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]], # cat head P5
   [-1, 3, C3, [1024, False]], # 23 (P5/32-large)

   [[17, 20, 23], 1, Segment, [nc, anchors, 32, 256]], # Detect(P3, P4, P5)
  ]

The result details are as follows:

【x】

The model file looks like this:

# YOLOv5 by Ultralytics, GPL-3.0 license

# Parameters
nc: 3 # number of classes
depth_multiple: 1.33 # model depth multiple
width_multiple: 1.25 # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23] # P3/8
  - [30,61, 62,45, 59,119] # P4/16
  - [116,90, 156,198, 373,326] # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]], # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]], # cat backbone P4
   [-1, 3, C3, [512, False]], # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]], # cat backbone P3
   [-1, 3, C3, [256, False]], # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]], # cat head P4
   [-1, 3, C3, [512, False]], # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]], # cat head P5
   [-1, 3, C3, [1024, False]], # 23 (P5/32-large)

   [[17, 20, 23], 1, Segment, [nc, anchors, 32, 256]], # Detect(P3, P4, P5)
  ]

The result details are as follows:

Finally, to make an overall comparative analysis of the different model scales easier, all of the metrics are compared and visualized together:

【Precision Curve】
The precision curve is a visualization tool for evaluating how a binary classifier's precision behaves as the decision threshold varies.
Precision is the fraction of samples predicted as positive that are actually positive; recall is the fraction of actual positives that are correctly predicted as positive.
The curve is drawn as follows:
Convert predicted probabilities into binary labels at a series of thresholds: a sample is classified as positive when its predicted probability exceeds the threshold, and negative otherwise.
For each threshold, compute the corresponding precision and recall.
Plot the precision (and recall) at every threshold on one graph to form the precision curve.
From the shape and trend of the curve, choose a threshold that meets the required performance.
By reading the precision curve, we can pick the threshold that best balances precision and recall for our needs: higher precision means fewer false positives, while higher recall means fewer false negatives, so the right operating point depends on the business requirements and the cost trade-off between the two error types.
Precision curves are usually read together with recall curves to give a more complete picture of classifier performance and to compare different models.
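The threshold sweep behind these curves can be sketched in a few lines (the scores and labels below are toy values, not results from the models in this article):

```python
import numpy as np

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2])  # predicted confidences
labels = np.array([1,   1,   0,   1,   0,   1,   0])     # 1 = actual positive

for t in (0.25, 0.5, 0.75):
    pred = scores > t                       # binarize at this threshold
    tp = int(np.sum(pred & (labels == 1)))  # true positives
    precision = tp / max(int(pred.sum()), 1)
    recall = tp / int((labels == 1).sum())
    print(f"t={t:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

Raising the threshold trades recall for precision, which is exactly the shape both curves visualize.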

【Recall Curve】
The recall curve is a visualization tool for evaluating how a binary classifier's recall behaves as the decision threshold varies.
Recall is the fraction of actual positives that are correctly predicted as positive; it is also called sensitivity or the true positive rate.
The curve is drawn as follows:
Convert predicted probabilities into binary labels at a series of thresholds: a sample is classified as positive when its predicted probability exceeds the threshold, and negative otherwise.
For each threshold, compute the recall and the corresponding precision.
Plot the recall (and precision) at every threshold on one graph to form the recall curve.
From the shape and trend of the curve, choose a threshold that meets the required performance.
By reading the recall curve, we can pick the threshold that best balances recall and precision for our needs: higher recall means fewer false negatives, while higher precision means fewer false positives, so again the operating point follows from business needs and cost trade-offs.
Recall curves are usually read together with precision curves to give a more complete picture of classifier performance and to compare different models.

【F1 value curve】
The F1 value curve evaluates a binary classifier across thresholds by plotting the F1 score against the threshold.
The F1 score is the harmonic mean of precision and recall, so it accounts for both metrics at once; the curve's peak identifies the threshold that best balances the two.
The curve is drawn as follows:
Convert predicted probabilities into binary labels at a series of thresholds, compute precision and recall at each threshold, combine them into the F1 score, and plot the F1 score against the threshold.
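That sweep, with the F1 score computed at each threshold and the best threshold read off the peak, can be sketched as follows (toy scores and labels again, not the article's results):

```python
import numpy as np

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2])
labels = np.array([1,   1,   0,   1,   0,   1,   0])

def f1_at(t: float) -> float:
    """F1 = harmonic mean of precision and recall at threshold t."""
    pred = scores > t
    tp = int(np.sum(pred & (labels == 1)))
    p = tp / max(int(pred.sum()), 1)
    r = tp / int((labels == 1).sum())
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

thresholds = (0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85)
f1s = [f1_at(t) for t in thresholds]
best = thresholds[int(np.argmax(f1s))]
print(f"best threshold={best:.2f}  F1={max(f1s):.3f}")
```

The chosen threshold is then used as the model's operating point at inference time.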

【loss curve】

【mAP0.5】

mAP0.5 (mean Average Precision at IoU 0.5) is one of the most common evaluation metrics for object detection. It measures average precision when a prediction counts as correct if its intersection-over-union (IoU) with a ground-truth box is at least 0.5, i.e. how accurately the detector both identifies and locates objects.

In detection, the model predicts a bounding box and a class for each object, and mAP0.5 evaluates the accuracy of those predictions. Specifically, for each class the predictions are ranked by confidence, precision and recall are computed down the ranking under the IoU-0.5 matching rule, and the resulting average precision (AP) is averaged over all classes to give the final mAP0.5 score.

IoU is the overlap measure between a predicted box and a ground-truth box: the area of their intersection divided by the area of their union. A prediction whose IoU with a ground-truth box is greater than or equal to 0.5 is treated as a correct detection.
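The matching rule can be made concrete with a small IoU function (boxes are (x1, y1, x2, y2) in pixels; the two example boxes are made up):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50/150 ~ 0.33: rejected at IoU 0.5
print(iou((0, 0, 10, 10), (1, 1, 10, 10)))  # 81/100 = 0.81: counts as a correct detection
```

Everything else in the mAP computation (ranking by confidence, precision/recall accumulation) builds on this one test.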

【mAP0.5:0.95】

mAP0.5 and mAP0.5:0.95 are the two metrics most commonly reported in object detection; they measure average precision under different IoU requirements.

mAP0.5 is the mean AP at the single, lenient IoU threshold of 0.5: a prediction only has to overlap the ground-truth box by half to count as correct, so this metric mainly reflects whether objects are found at all.

mAP0.5:0.95 averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The higher thresholds accept only tightly localized boxes, so mAP0.5:0.95 is a stricter metric that rewards precise localization and is always less than or equal to mAP0.5.
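The averaging itself is simple once the per-threshold AP values exist; a sketch (the AP numbers below are purely illustrative, not measured values):

```python
import numpy as np

# The ten COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95.
thresholds = np.arange(0.50, 1.00, 0.05)

# Illustrative per-threshold AP values -- AP typically drops as the IoU requirement tightens.
ap_per_threshold = np.linspace(0.80, 0.35, len(thresholds))

print(len(thresholds))                           # 10 thresholds
print(round(float(ap_per_threshold.mean()), 4))  # mAP0.5:0.95 = mean over the ten APs
```

This is why the reported mAP0.5:0.95 is always noticeably lower than mAP0.5 for the same model.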

Above, we comprehensively compared and visualized the performance metrics of the five model scales; each figure includes both an overall comparison chart and a per-model analysis chart. Overall, the m model strikes the best balance between computational complexity and accuracy. If you are interested, try it yourself!