YOLOv5 pruning and knowledge distillation [with code]

Both pruning and knowledge distillation are model-lightweighting techniques. Pruning produces a lightweight network by cutting down an existing one, and comes in two flavors: unstructured pruning and structured pruning. It spares you from hand-designing a lightweight architecture: you compute the contribution of each weight or channel, remove the ones that contribute little, and then fine-tune to recover accuracy and obtain the final model. The idea is sound, but on some tasks, if you prune too much, performance drops badly and even fine-tuning cannot recover much accuracy.

The pruning used in this article is channel pruning (structured pruning). You can refer to my other blog post, YOLOv5 channel pruning (it has been picked up by several open-source communities, so it is worth a look); I have also implemented pruning for YOLOv4, YOLOX, YOLOR, YOLOv7 and others in my other posts. Likes and bookmarks are welcome.

Knowledge distillation sets up a loss between a large, high-accuracy model and a small, lower-accuracy model, and "compresses" the large model into the small one (not compression in the strict sense). It has seen a lot of use in the last couple of years. Early knowledge distillation was done on classification networks, and it is now applied to object detection as well. For distillation of classification networks, see: knowledge distillation, self-distillation.

Knowledge distillation reference for object detection: SSD Knowledge Distillation

Distillation can be done online or offline, and can also be divided into feature distillation and logit distillation. The code posted here performs offline logit distillation.
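To make "logit distillation" concrete, here is a minimal sketch of the classic soft-target distillation loss for a classification network; the temperature T and weight alpha are illustrative values, not settings used by this project:

import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # soft loss: match the teacher's softened probability distribution
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * (T * T)
    # hard loss: ordinary cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard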

Directory

Project Description

Environment description

1. Train your own dataset

2. Pruning any convolutional layer

3. Training after pruning

4. Model prediction after pruning

5. Knowledge distillation training

Code


Project Description

1. Train your own dataset

2. Pruning any convolutional layer

3. Training after pruning

4. Model prediction after pruning

5. Use knowledge distillation to train the pruned model

Environment description

gitpython>=3.1.30
matplotlib>=3.3
numpy>=1.18.5
opencv-python>=4.1.1
Pillow>=7.1.2
psutil # system resources
PyYAML>=5.3.1
requests>=2.23.0
scipy>=1.4.1
thop>=0.1.1 # FLOPs computation
torch>=1.7.0 # see https://pytorch.org/get-started/locally (recommended)
torchvision>=0.8.1
tqdm>=4.64.0
ultralytics>=8.0.100
torch_pruning==0.2.7
pandas>=1.1.4
seaborn>=0.11.0

1. Train your own dataset

Place your own dataset under the dataset folder, with the following directory layout:

dataset
|-- Annotations
|-- ImageSets
|-- images
|-- labels

Annotations holds the xml annotation files, images holds the images, ImageSets holds four txt files (generated automatically when the code below is run), and labels holds the txt files converted from the xml annotations.

1. Run makeTXT.py. This generates four files in the ImageSets folder: trainval.txt, test.txt, train.txt and val.txt (if you open them, they contain only image names). A rough sketch of what such a script does is shown below.
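For reference, a minimal sketch of what such a split script typically looks like; the split ratios and relative paths here are illustrative, use the makeTXT.py shipped with the repo:

import os
import random

# split the annotated images into trainval/train/val/test lists (names only, no extension)
trainval_percent = 0.9
train_percent = 0.9
xmlfilepath = 'dataset/Annotations'
txtsavepath = 'dataset/ImageSets'
os.makedirs(txtsavepath, exist_ok=True)

total_xml = os.listdir(xmlfilepath)
num = len(total_xml)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(range(num), tv)
train = random.sample(trainval, tr)

with open(f'{txtsavepath}/trainval.txt', 'w') as f_trainval, \
     open(f'{txtsavepath}/test.txt', 'w') as f_test, \
     open(f'{txtsavepath}/train.txt', 'w') as f_train, \
     open(f'{txtsavepath}/val.txt', 'w') as f_val:
    for i in range(num):
        name = total_xml[i][:-4] + '\n'   # strip the .xml extension
        if i in trainval:
            f_trainval.write(name)
            (f_train if i in train else f_val).write(name)
        else:
            f_test.write(name)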

2. Open voc_label.py and change classes=[""] to your own class names. For example, if you are training cats and dogs, it becomes classes=["dog","cat"]; then run the script. A txt file is generated under labels for every image, in the following form: the leading 0 is the class index (I only have one class here), and the next four numbers are the box parameters, all normalized to [0, 1] in YOLO format (center_x, center_y, width, height). A minimal sketch of the conversion is shown after the example below.

0 0.4723557692307693 0.5408653846153847 0.34375 0.8990384615384616
0 0.8834134615384616 0.5793269230769231 0.21875 0.8221153846153847
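The conversion itself is the standard VOC-to-YOLO normalization. A minimal sketch of the helper that does it (the full script additionally parses each xml file and writes one txt per image):

def convert(size, box):
    # size = (image_width, image_height); box = (xmin, xmax, ymin, ymax) in pixels
    # returns (center_x, center_y, width, height), each normalized by image size
    dw, dh = 1.0 / size[0], 1.0 / size[1]
    cx = (box[0] + box[1]) / 2.0 * dw
    cy = (box[2] + box[3]) / 2.0 * dh
    w = (box[1] - box[0]) * dw
    h = (box[3] - box[2]) * dh
    return cx, cy, w, h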

3. Create a new mydata.yaml file under the data folder. The content is as follows (you can also copy coco.yaml and edit it).

You only need to modify nc and names: nc is the number of classes and names lists the class names.

train: ./dataset/train.txt
val: ./dataset/val.txt
test: ./dataset/test.txt

# number of classes
nc: 1

# class names
names: ['target']

4. Start training from the terminal with the following parameters.

Take yolov5s as an example:

python train.py --weights yolov5s.pt --cfg models/yolov5s.yaml --data data/mydata.yaml

                 from  n    params  module                                  arguments
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  2                -1  1     18816  models.common.C3                        [64, 64, 1]
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  4                -1  2    115712  models.common.C3                        [128, 128, 2]
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  6                -1  3    625152  models.common.C3                        [256, 256, 3]
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]
 ...
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 24      [17, 20, 23]  1     16182  models.yolo.Detect                      [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]

Model Summary: 270 layers, 7022326 parameters, 7022326 gradients, 15.8 GFLOPs

Starting training for 300 epochs…

Epoch   gpu_mem    box       obj     cls   labels  img_size
0/299    0.589G   0.0779  0.03841     0        4       640:   6%|████▋ | 23/359 [00:23<04:15, 1.31it/s]

If you see output like the above, training has started.

2. Pruning any convolutional layer

Before using the pruning feature you need to install the pruning library. Install version 0.2.7; some readers reported problems with 0.2.8. Some of the log output produced during pruning is saved automatically in the logs folder; I cap each log file at 1 MB, which you can change if needed.

pip install torch_pruning==0.2.7

YOLOv5 differs from the networks I pruned before: the weights saved after training contain the complete model, i.e. torch.save(model, ...) is used rather than torch.save(model.state_dict(), ...), so there is no need to save the network structure separately.
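A small illustration of the difference between the two save styles (a stand-in module is used here in place of the real network):

import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3)  # stand-in for a trained YOLOv5 model

# YOLOv5 style: the whole nn.Module is stored, so a pruned structure
# travels together with its weights
torch.save({'model': model}, 'whole_model.pt')

# state_dict style: weights only, the structure must be rebuilt from code/yaml
torch.save(model.state_dict(), 'state_dict_only.pt')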

The model pruning code is in tools/prunmodel.py. You only need to find the following section and modify it; here I take pruning the convolutional layers of the entire backbone as an example. If you want to prune other layers, adjust as needed. included_layers is the list of layers you want to prune.

"""
    Write the layer to be pruned here
    """
    included_layers = []
    for layer in model.model[:10]:
        if type(layer) is Conv:
            included_layers.append(layer.conv)
        elif type(layer) is C3:
            included_layers.append(layer.cv1.conv)
            included_layers.append(layer.cv2.conv)
            included_layers.append(layer.cv3.conv)
        elif type(layer) is SPPF:
            included_layers.append(layer.cv1.conv)
            included_layers.append(layer.cv2.conv)

Next, find the following line of code; amount is the pruning ratio, which you can also adjust as needed. [Note that this ratio applies only to the layers you selected for pruning, not to the whole network from start to finish. Some readers chose a single layer with a 50% pruning ratio and asked why the model was still almost the same size; they had assumed the whole network was being pruned by 50%.]

pruning_plan = DG.get_pruning_plan(m, tp.prune_conv, idxs=strategy(m.weight, amount=0.8))

Next, call the pruning function; the argument is the path to your trained weight file.

layer_pruning('../runs/train/exp/weights/best.pt')
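For context, here is a minimal sketch of what such a layer_pruning helper can look like with torch_pruning 0.2.7; the CPU device, the 640x640 example input and the simplified save format are assumptions, and tools/prunmodel.py in the repo is the reference implementation:

import torch
import torch_pruning as tp
from models.common import Conv, C3, SPPF

def layer_pruning(weights, amount=0.8):
    device = torch.device('cpu')
    ckpt = torch.load(weights, map_location=device)
    model = ckpt['model'].float()          # the checkpoint stores the whole nn.Module

    # collect the backbone conv layers to prune (same selection as above)
    included_layers = []
    for layer in model.model[:10]:
        if type(layer) is Conv:
            included_layers.append(layer.conv)
        elif type(layer) is C3:
            included_layers.extend([layer.cv1.conv, layer.cv2.conv, layer.cv3.conv])
        elif type(layer) is SPPF:
            included_layers.extend([layer.cv1.conv, layer.cv2.conv])

    strategy = tp.strategy.L1Strategy()    # rank channels by L1 norm
    DG = tp.DependencyGraph()
    # if the model's forward returns a tuple/list, build_dependency's
    # output_transform argument may be needed to pick out a tensor
    DG.build_dependency(model, example_inputs=torch.randn(1, 3, 640, 640))

    for m in included_layers:
        plan = DG.get_pruning_plan(m, tp.prune_conv, idxs=strategy(m.weight, amount=amount))
        plan.exec()                        # prunes this conv plus every dependent BN/conv

    torch.save({'model': model}, 'model_data/layer_pruning.pt')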

If you see output like the following, pruning succeeded, and the pruned weights are saved under model_data as layer_pruning.pt.

One thing to note: the saved weight file contains not only the network structure and weights but also the optimizer state. You could save only the structure and weights, which would make the .pt a little smaller (a small sketch of stripping it appears after the pruning log below); I keep the default here to stay consistent with the official .pt format.

-------------
[ prune_conv on model.9.cv2.conv (Conv2d(208, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)), Index=[0, 1, 2, 3, 7, 8, ...], NumPruned=85072]
[ prune_batchnorm on model.9.cv2.bn (BatchNorm2d(512, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)), Index=[0, 1, 2, 3, 7, 8, ...], NumPruned=818]
[ _prune_elementwise_op on _ElementWiseOp(), Index=[0, 1, 2, 3, 7, 8, ...], NumPruned=0]
[ _prune_elementwise_op on _ElementWiseOp(), Index=[0, 1, 2, 3, 7, 8, ...], NumPruned=0]
[ prune_related_conv on model.10.conv (Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)), Index=[0, 1, 2, 3, 7, 8, ...], NumPruned=104704]
190594 parameters will be pruned
-------------

2022-09-29 12:30:50.396 | INFO | __main__:layer_pruning:75 – Params: 7022326 => 3056461

2022-09-29 12:30:50.691 | INFO | __main__:layer_pruning:89 – Pruning complete
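As mentioned above, the checkpoint keeps the optimizer state by default. If you prefer a slightly smaller file, here is a minimal sketch of stripping it afterwards, assuming the checkpoint uses the official YOLOv5 key names ('model', 'optimizer', ...):

import torch

ckpt = torch.load('model_data/layer_pruning.pt', map_location='cpu')
ckpt['optimizer'] = None                       # drop the optimizer state
torch.save(ckpt, 'model_data/layer_pruning.pt')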

If you only want to prune a single convolutional layer, you can write it like this:

included_layers = [model.model[3].conv] # just want to cut a convolutional layer

3. Training after pruning

This needs to be distinguished from sparse training, because many people have asked whether my earlier projects use sparse training. The channel pruning here is offline: the already-trained model is pruned. Pruning while training is online pruning, and that training process is sparse training. So the two are different.

Fine-tuning after pruning works the same as normal training, except that a --pt flag is added. The command is as follows:

python train.py --weights model_data/layer_pruning.pt --data data/mydata.yaml --pt 
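I will not repeat train.py here; the sketch below only illustrates what the --pt flag is for (loading the pruned module straight from the checkpoint instead of rebuilding the architecture from a yaml). The names opt.pt, cfg, nc and hyp are assumptions borrowed from the usual YOLOv5 training code, not the repo's exact implementation:

# Hypothetical sketch of how a --pt flag could be handled inside train.py
ckpt = torch.load(weights, map_location=device)
if opt.pt:
    # the pruned architecture only exists inside the checkpoint itself
    model = ckpt['model'].float().to(device)
else:
    # normal path: build the architecture from yaml, then load matching weights
    model = Model(cfg, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)
    model.load_state_dict(ckpt['model'].float().state_dict(), strict=False)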

4. Model prediction after pruning

The prediction after pruning is the same as the normal prediction.

python detect.py --weights model_data/layer_pruning.pt --source [your image path]

Let me stress this again: this article only provides the tooling; the final pruning results depend on your own requirements and experiments. In my case, fine-tuning after pruning 80% of the entire backbone did not recover accuracy well, while pruning the layers after the SPPF worked somewhat better, and many people online report that pruning the backbone gives poor results.

5. Knowledge distillation training

Project requirements: I want to use knowledge distillation to do fine-tuning training of the pruned network

Teacher Network: Unpruned

Student Network: Pruned

Since the student network has been pruned, its structure no longer matches the model's yaml configuration file, so it is loaded directly from the checkpoint instead.

The knowledge distillation in this project is logit distillation (no feature-layer distillation).

Model instantiation code

s_ckpt = torch.load(s_weights, map_location=device)
s_model = s_ckpt['model'] # student network

# Creation of teacher network
t_ckpt = torch.load(t_weights, map_location=device)
t_model = Model(t_cfg, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device) # teacher model create
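The snippet above only builds the teacher's architecture. A plausible continuation (an assumption about the repo, not a quote from it) is to load the teacher's trained weights and freeze it during distillation:

# Sketch (assumption): copy the trained weights into the teacher and freeze it
t_model.load_state_dict(t_ckpt['model'].float().state_dict(), strict=False)
t_model.eval()
for p in t_model.parameters():
    p.requires_grad = False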

Key code for distillation:

Where d_weight is the distillation weight. You can adjust it according to your actual situation.

s_pred = s_model(imgs) # student forward
_, t_pred = t_model(imgs) # teacher forward
s_hard_loss, loss_items = compute_loss(s_pred, targets.to(device)) # student hard loss
d_outputs_loss = compute_distillation_output_loss(s_pred, t_pred, s_model, d_weight=10)
loss = d_outputs_loss + s_hard_loss
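compute_distillation_output_loss is provided by the repo. Purely as an illustration of what an offline logit-distillation loss can look like, here is a hypothetical stand-in: a plain MSE between the student's and teacher's raw outputs at each detection scale, scaled by d_weight. The repo's actual implementation may weight the terms differently:

import torch
import torch.nn as nn

def compute_distillation_output_loss(s_pred, t_pred, model, d_weight=10):
    # Hypothetical sketch: pull the student's raw prediction of each detection
    # scale (P3, P4, P5) towards the teacher's.
    mse = nn.MSELoss()
    dist_loss = torch.zeros(1, device=s_pred[0].device)
    for sp, tp_out in zip(s_pred, t_pred):      # one tensor per detection scale
        dist_loss = dist_loss + mse(sp, tp_out)
    # 'model' is kept only to mirror the call signature shown above
    return d_weight * dist_loss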

--t_weights: teacher network weight path

--s_weights: student network weight path

--data: data.yaml path

--kd: enable distillation training

python train_dil.py --t_weights best.pt --s_weights layer_pruning.pt --data data/mydata.yaml --batch-size 16 --kd

The training result will be saved in runs/train/exp_kd

Code

GitHub – YINYIPENG-EN/Knowledge_distillation_Pruning_Yolov5: This project supports knowledge distillation training for the pruned YOLOv5 model

Supplementary note: the results you get depend on the actual application scenario, dataset, network model and so on; the code released in this article is not a cure-all.