Foreword
YOLOv8, the current state-of-the-art deep learning object detection algorithm, already assembles a large number of tricks, but there is still room for improvement. Depending on the detection difficulties of a specific application scenario, different improvement methods can be applied. The following series of articles will explain in detail how to improve YOLOv8, with the aim of providing a modest reference for students engaged in scientific research who need innovation, as well as for friends working on engineering projects who want better results. Since YOLOv5, YOLOv7, and YOLOv8 have appeared one after another starting in 2020, a large number of improvement papers have been published, so for both researchers and practitioners an unmodified baseline no longer offers enough value or novelty. To keep pace with the times, future improved algorithms in this series will be based on YOLOv7; the earlier YOLOv5 improvement methods also apply to YOLOv7, so the numbering of the YOLOv5 improvement series is continued here. In addition, these improvement methods can also be applied to other algorithms such as YOLOv5. I hope this is helpful to everyone.
1. Solve the problem
This article changes the bounding box regression loss function in the original YOLOv7/v5 to WIoU (Wise-IoU) to improve accuracy. The more advanced EIoU, SIoU, and Alpha-IoU box regression losses have been covered in earlier articles of this series and each brought accuracy gains; the newer WIoU is worth trying as a further improvement.
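As a rough sketch of the idea (not the exact code from the WIoU paper or from the YOLOv5/v7 repositories), the v1 form of the loss can be written in PyTorch as below. The function name `wiou_v1_loss` and the `(x1, y1, x2, y2)` box format are my own assumptions for illustration:

```python
import torch

def wiou_v1_loss(pred, target, eps=1e-7):
    """Wise-IoU v1 sketch: L = R_WIoU * (1 - IoU).

    R_WIoU = exp(d^2 / (Wg^2 + Hg^2)), where d is the distance between box
    centers and (Wg, Hg) is the size of the smallest enclosing box; the
    denominator is detached so the attention factor itself carries no gradient.
    Boxes are (..., 4) tensors in (x1, y1, x2, y2) format.
    """
    # intersection area
    inter_w = (torch.min(pred[..., 2], target[..., 2]) -
               torch.max(pred[..., 0], target[..., 0])).clamp(0)
    inter_h = (torch.min(pred[..., 3], target[..., 3]) -
               torch.max(pred[..., 1], target[..., 1])).clamp(0)
    inter = inter_w * inter_h

    # union area and IoU
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    union = area_p + area_t - inter + eps
    iou = inter / union

    # smallest enclosing box
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])

    # squared distance between box centers
    d2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2 +
          (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4

    # distance-based attention; enclosing-box term detached from the graph
    r_wiou = torch.exp(d2 / (cw ** 2 + ch ** 2 + eps).detach())
    return r_wiou * (1 - iou)
```

In a YOLOv5/v7 fork, a function like this would replace the IoU/CIoU term inside the box regression part of the loss computation; for misaligned boxes the attention factor is greater than 1, so hard examples are weighted up relative to a plain IoU loss.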
2. Basic principles
GitHub code
Paper
3. Add method
The relevant FasterNet code is as follows (for the specific improvement method, follow me and send a private message):
```python
import copy
import os
from functools import partial

import torch
import torch.nn as nn
from timm.models.layers import trunc_normal_

# PatchEmbed, BasicStage, and PatchMerging are defined elsewhere
# in the FasterNet repository.


class FasterNet(nn.Module):
    def __init__(self,
                 in_chans=3,
                 num_classes=1000,
                 embed_dim=96,
                 depths=(1, 2, 8, 2),
                 mlp_ratio=2.,
                 n_div=4,
                 patch_size=4,
                 patch_stride=4,
                 patch_size2=2,   # for subsequent layers
                 patch_stride2=2,
                 patch_norm=True,
                 feature_dim=1280,
                 drop_path_rate=0.1,
                 layer_scale_init_value=0,
                 norm_layer='BN',
                 act_layer='RELU',
                 fork_feat=False,
                 init_cfg=None,
                 pretrained=None,
                 pconv_fw_type='split_cat',
                 **kwargs):
        super().__init__()

        if norm_layer == 'BN':
            norm_layer = nn.BatchNorm2d
        else:
            raise NotImplementedError

        if act_layer == 'GELU':
            act_layer = nn.GELU
        elif act_layer == 'RELU':
            act_layer = partial(nn.ReLU, inplace=True)
        else:
            raise NotImplementedError

        if not fork_feat:
            self.num_classes = num_classes
        self.num_stages = len(depths)
        self.embed_dim = embed_dim
        self.patch_norm = patch_norm
        self.num_features = int(embed_dim * 2 ** (self.num_stages - 1))
        self.mlp_ratio = mlp_ratio
        self.depths = depths

        # split image into non-overlapping patches
        self.patch_embed = PatchEmbed(
            patch_size=patch_size,
            patch_stride=patch_stride,
            in_chans=in_chans,
            embed_dim=embed_dim,
            norm_layer=norm_layer if self.patch_norm else None
        )

        # stochastic depth decay rule
        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]

        # build stages, doubling the channel width at each stage
        stages_list = []
        for i_stage in range(self.num_stages):
            stage = BasicStage(
                dim=int(embed_dim * 2 ** i_stage),
                n_div=n_div,
                depth=depths[i_stage],
                mlp_ratio=self.mlp_ratio,
                drop_path=dpr[sum(depths[:i_stage]):sum(depths[:i_stage + 1])],
                layer_scale_init_value=layer_scale_init_value,
                norm_layer=norm_layer,
                act_layer=act_layer,
                pconv_fw_type=pconv_fw_type
            )
            stages_list.append(stage)

            # patch merging layer between stages
            if i_stage < self.num_stages - 1:
                stages_list.append(
                    PatchMerging(patch_size2=patch_size2,
                                 patch_stride2=patch_stride2,
                                 dim=int(embed_dim * 2 ** i_stage),
                                 norm_layer=norm_layer)
                )

        self.stages = nn.Sequential(*stages_list)

        self.fork_feat = fork_feat
        if self.fork_feat:
            # detection: return intermediate feature maps
            self.forward = self.forward_det
            # add a norm layer for each output
            self.out_indices = [0, 2, 4, 6]
            for i_emb, i_layer in enumerate(self.out_indices):
                if i_emb == 0 and os.environ.get('FORK_LAST3', None):
                    raise NotImplementedError
                else:
                    layer = norm_layer(int(embed_dim * 2 ** i_emb))
                layer_name = f'norm{i_layer}'
                self.add_module(layer_name, layer)
        else:
            # classification: global pool + linear head
            self.forward = self.forward_cls
            self.avgpool_pre_head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(self.num_features, feature_dim, 1, bias=False),
                act_layer()
            )
            self.head = nn.Linear(feature_dim, num_classes) \
                if num_classes > 0 else nn.Identity()

        self.apply(self.cls_init_weights)
        self.init_cfg = copy.deepcopy(init_cfg)
        if self.fork_feat and (self.init_cfg is not None or pretrained is not None):
            self.init_weights()

    def cls_init_weights(self, m):
        if isinstance(m, nn.Linear):
            trunc_normal_(m.weight, std=.02)
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, (nn.Conv1d, nn.Conv2d)):
            trunc_normal_(m.weight, std=.02)
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, (nn.LayerNorm, nn.GroupNorm)):
            nn.init.constant_(m.bias, 0)
            nn.init.constant_(m.weight, 1.0)
```
Model Summary: 281 layers, 5821465 parameters, 5821465 gradients, 12.9 GFLOPs
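As a sanity check on summaries like the one above, the parameter count can be reproduced directly in PyTorch by summing `numel()` over `model.parameters()`. The toy model below is purely illustrative (it is not the FasterNet-based model from this article):

```python
import torch.nn as nn

# A tiny model whose parameter count is easy to verify by hand.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1, bias=False),  # 16*3*3*3      = 432
    nn.BatchNorm2d(16),                          # 16 + 16       =  32
    nn.Conv2d(16, 8, 1),                         # 8*16*1*1 + 8  = 136
)

n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 600
```

Running the same one-liner on the improved YOLO model should match the "parameters" figure printed in the training summary.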
4. Summary
Preview: the next article will continue to share related improvement methods for deep learning detection algorithms. Interested friends can follow me; if you have any questions, leave a comment or message me privately.
PS: This method is not only suitable for improving YOLOv5; it can also improve other YOLO networks and object detection networks, such as YOLOv7, v6, v4, v3, Faster R-CNN, SSD, etc.
Finally, if you need them, follow me and send a private message; followers can receive free deep learning study materials.