MMDetection is an open-source project for object detection. It implements a large number of detection algorithms on top of PyTorch and packages dataset construction, model building, and training strategies into modules. By calling these modules, a new algorithm can be implemented with a small amount of code, which greatly improves code reuse. This article records how to use MMDetection in fairly plain language. If you want a more professional treatment, see the following tutorials:
MMDetection Framework Getting Started Tutorial
Official document – config file tutorial
1. Folder structure
Download the MMDetection code from GitHub. After unpacking, the directory looks like this (only the main folders are shown):
├─mmdetection-master
│  ├─build
│  ├─checkpoints   # stores checkpoint files
│  ├─configs       # stores configuration files
│  ├─data          # stores datasets
│  ├─demo
│  ├─dist
│  ├─docker
│  ├─docs
│  ├─mmdet         # main MMDetection source code: model definitions and so on
│  ├─requirements
│  ├─resources
│  ├─src
│  ├─tests
│  ├─tools         # main tools for training, testing, printing config files, etc.
│  └─work_dirs     # stores training logs and training results
2. Environment configuration
- Create an environment and install pytorch:
conda create --name envName python=3.7
conda activate envName
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
- Follow the tutorial on the official github to install mmcv:
pip install -U openmim
mim install mmcv-full
- Install mmdet:
pip install mmdet
Installing mmcv used to fail easily, but now, as long as you install the matching PyTorch version and then install mmcv through openmim, errors are rare. The commands above set up a Python 3.7 environment; other Python versions should also work.
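After the installation steps above, a quick sanity check is to see which of the packages can actually be imported. The helper below is only a sketch: `report_versions` is a name made up for this article, and it assumes each package exposes a `__version__` attribute (it falls back to "unknown" if not).

```python
import importlib


def report_versions(names):
    """Return {package name: version string, or None if the import fails}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Not installed, or installed but broken (e.g. a version mismatch).
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    packages = ["torch", "torchvision", "mmcv", "mmdet"]
    for name, version in report_versions(packages).items():
        print(f"{name}: {version if version is not None else 'NOT AVAILABLE'}")
```

If any line prints NOT AVAILABLE, fix that package before moving on to training.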
3. Model training
The key to training models with MMDetection is understanding the config (configuration file). To train Faster R-CNN, for example, you only need to set up the configuration file and then run the following command:
python tools/train.py configs/faster_rcnn/faster_rcnn_r101_fpn_2x_towervoc.py
Here configs/faster_rcnn/faster_rcnn_r101_fpn_2x_towervoc.py is the configuration file used for training. All parameter settings required during training are defined in this file.
When using it, keep two points in mind:
- Try not to modify parameters outside the configuration files.
- Do not change the original configuration files; for a new task, create a new configuration file.
MMDetection contains many files. If you change an original configuration file, or a parameter inside some .py file, while training one network, you may forget about it after a while, and the next time another network needs that module it may misbehave.
Next, let's look at the config file.
1. config file naming rules:
{model}_[model setting]_{backbone}_{neck}_[norm setting]_[misc]_[gpu x batch_per_gpu]_{schedule}_{dataset}
The meaning of each field ({...} fields are required, [...] fields are optional):
- {model}: model type, such as faster_rcnn, mask_rcnn, etc.
- [model setting]: model-specific setting, such as without_semantic for htc, moment for reppoints, etc.
- {backbone}: backbone type, such as r50 (ResNet-50), x101 (ResNeXt-101), etc.
- {neck}: neck type, such as fpn, pafpn, nasfpn, c4, etc.
- [norm setting]: bn (Batch Normalization) is used unless specified otherwise; other options include gn (Group Normalization) and syncbn (Synchronized Batch Normalization). gn-head/gn-neck means GN is applied only to the head or the neck, and gn-all means GN is used in the entire model (backbone, neck, and head).
- [misc]: miscellaneous settings/plugins of the model, such as dconv, gcb, attention, albu, mstrain, etc.
- [gpu x batch_per_gpu]: number of GPUs and number of samples per GPU; 8x2 is used by default.
- {schedule}: training schedule, such as 1x, 2x, 20e, etc. 1x and 2x mean 12 and 24 epochs respectively, and 20e is used for cascade models and means 20 epochs. For 1x/2x, the learning rate is decayed by a factor of 10 at the 8th/16th and 11th/22nd epochs; for 20e, it is decayed by a factor of 10 at the 16th and 19th epochs.
- {dataset}: dataset, such as coco, cityscapes, voc_0712, wider_face, etc.
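The {schedule} field is the easiest one to misread, so here is a small sketch that turns the rules just described into code. The function names are made up for this article, and warm-up is ignored. It also assumes that a decay listed at epoch N takes effect once epoch N has finished, which is my reading of how the step policy behaves.

```python
def decode_schedule(tag):
    """Map a schedule tag from a config name to (total_epochs, lr_decay_epochs)."""
    table = {
        "1x": (12, [8, 11]),
        "2x": (24, [16, 22]),
        "20e": (20, [16, 19]),
    }
    if tag not in table:
        raise ValueError(f"unknown schedule tag: {tag!r}")
    return table[tag]


def lr_at_epoch(base_lr, tag, epoch):
    """Learning rate used during a 1-based epoch, ignoring warm-up.

    Assumption: a decay listed at epoch N applies from epoch N + 1 onwards.
    """
    _, decay_epochs = decode_schedule(tag)
    drops = sum(1 for e in decay_epochs if epoch > e)
    return base_lr * (0.1 ** drops)


if __name__ == "__main__":
    for epoch in (1, 8, 9, 11, 12):
        print(epoch, lr_at_epoch(0.02, "1x", epoch))
```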
2. Config file content analysis
The config file for each network consists of four parts:
- model settings
- dataset settings
- schedules
- runtime
The official tutorial linked at the beginning of this article annotates the Mask R-CNN configuration file line by line. Here I only record a few things I initially misunderstood. First, learn to use the tool tools/misc/print_config.py. It prints the parameters that are finally fed into the network for training. The syntax is:
python tools/misc/print_config.py configs/yolox/yolox_l_8x8_300e_coco.py
1. Inherit initial parameters from _base_
This means the configuration file inherits from the listed base configs when it is initialized. Anything you do not redefine later keeps the value from the base config. Take configs/yolox/yolox_l_8x8_300e_coco.py as an example: the parameter lr_config, which controls learning rate scheduling in YOLOX, is initially inherited from configs/_base_/schedules/schedule_1x.py, which means it should be:
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,    # warm-up: the lr starts at warmup_ratio (0.001) times the optimizer lr
    warmup_ratio=0.001,  # and reaches the lr defined in the optimizer after 500 iterations
    step=[8, 11])
But the learning rate schedule printed by print_config turns out to be different. This is because, after inheriting lr_config from the _base_ file, the configuration file overrides it later:
lr_config = dict(
    _delete_=True,
    policy='YOLOX',
    warmup='exp',
    by_epoch=False,
    warmup_by_epoch=True,
    warmup_ratio=1,
    warmup_iters=5,  # 5 epochs
    num_last_epochs=num_last_epochs,
    min_lr_ratio=0.05)
_delete_=True means the lr_config inherited from _base_ is deleted and replaced with the new set of key-value pairs defined here. If you only want to modify some parameters, for example only step, you do not need _delete_; just add this to the configuration file:
lr_config = dict(step=[7, 10])
It should be noted that the key-value pairs in the config file are read in order. If you define the same parameter multiple times, the one written later will overwrite the previous one.
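To make the inheritance behaviour concrete, here is a simplified model of it in plain Python. This is not the actual mmcv implementation; `merge_config` is a name made up for this article. It only imitates the two behaviours described above: nested dicts are merged key by key, and `_delete_=True` replaces the inherited dict entirely.

```python
import copy

DELETE_KEY = "_delete_"


def merge_config(base, child):
    """Return a new dict: `base` overridden by `child`, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict):
            value = dict(value)  # shallow copy so the caller's dict is untouched
            replace = value.pop(DELETE_KEY, False)
            if not replace and isinstance(merged.get(key), dict):
                # Partial override: keep inherited keys, update the given ones.
                merged[key] = merge_config(merged[key], value)
            else:
                # _delete_=True (or nothing to inherit): start from scratch.
                merged[key] = merge_config({}, value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {"lr_config": dict(policy="step", warmup="linear", step=[8, 11])}

# Only `step` changes; `policy` and `warmup` are still inherited.
partial = merge_config(base, {"lr_config": dict(step=[7, 10])})

# The inherited lr_config is dropped and replaced by the new keys.
replaced = merge_config(base, {"lr_config": dict(_delete_=True, policy="YOLOX")})

print(partial["lr_config"])
print(replaced["lr_config"])
```

When in doubt about what a real config resolves to, print_config.py remains the authoritative check.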
2. Learning rate automatic adjustment
At first I mistakenly thought this parameter (auto_scale_lr, with its base_batch_size) adjusted the batch size. In fact it means that the learning rate set in this config was tuned for a total batch size of 8 GPUs x 8 samples per GPU. If your setup differs, the initial learning rate is automatically rescaled from this baseline to match your batch size, so do not change this value, and do not change the initial learning rate either.
The batch size itself is adjusted through samples_per_gpu in the data settings.
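As a rough illustration of what the automatic adjustment does, the sketch below applies the linear scaling rule, which is how I understand the learning rate to be rescaled. `scaled_lr` is a name made up for this article, and the default `base_batch_size=64` is the 8 GPUs x 8 samples per GPU baseline from the SSD config.

```python
def scaled_lr(base_lr, num_gpus, samples_per_gpu, base_batch_size=64):
    """Linear scaling rule: the lr grows in proportion to the total batch size."""
    total_batch_size = num_gpus * samples_per_gpu
    return base_lr * total_batch_size / base_batch_size


if __name__ == "__main__":
    # One GPU with samples_per_gpu=4, starting from the config value lr=2e-3.
    print(scaled_lr(2e-3, num_gpus=1, samples_per_gpu=4))
    # The reference setup leaves the learning rate unchanged.
    print(scaled_lr(2e-3, num_gpus=8, samples_per_gpu=8))
```

In the 2.x releases I am aware of, this rescaling is only applied when it is switched on (for example through the --auto-scale-lr option of tools/train.py), so it is worth checking how your version behaves.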
4. Model training practice
Training on a COCO-format dataset with MMDetection is very simple, so how do you train on your own VOC-format dataset? Here I use the SSD model as an example. First, my dataset: it is in VOC format with three classes in total, and the folder structure is as follows:
├─TowerVoc
│  └─VOC2012
│     ├─Annotations
│     ├─ImageSets
│     │  └─Main
│     └─JPEGImages
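The configs later in this article read train.txt, val.txt and test.txt from ImageSets/Main. If your dataset does not have these files yet, a script along the following lines can generate them from the XML files in Annotations. It is only a sketch: the function name, the 80/10/10 split, and the fixed seed are my own assumptions, not part of MMDetection.

```python
import random
from pathlib import Path


def write_voc_splits(voc_root, train_ratio=0.8, val_ratio=0.1, seed=0):
    """Write train/val/test.txt under ImageSets/Main and return the split sizes."""
    voc_root = Path(voc_root)
    # One image id per annotation file, e.g. Annotations/0001.xml -> "0001".
    stems = sorted(p.stem for p in (voc_root / "Annotations").glob("*.xml"))
    random.Random(seed).shuffle(stems)

    n_train = int(len(stems) * train_ratio)
    n_val = int(len(stems) * val_ratio)
    splits = {
        "train": stems[:n_train],
        "val": stems[n_train:n_train + n_val],
        "test": stems[n_train + n_val:],  # whatever is left over
    }

    out_dir = voc_root / "ImageSets" / "Main"
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, ids in splits.items():
        (out_dir / f"{name}.txt").write_text("\n".join(ids) + "\n")
    return {name: len(ids) for name, ids in splits.items()}


if __name__ == "__main__":
    print(write_voc_splits("data/TowerVoc/VOC2012"))
```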
Here I only describe how to do it. For the specific parameters to change, compare the configuration files I give here with the original ones (the changed places are also marked in my code).
Open the SSD configuration file, configs/ssd/ssd512_coco.py: it trains on the COCO dataset by default. Its inheritance chain is ssd512_coco.py, then ssd300_coco.py, then the base files under configs/_base_ (models/ssd300.py, datasets/coco_detection.py, schedules/schedule_2x.py, and default_runtime.py).
To train on a custom VOC dataset, three configuration files need to be created:
- Copy ssd512_coco.py and name it ssd512_towervoc.py. Here "tower" is the name of my dataset; the name is arbitrary.
- Copy ssd300_coco.py and name it ssd300_voc.py.
- Copy configs/_base_/datasets/voc0712.py and name it configs/_base_/datasets/voctower.py.
The contents of the three configuration files are as follows:
ssd512_towervoc.py
_base_ = 'ssd300_voc.py'  # change 1
input_size = 512
model = dict(
    neck=dict(
        out_channels=(512, 1024, 512, 256, 256, 256, 256),
        level_strides=(2, 2, 2, 2, 1),
        level_paddings=(1, 1, 1, 1, 1),
        last_kernel_size=4),
    bbox_head=dict(
        in_channels=(512, 1024, 512, 256, 256, 256, 256),
        anchor_generator=dict(
            type='SSDAnchorGenerator',
            scale_major=False,
            input_size=input_size,
            basesize_ratio_range=(0.1, 0.9),
            strides=[8, 16, 32, 64, 128, 256, 512],
            ratios=[[2], [2, 3], [2, 3], [2, 3], [2, 3], [2], [2]])))
# dataset settings
dataset_type = 'VOCDataset'  # change 3
data_root = 'data/TowerVoc/'  # change 4
img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[1, 1, 1], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Expand',
        mean=img_norm_cfg['mean'],
        to_rgb=img_norm_cfg['to_rgb'],
        ratio_range=(1, 4)),
    dict(
        type='MinIoURandomCrop',
        min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
        min_crop_size=0.3),
    dict(type='Resize', img_scale=(640, 640), keep_ratio=False),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='PhotoMetricDistortion',
        brightness_delta=32,
        contrast_range=(0.5, 1.5),
        saturation_range=(0.5, 1.5),
        hue_delta=18),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(512, 512),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=False),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=4,  # If necessary, change this to your own batch size
    workers_per_gpu=2,
    train=dict(
        _delete_=True,
        type='RepeatDataset',
        times=5,
        dataset=dict(
            type=dataset_type,
            ann_file=data_root + 'VOC2012/ImageSets/Main/train.txt',  # change 5
            img_prefix=data_root + 'VOC2012/',
            pipeline=train_pipeline)),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
# optimizer
optimizer = dict(type='SGD', lr=2e-3, momentum=0.9, weight_decay=5e-4)
optimizer_config = dict(_delete_=True)
custom_hooks = [
    dict(type='NumClassCheckHook'),
    dict(type='CheckInvalidLossHook', interval=50, priority='VERY_LOW')
]
# evaluation = dict(interval=1, metric='mAP')

# NOTE: `auto_scale_lr` is for automatically scaling LR,
# USER SHOULD NOT CHANGE ITS VALUES.
# base_batch_size = (8 GPUs) x (8 samples per GPU)
auto_scale_lr = dict(base_batch_size=64)
ssd300_voc.py
_base_ = [
    '../_base_/models/ssd300.py',
    '../_base_/datasets/voctower.py',  # change 1
    '../_base_/schedules/schedule_2x.py',
    '../_base_/default_runtime.py'
]
# model settings
input_size = 300
model = dict(
    type='SingleStageDetector',
    backbone=dict(
        type='SSDVGG',
        depth=16,
        with_last_pool=False,
        ceil_mode=True,
        out_indices=(3, 4),
        out_feature_indices=(22, 34),
        init_cfg=dict(
            type='Pretrained', checkpoint='open-mmlab://vgg16_caffe')),
    neck=dict(
        type='SSDNeck',
        in_channels=(512, 1024),
        out_channels=(512, 1024, 512, 256, 256, 256),
        level_strides=(2, 2, 1, 1),
        level_paddings=(1, 1, 0, 0),
        l2_norm_scale=20),
    bbox_head=dict(
        type='SSDHead',
        in_channels=(512, 1024, 512, 256, 256, 256),
        num_classes=3,  # change 2
        anchor_generator=dict(
            type='SSDAnchorGenerator',
            scale_major=False,
            input_size=input_size,
            basesize_ratio_range=(0.15, 0.9),
            strides=[8, 16, 32, 64, 100, 300],
            ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[0.1, 0.1, 0.2, 0.2])),
    # model training and testing settings
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.,
            ignore_iof_thr=-1,
            gt_max_assign_all=False),
        smoothl1_beta=1.,
        allowed_border=-1,
        pos_weight=-1,
        neg_pos_ratio=3,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        nms=dict(type='nms', iou_threshold=0.45),
        min_bbox_size=0,
        score_thr=0.02,
        max_per_img=200))
cudnn_benchmark = True
# dataset settings
dataset_type = 'VOCDataset'  # change 3
data_root = 'data/TowerVoc/'
img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[1, 1, 1], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Expand',
        mean=img_norm_cfg['mean'],
        to_rgb=img_norm_cfg['to_rgb'],
        ratio_range=(1, 4)),
    dict(
        type='MinIoURandomCrop',
        min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
        min_crop_size=0.3),
    dict(type='Resize', img_scale=(300, 300), keep_ratio=False),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='PhotoMetricDistortion',
        brightness_delta=32,
        contrast_range=(0.5, 1.5),
        saturation_range=(0.5, 1.5),
        hue_delta=18),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(300, 300),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=False),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=3,
    train=dict(
        _delete_=True,
        type='RepeatDataset',
        times=5,
        dataset=dict(
            type=dataset_type,
            ann_file=data_root + 'VOC2012/ImageSets/Main/train.txt',  # strictly, this need not be changed here,
            img_prefix=data_root + 'VOC2012/',  # because ssd300_voc.py rewrites it
            pipeline=train_pipeline)),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
# optimizer
optimizer = dict(type='SGD', lr=2e-3, momentum=0.9, weight_decay=5e-4)
optimizer_config = dict(_delete_=True)
custom_hooks = [
    dict(type='NumClassCheckHook'),
    dict(type='CheckInvalidLossHook', interval=50, priority='VERY_LOW')
]
# NOTE: `auto_scale_lr` is for automatically scaling LR,
# USER SHOULD NOT CHANGE ITS VALUES.
# base_batch_size = (8 GPUs) x (8 samples per GPU)
auto_scale_lr = dict(base_batch_size=64)
voctower.py
# dataset settings
dataset_type = 'VOCDataset'
data_root = 'data/TowerVoc/'  # Change to your own dataset folder
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(640, 640), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(640, 640),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=4,  # Change this to your own batch size. For the SSD network it does not actually matter,
    workers_per_gpu=2,  # but some networks do not rewrite this parameter, so it is best to change it
    train=dict(
        type='RepeatDataset',
        times=3,
        dataset=dict(
            type=dataset_type,
            ann_file=data_root + 'VOC2012/ImageSets/Main/train.txt',  # modify path
            img_prefix=data_root + 'VOC2012/',
            pipeline=train_pipeline)),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2012/ImageSets/Main/val.txt',  # modify path
        img_prefix=data_root + 'VOC2012/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2012/ImageSets/Main/test.txt',  # modify path
        img_prefix=data_root + 'VOC2012/',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='mAP')
After making your own changes, you can run print_config.py to check whether the parameters are what you expect.
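One parameter worth cross-checking against the data itself is num_classes. The sketch below counts the class names that actually occur in the VOC XML annotations, so you can confirm the number matches the config. `count_voc_classes` is a helper made up for this article, and it assumes the usual VOC layout where each object element carries a name child.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_voc_classes(annotation_dir):
    """Count how many objects of each class appear in the VOC XML files."""
    counts = Counter()
    for xml_path in sorted(Path(annotation_dir).glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        for obj in root.iter("object"):
            name = obj.findtext("name")
            if name:
                counts[name.strip()] += 1
    return counts


if __name__ == "__main__":
    counts = count_voc_classes("data/TowerVoc/VOC2012/Annotations")
    print(f"{len(counts)} classes found")
    for name, n in counts.most_common():
        print(f"  {name}: {n}")
```

If the number of classes found differs from num_classes in the config, fix one of the two before training.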
In addition to the above, the following two files also need to be modified:
anaconda3\envs\conda_env_name\lib\python3.7\site-packages\mmdet\core\evaluation\class_names.py
anaconda3\envs\conda_env_name\lib\python3.7\site-packages\mmdet\datasets\voc.py
Change the class names in both files, voc.py and class_names.py, to your own.
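As a sketch of what the two edits look like: in the 2.x code I am familiar with, voc.py holds the class names in the CLASSES attribute of VOCDataset, and class_names.py returns them from voc_classes(). The three names below are placeholders, not my real labels. Use exactly the names that appear in your XML annotations.

```python
# Placeholder class names (hypothetical); replace with the labels in your XML files.
MY_CLASSES = ('class_a', 'class_b', 'class_c')

# In mmdet/datasets/voc.py, the attribute inside `class VOCDataset` would become:
CLASSES = MY_CLASSES


# In mmdet/core/evaluation/class_names.py, voc_classes() would return the same names:
def voc_classes():
    return list(MY_CLASSES)
```

If you ever have a single class, remember the trailing comma, e.g. ('class_a',), otherwise Python treats the value as a plain string rather than a tuple.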
Note that modifying the mmdet code inside the project directory has no effect. During environment setup we ran pip install mmdet, so the mmdet actually used is the installed Python library, not the mmdet folder in the project. Therefore, if the classes you want to train differ from those of the PASCAL VOC dataset, you need to modify the two files above. The cleanest way is of course to create a new dataset .py file for your own data, but that is more work.
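Because the path of the installed library differs from machine to machine, it can help to ask Python where the package actually lives instead of guessing. The helper below is a sketch using only the standard library; `installed_path` is a name made up for this article. It does not import the package, it only looks it up.

```python
import importlib.util
from pathlib import Path


def installed_path(package, relative_path):
    """Return the path of a file inside an installed package, or None if it is missing."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = Path(list(spec.submodule_search_locations)[0])
    return package_dir / relative_path


if __name__ == "__main__":
    for rel in ("datasets/voc.py", "core/evaluation/class_names.py"):
        print(installed_path("mmdet", rel))
```

If the printed paths point into site-packages, those are the files to edit. If they point into your project directory (which can happen with an editable install from source), then changes made in the project do take effect.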