Dataset production
Labelme is commonly used to create instance segmentation datasets, and the project also provides tutorials and code for converting annotations to COCO format. The labelme project is at: https://github.com/wkentaro/labelme/tree/main
Install labelme
conda create --name=labelme python=3
conda activate labelme
pip install labelme
# or install standalone executable/app from:
# https://github.com/wkentaro/labelme/releases
Mark segmented areas
When an object is split into several regions by occlusion, give each of its regions the same group ID in labelme so that they are treated as a single instance. In the example figure, the elephant is annotated as two regions, and the group ID of both is set to 0.
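To illustrate how the group ID is used, here is a minimal sketch of grouping shapes into instances. It assumes the labelme JSON layout, in which each entry of `shapes` carries a `label`, its `points`, and an optional `group_id`. The helper name `group_shapes_into_instances` and the sample shapes are hypothetical, and the sketch only mirrors the idea: shapes that share a label and a group ID form one instance, while a shape without a group ID forms an instance of its own.

```python
import uuid
from collections import defaultdict


def group_shapes_into_instances(shapes):
    """Group labelme shapes so each (label, group_id) pair is one instance."""
    instances = defaultdict(list)
    for shape in shapes:
        group_id = shape.get("group_id")
        if group_id is None:
            # No group ID: treat the shape as its own instance.
            group_id = uuid.uuid4().hex
        instances[(shape["label"], group_id)].append(shape["points"])
    return dict(instances)


# Hypothetical annotation: an elephant split into two polygons by an occluder.
shapes = [
    {"label": "elephant", "group_id": 0, "points": [[10, 10], [60, 10], [60, 50]]},
    {"label": "elephant", "group_id": 0, "points": [[80, 12], [120, 12], [120, 48]]},
    {"label": "person", "group_id": None, "points": [[62, 5], [78, 5], [78, 55]]},
]

instances = group_shapes_into_instances(shapes)
elephant_parts = instances[("elephant", 0)]  # both polygons of the one elephant
```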
Convert to COCO dataset
The examples/instance_segmentation folder of the labelme project provides sample data and scripts for converting annotations to VOC and COCO formats. This article covers only the conversion to COCO format. The file structure is as follows.
For a custom dataset, prepare the images and label files in the same structure, that is, the contents of the data_annotated folder. Then run the following command to convert them to a COCO-format dataset.
python labelme2coco.py data_annotated/ coco --labels labels.txt
When the script finishes, the converted dataset is written to the output folder (coco in the command above).
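One way to sanity-check the converted annotations is to count what they contain. The sketch below relies only on the standard COCO keys (`images`, `annotations`, `categories`). The function name `summarize_coco` is hypothetical, and the path of the annotation file inside the output folder is an assumption you should adjust to your own output.

```python
import json
from collections import Counter


def summarize_coco(ann_path):
    """Return image, annotation, and per-category counts for a COCO file."""
    with open(ann_path, encoding="utf-8") as f:
        coco = json.load(f)
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    per_category = Counter(
        id_to_name[a["category_id"]] for a in coco["annotations"]
    )
    return {
        "images": len(coco["images"]),
        "annotations": len(coco["annotations"]),
        "per_category": dict(per_category),
    }
```

For example, `summarize_coco("coco/annotations.json")` (path assumed) should report the same number of images as there are labelled files in data_annotated.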
A small tip: when the conversion script saves the JSON file, you can modify the code as follows. The resulting file is indented for readability and keeps Chinese characters intact.
with open(out_ann_file, "w", encoding="utf-8") as f:
    # ensure_ascii=False writes Chinese characters as-is instead of \uXXXX escapes
    json.dump(data, f, indent=2, ensure_ascii=False)
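The effect of `ensure_ascii` can be seen in isolation with a small sketch. The category entry below is a made-up example and is not taken from the dataset in this article.

```python
import json

# Hypothetical category entry with a Chinese class name.
category = {"id": 1, "name": "大象"}

escaped = json.dumps(category)                       # default: ensure_ascii=True
readable = json.dumps(category, ensure_ascii=False)  # keeps the characters as-is

# Both strings decode to the same data; only the text on disk differs.
same_data = json.loads(escaped) == json.loads(readable)
```

Because both forms load back to identical data, the change only affects how readable the file is in an editor.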
Mask R-CNN training
The environment configuration of this article is as follows:
- pytorch==1.7.0
- torchvision==0.8.0
- mmcv-full==1.2.7
- mmdet==2.8.0
Config file modification
Model config
In the model section of the config, the only thing that needs to be modified is the num_classes parameter. Set it to the number of classes in your dataset.
# model settings
num_classes = 1
model = dict(
    type='MaskRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=num_classes,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=dict(
            type='FCNMaskHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=num_classes,
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))))
# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            match_low_quality=True,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            match_low_quality=True,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        mask_size=28,
        pos_weight=-1,
        debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=100,
        mask_thr_binary=0.5))
Data config
In the data section of the config, modify the data_root and classes parameters to specify the dataset path and the list of category names. For the training, validation, and test sets, adjust the ann_file and img_prefix parameters.
dataset_type = 'CocoDataset'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='Resize', img_scale=(416, 416), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(416, 416),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data_root = 'datasets/xuzhou2_single_jietou/'
classes = ["jietou"]
data = dict(
    samples_per_gpu=32,
    workers_per_gpu=1,
    # dataset type
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_jietou_train20231016.json',
        img_prefix=data_root + 'train/',
        pipeline=train_pipeline,
        classes=classes
    ),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_jietou_val20231016.json',
        img_prefix=data_root + 'val/',
        pipeline=test_pipeline,
        classes=classes
    ),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_jietou_val20231016.json',
        img_prefix=data_root + 'val/',
        pipeline=test_pipeline,
        classes=classes
    ),
)
evaluation = dict(
    interval=10,
    metric=['bbox', 'segm']
)
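Before training, it can help to confirm that ann_file, img_prefix, and classes agree with each other. The following is a minimal sketch using only the standard library. The helper name `check_coco_split` is hypothetical, and it assumes each image's `file_name` in the annotation file is relative to img_prefix.

```python
import json
import os


def check_coco_split(ann_file, img_prefix, classes):
    """Report class names and image files that the config would not find."""
    with open(ann_file, encoding="utf-8") as f:
        coco = json.load(f)
    category_names = {c["name"] for c in coco["categories"]}
    # Names listed in `classes` that do not appear among the COCO categories.
    missing_classes = [name for name in classes if name not in category_names]
    # Images referenced by the annotations that are absent under img_prefix.
    missing_images = [
        img["file_name"]
        for img in coco["images"]
        if not os.path.isfile(os.path.join(img_prefix, img["file_name"]))
    ]
    return missing_classes, missing_images
```

Called with the train settings above (the annotation file, data_root + 'train/', and classes), both returned lists should be empty. A non-empty list usually points to a misspelled class name or a wrong img_prefix.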
Configuration of optimizer and learning rate
Parameters are updated with stochastic gradient descent (SGD), and the learning rate follows a linear warmup followed by cosine annealing.
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# Learning rate scheduler config used to register LrUpdater hook
lr_config = dict(
    policy='CosineAnnealing',
    min_lr=0,
    warmup='linear',
    warmup_iters=25,
    warmup_ratio=0.001,
    warmup_by_epoch=True
)
total_epochs = 150
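To get a feel for the shape of this schedule, the sketch below approximates the learning rate at the start of an epoch using the values from the config. It is an illustration and not the mmcv implementation: it assumes the warmup factor is applied on top of the cosine-annealed value, and it works at epoch granularity, whereas the actual hook updates the warmup every iteration. The function name `lr_at_epoch` is hypothetical.

```python
import math


def lr_at_epoch(epoch, base_lr=0.02, min_lr=0.0, total_epochs=150,
                warmup_epochs=25, warmup_ratio=0.001):
    """Approximate learning rate at the start of a given epoch."""
    # Cosine annealing from base_lr down to min_lr over the whole run.
    cosine = 0.5 * (1 + math.cos(math.pi * epoch / total_epochs))
    regular_lr = min_lr + (base_lr - min_lr) * cosine
    if epoch >= warmup_epochs:
        return regular_lr
    # Linear warmup: the scale grows from warmup_ratio to 1 over the warmup.
    k = (1 - epoch / warmup_epochs) * (1 - warmup_ratio)
    return regular_lr * (1 - k)


schedule = {epoch: lr_at_epoch(epoch) for epoch in (0, 25, 75, 150)}
```

Under these assumptions the rate starts at base_lr * warmup_ratio, peaks when the warmup ends at epoch 25, and decays to min_lr by epoch 150.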
Runtime configuration
Set the checkpoint interval so that the weights are saved once every 5 epochs.
checkpoint_config = dict(interval=5)
# yapf:disable
log_config = dict(
    interval=1,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
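As a quick check on how the checkpoint interval lines up with the evaluation interval of 10 from the data config, the sketch below lists the epochs at which each would fire over 150 epochs. It assumes both hooks count epochs from 1 and trigger on multiples of their interval. The helper name `epochs_at_interval` is hypothetical.

```python
def epochs_at_interval(total_epochs, interval):
    """Epochs (1-based) at which an every-`interval`-epochs hook fires."""
    return [epoch for epoch in range(1, total_epochs + 1) if epoch % interval == 0]


checkpoint_epochs = epochs_at_interval(150, 5)    # checkpoint_config interval
evaluation_epochs = epochs_at_interval(150, 10)   # evaluation interval
# Every evaluated epoch also has a saved checkpoint, since 10 is a multiple of 5.
evaluated_without_checkpoint = set(evaluation_epochs) - set(checkpoint_epochs)
```

Keeping the evaluation interval a multiple of the checkpoint interval means the best-scoring epoch always has weights on disk.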
Training
Run the following command to start training Mask R-CNN on four GPUs.
CUDA_VISIBLE_DEVICES=4,5,6,7 \
bash tools/dist_train.sh configs/aaaa/mask_rcnn_r50_fpn_custom.py 4
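The trailing number in the command is the GPU count, and it needs to match the number of devices listed in CUDA_VISIBLE_DEVICES. If you launch training from a Python script, a small sketch like the following keeps the two in sync by deriving both from one list. The helper name `build_dist_train_command` is hypothetical.

```python
import os


def build_dist_train_command(config_path, gpu_ids):
    """Build the environment and argv for tools/dist_train.sh."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    # The trailing argument is the number of processes, one per visible GPU.
    argv = ["bash", "tools/dist_train.sh", config_path, str(len(gpu_ids))]
    return env, argv


env, argv = build_dist_train_command(
    "configs/aaaa/mask_rcnn_r50_fpn_custom.py", [4, 5, 6, 7])
# To launch from the MMDetection root: subprocess.run(argv, env=env, check=True)
```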
Test
Run tools/test.py to evaluate a checkpoint on a single GPU. The command is as follows.
python tools/test.py /path/to/config_file /path/to/checkpoint_file --eval bbox segm
FAQ
Q1: OSError: [Errno 39] Directory not empty "eval_hook"
To work around this error, edit the mmdet/core/evaluation/eval_hooks.py file: comment out the original tmpdir argument in the multi_gpu_test call and pass tmpdir=None instead.
results = multi_gpu_test(
    runner.model,
    self.dataloader,
    # tmpdir=tmpdir,
    tmpdir=None,
    gpu_collect=self.gpu_collect)
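For context, "Directory not empty" is the error the operating system raises when a directory is removed while files remain in it. The sketch below reproduces that error class with the standard library only. It is not MMDetection code, the file name is made up, and it rests on the assumption that with tmpdir=None the results are collected in a freshly created temporary directory rather than a fixed, reused one.

```python
import errno
import os
import shutil
import tempfile

# Stand-in for a leftover evaluation directory that still holds a result file.
stale_dir = tempfile.mkdtemp()
with open(os.path.join(stale_dir, "part_0.pkl"), "wb") as f:
    f.write(b"partial results")

try:
    os.rmdir(stale_dir)  # only succeeds on an empty directory
    error_code = None
except OSError as exc:
    error_code = exc.errno  # ENOTEMPTY, which is 39 on Linux

shutil.rmtree(stale_dir)  # removes the directory together with its contents

# Each mkdtemp call returns a new, empty directory, so nothing is left over
# from an earlier run.
fresh_a = tempfile.mkdtemp()
fresh_b = tempfile.mkdtemp()
dirs_are_distinct = fresh_a != fresh_b
os.rmdir(fresh_a)
os.rmdir(fresh_b)
```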