Training Mask R-CNN on your own dataset

Dataset creation

Instance segmentation datasets are usually annotated with labelme, which also provides tutorials and scripts for converting the annotations to COCO format. The labelme project is hosted at: https://github.com/wkentaro/labelme/tree/main

Install labelme

conda create --name=labelme python=3
conda activate labelme
pip install labelme

# or install standalone executable/app from:
# https://github.com/wkentaro/labelme/releases

Annotate segmentation regions

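When annotating in labelme, use the group ID field of a label for objects that are split by occlusion: polygons that share the same label and group ID are treated as one instance. In the figure below, the elephant is covered by two polygons, both with group ID 0.
[Figure: an elephant annotated as two polygons with group ID 0]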
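To make the role of the group ID concrete, the sketch below merges labelme shapes that share a label and group ID into one instance. It assumes the usual labelme JSON layout, where each entry of the `shapes` list carries `label`, `group_id`, and `points`. The helper name `group_instances` is hypothetical, and treating shapes without a group ID as separate instances is an assumption of this sketch.

```python
from collections import defaultdict


def group_instances(shapes):
    """Group labelme shapes into instances keyed by (label, group_id).

    Shapes whose group_id is None are kept as separate instances
    (an assumption of this sketch).
    """
    instances = defaultdict(list)
    for index, shape in enumerate(shapes):
        group_id = shape.get("group_id")
        # Ungrouped shapes get a unique key so they are never merged.
        key = (shape["label"], group_id if group_id is not None else f"solo-{index}")
        instances[key].append(shape["points"])
    return dict(instances)


# Hypothetical annotation: one elephant split into two polygons by an occluder.
shapes = [
    {"label": "elephant", "group_id": 0, "points": [[10, 10], [60, 10], [60, 50]]},
    {"label": "elephant", "group_id": 0, "points": [[80, 12], [120, 12], [120, 48]]},
    {"label": "person", "group_id": None, "points": [[62, 5], [78, 5], [78, 55]]},
]
instances = group_instances(shapes)
for key, polygons in instances.items():
    print(key, "->", len(polygons), "polygon(s)")
```

The two elephant polygons end up under a single key, which is the behaviour the group ID is meant to produce during conversion.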

Convert to COCO dataset

The examples/instance_segmentation folder of the labelme project provides sample data and scripts for converting to VOC and COCO formats. This article covers only the conversion to COCO format. The file structure is as follows.
[Figure: file structure of examples/instance_segmentation]
[Figure: image and label files]
[Figure: dataset categories]
For a custom dataset, prepare the images and label files following the structure above (the contents of the data_annotated folder) and list the category names in labels.txt. Then run the following command to convert the data to COCO format.
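The labels file passed through --labels can be generated rather than typed by hand. The sketch below assumes the layout used by the labels.txt in the labelme example folder, where the first two entries are the reserved names `__ignore__` and `_background_` and the real categories follow. The helper `build_labels_text` is hypothetical, and `jietou` is the category used later in this article.

```python
def build_labels_text(class_names):
    """Return labels.txt content: reserved entries first, then the classes."""
    reserved = ["__ignore__", "_background_"]  # assumed labelme convention
    return "\n".join(reserved + list(class_names)) + "\n"


labels_text = build_labels_text(["jietou"])
print(labels_text, end="")

# To save it next to the conversion script:
# import pathlib
# pathlib.Path("labels.txt").write_text(labels_text, encoding="utf-8")
```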

python labelme2coco.py data_annotated/ coco --labels labels.txt

When the conversion finishes, the output folder contains the following.
[Figure: converted COCO data]
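Before training, it can help to confirm that the converted file contains what you expect. The sketch below counts images, annotations, and annotations per category in a COCO-style dictionary. The function name `summarize_coco` is hypothetical, the sample dictionary is a hand-written stand-in, and the commented file path is an assumption about where the conversion script writes its output.

```python
import json
from collections import Counter


def summarize_coco(coco):
    """Return image count, annotation count, and annotations per category name."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    per_category = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    return {
        "images": len(coco["images"]),
        "annotations": len(coco["annotations"]),
        "per_category": dict(per_category),
    }


# Tiny hand-written stand-in for the converted annotation file.
sample = {
    "images": [{"id": 0, "file_name": "JPEGImages/a.jpg", "height": 416, "width": 416}],
    "categories": [{"id": 0, "name": "_background_"}, {"id": 1, "name": "jietou"}],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 1, "iscrowd": 0,
         "bbox": [10, 10, 50, 40], "area": 2000,
         "segmentation": [[10, 10, 60, 10, 60, 50, 10, 50]]},
    ],
}
summary = summarize_coco(sample)
print(summary)

# For a real file (path is an assumption about the output layout):
# with open("coco/annotations.json", encoding="utf-8") as f:
#     print(summarize_coco(json.load(f)))
```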
As a small tip, you can change the code that saves the json file as follows, so that the output is indented for readability and Chinese characters are written as-is.

with open(out_ann_file, "w", encoding="utf-8") as f:
    # indent=2 pretty-prints the file; ensure_ascii=False writes Chinese
    # characters as-is instead of \uXXXX escape sequences
    json.dump(data, f, indent=2, ensure_ascii=False)
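The effect of ensure_ascii is easy to see on a small record. The sketch below serializes the same dictionary both ways, using only the standard json module; the record itself is made up for illustration.

```python
import json

record = {"name": "接头"}
escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keep characters as-is

print(escaped)
print(readable)

# Both forms decode to the same data; only the text on disk differs.
assert json.loads(escaped) == json.loads(readable) == record
```

The escaped form is still valid JSON, so the option affects only how readable the file is in an editor, not what a parser loads from it.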

Mask R-CNN training

The environment used in this article is as follows:

  • pytorch==1.7.0
  • torchvision==0.8.0
  • mmcv-full==1.2.7
  • mmdet==2.8.0
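A quick way to compare an existing environment against this list is to query the installed package metadata. The sketch below uses only the standard library (Python 3.8 or later); the helper `check_versions` is hypothetical, and ignoring local build tags such as "+cu110" when comparing is a choice made for this sketch.

```python
from importlib import metadata

# Versions listed in this article, keyed by pip distribution name.
EXPECTED = {
    "torch": "1.7.0",
    "torchvision": "0.8.0",
    "mmcv-full": "1.2.7",
    "mmdet": "2.8.0",
}


def check_versions(expected):
    """Map each package to (installed_version_or_None, matches_expected)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build tags such as "+cu110" are ignored when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, matches)
    return report


report = check_versions(EXPECTED)
for name, (installed, ok) in report.items():
    print(f"{name}: installed={installed} expected={EXPECTED[name]} ok={ok}")
```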

Config file modification

Model config

In the model section of the config, the only parameter that needs to change is num_classes. Set it to the number of categories in your dataset.

# model settings

num_classes=1

model = dict(
    type='MaskRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=num_classes,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=dict(
            type='FCNMaskHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=num_classes,
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))))
# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            match_low_quality=True,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            match_low_quality=True,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        mask_size=28,
        pos_weight=-1,
        debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=100,
        mask_thr_binary=0.5))
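Since num_classes is the only model parameter that changes, an alternative to copying the whole model definition is to rely on MMDetection's config inheritance through `_base_`. The fragment below is a sketch under two assumptions: the stock config `configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py` exists in your MMDetection checkout, and the custom config lives in `configs/aaaa/`. Keys that are not overridden keep the values from the base file.

```python
# Inherit the stock Mask R-CNN config and override only what differs.
# The relative path assumes this file sits in configs/aaaa/ of an MMDetection checkout.
_base_ = '../mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py'

num_classes = 1  # number of foreground classes in the custom dataset

model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=num_classes),
        mask_head=dict(num_classes=num_classes)))
```

Writing the config out in full, as this article does, keeps every setting visible in one file; the inherited form is shorter but depends on the base file staying unchanged.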

Data config

In the data section of the config, set data_root and classes to the dataset path and the list of category names. For the training, validation, and test sets, adjust ann_file and img_prefix accordingly.

dataset_type = 'CocoDataset'

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='Resize', img_scale=(416, 416), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(416, 416),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

data_root = 'datasets/xuzhou2_single_jietou/'
classes=["jietou"]
data = dict(
    samples_per_gpu=32,
    workers_per_gpu=1,
    #dataset type
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_jietou_train20231016.json',
        img_prefix=data_root + 'train/',
        pipeline=train_pipeline,
        classes=classes
        ),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_jietou_val20231016.json',
        img_prefix=data_root + 'val/',
        pipeline=test_pipeline,
        classes=classes
        ),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_jietou_val20231016.json',
        img_prefix=data_root + 'val/',
        pipeline=test_pipeline,
        classes=classes
        ),
    )
evaluation = dict(interval=10, metric=['bbox', 'segm'])
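The names in classes are expected to match the category names stored in the annotation file, so a misspelled name is worth catching before a long training run. The sketch below reports any configured class that has no matching COCO category. The helper `missing_classes` is hypothetical and the category list is a hand-written stand-in for a real annotation file.

```python
def missing_classes(classes, coco):
    """Return the names in `classes` that have no matching COCO category."""
    category_names = {cat["name"] for cat in coco["categories"]}
    return [name for name in classes if name not in category_names]


classes = ["jietou"]
coco = {"categories": [{"id": 0, "name": "_background_"}, {"id": 1, "name": "jietou"}]}

ok_result = missing_classes(classes, coco)
bad_result = missing_classes(["jietou", "typo"], coco)
print(ok_result)   # []
print(bad_result)  # ['typo']
```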

Configuration of optimizer and learning rate

Parameters are updated with stochastic gradient descent (SGD), and the learning rate follows a linear warmup followed by cosine decay. Because warmup_by_epoch=True, warmup_iters=25 means the warmup lasts 25 epochs.

# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)

# Learning rate scheduler config used to register LrUpdater hook
lr_config = dict(
    policy='CosineAnnealing',
    min_lr=0,
    warmup='linear',
    warmup_iters=25,
    warmup_ratio=0.001,
    warmup_by_epoch=True
)
total_epochs = 150
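The shape of this schedule can be sketched numerically. The function below approximates the learning rate at the start of each epoch, assuming the linear warmup factor multiplies the cosine-annealed rate, which is how mmcv's LR updater hooks appear to combine the two. `lr_at_epoch` is a hypothetical helper, not part of mmcv, and the real hook also updates the rate within each warmup epoch.

```python
import math


def lr_at_epoch(epoch, base_lr=0.02, min_lr=0.0, total_epochs=150,
                warmup_epochs=25, warmup_ratio=0.001):
    """Approximate learning rate at the start of `epoch` (0-based)."""
    # Cosine annealing from base_lr down to min_lr over the whole run.
    cos_lr = min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total_epochs))
    if epoch < warmup_epochs:
        # Linear warmup: scale factor grows from warmup_ratio to 1.
        k = (1 - epoch / warmup_epochs) * (1 - warmup_ratio)
        return cos_lr * (1 - k)
    return cos_lr


for epoch in (0, 10, 25, 75, 149):
    print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.6f}")
```

With these settings the rate starts at base_lr * warmup_ratio, peaks around the end of the 25-epoch warmup, and then decays towards min_lr.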

Runtime configuration

Change the checkpoint interval so that weights are saved every 5 epochs.

checkpoint_config = dict(interval=5)
#yapf:disable
log_config = dict(
    interval=1,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
#yapf:enable
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
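Together with the evaluation interval set in the data config, these intervals determine which epochs leave a checkpoint and which are evaluated. The sketch below lists them, assuming a hook with interval n fires after every epoch whose 1-based number is a multiple of n. `scheduled_epochs` is a hypothetical helper.

```python
def scheduled_epochs(total_epochs, interval):
    """Epochs (1-based) at which an every-`interval`-epochs hook fires."""
    return list(range(interval, total_epochs + 1, interval))


checkpoint_epochs = scheduled_epochs(150, 5)   # checkpoint_config interval
evaluation_epochs = scheduled_epochs(150, 10)  # evaluation interval

print(len(checkpoint_epochs), "checkpoints,", len(evaluation_epochs), "evaluations")
# Every evaluated epoch also has a saved checkpoint, since 10 is a multiple of 5.
print(set(evaluation_epochs) <= set(checkpoint_epochs))
```

Keeping the evaluation interval a multiple of the checkpoint interval means the weights behind every reported metric can be recovered afterwards.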

Training

Run the following command to start training Mask R-CNN on four GPUs.

CUDA_VISIBLE_DEVICES=4,5,6,7 \
bash tools/dist_train.sh configs/aaaa/mask_rcnn_r50_fpn_custom.py 4

Test

Run tools/test.py to start a single-GPU test. The command is as follows.

python tools/test.py /path/to/config_file /path/to/checkpoint_file --eval bbox segm

FAQ

Q1: OSError: [Errno 39] Directory not empty "eval_hook"

This can be worked around by commenting out the tmpdir argument in the mmdet/core/evaluation/eval_hooks.py file. Specifically, pass tmpdir=None in the call to the multi_gpu_test function.

results = multi_gpu_test(
    runner.model,
    self.dataloader,
    #tmpdir=tmpdir,
    tmpdir=None,
    gpu_collect=self.gpu_collect)
