In MMDetection 3.x, a key change to the data pipeline is the addition of PackDetInputs, which unifies the input format across detection, semantic segmentation, and panoptic segmentation tasks.
From the configuration file, we can see that the pipeline consists of five steps: LoadImageFromFile, LoadAnnotations, RandomFlip, RandomChoice, and PackDetInputs.
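For orientation, such a five-step training pipeline typically looks like the following in the config file. This is only a sketch: the `RandomChoice` branches, scales, and other arguments are assumptions for illustration and vary between configs, so check your own config file for the real values.

```python
# Sketch of a typical MMDetection 3.x train pipeline config.
# The scales, prob, and RandomChoice branches below are illustrative
# assumptions, not taken from any specific config.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RandomFlip', prob=0.5),
    dict(type='RandomChoice',            # randomly pick one branch per image
         transforms=[
             [dict(type='RandomChoiceResize',
                   scales=[(480, 1333), (512, 1333)],
                   keep_ratio=True)],
             [dict(type='RandomChoiceResize',
                   scales=[(400, 1333), (500, 1333)],
                   keep_ratio=True)],
         ]),
    dict(type='PackDetInputs'),          # always the last step
]
```

Note that PackDetInputs comes last: it consumes the `results` dict produced by all preceding transforms.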
For a deeper look at the source code, refer to this blogger's post on debugging the MMDetection 3.x pipeline.
Let's focus on PackDetInputs. After this transform, results is repacked into a standardized form, and this more uniform input data benefits detection, semantic segmentation, and panoptic segmentation alike. The source code is attached at the end; by default the meta keys include 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', and 'flip_direction'.
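What "repacked" means can be illustrated with a plain-NumPy sketch. This is my own toy stand-in, not the mmdet implementation: it mirrors only the two core steps of `PackDetInputs.transform`, converting the image from (H, W, C) to a contiguous (C, H, W) array and collecting the default meta keys (the real code additionally wraps everything in tensors and a `DetDataSample`).

```python
import numpy as np

# Default meta keys of PackDetInputs (from its class docstring).
META_KEYS = ('img_id', 'img_path', 'ori_shape', 'img_shape',
             'scale_factor', 'flip', 'flip_direction')

def pack_sketch(results):
    """Toy stand-in for PackDetInputs.transform: HWC -> CHW image
    conversion plus meta-key collection. Returns plain dicts/arrays
    where mmdet returns tensors and a DetDataSample."""
    img = results['img']
    if img.ndim < 3:                        # grayscale: add a channel dim
        img = np.expand_dims(img, -1)
    inputs = np.ascontiguousarray(img.transpose(2, 0, 1))
    img_meta = {k: results[k] for k in META_KEYS if k in results}
    return {'inputs': inputs, 'img_meta': img_meta}

# Dummy results dict as it might look after the earlier transforms.
results = {
    'img': np.zeros((4, 6, 3), dtype=np.uint8),
    'img_id': 0, 'img_path': 'demo.jpg',
    'ori_shape': (4, 6), 'img_shape': (4, 6),
    'scale_factor': (1.0, 1.0), 'flip': False, 'flip_direction': None,
}
packed = pack_sketch(results)
print(packed['inputs'].shape)   # (3, 4, 6)
```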
So how do you know which keys a transform adds? Check the docstring of its class definition. For example, for RandomFlip (mmdetection-3.0.0\mmdet\datasets\transforms\transforms.py), the Added Keys section lists:

- flip
- flip_direction
- homography_matrix
```python
class RandomFlip(MMCV_RandomFlip):
    """Flip the image & bbox & mask & segmentation map. Added or Updated keys:
    flip, flip_direction, img, gt_bboxes, and gt_seg_map. There are 3 flip
    modes:

    - ``prob`` is float, ``direction`` is string: the image will be
      ``direction``ly flipped with probability of ``prob``.
      E.g., ``prob=0.5``, ``direction='horizontal'``, then the image will be
      horizontally flipped with a probability of 0.5.
    - ``prob`` is float, ``direction`` is list of string: the image will be
      ``direction[i]``ly flipped with probability of ``prob/len(direction)``.
      E.g., ``prob=0.5``, ``direction=['horizontal', 'vertical']``, then image
      will be horizontally flipped with probability of 0.25, vertically with
      a probability of 0.25.
    - ``prob`` is list of float, ``direction`` is list of string: given
      ``len(prob) == len(direction)``, the image will be ``direction[i]``ly
      flipped with probability of ``prob[i]``. E.g., ``prob=[0.3, 0.5]``,
      ``direction=['horizontal', 'vertical']``, then image will be
      horizontally flipped with probability of 0.3, vertically with
      probability of 0.5.

    Required Keys:

    - img
    - gt_bboxes (BaseBoxes[torch.float32]) (optional)
    - gt_masks (BitmapMasks | PolygonMasks) (optional)
    - gt_seg_map (np.uint8) (optional)

    Modified Keys:

    - img
    - gt_bboxes
    - gt_masks
    - gt_seg_map

    Added Keys:

    - flip
    - flip_direction
    - homography_matrix
    """
```
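The three prob/direction modes in the docstring reduce to one rule: each direction i is drawn with its own probability, and with the remaining probability no flip happens. The selection logic can be sketched as follows (my own illustration of the probability rule, not the MMCV code):

```python
import random

def choose_flip_direction(prob, direction, rng=random.random):
    """Mimic RandomFlip's three modes for choosing a flip direction.
    - float prob + str direction: flip that way with probability prob.
    - float prob + list direction: each entry gets prob / len(direction).
    - list prob + list direction: direction[i] gets prob[i].
    Returns a direction string, or None for 'no flip'."""
    if isinstance(direction, str):
        direction, probs = [direction], [prob]
    elif isinstance(prob, float):
        probs = [prob / len(direction)] * len(direction)
    else:
        probs = list(prob)
    r, acc = rng(), 0.0
    for d, p in zip(direction, probs):
        acc += p
        if r < acc:            # r falls in this direction's slot
            return d
    return None                # leftover probability: no flip

# prob=0.5 over two directions -> 0.25 each, 0.5 chance of no flip.
print(choose_flip_direction(0.5, ['horizontal', 'vertical'],
                            rng=lambda: 0.1))  # 'horizontal'
```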
PackDetInputs definition:
```python
@TRANSFORMS.register_module()
class PackDetInputs(BaseTransform):
    """Pack the inputs data for the detection / semantic segmentation /
    panoptic segmentation.

    The ``img_meta`` item is always populated. The contents of the
    ``img_meta`` dictionary depends on ``meta_keys``. By default this
    includes:

    - ``img_id``: id of the image
    - ``img_path``: path to the image file
    - ``ori_shape``: original shape of the image as a tuple (h, w, c)
    - ``img_shape``: shape of the image input to the network as a tuple
      (h, w, c). Note that images may be zero padded on the bottom/right
      if the batch tensor is larger than this shape.
    - ``scale_factor``: a float indicating the preprocessing scale
    - ``flip``: a boolean indicating if image flip transform was used
    - ``flip_direction``: the flipping direction

    Args:
        meta_keys (Sequence[str], optional): Meta keys to be converted to
            ``mmcv.DataContainer`` and collected in ``data[img_metas]``.
            Default: ``('img_id', 'img_path', 'ori_shape', 'img_shape',
            'scale_factor', 'flip', 'flip_direction')``
    """
    mapping_table = {
        'gt_bboxes': 'bboxes',
        'gt_bboxes_labels': 'labels',
        'gt_masks': 'masks'
    }

    def __init__(self,
                 meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                            'scale_factor', 'flip', 'flip_direction')):
        self.meta_keys = meta_keys

    def transform(self, results: dict) -> dict:
        """Method to pack the input data.

        Args:
            results (dict): Result dict from the data pipeline.

        Returns:
            dict:

            - 'inputs' (obj:`torch.Tensor`): The forward data of models.
            - 'data_sample' (obj:`DetDataSample`): The annotation info of the
              sample.
        """
        packed_results = dict()
        if 'img' in results:
            img = results['img']
            if len(img.shape) < 3:
                img = np.expand_dims(img, -1)
            img = np.ascontiguousarray(img.transpose(2, 0, 1))
            packed_results['inputs'] = to_tensor(img)

        if 'gt_ignore_flags' in results:
            valid_idx = np.where(results['gt_ignore_flags'] == 0)[0]
            ignore_idx = np.where(results['gt_ignore_flags'] == 1)[0]

        data_sample = DetDataSample()
        instance_data = InstanceData()
        ignore_instance_data = InstanceData()

        for key in self.mapping_table.keys():
            if key not in results:
                continue
            if key == 'gt_masks' or isinstance(results[key], BaseBoxes):
                if 'gt_ignore_flags' in results:
                    instance_data[
                        self.mapping_table[key]] = results[key][valid_idx]
                    ignore_instance_data[
                        self.mapping_table[key]] = results[key][ignore_idx]
                else:
                    instance_data[self.mapping_table[key]] = results[key]
            else:
                if 'gt_ignore_flags' in results:
                    instance_data[self.mapping_table[key]] = to_tensor(
                        results[key][valid_idx])
                    ignore_instance_data[self.mapping_table[key]] = to_tensor(
                        results[key][ignore_idx])
                else:
                    instance_data[self.mapping_table[key]] = to_tensor(
                        results[key])
        data_sample.gt_instances = instance_data
        data_sample.ignored_instances = ignore_instance_data

        if 'proposals' in results:
            data_sample.proposals = InstanceData(bboxes=results['proposals'])

        if 'gt_seg_map' in results:
            gt_sem_seg_data = dict(
                sem_seg=to_tensor(results['gt_seg_map'][None, ...].copy()))
            data_sample.gt_sem_seg = PixelData(**gt_sem_seg_data)

        img_meta = {}
        for key in self.meta_keys:
            img_meta[key] = results[key]
        data_sample.set_metainfo(img_meta)
        packed_results['data_samples'] = data_sample

        return packed_results

    def __repr__(self) -> str:
        repr_str = self.__class__.__name__
        repr_str += f'(meta_keys={self.meta_keys})'
        return repr_str
```
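One detail worth pausing on in `transform`: when `gt_ignore_flags` is present, instances are split into `gt_instances` (flag 0) and `ignored_instances` (flag 1) using `np.where`. That indexing step in isolation, with made-up toy annotations:

```python
import numpy as np

# Toy annotations: 4 boxes, two of which are marked "ignore"
# (e.g. crowd regions in COCO). The values here are made up.
gt_ignore_flags = np.array([0, 1, 0, 1])
gt_bboxes_labels = np.array([3, 7, 3, 5])

valid_idx = np.where(gt_ignore_flags == 0)[0]    # kept for training
ignore_idx = np.where(gt_ignore_flags == 1)[0]   # excluded from the loss

print(valid_idx, gt_bboxes_labels[valid_idx])    # [0 2] [3 3]
print(ignore_idx, gt_bboxes_labels[ignore_idx])  # [1 3] [7 5]
```

Every annotation field in `mapping_table` is sliced with these same two index arrays, so the valid and ignored instance groups stay aligned across bboxes, labels, and masks.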