Automatic semantic segmentation annotation of video targets – from image contour extraction to JSON label file generation

Foreword

Semantic segmentation data annotation is the process of preparing data for training semantic segmentation models. Semantic segmentation is a computer vision task in which every pixel of an image is assigned a class label in order to distinguish different objects or regions. When labeling data, you typically assign a unique label to each object or region and create segmentation masks that correspond to the image pixels. A mask is an image of the same size as the input in which each pixel value indicates the category that pixel belongs to; for example, separate masks (or pixel values) are used for categories such as background, people and vehicles.
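As a quick illustration, here is a minimal sketch (not part of the original workflow) of how a label mask encodes class indices per pixel; the 4x4 size and the class IDs are arbitrary choices made only for this example.

import numpy as np

# Hypothetical label mask for a 4x4 image: 0 = background, 1 = person, 2 = vehicle.
mask = np.zeros((4, 4), dtype=np.uint8)
mask[0:2, 0:2] = 1          # top-left block labeled "person"
mask[2:4, 2:4] = 2          # bottom-right block labeled "vehicle"

# A binary mask for a single category can be derived from the label mask.
person_mask = (mask == 1).astype(np.uint8) * 255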

Manual annotation tools:
Image annotation software: You can use specialized image annotation tools, such as LabelImg, Labelbox, VGG Image Annotator (VIA), CVAT, etc., to manually draw regions and assign labels.
Drawing Tools: You can also use general drawing tools, such as Adobe Photoshop or GIMP, to manually draw areas and create masks.

Semi-automatic annotation tools:
GrabCut algorithm: This is an interactive image segmentation method that can quickly generate segmentation masks (see the sketch after this list).
Superpixel segmentation tools: Use tools such as SLIC or QuickShift to generate superpixels and then manually assign labels to different superpixel regions.
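For reference, below is a minimal GrabCut sketch using OpenCV; the file name "frame.jpg" and the initial rectangle are assumptions and need to be adapted to the actual data.

import cv2
import numpy as np

img = cv2.imread("frame.jpg")                      # hypothetical input image
mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)
rect = (50, 50, 300, 400)                          # rough box around the target

# Five iterations of GrabCut initialized from the rectangle.
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Definite and probable foreground pixels form the segmentation mask.
fg_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype(np.uint8)
cv2.imwrite("grabcut_mask.png", fg_mask)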

Deep learning automatic annotation:
Segmentation model-assisted annotation: You can use pre-trained semantic segmentation models, such as Mask R-CNN, U-Net, etc., to assist annotation. These models can automatically provide initial segmentation results, after which necessary fine-tuning can be performed.
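As an illustration of model-assisted pre-annotation, here is a minimal sketch using torchvision's pre-trained DeepLabV3 (chosen only because it runs in a few lines; the text above mentions Mask R-CNN and U-Net). The predicted mask is just an initial proposal that still needs manual review; "frame.jpg" is a hypothetical input frame.

import torch
import torchvision
from torchvision import transforms
from PIL import Image

# Pre-trained DeepLabV3 used as a rough pre-annotator.
model = torchvision.models.segmentation.deeplabv3_resnet50(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("frame.jpg").convert("RGB")
with torch.no_grad():
    out = model(preprocess(img).unsqueeze(0))["out"]   # [1, num_classes, H, W]
pred = out.argmax(1).squeeze(0).byte().numpy()          # per-pixel class indices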

Image semantic segmentation annotation is a time-consuming and labor-intensive task. This is especially true when annotating targets in videos: the video has to be split into frames and the required targets annotated frame by frame. With the emergence of the Segment Anything and Segment-and-Track Anything algorithms, however, segmentation and labeling are no longer so troublesome. Segment-and-Track Anything can track a target through the video and segment it, and the masks it produces can then be used to generate label files automatically.

1. Segment-and-Track Anything target tracking and target segmentation

1. Introduction to the algorithm

Meta AI’s SAM (Segment Anything Model) demonstrates powerful image segmentation capabilities, but it faces some challenges when processing video data. Segment-and-Track Anything extends the SAM model to enable segmentation and tracking of video data. This innovation enables SAM not only to segment objects in images but also to track their changes over time. The application potential of this capability is wide, covering various spatio-temporal scenarios, including but not limited to street view, augmented reality, cell image analysis, animation production and aerial video.

In the SAM-Track project, the SAM model achieves powerful target segmentation and tracking on a single GPU. It has the potential to process large-scale data and can track more than 200 objects simultaneously, providing users with excellent video editing capabilities.

2. Algorithm application deployment

For algorithm application and deployment, please see my previous blog: Segment-and-Track Anything – general intelligent video segmentation, target tracking, editing algorithm interpretation and source code deployment

3. Moving target tracking and segmentation

First, target segmentation is performed on the first frame of the video, and then Segment-and-Track Anything is used to track and segment the target in the entire video.

The result after segmentation is as follows:

2. Generate labels

1. Semantic segmentation label format

To generate semantic segmentation labels, you first need to understand the format of a semantic segmentation JSON file. Here a file annotated with labelme is used as an example; the annotated label file looks like this:

{
  "version": "0.2.4",
  "flags": {},
  "shapes": [
    {
      "label": "mat",
      "text": "",
      "points": [
        [
          234.0,
          248.0
        ],
        [
          229.0,
          246.0
        ],
        [
          207.0,
          247.0
        ]
      ],
      "group_id": null,
      "shape_type": "polygon",
      "flags": {<!-- -->}
    },
    {
      "label": "mat",
      "text": "",
      "points": [
        [
          237.0,
          245.0
        ],
        [
          236.0,
          249.0
        ],
        [
          237.0,
          250.0
        ],
        [
          237.0,
          260.0
        ],
        [
          239.0,
          268.0
        ]
      ],
      "group_id": null,
      "shape_type": "polygon",
      "flags": {<!-- -->}
    }
  ],
  "imagePath": "b (14).jpg",
  "imageData": null,
  "imageHeight": 518,
  "imageWidth": 500
}

2. Contour extraction and polygon fitting

OpenCV is used to extract the contour of each mask, and the extracted contours are then approximated with polygons so that they can be stored as labelme points:

import cv2
import json
import os


def approx_PolyDP(cv_src):
    # Convert the mask image to grayscale and binarize it: any non-zero pixel
    # becomes foreground.
    cv_gray = cv2.cvtColor(cv_src, cv2.COLOR_BGR2GRAY)
    cv_ret, cv_binary = cv2.threshold(cv_gray, 0, 255, cv2.THRESH_BINARY)
    # Extract the contours of the foreground regions.
    contours, hierarchy = cv2.findContours(cv_binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    approxs = []
    for contour in contours:
        # Polygon fitting of the contour; a larger epsilon gives a coarser fit,
        # e.g. epsilon = 0.04 * cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 1, True)
        approxs.append(approx)

    return approxs
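To check how tight the fit is, a small usage sketch (the mask file name "mask.png" is an assumption) can draw the fitted polygons back onto a SAM-Track mask:

# Usage sketch: overlay the fitted polygons on a segmented mask image.
cv_mask = cv2.imread("mask.png")
for approx in approx_PolyDP(cv_mask):
    cv2.polylines(cv_mask, [approx], isClosed=True, color=(255, 0, 0), thickness=1)
cv2.imwrite("mask_polygons.png", cv_mask)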

3. Generate label file

After polygon fitting, create a JSON file in the labelme format:

def create_node(label_name, points):
    # One labelme "shape" entry: a polygon with its class label and vertices.
    shape = {
        "label": label_name,
        "text": "",
        "points": points,
        "group_id": None,
        "shape_type": "polygon",
        "flags": {}
    }

    return shape

def create_json(img_name, img_w, img_h):
    # Create an empty labelme-style JSON file next to the image.
    name, _ = os.path.splitext(img_name)
    data = {
        "version": "0.2.4",
        "flags": {},
        "shapes": [
        ],
        "imagePath": img_name,
        "imageData": None,
        "imageHeight": img_h,
        "imageWidth": img_w
    }

    json_name = name + ".json"

    with open(json_name, "w") as json_file:
        json.dump(data, json_file, indent=4)

def add_shape(json_name, node):
    # Append one shape node to the "shapes" list of an existing JSON file.
    with open(json_name, "r") as json_file:
        data = json.load(json_file)

    data["shapes"].append(node)

    # Save the updated JSON data
    with open(json_name, "w") as json_file:
        json.dump(data, json_file, indent=4)

def contour_to_json(img_name):
    cv_src = cv2.imread(img_name)
    approxs = approx_PolyDP(cv_src)
    height, width = cv_src.shape[:2]

    points_all = []
    if len(approxs) >= 1:
        # Convert every fitted polygon into a list of [x, y] vertices.
        for approx in approxs:
            points = []
            for i in range(len(approx)):
                points.append([int(approx[i][0][0]), int(approx[i][0][1])])
            points_all.append(points)

        create_json(img_name, width, height)

        nodes = []

        # "foot" is the class name of the tracked target in this example.
        for p in points_all:
            node = create_node("foot", p)
            nodes.append(node)

        name, _ = os.path.splitext(img_name)
        json_name = name + ".json"
        for n in nodes:
            add_shape(json_name, n)
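The function above handles one mask image at a time. A minimal driver sketch (the "masks" folder and ".png" extension are assumptions) runs it over every mask frame exported from SAM-Track:

import glob

# Generate one JSON label file per mask frame; adjust the folder and extension
# to wherever the SAM-Track masks were saved.
for mask_path in glob.glob(os.path.join("masks", "*.png")):
    contour_to_json(mask_path)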

4. Verify the label file

Open the generated JSON file together with its image in a labeling tool such as labelme. If the fitted polygons line up with the segmented target, the generated label file is correct and can be used for training.
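Besides opening the file in labelme, a quick programmatic check is also possible. The following sketch reads a generated JSON file and overlays its polygons on the image; the file names are only examples.

import numpy as np

def draw_json_polygons(img_name, json_name):
    # Overlay every saved polygon on the original image for a visual check.
    cv_img = cv2.imread(img_name)
    with open(json_name, "r") as json_file:
        data = json.load(json_file)
    for shape in data["shapes"]:
        pts = np.array(shape["points"], dtype=np.int32).reshape(-1, 1, 2)
        cv2.polylines(cv_img, [pts], isClosed=True, color=(0, 255, 0), thickness=1)
    cv2.imwrite("check_labels.png", cv_img)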