CVPR 2023|BCP: Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation


Reprinted from: Jishi Platform | Author: GlobalTrack


In semi-supervised medical image segmentation, there is an empirical mismatch between the distributions of labeled and unlabeled data. This paper proposes a simple approach to alleviate this problem: bidirectional copy-paste of labeled and unlabeled data within a plain Mean Teacher architecture.


Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation

Paper link:

Source link:


Segmentation of internal structures from medical images such as CT or MRI is crucial for many clinical applications. Various techniques for medical image segmentation based on supervised learning have been proposed, which usually require a large amount of labeled data. However, due to the tedious and expensive process of manual contour drawing when annotating medical images, semi-supervised segmentation has received increasing attention in recent years and has become ubiquitous in the field of medical image analysis.

Generally, in semi-supervised medical segmentation, labeled and unlabeled data are assumed to be drawn from the same distribution. In the real world, however, it is difficult to estimate an accurate distribution from labeled data because it is scarce, so there is an empirical distribution mismatch between the large amount of unlabeled data and the very small amount of labeled data. Semi-supervised segmentation methods always try to train on labeled and unlabeled data symmetrically in a consistent manner. For example, self-training generates pseudo-labels and supervises unlabeled data with them in a pseudo-supervised fashion; Mean Teacher-based algorithms employ a consistency loss to supervise strongly augmented unlabeled data, similar to supervising labeled data with ground truth; ContrastMask applies dense contrastive learning on both labeled and unlabeled data. Nevertheless, most existing semi-supervised algorithms still use labeled and unlabeled data under different learning paradigms.

CutMix is a simple but powerful data-processing method, also known as Copy-Paste (CP), which has the potential to encourage unlabeled data to learn common semantics from labeled data, since pixels that are closer together in the same image are more likely to share semantics. In semi-supervised learning, enforcing consistency between weak-strong augmentation pairs of unlabeled data is widely used, and CP often serves as the strong augmentation. However, existing CP methods either perform CP only among unlabeled data or simply copy objects from labeled data as foreground and paste them onto other images. They neglect to design a consistent learning strategy for labeled and unlabeled data, which hinders their use in reducing the distribution gap. Meanwhile, although CP tries to enhance network generalization by increasing the diversity of unlabeled data, it is hard to achieve high performance because the CutMix images are supervised only by low-precision pseudo-labels.

To alleviate the empirical mismatch between labeled and unlabeled data, a successful design should encourage unlabeled data to learn comprehensive common semantics from labeled data while promoting distribution alignment. This paper achieves this with a simple yet very effective bidirectional copy-paste (BCP) method, instantiated in the Mean Teacher framework. Specifically, to train the student network, the input is augmented by copy-pasting random crops from labeled images (foreground) onto unlabeled images (background), and conversely by copy-pasting random crops from unlabeled images (foreground) onto labeled images (background). The student network is supervised by signals generated through bidirectional copy-paste between the pseudo-labels of unlabeled images, produced by the teacher network, and the label maps of labeled images. These two mixed images help the network learn common semantics between labeled and unlabeled data bidirectionally and symmetrically.


Method

Define a 3D medical image as X ∈ R^(W×H×L). The goal of semi-supervised medical image segmentation is to predict a per-voxel label map indicating the locations of background and target objects. The training set contains N labeled images and M unlabeled images, with M ≫ N.

In the Mean Teacher architecture of this paper, two unlabeled images and two labeled images are randomly selected. A random crop is then copy-pasted from a labeled image onto an unlabeled image to generate one mixed image, and from an unlabeled image onto a labeled image to generate another. In this way, unlabeled images can learn comprehensive common semantics from labeled images in both the inward and outward directions. The two mixed images are fed into the student network to predict their segmentation masks, which are supervised by bidirectionally copy-pasting the teacher network's predictions on the unlabeled images with the label maps of the labeled images.

Mean Teacher and training strategy

In the BCP framework of this paper, there is a teacher network and a student network. The student network is optimized by SGD, while the teacher network is an exponential moving average (EMA) of the student network. The training strategy consists of three steps: first, a model is pre-trained using labeled data; second, the pre-trained model is used as the teacher to generate pseudo-labels for unlabeled images; third, in each iteration, the student network parameters are first optimized by SGD, and the teacher network parameters are then updated as the EMA of the student parameters.
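The EMA update in the last step can be sketched as follows (a toy version over plain Python floats; a real implementation would update network tensors in place, and the decay value here is illustrative, not taken from the paper):

```python
def ema_update(teacher_params, student_params, decay=0.99):
    """Teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]
```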

Pre-training via copy-paste

In this paper, the labeled data is copy-pasted and augmented to train the supervised model. During the self-training process, the supervised model will generate pseudo-labels for the unlabeled data. This strategy has been proven to be effective in improving segmentation performance.

Bidirectional copy-paste

To perform copy-paste between a pair of images, a zero-centered mask is first generated, indicating whether each voxel comes from the foreground (0) or the background (1) image. The size of the zero-valued region determines how large the pasted crop is. The bidirectional copy-paste process can then be described as:
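As an illustration, here is a minimal NumPy sketch of the mask generation and mixing step (function and variable names are ours, not the paper's; the mask convention follows the text above, with 0 marking foreground voxels and 1 background voxels):

```python
import numpy as np

def zero_center_mask(shape, crop_size):
    """Binary mask: 0 inside a randomly placed crop (foreground source),
    1 elsewhere (background source)."""
    mask = np.ones(shape, dtype=np.float32)
    starts = [np.random.randint(0, s - c + 1) for s, c in zip(shape, crop_size)]
    region = tuple(slice(st, st + c) for st, c in zip(starts, crop_size))
    mask[region] = 0.0
    return mask

def bcp_mix(x_fg, x_bg, mask):
    """Paste the crop taken from the foreground image into the background image."""
    return x_fg * (1.0 - mask) + x_bg * mask
```

With two labeled and two unlabeled volumes and the same mask, `bcp_mix(x_labeled, x_unlabeled, mask)` would give one mixed image (labeled crop pasted onto an unlabeled image) and `bcp_mix(x_unlabeled2, x_labeled2, mask)` the other, mirroring the two directions described above.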

Two-way copy-paste supervision signal

To train the student network, supervision signals are also generated via the BCP operation. Unlabeled images are fed into the teacher network to compute probability maps:

Initial pseudo-labels are determined with the usual 0.5 threshold for binary segmentation tasks, or with an argmax operation for multi-class segmentation tasks. The final pseudo-label is obtained by selecting the largest connected component, which effectively removes outlier pixels. Bidirectionally copy-pasting the pseudo-labels of the unlabeled images with the ground-truth labels of the labeled images then yields the supervision signal.
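A sketch of this pseudo-label cleaning step for the binary case, using `scipy.ndimage` for the connected-component selection (names are ours; the paper's exact implementation may differ):

```python
import numpy as np
from scipy import ndimage

def clean_pseudo_label(prob, threshold=0.5):
    """Binarize a teacher probability map, then keep only the largest
    connected component to suppress outlier pixels."""
    binary = (prob > threshold).astype(np.uint8)
    labeled, num = ndimage.label(binary)   # label connected components
    if num == 0:
        return binary                      # nothing above threshold
    sizes = ndimage.sum(binary, labeled, range(1, num + 1))
    largest = int(np.argmax(sizes)) + 1    # component ids start at 1
    return (labeled == largest).astype(np.uint8)
```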

Loss function

Each input image to the student network contains components from both labeled and unlabeled images. Intuitively, the ground-truth masks of labeled images are more accurate than the pseudo-labels of unlabeled images, so a weight α is used to control the contribution of unlabeled-image pixels to the loss function:
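Our reconstruction of this weighting, under the assumption that pixels originating from unlabeled images are down-weighted by α: writing M_l for the binary mask of pixels that come from a labeled image, the per-image loss can be sketched as

```latex
\mathcal{L} \;=\; \mathcal{L}_{seg}(Q, Y) \odot \mathbf{M}_l \;+\; \alpha \, \mathcal{L}_{seg}(Q, Y) \odot (\mathbf{1} - \mathbf{M}_l)
```

where Q is the student prediction, Y the copy-pasted supervision signal, and ⊙ element-wise multiplication; the same form applies to both mixed images, and the total loss sums the two.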

Teacher network parameter update: the teacher parameters are the exponential moving average of the student parameters, Θ′_t = λΘ′_(t−1) + (1 − λ)Θ_t, where λ is the EMA decay.


LA dataset

The Atrial Segmentation Challenge [39] dataset consists of 100 labeled 3D gadolinium-enhanced MR image (GE-MRI) scans.

UA-MT, SASSNet, DTC, URPC, MC-Net, and SS-Net are chosen as comparison models, with experimental results reported at different labeled-data ratios. Table 1 shows the results: the proposed method achieves the highest performance on all four evaluation metrics, substantially exceeding the comparison models.


Pancreas-NIH dataset

This dataset contains 82 contrast-enhanced abdominal CT volumes with manual annotations. V-Net, DAN, ADVNET, UA-MT, SASSNet, DTC, and CoraNet are selected as comparison algorithms. Table 2 shows the results: BCP achieves significant improvements on the Dice, Jaccard, and 95HD metrics (exceeding the second-best method by 3.24%, 4.28%, and 1.16, respectively). For a fair comparison, these results were not subjected to any post-processing.


ACDC dataset

The ACDC dataset is a four-class segmentation dataset (background, right ventricle, left ventricle, and myocardium) containing scans from 100 patients. Table 3 shows the results: BCP surpasses the SOTA methods, and in the setting with a 5% labeled ratio it obtains a huge performance improvement of up to 21.76% on the Dice metric.

