YOLOv5 data augmentation: all of these! You can also add your own augmentation methods (cropping, translation, rotation, changing brightness, adding noise)

Table of Contents

1 Rectangular training (give all images in the same batch a common width and height to speed up training, minimize redundant black borders, and reduce the amount of computation.)

2 HSV transformation

3. Random rotation, translation, scaling, cropping, shear (non-perpendicular projection), and perspective transformation (built from scratch)

3-1 Rotation

3-2 Translation (src is the image on the left, dst is the transformed image on the right, x/y are the horizontal and vertical coordinates, M is the translation matrix; the x-axis translation is controlled mainly by M[0,2] and the y-axis translation mainly by M[1,2])

3-3 Shear / non-perpendicular projection

3-4 flip

3-5 Mosaic / four-image stitching

3-6 MixUp / image fusion (two images are simply superimposed and distinguished through different transparencies.)

3-7 Copy-Paste / segment and fill (after segmenting the image's targets, compute each target's bounding box and paste only those whose overlap with every existing box in the image is below 0.3 (a threshold used in the implementation))


1 Rectangular training

(Give all images in the same batch a common, aspect-ratio-preserving shape to speed up training, minimize redundant black borders, and reduce the amount of computation.)

code:

# File location: utils/datasets.py
# 6. Prepare for Rectangular Training: that is, when processing images of different sizes, minimize redundant black edges to reduce the amount of calculations.
        # The key point here is how the shapes are generated: with rectangular training, every image in a batch must share the same shape, so a shape that fits the whole batch has to be computed.
        # In addition, the dataset is sorted by aspect ratio, so images in the same batch have nearly the same shape and the cost of choosing a common shape stays small.
        if self.rect:
            #Shape of all training images
            s = self.shapes # wh
            # Calculate aspect ratio
            ar = s[:, 1] / s[:, 0] # aspect ratio
            irect = ar.argsort() # Sort according to aspect ratio
            self.img_files = [self.img_files[i] for i in irect] # Get sorted img_files
            self.label_files = [self.label_files[i] for i in irect] # Get the sorted label_files
            self.labels = [self.labels[i] for i in irect] # Get the sorted labels
            self.shapes = s[irect] # Get the sorted wh
            ar = ar[irect] # Get the sorted aspect ratios
 
            # Calculate the unified shape adopted by each batch (set the training image shapes)
            shapes = [[1, 1]] * nb
            for i in range(nb):
                # Extract images from the same batch
                ari = ar[bi == i]
                mini, maxi = ari.min(), ari.max() # Get the minimum and maximum aspect ratio in the i-th batch
                if maxi < 1:
                    # [H, W] If h/w < 1 for the whole batch (w > h, wide images), use (img_size * maxi, img_size) so the original aspect ratio is preserved when scaling
                    shapes[i] = [maxi, 1]
                elif mini > 1:
                    # [H, W] If h/w > 1 for the whole batch (w < h, tall images), use (img_size, img_size / mini) so the original aspect ratio is preserved when scaling
                    shapes[i] = [1, 1 / mini]
            # Calculate the shape fed to the network for each batch (rounded up to an integer multiple of 32)
            # The height and width of each batch_shape must be a multiple of the stride (32), so divide by 32, round up, then multiply by 32 again
            self.batch_shapes = np.ceil(np.array(shapes) * img_size / stride + pad).astype(int) * stride
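
To see what this produces, here is a minimal, self-contained sketch (not YOLOv5 code) of the same logic on four made-up image sizes, assuming img_size=640, stride=32, pad=0.5 and a batch size of 2:

import numpy as np

# Standalone toy example; the image sizes and hyperparameters below are made up
img_size, stride, pad, batch_size = 640, 32, 0.5, 2
s = np.array([[720, 480], [640, 640], [500, 375], [1280, 720]])  # wh of 4 images
bi = np.floor(np.arange(len(s)) / batch_size).astype(int)        # batch index of each image
nb = int(bi[-1]) + 1                                             # number of batches

ar = s[:, 1] / s[:, 0]                 # h / w
irect = ar.argsort()                   # sort by aspect ratio
s, ar = s[irect], ar[irect]

shapes = [[1, 1]] * nb
for i in range(nb):
    ari = ar[bi == i]
    mini, maxi = ari.min(), ari.max()
    if maxi < 1:                       # whole batch is wider than tall
        shapes[i] = [maxi, 1]
    elif mini > 1:                     # whole batch is taller than wide
        shapes[i] = [1, 1 / mini]

batch_shapes = np.ceil(np.array(shapes) * img_size / stride + pad).astype(int) * stride
print(batch_shapes)                    # per-batch [h, w], each a multiple of 32 (here [[448 672], [672 672]])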

2 HSV Transform

  • HSV-Hue augmentation (fraction), Hue

  • HSV-Saturation augmentation (fraction), saturation

  • HSV-Value augmentation (fraction), value (brightness)

  • code:

  • # Location of the calling code: utils/datasets.py
    # Augment colorspace: H hue, S saturation, V value (brightness)
    # Randomly shift H, S and V to augment the data
    augment_hsv(img, hgain=hyp['hsv_h'], sgain=hyp['hsv_s'], vgain=hyp['hsv_v'])
     
    #The location of the called function: utils/augmentations.py
    def augment_hsv(im, hgain=0.5, sgain=0.5, vgain=0.5):
        # HSV color-space augmentation
        if hgain or sgain or vgain:
            r = np.random.uniform(-1, 1, 3) * [hgain, sgain, vgain] + 1 # random gains
            hue, sat, val = cv2.split(cv2.cvtColor(im, cv2.COLOR_BGR2HSV))
            dtype = im.dtype # uint8
     
            x = np.arange(0, 256, dtype=r.dtype)
            lut_hue = ((x * r[0]) % 180).astype(dtype)
            lut_sat = np.clip(x * r[1], 0, 255).astype(dtype)
            lut_val = np.clip(x * r[2], 0, 255).astype(dtype)
     
            im_hsv = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val)))
            cv2.cvtColor(im_hsv, cv2.COLOR_HSV2BGR, dst=im) # no return needed
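
  • A minimal usage sketch of the function above; the gain values are the ones commonly found in YOLOv5's hyp.scratch yaml (hsv_h=0.015, hsv_s=0.7, hsv_v=0.4), but treat both them and the file path as placeholders:

    import cv2

    img = cv2.imread('example.jpg')                       # any BGR image (placeholder path)
    augment_hsv(img, hgain=0.015, sgain=0.7, vgain=0.4)   # modifies img in place
    cv2.imwrite('example_hsv.jpg', img)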

    3. Random rotation, translation, scaling, cropping, shear (non-perpendicular projection), and perspective transformation (built from scratch)

  • 3-1 Rotation (

  • src is the image on the left, dst is the image after rotation and scaling on the right, x/y are the horizontal and vertical coordinates, and M is the rotation-and-scaling matrix.

  • The rotation is controlled mainly by M[0,1] and M[1,0], which are negatives of each other.

  • The scaling is controlled mainly by M[0,0] and M[1,1].

  • )
  • 3-2 Translation (

  • src is the image on the left, dst is the translated image on the right, x/y are the horizontal and vertical coordinates, and M is the translation matrix.
  • The x-axis translation is controlled mainly by M[0,2].
  • The y-axis translation is controlled mainly by M[1,2].
  • )
  • 3-3 Shear / non-perpendicular projection (

  • Shear is the deformation obtained by fixing one side of the image and pushing the opposite, parallel side sideways.

  • src is the image on the left, dst is the sheared image on the right, x/y are the horizontal and vertical coordinates, and M is the shear matrix.

  • The shear is controlled mainly by M[0,1] and M[1,0].

  • )
  • Perspective transformation (

  • src is the image on the left, dst is the image after perspective transformation on the right, x/y are the horizontal and vertical coordinates, and M is the transformation matrix.

  • The perspective effect is controlled mainly by M[2,0] and M[2,1]; a small worked example follows this list.

  • )
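
  • To make the role of each matrix entry concrete, here is a small standalone sketch (not YOLOv5 code) that applies a pure translation, a pure shear and a pure perspective matrix to one point in homogeneous coordinates:

    import numpy as np

    p = np.array([100.0, 50.0, 1.0])            # point (x=100, y=50) in homogeneous coordinates

    T = np.eye(3); T[0, 2], T[1, 2] = 30, -20   # translation: x += 30, y -= 20
    S = np.eye(3); S[0, 1] = 0.2                # shear: new x also depends on y
    P = np.eye(3); P[2, 0] = 0.001              # perspective: third coordinate depends on x

    print(T @ p)                                # [130.  30.   1.]  -> translated point
    print(S @ p)                                # [110.  50.   1.]  -> x shifted by 0.2 * y
    q = P @ p
    print(q[:2] / q[2])                         # perspective needs division by the third coordinate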

  • # Calling function address: utils/datasets.py
    #Augment
            # random_perspective Augment Random perspective transformation [1280, 1280, 3] => [640, 640, 3]
            # Perform random rotation, translation, scaling, cropping, perspective transformation on the mosaic integrated image, and resize it to the input size img_size
    img4, labels4 = random_perspective(img4, labels4, segments4,
                                               degrees=self.hyp['degrees'], # Rotation
                                               translate=self.hyp['translate'], # Translate
                                               scale=self.hyp['scale'], # Scale
                                               shear=self.hyp['shear'], # Shear (non-perpendicular projection)
                                               perspective=self.hyp['perspective'], # Perspective transformation
                                               border=self.mosaic_border) # border to remove
     
    # Called function address: utils/augmentations.py
    def random_perspective(im, targets=(), segments=(), degrees=10, translate=.1, scale=.1, shear=10, perspective=0.0,
                           border=(0, 0)):
        # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(0.1, 0.1), scale=(0.9, 1.1), shear=(-10, 10))
        # targets = [cls, xyxy]
     
        height = im.shape[0] + border[0] * 2 # shape(h,w,c)
        width = im.shape[1] + border[1] * 2
     
        #Center
        C = np.eye(3)
        C[0, 2] = -im.shape[1] / 2 # x translation (pixels)
        C[1, 2] = -im.shape[0] / 2 # y translation (pixels)
     
        # Perspective transformation
        P = np.eye(3)
        P[2, 0] = random.uniform(-perspective, perspective) # x perspective (about y)
        P[2, 1] = random.uniform(-perspective, perspective) # y perspective (about x)
     
        # Rotation and Scale
        R = np.eye(3)
        a = random.uniform(-degrees, degrees)
        # a += random.choice([-180, -90, 0, 90]) # add 90deg rotations to small rotations
        s = random.uniform(1 - scale, 1 + scale)
        # s = 2 ** random.uniform(-scale, scale)
        R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)
     
        # Shear (non-perpendicular projection)
        S = np.eye(3)
        S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180) # x shear (deg)
        S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180) # y shear (deg)
     
        # Translation
        T = np.eye(3)
        T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width # x translation (pixels)
        T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height # y translation (pixels)
     
        # Combined rotation matrix
        # Multiply all transformation matrices to get the final transformation matrix
        M = T @ S @ R @ P @ C # order of operations (right to left) is IMPORTANT
        if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any(): # image changed
            if perspective:
                im = cv2.warpPerspective(im, M, dsize=(width, height), borderValue=(114, 114, 114))
            else: #affine
                im = cv2.warpAffine(im, M[:2], dsize=(width, height), borderValue=(114, 114, 114))
     
        # Visualize
        # import matplotlib.pyplot as plt
        # ax = plt.subplots(1, 2, figsize=(12, 6))[1].ravel()
        # ax[0].imshow(im[:, :, ::-1]) # base
        # ax[1].imshow(im2[:, :, ::-1]) # warped
     
        # Transform label coordinates
        n = len(targets)
        if n:
            use_segments = any(x.any() for x in segments)
            new = np.zeros((n, 4))
            if use_segments: # warp segments
                segments = resample_segments(segments) # upsample
                # each segment has shape [num_points, 2]: the coordinate points of one object's outline
                for i, segment in enumerate(segments):
                    xy = np.ones((len(segment), 3))
                    xy[:, :2] = segment
                    xy = xy @ M.T # apply the combined transform matrix
                    xy = xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2] # perspective rescale or affine
     
                    # clip
                    new[i] = segment2box(xy, width, height)
     
            else: # warp boxes: each row of targets is [cls, x1, y1, x2, y2]; n is the number of rows (number of target boxes)
                xy = np.ones((n * 4, 3))
                xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2) # x1y1, x2y2, x1y2, x2y1
                xy = xy @ M.T # apply the combined transform matrix
                # If perspective != 0, divide by the homogeneous (third) coordinate to rescale; a pure affine transform does not need this
                xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8) # perspective rescale or affine
     
                # create new boxes
                x = xy[:, [0, 2, 4, 6]]
                y = xy[:, [1, 3, 5, 7]]
                new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
     
                # clip clip the coordinates to the interval [0, width], [0, height]
                new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
                new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)
     
            # filter candidates: keep only boxes that are still at least 2 px wide and high, have aspect ratio < 20, and retain enough of their area after the transform (area_thr = 0.01 for segments, 0.10 for boxes); see the box_candidates sketch after this function
            i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)
            targets = targets[i]
            targets[:, 1:5] = new[i]
     
        return im, targets
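
    The last filtering step calls box_candidates; a sketch of that helper, paraphrased from utils/augmentations.py (threshold defaults can differ between YOLOv5 versions):

    def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1, eps=1e-16):
        # box1: boxes before the transform, shape (4, n); box2: boxes after the transform, shape (4, n)
        # Keep a box only if it is still at least wh_thr pixels wide and high,
        # keeps more than area_thr of its original area, and has an aspect ratio below ar_thr.
        w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
        w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
        ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))  # aspect ratio
        return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr)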

    3-4 Flip
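
  • The flip step itself is only a few lines in __getitem__ of utils/datasets.py, applied after the labels have been converted to normalized xywh; a sketch from memory (verify the hyperparameter names hyp['flipud'] / hyp['fliplr'] and the variable names against your version):

    # Flip up-down with probability hyp['flipud']
    if random.random() < hyp['flipud']:
        img = np.flipud(img)
        if nl:                               # nl = number of labels
            labels[:, 2] = 1 - labels[:, 2]  # mirror the normalized y-center

    # Flip left-right with probability hyp['fliplr']
    if random.random() < hyp['fliplr']:
        img = np.fliplr(img)
        if nl:
            labels[:, 1] = 1 - labels[:, 1]  # mirror the normalized x-center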

  • 3-5 Mosaic / four-image stitching (

  • Initialize the entire background image, the size is (2 × image_size, 2 × image_size, 3)

  • Randomly pick a center point

  • Based on the center point, place the 4 images at the top-left, top-right, bottom-left, and bottom-right. Because the distance from the center point to the canvas border may be smaller than an image's width or height,

  • the images may need to be cropped during stitching, and the offsets of the label boxes recomputed accordingly.

  • )

  • code:
  •  # Code location: utils/datasets.py
        def load_mosaic(self, index):
            """Use the __getitem__ function in the LoadImagesAndLabels module for mosaic data enhancement
                Stitch four images into a mosaic image loads images in a 4-mosaic
                :param index: the image index to be obtained
                :return: img4: An image after mosaic and random perspective transformation numpy(640, 640, 3)
                         labels4: target corresponding to img4 [M, cls + x1y1x2y2]
                """
            # labels4: used to store label information for spliced images (4 pictures into one) (excluding segments polygons)
            # segments4: used to store label information (including segments polygons) of spliced images (4 pictures into one)
            labels4, segments4 = [], []
            s = self.img_size # base training image size
            # Randomly initialize the mosaic center: with mosaic_border = [-s//2, -s//2], yc and xc are drawn uniformly from [s/2, 3s/2]
            yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border) # mosaic center x, y
            # Randomly pick three additional image indices from the dataset to stitch with (e.g. giving indices [14, 26, 2, 16])
            indices = [index] + random.choices(self.indices, k=3) # 3 additional image indices
            random.shuffle(indices)
            # Traverse four images for splicing 4 images of different sizes => 1 image of [1472, 1472, 3]
            for i, index in enumerate(indices):
                # load_image loads one image at a time and resizes it so that its longer side equals self.img_size; returns the image and its resized (h, w)
                img, _, (h, w) = self.load_image(index)
     
                # place img in img4
                if i == 0: # top left original image [375, 500, 3] load_image->[552, 736, 3] hwc
                    # Create mosaic image [1472, 1472, 3]=[h, w, c]
                    img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8) # base image with 4 tiles
                    # Calculate the coordinate information in the mosaic image (fill the image into the mosaic image) w=736 h = 552 Mosaic image: (x1a, y1a) upper left corner (x2a, y2a) lower right corner
                    x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc # xmin, ymin, xmax, ymax (large image)
                    # Calculate the intercepted image area information (use xc, yc as the lower right corner coordinates of the first image and fill it into the mosaic image, discard the out-of-boundary area) Image: (x1b, y1b) upper left corner (x2b, y2b) lower right corner
                    x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h # xmin, ymin, xmax, ymax (small image)
                elif i == 1: # top right
                    # Calculate the coordinate information in the mosaic image (fill the image into the mosaic image)
                    x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
                    # Calculate the intercepted image area information (fill xc, yc as the coordinates of the lower left corner of the second image into the mosaic image, and discard the out-of-bounds area)
                    x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
                elif i == 2: # bottom left
                    # Calculate the coordinate information in the mosaic image (fill the image into the mosaic image)
                    x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
                    # Calculate the intercepted image area information (fill xc, yc as the coordinates of the upper right corner of the third image into the mosaic image, and discard the out-of-bounds area)
                    x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
                elif i == 3: # bottom right
                    # Calculate the coordinate information in the mosaic image (fill the image into the mosaic image)
                    x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
                    # Calculate the intercepted image area information (fill xc, yc as the coordinates of the upper left corner of the fourth image into the mosaic image, and discard the out-of-bounds area)
                    x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
     
                # Fill the captured image area into the corresponding position of the mosaic image img4[h, w, c]
                # Cut out the [(x1b, y1b) upper left corner (x2b, y2b) lower right corner] area of the image img and fill it into the [(x1a, y1a) upper left corner (x2a, y2a) lower right corner] area of the mosaic image
                img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b] # img4[ymin:ymax, xmin:xmax]
                # Calculate pad (the distance between the current image boundary and the mosaic boundary, padw/padh is a negative value if it crosses the boundary) for subsequent label mapping
                padw = x1a - x1b # The difference between the current image and the mosaic image in w dimension
                padh = y1a - y1b # The difference between the current image and the mosaic image in h dimension
     
                # labels: all rectangular box labels of this image (if segments exist, the polygons have already been converted to rectangular labels)
                # segments: all polygon (segment) labels of this image
                # Update coordinate values in the new image
                labels, segments = self.labels[index].copy(), self.segments[index].copy()
                if labels.size:
                    labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh) # normalized xywh to pixel xyxy format
                    segments = [xyn2xy(x, w, h, padw, padh) for x in segments]
                labels4.append(labels) # Update labels4
                segments4.extend(segments) # Update segments4
     
            # Concat/clip labels4: concatenate labels4 into one array (e.g. [(2, 5), (1, 5), (3, 5), (1, 5)] => (7, 5))
            labels4 = np.concatenate(labels4, 0)
            # To prevent out-of-bounds coordinates, every position value in labels4[:, 1:] must lie in [0, 2*s]: values below 0 are clipped to 0 and values above 2*s are clipped to 2*s (written in place via out=)
            for x in (labels4[:, 1:], *segments4):
                np.clip(x, 0, 2 * s, out=x) # clip when using random_perspective()
            # img4, labels4 = replicate(img4, labels4) # replicate
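            # (The listing above is truncated; in the full load_mosaic the function then augments
            #  img4/labels4 with copy_paste and random_perspective, as shown elsewhere in this post,
            #  and returns them, roughly:)
            img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
            img4, labels4 = random_perspective(img4, labels4, segments4,
                                               degrees=self.hyp['degrees'],
                                               translate=self.hyp['translate'],
                                               scale=self.hyp['scale'],
                                               shear=self.hyp['shear'],
                                               perspective=self.hyp['perspective'],
                                               border=self.mosaic_border)  # border to remove
            return img4, labels4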

    3-6 MixUp / image fusion (simply superimposes two images and distinguishes them through different transparencies.)

  • code:

  • # Calling function address: utils/datasets.py
    if random.random() < hyp['mixup']: # hyp['mixup'] is the probability of applying mixup: 0 disables it, 1 applies it to every image
        # *self.load_mosaic(random.randint(0, self.n - 1)) builds a second mosaic from a randomly chosen image, which is mixed with the current image for mixup augmentation
        # img: Image after fusion of two images numpy (640, 640, 3)
        # labels: label after fusion of two pictures [M + N, cls + x1y1x2y2]
        img, labels = mixup(img, labels, *self.load_mosaic(random.randint(0, self.n - 1)))
     
    # Called function address: utils/augmentations.py
    def mixup(im, labels, im2, labels2):
        # Applies MixUp augmentation https://arxiv.org/pdf/1710.09412.pdf
        r = np.random.beta(32.0, 32.0) # mixup ratio, alpha=beta=32.0
        im = (im * r + im2 * (1 - r)).astype(np.uint8)
        labels = np.concatenate((labels, labels2), 0)
        return im, labels
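
    With alpha = beta = 32 the Beta distribution is sharply concentrated around 0.5, so the two images are blended almost 50/50; a quick illustrative check (not part of YOLOv5):

    import numpy as np

    r = np.random.beta(32.0, 32.0, size=10000)
    print(round(r.mean(), 2), round(r.std(), 2))   # ~0.5 and ~0.06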

    3-7 Copy-Paste / segment and fill (after segmenting the image's targets, compute each target's bounding box and paste only those mirrored copies whose overlap with every existing box in the image is below 0.3, a threshold used in the implementation)

  • code:

  • # Calling function address: utils/datasets.py
    img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
     
    # Called function address: utils/augmentations.py
    def copy_paste(im, labels, segments, p=0.5):
        # Implement Copy-Paste augmentation https://arxiv.org/abs/2012.07177, labels as nx5 np.array(cls, xyxy)
        n = len(segments)
        if p and n:
            h, w, c = im.shape # height, width, channels
            im_new = np.zeros(im.shape, np.uint8)
            for j in random.sample(range(n), k=round(p * n)):
                l, s = labels[j], segments[j]
            box = w - l[3], l[2], w - l[1], l[4]  # box of the left-right mirrored copy: x1/x2 reflected about the image width
                ioa = bbox_ioa(box, labels[:, 1:5]) # intersection over area
                if (ioa < 0.30).all(): # allow 30% obscuration of existing labels
                    labels = np.concatenate((labels, [[l[0], *box]]), 0)
                    segments.append(np.concatenate((w - s[:, 0:1], s[:, 1:2]), 1))
                    cv2.drawContours(im_new, [segments[j].astype(np.int32)], -1, (255, 255, 255), cv2.FILLED)
     
            result = cv2.bitwise_and(src1=im, src2=im_new)
            result = cv2.flip(result, 1) # augment segments (flip left-right)
            i = result > 0 # pixels to replace
            # i[:, :] = result.max(2).reshape(h, w, 1) # act over ch
            im[i] = result[i] # cv2.imwrite('debug.jpg', im) # debug
     
        return im, labels, segments
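
    copy_paste decides whether a mirrored object may be pasted by checking how much it would cover each existing label; bbox_ioa comes from utils/metrics.py and looks roughly like this (paraphrased sketch):

    def bbox_ioa(box1, box2, eps=1e-7):
        # Intersection of box1 with each box in box2, divided by the area of box2
        # box1: shape (4,), box2: shape (n, 4); both in x1y1x2y2 format; returns shape (n,)
        b1_x1, b1_y1, b1_x2, b1_y2 = box1
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.T

        # Intersection area
        inter_area = (np.minimum(b1_x2, b2_x2) - np.maximum(b1_x1, b2_x1)).clip(0) * \
                     (np.minimum(b1_y2, b2_y2) - np.maximum(b1_y1, b2_y1)).clip(0)

        # box2 area
        box2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1) + eps

        return inter_area / box2_area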
  • DIY data augmentation (cropping, translation, rotation, changing brightness, adding noise)
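
  • Independent of the reference below, a minimal sketch of two label-preserving DIY augmentations (brightness jitter and Gaussian noise). Neither moves pixels, so the bounding boxes need no update; crops, translations and rotations would also require remapping the boxes. Function names and the image path are illustrative only:

  • code:
  • import cv2
    import numpy as np
     
    def random_brightness(img, lo=0.6, hi=1.4):
        # Multiply pixel values by a random factor; labels are unchanged
        factor = np.random.uniform(lo, hi)
        return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)
     
    def add_gaussian_noise(img, sigma=10):
        # Add zero-mean Gaussian noise with standard deviation sigma; labels are unchanged
        noise = np.random.normal(0, sigma, img.shape)
        return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
     
    img = cv2.imread('example.jpg')   # placeholder path
    img = add_gaussian_noise(random_brightness(img))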

  • Reference (please contact for removal in case of any infringement):

  • How to implement data augmentation for a YOLO dataset (cropping, translation, rotation, changing brightness, adding noise, etc.), yolov5 data augmentation, Luren Jia’ω’ (Kaogong Middle School)’s blog on CSDN

  • After augmentation: