A must-have for boosting YOLOv5 accuracy! Improved loss functions: EIoU, SIoU, AlphaIoU, FocalEIoU, Wise-IoU

Table of Contents

1. The role of improving the loss function

2. Specific implementation


1. The role of improving the loss function

The YOLOv5 loss function measures the difference between the predicted box and the ground-truth box, and the model parameters are updated based on that difference. It teaches the model to detect and locate target objects accurately, which improves detection accuracy.

The loss function in YOLOv5 has three parts: object classification loss, bounding box coordinate loss and object confidence loss.

  1. Object classification loss: This loss measures the difference between the object class in the predicted box and the object class in the ground-truth box. It uses a cross-entropy loss (binary cross-entropy with logits in the YOLOv5 code) to compute the classification error, so that the model learns to classify each target object correctly.

  2. Bounding box coordinate loss: This loss measures the difference between the position of the predicted box and the position of the ground-truth box. A squared-error loss or an IoU (intersection over union) loss is generally used to measure the position offset of the bounding box, so that the model can locate the target object accurately.

  3. Object confidence loss: This loss measures the difference between the object confidence of the predicted box and that of the ground-truth box. Object confidence indicates whether a target object exists in the predicted box, which is a key indicator in detection algorithms. By optimizing this loss, the model learns to decide accurately whether the predicted box contains a target object.

The YOLOv5 loss therefore combines three factors: object classification, bounding box position and object confidence, which together are the key elements of object detection. By minimizing the loss, the model keeps optimizing its parameters and improves the accuracy and robustness of detection.
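As a rough illustration of how the three terms combine, the sketch below sums them with one gain per term. The function name `total_loss`, the toy inputs and the gain values are assumptions made for this example; in YOLOv5 the gains are read from the hyperparameter file, and the real `ComputeLoss` class does considerably more (target assignment, per-layer balancing).

```python
import torch
import torch.nn as nn

def total_loss(iou, obj_logits, obj_targets, cls_logits, cls_targets,
               box_gain=0.05, obj_gain=1.0, cls_gain=0.5):
    """Weighted sum of box, objectness and classification losses (illustrative only)."""
    bce = nn.BCEWithLogitsLoss()
    lbox = (1.0 - iou).mean()            # box loss: 1 - IoU, averaged over matched boxes
    lobj = bce(obj_logits, obj_targets)  # objectness (confidence) loss
    lcls = bce(cls_logits, cls_targets)  # classification loss
    return box_gain * lbox + obj_gain * lobj + cls_gain * lcls

# Toy inputs (hypothetical): 2 matched predictions, 3 classes
iou = torch.tensor([0.8, 0.6])
obj_logits = torch.tensor([2.0, -1.0])
obj_targets = torch.tensor([1.0, 0.0])
cls_logits = torch.zeros(2, 3)
cls_targets = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

loss = total_loss(iou, obj_logits, obj_targets, cls_logits, cls_targets)
print(loss)  # a scalar tensor
```

Changing the IoU variant, as described in the rest of this article, only affects the `lbox` term.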

2. Specific implementation

The default bounding box loss of YOLOv5 is CIoU. GIoU and DIoU are also built in.

File path: utils/metrics.py

The function name is: bbox_iou

Original loss function definition:

def bbox_iou(box1, box2, xywh=True, GIoU=False, DIoU=False, CIoU=False, eps=1e-7):
    # Returns Intersection over Union (IoU) of box1(1,4) to box2(n,4)

    # Get the coordinates of bounding boxes
    if xywh: # transform from xywh to xyxy
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)
        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2
        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_
    else: # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)
        w1, h1 = b1_x2 - b1_x1, (b1_y2 - b1_y1).clamp(eps)
        w2, h2 = b2_x2 - b2_x1, (b2_y2 - b2_y1).clamp(eps)

    # Intersection area
    inter = (b1_x2.minimum(b2_x2) - b1_x1.maximum(b2_x1)).clamp(0) * \
            (b1_y2.minimum(b2_y2) - b1_y1.maximum(b2_y1)).clamp(0)

    # Union area
    union = w1 * h1 + w2 * h2 - inter + eps

    # IoU
    iou = inter / union
    if CIoU or DIoU or GIoU:
        cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1) # convex (smallest enclosing box) width
        ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1) # convex height
        if CIoU or DIoU: # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
            c2 = cw ** 2 + ch ** 2 + eps # convex diagonal squared
            rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4 # center dist ** 2
            if CIoU: # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
                v = (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)
                with torch.no_grad():
                    alpha = v / (v - iou + (1 + eps))
                return iou - (rho2 / c2 + v * alpha) # CIoU
            return iou - rho2/c2 # DIoU
        c_area = cw * ch + eps # convex area
        return iou - (c_area - union) / c_area # GIoU https://arxiv.org/pdf/1902.09630.pdf
    return iou #IoU
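To make the penalty terms concrete, here is a small worked example that computes IoU, GIoU and DIoU for a single pair of boxes. It is a simplified sketch that follows the same formulas as the function above; the helper name `iou_variants` and the two example boxes are assumptions made for illustration and are not part of YOLOv5.

```python
import torch

def iou_variants(b1, b2, eps=1e-7):
    """Return (IoU, GIoU, DIoU) for two boxes given as (x1, y1, x2, y2)."""
    x11, y11, x12, y12 = b1
    x21, y21, x22, y22 = b2
    # Intersection and union areas
    inter = (torch.min(x12, x22) - torch.max(x11, x21)).clamp(0) * \
            (torch.min(y12, y22) - torch.max(y11, y21)).clamp(0)
    union = (x12 - x11) * (y12 - y11) + (x22 - x21) * (y22 - y21) - inter + eps
    iou = inter / union
    # Smallest enclosing (convex) box
    cw = torch.max(x12, x22) - torch.min(x11, x21)
    ch = torch.max(y12, y22) - torch.min(y11, y21)
    # GIoU: penalize the part of the enclosing box not covered by the union
    c_area = cw * ch + eps
    giou = iou - (c_area - union) / c_area
    # DIoU: penalize squared center distance over squared enclosing diagonal
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((x21 + x22 - x11 - x12) ** 2 + (y21 + y22 - y11 - y12) ** 2) / 4
    diou = iou - rho2 / c2
    return iou, giou, diou

box1 = torch.tensor([0.0, 0.0, 2.0, 2.0])
box2 = torch.tensor([1.0, 1.0, 3.0, 3.0])
iou, giou, diou = iou_variants(box1, box2)
print(iou, giou, diou)  # approximately 0.143, -0.079, 0.032
```

For these boxes the intersection is 1 and the union is 7, so IoU is 1/7. The enclosing box has area 9, which gives the GIoU penalty 2/9, and the center distance squared (2) over the enclosing diagonal squared (18) gives the DIoU penalty 1/9. Both boxes have the same aspect ratio, so the extra CIoU term would be zero here.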

Replace the function above with the following:

def bbox_iou(box1, box2, xywh=True, GIoU=False, DIoU=False, CIoU=False, SIoU=False, EIoU=False, Focal=False, alpha=1, gamma=0.5, eps=1e-7):
    # Returns Intersection over Union (IoU) of box1(1,4) to box2(n,4)

    # Get the coordinates of bounding boxes
    if xywh: # transform from xywh to xyxy
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)
        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2
        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_
    else: # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)
        w1, h1 = b1_x2 - b1_x1, (b1_y2 - b1_y1).clamp(eps)
        w2, h2 = b2_x2 - b2_x1, (b2_y2 - b2_y1).clamp(eps)

    # Intersection area
    inter = (b1_x2.minimum(b2_x2) - b1_x1.maximum(b2_x1)).clamp(0) * \
            (b1_y2.minimum(b2_y2) - b1_y1.maximum(b2_y1)).clamp(0)

    # Union area
    union = w1 * h1 + w2 * h2 - inter + eps

    # IoU
    # iou = inter / union  # original IoU
    iou = torch.pow(inter/(union + eps), alpha) # alpha iou
    if CIoU or DIoU or GIoU or EIoU or SIoU:
        cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1) # convex (smallest enclosing box) width
        ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1) # convex height
        if CIoU or DIoU or EIoU or SIoU: # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
            c2 = (cw ** 2 + ch ** 2) ** alpha + eps # convex diagonal squared
            rho2 = (((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4) ** alpha # center dist ** 2
            if CIoU: # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
                v = (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)
                with torch.no_grad():
                    alpha_ciou = v / (v - iou + (1 + eps))
                if Focal:
                    return iou - (rho2 / c2 + torch.pow(v * alpha_ciou + eps, alpha)), torch.pow(inter/(union + eps), gamma) # Focal_CIoU
                else:
                    return iou - (rho2 / c2 + torch.pow(v * alpha_ciou + eps, alpha)) # CIoU
            elif EIoU:
                rho_w2 = ((b2_x2 - b2_x1) - (b1_x2 - b1_x1)) ** 2
                rho_h2 = ((b2_y2 - b2_y1) - (b1_y2 - b1_y1)) ** 2
                cw2 = torch.pow(cw ** 2 + eps, alpha)
                ch2 = torch.pow(ch ** 2 + eps, alpha)
                if Focal:
                    return iou - (rho2 / c2 + rho_w2 / cw2 + rho_h2 / ch2), torch.pow(inter/(union + eps), gamma) # Focal_EIou
                else:
                    return iou - (rho2 / c2 + rho_w2 / cw2 + rho_h2 / ch2) # EIou
            elif SIoU:
                # SIoU Loss https://arxiv.org/pdf/2205.12740.pdf
                s_cw = (b2_x1 + b2_x2 - b1_x1 - b1_x2) * 0.5 + eps
                s_ch = (b2_y1 + b2_y2 - b1_y1 - b1_y2) * 0.5 + eps
                sigma = torch.pow(s_cw ** 2 + s_ch ** 2, 0.5)
                sin_alpha_1 = torch.abs(s_cw) / sigma
                sin_alpha_2 = torch.abs(s_ch) / sigma
                threshold = pow(2, 0.5) / 2
                sin_alpha = torch.where(sin_alpha_1 > threshold, sin_alpha_2, sin_alpha_1)
                angle_cost = torch.cos(torch.arcsin(sin_alpha) * 2 - math.pi / 2)
                rho_x = (s_cw / cw) ** 2
                rho_y = (s_ch / ch) ** 2
                gamma_s = angle_cost - 2  # separate name so the Focal `gamma` argument is not overwritten
                distance_cost = 2 - torch.exp(gamma_s * rho_x) - torch.exp(gamma_s * rho_y)
                omiga_w = torch.abs(w1 - w2) / torch.max(w1, w2)
                omiga_h = torch.abs(h1 - h2) / torch.max(h1, h2)
                shape_cost = torch.pow(1 - torch.exp(-1 * omiga_w), 4) + torch.pow(1 - torch.exp(-1 * omiga_h), 4)
                if Focal:
                    return iou - torch.pow(0.5 * (distance_cost + shape_cost) + eps, alpha), torch.pow(inter/(union + eps), gamma) # Focal_SIou
                else:
                    return iou - torch.pow(0.5 * (distance_cost + shape_cost) + eps, alpha) # SIou
            if Focal:
                return iou - rho2 / c2, torch.pow(inter/(union + eps), gamma) # Focal_DIoU
            else:
                return iou - rho2/c2 # DIoU
        c_area = cw * ch + eps # convex area
        if Focal:
            return iou - torch.pow((c_area - union) / c_area + eps, alpha), torch.pow(inter/(union + eps), gamma) # Focal_GIoU https://arxiv.org/pdf/1902.09630.pdf
        else:
            return iou - torch.pow((c_area - union) / c_area + eps, alpha) # GIoU https://arxiv.org/pdf/1902.09630.pdf
    if Focal:
        return iou, torch.pow(inter/(union + eps), gamma) # Focal_IoU
    else:
        return iou #IoU
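The EIoU branch and its Focal weight can be checked in isolation with a compact sketch. The helper `eiou_xyxy` below is a hypothetical, simplified version (boxes in x1, y1, x2, y2 format, alpha fixed at 1) written for this example; it is not a drop-in replacement for `bbox_iou`.

```python
import torch

def eiou_xyxy(b1, b2, gamma=0.5, eps=1e-7):
    """Return (EIoU, focal weight IoU**gamma) for boxes given as (x1, y1, x2, y2)."""
    x11, y11, x12, y12 = b1.unbind(-1)
    x21, y21, x22, y22 = b2.unbind(-1)
    w1, h1, w2, h2 = x12 - x11, y12 - y11, x22 - x21, y22 - y21
    inter = (torch.min(x12, x22) - torch.max(x11, x21)).clamp(0) * \
            (torch.min(y12, y22) - torch.max(y11, y21)).clamp(0)
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    cw = torch.max(x12, x22) - torch.min(x11, x21)  # enclosing box width
    ch = torch.max(y12, y22) - torch.min(y11, y21)  # enclosing box height
    rho2 = ((x21 + x22 - x11 - x12) ** 2 + (y21 + y22 - y11 - y12) ** 2) / 4
    # EIoU penalty: center distance + width difference + height difference
    penalty = rho2 / (cw ** 2 + ch ** 2 + eps) \
        + (w2 - w1) ** 2 / (cw ** 2 + eps) \
        + (h2 - h1) ** 2 / (ch ** 2 + eps)
    return iou - penalty, iou.pow(gamma)

pred = torch.tensor([[0.0, 0.0, 2.0, 2.0]])
target = torch.tensor([[0.0, 0.0, 4.0, 2.0]])
eiou, weight = eiou_xyxy(pred, target)
print(eiou, weight)  # approximately 0.2 and 0.707
```

Here IoU is 0.5, the center-distance term is 1/20, the width term is 4/16 and the height term is 0, so EIoU is 0.5 - 0.3 = 0.2. The second return value, IoU to the power gamma, is the weight that the Focal variants multiply into the box loss.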

Introduction to Alpha-IoU:

As the title of the paper suggests, the authors generalize the existing IoU-based losses into a new family of power IoU losses. Each loss has a power IoU term and an additional power regularization term, both controlled by a single power parameter α. They call this family the α-IoU losses.

Function characteristics:

Experiments on multiple object detection benchmarks and models show that the α-IoU loss:

  1. Can significantly exceed existing IoU-based losses;

  2. Gives the detector greater flexibility in achieving different levels of bbox regression accuracy by adjusting α;

  3. Is more robust to small datasets and noise.

Experimental results show that α > 1 increases the loss and gradient of high-IoU targets, thereby improving bbox regression accuracy.

The power parameter α can be treated as a hyperparameter that tunes the α-IoU loss for different levels of bbox regression accuracy: α > 1 achieves high regression accuracy (i.e., at high IoU thresholds) by paying more attention to high-IoU targets.

**α is not overly sensitive to different models or datasets, and α=3 performs consistently well in most cases.** The α-IoU loss family can be easily used to improve detector performance in clean or noisy environments without introducing additional parameters or increasing training/inference time.

The basic form of the formula, which is what the code above computes before any penalty term is subtracted, is:

$$L_{\alpha\text{-IoU}} = 1 - \mathrm{IoU}^{\alpha}$$

Setting alpha to 1 therefore recovers the original IoU loss, as if the alpha term had not been added. To use α-IoU, alpha is generally set to 3.

Then enable whichever IoU variant you need, so that the combination becomes Alpha-CIoU, Alpha-DIoU, and so on.
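The claim that α > 1 puts more weight on high-IoU boxes can be checked numerically. The sketch below uses only the basic power term, 1 - IoU^α, and compares gradient magnitudes for a low-IoU and a high-IoU box; the function name `alpha_iou_loss` and the two IoU values are assumptions chosen for illustration.

```python
import torch

def alpha_iou_loss(iou, alpha=3.0):
    """Basic alpha-IoU loss: 1 - IoU**alpha (alpha=1 gives the plain IoU loss)."""
    return 1.0 - iou.pow(alpha)

iou = torch.tensor([0.3, 0.9], requires_grad=True)

# Gradient magnitude with alpha = 3: |d loss / d IoU| = alpha * IoU**(alpha - 1)
alpha_iou_loss(iou, alpha=3.0).sum().backward()
grad_alpha3 = iou.grad.abs().clone()

# Gradient magnitude with alpha = 1 (plain IoU loss): always 1
iou.grad.zero_()
alpha_iou_loss(iou, alpha=1.0).sum().backward()
grad_alpha1 = iou.grad.abs().clone()

print(grad_alpha3)  # approximately [0.27, 2.43]
print(grad_alpha1)  # [1.0, 1.0]
```

With α = 3 the box at IoU 0.9 receives a gradient about nine times larger than the box at IoU 0.3, whereas the plain IoU loss treats both the same.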

Notes:

  1. The gamma parameter is the gamma parameter of Focal_EIoU. It is generally 0.5; change it if necessary.
  2. The alpha parameter is the alpha parameter of AlphaIoU. The default value is 1, which is equivalent to the normal IoU. If you want to use AlphaIoU, the default value of alpha in the paper is 3.
  3. As with Focal_EIoU, I think the idea of AlphaIoU can also be applied to the other IoU variants. Simply put, if you set alpha to 3 and leave the other IoU flags (GIoU, DIoU, CIoU, EIoU, SIoU) as False, you get AlphaIoU. If you set alpha to 3 and CIoU to True, you get Alpha-CIoU.
  4. To use a particular IoU variant, just set its parameter to True.

In addition to replacing the function above, you also need to modify the __call__ function of the ComputeLoss class in utils/loss.py:

Replace the lines that call bbox_iou and accumulate lbox with:

iou = bbox_iou(pbox, tbox[i], CIoU=True)  # iou(prediction, target)
if isinstance(iou, tuple):  # Focal=True returns (iou, focal weight)
    lbox += (iou[1].detach().squeeze() * (1 - iou[0].squeeze())).mean()
    iou = iou[0].squeeze()
else:
    lbox += (1.0 - iou.squeeze()).mean()  # iou loss
    iou = iou.squeeze()

Finally, the variant is selected through the arguments passed to bbox_iou. For example, the code above uses CIoU. If you want to use Focal_EIoU, modify the call as follows:

iou = bbox_iou(pbox, tbox[i], EIoU=True, Focal=True)
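The tuple handling in the ComputeLoss snippet can also be sketched on its own, which makes it easier to see what the Focal weight does. The helper `box_loss` below is hypothetical and exists only for this example; it mirrors the two branches of the snippet on toy IoU values.

```python
import torch

def box_loss(iou_out):
    """Turn the output of an IoU function into a scalar box loss.

    iou_out is either a tensor of IoU values, or a tuple
    (iou, focal_weight) as returned when Focal=True.
    """
    if isinstance(iou_out, tuple):
        iou, weight = iou_out[0].squeeze(), iou_out[1].squeeze()
        # detach() stops gradients from flowing through the focal weight
        return (weight.detach() * (1.0 - iou)).mean()
    return (1.0 - iou_out.squeeze()).mean()

iou = torch.tensor([[0.2], [0.8]])
plain = box_loss(iou)                  # mean of [0.8, 0.2]
focal = box_loss((iou, iou.pow(0.5)))  # weights are IoU**gamma with gamma = 0.5
print(plain, focal)  # approximately 0.5 and 0.268
```

In the focal case the low-IoU box contributes sqrt(0.2) * 0.8 and the high-IoU box sqrt(0.8) * 0.2, so each box's loss is scaled by its own IoU raised to gamma before averaging.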