FGSM (Fast Gradient Sign Method) algorithm source code analysis

Paper link: https://arxiv.org/abs/1412.6572
Source code: https://github.com/Harry24k/adversarial-attacks-pytorch/tree/master

Source code

import torch
import torch.nn as nn

from ..attack import Attack


class FGSM(Attack):
    r"""
    FGSM in the paper 'Explaining and harnessing adversarial examples'
    [https://arxiv.org/abs/1412.6572]

    Distance Measure: Linf

    Arguments:
        model (nn.Module): model to attack.
        eps (float): maximum perturbation. (Default: 8/255)

    Shape:
        - images: :math:`(N, C, H, W)` where `N = number of batches`, `C = number of channels`, `H = height` and `W = width`. It must have a range [0, 1].
        - labels: :math:`(N)` where each value :math:`y_i` is :math:`0 \leq y_i \leq` `number of labels`.
        - output: :math:`(N, C, H, W)`.

    Examples::
        >>> attack = torchattacks.FGSM(model, eps=8/255)
        >>> adv_images = attack(images, labels)

    """
    def __init__(self, model, eps=8/255):
        super().__init__("FGSM", model)
        self.eps = eps
        self.supported_mode = ['default', 'targeted']

    def forward(self, images, labels):
        r"""
        Overridden.
        """
        self._check_inputs(images)

        images = images.clone().detach().to(self.device)
        labels = labels.clone().detach().to(self.device)

        if self.targeted:
            target_labels = self.get_target_label(images, labels)

        loss = nn.CrossEntropyLoss()

        images.requires_grad = True
        outputs = self.get_logits(images)

        #Calculate loss
        if self.targeted:
            cost = -loss(outputs, target_labels)
        else:
            cost = loss(outputs, labels)

        #Update adversarial images
        grad = torch.autograd.grad(cost, images,
                                   retain_graph=False, create_graph=False)[0]

        adv_images = images + self.eps*grad.sign()
        adv_images = torch.clamp(adv_images, min=0, max=1).detach()

        return adv_images

Analysis

The full name of FGSM is Fast Gradient Sign Method. In a white-box setting, it differentiates the loss with respect to the input, takes the sign() of that gradient to obtain the perturbation direction, and scales it by a step size eps. The resulting "perturbation" is added to the original input to produce the adversarial example of the FGSM attack.
Recall that during the back-propagation of a neural network, training updates the values of $w, b$ along the direction of gradient descent, which makes the network converge in the direction of decreasing loss. Simply put, the gradient direction is the direction in which the loss changes fastest. The FGSM algorithm assumes that the loss function $J(x, y)$ is approximately linear in $x$, i.e. $J(x, y) \approx w^{T} x$, so changing the input $x$ in the direction of gradient ascent increases the loss, thereby causing the model to misclassify. Concretely, a perturbation $\eta$ is added to the image:

$$\eta = \epsilon \, \mathrm{sign}(\nabla_{x} J(\theta, x, y))$$

where $\nabla_{x}$ denotes the gradient with respect to the input and $\epsilon$ is the step size, i.e. the maximum perturbation applied to each pixel.
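For intuition, a minimal sketch of what sign() does (the gradient values below are made up for illustration; eps = 8/255 matches the default in the code above): every component of the gradient is collapsed to ±1, so each pixel is pushed by exactly ±eps regardless of the gradient's magnitude.

import torch

# Hypothetical gradient of the loss w.r.t. a 2x2 "image" (values are illustrative only)
grad = torch.tensor([[0.03, -1.50],
                     [-0.002, 0.70]])
eps = 8 / 255

eta = eps * grad.sign()  # perturbation: every entry is exactly +eps or -eps
print(eta)
# tensor([[ 0.0314, -0.0314],
#         [-0.0314,  0.0314]])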
For an untargeted attack, based on the above, the formula is:

$$X^{adv} = X + \epsilon \, \mathrm{sign}(\nabla_{X} J(X, y_{true}))$$

where $y_{true}$ is the true label of the sample $X$.
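A minimal from-scratch sketch of the untargeted formula, assuming a generic PyTorch classifier model and images already scaled to [0, 1] (the function name fgsm_untargeted is made up here; it mirrors what the library's forward() does, just outside the class):

import torch
import torch.nn as nn

def fgsm_untargeted(model, images, labels, eps=8/255):
    # X_adv = X + eps * sign(grad_X J(X, y_true))
    images = images.clone().detach().requires_grad_(True)
    cost = nn.CrossEntropyLoss()(model(images), labels)
    grad = torch.autograd.grad(cost, images)[0]
    adv_images = images + eps * grad.sign()
    return torch.clamp(adv_images, min=0, max=1).detach()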
For a targeted attack, the perturbation should instead push the sample toward the target label, that is, reduce the loss between the sample and the target label. The formula is:

$$X^{adv} = X - \epsilon \, \mathrm{sign}(\nabla_{X} J(X, y_{target}))$$

where $y_{target}$ is the target label. There are many ways to select the target label; for example, you can choose the label that differs most from the true label, or pick a random label other than the true label.
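The targeted variant only flips the sign and swaps in the target label; a sketch under the same assumptions as above (reusing the imports from the previous sketch; target_labels comes from whatever target-selection strategy the caller uses):

def fgsm_targeted(model, images, target_labels, eps=8/255):
    # X_adv = X - eps * sign(grad_X J(X, y_target)): descend toward the target class
    images = images.clone().detach().requires_grad_(True)
    cost = nn.CrossEntropyLoss()(model(images), target_labels)
    grad = torch.autograd.grad(cost, images)[0]
    adv_images = images - eps * grad.sign()
    return torch.clamp(adv_images, min=0, max=1).detach()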

The forward() function implements the attack: given the input images and their labels, it returns the adversarial images adv_images.
images = images.clone().detach().to(self.device): clone() copies the images into a new memory area (otherwise assigned tensors in PyTorch share the same storage); detach() detaches the cloned tensor from the current computation graph so it becomes a leaf node whose gradient can later be computed; to() moves it onto the device.
target_labels = self.get_target_label(images, labels): for a targeted attack, obtain the target labels. There are many ways to choose them; for example, the label that differs most from the true label, or a random label other than the true label.
loss = nn.CrossEntropyLoss(): use cross-entropy as the loss function.
images.requires_grad = True: with this set to True, PyTorch builds the computation graph needed to compute gradients with respect to the images.
outputs = self.get_logits(images): get the model's output (logits) for the images.
cost = -loss(outputs, target_labels): compute the loss for a targeted attack, negated so that the gradient-ascent step below actually descends toward the target label.
cost = loss(outputs, labels): compute the loss for an untargeted attack.
grad = torch.autograd.grad(cost, images, retain_graph=False, create_graph=False)[0]: differentiate cost with respect to images to obtain the gradient grad.
adv_images = images + self.eps*grad.sign(): add the perturbation to the original images according to the formula to obtain the adversarial images.
adv_images = torch.clamp(adv_images, min=0, max=1).detach(): clamp values greater than 1 down to 1 and values less than 0 up to 0, so the adversarial images stay in the valid range [0, 1].
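Putting it together, a hedged usage sketch of the class itself, assuming model, images, and labels are already defined as in the docstring example above; the targeted-mode helper set_mode_targeted_random() is an assumption from recent torchattacks releases and may be named differently in your version.

import torch
import torchattacks

attack = torchattacks.FGSM(model, eps=8/255)
adv_images = attack(images, labels)  # untargeted by default

# Targeted mode (assumed helper from recent torchattacks releases; uncomment if available):
# attack.set_mode_targeted_random()
# adv_images_targeted = attack(images, labels)

# Quick sanity check: accuracy should drop on the adversarial images
with torch.no_grad():
    clean_acc = (model(images).argmax(1) == labels).float().mean().item()
    adv_acc = (model(adv_images).argmax(1) == labels).float().mean().item()
print(f"clean acc: {clean_acc:.3f}, adversarial acc: {adv_acc:.3f}")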

Thinking

The FGSM algorithm assumes that the loss function $J(x, y)$ is approximately linear in $x$, but this linearity assumption does not necessarily hold. If the loss is not linear, there may exist some perturbation between $0$ and $\epsilon \, \mathrm{sign}(\nabla_{x} J(\theta, x, y))$ that also increases $J$ greatly; in that case the modification to $x$ can be smaller than $\epsilon$. Therefore, some researchers proposed finding the perturbation of each pixel iteratively, which is the BIM algorithm; for details, see the next blog post.
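As a preview, a minimal sketch of that iterative idea (the Basic Iterative Method), reusing the imports and assumptions of the sketches above; the per-step size alpha and the projection back into the eps-ball around the original image are the additions over one-step FGSM.

def bim_untargeted(model, images, labels, eps=8/255, alpha=2/255, steps=10):
    images = images.clone().detach()
    adv_images = images.clone().detach()
    for _ in range(steps):
        adv_images.requires_grad_(True)
        cost = nn.CrossEntropyLoss()(model(adv_images), labels)
        grad = torch.autograd.grad(cost, adv_images)[0]
        adv_images = adv_images.detach() + alpha * grad.sign()
        # project back into the eps-ball around the original images, then into [0, 1]
        adv_images = torch.max(torch.min(adv_images, images + eps), images - eps)
        adv_images = torch.clamp(adv_images, min=0, max=1)
    return adv_images.detach()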
