MindSpore application case | FGSM adversarial attack: small perturbations that easily degrade model accuracy

In recent years, with the continuous development of data, computing power, and theory, deep learning has been widely applied in fields such as images, text, speech, and autonomous driving. At the same time, more and more attention is being paid to the security of models in use, because AI models are vulnerable to intentional or unintentional external attacks that cause them to produce erroneous results. In this case, we use the Fast Gradient Sign Method (FGSM) as an example to demonstrate how such attacks can mislead a model.

01 Environment preparation

Go to the MindSpore official website (https://www.mindspore.cn/) and click Install at the top of the page.

Get the installation command.

Back in the Notebook, add the following three commands before the first code block and run them one by one.

pip install --upgrade pip
conda install mindspore-gpu=1.9.0 cudatoolkit=10.1 -c mindspore -c conda-forge
pip install mindvision

02 Definition of Adversarial Examples

Szegedy first proposed the concept of adversarial examples in 2013: adding a small perturbation that humans cannot perceive to an original sample degrades the performance of a deep model; such a sample is called an adversarial example. As shown in the figure below, after noise is added to an image originally predicted as “panda”, the model predicts it as “gibbon”; the sample on the right is an adversarial example:

Image from Explaining and Harnessing Adversarial Examples.

03 Attack method

At a macro level, attacks on a model can be classified along the following two dimensions:

1. How much information the attacker has:

White-box attack: The attacker has full knowledge of and access to the model, including its structure, weights, inputs, and outputs, and can interact with the model system while generating adversarial data. Because all of the model's information is available, the attacker can design an attack algorithm tailored to the characteristics of the attacked model.

Black-box attack: In contrast to a white-box attack, the attacker has only limited knowledge of the model. The attacker knows nothing about its structure or weights and only has access to some of its inputs and outputs.

2. What the attacker's goal is:

Targeted attack: The attacker misleads the model results to a specific classification.

Untargeted attack: The attacker just wants to produce the wrong result and doesn’t care what the new result is.

The gradient sign attack FGSM used in this case is a white-box attack method, which can be either targeted or untargeted.

For more model-security features, refer to MindArmour, which currently supports adversarial example generation methods such as FGSM, LLC, and Substitute Attack, and provides an adversarial robustness module, a fuzz testing module, and a privacy protection and evaluation module to help users strengthen model security.

Fast Gradient Sign Attack (FGSM)

Training a classification network involves defining a loss function that measures the distance between the model's output and the sample's true label. The model's gradients are computed through backpropagation, and the network parameters are updated by gradient descent to reduce the loss and improve accuracy.

FGSM (Fast Gradient Sign Method) is a simple and efficient way to generate adversarial examples. Unlike the normal training process of a classification network, FGSM computes the gradient of the loss with respect to the input, ∇x J(θ, x, y); this gradient represents how sensitive the loss is to changes in the input.

This gradient is then added to the original input to increase the loss, so the model classifies the modified sample worse, which achieves the attack. A second requirement on adversarial examples is that the difference between the generated sample and the original sample should be as small as possible; taking the sign of the gradient makes the modification to the image as uniform as possible.

The resulting adversarial perturbation can be expressed as:

η = ε · sign(∇x J(θ, x, y))        (1)

and the adversarial example can be formulated as:

x' = x + ε · sign(∇x J(θ, x, y))        (2)

where:

  • x: the original input image, correctly classified as “panda”.

  • y: the original output for x.

  • θ: the model parameters.

  • ε: the attack coefficient (perturbation magnitude).

  • J(θ, x, y): the loss of the trained network.

  • ∇x J(θ, x, y): the gradient of the loss with respect to x, obtained by backpropagation.
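
To make the effect of the sign function concrete, here is a tiny NumPy illustration (the numbers are made up): every non-zero pixel of the perturbation has magnitude exactly ε, regardless of how large or small the raw gradient values are.

import numpy as np

# Toy gradient of the loss w.r.t. a 2x2 "image" (hypothetical values)
grad = np.array([[0.03, -1.20],
                 [0.00, 0.45]])
eps = 0.07
perturbation = eps * np.sign(grad)
print(perturbation)  # every non-zero entry is +0.07 or -0.07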

04 Data processing

In this case, the MNIST dataset is used to train a LeNet network to a reasonably high accuracy, and then the FGSM attack described above is run to deceive the network model and make it misclassify.

The following sample code downloads and decompresses the dataset to the specified location.

from mindvision.dataset import Mnist

# Download and process the MNIST dataset
download_train = Mnist(path="./mnist", split="train", shuffle=True, download=True)
download_eval = Mnist(path="./mnist", split="test", download=True)

dataset_train = download_train.run()
dataset_eval = download_eval.run()

The directory structure of the downloaded dataset file is as follows:

./mnist
├── test
│   ├── t10k-images-idx3-ubyte
│   └── t10k-labels-idx1-ubyte
└── train
    ├── train-images-idx3-ubyte
    └── train-labels-idx1-ubyte
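
As an optional sanity check (not part of the original case), one batch can be pulled from the training set to confirm its shape before training; this assumes the column names 'image' and 'label' that are also used later in this case.

# Inspect one batch of the processed training data (shapes only)
for batch in dataset_train.create_dict_iterator(output_numpy=True):
    print(batch['image'].shape, batch['label'].shape)
    break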

05 Training LeNet network

In the experiment, LeNet is used as a demonstration model to complete image classification. Here, the network is first defined and trained using the MNIST dataset.

Define the LeNet network:

from mindvision.classification.models import lenet

network = lenet(num_classes=10, pretrained=False)
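
As a quick, optional check that the network is wired up as expected, a zero tensor can be pushed through it; this sketch assumes the 1×32×32 single-channel inputs that the mindvision MNIST pipeline produces.

import numpy as np
import mindspore as ms

# Forward a dummy image through the (still untrained) LeNet
dummy = ms.Tensor(np.zeros((1, 1, 32, 32), dtype=np.float32))
print(network(dummy).shape)  # expected: (1, 10), one logit per class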

Define the optimizer and loss function:

import mindspore.nn as nn

net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
net_opt = nn.Momentum(network.trainable_params(), learning_rate=0.01, momentum=0.9)
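
To see how the sparse cross-entropy loss behaves, here is a small toy example (the logits and labels are made up): it takes raw logits of shape (batch, num_classes) and integer class labels, and returns the mean loss over the batch.

import numpy as np
import mindspore as ms

# Toy logits for a batch of 4 samples and 10 classes, plus integer labels
logits = ms.Tensor(np.random.randn(4, 10).astype(np.float32))
labels = ms.Tensor(np.array([3, 0, 7, 1], dtype=np.int32))
print(net_loss(logits, labels))  # scalar mean loss over the batch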

Define network parameters:

import mindspore as ms
config_ck = ms.CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=10)
ckpoint = ms.ModelCheckpoint(prefix="checkpoint_lenet", config=config_ck)

Train the LeNet network:

from mindvision.engine.callback import LossMonitor

model = ms.Model(network, loss_fn=net_loss, optimizer=net_opt, metrics={'accuracy'})
model.train(5, dataset_train, callbacks=[ckpoint, LossMonitor(0.01, 1875)])

Testing the network at this point shows that LeNet has reached a relatively high accuracy:

acc = model.eval(dataset_eval)
print("{}". format(acc))

06 Implement FGSM

After obtaining an accurate LeNet network, the FGSM attack method is used below to add noise to the images and test the model again.

First, obtain the backpropagated gradient (the gradient of the loss with respect to the input) through the loss function:

import mindspore.ops as ops


class WithLossCell(nn.Cell):
    """Wrap the network together with its loss function"""

    def __init__(self, network, loss_fn):
        super(WithLossCell, self).__init__()
        self._network = network
        self._loss_fn = loss_fn

    def construct(self, data, label):
        out = self._network(data)
        return self._loss_fn(out, label)


class GradWrapWithLoss(nn.Cell):
    """Compute the gradient of the loss with respect to the inputs"""

    def __init__(self, network):
        super(GradWrapWithLoss, self).__init__()
        self._grad_all = ops.composite.GradOperation(get_all=True, sens_param=False)
        self._network = network

    def construct(self, inputs, labels):
        gout = self._grad_all(self._network)(inputs, labels)
        return gout[0]
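
Before building the attack class, the two wrappers can be tried on a single test batch; this sketch assumes the trained network, net_loss, and dataset_eval from the cells above. The returned gradient has the same shape as the input images.

import numpy as np
import mindspore as ms

# Wrap the trained LeNet with its loss and take the gradient w.r.t. one batch
loss_net = WithLossCell(network, net_loss)
grad_net = GradWrapWithLoss(loss_net)
grad_net.set_train()

one_batch = next(dataset_eval.create_dict_iterator(output_numpy=True))
grad = grad_net(ms.Tensor(one_batch['image'].astype(np.float32)),
                ms.Tensor(one_batch['label']))
print(grad.shape)  # same shape as the batch of input images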

Then implement the FGSM attack according to formula (2):

import numpy as np

class FastGradientSignMethod:
    """Implement the FGSM attack"""

    def __init__(self, network, eps=0.07, loss_fn=None):
        # Variable initialization
        self._network = network
        self._eps = eps
        with_loss_cell = WithLossCell(self._network, loss_fn)
        self._grad_all = GradWrapWithLoss(with_loss_cell)
        self._grad_all.set_train()

    def _gradient(self, inputs, labels):
        # Compute the sign of the gradient of the loss w.r.t. the inputs
        out_grad = self._grad_all(inputs, labels)
        gradient = out_grad.asnumpy()
        gradient = np.sign(gradient)
        return gradient

    def generate(self, inputs, labels):
        # Implement FGSM for a single batch
        inputs_tensor = ms.Tensor(inputs)
        labels_tensor = ms.Tensor(labels)
        gradient = self._gradient(inputs_tensor, labels_tensor)
        # Generate the perturbation
        perturbation = self._eps*gradient
        # Generate the perturbed images
        adv_x = inputs + perturbation
        return adv_x

    def batch_generate(self, inputs, labels, batch_size=32):
        # Process the whole dataset batch by batch
        arr_x = inputs
        arr_y = labels
        len_x = len(inputs)
        batches = int(np.ceil(len_x / batch_size))  # include the final partial batch
        res = []
        for i in range(batches):
            x_batch = arr_x[i*batch_size: (i + 1)*batch_size]
            y_batch = arr_y[i*batch_size: (i + 1)*batch_size]
            adv_x = self.generate(x_batch, y_batch)
            res.append(adv_x)
        adv_x = np.concatenate(res, axis=0)
        return adv_x
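
As a quick sanity check on a single batch (eps = 0.1 here is an arbitrary illustrative value), no pixel of a generated adversarial image should differ from the original by more than eps:

import numpy as np
import mindspore as ms

# Generate adversarial examples for one test batch and check the perturbation size
fgsm_check = FastGradientSignMethod(network, eps=0.1, loss_fn=net_loss)
batch = next(dataset_eval.create_dict_iterator(output_numpy=True))
adv_batch = fgsm_check.generate(batch['image'].astype(np.float32), batch['label'])
print(np.abs(adv_batch - batch['image'].astype(np.float32)).max())  # at most 0.1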

Process the images of the test set of the MNIST dataset again:

images = []
labels = []
test_images = []
test_labels = []
predict_labels = []

ds_test = dataset_eval.create_dict_iterator(output_numpy=True)

for data in ds_test:
    images = data['image'].astype(np.float32)
    labels = data['label']
    test_images.append(images)
    test_labels.append(labels)
    pred_labels = np.argmax(model.predict(ms.Tensor(images)).asnumpy(), axis=1)
    predict_labels.append(pred_labels)

test_images = np.concatenate(test_images)
predict_labels = np.concatenate(predict_labels)
true_labels = np.concatenate(test_labels)
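
With the predictions collected, the clean-test-set accuracy can be computed directly as a baseline to compare the attacked results against:

# Baseline: accuracy of the trained model on the unmodified test images
print(np.mean(np.equal(predict_labels, true_labels)))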

07 Running Attack

It can be seen from the FGSM attack formula that the larger the attack coefficient ε, the larger the perturbation added along the sign of the gradient and the stronger the attack. When ε is zero, no attack effect appears.

Now observe the attack effect when ε is zero:

import mindspore.ops as ops

fgsm = FastGradientSignMethod(network, eps=0.0, loss_fn=net_loss)
advs = fgsm.batch_generate(test_images, true_labels, batch_size=32)

adv_predicts = model.predict(ms.Tensor(advs)).asnumpy()
adv_predicts = np.argmax(adv_predicts, axis=1)
accuracy = np.mean(np.equal(adv_predicts, true_labels))
print(accuracy)

Then set ε to 0.5 and try to run the attack:

fgsm = FastGradientSignMethod(network, eps=0.5, loss_fn=net_loss)
advs = fgsm.batch_generate(test_images, true_labels, batch_size=32)

adv_predicts = model.predict(ms.Tensor(advs)).asnumpy()
adv_predicts = np.argmax(adv_predicts, axis=1)
accuracy = np.mean(np.equal(adv_predicts, true_labels))
print(accuracy)

As can be seen from the results printed above, the accuracy of the LeNet model drops sharply at this point.
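
To see the trend more directly, the sketch below (the eps values are chosen arbitrarily for illustration) sweeps several attack coefficients and prints the resulting accuracy; larger eps should give lower accuracy.

# Sweep the attack coefficient and report accuracy under each attack strength
for eps in [0.0, 0.1, 0.2, 0.3, 0.5]:
    attack = FastGradientSignMethod(network, eps=eps, loss_fn=net_loss)
    adv_images = attack.batch_generate(test_images, true_labels, batch_size=32)
    preds = np.argmax(model.predict(ms.Tensor(adv_images)).asnumpy(), axis=1)
    print(eps, np.mean(np.equal(preds, true_labels)))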

The following shows what the attacked images actually look like. The pictures have changed only slightly, yet the test accuracy has dropped severely:

import matplotlib.pyplot as plt
%matplotlib inline

adv_examples = np.transpose(advs[:10], [0, 2, 3, 1])
ori_examples = np.transpose(test_images[:10], [0, 2, 3, 1])

plt.figure(figsize=(10, 3), dpi=120)
for i in range(10):
    plt.subplot(3, 10, i + 1)
    plt.axis("off")
    plt.imshow(np.squeeze(ori_examples[i]))
    plt.subplot(3, 10, i + 11)
    plt.axis("off")
    plt.imshow(np.squeeze(adv_examples[i]))
plt.show()
