Application of computer vision 6: Picasso-style image transfer using the VGG model

Hello everyone, I am Weixue AI. Today I will introduce the application of computer vision 6: Picasso-style image transfer using the VGG model. This article uses the VGG model to perform Picasso-style image transfer. First, we briefly explain the principle of image style transfer; then we implement the Picasso-style transfer algorithm step by step with the PyTorch framework; finally, we present experimental results to verify the effectiveness of the algorithm.

Contents

1. Introduction

2. The principle of image style transfer

2.1. VGG network
2.2. Content loss
2.3. Style loss
2.4. Total loss

3. Algorithm implementation

4. Summary

1. Introduction

Image style transfer is a technique that applies the artistic style of one image to another, producing images with different artistic styles. CNN-based style transfer is one of the most common approaches. Its basic idea is to add style-related loss functions to a convolutional neural network so that it learns to separate the content information and the style information of the input images, and then recombine the two to generate a new image.

The implementation process of the Picasso style transfer algorithm can be divided into the following steps:

1. Use the convolutional neural network VGG to preprocess the input image and Picasso's artistic style image. The purpose of this step is to extract the features of the input image and the artistic style image as the basis for the subsequent transfer.

2. Define two loss functions: content loss and style loss. The content loss is used to preserve the content information of the input image, while the style loss is used to capture the texture, color and detail information in the Picasso style.

3. Take the weighted sum of the two loss functions to obtain the total loss, and minimize it with an optimization algorithm such as stochastic gradient descent or Adam, so that Picasso's artistic style is applied to the input image.

4. For a new input image, use the trained model for style transfer.

Note: The loss functions in the Picasso style transfer algorithm rely on a pre-trained VGG network. In addition, when implementing the algorithm, the choice of hyperparameters also affects the results, such as the weights of the content and style losses, the learning rate, and the number of training iterations. Some parameter tuning is therefore needed to obtain a better transfer effect.

(Figure: an example of image style conversion)

2. The principle of image style transfer

2.1. VGG network

We use a pretrained VGG-19 network as a feature extractor, which can capture both content and style features of images. The structure of the VGG-19 network is relatively simple: its name combines the network family (VGG) with the number of weight layers. There are 19 weight layers in total, of which 16 are convolutional layers and 3 are fully connected layers; five max-pooling layers (not counted in the 19) separate the convolutional blocks, and the fully connected layers at the end act as the classifier. For style transfer we only need the convolutional part (the `features` module in torchvision).
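As a quick orientation for the implementation in Section 3, which selects layers by their string indices ('0', '2', '5', ...), here is a minimal sketch that loads the convolutional part of VGG-19 from torchvision and prints those indices alongside the layers:

import torchvision.models as models

# Load only the convolutional feature extractor of the pretrained VGG-19
vgg = models.vgg19(pretrained=True).features.eval()

# Each child module is addressed by a string index; these are the names used
# later when choosing content layers (e.g. '10') and style layers ('0', '2', ...)
for name, layer in vgg.named_children():
    print(name, layer)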

2.2. Content loss

Content loss measures the difference between the feature representations of the output image and the content image at a certain layer. We usually use a higher-level feature representation, which preserves the overall content of the image.

L_{content}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F_{ij}^l(\vec{p}) - F_{ij}^l(\vec{x}) \right)^2,

where \vec{p} is the content image, \vec{x} is the output image, and F_{ij}^l(\cdot) is the feature representation of the given image at layer l.
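Translated directly into PyTorch, the content loss is just a squared difference between two feature tensors. Here is a minimal sketch, where F_p and F_x are hypothetical placeholders for the layer-l features of the content image and the output image:

import torch

def content_loss(F_p: torch.Tensor, F_x: torch.Tensor) -> torch.Tensor:
    # Squared error between the layer-l features of the content image (F_p)
    # and the generated image (F_x); the 1/2 factor matches the formula above
    return 0.5 * torch.sum((F_p - F_x) ** 2)

The implementation in Section 3 uses torch.mean instead; the two differ only by a constant factor that can be absorbed into the content weight.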

2.3. Style loss

The style loss measures the difference between the feature representations of the output image and the style image at each selected layer. We usually use the Gram matrix to represent style features.

L_{style}(\vec{a}, \vec{x}, l) = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_{ij}^l(\vec{a}) - G_{ij}^l(\vec{x}) \right)^2,

where \vec{a} is the style image, G_{ij}^l(\cdot) is the Gram matrix of the given image at layer l, and N_l and M_l are the number of channels and the size of the feature map at layer l, respectively.
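In code, the Gram matrix is the product of the flattened feature map with its transpose, and the per-layer style loss follows the formula directly. A minimal sketch (variable names are illustrative; Section 3 uses a slightly different but equivalent normalization that is absorbed into the style weight):

import torch

def gram(F: torch.Tensor) -> torch.Tensor:
    # F has shape (1, C, H, W); flatten the spatial dimensions so that each
    # row holds one channel's activations, then take inner products
    _, c, h, w = F.size()
    F = F.view(c, h * w)
    return torch.mm(F, F.t())

def style_loss_layer(F_a: torch.Tensor, F_x: torch.Tensor) -> torch.Tensor:
    _, c, h, w = F_a.size()
    n_l, m_l = c, h * w  # number of channels and feature-map size of layer l
    G_a, G_x = gram(F_a), gram(F_x)
    return torch.sum((G_a - G_x) ** 2) / (4 * n_l ** 2 * m_l ** 2)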

2.4. Total loss

Our goal is to minimize the weighted sum of content loss and style loss.

L(\vec{p}, \vec{a}, \vec{x}) = \alpha L_{content}(\vec{p}, \vec{x}) + \beta L_{style}(\vec{a}, \vec{x}),

where \alpha and \beta are the weights of the content loss and the style loss, respectively.
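In code this is a single weighted sum; a minimal sketch whose default weights mirror the content_weight=1 and style_weight=1e6 used in the implementation below:

def total_loss(l_content, l_style, alpha=1.0, beta=1e6):
    # alpha and beta trade off content fidelity against style strength; the
    # raw style term is very small in magnitude, hence the large default beta
    return alpha * l_content + beta * l_style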

3. Algorithm implementation

import torch
import torchvision.transforms as transforms
from PIL import Image

def load_image(image_path, max_size=None, shape=None):
    image = Image.open(image_path).convert('RGB')
    if max_size:
        # Scale the longer side down to max_size while keeping the aspect ratio
        scale = max_size / max(image.size)
        size = tuple([int(dim * scale) for dim in image.size])
        image = image.resize(size, Image.LANCZOS)

    if shape:
        # shape comes from a tensor as (height, width); PIL expects (width, height)
        image = image.resize((shape[1], shape[0]), Image.LANCZOS)

    transform = transforms.Compose([
        transforms.ToTensor(),
        # Normalize with the ImageNet statistics expected by the pretrained VGG;
        # deprocess() below inverts exactly this normalization
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
    ])

    image = transform(image).unsqueeze(0)
    return image

def deprocess(tensor):
    # Invert the ImageNet normalization applied in load_image
    inv_normalize = transforms.Normalize((-0.485 / 0.229, -0.456 / 0.224, -0.406 / 0.225),
                                         (1 / 0.229, 1 / 0.224, 1 / 0.225))
    if tensor.dim() == 4:
        # A batch of images: take the first one
        tensor = tensor[0]
    elif tensor.dim() != 3:
        raise ValueError("Expected input tensor to be 3D or 4D")
    tensor = tensor.clone().detach().cpu()
    tensor = inv_normalize(tensor)
    # Clamp to the valid [0, 1] range; out-of-range values would otherwise
    # overflow when converted to 8-bit pixel values
    tensor = tensor.clamp(0, 1)
    return transforms.ToPILImage()(tensor)

import torch.nn as nn
import torchvision.models as models

class StyleTransferModel(nn.Module):
    def __init__(self, content_layers, style_layers):
        super(StyleTransferModel, self).__init__()
        # Use the pretrained VGG-19 convolutional part as a fixed feature
        # extractor: eval mode and frozen weights
        self.vgg = models.vgg19(pretrained=True).features.eval()
        for param in self.vgg.parameters():
            param.requires_grad_(False)
        self.content_layers = content_layers
        self.style_layers = style_layers

    def forward(self, x):
        content_features = []
        style_features = []
        for name, layer in self.vgg.named_children():
            x = layer(x)
            if name in self.content_layers:
                content_features.append(x)
            if name in self.style_layers:
                style_features.append(x)

        return content_features, style_features

def gram_matrix(tensor):
    # Flatten each channel into a row, then take channel-wise inner products
    _, c, h, w = tensor.size()
    tensor = tensor.view(c, h * w)
    gram = torch.mm(tensor, tensor.t())
    return gram

import torch.optim as optim

def style_transfer(content_image_path, style_image_path, output_image_path, max_size=400, content_weight=1, style_weight=1e6, iterations=600):
    content_image = load_image(content_image_path, max_size=max_size)
    style_image = load_image(style_image_path, shape=content_image.shape[-2:])
    output_image = content_image.clone().requires_grad_(True)

    model = StyleTransferModel(content_layers=['10'], style_layers=['0', '2', '5', '7', '12'])

    # Target features; the VGG weights are frozen, so these carry no gradient graph
    content_features = model(content_image)[0]
    style_features = model(style_image)[1]
    style_grams = [gram_matrix(feature) for feature in style_features]

    # Optimize the pixels of the output image directly
    optimizer = optim.Adam([output_image], lr=0.01)
    for i in range(iterations):
        content_output_features, style_output_features = model(output_image)

        content_loss = 0.0
        style_loss = 0.0

        for target_feature, output_feature in zip(content_features, content_output_features):
            content_loss += torch.mean((output_feature - target_feature) ** 2)

        for target_gram, output_feature in zip(style_grams, style_output_features):
            output_gram = gram_matrix(output_feature)
            # The normalization constant here is absorbed into style_weight
            style_loss += torch.mean((output_gram - target_gram) ** 2) / (output_gram.numel() ** 2)

        total_loss = content_weight * content_loss + style_weight * style_loss

        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()

        if (i + 1) % 5 == 0:
            print(f"Iteration {i + 1}/{iterations}: Loss = {total_loss.item()}")

    output_image = deprocess(output_image)
    output_image.save(output_image_path)

content_image_path = "123.png"
style_image_path = "style.png"
output_image_path = "out.png"

style_transfer(content_image_path, style_image_path, output_image_path)

We only need to provide the content image 123.png to be transferred and the style image style.png, and the stylized image out.png is generated.

4. Summary

This article introduces in detail the principle and implementation of CNN-based Picasso-style image transfer, and implements a simple and effective algorithm using the PyTorch framework. Experimental results show that the method can successfully apply the Picasso style to arbitrary images and generate high-quality artwork.

Past works:

Deep Learning Practice series

1. Deep Learning Practice 1: Enterprise data analysis and prediction (Keras framework)

2. Deep Learning Practice 2: Enterprise credit rating and prediction (Keras framework)

3. Deep Learning Practice 3: News text classification with a text convolutional neural network (TextCNN)

4. Deep Learning Practice 4: Mathematical graphics recognition + topic pattern recognition with a convolutional neural network (DenseNet)

5. Deep Learning Practice 5: Chinese OCR recognition project with a convolutional neural network (CNN)

6. Deep Learning Practice 6: Air quality and weather prediction with a convolutional neural network (PyTorch) + cluster analysis

7. Deep Learning Practice 7: Sentiment analysis of e-commerce product reviews

8. Deep Learning Practice 8: Turning life photos into comic-style photos

9. Deep Learning Practice 9: Text-to-image generation, running text2img on a local computer

10. Deep Learning Practice 10: Mathematical formula recognition, converting pictures to LaTeX (img2Latex)

11. Deep Learning Practice 11 (Advanced Edition): Fine-tuning the BERT model, a text classification case

12. Deep Learning Practice 12 (Advanced Edition): Using Dewarp to correct text distortion

13. Deep Learning Practice 13 (Advanced Edition): Text error correction, helpful for friends who often write typos

14. Deep Learning Practice 14 (Advanced Edition): Handwritten text OCR recognition, handwritten notes can also be recognized

15. Deep Learning Practice 15 (Advanced Edition): Letting the machine do reading comprehension + becoming a question maker who asks questions

16. Deep Learning Practice 16 (Advanced Edition): Virtual screenshot text recognition, usable for paper contract and form recognition

17. Deep Learning Practice 17 (Advanced Edition): Construction and development case of an intelligent assistant editing platform system

18. Deep Learning Practice 18 (Advanced Edition): A 15-task NLP fusion system that can handle the NLP tasks you can think of on the market

19. Deep Learning Practice 19 (Advanced Edition): Local implementation and deployment test of SpeakGPT, implementing SpeakGPT on your own platform based on ChatGPT

20. Deep Learning Practice 20 (Advanced Edition): File intelligent search system that can search keywords in file content and quickly find files

21. Deep Learning Practice 21 (Advanced Edition): AI entity encyclopedia search, an encyclopedia that can be searched for any noun

22. Deep Learning Practice 22 (Advanced Edition): AI comic video generation model, make your own comic video

23. Deep Learning Practice 23 (Advanced Edition): Semantic segmentation in practice, achieving the effect of person image matting (computer vision)

24. Deep Learning Practice 24: Building a Transformer model with PyTorch, really running through the Transformer model and deeply understanding its structure

25. Deep Learning Practice 25: Building a T5 model with PyTorch, really running through the T5 model and using it to generate digital addition and subtraction results

26. Deep Learning Practice 26: Building TextCNN with PyTorch for multi-label text classification

27. Deep Learning Practice 27: Chinese text relation extraction with the PyTorch framework + BERT

28. Deep Learning Practice 28: AIGC project, using ChatGPT to generate customized PPT files

29. Deep Learning Practice 29: AIGC project, using GPT-2 (CPU environment) for text continuation and lyrics generation

(to be continued)