Transposed Convolution

Article directory

  • Foreword
  • 1. Ordinary convolution operation
  • 2. Transposed convolution operation
  • 3. Transposed convolution parameters in PyTorch
  • 4. PyTorch transposed convolution experiment

Note: This post is essentially taken from the blogger Sunflower's Little Mung Bean; I have only reposted it here for my own reference and management, and modified a few details myself.

Reference link:
https://blog.csdn.net/tsyccnh/article/details/87357447
https://blog.csdn.net/qq_37541097/article/details/120709865
A guide to convolution arithmetic for deep learning

Foreword

Transposed convolution is commonly seen in semantic segmentation and generative adversarial networks (GANs), where its main role is upsampling (UpSampling). In some places transposed convolution is also called fractionally-strided convolution or deconvolution, but the term deconvolution is misleading and not recommended. Two things to keep in mind about transposed convolution:

  • Transposed convolution is not the inverse operation of convolution
  • Transposed convolution is also a convolution

1. Ordinary convolution operation

First, review ordinary convolution. The figure below uses stride=1, padding=0, kernel_size=3 as an example. Assuming the input feature map is 4×4 (with both input and output single-channel), the feature map obtained after convolution is 2×2. In general, after convolution the feature map either becomes smaller (stride > 1) or keeps its size (stride = 1). It is of course also possible to make the feature map larger by padding around it, but this is rarely meaningful.
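As a quick check, here is a minimal PyTorch sketch (my own addition; the weights are random rather than those in the figure) showing that a 4×4 single-channel input with kernel_size=3, stride=1, padding=0 produces a 2×2 output:

import torch
import torch.nn as nn

# Ordinary convolution: 4x4 single-channel input, k=3, s=1, p=0 -> 2x2 output
x = torch.randn(1, 1, 4, 4)  # (batch, channels, height, width)
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0, bias=False)
print(conv(x).shape)  # torch.Size([1, 1, 2, 2])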

2. Transposed convolution operation

As mentioned above, the main role of transposed convolution is upsampling. However, transposed convolution is not the inverse operation of convolution (a general convolution operation is not invertible); it only restores the original size (shape) of the feature map, and the values are different from the original ones. The operation of transposed convolution can be broken down into the following steps:

  1. Insert s-1 rows and columns of zeros between the elements of the input feature map (where s is the stride of the transposed convolution)
  2. Pad k-p-1 rows and columns of zeros around the input feature map (where k is the kernel_size of the transposed convolution and p is its padding; note that this padding is somewhat different from the padding of an ordinary convolution)
  3. Flip the convolution kernel parameters up-down and left-right (i.e. rotate the kernel by 180°)
  4. Perform an ordinary convolution (padding 0, stride 1); see the sketch below
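As a concrete sketch of these four steps (my own illustration, not taken from the reference posts; the helper name manual_transposed_conv is hypothetical), the function below builds the zero-inserted, zero-padded input and convolves it with the flipped kernel using torch.nn.functional. For a single-channel input without bias it should match nn.ConvTranspose2d with the same k, s, and p:

import torch
import torch.nn.functional as F

def manual_transposed_conv(x, weight, s, p):
    # x: input of shape (1, 1, H, W); weight: kernel of shape (1, 1, k, k)
    k = weight.shape[-1]
    h, w = x.shape[-2:]
    # Step 1: insert s-1 rows/columns of zeros between neighbouring elements
    expanded = torch.zeros(1, 1, (h - 1) * s + 1, (w - 1) * s + 1, dtype=x.dtype)
    expanded[..., ::s, ::s] = x
    # Step 2: pad k-p-1 rows/columns of zeros around the border
    expanded = F.pad(expanded, [k - p - 1] * 4)
    # Step 3: flip the kernel up-down and left-right (180° rotation)
    flipped = torch.flip(weight, dims=[2, 3])
    # Step 4: ordinary convolution with stride 1 and padding 0
    return F.conv2d(expanded, flipped, stride=1, padding=0)

For a 2×2 input with k=3, s=1, p=0 this produces the 4×4 output discussed in the example below.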

The following example assumes an input feature map of size 2×2 (both input and output single-channel); after the transposed convolution, a 4×4 feature map is obtained. The transposed convolution here uses k=3, stride=1, padding=0 (the bias is ignored).

  1. First, insert s-1 rows and columns of zeros between the elements (let p_in = s-1)
  2. Then pad k-p-1 rows and columns of zeros around the feature map (let p_out = k-p-1)
  3. Then flip the convolution kernel parameters up-down and left-right (i.e. rotate the kernel by 180°)
  4. Finally, perform an ordinary convolution (padding 0, stride 1)
    The animations below show transposed convolution for different values of s and p.
    Note: the s, p, and k in the animations refer to the parameters of the ordinary (forward) convolution, not the s, p, and k of the transposed convolution.

[Animation 1] Convolution parameters: s=1, p=0, k=3; transposed convolution parameters: p_in=0, p_out=2

[Animation 2] Convolution parameters: s=2, p=0, k=3; transposed convolution parameters: p_in=1, p_out=2

[Animation 3] Convolution parameters: s=2, p=1, k=3; transposed convolution parameters: p_in=1, p_out=1

The size of the feature map after the transposed convolution operation can be calculated by the following formula:

H_{out} = (H_{in} - 1) × stride[0] - 2 × padding[0] + kernel_size[0]

W_{out} = (W_{in} - 1) × stride[1] - 2 × padding[1] + kernel_size[1]

where stride[0] is the stride in the height direction, padding[0] is the padding in the height direction, kernel_size[0] is the kernel size in the height direction, and index [1] refers to the width direction. From the formula above it can be seen that the larger the padding, the smaller the height and width of the output feature map. This can be understood as follows: during the forward convolution, padding is applied first and then the feature map is computed; when the transposed convolution restores the original height and width, that earlier padding has to be subtracted back out.
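Plugging the example above into this formula (H_in = 2, stride = 1, padding = 0, kernel_size = 3):

H_{out} = (2 - 1) × 1 - 2 × 0 + 3 = 4

which matches the 4×4 feature map produced by the transposed convolution.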

3. Transposed convolution parameters in PyTorch

PyTorch's official documentation for transposed convolution, ConvTranspose2d:
https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html
From the official documentation:

Applies a 2D transposed convolution operator over an input image composed of several input planes. This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).

The official introduction to the parameters used in transposed convolution:


The parameters in_channels, out_channels, kernel_size, stride, and padding have already appeared above; the remaining parameters are:

  • output_padding: adds extra rows or columns of zeros on one side of the output feature map in the height and width directions (note that it pads only one side, not both top/bottom or left/right; you can verify this with a quick experiment, see the sketch after this list). The default is 0, i.e. it is not used.
  • groups: only used for group convolution. The default is 1, i.e. ordinary convolution.
  • bias: whether to use a bias term. The default is True.
  • dilation: only used for dilated (atrous) convolution. The default is 1, i.e. ordinary convolution.
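To illustrate output_padding (a small sketch of my own; the parameter values are only an example): with stride=2 several output sizes are consistent with the same input size, and output_padding selects among them. Using kernel_size=3, padding=1, output_padding=1 is a common way to exactly double the height and width:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 16, 16)
up0 = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1, output_padding=0)
up1 = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1, output_padding=1)
print(up0(x).shape)  # torch.Size([1, 1, 31, 31])
print(up1(x).shape)  # torch.Size([1, 1, 32, 32])  -> exact 2x upsampling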

Output feature map width and height calculation formula:

H_{out} = (H_{in} - 1) × stride[0] - 2 × padding[0] + dilation[0] × (kernel_size[0] - 1) + output_padding[0] + 1

W_{out} = (W_{in} - 1) × stride[1] - 2 × padding[1] + dilation[1] × (kernel_size[1] - 1) + output_padding[1] + 1
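As a sketch, this formula can be wrapped in a small helper (transposed_conv_out_size is my own hypothetical function, not part of the PyTorch API) and checked against the shape reported by nn.ConvTranspose2d; the parameter values are arbitrary examples:

import torch
import torch.nn as nn

def transposed_conv_out_size(in_size, stride, padding, kernel_size, output_padding=0, dilation=1):
    # Same formula as above, written for one spatial dimension
    return (in_size - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

trans_conv = nn.ConvTranspose2d(1, 1, kernel_size=4, stride=2, padding=1, bias=False)
x = torch.randn(1, 1, 8, 8)
print(trans_conv(x).shape[-1])                                          # 16
print(transposed_conv_out_size(8, stride=2, padding=1, kernel_size=4))  # 16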

4. PyTorch transposed convolution experiment

The following uses PyTorch to simulate the transposed convolution operation with s=1, p=0, k=3:

In the code, the transposed_conv_official function computes the result using the official transposed convolution, while the transposed_conv_self function manually pads the input feature map according to the steps above and then obtains the result with an ordinary convolution.

import torch
import torch.nn as nn


def transposed_conv_official():
    feature_map = torch.as_tensor([[1, 0],
                                   [2, 1]], dtype=torch.float32).reshape([1, 1, 2, 2])
    print(feature_map)
    trans_conv = nn.ConvTranspose2d(in_channels=1, out_channels=1,
                                    kernel_size=3, stride=1, bias=False)
    trans_conv.load_state_dict({"weight": torch.as_tensor([[1, 0, 1],
                                                            [0, 1, 1],
                                                            [1, 0, 0]], dtype=torch.float32).reshape([1, 1, 3, 3])})
    print(trans_conv.weight)
    output = trans_conv(feature_map)
    print(output)


def transposed_conv_self():
    # The 6x6 map below is the original 2x2 input with k-p-1 = 2 rows/columns of
    # zeros padded around it (s=1, so no zeros are inserted between elements)
    feature_map = torch.as_tensor([[0, 0, 0, 0, 0, 0],
                                   [0, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 0, 0, 0],
                                   [0, 0, 2, 1, 0, 0],
                                   [0, 0, 0, 0, 0, 0],
                                   [0, 0, 0, 0, 0, 0]], dtype=torch.float32).reshape([1, 1, 6, 6])
    print(feature_map)
    conv = nn.Conv2d(in_channels=1, out_channels=1,
                     kernel_size=3, stride=1, bias=False)
    # The kernel below is the original kernel flipped up-down and left-right (rotated 180°)
    conv.load_state_dict({"weight": torch.as_tensor([[0, 0, 1],
                                                      [1, 1, 0],
                                                      [1, 0, 1]], dtype=torch.float32).reshape([1, 1, 3, 3])})
    print(conv.weight)
    output = conv(feature_map)
    print(output)


def main():
    transposed_conv_official()
    print("---------------")
    transposed_conv_self()


if __name__ == '__main__':
    main()


Terminal output:

tensor([[[[1., 0.],
          [2., 1.]]]])
Parameter containing:
tensor([[[[1., 0., 1.],
          [0., 1., 1.],
          [1., 0., 0.]]]], requires_grad=True)
tensor([[[[1., 0., 1., 0.],
          [2., 2., 3., 1.],
          [1., 2., 3., 1.],
          [2., 1., 0., 0.]]]], grad_fn=<SlowConvTranspose2DBackward>)
---------------
tensor([[[[0., 0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0., 0.],
          [0., 0., 1., 0., 0., 0.],
          [0., 0., 2., 1., 0., 0.],
          [0., 0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0., 0.]]]])
Parameter containing:
tensor([[[[0., 0., 1.],
          [1., 1., 0.],
          [1., 0., 1.]]]], requires_grad=True)
tensor([[[[1., 0., 1., 0.],
          [2., 2., 3., 1.],
          [1., 2., 3., 1.],
          [2., 1., 0., 0.]]]], grad_fn=<ThnnConv2DBackward>)

Comparing the two outputs, the result of the official transposed convolution is identical to the result of our manual simulation. For other parameter settings you can run similar experiments yourself, as in the sketch below.
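For instance, the s=2, p=0, k=3 case shown in the second animation above can be checked in the same way. The sketch below is my own addition, reusing the same 2×2 input and kernel: insert s-1=1 row/column of zeros between elements, pad k-p-1=2 rows/columns of zeros around the border, flip the kernel, and apply an ordinary stride-1 convolution:

import torch
import torch.nn as nn
import torch.nn.functional as F

feature_map = torch.as_tensor([[1, 0],
                               [2, 1]], dtype=torch.float32).reshape([1, 1, 2, 2])
kernel = torch.as_tensor([[1, 0, 1],
                          [0, 1, 1],
                          [1, 0, 0]], dtype=torch.float32).reshape([1, 1, 3, 3])

# Official transposed convolution with stride=2
trans_conv = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=3, stride=2, bias=False)
trans_conv.load_state_dict({"weight": kernel})

# Manual simulation of the four steps
expanded = torch.zeros(1, 1, 3, 3)        # insert s-1 = 1 zero between elements
expanded[..., ::2, ::2] = feature_map
expanded = F.pad(expanded, [2, 2, 2, 2])  # pad k-p-1 = 2 zeros around the border
manual = F.conv2d(expanded, torch.flip(kernel, dims=[2, 3]), stride=1)

print(torch.allclose(trans_conv(feature_map), manual))  # expected: True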