PyTorch Deep Learning: Using Transforms in Torchvision (ToTensor, Normalize, Resize, Compose, RandomCrop)

One, the role of transforms

The torchvision.transforms module provides common image preprocessing operations, such as type conversion, normalization, resizing, and cropping.

Two, the structure of transforms

The layout of transforms.py can be inspected in PyCharm's Structure panel, which lists the classes and functions the file defines.

Three, the use of transforms

First, import the transforms module from torchvision:

from torchvision import transforms

Second, as the structure view shows, transforms is a .py file that contains many classes and methods, so its members are accessed in the form transforms.<name>. To use a class, you first instantiate it. The result is an object, and that object can then use the methods defined in the class.
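The "instantiate first, then call" pattern can be sketched in plain Python. The AddOffset class below is hypothetical and is not part of torchvision; it only illustrates that calling an instance like a function runs its __call__ method, which is how the transform objects in the rest of this article are used.

```python
class AddOffset:
    """Hypothetical transform-like class, used only to illustrate __call__."""

    def __init__(self, offset):
        # Configuration is stored once, when the object is created.
        self.offset = offset

    def __call__(self, value):
        # Runs whenever the instance is used like a function.
        return value + self.offset


add_two = AddOffset(2)   # instantiate the class
result = add_two(5)      # equivalent to add_two.__call__(5)
print(result)            # 7
```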

1. ToTensor class

class ToTensor:
    """Convert a PIL Image or ndarray to tensor and scale the values accordingly.

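ToTensor converts a PIL Image or numpy.ndarray to a tensor. For a PIL Image or a uint8 ndarray of shape H x W x C with values in [0, 255], the result is a float tensor of shape C x H x W with values in [0.0, 1.0].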

from PIL import Image
from torchvision import transforms

img_path = "hymenoptera_data/train/ants_image/20935278_9190345f6b.jpg"
# Open the image as a PIL Image
img = Image.open(img_path)
# Instantiate the ToTensor transform
to_tens = transforms.ToTensor()
# Convert the PIL Image to a tensor
tens_img = to_tens(img)  # calling the object invokes its __call__ method
print(tens_img)

Print result:

tensor([[[0.6784, 0.6863, 0.6863, ..., 0.2431, 0.2471, 0.2549],
         [0.6824, 0.6824, 0.6824, ..., 0.2078, 0.2118, 0.2157],
         [0.6745, 0.6745, 0.6784, ..., 0.1882, 0.1882, 0.1843],
         ...,
         [0.5451, 0.5373, 0.5294, ..., 0.1216, 0.1216, 0.1216],
         [0.5412, 0.5333, 0.5294, ..., 0.1294, 0.1294, 0.1333],
         [0.5333, 0.5294, 0.5294, ..., 0.1137, 0.1216, 0.1255]],

        [[0.0588, 0.0588, 0.0588, ..., 0.5176, 0.5216, 0.5294],
         [0.0549, 0.0549, 0.0588, ..., 0.4863, 0.4902, 0.4941],
         [0.0510, 0.0510, 0.0549, ..., 0.4588, 0.4588, 0.4549],
         ...,
         [0.0353, 0.0353, 0.0314, ..., 0.2902, 0.2941, 0.3059],
         [0.0314, 0.0314, 0.0314, ..., 0.2902, 0.3020, 0.3098],
         [0.0196, 0.0235, 0.0314, ..., 0.3059, 0.3137, 0.3176]],

        [[0.5961, 0.5922, 0.5765, ..., 0.3216, 0.3255, 0.3333],
         [0.5804, 0.5725, 0.5647, ..., 0.2275, 0.2314, 0.2353],
         [0.5569, 0.5529, 0.5490, ..., 0.1333, 0.1333, 0.1294],
         ...,
         [0.3725, 0.3373, 0.3020, ..., 0.0824, 0.0863, 0.0941],
         [0.3765, 0.3333, 0.3020, ..., 0.0627, 0.0706, 0.0784],
         [0.3843, 0.3451, 0.3098, ..., 0.0863, 0.0941, 0.0980]]])


What attributes does tensor type data contain?

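Image.open from PIL returns a PIL Image, whereas OpenCV's cv2.imread returns a numpy.ndarray. The example below uses OpenCV. Note that cv2.imread returns the channels in BGR order, so the channels of the resulting tensor are in BGR order as well.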

import cv2
from torchvision import transforms

img_path = "hymenoptera_data/train/ants_image/20935278_9190345f6b.jpg"
# Read the image as a numpy.ndarray (H x W x C, BGR channel order)
img = cv2.imread(img_path)
print(type(img)) # <class 'numpy.ndarray'>
# Instantiate the ToTensor transform
to_tens = transforms.ToTensor()
# Convert the ndarray to a tensor
tens_img = to_tens(img)  # calling the object invokes its __call__ method
print(tens_img)

2. Viewing the result with TensorBoard (introduced in the previous article)

To open TensorBoard, run the following in a terminal: tensorboard --logdir=<log folder name>

from PIL import Image
from torchvision import transforms
from torch.utils.tensorboard import SummaryWriter

img_path = "hymenoptera_data/train/ants_image/20935278_9190345f6b.jpg"
# Open the image as a PIL Image
img = Image.open(img_path)
# Instantiate the ToTensor transform
to_tens = transforms.ToTensor()
# Convert the PIL Image to a tensor
tens_img = to_tens(img)  # calling the object invokes its __call__ method
print(tens_img)

# View the tensor with TensorBoard, as in the previous article
writer = SummaryWriter("transforms_logs")
writer.add_image("test_transforms", tens_img)  # arguments: tag, image tensor
writer.close()

Four, other common transforms

1. Normalize

class Normalize(torch.nn.Module):
    """Normalize a tensor image with mean and standard deviation.
    This transform does not support PIL Image.
    Given mean: ``(mean[1],...,mean[n])`` and std: ``(std[1],..,std[n])`` for ``n``
    channels, this transform will normalize each channel of the input
    ``torch.*Tensor`` i.e.,
    ``output[channel] = (input[channel] - mean[channel]) / std[channel]``

    .. note::
        This transform acts out of place, i.e., it does not mutate the input tensor.

    Args:
        mean (sequence): Sequence of means for each channel.
        std (sequence): Sequence of standard deviations for each channel.
        inplace(bool, optional): Bool to make this operation in-place.

    """

As the docstring shows, two arguments are required when instantiating this class. The first, mean, is a sequence with one mean per channel, and the second, std, is a sequence with one standard deviation per channel. The number of channels can be checked with print(img), which shows the image mode. The demo image below has mode RGBA, that is, four channels, so mean and std must each contain four values. Each channel is then normalized separately with its own mean and std. If mean and std are the actual statistics of the data, the result has mean 0 and variance 1 on each channel.
Normalization formula: output[channel] = (input[channel] - mean[channel]) / std[channel]
For example, if mean and std are both 0.5, substituting into the formula gives:
output[channel] = (input[channel] - 0.5) / 0.5
That is: output[channel] = 2 * input[channel] - 1, which maps the [0, 1] range produced by ToTensor to [-1, 1].

from torchvision import transforms
from torch.utils.tensorboard import SummaryWriter
from PIL import Image

writer = SummaryWriter("logs_2") # Define the folder where the log file is stored
img = Image.open("image/python_img.png") # open image
print(img) # <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1672x932 at 0x22BC33B51D0>
# 1. ToTensor
trans_tens = transforms.ToTensor()
img_tens = trans_tens(img) # Convert img image to tensor type
writer.add_image("ToTensor",img_tens)
# 2. Normalize
# Instantiate the Normalize transform
trans_norm = transforms.Normalize([0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5])  # the image mode is RGBA (four channels), so four values are required
# Normalize the tensor image
img_norm = trans_norm(img_tens)
# Write the normalized image to TensorBoard
writer.add_image("norm", img_norm)  # arguments: tag, image tensor, optional global step
writer.close()

2. Resize

Resize scales the input image to the size given by its argument, for example (h, w).

class Resize(torch.nn.Module):
    """Resize the input image to the given size.
    If the image is torch Tensor, it is expected
    to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions

    .. warning::
        The output image might be different depending on its type: when downsampling, the interpolation of PIL images
        and tensors is slightly different, because PIL applies antialiasing. This may lead to significant differences
        in the performance of a network. Therefore, it is preferable to train and serve a model with the same input
        types. See also below the ``antialias`` parameter, which can help making the output of PIL images and tensors
        closer.

The Args section of the same docstring reads:

    Args:
        size (sequence or int): Desired output size. If size is a sequence like
            (h, w), output size will be matched to this. If size is an int,
            The smaller edge of the image will be matched to this number.
            i.e., if height > width, then image will be rescaled to
            (size * height / width, size).

It can be seen that if size is a sequence of two integers (h, w), the image is resized to exactly that height and width, which may change the aspect ratio.
If size is a single integer x, the shorter side of the image is scaled to x and the aspect ratio is preserved.

from torchvision import transforms
from torch.utils.tensorboard import SummaryWriter
from PIL import Image

writer = SummaryWriter("logs_2") # Define the folder where the log file is stored
img = Image.open("image/python_img.png") # open image
print(img) # <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1672x932 at 0x22BC33B51D0>
# 1. ToTensor
trans_tens = transforms.ToTensor()
img_tens = trans_tens(img) # Convert img image to tensor type
writer.add_image("ToTensor",img_tens)
# 2. Normalize
# Instantiate the Normalize transform
trans_norm = transforms.Normalize([0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5])  # the image mode is RGBA (four channels), so four values are required
# Normalize the tensor image
img_norm = trans_norm(img_tens)
# Write the normalized image to TensorBoard
writer.add_image("norm", img_norm)  # arguments: tag, image tensor, optional global step
# 3. Resize: scale the input image to the given (h, w)
trans_resize = transforms.Resize((512, 512))
# Resize the PIL image; the result is still a PIL image
img_resize = trans_resize(img)
print(img_resize) # <PIL.Image.Image image mode=RGBA size=512x512 at 0x1ABF834B850>
# Convert the resized PIL image to a tensor
img_resize_tens = trans_tens(img_resize)
# Write to TensorBoard
writer.add_image("resize", img_resize_tens, 0)
writer.close()

3. Compose

Compose chains several transforms into one, so that multiple operations are applied to the input image in a single call.
Note: the transforms in Compose run in order, and the output of each one is the input of the next. If the first transform outputs a PIL image and the second accepts a PIL image, they can be composed directly. But if the first outputs a tensor while the second requires a PIL image, the types do not match and an error is raised.

from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms
from PIL import Image

writer = SummaryWriter("logs_2") # Define the folder where the log file is stored
img = Image.open("image/python_img.png") # open image
# ToTensor converts the PIL image to a tensor
trans_tens = transforms.ToTensor()
print(trans_tens)
# Resize(300) scales the shorter side to 300 and keeps the aspect ratio
trans_resize = transforms.Resize(300)
print(trans_resize)
# Compose ToTensor and Resize: Resize receives the tensor that ToTensor outputs
trans_compose = transforms.Compose([trans_tens, trans_resize])
# Apply the pipeline and visualize the result in TensorBoard
compose_resize = trans_compose(img)
writer.add_image("compose", compose_resize, 1)
writer.close()

4. RandomCrop

Random cropping: crop the image at a random location to the given size. If size is a single int, the crop is a square of size x size.

class RandomCrop(torch.nn.Module):
    """Crop the given image at a random location.
    If the image is torch Tensor, it is expected
    to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions,
    but if non-constant padding is used, the input is expected to have at most 2 leading dimensions

    Args:
        size (sequence or int): Desired output size of the crop. If size is an
            int instead of sequence like (h, w), a square crop (size, size) is
            made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
        padding (int or sequence, optional): Optional padding on each border
            of the image. Default is None. If a single int is provided this
            is used to pad all borders. If sequence of length 2 is provided this is the padding
            on left/right and top/bottom respectively. If a sequence of length 4 is provided
            This is the padding for the left, top, right and bottom borders respectively.

from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms
from PIL import Image

writer = SummaryWriter("logs_2") # Define the folder where the log file is stored
img = Image.open("image/python_img.png") # open image
# ToTensor converts the PIL image to a tensor
trans_tens = transforms.ToTensor()
print(trans_tens)

# RandomCrop: crop a 512 x 512 region at a random location
trans_rand = transforms.RandomCrop(512)
# Compose RandomCrop and ToTensor
trans_compose = transforms.Compose([trans_rand, trans_tens])
# Take 10 random crops and display them in TensorBoard
for i in range(10):
    img_trans_rand = trans_compose(img)
    print(img_trans_rand)
    writer.add_image("rands", img_trans_rand, i)
writer.close()