1. The role of transforms
transforms is mainly used to transform images; it provides the common image preprocessing operations applied before feeding images to a model.
2. The structure of transforms
You can view the structure of transforms.py through the Structure panel in PyCharm.
3. Using transforms
First, import the module where transforms is located:
```python
from torchvision import transforms
```
As the structure view above shows, transforms is a .py file containing many classes and methods, so they are accessed with dot notation. To use a class, first instantiate it; the resulting object can then call the methods of that class.
1. ToTensor class
```python
class ToTensor:
    """Convert a PIL Image or ndarray to tensor and scale the values accordingly."""
```

That is, it converts a PIL Image or an ndarray to a tensor.
```python
from PIL import Image
from torchvision import transforms

img_path = "hymenoptera_data/train/ants_image/20935278_9190345f6b.jpg"
# Open an image
img = Image.open(img_path)

# Use transforms to transform the image
# Instantiate a ToTensor object
to_tens = transforms.ToTensor()
# Convert the PIL image to a tensor; calling the object invokes __call__
tens_img = to_tens(img)
print(tens_img)
```
Print result:
```
tensor([[[0.6784, 0.6863, 0.6863,  ..., 0.2431, 0.2471, 0.2549],
         [0.6824, 0.6824, 0.6824,  ..., 0.2078, 0.2118, 0.2157],
         [0.6745, 0.6745, 0.6784,  ..., 0.1882, 0.1882, 0.1843],
         ...,
         [0.5451, 0.5373, 0.5294,  ..., 0.1216, 0.1216, 0.1216],
         [0.5412, 0.5333, 0.5294,  ..., 0.1294, 0.1294, 0.1333],
         [0.5333, 0.5294, 0.5294,  ..., 0.1137, 0.1216, 0.1255]],

        [[0.0588, 0.0588, 0.0588,  ..., 0.5176, 0.5216, 0.5294],
         [0.0549, 0.0549, 0.0588,  ..., 0.4863, 0.4902, 0.4941],
         [0.0510, 0.0510, 0.0549,  ..., 0.4588, 0.4588, 0.4549],
         ...,
         [0.0353, 0.0353, 0.0314,  ..., 0.2902, 0.2941, 0.3059],
         [0.0314, 0.0314, 0.0314,  ..., 0.2902, 0.3020, 0.3098],
         [0.0196, 0.0235, 0.0314,  ..., 0.3059, 0.3137, 0.3176]],

        [[0.5961, 0.5922, 0.5765,  ..., 0.3216, 0.3255, 0.3333],
         [0.5804, 0.5725, 0.5647,  ..., 0.2275, 0.2314, 0.2353],
         [0.5569, 0.5529, 0.5490,  ..., 0.1333, 0.1333, 0.1294],
         ...,
         [0.3725, 0.3373, 0.3020,  ..., 0.0824, 0.0863, 0.0941],
         [0.3765, 0.3333, 0.3020,  ..., 0.0627, 0.0706, 0.0784],
         [0.3843, 0.3451, 0.3098,  ..., 0.0863, 0.0941, 0.0980]]])

Process finished with exit code 0
```
What attributes does a tensor contain?
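As a quick illustration, here are a few commonly inspected tensor attributes. This is a generic sketch: a random tensor stands in for the converted image, so the values differ from the example above.

```python
import torch

# A random 3-channel 2x2 tensor, standing in for the output of ToTensor
tens_img = torch.rand(3, 2, 2)

print(tens_img.shape)   # torch.Size([3, 2, 2]) -- channels, height, width
print(tens_img.dtype)   # torch.float32 -- ToTensor outputs floats scaled to [0, 1]
print(tens_img.device)  # cpu -- the device the tensor is stored on
```

These attributes are what distinguish a tensor from a plain ndarray: a tensor also carries a device and can track gradients.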
The Image class reads images as the PIL Image type, while OpenCV reads them as the ndarray type; the example below uses OpenCV.
```python
import cv2
from torchvision import transforms

img_path = "hymenoptera_data/train/ants_image/20935278_9190345f6b.jpg"
# Open an image; cv2.imread returns an ndarray
img = cv2.imread(img_path)
print(type(img))  # <class 'numpy.ndarray'>

# Use transforms to transform the image
# Instantiate a ToTensor object
to_tens = transforms.ToTensor()
# Convert the numpy ndarray to a tensor
tens_img = to_tens(img)
print(tens_img)
```
2. Use tensorboard from the previous article to view the result
To open TensorBoard, run in the terminal: `tensorboard --logdir=<log folder name>`
```python
from PIL import Image
from torchvision import transforms
from torch.utils.tensorboard import SummaryWriter

img_path = "hymenoptera_data/train/ants_image/20935278_9190345f6b.jpg"
# Open an image
img = Image.open(img_path)

# Instantiate a ToTensor object and convert the PIL image to a tensor
to_tens = transforms.ToTensor()
tens_img = to_tens(img)
print(tens_img)

# Use tensorboard from the previous article to view the result
writer = SummaryWriter("transforms_logs")
writer.add_image("test_transforms", tens_img)  # title, image
writer.close()
```
4. Other common transforms
1. Normalize
```python
class Normalize(torch.nn.Module):
    """Normalize a tensor image with mean and standard deviation.
    This transform does not support PIL Image.
    Given mean: ``(mean[1],...,mean[n])`` and std: ``(std[1],..,std[n])`` for ``n``
    channels, this transform will normalize each channel of the input
    ``torch.*Tensor`` i.e.,
    ``output[channel] = (input[channel] - mean[channel]) / std[channel]``

    .. note::
        This transform acts out of place, i.e., it does not mutate the input tensor.

    Args:
        mean (sequence): Sequence of means for each channel.
        std (sequence): Sequence of standard deviations for each channel.
        inplace(bool, optional): Bool to make this operation in-place.
    """
```
Instantiating this class requires two parameters: mean, the per-channel mean, and std, the per-channel standard deviation. The number of channels can be checked with print(img), which shows the image mode. The image in the demo below has mode RGBA, i.e. four channels, so mean and std each need four values. Normalize standardizes each channel separately with these values (aiming for data with mean 0 and variance 1) and outputs the result.
Normalization formula: output[channel] = (input[channel] - mean[channel]) / std[channel]

If mean and std are both 0.5, substituting into the formula gives:

output[channel] = (input[channel] - 0.5) / 0.5

that is: output[channel] = 2 * input[channel] - 1.
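The arithmetic above can be checked with a few sample values; this is a standalone sketch, independent of the image code:

```python
import torch

# Apply Normalize's formula by hand with mean=0.5, std=0.5
# on one channel of sample values in [0, 1]
t = torch.tensor([0.0, 0.25, 0.5, 1.0])
out = (t - 0.5) / 0.5

print(out)  # same values as 2*t - 1: -1.0, -0.5, 0.0, 1.0
```

Values at 0 map to -1 and values at 1 map to +1, so the [0, 1] range produced by ToTensor is rescaled to [-1, 1].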
```python
from torchvision import transforms
from torch.utils.tensorboard import SummaryWriter
from PIL import Image

writer = SummaryWriter("logs_2")  # folder where the log files are stored
img = Image.open("image/python_img.png")
print(img)  # <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1672x932 at 0x22BC33B51D0>

# 1. ToTensor
trans_tens = transforms.ToTensor()
img_tens = trans_tens(img)  # convert the image to a tensor
writer.add_image("ToTensor", img_tens)

# 2. Normalize
# The image mode is RGBA (four channels), so mean and std each need four values
trans_norm = transforms.Normalize([0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5])
img_norm = trans_norm(img_tens)  # normalize the tensor image

# Visualize in tensorboard; parameters: title, image, step
writer.add_image("norm", img_norm)
writer.close()
```
2. Resize
Resizes the input image to the given size (h, w).
```python
class Resize(torch.nn.Module):
    """Resize the input image to the given size.
    If the image is torch Tensor, it is expected to have [..., H, W] shape,
    where ... means an arbitrary number of leading dimensions

    .. warning::
        The output image might be different depending on its type: when
        downsampling, the interpolation of PIL images and tensors is slightly
        different, because PIL applies antialiasing. This may lead to significant
        differences in the performance of a network. Therefore, it is preferable
        to train and serve a model with the same input types. See also below the
        ``antialias`` parameter, which can help making the output of PIL images
        and tensors closer.

    Args:
        size (sequence or int): Desired output size. If size is a sequence like
            (h, w), output size will be matched to this. If size is an int,
            the smaller edge of the image will be matched to this number.
            i.e., if height > width, then image will be rescaled to
            (size * height / width, size).
    """
```
As the docstring shows, if the argument is a sequence (h, w), the image is resized to exactly that height and width. If the argument is a single integer x, the shorter edge is scaled to x and the aspect ratio is preserved.
```python
from torchvision import transforms
from torch.utils.tensorboard import SummaryWriter
from PIL import Image

writer = SummaryWriter("logs_2")  # folder where the log files are stored
img = Image.open("image/python_img.png")
print(img)  # <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1672x932 at 0x22BC33B51D0>

# 1. ToTensor
trans_tens = transforms.ToTensor()
img_tens = trans_tens(img)  # convert the image to a tensor
writer.add_image("ToTensor", img_tens)

# 2. Normalize
# The image mode is RGBA (four channels), so mean and std each need four values
trans_norm = transforms.Normalize([0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5])
img_norm = trans_norm(img_tens)  # normalize the tensor image
writer.add_image("norm", img_norm)  # parameters: title, image, step

# 3. Resize: resize the input image to the given (h, w)
trans_resize = transforms.Resize((512, 512))
# Resize the image; a PIL image in gives a PIL image back
img_resize = trans_resize(img)
print(img_resize)  # <PIL.Image.Image image mode=RGBA size=512x512 at 0x1ABF834B850>
# Convert the PIL image to a tensor
img_resize_tens = trans_tens(img_resize)
# Write to tensorboard
writer.add_image("resize", img_resize_tens, 0)
writer.close()
```
3. Compose
Compose can be understood as a combination of multiple transform operations: the input image is passed through several transforms in one call.
Note: inside Compose, each transform's output becomes the next transform's input. For example, if the first transform outputs a PIL image and the second expects a PIL image, the chain works; but if the first outputs a tensor while the second requires PIL input, the types do not match and an error is raised.
```python
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms
from PIL import Image

writer = SummaryWriter("logs_2")  # folder where the log files are stored
img = Image.open("image/python_img.png")

# Converts the image to a tensor
trans_tens = transforms.ToTensor()
print(trans_tens)
# Resizes the image; the shorter edge is scaled to 300
trans_resize = transforms.Resize(300)
print(trans_resize)

# Compose: combine ToTensor and Resize
trans_compose = transforms.Compose([trans_tens, trans_resize])
# Visualize in tensorboard
compose_resize = trans_compose(img)
writer.add_image("compose", compose_resize, 1)
writer.close()
```
4. RandomCrop
Random cropping: crops the image at a random location. If a single number is given, a square crop of that size is made.
```python
class RandomCrop(torch.nn.Module):
    """Crop the given image at a random location.
    If the image is torch Tensor, it is expected to have [..., H, W] shape,
    where ... means an arbitrary number of leading dimensions, but if
    non-constant padding is used, the input is expected to have at most
    2 leading dimensions

    Args:
        size (sequence or int): Desired output size of the crop. If size is an
            int instead of sequence like (h, w), a square crop (size, size) is
            made. If provided a sequence of length 1, it will be interpreted as
            (size[0], size[0]).
        padding (int or sequence, optional): Optional padding on each border of
            the image. Default is None. If a single int is provided this is
            used to pad all borders. If sequence of length 2 is provided this
            is the padding on left/right and top/bottom respectively. If a
            sequence of length 4 is provided this is the padding for the left,
            top, right and bottom borders respectively.
    """
```
```python
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms
from PIL import Image

writer = SummaryWriter("logs_2")  # folder where the log files are stored
img = Image.open("image/python_img.png")

# Converts the image to a tensor
trans_tens = transforms.ToTensor()
print(trans_tens)
# RandomCrop: crop to 512x512 at a random location
trans_rand = transforms.RandomCrop(512)
# Compose: combine RandomCrop and ToTensor
trans_compose = transforms.Compose([trans_rand, trans_tens])

# Crop 10 random patches and visualize them in tensorboard
for i in range(10):
    img_trans_rand = trans_compose(img)
    print(img_trans_rand)
    writer.add_image("rands", img_trans_rand, i)
writer.close()
```