Preprocessing of YOLOv5 classification model (2) ToTensor and Normalize
flyfish
1. The initial data is floating point
import torch
import numpy as np
from torchvision import transforms

# Per-channel mean and standard deviation
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

# Random float64 data laid out as H x W x C, rounded to 4 decimals for readability
data0 = np.random.random((4, 5, 3))
data0 = np.round(data0, 4)
print(data0.shape)
print(data0)

data1 = transforms.ToTensor()(data0)
print(data1.shape)  # C x H x W
print(data1)

data2 = transforms.Normalize(mean, std)(data1)
print(data2)
ToTensor changes the data layout from H x W x C to C x H x W. If the input is floating point, the values themselves are left unchanged.
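The layout change can be checked on its own, without torchvision. The following is a minimal NumPy sketch, not the torchvision implementation; the names `hwc` and `chw` are made up for illustration. It shows that the element at index (h, w, c) ends up at index (c, h, w) with its value untouched.

```python
import numpy as np

# Hypothetical 4 x 5 x 3 "image" whose values are just running indices
hwc = np.arange(4 * 5 * 3, dtype=np.float64).reshape(4, 5, 3)  # H x W x C

# Move the channel axis to the front: H x W x C -> C x H x W
chw = np.transpose(hwc, (2, 0, 1))

print(hwc.shape)  # (4, 5, 3)
print(chw.shape)  # (3, 4, 5)

# Every element keeps its value; only the index order changes
h, w, c = 2, 3, 1
print(hwc[h, w, c] == chw[c, h, w])  # True
```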
Normalize computes (data - mean) / std, using the mean and std of each channel.
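As a worked example of the formula, the sketch below normalizes a single pixel by hand. The pixel values are the first pixel of the sample data printed later in this section; the variable names `pixel` and `normalized` are only illustrative.

```python
# Per-channel statistics used throughout this article
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

# First pixel of the sample data: one value per channel
pixel = (0.8284, 0.3419, 0.6621)

# Normalize each channel independently: (value - mean) / std
normalized = [(v - m) / s for v, m, s in zip(pixel, mean, std)]
print([round(x, 4) for x in normalized])  # [1.4996, -0.5094, 1.1382]
```

These three numbers are the first entries of the three channels in the Normalize output shown below.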
Verify the result with NumPy
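# Continues from the code above (reuses data0, mean and std)
data1 = np.transpose(data0, (2, 0, 1))  # H x W x C -> C x H x W
print(data1.shape)

# Reshape to (3, 1, 1) so that mean and std broadcast over H and W
_std = np.array(std).reshape((3, 1, 1))
_mean = np.array(mean).reshape((3, 1, 1))
data2 = (data1 - _mean) / _std
print(data2)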
Shape and content of the original data (the three axes can be read as the height, width, and channels of an image)
(4, 5, 3)
[[[0.8284 0.3419 0.6621] [0.59 0.2306 0.4112] [0.0636 0.406 0.2778] [0.9551 0.2097 0.7681] [0.3097 0.642 0.1968]]
 [[0.722 0.9844 0.4942] [0.1847 0.2435 0.3691] [0.658 0.5643 0.9468] [0.4002 0.7807 0.4393] [0.2461 0.9049 0.0585]]
 [[0.2606 0.067 0.6186] [0.284 0.8524 0.2102] [0.0447 0.0209 0.1313] [0.0587 0.594 0.1016] [0.6942 0.4514 0.7125]]
 [[0.8787 0.7917 0.1181] [0.9044 0.7948 0.3599] [0.1706 0.7463 0.899] [0.0758 0.2224 0.5447] [0.3336 0.6096 0.3065]]]
Shape and content after ToTensor
torch.Size([3, 4, 5])
tensor([[[0.8284, 0.5900, 0.0636, 0.9551, 0.3097], [0.7220, 0.1847, 0.6580, 0.4002, 0.2461], [0.2606, 0.2840, 0.0447, 0.0587, 0.6942], [0.8787, 0.9044, 0.1706, 0.0758, 0.3336]],
 [[0.3419, 0.2306, 0.4060, 0.2097, 0.6420], [0.9844, 0.2435, 0.5643, 0.7807, 0.9049], [0.0670, 0.8524, 0.0209, 0.5940, 0.4514], [0.7917, 0.7948, 0.7463, 0.2224, 0.6096]],
 [[0.6621, 0.4112, 0.2778, 0.7681, 0.1968], [0.4942, 0.3691, 0.9468, 0.4393, 0.0585], [0.6186, 0.2102, 0.1313, 0.1016, 0.7125], [0.1181, 0.3599, 0.8990, 0.5447, 0.3065]]], dtype=torch.float64)
Content after Normalize (the shape is unchanged)
tensor([[[1.4996, 0.4585, -1.8402, 2.0528, -0.7655], [1.0349, -1.3114, 0.7555, -0.3703, -1.0432], [-0.9799, -0.8777, -1.9227, -1.8616, 0.9135], [1.7192, 1.8314, -1.3729, -1.7869, -0.6611]],
 [[-0.5094, -1.0063, -0.2232, -1.0996, 0.8304], [2.3589, -0.9487, 0.4835, 1.4496, 2.0040], [-1.7366, 1.7696, -1.9424, 0.6161, -0.0205], [1.4987, 1.5125, 1.2960, -1.0429, 0.6857]],
 [[1.1382, 0.0231, -0.5698, 1.6093, -0.9298], [0.3920, -0.1640, 2.4036, 0.1480, -1.5444], [0.9449, -0.8702, -1.2209, -1.3529, 1.3622], [-1.2796, -0.2049, 2.1911, 0.6164, -0.4422]]], dtype=torch.float64)
Result of the NumPy verification
(3, 4, 5)
[[[1.49956332 0.45851528 -1.84017467 2.05283843 -0.76550218] [1.0349345 -1.31135371 0.75545852 -0.37030568 -1.04323144] [-0.97991266 -0.87772926 -1.92270742 -1.86157205 0.91353712] [1.71921397 1.83144105 -1.37292576 -1.78689956 -0.66113537]]
 [[-0.509375 -1.00625 -0.22321429 -1.09955357 0.83035714] [2.35892857 -0.94866071 0.48348214 1.44955357 2.00401786] [-1.73660714 1.76964286 -1.94241071 0.61607143 -0.02053571] [1.49866071 1.5125 1.29598214 -1.04285714 0.68571429]]
 [[1.13822222 0.02311111 -0.56977778 1.60933333 -0.92977778] [0.392 -0.164 2.40355556 0.148 -1.54444444] [0.94488889 -0.87022222 -1.22088889 -1.35288889 1.36222222] [-1.27955556 -0.20488889 2.19111111 0.61644444 -0.44222222]]]
The two results are the same; they differ only in the number of decimal places printed.
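Rather than comparing the printouts by eye, the agreement can be checked with a tolerance. The sketch below is only an illustration: it copies the first row of channel 0 from the two printed results above into arrays named `from_torch` and `from_numpy` (both names are made up here) and compares them to within half a unit of the fourth decimal place.

```python
import numpy as np

# First row of channel 0, copied from the two printed results
from_torch = np.array([1.4996, 0.4585, -1.8402, 2.0528, -0.7655])  # printed with 4 decimals
from_numpy = np.array([1.49956332, 0.45851528, -1.84017467, 2.05283843, -0.76550218])

# Exact equality fails because the printed precision differs
print(np.array_equal(from_torch, from_numpy))  # False

# The values agree once rounding to 4 decimals is allowed for
print(np.allclose(from_torch, from_numpy, rtol=0.0, atol=5e-5))  # True
```

In a live session the tensor could be converted with `data2.numpy()` and compared directly, in which case a much tighter tolerance should be enough.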
2. The initial data is image data
The data in the example above is floating point, whereas real input is an image, that is, a numpy.ndarray (H x W x C) of unsigned 8-bit integers in the range [0, 255].
The next example processes image-like data.
import torch
import numpy as np
from torchvision import transforms

mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

# np.random.randint excludes the upper bound, so 256 is needed to cover [0, 255]
data0 = np.random.randint(0, 256, size=[4, 5, 3], dtype=np.uint8)
print(data0.shape)
print(data0)

data1 = transforms.ToTensor()(data0)
print(data1.shape)  # C x H x W
print(data1)

data2 = transforms.Normalize(mean, std)(data1)
print(data2)

# Verify with NumPy: scale to [0.0, 1.0] first, then transpose and normalize
data1 = data0 / 255
data1 = np.transpose(data1, (2, 0, 1))
print(data1.shape)
_std = np.array(std).reshape((3, 1, 1))
_mean = np.array(mean).reshape((3, 1, 1))
data2 = (data1 - _mean) / _std
print(data2)
Result
(4, 5, 3)
[[[234 149 252] [210 229 46] [146 237 43] [42 103 219] [221 104 73]]
 [[6 18 197] [81 67 235] [70 170 110] [242 38 157] [201 204 98]]
 [[28 130 183] [82 83 234] [138 80 97] [14 119 183] [100 158 13]]
 [[122 17 245] [40 62 203] [250 165 40] [219 131 107] [126 214 139]]]
torch.Size([3, 4, 5])
tensor([[[0.9176, 0.8235, 0.5725, 0.1647, 0.8667], [0.0235, 0.3176, 0.2745, 0.9490, 0.7882], [0.1098, 0.3216, 0.5412, 0.0549, 0.3922], [0.4784, 0.1569, 0.9804, 0.8588, 0.4941]],
 [[0.5843, 0.8980, 0.9294, 0.4039, 0.4078], [0.0706, 0.2627, 0.6667, 0.1490, 0.8000], [0.5098, 0.3255, 0.3137, 0.4667, 0.6196], [0.0667, 0.2431, 0.6471, 0.5137, 0.8392]],
 [[0.9882, 0.1804, 0.1686, 0.8588, 0.2863], [0.7725, 0.9216, 0.4314, 0.6157, 0.3843], [0.7176, 0.9176, 0.3804, 0.7176, 0.0510], [0.9608, 0.7961, 0.1569, 0.4196, 0.5451]]])
tensor([[[1.8893, 1.4783, 0.3823, -1.3987, 1.6667], [-2.0152, -0.7308, -0.9192, 2.0263, 1.3242], [-1.6384, -0.7137, 0.2453, -1.8782, -0.4054], [-0.0287, -1.4329, 2.1633, 1.6324, 0.0398]],
 [[0.5728, 1.9734, 2.1134, -0.2325, -0.2150], [-1.7206, -0.8627, 0.9405, -1.3704, 1.5357], [0.2402, -0.5826, -0.6352, 0.0476, 0.7304], [-1.7381, -0.9503, 0.8529, 0.2577, 1.7108]],
 [[2.5877, -1.0027, -1.0550, 2.0125, -0.5321], [1.6291, 2.2914, 0.1128, 0.9319, -0.0964], [1.3851, 2.2740, -0.1138, 1.3851, -1.5779], [2.4657, 1.7337, -1.1073, 0.0605, 0.6182]]])
(3, 4, 5)
[[[1.88928847 1.47829437 0.38231013 -1.39866427 1.66666667] [-2.01515541 -0.73079887 -0.91917116 2.0262865 1.32417159] [-1.63841082 -0.71367412 0.2453121 -1.87815738 -0.40542855] [-0.02868396 -1.43291378 2.16328453 1.63241716 0.03981505]]
 [[0.57282913 1.97338936 2.11344538 -0.232493 -0.21498599] [-1.72058824 -0.8627451 0.94047619 -1.37044818 1.53571429] [0.24019608 -0.58263305 -0.63515406 0.04761905 0.73039216] [-1.73809524 -0.95028011 0.85294118 0.25770308 1.71078431]]
 [[2.58771242 -1.00270153 -1.05498911 2.01254902 -0.53211329] [1.62910675 2.29141612 0.11276688 0.931939 -0.09638344] [1.38509804 2.27398693 -0.11381264 1.38509804 -1.57786492] [2.46570806 1.73368192 -1.10727669 0.0604793 0.61821351]]]
The two results are the same.
Explanation
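np.random.randint(low, high) returns integers from the half-open range [low, high), so high itself is never returned. An upper bound of 256 is therefore needed for the values to span the full uint8 range [0, 255]; with an upper bound of 255 the largest possible value is 254.
This differs from random.randint(start, stop) in Python's standard library, which includes both ends and is equivalent to randrange(start, stop + 1).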
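The difference between the two functions is easy to confirm. The following short sketch uses only calls from NumPy and the standard library; the variable name `samples` is illustrative.

```python
import random
import numpy as np

# NumPy: the upper bound is exclusive, so [255, 256) contains only 255
print(np.random.randint(255, 256))  # 255

# Standard library: both bounds are inclusive, so [255, 255] is a valid range
print(random.randint(255, 255))  # 255

# With an exclusive upper bound of 255, NumPy never produces 255
samples = np.random.randint(0, 255, size=100_000)
print(samples.max() <= 254)  # True

# The same arguments describe an empty range for NumPy, which rejects them
try:
    np.random.randint(255, 255)
except ValueError as err:
    print("ValueError:", err)
```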
When the image data is a numpy.ndarray (H x W x C) of unsigned 8-bit integers (dtype np.uint8) in the range [0, 255], ToTensor changes the layout from H x W x C to C x H x W and also scales the values to [0.0, 1.0].
The NumPy verification therefore has to divide by 255 first, so that its data is on the same [0.0, 1.0] scale.
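The two cases can be put together in one place. The following is a rough NumPy sketch of the behaviour described in this article, not the torchvision source; `to_tensor_like` and `normalize_like` are hypothetical helper names, and the sketch returns NumPy arrays rather than tensors. It is checked against the first pixel of the uint8 example above.

```python
import numpy as np

def to_tensor_like(img):
    """NumPy stand-in for ToTensor on an H x W x C ndarray (illustrative only)."""
    chw = np.transpose(img, (2, 0, 1))  # H x W x C -> C x H x W
    if img.dtype == np.uint8:
        # uint8 images are scaled from [0, 255] to [0.0, 1.0]
        return chw.astype(np.float32) / 255.0
    # Other dtypes, such as float64, keep their values
    return chw

def normalize_like(chw, mean, std):
    """NumPy stand-in for Normalize: per-channel (x - mean) / std."""
    mean = np.asarray(mean, dtype=chw.dtype).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=chw.dtype).reshape(-1, 1, 1)
    return (chw - mean) / std

mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

# First pixel of the uint8 example, as a 1 x 1 x 3 image
img = np.array([[[234, 149, 252]]], dtype=np.uint8)

scaled = to_tensor_like(img)
print(scaled.ravel())  # approximately 0.9176, 0.5843, 0.9882

out = normalize_like(scaled, mean, std)
print(out.ravel())     # approximately 1.8893, 0.5728, 2.5877
```

Passing a float64 array to `to_tensor_like` only transposes it, which mirrors the first section, where the floating-point values came through ToTensor unchanged.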