Preprocessing of YOLOv5 classification model (2) ToTensor and Normalize

flyfish

1. The initial data is a floating point number

import torch
import numpy as np
from torchvision import transforms

mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)



data0 = np.random.random((4, 5, 3)) # H x W x C
data0 = np.round(data0,4)
print(data0.shape)
print(data0)

data1 = transforms.ToTensor()(data0)
print(data1.shape) # C x H x W
print(data1)
data2 = transforms.Normalize(mean, std)(data1)
print(data2)

ToTensor changes the data layout from H x W x C to C x H x W. When the input is a floating-point array, the values themselves are left unchanged.
Normalize computes (data - mean) / std for each channel, using that channel's mean and std.
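As a quick check of the formula, the following minimal sketch normalizes a single value by hand. The value 0.8284 is the first red-channel element of the sample data printed further down; the variable names are illustrative only.

```python
# Normalize one value by hand: (value - mean) / std for its channel.
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

value = 0.8284  # first red-channel value of the sample data
channel = 0     # 0 = R, 1 = G, 2 = B

normalized = (value - mean[channel]) / std[channel]
print(round(normalized, 4))  # 1.4996
```

This matches the first element of the Normalize output for the same data.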

Verify the result with numpy

data1 = np.transpose(data0, (2, 0, 1))
print(data1.shape)
_std = np.array(std).reshape((3, 1, 1))
_mean = np.array(mean).reshape((3, 1, 1))

data2 = (data1 - _mean) / _std

print(data2)

Shape and content of the original data (the three dimensions correspond to the height, width and channels of an image)
(4, 5, 3)

[[[0.8284 0.3419 0.6621]
  [0.59 0.2306 0.4112]
  [0.0636 0.406 0.2778]
  [0.9551 0.2097 0.7681]
  [0.3097 0.642 0.1968]]

 [[0.722 0.9844 0.4942]
  [0.1847 0.2435 0.3691]
  [0.658 0.5643 0.9468]
  [0.4002 0.7807 0.4393]
  [0.2461 0.9049 0.0585]]

 [[0.2606 0.067 0.6186]
  [0.284 0.8524 0.2102]
  [0.0447 0.0209 0.1313]
  [0.0587 0.594 0.1016]
  [0.6942 0.4514 0.7125]]

 [[0.8787 0.7917 0.1181]
  [0.9044 0.7948 0.3599]
  [0.1706 0.7463 0.899]
  [0.0758 0.2224 0.5447]
  [0.3336 0.6096 0.3065]]]

Shape and content after ToTensor

torch.Size([3, 4, 5])

tensor([[[0.8284, 0.5900, 0.0636, 0.9551, 0.3097],
         [0.7220, 0.1847, 0.6580, 0.4002, 0.2461],
         [0.2606, 0.2840, 0.0447, 0.0587, 0.6942],
         [0.8787, 0.9044, 0.1706, 0.0758, 0.3336]],

        [[0.3419, 0.2306, 0.4060, 0.2097, 0.6420],
         [0.9844, 0.2435, 0.5643, 0.7807, 0.9049],
         [0.0670, 0.8524, 0.0209, 0.5940, 0.4514],
         [0.7917, 0.7948, 0.7463, 0.2224, 0.6096]],

        [[0.6621, 0.4112, 0.2778, 0.7681, 0.1968],
         [0.4942, 0.3691, 0.9468, 0.4393, 0.0585],
         [0.6186, 0.2102, 0.1313, 0.1016, 0.7125],
         [0.1181, 0.3599, 0.8990, 0.5447, 0.3065]]], dtype=torch.float64)

Shape and content after Normalize

tensor([[[ 1.4996, 0.4585, -1.8402, 2.0528, -0.7655],
         [1.0349, -1.3114, 0.7555, -0.3703, -1.0432],
         [-0.9799, -0.8777, -1.9227, -1.8616, 0.9135],
         [1.7192, 1.8314, -1.3729, -1.7869, -0.6611]],

        [[-0.5094, -1.0063, -0.2232, -1.0996, 0.8304],
         [2.3589, -0.9487, 0.4835, 1.4496, 2.0040],
         [-1.7366, 1.7696, -1.9424, 0.6161, -0.0205],
         [1.4987, 1.5125, 1.2960, -1.0429, 0.6857]],

        [[ 1.1382, 0.0231, -0.5698, 1.6093, -0.9298],
         [0.3920, -0.1640, 2.4036, 0.1480, -1.5444],
         [0.9449, -0.8702, -1.2209, -1.3529, 1.3622],
         [-1.2796, -0.2049, 2.1911, 0.6164, -0.4422]]], dtype=torch.float64)

Result of the numpy verification

(3, 4, 5)
[[[ 1.49956332 0.45851528 -1.84017467 2.05283843 -0.76550218]
  [1.0349345 -1.31135371 0.75545852 -0.37030568 -1.04323144]
  [-0.97991266 -0.87772926 -1.92270742 -1.86157205 0.91353712]
  [1.71921397 1.83144105 -1.37292576 -1.78689956 -0.66113537]]

 [[-0.509375 -1.00625 -0.22321429 -1.09955357 0.83035714]
  [2.35892857 -0.94866071 0.48348214 1.44955357 2.00401786]
  [-1.73660714 1.76964286 -1.94241071 0.61607143 -0.02053571]
  [1.49866071 1.5125 1.29598214 -1.04285714 0.68571429]]

 [[ 1.13822222 0.02311111 -0.56977778 1.60933333 -0.92977778]
  [0.392 -0.164 2.40355556 0.148 -1.54444444]
  [0.94488889 -0.87022222 -1.22088889 -1.35288889 1.36222222]
  [-1.27955556 -0.20488889 2.19111111 0.61644444 -0.44222222]]]

The two results are identical apart from the number of decimal places printed.
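Rather than comparing the printouts by eye, the agreement can also be checked numerically. The sketch below is numpy-only and uses the first row of the red channel copied from the outputs above. The tolerance of 1e-4 is an assumption, chosen to absorb the four-decimal rounding of the printed tensor.

```python
import numpy as np

mean_r, std_r = 0.485, 0.229  # red-channel mean and std

# First row of the red channel, copied from the original data above.
row = np.array([0.8284, 0.59, 0.0636, 0.9551, 0.3097])

# The same row as printed after Normalize (rounded to 4 decimals).
printed = np.array([1.4996, 0.4585, -1.8402, 2.0528, -0.7655])

recomputed = (row - mean_r) / std_r

# atol covers the rounding of the printed values; rtol is disabled.
match = np.allclose(recomputed, printed, rtol=0.0, atol=1e-4)
print(match)
```

With the actual tensor available, the same comparison could be made against data2.numpy() instead of hand-copied values.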

2. The initial data is image data

In the example above the initial data is floating point, whereas real input is an image,
that is, a numpy.ndarray (H x W x C) of unsigned 8-bit integers in the range [0, 255]. The next example processes data of that kind.

import torch
import numpy as np
from torchvision import transforms

mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)


data0 = np.random.randint(0, 256, size=(4, 5, 3), dtype=np.uint8)  # H x W x C; high is exclusive, so values span [0, 255]

print(data0.shape)
print(data0)

data1 = transforms.ToTensor()(data0)
print(data1.shape) # C x H x W
print(data1)
data2 = transforms.Normalize(mean, std)(data1)
print(data2)


# Verify with numpy: scale the uint8 values to [0.0, 1.0] first, as ToTensor does
data1 = data0 / 255
data1 = np.transpose(data1, (2, 0, 1))
print(data1.shape)
_std = np.array(std).reshape((3, 1, 1))
_mean = np.array(mean).reshape((3, 1, 1))

data2 = (data1 - _mean) / _std

print(data2)

Result

(4, 5, 3)
[[[234 149 252]
  [210 229 46]
  [146 237 43]
  [42 103 219]
  [221 104 73]]

 [[ 6 18 197]
  [81 67 235]
  [70 170 110]
  [242 38 157]
  [201 204 98]]

 [[ 28 130 183]
  [82 83 234]
  [138 80 97]
  [14 119 183]
  [100 158 13]]

 [[122 17 245]
  [40 62 203]
  [250 165 40]
  [219 131 107]
  [126 214 139]]]
torch.Size([3, 4, 5])
tensor([[[0.9176, 0.8235, 0.5725, 0.1647, 0.8667],
         [0.0235, 0.3176, 0.2745, 0.9490, 0.7882],
         [0.1098, 0.3216, 0.5412, 0.0549, 0.3922],
         [0.4784, 0.1569, 0.9804, 0.8588, 0.4941]],

        [[0.5843, 0.8980, 0.9294, 0.4039, 0.4078],
         [0.0706, 0.2627, 0.6667, 0.1490, 0.8000],
         [0.5098, 0.3255, 0.3137, 0.4667, 0.6196],
         [0.0667, 0.2431, 0.6471, 0.5137, 0.8392]],

        [[0.9882, 0.1804, 0.1686, 0.8588, 0.2863],
         [0.7725, 0.9216, 0.4314, 0.6157, 0.3843],
         [0.7176, 0.9176, 0.3804, 0.7176, 0.0510],
         [0.9608, 0.7961, 0.1569, 0.4196, 0.5451]]])
tensor([[[ 1.8893, 1.4783, 0.3823, -1.3987, 1.6667],
         [-2.0152, -0.7308, -0.9192, 2.0263, 1.3242],
         [-1.6384, -0.7137, 0.2453, -1.8782, -0.4054],
         [-0.0287, -1.4329, 2.1633, 1.6324, 0.0398]],

        [[ 0.5728, 1.9734, 2.1134, -0.2325, -0.2150],
         [-1.7206, -0.8627, 0.9405, -1.3704, 1.5357],
         [0.2402, -0.5826, -0.6352, 0.0476, 0.7304],
         [-1.7381, -0.9503, 0.8529, 0.2577, 1.7108]],

        [[ 2.5877, -1.0027, -1.0550, 2.0125, -0.5321],
         [1.6291, 2.2914, 0.1128, 0.9319, -0.0964],
         [1.3851, 2.2740, -0.1138, 1.3851, -1.5779],
         [2.4657, 1.7337, -1.1073, 0.0605, 0.6182]]])
(3, 4, 5)
[[[ 1.88928847 1.47829437 0.38231013 -1.39866427 1.66666667]
  [-2.01515541 -0.73079887 -0.91917116 2.0262865 1.32417159]
  [-1.63841082 -0.71367412 0.2453121 -1.87815738 -0.40542855]
  [-0.02868396 -1.43291378 2.16328453 1.63241716 0.03981505]]

 [[ 0.57282913 1.97338936 2.11344538 -0.232493 -0.21498599]
  [-1.72058824 -0.8627451 0.94047619 -1.37044818 1.53571429]
  [0.24019608 -0.58263305 -0.63515406 0.04761905 0.73039216]
  [-1.73809524 -0.95028011 0.85294118 0.25770308 1.71078431]]

 [[ 2.58771242 -1.00270153 -1.05498911 2.01254902 -0.53211329]
  [1.62910675 2.29141612 0.11276688 0.931939 -0.09638344]
  [1.38509804 2.27398693 -0.11381264 1.38509804 -1.57786492]
  [2.46570806 1.73368192 -1.10727669 0.0604793 0.61821351]]]

The two results are the same.
Explanation
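np.random.randint(low, high) returns integers from the half-open interval [low, high), so the upper bound is excluded.
This differs from Python's built-in random.randint(a, b), which includes both endpoints and is equivalent to randrange(a, b + 1). To cover the full range [0, 255] with numpy, high must be 256.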
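The sample code calls np.random.randint rather than Python's built-in random.randint, and numpy excludes the upper bound. The following minimal sketch illustrates the difference in range; the sample size of 100,000 is an arbitrary choice.

```python
import numpy as np

# high=255 is exclusive: the largest value that can appear is 254.
narrow = np.random.randint(0, 255, size=100_000, dtype=np.uint8)
print(narrow.max() <= 254)

# high=256 is needed to make 255 a possible value.
full = np.random.randint(0, 256, size=100_000, dtype=np.uint8)
print(full.min() >= 0, full.max() <= 255)
```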
When the image is a numpy.ndarray (H x W x C) of unsigned 8-bit integers (dtype uint8), its values lie in [0, 255].
For such input, ToTensor changes the layout from H x W x C to C x H x W and also scales the values to [0.0, 1.0] by dividing by 255.
The numpy verification must therefore divide by 255 first, and only then apply (data - mean) / std.
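The two cases covered in this post can be summarized in a small numpy-only sketch. The helpers to_tensor_like and normalize_like are hypothetical names, not torchvision functions, and they only mimic the two ndarray cases discussed here (float input and uint8 input). The real ToTensor also handles PIL images and other inputs.

```python
import numpy as np

def to_tensor_like(img):
    """Rough numpy stand-in for ToTensor on an H x W x C ndarray."""
    chw = np.transpose(img, (2, 0, 1))  # H x W x C -> C x H x W
    if img.dtype == np.uint8:
        # uint8 images are scaled to [0.0, 1.0]
        return chw.astype(np.float32) / 255
    return chw  # floating-point data keeps its values

def normalize_like(chw, mean, std):
    """Per-channel (data - mean) / std on a C x H x W array."""
    mean = np.asarray(mean).reshape(-1, 1, 1)
    std = np.asarray(std).reshape(-1, 1, 1)
    return (chw - mean) / std

mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

# A 1 x 1 x 3 image holding the first pixel of the sample data above.
pixel = np.array([[[234, 149, 252]]], dtype=np.uint8)
out = normalize_like(to_tensor_like(pixel), mean, std)
print(out.shape)
print(out.ravel())
```

The three printed values should agree, up to rounding, with the first element of each channel in the Normalize output above.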