Understanding high-dimensional tensor reshape and transpose (dimension exchange), illustrated with the PixelShuffle operation

PixelShuffle and tensor dimension reshaping operations

  • Dimension exchange: from matrix transpose to high-dimensional tensors
    • About PixelShuffle
    • Understanding matrix transposition from the base, strides, and address attributes of ndarray
    • Extending this understanding to high-dimensional tensors and the PixelShuffle operation
    • Direct reshape without dimension exchange
    • Summary

Dimension exchange: from matrix transpose to high-dimensional tensors

While reading code for a paper, I was confused by the implementation of PixelShuffle. In code found online, the PixelShuffle operation is often realized with nothing more than a reshape plus the corresponding transpose (numpy) or permute (PyTorch) call. For high-dimensional tensors, however, I could not yet see how the exchange between dimensions behaves, or how to be sure that the data after a dimension exchange is correct and is actually the data you want. So here, taking numpy as the example, I start from the transposition of a two-dimensional matrix and work up to understanding dimension exchange on high-dimensional tensors. This post is a record for my own learning and memory.

About PixelShuffle

PixelShuffle is an upsampling operation commonly used in low-level vision tasks. For a detailed introduction, please refer to PixelShuffle.
The PixelShuffle process implemented with numpy looks like this:

import numpy as np

a = np.arange(36).reshape([4, 3, 3]) # array of shape (4, 3, 3), corresponding to (C, H, W)
b = a.reshape([2, 2, 3, 3])          # shape (2, 2, 3, 3)
c = b.transpose([2, 0, 3, 1])        # shape (3, 2, 3, 2)
d = c.reshape([6, 6])                # upsampled array of shape (6, 6), i.e. (2*H, 2*W)

But why can the dimension reshaping above achieve the upsampling effect described in PixelShuffle? How do we make sure that the tensor data after dimension exchange and shape reshaping meets our expectations?
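Before digging into why this works, the recipe can be wrapped into a function for an arbitrary integer upscale factor r and checked against the worked example. This is a sketch assuming a channels-first (C, H, W) layout; the function name pixel_shuffle is my own:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r) via reshape + transpose."""
    c, h, w = x.shape
    assert c % (r * r) == 0, "channels must be divisible by r*r"
    out_c = c // (r * r)
    x = x.reshape(out_c, r, r, h, w)   # split the channel axis into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)     # interleave: (C, H, r, W, r)
    return x.reshape(out_c, h * r, w * r)

a = np.arange(36).reshape(4, 3, 3)     # C = 4, H = W = 3, r = 2
print(pixel_shuffle(a, 2)[0])          # the same 6x6 array as d above
```

With out_c = 1 this reproduces exactly the array d from the snippet above; the four-line version there is this function specialized to r = 2.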

Understanding matrix transposition from the base, strides, and address attributes of ndarray

1. The base attribute of an ndarray tells you which array the current array was derived from. For example:

a = np.arange(36).reshape((4, 3, 3))
print(f"base of a:{a.base}\n")

b = a.reshape((2, 2, 3, 3))
print(f"base of b:{b.base}\n")

c = b.transpose((2, 0, 3, 1))
print(f"base of c:{c.base}\n")

# output:
"""
base of a:[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35]
base of b:[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35]
base of c:[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35]
"""

It can be seen that a, b, and c all derive from the same original one-dimensional array. Even when an array variable is obtained through multiple transformations, the base attribute still returns the initial base array.
2. The strides attribute of an ndarray records, for each dimension, how far to step through the original contiguous memory (the buffer represented by a.base) to reach the next value along that dimension:

print(f"stride of a:{a.strides}\n")
print(f"stride of b:{b.strides}\n")

"""
outputs:
stride of a:(36, 12, 4) # in bytes; 4 bytes per element (int32)
stride of b:(72, 36, 12, 4)

# converted to element counts:
# a: (9, 3, 1)
# b: (18, 9, 3, 1)
"""

For a, the shape is (4, 3, 3) and the element strides are (9, 3, 1): along dimension 2, a takes a value every 1 element of the original buffer; along dimension 1, every 3 elements; and along dimension 0, every 9 elements. ("Filling" here is only an intuitive picture — these reshape operations do not change the original memory at all; they are shallow views.) The same reasoning applies to b:

print(f"a:{a}\n")
print(f"address of a:{a.__array_interface__['data']}\n")
print(f"b:{b}\n")
print(f"address of b:{b.__array_interface__['data']}\n")
"""
outputs:
a:[[[ 0 1 2]
  [ 3 4 5]
  [ 6 7 8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]

 [[27 28 29]
  [30 31 32]
  [33 34 35]]]

address of a:(2994087337488, False)

b:[[[[ 0 1 2]
   [ 3 4 5]
   [ 6 7 8]]

  [[ 9 10 11]
   [12 13 14]
   [15 16 17]]]


 [[[18 19 20]
   [21 22 23]
   [24 25 26]]

  [[27 28 29]
   [30 31 32]
   [33 34 35]]]]

address of b:(2994087337488, False)

"""

You can see that the address has not changed. My mental model is that filling starts from the rightmost dimension: step through the buffer according to that dimension's stride until enough elements for the current dimension have been taken, then advance according to the stride and element count of dimension -2, and so on.
3. Transpose:
Starting from the two-dimensional matrix transpose, the transpose operation essentially just permutes the shape and strides attributes:

mat_a = np.arange(12).reshape(3, 4)
print(f"mat_a shape:{mat_a.shape}\n")
print(f"mat_a strides:{mat_a.strides}\n")
print(f"mat_a:{mat_a}\n")
mat_b = mat_a.transpose((1, 0))
print(f"mat_b shape:{mat_b.shape}\n")
print(f"mat_b strides:{mat_b.strides}\n")
print(f"mat_b:{mat_b}\n")

"""
outputs:
mat_a shape: (3, 4)

mat_a strides: (16, 4)

mat_a:[[ 0 1 2 3]
 [ 4 5 6 7]
 [ 8 9 10 11]]

mat_b shape: (4, 3)

mat_b strides: (4, 16)

mat_b:[[ 0 4 8]
 [ 1 5 9]
 [ 2 6 10]
 [ 3 7 11]]

"""

When the matrix is transposed, the strides are permuted together with the shape. For mat_b this means: along dimension 1, take a value every 4 elements (16 bytes), 3 times, giving the first row [0, 4, 8]; then advance along dimension 0 by 1 element (4 bytes), so the next row starts at element 1, and again take a value every 4 elements, 3 times, giving [1, 5, 9]; and so on. Finally we obtain the familiar transposed matrix mat_b.
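Both claims above — that transpose only permutes shape and strides, and that elements are fetched from the flat base buffer by stepping according to strides — can be verified directly. The helper strided_get below is my own illustration of the lookup, not something to use in real code:

```python
import numpy as np

def strided_get(arr, index):
    """Fetch arr[index] by walking the flat arr.base buffer with element
    offsets (byte strides divided by the item size)."""
    offset = sum(i * (s // arr.itemsize) for i, s in zip(index, arr.strides))
    return arr.base[offset]

mat_a = np.arange(12).reshape(3, 4)
mat_b = mat_a.transpose((1, 0))

# shape and strides are permuted in lockstep, and no data is copied
print(mat_b.shape == mat_a.shape[::-1])      # True
print(mat_b.strides == mat_a.strides[::-1])  # True
print(np.shares_memory(mat_a, mat_b))        # True

# stride arithmetic reproduces normal indexing, even after a transpose
print(strided_get(mat_b, (1, 2)), mat_b[1, 2])  # 9 9
```

Because np.shares_memory returns True, mat_b is a view: writing to mat_a would show up in mat_b as well.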

Extending this understanding to high-dimensional tensors and the PixelShuffle operation

For dimension-exchange operations on high-dimensional arrays, the key point to remember is that the strides attribute is permuted in exactly the same way. Starting from strides, we can work out precisely what the elements of the array look like after the exchange, and check whether they match the result we want:

a = np.arange(36).reshape((4, 3, 3))
print(f"stride of a:{a.strides}\n")
print(f"shape of a:{a.shape}\n")
print(f"base of a:{a.base}\n")
print(f"a:{a}\n")
print(f"address of a:{a.__array_interface__['data']}\n")

b = a.reshape((2, 2, 3, 3))
print(f"stride of b:{b.strides}\n")
print(f"shape of b:{b.shape}\n")
print(f"base of b:{b.base}\n")
print(f"b:{b}\n")
print(f"address of b:{b.__array_interface__['data']}\n")

c = b.transpose((2, 0, 3, 1))
print(f"stride of c:{c.strides}\n")
print(f"shape of c:{c.shape}\n")
print(f"base of c:{c.base}\n")
print(f"c:{c}\n")
print(f"address of c:{c.__array_interface__['data']}\n")

d = c.reshape((6, 6))
print(f"stride of d:{d.strides}\n")
print(f"shape of d:{d.shape}\n")
print(f"base of d:{d.base}\n")
print(f"d:{d}\n")
print(f"address of d:{d.__array_interface__['data']}\n")

"""
outputs:
stride of a:(36, 12, 4)

shape of a:(4, 3, 3)

base of a:[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35]

a:[[[ 0 1 2]
  [ 3 4 5]
  [ 6 7 8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]

 [[27 28 29]
  [30 31 32]
  [33 34 35]]]

address of a:(2959935178752, False)

stride of b:(72, 36, 12, 4)

shape of b:(2, 2, 3, 3)

base of b:[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35]

b:[[[[ 0 1 2]
   [ 3 4 5]
   [ 6 7 8]]

  [[ 9 10 11]
   [12 13 14]
   [15 16 17]]]


 [[[18 19 20]
   [21 22 23]
   [24 25 26]]

  [[27 28 29]
   [30 31 32]
   [33 34 35]]]]

address of b:(2959935178752, False)

stride of c:(12, 72, 4, 36)

shape of c:(3, 2, 3, 2)

base of c:[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35]

c:[[[[ 0 9]
   [ 1 10]
   [ 2 11]]

  [[18 27]
   [19 28]
   [20 29]]]


 [[[ 3 12]
   [ 4 13]
   [ 5 14]]

  [[21 30]
   [22 31]
   [23 32]]]


 [[[ 6 15]
   [ 7 16]
   [ 8 17]]

  [[24 33]
   [25 34]
   [26 35]]]]

address of c:(2959935178752, False)

stride of d:(24, 4)

shape of d:(6, 6)

base of d:[[[[ 0 9]
   [ 1 10]
   [ 2 11]]

  [[18 27]
   [19 28]
   [20 29]]]


 [[[ 3 12]
   [ 4 13]
   [ 5 14]]

  [[21 30]
   [22 31]
   [23 32]]]


 [[[ 6 15]
   [ 7 16]
   [ 8 17]]

  [[24 33]
   [25 34]
   [26 35]]]]

d:[[ 0 9 1 10 2 11]
 [18 27 19 28 20 29]
 [ 3 12 4 13 5 14]
 [21 30 22 31 23 32]
 [ 6 15 7 16 8 17]
 [24 33 25 34 26 35]]

address of d:(2959935177312, False)

"""

What needs to be noted here is that the address of d has changed, indicating that d was produced by a copy. The reason is analogous to the difference between view() and reshape() in PyTorch: because c has had its dimensions exchanged, it no longer satisfies the contiguity condition, so reshaping c allocates a new block of memory, flattens the elements of c into it in row-major order, and then performs the reshape on that contiguous block. Accordingly, the base of d also changes and becomes an array stored in the new contiguous memory.
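The copy-vs-view behaviour described above can also be read off the flags attribute: after the transpose, c is contiguous in neither C nor Fortran order, so reshaping it must copy, whereas b (a plain reshape) still shares a's buffer:

```python
import numpy as np

a = np.arange(36).reshape((4, 3, 3))
b = a.reshape((2, 2, 3, 3))
c = b.transpose((2, 0, 3, 1))
d = c.reshape((6, 6))

print(b.flags['C_CONTIGUOUS'])   # True  -- reshape alone keeps a view
print(c.flags['C_CONTIGUOUS'])   # False -- transpose broke row-major order
print(c.flags['F_CONTIGUOUS'])   # False
print(np.shares_memory(a, b))    # True
print(np.shares_memory(a, d))    # False -- the reshape of c copied the data
```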

Since the shape of d is (6, 6) — the upsampled result of the PixelShuffle operation — the strides can be derived starting from the rightmost dimension: (24, 4) in bytes, or (6, 1) in element counts. Filling the d array according to these strides yields exactly the printout of d above, which matches the expected output of the PixelShuffle design.
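The (6, 1) element-stride derivation can be checked in a platform-independent way (the byte strides vary with the platform's integer item size, but the element strides do not):

```python
import numpy as np

a = np.arange(36).reshape((4, 3, 3))
d = a.reshape((2, 2, 3, 3)).transpose((2, 0, 3, 1)).reshape((6, 6))

# d is a fresh C-contiguous copy, so its element strides are (6, 1)
print(tuple(s // d.itemsize for s in d.strides))  # (6, 1)
```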

Direct reshape without dimension exchange

First of all, note that this is the wrong way to implement PixelShuffle; the point of running this code is only to deepen the understanding of reshape and strides.

e = a.reshape((3, 2, 3, 2))
print(f"stride of e:{e.strides}\n")
print(f"shape of e:{e.shape}\n")
print(f"base of e:{e.base}\n")
print(f"e:{e}\n")
print(f"address of e:{e.__array_interface__['data']}\n")

f = e.reshape((6, 6))
print(f"stride of f:{f.strides}\n")
print(f"shape of f:{f.shape}\n")
print(f"base of f:{f.base}\n")
print(f"f:{f}\n")
print(f"address of f:{f.__array_interface__['data']}\n")
"""
outputs:
# Note: this output comes from a separate run, so the absolute addresses differ from the earlier ones; within this run, however, a, e, and f all share the same address
stride of e: (48, 24, 8, 4)

shape of e: (3, 2, 3, 2)

base of e:[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35]

e:[[[[ 0 1]
   [ 2 3]
   [ 4 5]]

  [[ 6 7]
   [ 8 9]
   [10 11]]]


 [[[12 13]
   [14 15]
   [16 17]]

  [[18 19]
   [20 21]
   [22 23]]]


 [[[24 25]
   [26 27]
   [28 29]]

  [[30 31]
   [32 33]
   [34 35]]]]

address of e:(1635184876144, False)

stride of f:(24, 4)

shape of f:(6, 6)

base of f:[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35]

f:[[ 0 1 2 3 4 5]
 [ 6 7 8 9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]
 [30 31 32 33 34 35]]

address of f:(1635184876144, False)



"""

It can be seen from the above results that when no dimension exchange is performed and only reshape is applied, values are always taken from the array in the original contiguous memory according to the corresponding strides. In this case the strides can be computed as follows: the rightmost dimension (i.e. dimension -1) always takes every element in turn (the contiguity condition holds, since no dimensions were exchanged), and the strides of the other dimensions are derived from the shape, one dimension at a time from right to left. For example, the shape of f is (6, 6). We immediately know strides of f = (xx, 4): with reshape only, the stride of dimension -1 must be 4 bytes (one int element). Dimension -1 has 6 elements, so dimension 0 takes a value every 6 elements, i.e. every 24 bytes, giving strides of f = (24, 4).
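The right-to-left derivation just described can be written down in a few lines (the helper name c_strides is my own); it reproduces the strides of every contiguous array in this post:

```python
import numpy as np

def c_strides(shape, itemsize):
    """Byte strides of a C-contiguous array: each dimension's stride is
    the product of all dimensions to its right, times the item size."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

a = np.arange(36).reshape((4, 3, 3))
f = a.reshape((6, 6))
print(c_strides(a.shape, a.itemsize) == a.strides)  # True
print(c_strides(f.shape, f.itemsize) == f.strides)  # True
print(np.shares_memory(a, f))  # True: reshape only, so f is still a view
```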

Summary

The above is my understanding of numpy's dimension exchange plus reshape. Later, when designing an operator that, like PixelShuffle, involves dimension exchange and similar operations, these steps can be used to derive the result of dimension exchange plus reshape.