Numpy Data Analysis 02 – Slicing and Indexing

Slicing and indexing of Numpy arrays

1. Slicing and indexing of one-dimensional arrays

Slicing parameters are separated by colons: [start:stop:step]
  • The contents of an ndarray object can be accessed and modified by indexing or slicing, just like slicing a Python list.
  • ndarray elements are indexed by subscripts from 0 to n-1, where n is the length of the axis.

Note: The difference from lists is that an array slice is a view of the original array, so any change made through the slice also changes the original array. If the original array must stay unchanged, make an explicit copy with .copy().

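With a Python list it is the other way round: slicing (new = old[:]) creates a new list (a shallow copy, i.e. copy assignment), while plain assignment (new = old) only binds a second name to the same list (reference assignment).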
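A minimal sketch contrasting the two behaviours; the variable names are illustrative only. np.shares_memory is used to confirm which objects share data.

```python
import numpy as np

# Python list: slicing builds a new (shallow) list
lst = [10, 20, 30, 40]
lst_slice = lst[1:3]
lst_slice[0] = 99            # changes only the new list
lst_alias = lst              # plain assignment: same object, no copy
lst_alias[0] = -1            # visible through lst as well

# NumPy array: slicing returns a view on the same data
arr = np.array([10, 20, 30, 40])
arr_view = arr[1:3]
arr_view[0] = 99             # also changes arr
arr_copy = arr[1:3].copy()   # explicit copy with its own data
arr_copy[0] = 0              # arr is unaffected

print(lst)   # [-1, 20, 30, 40]
print(arr)   # [10 99 30 40]
```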

Explanation of the colon:

  • With a single index and no colon, such as [2], the single element at that index is returned.
  • [2:] extracts all items from that index to the end.
  • With two parameters, such as [2:7], the items from the start index up to, but not including, the stop index are extracted.
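The three cases above, plus the optional step and negative indices (standard Python slicing behaviour that NumPy also follows), can be tried on a small array; this is just an illustrative sketch:

```python
import numpy as np

ar = np.arange(10)      # [0 1 2 3 4 5 6 7 8 9]
print(ar[2])            # single index -> one element: 2
print(ar[2:])           # from index 2 to the end: [2 3 4 5 6 7 8 9]
print(ar[2:7])          # indices 2..6, stop excluded: [2 3 4 5 6]
print(ar[2:7:2])        # step 2: [2 4 6]
print(ar[::-1])         # negative step reverses: [9 8 7 6 5 4 3 2 1 0]
print(ar[-3:])          # negative indices count from the end: [7 8 9]
```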

Why slices and ranges exclude the last element
Computer scientist Edsger W. Dijkstra gave what is probably the best explanation of this convention:

  • When only the stop position is given, it is easy to see how many elements the slice or range contains: range(3) and my_list[:3] both produce three elements.
  • When both the start and stop positions are given, the length of the slice or range is simply stop - start.
  • It also makes it possible to split a sequence at any index x into two non-overlapping parts: my_list[:x] and my_list[x:].

For example:

import numpy as np

ar = np.array([10,20,30,40,50,60])
# Start slicing at index 2
print('ar[2:]',ar[2:])
# Stop before index 3
print('ar[:3]',ar[:3])
# Use index 2 to split the array into two non-overlapping parts
print('ar[:2]',ar[:2])
print('ar[2:]',ar[2:])

Output result:

ar[2:] [30 40 50 60]
ar[:3] [10 20 30]
ar[:2] [10 20]
ar[2:] [30 40 50 60]

2. Slicing and indexing of two-dimensional arrays

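The indexing and slicing rules above apply to each axis of a two-dimensional array. The row and column selections are separated by a comma, as in arr[rows, cols]; giving only one index, as in arr[1], selects a whole row.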
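A short sketch of the common two-dimensional cases, using an illustrative 3*3 array:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
print(arr[1, 2])       # row 1, column 2 -> 6
print(arr[1])          # whole row 1 -> [4 5 6]
print(arr[:, 0])       # whole column 0 -> [1 4 7]
print(arr[:2, 1:])     # rows 0-1, columns 1-2 -> [[2 3] [5 6]]
print(arr[::2, ::2])   # every other row and column -> [[1 3] [7 9]]
```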

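Note: Slicing can also use the ellipsis "...", which stands for as many full slices (:) as are needed to cover the remaining axes. In a two-dimensional array, placing the ellipsis in the row position (a[..., 1]) selects column 1 from every row, while placing it in the column position (a[1, ...]) selects every column of row 1.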
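A minimal sketch of the ellipsis; the arrays are illustrative. The last lines show that on a 3-D array "..." expands to two colons.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [3, 4, 5],
              [4, 5, 6]])
print(a[..., 1])    # column 1 of every row: [2 4 5]
print(a[1, ...])    # every column of row 1: [3 4 5]
print(a[..., 1:])   # columns 1 onwards of every row

# On a 3-D array, "..." fills in as many ":" as needed
cube = np.arange(24).reshape(2, 3, 4)
print(np.array_equal(cube[..., 0], cube[:, :, 0]))   # True
```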

3. Advanced operations of index

3.1 Integer array index

Advanced indexing methods can also be used in numpy, such as integer array indexing and Boolean indexing. Both are introduced below.
# Create a two-dimensional array
x = np.array([
    [1,2],
    [2,4],
    [5,6]
])
# [0,1,2] represents the row index; [0,1,0] represents the column index
y = x[[0,1,2],[0,1,0]]
# y holds the elements at positions (0,0), (1,1) and (2,0) of x
y

Output result:

array([1, 4, 5])

  • Next, get the four corner elements of a 4*3 array. Their row indices are [0,0] and [3,3], and their column indices are [0,2] and [0,2].
b = np.array([
    [0,1,2],
    [3,4,5],
    [6,7,8],
    [9,10,11]
])
a = b[[0,0,3,3],[0,2,0,2]]
print('a:',a)
r = np.array([[0,0],[3,3]]).reshape(4)
print('r:',r)
l = np.array([[0,2],[0,2]]).reshape(4)
print('l:',l)
s = b[r,l].reshape((2,2))
print('s:',s)

The output is:

a: [ 0  2  9 11]
r: [0 0 3 3]
l: [0 2 0 2]
s: [[ 0  2]
 [ 9 11]]

Equivalent to:

a = b[[0,0,3,3],[0,2,0,2]].reshape((2,2))
a
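The reshape step can also be avoided: when the index arrays are themselves 2*2, the result already has shape (2, 2). The sketch below rebuilds the same b with np.arange and also shows np.ix_, a NumPy helper that combines a list of rows with a list of columns:

```python
import numpy as np

b = np.arange(12).reshape(4, 3)   # same values as b in the example above
rows = np.array([[0, 0],
                 [3, 3]])
cols = np.array([[0, 2],
                 [0, 2]])
corners = b[rows, cols]           # result has the shape of the index arrays: (2, 2)
print(corners)                    # [[ 0  2]
                                  #  [ 9 11]]

# np.ix_ builds broadcastable index arrays: rows {0, 3} x columns {0, 2}
corners_ix = b[np.ix_([0, 3], [0, 2])]
```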
  • Practice
    Create an 8*8 chessboard matrix (black squares are 0, white squares are 1):
    [0 1 0 1 0 1 0 1]
    [1 0 1 0 1 0 1 0]
    [0 1 0 1 0 1 0 1]
    [1 0 1 0 1 0 1 0]
    [0 1 0 1 0 1 0 1]
    [1 0 1 0 1 0 1 0]
    [0 1 0 1 0 1 0 1]
    [1 0 1 0 1 0 1 0]

Step one:

# First create an array of all 0s
Z = np.zeros((8,8),dtype=int)
Z

Output result:

Step two:

# In the odd rows (1, 3, 5, 7), set the even columns (0, 2, 4, 6) to 1
Z[1::2,::2] = 1
Z

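Output result:

array([[0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 1, 0, 1, 0, 1, 0]])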

Step three:

# Then, in the even rows (0, 2, 4, 6), set the odd columns (1, 3, 5, 7) to 1
Z[::2,1::2] = 1
Z

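Output result:

array([[0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0]])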
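Two alternative ways to build the same board are sketched below. They are not the method used in the steps above, just other standard NumPy routes: a square is 1 exactly when row + column is odd, and the pattern is also a 2*2 tile repeated four times in each direction.

```python
import numpy as np

# i and j hold the row and column number of every cell
i, j = np.indices((8, 8))
board = (i + j) % 2          # 0 where i+j is even, 1 where it is odd
print(board[:2])             # [[0 1 0 1 0 1 0 1]
                             #  [1 0 1 0 1 0 1 0]]

# Alternative: repeat a 2x2 tile four times along each axis
board_tile = np.tile([[0, 1], [1, 0]], (4, 4))
```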

3.2 Boolean array index

When elements should be selected by a condition (for example, the result of a comparison), use the other advanced indexing method, the Boolean array index. The following example returns all elements of an array that are greater than 6:

# Returns an array consisting of all numbers greater than 6
x = np.array([[0,1,2],[3,4,5],[6,7,8],[9,10,11]])
x[x>6]

  • A one-dimensional Boolean array can also select whole rows or columns of a matrix: each True or False value decides whether the row (or column) at the same position is kept, so the result contains the rows or columns whose value is True.
  • Note: The length of the one-dimensional Boolean array must equal the length of the axis you want to index.
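A minimal sketch of selecting rows and columns with one-dimensional Boolean arrays; the masks are illustrative. The last part shows the length rule being enforced: recent NumPy versions raise an IndexError for a mask of the wrong length.

```python
import numpy as np

x = np.arange(12).reshape(4, 3)
row_mask = np.array([True, False, True, False])   # length 4 = number of rows
col_mask = np.array([False, True, True])          # length 3 = number of columns

print(x[row_mask])       # rows 0 and 2
print(x[:, col_mask])    # columns 1 and 2 of every row

# A mask whose length differs from the axis length is rejected
mismatch_raised = False
try:
    x[np.array([True, False])]
except IndexError as err:
    mismatch_raised = True
    print("IndexError:", err)
```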

Exercise:
1. Extract all odd numbers in the array
2. Modify the odd value to -1
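One possible solution to the exercise, sketched on the same x as above. The mask x % 2 == 1 marks the odd values. Reading through the mask returns a copy, while assigning through it writes into x.

```python
import numpy as np

x = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])

# 1. Extract all odd numbers
odd = x[x % 2 == 1]
print(odd)          # [ 1  3  5  7  9 11]

# 2. Set every odd value to -1 (modifies x in place)
x[x % 2 == 1] = -1
print(x)
```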

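Filter out data within a specified range by combining conditions:

  1. & means "and"
  2. | means "or"

Each condition must be wrapped in parentheses, because & and | bind more tightly than comparison operators such as > and <.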
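A minimal sketch of range filtering on an illustrative array. Python's own and/or keywords do not work element-wise on arrays, which is why & and | are used.

```python
import numpy as np

x = np.arange(12)
between = x[(x > 3) & (x < 8)]    # 3 < x < 8
outside = x[(x < 2) | (x > 9)]    # x < 2 or x > 9
print(between)                    # [4 5 6 7]
print(outside)                    # [ 0  1 10 11]
```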


In a Boolean index, True marks the data to keep and False marks the data to drop.

You can also index with two one-dimensional Boolean arrays, one for each axis.

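The prerequisite is that the two one-dimensional Boolean arrays contain the same number of True values (or that one of them contains exactly one True, which broadcasts). Each array is converted to the integer positions of its True entries, and those positions are paired element by element.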
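A minimal sketch with illustrative masks. With equal counts, the positions are paired (row 0 with column 0, row 2 with column 3), so the result is a 1-D array, not a sub-block. With counts of 2 and 3, NumPy raises an IndexError.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
rows = np.array([True, False, True])          # True at rows 0 and 2
cols = np.array([True, False, False, True])   # True at columns 0 and 3
picked = x[rows, cols]   # elements (0,0) and (2,3)
print(picked)            # [ 0 11]

# Two True values against three: the index arrays cannot be paired up
cols3 = np.array([True, True, False, True])
mismatch_raised = False
try:
    x[rows, cols3]
except IndexError as err:
    mismatch_raised = True
    print("IndexError:", err)
```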
If the shapes of the index arrays do not match, NumPy raises an IndexError saying that the indexing arrays could not be broadcast together. When a multidimensional array is indexed with several arrays at once, those arrays are paired element by element, so they must have the same shape or be broadcastable to one. To select a sub-block instead (some rows combined with an independent set of columns), do the selection in two steps:

  1. For example, to take columns 1, 3 and 4 of the first and last rows, select the rows first and then the columns.
  2. Read the first and last rows of array a3_4 into temp, then select the required columns from temp.
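A sketch of the two-step selection. The original does not define a3_4, so it is assumed here to be a 3*4 array built with np.arange. "Columns 1, 3 and 4" is read as 1-based positions, so the 0-based indices are 0, 2 and 3. The one-step np.ix_ form is a NumPy alternative, not part of the original method.

```python
import numpy as np

a3_4 = np.arange(12).reshape(3, 4)   # hypothetical 3x4 array
temp = a3_4[[0, -1]]                 # step 1: first and last rows
result = temp[:, [0, 2, 3]]          # step 2: columns 1, 3, 4 (indices 0, 2, 3)
print(result)                        # [[ 0  2  3]
                                     #  [ 8 10 11]]

# One-step alternative: np.ix_ turns the two lists into broadcastable index arrays
result_ix = a3_4[np.ix_([0, -1], [0, 2, 3])]
```

Writing a3_4[[0, -1], [0, 2, 3]] directly would fail, because a 2-element and a 3-element index array cannot be paired.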

3.3 Assigning through an index or slice modifies the original array

ar = np.arange(10)
print(ar)
ar[5] = 100
ar[7:9] = 200
print(ar)
# Assigning to an index or slice writes directly into the original array; a scalar assigned to a slice is broadcast to every element of that slice

Output result:

[0 1 2 3 4 5 6 7 8 9]
[  0   1   2   3   4 100   6 200 200   9]
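The example above assigns through the original name. The sketch below shows the view behaviour more directly: writing into a slice changes ar. It also shows a NumPy detail worth knowing. Reading with an integer-array (fancy) index returns a copy, yet assigning through a fancy index still writes into the original.

```python
import numpy as np

ar = np.arange(10)
view = ar[2:5]          # basic slice: a view
view[:] = -1            # writes into ar
print(ar)               # [ 0  1 -1 -1 -1  5  6  7  8  9]

fancy = ar[[0, 1]]      # integer-array index: a new copy
fancy[:] = 100          # ar is unchanged by this
ar[[0, 1]] = 100        # assigning through a fancy index does modify ar
print(ar)               # [100 100  -1  -1  -1   5   6   7   8   9]
```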

  • To leave the original array untouched, work on a copy:
# Copy
ar = np.arange(10)
b = ar.copy()
# Or b=np.array(ar)
b[7:9] = 200
print('ar:',ar)
print('b:',b)

The output is:

ar: [0 1 2 3 4 5 6 7 8 9]
b: [  0   1   2   3   4   5   6 200 200   9]

4. Numpy broadcast mechanism

4.1 Numpy’s broadcast mechanism

Broadcasting is numpy’s way of performing arithmetic on arrays of different shapes. Arithmetic operations on arrays are normally performed element by element.
If two arrays a and b have the same shape, that is, a.shape == b.shape, then a*b multiplies the corresponding elements of a and b. This requires the same number of dimensions and the same length along each dimension.


But what if two arrays have different shapes? Can no operations be performed between them? They can: to make the shapes agree, Numpy provides the broadcasting mechanism. Its core idea is to repeat the smaller array along one or more axes (horizontally or vertically) until it has the same shape as the larger array. The repetition is conceptual; NumPy does not actually copy the data.

(Figure: how array b is made compatible with array a through broadcasting.)

Adding a 4*3 two-dimensional array a to a one-dimensional array b of length 3 is equivalent to stacking b four times as rows, giving a 4*3 array, and then adding element by element.
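A sketch that checks this equivalence with illustrative values. np.tile builds the repeated array explicitly. np.broadcast_to shows the "stretched" b as a read-only view that shares b's memory instead of copying it.

```python
import numpy as np

# Illustrative values: shape (4, 3) and shape (3,)
a = np.array([[0, 0, 0],
              [10, 10, 10],
              [20, 20, 20],
              [30, 30, 30]])
b = np.array([1, 2, 3])

broadcast_sum = a + b                  # b is broadcast over the 4 rows
tiled_sum = a + np.tile(b, (4, 1))     # b explicitly repeated as 4 rows
print(broadcast_sum)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]
#  [31 32 33]]

# The broadcast b as a (4, 3) view, without copying the data
stretched = np.broadcast_to(b, (4, 3))
```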

4.2 Broadcasting Rules

  • All input arrays are aligned with the array that has the most dimensions; shorter shapes are padded with 1s on the left.
  • The shape of the output array is the maximum of the input shapes along each dimension.
  • An input array can take part in the calculation if, in every dimension, its length equals the output length or is 1; otherwise an error is raised.
  • When an input array has length 1 along a dimension, its single value along that dimension is reused for every position along it.
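The rules above can be written as a small function on shapes. broadcast_shape is a hypothetical helper written for illustration, not a NumPy API. It is checked against the shapes used in this section, and np.broadcast_shapes (available in NumPy 1.20 and later) is shown for comparison.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Illustrative helper: apply the broadcasting rules to two shapes."""
    # Rule 1: pad the shorter shape with 1s on the left
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)    # equal, or b's length-1 dimension is stretched
        elif da == 1:
            result.append(db)    # a's length-1 dimension is stretched
        else:
            # Rule 3: lengths differ and neither is 1
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))         # (2, 3)
print(broadcast_shape((3, 1), (3,)))         # (3, 3)
print(broadcast_shape((2, 1, 3), (4, 1)))    # (2, 4, 3)
print(np.broadcast_shapes((2, 1, 3), (4, 1)))  # (2, 4, 3), NumPy's own check

incompatible_detected = False
try:
    broadcast_shape((2, 1, 3), (4, 2))
except ValueError as err:
    incompatible_detected = True
    print(err)
```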
M = np.ones((2,3))
print(M)
a = np.arange(3)
print(a)
# The shapes of the two arrays are M.shape = (2,3) and a.shape = (3,)
# By rule 1, a has fewer dimensions, so a 1 is added on its left: M.shape -> (2,3), a.shape -> (1,3)
# By rules 2 and 3, the first dimension does not match, so a's length-1 dimension is stretched: M.shape -> (2,3), a.shape -> (2,3)
# Now the shapes of the two arrays match, and the final shape is (2,3)
M+a

Output result:

[[1. 1. 1.]
 [1. 1. 1.]]
[0 1 2]
array([[1., 2., 3.],
       [1., 2., 3.]])

a = np.arange(3).reshape((3,1))
print('a:',a)
b = np.arange(3)
print('b:',b)
a + b

# The shapes of the two arrays are a.shape = (3,1) and b.shape = (3,)
# By rule 1, b's shape is padded with a 1 on the left: a.shape -> (3,1), b.shape -> (1,3)
# By rule 2, each length-1 dimension is stretched to match the other array: a.shape -> (3,3), b.shape -> (3,3)
# Because the shapes now match, the two arrays are compatible

Output result:

a: [[0]
 [1]
 [2]]
b: [0 1 2]
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

4.3 Another simple understanding of broadcast rules

  • Right-align the dimensions of the two arrays, and then compare the values in the corresponding dimensions.
  • If the values are equal, or one of them is 1 or missing, the broadcast operation can be performed.
  • The size of each output dimension is the larger of the two values. If any pair of dimensions fails this test, the array operation cannot be performed.

Right alignment, example 1:

The size of array a is (2,3)
The size of array b is (1,)
Right-align first:
    2 3
      1
----------
    2 3
So the output size of the operation on the two arrays is (2,3).

Example:

# The size of array a is (2,3)
a = np.arange(6).reshape(2,3)
print('a:',a)
#The size of array b is (1,)
b = np.array([5])
print('b:',b)
c = a*b
#The size of the output is (2,3)
c

Output result:

a: [[0 1 2]
 [3 4 5]]
b: [5]
array([[ 0,  5, 10],
       [15, 20, 25]])

Right alignment, example 2:

The size of array a is (2,1,3)
The size of array b is (4,1)
Right-align first:
  2 1 3
    4 1
----------
  2 4 3
So the output size of the operation on the two arrays is (2,4,3).

Example:

# The array size is (2,1,3)
a = np.arange(6).reshape(2,1,3)
print('a:',a)
print('--'*10)
#The array size is (4,1)
b = np.arange(4).reshape(4,1)
print('b:',b)
print('--'*10)
c = a + b
print(c,c.shape)

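Output result:

a: [[[0 1 2]]

 [[3 4 5]]]
--------------------
b: [[0]
 [1]
 [2]
 [3]]
--------------------
[[[0 1 2]
  [1 2 3]
  [2 3 4]
  [3 4 5]]

 [[3 4 5]
  [4 5 6]
  [5 6 7]
  [6 7 8]]] (2, 4, 3)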

Here you can see:

  • After right alignment, each pair of corresponding dimensions is either equal, or one of them is 1 or missing; the output takes the larger value.
  • Otherwise an error is raised, as with the two arrays below.

An example that cannot be broadcast:

The size of array a is (2,1,3)
The size of array b is (4,2)
Right-align first:
  2 1 3
    4 2
----------
The last dimensions, 3 and 2, do not match and neither is 1, so the operation cannot be performed. It would only work if the 2 were replaced by 1 or 3.

Example:

# The size of array a is (2,1,3)
a = np.arange(6).reshape(2,1,3)
print('a:',a)
print('--'*10)
#The size of array b is (4,2)
b = np.arange(8).reshape(4,2)
print('b:',b)
print('--'*10)
# Raises an error: the shapes cannot be broadcast
a + b

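Output result:

a: [[[0 1 2]]

 [[3 4 5]]]
--------------------
b: [[0 1]
 [2 3]
 [4 5]
 [6 7]]
--------------------
ValueError: operands could not be broadcast together with shapes (2,1,3) (4,2)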
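A sketch that catches the error instead of letting it stop the program. It then applies one of the fixes named above: b gets 3 columns instead of 2, so its shape (4, 3) is compatible with (2, 1, 3). The replacement b_ok is illustrative.

```python
import numpy as np

a = np.arange(6).reshape(2, 1, 3)
b = np.arange(8).reshape(4, 2)

error_raised = False
try:
    a + b
except ValueError as err:
    error_raised = True
    print("ValueError:", err)

# Same a, but b now has 3 columns, so the shapes are compatible
b_ok = np.arange(12).reshape(4, 3)
c = a + b_ok
print(c.shape)   # (2, 4, 3)
```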

Summary