Data analysis: NumPy

1. Introduction

NumPy is an open-source numerical computing extension for Python. It can store and process large matrices far more efficiently than Python's own nested lists (which can also be used to represent matrices). It supports multi-dimensional array and matrix operations and provides a large library of mathematical functions that work on arrays. (Excerpted from Baidu Encyclopedia)

2. Import numpy package

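You need to import the numpy package before use, in any one of the following ways:

import numpy
import numpy as np #Set np as the abbreviation of numpy (the most common form)
from numpy import * #Imports every name directly; this can shadow built-ins such as sum, min and max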

3. Array

A collection of data elements of the same kind arranged in an ordered manner is called an array
Ordered: elements can be accessed by index
Same kind: all elements in the array have the same data type (this is the difference between arrays and lists)
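
The "same kind" rule can be seen directly when a list with mixed numeric types is converted. The following is a minimal sketch; the names mixed_list and arr are only illustrative:

```python
import numpy as np

# A Python list can mix integers and floats freely.
mixed_list = [1, 2.5, 3]

# Converting it to an array forces one common element type:
# the integers are upcast to float so that every element matches.
arr = np.array(mixed_list)

print(arr.dtype)   # float64
print(arr[1])      # 2.5 (ordered, so elements can be reached by index)
```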

4. Index of array

1. One-dimensional array

(1) Index the nth element
For example, take the third element in the array

import numpy as np
a=np.array([0,1,2,3,4,5]) #Define a one-dimensional array
print(a[2])

(2) Modify the nth element
For example, change the third element to 10

import numpy as np
a=np.array([0,1,2,3,4,5])
a[2]=10
print(a)

The output is [ 0  1 10  3  4  5]
(3) Fancy indexing of a one-dimensional array
a. Fancy indexing with a list of positions
Syntax:
l=[index 1, index 2, …, index n]
print(a[l])
For example, take the 1st, 3rd, and 5th elements of array b

import numpy as np
b=np.array([0,10,20,30,40,50,60,70,80,90])
l=[0,2,4]
print(b[l])

The output is [ 0 20 40]
b. Boolean fancy index
For example, take the 1st, 2nd, and 4th elements in array a

import numpy as np
a=np.array([0,1,2,3,4,5])
l=np.array([1,1,0,1,0,0],dtype=bool) #Convert to Boolean: 1 becomes True, 0 becomes False
print(a[l])

The output is [0 1 3]
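
In practice a Boolean mask is often produced by a comparison instead of being typed by hand. The sketch below assumes the same array a; the name mask and the threshold 2 are only illustrative:

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5])

# A comparison on an array yields a Boolean array of the same shape.
mask = a > 2
print(mask.dtype)   # bool

# Indexing with the mask keeps only the elements where it is True.
print(a[mask])      # [3 4 5]
```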

2. Multidimensional array

(1) Index the element in row x and column y

For example, take the element in row 1, column 2

import numpy as np
a=np.array([[0,1,2,3],[4,5,6,7]]) #Define a two-dimensional array; every row must have the same number of elements
print(a[0,1])

Note: Row and column indexes start from 0, so row 1, column 2 is a[0,1], and the output is 1
(2) Take the nth row
For example, take the second row of array a

print(a[1])

(3) Fancy indexing of multi-dimensional arrays
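a. Fancy indexing with row and column positions
Syntax: a[(row 1, row 2, …, row n), (column 1, column 2, …, column n)]
The row and column indexes are paired position by position, so n (row, column) pairs select n elements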
For example, take the elements in row 2, column 3 and row 4, column 4 of array a

import numpy as np
a=np.array([[0,1,2,3],
            [4,5,6,7],
            [8,9,10,11],
            [12,13,14,15]])
print(a[(1,3),(2,3)])

The output is [ 6 15]
b. Boolean fancy index
For example, take the 1st and 4th elements in column 3 of array a

import numpy as np
a=np.array([[0,1,2,3],
            [4,5,6,7],
            [8,9,10,11],
            [12,13,14,15]])
mask=np.array([1,0,0,1],dtype=bool)
print(a[mask,2])

The output is [ 2 14]

5. Array slicing

Slicing of arrays supports both positive and negative indexes

1. One-dimensional array

a[starting point:end point (not included):step]
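
As an illustration of the syntax above, here is a small sketch with positive and negative indexes. The array values are chosen only for this example:

```python
import numpy as np

a = np.array([0, 10, 20, 30, 40, 50])

print(a[1:4])    # [10 20 30]  indexes 1, 2, 3 (end point excluded)
print(a[::2])    # [ 0 20 40]  every second element
print(a[-2:])    # [40 50]     the last two elements
print(a[::-1])   # [50 40 30 20 10  0]  negative step reverses the array
```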

2. Multidimensional array

a[row starting point:row end point (not included):step size, column starting point:column end point (not included):step size]
For example:

import numpy as np
a=np.array([[0,1,2,3],
            [4,5,6,7],
            [8,9,10,11],
            [12,13,14,15]])
print(a[1::,:2:])

The output is:
[[ 4  5]
 [ 8  9]
 [12 13]]

Note: Fancy indexing is different from slicing. A slice is a view that references the original array's memory, so changing the slice also changes the original array. Fancy indexing returns a copy of the selected data instead of a reference, so changing the result leaves the original array untouched.
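
The difference can be checked with a short experiment. This is a minimal sketch; the names s and f are only illustrative:

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5])

s = a[1:4]        # slice: a view onto a's data
f = a[[1, 2, 3]]  # fancy index: an independent copy

s[0] = 100        # writes through to a
f[1] = -1         # does not affect a

print(a)  # [  0 100   2   3   4   5]
print(f)  # [ 1 -1  3]
```

If an independent slice is needed, calling a[1:4].copy() avoids modifying the original array.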

6. Common operations on arrays

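1. Type conversion

Syntax: (1) a=np.array([1,2,3,4],dtype=data type) #Set the element type when the array is created
(2) a=a.astype(data type) #Convert an existing array; astype returns a new array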
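
Both forms might be used as follows. This is only a sketch, with float and int picked as example types:

```python
import numpy as np

# (1) Choose the element type when the array is created.
a = np.array([1, 2, 3, 4], dtype=float)
print(a)   # [1. 2. 3. 4.]

# (2) Convert an existing array; astype returns a new array
# and leaves a itself unchanged.
b = a.astype(int)
print(b)   # [1 2 3 4]
print(a.dtype, b.dtype)
```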

2. Generate an array of all 0s

Syntax: np.zeros(number)

import numpy as np
c=np.zeros(4)
print(c)

Since the default data type of the array is float, the output is:
[0. 0. 0. 0.]
If you want integers, change the second line to:

c=np.zeros(4,dtype=int)

To generate an array of all ones, use np.ones in place of np.zeros

3. fill method

Syntax: a.fill(specified value)
For example, change all elements in array c to 5

import numpy as np
c=np.zeros(4,dtype=int)
c.fill(5)
print(c)

The output is [5 5 5 5]
Note: If the value passed in has a different type from the array, it is cast to the array's data type, so a fractional value passed to an integer array loses its fractional part

4. Generate arithmetic sequence

Syntax: (1) a=np.arange(starting point, end point (not included), step size)
(2) a=np.linspace(starting point, end point (included), number of elements)
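
The two functions differ in whether the end point is included and in whether the step or the element count is given. A small sketch with illustrative values:

```python
import numpy as np

# arange: start 0, stop 10 (excluded), step 2
a = np.arange(0, 10, 2)
print(a)   # [0 2 4 6 8]

# linspace: start 0, stop 1 (included), 5 evenly spaced values
b = np.linspace(0, 1, 5)
print(b)   # [0.   0.25 0.5  0.75 1.  ]
```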

5. Generate random numbers

Syntax: a=np.random.rand(number) #Generates the given number of random values, uniformly distributed in [0, 1)
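
Because the values are random, the sketch below checks only the shape and the range, not specific numbers. The sizes 5 and (2, 3) are arbitrary choices for illustration:

```python
import numpy as np

# Five samples drawn uniformly from the half-open interval [0, 1).
a = np.random.rand(5)
print(a.shape)   # (5,)

# Passing several sizes gives a multi-dimensional array.
m = np.random.rand(2, 3)
print(m.shape)   # (2, 3)
```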

6. View type

Take the following code as an example:

import numpy as np
a=np.array([[0,1,2,3],
            [4,5,6,7],
            [8,9,10,11],
            [12,13,14,15]])
print(type(a))
print(a.dtype)

The output is:
<class 'numpy.ndarray'>
int32
It can be seen that type() reports the type of the variable itself, while dtype reports the data type of the elements in the array. (The default integer type depends on the platform and NumPy version; int64 is also common.)

7. View shapes

(1) One-dimensional array

import numpy as np
b=np.array([0,10,20,30,40,50,60,70,80,90])
print(b.shape) #It can also be written as np.shape(b)

Output result: (10,)
This indicates a one-dimensional array with 10 elements
(2) Multidimensional array

import numpy as np
a=np.array([[0,1,2],
            [4,5,6],
            [8,9,10],
            [12,13,14]])
print(a.shape)

Output result: (4, 3)
Indicates 4 rows and 3 columns

8. Check the number of elements in the array

How to use it:

import numpy as np
b=np.array([0,10,20,30,40,50,60,70,80,90])
print(b.size)

9. View array dimensions

import numpy as np
b=np.array([0,10,20,30,40,50,60,70,80,90])
print(b.ndim)
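
For comparison, the same three attributes on a two-dimensional array might look like this. The sketch reuses the 4-row, 3-column array from the shape example:

```python
import numpy as np

a = np.array([[0, 1, 2],
              [4, 5, 6],
              [8, 9, 10],
              [12, 13, 14]])

print(a.shape)   # (4, 3)  rows, columns
print(a.size)    # 12      total number of elements (4 * 3)
print(a.ndim)    # 2       number of dimensions
```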

10. Sorting

(1) sort function

import numpy as np
a=np.array([6,8,4,2,9])
b=np.sort(a)
print(b)
print(a)

The output is:
[2 4 6 8 9]
[6 8 4 2 9]
Note: np.sort returns a sorted copy; the original array remains unchanged
(2) argsort function
Returns the indexes that would sort the array from smallest to largest

import numpy as np
a=np.array([6,8,4,2,9])
b=np.argsort(a)
print(b)
print(a[b])

The output is:
[3 2 0 1 4] #These are the indexes
[2 4 6 8 9]

In addition to the above, there are:

Function                          Syntax
Sum                               np.sum(a) / a.sum()
Maximum value                     np.max(a) / a.max()
Minimum value                     np.min(a) / a.min()
Mean                              np.mean(a) / a.mean()
Standard deviation                np.std(a) / a.std()
Covariance matrix                 np.cov(a,b)
Correlation coefficient matrix    np.corrcoef(a,b)
Absolute value                    np.abs(a)
Exponential (e**x)                np.exp(a)
Median                            np.median(a)
Cumulative sum                    np.cumsum(a)

There are many more functions than these.
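
A few of these functions applied to the array from the sorting example give the following. This is just a quick sketch of typical calls:

```python
import numpy as np

a = np.array([6, 8, 4, 2, 9])

print(a.sum())        # 29
print(a.max())        # 9
print(a.min())        # 2
print(a.mean())       # 5.8
print(np.median(a))   # 6.0
print(np.cumsum(a))   # [ 6 14 18 20 29]
```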

7. Common operations on multi-dimensional arrays

1. Array shape

(1) shape
Syntax: a.shape=row, column #Assigning to shape changes the array in place

import numpy as np
b=np.array([0,10,20,30,40,50,60,70,80,90])
b.shape=2,5
print(b)

The output is:
[[ 0 10 20 30 40]
 [50 60 70 80 90]]
(2) reshape
Syntax: a2=a1.reshape(row, column)

import numpy as np
b=np.array([0,10,20,30,40,50,60,70,80,90])
c=b.reshape(2,5)
print(c)
print(b)

The output is:
[[ 0 10 20 30 40]
 [50 60 70 80 90]]
[ 0 10 20 30 40 50 60 70 80 90]
It can be seen that reshape does not change the shape of the original array; it returns a new array object with the requested shape
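
Two details are worth knowing about reshape. One dimension may be given as -1 so that NumPy infers it. For a contiguous array like this one, the result normally shares its data with the original even though the shapes differ. A minimal sketch:

```python
import numpy as np

b = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

# -1 asks NumPy to infer that dimension (10 elements / 2 rows = 5 columns).
c = b.reshape(2, -1)
print(c.shape)   # (2, 5)
print(b.shape)   # (10,)  the original shape is untouched

# c shares b's data here, so writing to c is visible through b.
c[0, 0] = -1
print(b[0])      # -1
```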

2. Transpose

Transposing swaps the rows and columns; for example, an array with 2 rows and 3 columns becomes one with 3 rows and 2 columns.

import numpy as np
a=np.array([[0,1,2],
            [4,5,6],
            [8,9,10],
            [12,13,14]])
print(a.T) #It can also be written as a.transpose()

The output is:
[[ 0  4  8 12]
 [ 1  5  9 13]
 [ 2  6 10 14]]

3. Array splicing

(1) concatenate
Syntax: np.concatenate((array1,array2,…,arrayn),axis=0)
axis=0: vertical splicing (the arrays are placed one below the other)
axis=1: horizontal splicing (the arrays are placed side by side)
Default axis=0

import numpy as np
a=np.array([[0,1,2,3,4],[5,6,7,8,9]])
b=np.array([[10,11,12,13,14],[15,16,17,18,19]])
c=np.concatenate((a,b),axis=0)
d=np.concatenate((a,b),axis=1)
print(c)
print(d)

The output is:
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
[[ 0  1  2  3  4 10 11 12 13 14]
 [ 5  6  7  8  9 15 16 17 18 19]]
(2) Connect two-dimensional arrays into a three-dimensional array
Syntax: array=np.array((array1, array2, …, arrayn))

import numpy as np
a=np.array([[0,1,2,3,4],[5,6,7,8,9]])
b=np.array([[10,11,12,13,14],[15,16,17,18,19]])
e=np.array((a,b))
print(e)

The output is:
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]]

 [[10 11 12 13 14]
  [15 16 17 18 19]]]
(3) vstack, hstack, dstack
Vertical stacking: np.vstack((array 1, array 2, …, array n))
Horizontal stacking: np.hstack((array 1, array 2, …, array n))
Stacking along a third (depth) axis into a three-dimensional array: np.dstack((array 1, array 2, …, array n))
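
Comparing the resulting shapes is a quick way to see what each function does. The sketch below reuses the arrays a and b from the concatenate example:

```python
import numpy as np

a = np.array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
b = np.array([[10, 11, 12, 13, 14], [15, 16, 17, 18, 19]])

print(np.vstack((a, b)).shape)   # (4, 5)     same result as concatenate with axis=0
print(np.hstack((a, b)).shape)   # (2, 10)    same result as concatenate with axis=1
print(np.dstack((a, b)).shape)   # (2, 5, 2)  the new axis comes last

# np.array((a, b)) also gives a three-dimensional array,
# but with the new axis first instead of last.
print(np.array((a, b)).shape)    # (2, 2, 5)
```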