numpy and pandas – SyntaxBug

1. The difference between numpy and pandas

NumPy is mainly used to create arrays and perform matrix-based numerical operations.

pandas is mainly used to work with tabular data and to carry out more complex data processing.
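As a rough illustration of this difference, the sketch below holds the same values once as a NumPy array and once as a pandas DataFrame. The names arr and table and the labels r1, r2, x, y are made up for this example.

```python
import numpy as np
import pandas as pd

# A NumPy array: homogeneous values addressed by integer position.
arr = np.array([[1, 2], [3, 4]])
total_by_column = arr.sum(axis=0)        # numerical operation over the matrix

# A pandas DataFrame: the same values with labelled rows and columns.
table = pd.DataFrame(arr, index=["r1", "r2"], columns=["x", "y"])
value = table.loc["r2", "x"]             # look up by label instead of position

print(total_by_column)   # [4 6]
print(value)             # 3
```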

2. How to use pandas to import and export excel and csv files

2.1 Import

import pandas as pd
data_1=pd.read_csv('File name.csv')
data_2=pd.read_excel('File name.xlsx')

If there is Chinese content in the data, specify the encoding when reading a CSV file (read_excel takes no encoding argument in current pandas versions)

data_1=pd.read_csv('File name.csv',encoding='gb2312')
data_2=pd.read_excel('File name.xlsx')

2.2 Export

data_1.to_csv('File name.csv')
data_2.to_excel('File name.xlsx')

If you do not want to export the row names or column names of the data, then

data_2.to_excel('File name.xlsx',index=False,header=False)

index=False means that row names will not be exported, header=False means that column names will not be exported
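The effect of index is easy to check without touching the disk. The following sketch writes a small made-up table (df_demo is a name invented here) to in-memory text buffers with to_csv and reads one of them back.

```python
import io
import pandas as pd

df_demo = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# Write to in-memory buffers instead of files on disk.
with_index = io.StringIO()
df_demo.to_csv(with_index)                   # default: row names are written

without_index = io.StringIO()
df_demo.to_csv(without_index, index=False)   # row names are left out

print(with_index.getvalue())
print(without_index.getvalue())

# Reading the index-free text back recovers the two original columns.
restored = pd.read_csv(io.StringIO(without_index.getvalue()))
print(list(restored.columns))                # ['name', 'score']
```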

3. How to create an array and a DataFrame?

3.1 Create array

3.1.1 Convert other types of data

import numpy as np
list_1=[1,2,3,4]
tuple_1=(1,2,3,4)
a=np.array(list_1)
b=np.array(tuple_1)

The result is

[1 2 3 4]
[1 2 3 4]

You can also pass several sequences at once to build a two-dimensional array

c=np.array([list_1,tuple_1])

The result is

[[1 2 3 4]
 [1 2 3 4]]

3.1.2 Direct input

c=np.array([[1,2,3,4],
          [1,2,3,4]])

The result is

[[1 2 3 4]
 [1 2 3 4]]

3.1.3 Generate a matrix whose elements are all 1

d=np.ones((3,4))

(3,4) refers to three rows and four columns

The result is

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]

After adding dtype='int'

d=np.ones((3,4),dtype='int')

The result is

[[1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]]

dtype is the data type; the default is 'float64'

3.1.4 Generate an uninitialized matrix

e=np.empty((3,4))

np.empty allocates the array without initializing it, so the values are arbitrary (often very close to 0) and can differ from run to run. One possible result is

[[3.70908653e+006 7.89614618e+150 1.75556682e+194 3.93173170e+092]
 [3.29368397e+180 2.13780983e+161 7.89614586e+150 1.88557870e+122]
 [8.50322469e+102 3.18287651e-023 1.51359104e-013 5.34039795e-307]]
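Since the contents of np.empty cannot be relied on, a quick way to see the difference is to compare it with np.zeros. This is only a sketch; e_demo and z_demo are names chosen for the example.

```python
import numpy as np

e_demo = np.empty((3, 4))   # memory is allocated but not initialized
z_demo = np.zeros((3, 4))   # memory is allocated and set to 0.0

print(e_demo.shape, e_demo.dtype)   # (3, 4) float64
print(z_demo.sum())                 # 0.0
```

Use np.zeros when the starting values matter, and np.empty only when every element will be overwritten afterwards.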

3.1.5 Generate a matrix of random values between 0 and 1

f=np.random.random((2,4))

The result is

[[0.83300786 0.42734569 0.59351207 0.45089377]
 [0.96777171 0.87880651 0.93020442 0.66950248]]

3.1.6 Generate a matrix whose diagonal elements are 1 and the remaining elements are 0

g=np.eye(4)

The result is

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

3.1.7 arange

h=np.arange(12).reshape(3,4)

The result is

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

3.2 Create DataFrame

3.2.1 Convert a list

3.2.1.1 One Dimension

import pandas as pd
list_1=[1,2,3,4]
df_1=pd.DataFrame(list_1)

The result is

   0
0  1
1  2
2  3
3  4

3.2.1.2 Two Dimensions

list_2=[[1,2,3,4],[1,2,3,4]]
df_2=pd.DataFrame(list_2)

The result is

 0 1 2 3
0 1 2 3 4
1 1 2 3 4

3.2.1.3 Set row names and column names yourself

data_1=[['Blazar',1],['Mebius',2],['Z',3]]
df_1=pd.DataFrame(data_1,columns=['ultraman','cool'],index=[2023,2006,2020])

columns=[] Set the name of the column

index=[] sets the name of the row

The result is

 ultraman cool
2023 Blazar 1
2006 Mebius 2
2020 Z 3

3.2.2 Convert a dictionary

3.2.2.1 Automatically add NaN

data_1=[{'a':1,'b':2},{'a':11,'b':22,'c':33}]
df_1=pd.DataFrame(data_1)

Use the dictionary key as the column header

If there is no element at the corresponding position, it is NaN.

The result is

    a   b     c
0   1   2   NaN
1  11  22  33.0

3.2.2.2 Get elements under a specific header

data_2=[{'a':1,'b':2,'d':3},{'a':11,'b':22,'c':33}]
df_2=pd.DataFrame(data_2,index=['first','second'],columns=['a','b','d'])

The result is

 a b d
first 1 2 3.0
second 11 22 NaN

4. How to extract data from a certain row and column of a DataFrame?

4.1 By row/column name

import numpy as np
import pandas as pd
dates=pd.date_range('20231029',periods=6)
df=pd.DataFrame(np.arange(24).reshape(6,4),index=dates,columns=['A','B','C','D'])

The output is

 A B C D
2023-10-29 0 1 2 3
2023-10-30 4 5 6 7
2023-10-31 8 9 10 11
2023-11-01 12 13 14 15
2023-11-02 16 17 18 19
2023-11-03 20 21 22 23

The data of the row/column can be extracted based on the name of the row/column.

Column:

print(df['A'])
# Equivalent to
print(df.A)

The result is

2023-10-29 0
2023-10-30 4
2023-10-31 8
2023-11-01 12
2023-11-02 16
2023-11-03 20
Freq: D, Name: A, dtype: int32

Row:

print(df.loc['20231031'])
print(df.loc['20231031',['A','B']])

The result is

A 8
B 9
C 10
D 11
Name: 2023-10-31 00:00:00, dtype: int32
A 8
B 9
Name: 2023-10-31 00:00:00, dtype: int32

4.2 Selection by location

print(df.iloc[3:5,1:3]) # Rows 4 and 5, columns 2 and 3
print(df.iloc[[1,3,5],1:3]) # Rows 2, 4 and 6, columns 2 and 3

A slice a:b includes the left endpoint but excludes the right one.

If there is no number on one side of the colon, the slice extends to the end on that side.

The result is

 B C
2023-11-01 13 14
2023-11-02 17 18
             B C
2023-10-30 5 6
2023-11-01 13 14
2023-11-03 21 22
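One point worth checking by hand is that the "left but not right" rule applies to iloc, while label slices with loc include both ends. The sketch below rebuilds the same table under the made-up name df_demo to show this.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20231029", periods=6)
df_demo = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# Position slices exclude the right endpoint: positions 1 and 2, columns A and B.
by_position = df_demo.iloc[1:3, :2]

# Label slices include both endpoints: three dates, columns A and B.
by_label = df_demo.loc["20231030":"20231101", "A":"B"]

print(by_position.shape)   # (2, 2)
print(by_label.shape)      # (3, 2)
```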

5. How to delete data?

DataFrame.drop(labels=None,axis=0, index=None, columns=None, inplace=False)
labels: rows or columns to be deleted, given as a list
axis: The default is 0, which means rows are to be deleted. When deleting columns, axis must be specified as 1.
index: Directly specify the rows to be deleted. To delete multiple rows, you can use a list as a parameter.
columns: Directly specify the columns to be deleted. To delete multiple columns, you can use a list as a parameter.
inplace: Default is False, the deletion operation does not change the original data; when inplace = True, the original data is changed

5.1 Specify through the parameters labels and axis

dates=pd.date_range('20231029',periods=6)
df=pd.DataFrame(np.arange(24).reshape(6,4),index=dates,columns=['A','B','C','D'])
df_1=df.drop('A',axis=1,inplace=False)
df_2=df.drop('20231030',axis=0,inplace=False)

The result is

 B C D
2023-10-29 1 2 3
2023-10-30 5 6 7
2023-10-31 9 10 11
2023-11-01 13 14 15
2023-11-02 17 18 19
2023-11-03 21 22 23
             A B C D
2023-10-29 0 1 2 3
2023-10-31 8 9 10 11
2023-11-01 12 13 14 15
2023-11-02 16 17 18 19
2023-11-03 20 21 22 23

5.2 Specify through index and columns

df_3=df.drop(columns=['A','C'],inplace=False)
df_4=df.drop(index=['20231101','20231102'],inplace=False)

The result is

 B D
2023-10-29 1 3
2023-10-30 5 7
2023-10-31 9 11
2023-11-01 13 15
2023-11-02 17 19
2023-11-03 21 23
             A B C D
2023-10-29 0 1 2 3
2023-10-30 4 5 6 7
2023-10-31 8 9 10 11
2023-11-03 20 21 22 23
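The examples above all use inplace=False. The following sketch, on a small made-up table called df_demo, shows what changes with inplace=True: the original object is modified and drop returns None.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["A", "B", "C"])

# inplace=False (the default): a new DataFrame is returned, df_demo is unchanged.
copy_without_a = df_demo.drop(columns="A")
print(list(df_demo.columns))          # ['A', 'B', 'C']

# inplace=True: df_demo itself loses the column and nothing is returned.
result = df_demo.drop(columns="B", inplace=True)
print(result)                         # None
print(list(df_demo.columns))          # ['A', 'C']
```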

6. How to check, delete and fill missing values in data

Create a DataFrame that contains missing values

dates=pd.date_range('20231029',periods=6)
df=pd.DataFrame(np.arange(24).reshape(6,4),index=dates,columns=['A','B','C','D'])
df.iloc[1,2]=np.nan
df.iloc[0,1]=np.nan

The output is

 A B C D
2023-10-29 0 NaN 2.0 3
2023-10-30 4 5.0 NaN 7
2023-10-31 8 9.0 10.0 11
2023-11-01 12 13.0 14.0 15
2023-11-02 16 17.0 18.0 19
2023-11-03 20 21.0 22.0 23

6.1 Inspection

6.1.1 Return results in matrix form

print(df.isnull())

Each position is True where the value is NaN

 A B C D
2023-10-29 False True False False
2023-10-30 False False True False
2023-10-31 False False False False
2023-11-01 False False False False
2023-11-02 False False False False
2023-11-03 False False False False

6.1.2 Return a single value

Returns True if at least one value is NaN

print(np.any(df.isnull()))

The result is

True

6.2 Delete

print(df.dropna(axis=0,how='any')) # Delete the row where the missing data is located
print(df.dropna(axis=1,how='any')) # Delete the column where the missing data is located

dropna checks for missing data and deletes the rows/columns that contain it.

how='any' deletes a row/column if it contains at least one NaN; how='all' deletes it only if all of its values are NaN

The result is

 A B C D
2023-10-31 8 9.0 10.0 11
2023-11-01 12 13.0 14.0 15
2023-11-02 16 17.0 18.0 19
2023-11-03 20 21.0 22.0 23
             A D
2023-10-29 0 3
2023-10-30 4 7
2023-10-31 8 11
2023-11-01 12 15
2023-11-02 16 19
2023-11-03 20 23
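The table above has no row that is entirely NaN, so how='all' would not remove anything from it. A small made-up table (df_demo) makes the difference between the two settings visible.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "A": [1.0, np.nan, np.nan],
    "B": [2.0, 5.0, np.nan],
})

dropped_any = df_demo.dropna(axis=0, how="any")   # keeps only rows without any NaN
dropped_all = df_demo.dropna(axis=0, how="all")   # removes only the all-NaN row

print(len(dropped_any))   # 1
print(len(dropped_all))   # 2
```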

6.3 Filling

6.3.1 Fill with specified value

print(df.fillna(value=0))

The result is

 A B C D
2023-10-29 0 0.0 2.0 3
2023-10-30 4 5.0 0.0 7
2023-10-31 8 9.0 10.0 11
2023-11-01 12 13.0 14.0 15
2023-11-02 16 17.0 18.0 19
2023-11-03 20 21.0 22.0 23

6.3.2 Replace missing values with the value before them

If axis=1, each missing value is replaced by the previous value in the same row (the one to its left). If axis=0, it is replaced by the previous value in the same column (the one above it).

print(df.ffill(axis=1))

The result is

 A B C D
2023-10-29 0.0 0.0 2.0 3.0
2023-10-30 4.0 5.0 5.0 7.0
2023-10-31 8.0 9.0 10.0 11.0
2023-11-01 12.0 13.0 14.0 15.0
2023-11-02 16.0 17.0 18.0 19.0
2023-11-03 20.0 21.0 22.0 23.0

6.3.3 Fill different columns with different values (rows are also used similarly)

trans={'B':99,'C':88}
print(df.fillna(value=trans))

The result is

 A B C D
2023-10-29 0 99.0 2.0 3
2023-10-30 4 5.0 88.0 7
2023-10-31 8 9.0 10.0 11
2023-11-01 12 13.0 14.0 15
2023-11-02 16 17.0 18.0 19
2023-11-03 20 21.0 22.0 23

At the same time, limit can be used to limit the number of times each column is replaced.
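A minimal sketch of limit, assuming a made-up table df_demo with two missing values per column: with limit=1 only the first NaN in each column is replaced.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "B": [np.nan, np.nan, 3.0],
    "C": [np.nan, 2.0, np.nan],
})

# Fill each column with its own value, but at most one NaN per column.
filled = df_demo.fillna(value={"B": 99, "C": 88}, limit=1)

print(filled)
print(filled.isna().sum().tolist())   # one NaN is left in each column
```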

6.3.4 Use the mean( ) method to fill in missing values

If only one column is specified, all missing values will be filled with the mean of that column.

print(df.fillna(df.mean()['A']))

The result is

 A B C D
2023-10-29 0 10.0 2.0 3
2023-10-30 4 5.0 10.0 7
2023-10-31 8 9.0 10.0 11
2023-11-01 12 13.0 14.0 15
2023-11-02 16 17.0 18.0 19
2023-11-03 20 21.0 22.0 23

If multiple columns are specified, each of those columns will be filled with its own average value.

print(df.fillna(df.mean()['B':'C']))

The result is

 A B C D
2023-10-29 0 13.0 2.0 3
2023-10-30 4 5.0 13.2 7
2023-10-31 8 9.0 10.0 11
2023-11-01 12 13.0 14.0 15
2023-11-02 16 17.0 18.0 19
2023-11-03 20 21.0 22.0 23

If not specified, all columns will be filled with the average value of the column.

print(df.fillna(df.mean()))
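No output is shown for this last case, so here is a small sketch of it. The table df_demo is made up; df.mean() skips NaN when computing each column's average.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "B": [np.nan, 5.0, 9.0],
    "C": [2.0, np.nan, 10.0],
})

col_means = df_demo.mean()           # B: 7.0, C: 6.0 (NaN values are ignored)
filled = df_demo.fillna(col_means)   # each column is filled with its own mean

print(filled)
```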

7. How to find the sum, average, maximum and minimum value of data

import numpy as np
d=np.random.random((2,4))

d is a 2x4 array of random numbers between 0 and 1

The output is

[[0.3069221 0.5959544 0.13120364 0.24391419]
 [0.47460634 0.29857938 0.45013492 0.59576954]]

print(np.sum(d,axis=0)) # Sum of each column
print(np.sum(d,axis=1)) # Sum of each row
print(np.sum(d)) # Sum of all data
print(np.average(d,axis=0)) # Average of each column
print(np.average(d,axis=1)) # Average of each row
print(np.average(d)) # Average of all data

The output is

[0.78152843 0.89453377 0.58133856 0.83968373]
[1.27799433 1.81909017]
3.0970845024684874
[0.39076422 0.44726689 0.29066928 0.41984187]
[0.31949858 0.45477254]
0.38713556280856093

print(np.max(d,axis=0)) # Maximum of each column
print(np.max(d,axis=1)) # Maximum of each row
print(np.max(d)) # Maximum of all data
print(np.min(d,axis=0)) # Minimum of each column
print(np.min(d,axis=1)) # Minimum of each row
print(np.min(d)) # Minimum of all data

The output is

[0.47460634 0.5959544 0.45013492 0.59576954]
[0.5959544 0.59576954]
0.5959543966467485
[0.3069221 0.29857938 0.13120364 0.24391419]
[0.13120364 0.29857938]
0.13120364028532616

Index of maximum and minimum values

A=np.arange(2,14).reshape(3,4)
print(np.argmin(A)) # Index of the minimum value, counted in the flattened matrix
print(np.argmax(A)) # Index of the maximum value, counted in the flattened matrix

The result is

0
11
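The indices 0 and 11 refer to positions in the flattened array. If a (row, column) position is needed instead, one option is np.unravel_index, sketched below with the made-up name A_demo.

```python
import numpy as np

A_demo = np.arange(2, 14).reshape(3, 4)

flat_pos = np.argmax(A_demo)                          # position in the flattened array
row, col = np.unravel_index(flat_pos, A_demo.shape)   # convert to (row, column)
per_column = np.argmax(A_demo, axis=0)                # row of the maximum in each column

print(flat_pos)      # 11
print(row, col)      # 2 3
print(per_column)    # [2 2 2 2]
```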

8. How to merge two Dataframes

8.1 concat

8.1.1 Columns have the same names

df_1=pd.DataFrame(np.ones((3,4))*0,columns=['a','b','c','d'])
df_2=pd.DataFrame(np.ones((3,4))*1,columns=['a','b','c','d'])
df_3=pd.DataFrame(np.ones((3,4))*2,columns=['a','b','c','d'])
# axis=0 stacks the tables vertically, axis=1 joins them side by side; ignore_index=True discards the original index and renumbers the rows
res=pd.concat([df_1,df_2,df_3],axis=0,ignore_index=True)

The result is

 a b c d
0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0
3 1.0 1.0 1.0 1.0
4 1.0 1.0 1.0 1.0
5 1.0 1.0 1.0 1.0
6 2.0 2.0 2.0 2.0
7 2.0 2.0 2.0 2.0
8 2.0 2.0 2.0 2.0

8.1.2 Columns have different names

df_4=pd.DataFrame(np.ones((3,4))*0,columns=['a','b','c','d'],index=[1,2,3])
df_5=pd.DataFrame(np.ones((3,4))*1,columns=['b','c','d','e'],index=[2,3,4])
res_1=pd.concat([df_4,df_5],join='outer')
res_2=pd.concat([df_4,df_5],join='inner',ignore_index=True)

join accepts 'inner' or 'outer'

join='outer' keeps all columns and fills the positions that do not exist with NaN

join='inner' keeps only the columns that exist in every DataFrame

The result is

 a b c d e
1 0.0 0.0 0.0 0.0 NaN
2 0.0 0.0 0.0 0.0 NaN
3 0.0 0.0 0.0 0.0 NaN
2 NaN 1.0 1.0 1.0 1.0
3 NaN 1.0 1.0 1.0 1.0
4 NaN 1.0 1.0 1.0 1.0
     b c d
0 0.0 0.0 0.0
1 0.0 0.0 0.0
2 0.0 0.0 0.0
3 1.0 1.0 1.0
4 1.0 1.0 1.0
5 1.0 1.0 1.0
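The examples above all merge up and down (axis=0). As a sketch of the left-and-right case, the code below concatenates two small made-up tables with axis=1; here join decides which row names are kept rather than which columns.

```python
import pandas as pd

left_part = pd.DataFrame({"a": [0, 0]}, index=[1, 2])
right_part = pd.DataFrame({"b": [1, 1]}, index=[2, 3])

# join='outer' (the default): all row names are kept, gaps become NaN.
side_by_side = pd.concat([left_part, right_part], axis=1)

# join='inner': only row names present in both tables are kept.
common_rows = pd.concat([left_part, right_part], axis=1, join="inner")

print(side_by_side)
print(common_rows)
```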

8.2 merge

In merge, on='...' names the column(s) used as the key for the merge.

8.2.1 Merge on one key

left=pd.DataFrame({'key':['K0','K1','K2','K3'],
                   'A':['A0','A1','A2','A3'],
                   'B':['B0','B1','B2','B3']})
right=pd.DataFrame({'key':['K0','K1','K2','K3'],
                   'C':['C0','C1','C2','C3'],
                   'D':['D0','D1','D2','D3']})

At this time, left and right are respectively

 key A B
0 K0 A0 B0
1 K1 A1 B1
2 K2 A2 B2
3 K3 A3 B3
  key C D
0 K0 C0 D0
1 K1 C1 D1
2 K2 C2 D2
3 K3 C3 D3

merge

res_1=pd.merge(left,right,on='key')

The result is

 key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3

If the key columns of the two tables do not contain exactly the same values

For example, change right to

right=pd.DataFrame({'key':['K0','K1','K2','K4'],
                   'C':['C0','C1','C2','C3'],
                   'D':['D0','D1','D2','D3']})

res_2=pd.merge(left,right,on='key',how='inner')
res_3=pd.merge(left,right,on='key',how='outer')

The result is

 key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
  key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 NaN NaN
4 K4 NaN NaN C3 D3

8.2.2 Merge on two keys

left=pd.DataFrame({'key1':['K0','K0','K1','K2'],
                   'key2':['K0','K1','K0','K1'],
                   'A':['A0','A1','A2','A3'],
                   'B':['B0','B1','B2','B3']})
right=pd.DataFrame({'key1':['K0','K1','K1','K2'],
                    'key2':['K0','K0','K0','K0'],
                    'C':['C0','C1','C2','C3'],
                    'D':['D0','D1','D2','D3']})

The output is

 key1 key2 A B
0 K0 K0 A0 B0
1 K0 K1 A1 B1
2 K1 K0 A2 B2
3 K2 K1 A3 B3
  key1 key2 C D
0 K0 K0 C0 D0
1 K1 K0 C1 D1
2 K1 K0 C2 D2
3 K2 K0 C3 D3

how accepts 'left', 'right', 'outer' or 'inner'

left: use the key1/key2 combinations of the left data
right: use the key1/key2 combinations of the right data
outer: use all key1/key2 combinations
inner: use only the key1/key2 combinations common to both

res=pd.merge(left,right,on=['key1','key2'],how='right')

The result is

 key1 key2 A B C D
0 K0 K0 A0 B0 C0 D0
1 K1 K0 A2 B2 C1 D1
2 K1 K0 A2 B2 C2 D2
3 K2 K0 NaN NaN C3 D3
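To compare the four settings of how on the same data, the sketch below rebuilds the two tables (under the made-up names left_demo and right_demo) and counts the rows each merge produces.

```python
import pandas as pd

left_demo = pd.DataFrame({"key1": ["K0", "K0", "K1", "K2"],
                          "key2": ["K0", "K1", "K0", "K1"],
                          "A": ["A0", "A1", "A2", "A3"],
                          "B": ["B0", "B1", "B2", "B3"]})
right_demo = pd.DataFrame({"key1": ["K0", "K1", "K1", "K2"],
                           "key2": ["K0", "K0", "K0", "K0"],
                           "C": ["C0", "C1", "C2", "C3"],
                           "D": ["D0", "D1", "D2", "D3"]})

# Number of rows in the merged table for each value of how.
row_counts = {
    how: len(pd.merge(left_demo, right_demo, on=["key1", "key2"], how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)   # {'inner': 3, 'left': 5, 'right': 4, 'outer': 6}
```

The combination (K1, K0) appears once on the left and twice on the right, so it contributes two rows to every result.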

8.2.3 Show where each row comes from

The default is indicator=False

df_1=pd.DataFrame({'col1':[0,1],'col_left':['a','b']})
df_2=pd.DataFrame({'col1':[1,2,2],'col_right':[2,2,2]})

The output is

 col1 col_left
0 0 a
1 1 b
   col1 col_right
0 1 2
1 2 2
2 2 2

merge

res=pd.merge(df_1,df_2,on='col1',how='outer',indicator=True)

The output is

 col1 col_left col_right _merge
0 0 a NaN left_only
1 1 b 2.0 both
2 2 NaN 2.0 right_only
3 2 NaN 2.0 right_only

You can change the displayed name, such as changing '_merge' to 'indicator_column'

res=pd.merge(df_1,df_2,on='col1',how='outer',indicator='indicator_column')

The output is

 col1 col_left col_right indicator_column
0 0 a NaN left_only
1 1 b 2.0 both
2 2 NaN 2.0 right_only
3 2 NaN 2.0 right_only

8.2.4 Merge on the row names (index)

left_1=pd.DataFrame({'A':['A0','A1','A2'],
                     'B':['B0','B1','B2']},
                    index=['K0','K1','K2'])
right_1=pd.DataFrame({'C':['C0','C2','C3'],
                      'D':['D0','D2','D3']},
                     index=['K0','K2','K3'])

The output is

 A B
K0 A0 B0
K1 A1 B1
K2 A2 B2
     C D
K0 C0 D0
K2 C2 D2
K3 C3 D3

merge

res_5=pd.merge(left_1,right_1,left_index=True,right_index=True,how='outer')

The result is

 A B C D
K0 A0 B0 C0 D0
K1 A1 B1 NaN NaN
K2 A2 B2 C2 D2
K3 NaN NaN C3 D3

8.2.5 Give different names to columns that share a name in the original data

boys=pd.DataFrame({'k':['K0','K1','K2'],'age':[1,2,3]})
girls=pd.DataFrame({'k':['K0','K0','K3'],'age':[4,5,6]})

The output is

    k  age
0  K0    1
1  K1    2
2  K2    3
    k  age
0  K0    4
1  K0    5
2  K3    6

merge

res=pd.merge(boys,girls,on='k',suffixes=['_boy','_girl'],how='outer')

The result is

k age_boy age_girl
0 K0 1.0 4.0
1 K0 1.0 5.0
2 K1 2.0 NaN
3 K2 3.0 NaN
4 K3 NaN 6.0

9. How to sort data

df=pd.DataFrame({'A':1.,
                 'B':pd.Timestamp('20231025'),
                 'C':np.array([3]*4,dtype='int64'),
                 'D':pd.Categorical(['test','train','test','train']),
                 'E':[2,5,1,3]})

The output is

 A B C D E
0 1.0 2023-10-25 3 test 2
1 1.0 2023-10-25 3 train 5
2 1.0 2023-10-25 3 test 1
3 1.0 2023-10-25 3 train 3

9.1 Sort by row name or column name

axis=0/1 sorts by the row names/column names; ascending=True/False sorts in ascending/descending order

print(df.sort_index(axis=1,ascending=False))
print(df.sort_index(axis=0, ascending=False))

The output is

 E D C B A
0 2 test 3 2023-10-25 1.0
1 5 train 3 2023-10-25 1.0
2 1 test 3 2023-10-25 1.0
3 3 train 3 2023-10-25 1.0
     A B C D E
3 1.0 2023-10-25 3 train 3
2 1.0 2023-10-25 3 test 1
1 1.0 2023-10-25 3 train 5
0 1.0 2023-10-25 3 test 2

9.2 Sort by the value of a specific row/column

print(df.sort_values(by='E'))

The result is

 A B C D E
2 1.0 2023-10-25 3 test 1
0 1.0 2023-10-25 3 test 2
3 1.0 2023-10-25 3 train 3
1 1.0 2023-10-25 3 train 5

9.3 Extract a sorted column

print(df.sort_values(by='E')['E'])

The result is

2 1
0 2
3 3
1 5
Name: E, dtype: int64
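sort_values also accepts several columns at once, each with its own direction. A minimal sketch, using only the D and E columns of the table above under the made-up name df_demo:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "D": ["test", "train", "test", "train"],
    "E": [2, 5, 1, 3],
})

# Sort by D in ascending order, then by E in descending order within each D.
sorted_demo = df_demo.sort_values(by=["D", "E"], ascending=[True, False])

print(sorted_demo)
print(sorted_demo.index.tolist())   # [0, 2, 1, 3]
```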

10. How to compute grouped sums of a DataFrame

df = pd.DataFrame({"Fruits":["apple","banana","apple","pear","apple","banana"],"Numbers_1":[5,8,9,3,4,5],'Numbers_2':[11,22,33,44,55,66]})

The output is

 Fruits Numbers_1 Numbers_2
0 apple 5 11
1 banana 8 22
2 apple 9 33
3 pear 3 44
4 apple 4 55
5 banana 5 66

10.1 Use the sum() function directly

df_1=df.groupby(['Fruits'])['Numbers_1'].sum()

The result is

Fruits
apple 18
banana 13
pear 3
Name: Numbers_1, dtype: int64

The result obtained has only the index, no column names, and is of Series type.

10.2 Using the aggregate function agg

df_2=df.groupby(['Fruits']).agg({'Numbers_2':'sum'})

agg({'column name':'function name'})

The result is

Numbers_2
Fruits
apple 99
banana 88
pear 44
Index(['Numbers_2'], dtype='object')

The result has both index and column name, which is of DataFrame type.
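Because agg takes a dictionary, different columns can use different functions in one call. The sketch below rebuilds the fruit table under the made-up name df_demo and combines a sum with a mean.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Fruits": ["apple", "banana", "apple", "pear", "apple", "banana"],
    "Numbers_1": [5, 8, 9, 3, 4, 5],
    "Numbers_2": [11, 22, 33, 44, 55, 66],
})

# Sum Numbers_1 and average Numbers_2 within each fruit.
summary = df_demo.groupby("Fruits").agg({"Numbers_1": "sum", "Numbers_2": "mean"})

print(summary)
print(summary.loc["apple", "Numbers_1"])   # 18
print(summary.loc["apple", "Numbers_2"])   # 33.0
```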

11. Plotting

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pylab import mpl
# Change the default font to Chinese
mpl.rcParams['font.sans-serif'] = ['SimHei']

11.1 Scatter plot

# Scatter plot scatter() function
# scatter(x, y, scale, color, marker, label)
n=1024
x1=np.random.normal(0,1,n)
y1=np.random.normal(0,1,n)
# If there is no marker, use small dots to mark it.
plt.scatter(x1,y1,color='blue',marker='*',label='normal distribution')
plt.title('Standard Normal Distribution',fontsize=20)
# Solve the problem of negative signs not being displayed
plt.rcParams['axes.unicode_minus']=False
# Set text
plt.text(2.5,2.5,'Mean: 0\nStandard deviation: 1')
#Set the coordinate axis range
plt.xlim(-4,4)
plt.ylim(-4,4)
# Set axis label text
plt.xlabel('abscissa x',fontsize=14)
plt.ylabel('vertical coordinate y',fontsize=14)
# Evenly distributed
x2 = np.random.uniform(-4, 4, (1, n))
y2 = np.random.uniform(-4, 4, (1, n))
plt.scatter(x2,y2,color='yellow',label='uniformly distributed')
plt.legend()
plt.show()

The output is

11.2 Line chart

# Line chart: plot() function
# plot(x, y, color, marker, label, linewidth, markersize)
# Generate random number sequences
n=24
y3=np.random.randint(27,37,n)
y4=np.random.randint(40,60,n)
plt.plot(y3,label='Temperature')
plt.plot(y4,label='Humidity')
# Axis range
plt.xlim(0,23)
plt.ylim(20,70)
# Axis label text
plt.xlabel('hour',fontsize=12)
plt.ylabel('Measurement value',fontsize=12)
plt.title('24-hour temperature statistics',fontsize=16)
plt.legend()
plt.show()

The output is

11.3 Bar chart

11.3.1 Ordinary bar chart

# bar(x, height, width, facecolor, edgecolor, label)
y1=[32, 25, 16, 30, 24, 45, 40, 33, 28, 17, 24, 20]
y2=[-23, -35, -26, -35, -45, -43, -35, -32, -23, -17, -22, -28]
# The first argument gives the x position of each bar
plt.bar(range(len(y1)),y1,width=0.8,facecolor='green',edgecolor='white',label='statistic 1')
plt.bar(range(len(y2)),y2,width=0.8,facecolor='red',edgecolor='white',label='statistic 2')
plt.title('Bar chart',fontsize=20)
plt.legend()
plt.show()

The output is

11.3.2 Side-by-side bar chart

label_list=['2020','2021','2022','2023']
num_list_1=[20,30,15,35]
num_list_2=[15,30,40,20]
x=range(len(num_list_1))
rects1=plt.bar(x=x,height=num_list_1,width=0.4,alpha=0.8,color='red',label='First Department')
rects2=plt.bar(x=[i + 0.4 for i in x],height=num_list_2,width=0.4,color='green',label='Second Department')
plt.ylim(0,50) #y-axis value range
plt.ylabel('Quantity')
#Set the x-axis scale display value, Parameter 1: Midpoint coordinate Parameter 2: Display value
plt.xticks([index + 0.2 for index in x],label_list)
plt.xlabel('Year')
plt.title('XX company')
plt.legend() # Show the legend
# Write the height above each bar
for rect in rects1:
    height=rect.get_height()
    plt.text(rect.get_x() + rect.get_width()/2,height + 1,str(height),ha='center',va='bottom')
for rect in rects2:
    height=rect.get_height()
    plt.text(rect.get_x() + rect.get_width()/2,height + 1,str(height),ha='center',va='bottom')
plt.show()

The output is

11.3.3 Stacked Column Chart

label_list=['2020','2021','2022','2023']
num_list_1=[20,30,15,35]
num_list_2=[15,30,40,20]
x=range(len(num_list_1))
rects_1=plt.bar(x=x,height=num_list_1,width=0.45,alpha=0.8,color='red',label='First Department')
rects_2=plt.bar(x=x,height=num_list_2,width=0.45,color='green',label='Second Department',bottom=num_list_1)
plt.ylim(0,80)
plt.ylabel('Quantity')
plt.xticks(x,label_list)
plt.xlabel('Year')
plt.title('XX company')
plt.legend()
plt.show()

The output is

11.3.4 Mixed Bar and Line Chart

x=[2,4,6,8]
y=[450,500,200,1000]
# Draw the bar chart
plt.bar(x=x,height=y,label='Book Library Encyclopedia',color='steelblue',alpha=0.8)
# Display the value above each bar. The ha parameter controls the horizontal alignment and va controls the vertical alignment.
for x1,yy in zip(x,y):
    plt.text(x1,yy + 1,str(yy),ha='center',va='bottom',fontsize=20,rotation=0)
# Set title
plt.title('80 Novel Activity')
# Set names for the two axes
plt.xlabel('Release date')
plt.ylabel('Number of novels')
# Show legend
plt.legend()
# Draw a line chart
plt.plot(x,y,'r',marker='*',ms=10,label='a')
plt.xticks(rotation=45)
plt.legend(loc='upper left')
plt.savefig('a.jpg')
plt.show()

The output is

11.4 Pie Chart

labels=['Entertainment','Parenting','Food','Mortgage','Transportation','Others']
sizes=[2,5,12,70,2,9]
# Whether each pie chart moves away from the center
explode=(0,0,0,0.1,0,0) # The 4th pie chart moves away from the center
colors=['r','g','y','b','r'] # Customize the color list, and finally colors=colors in pie
# autopct controls the percentage settings in the pie chart
# '%1.1f' keeps one digit after the decimal point; '%1.2f%%' keeps two digits and appends a percent sign %
# startangle is the starting drawing angle. The default is to draw counterclockwise from the positive direction of the x-axis. If startangle=90 is set, it will be drawn from the positive direction of the y-axis.
# counterclock specifies the pointer direction, the default is True counterclockwise, False is clockwise
# labeldistance is the label drawing position; if it is less than 1, the label is drawn inside the pie chart; the default value is 1.1
# radius controls the radius of the pie chart, floating point type, optional parameter, defaults to None and is set to 1
# pctdistance specifies the position scale of autopct, the default value is 0.6
# textprops sets the format of labels and proportional text, dictionary type such as textprops={'fontsize':20,'color':'black'}
#Add legend
# loc = 'upper right' is located in the upper right corner
#ncol=2 divide into two columns
# borderaxespad = 0.3 legend padding
# bbox_to_anchor=[0.5, 0.5] # Margin top right
#Make the length and width of the pie chart equal (it seems to work without it)
plt.axis('equal')
plt.pie(sizes,explode=explode,labels=labels,autopct='%1.1f%%',shadow=False,startangle=150)
# Add the legend after the pie is drawn so that it has entries to show
plt.legend(loc='upper right',fontsize=10,bbox_to_anchor=(1.1,1.05),borderaxespad=0.3)
plt.title('Pie Chart Example-Household Expenditure in October')
plt.show()

The output is