1. The difference between NumPy and pandas
NumPy is mainly used to create arrays and perform matrix-based numerical operations.
pandas is mainly used to process tabular data and carry out more complex data handling.
2. How to use pandas to import and export excel and csv files
2.1 Import
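import pandas as pd
data_1=pd.read_csv('File name.csv')
data_2=pd.read_excel('File name.xlsx')
If a CSV file contains Chinese text saved in a GB encoding, specify the encoding when reading it. In current pandas versions read_excel does not take an encoding argument, so Excel files are read as before.
data_1=pd.read_csv('File name.csv',encoding='gb2312')
data_2=pd.read_excel('File name.xlsx')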
2.2 Export
data_1.to_csv('File name.csv')
data_2.to_excel('File name.xlsx')
If you do not want to export the row names or column names of the data, then
data_2.to_excel('File name.xlsx',index=False,header=False)
index=False means that row names will not be exported; header=False means that column names will not be exported.
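The export options can be tried without touching the disk. The following is a minimal sketch, assuming an in-memory text buffer (io.StringIO) stands in for a real file; the column names and values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data, used only for this illustration.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False: do not write the row labels

# Rewind the buffer and read the CSV text back into a new DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
print(restored.equals(df))  # the round trip preserves the data
```

Because the row labels were not written, read_csv creates a fresh 0..n-1 index, which here matches the original one.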
3. How to create an array and a DataFrame
3.1 Create array
3.1.1 Convert other types of data
import numpy as np
list_1=[1,2,3,4]
tuple_1=(1,2,3,4)
a=np.array(list_1)
b=np.array(tuple_1)
The result is
[1 2 3 4]
[1 2 3 4]
You can also pass several sequences at once to build a two-dimensional array
c=np.array([list_1,tuple_1])
The result is
[[1 2 3 4]
 [1 2 3 4]]
3.1.2 Direct input
c=np.array([[1,2,3,4], [1,2,3,4]])
The result is
[[1 2 3 4]
 [1 2 3 4]]
3.1.3 Generate a matrix of ones
d=np.ones((3,4))
(3,4) means three rows and four columns
The result is
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
After adding dtype='int'
d=np.ones((3,4),dtype='int')
The result is
[[1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]]
dtype is the data type of the elements; the default is 'float64'
3.1.4 Generate an uninitialized matrix
np.empty only allocates memory and does not set the values, so the contents are arbitrary (they may be very small, very large, or anything in between) and must be assigned before use.
e=np.empty((3,4))
The result is, for example,
[[3.70908653e+006 7.89614618e+150 1.75556682e+194 3.93173170e+092]
 [3.29368397e+180 2.13780983e+161 7.89614586e+150 1.88557870e+122]
 [8.50322469e+102 3.18287651e-023 1.51359104e-013 5.34039795e-307]]
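Since the contents of np.empty cannot be relied on, a typical pattern is to fill the array before reading it. A small sketch of that pattern, with np.zeros shown as the alternative when zeros are what is wanted:

```python
import numpy as np

e = np.empty((3, 4))  # allocated only; the initial contents are arbitrary
e[:] = 0.0            # overwrite every element before reading the array

z = np.zeros((3, 4))  # allocated and zero-filled in a single step

print(np.array_equal(e, z))
```

np.empty mainly pays off when every element is about to be overwritten anyway, because it skips the initialization step.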
3.1.5 Generate a matrix of random elements between 0 and 1
f=np.random.random((2,4))
The result is, for example,
[[0.83300786 0.42734569 0.59351207 0.45089377]
 [0.96777171 0.87880651 0.93020442 0.66950248]]
3.1.6 Generate a matrix whose diagonal elements are 1 and the remaining elements are 0
g=np.eye(4)
The result is
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
3.1.7 arange
h=np.arange(12).reshape(3,4)
The result is
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
3.2 Create DataFrame
3.2.1 Convert a list
3.2.1.1 One dimension
import pandas as pd
list_1=[1,2,3,4]
df_1=pd.DataFrame(list_1)
The result is
   0
0  1
1  2
2  3
3  4
3.2.1.2 Two dimensions
list_2=[[1,2,3,4],[1,2,3,4]]
df_2=pd.DataFrame(list_2)
The result is
   0  1  2  3
0  1  2  3  4
1  1  2  3  4
3.2.1.3 Set row names and column names yourself
data_1=[['Blazar',1],['Mebius',2],['Z',3]]
df_1=pd.DataFrame(data_1,columns=['ultraman','cool'],index=[2023,2006,2020])
columns=[] sets the column names
index=[] sets the row names
The result is
     ultraman  cool
2023   Blazar     1
2006   Mebius     2
2020        Z     3
3.2.2 Convert dictionaries
3.2.2.1 Automatically add NaN
data_1=[{'a':1,'b':2},{'a':11,'b':22,'c':33}]
df_1=pd.DataFrame(data_1)
The dictionary keys are used as the column headers.
If a dictionary has no value for a column, that position is filled with NaN.
The result is
    a   b     c
0   1   2   NaN
1  11  22  33.0
3.2.2.2 Get elements under specific headers
data_2=[{'a':1,'b':2,'d':3},{'a':11,'b':22,'c':33}]
df_2=pd.DataFrame(data_2,index=['first','second'],columns=['a','b','d'])
Only the keys listed in columns are kept, so 'c' is dropped here.
The result is
         a   b    d
first    1   2  3.0
second  11  22  NaN
4. How to extract data from a certain row and column of a DataFrame?
4.1 By row/column name
import numpy as np
import pandas as pd
dates=pd.date_range('20231029',periods=6)
df=pd.DataFrame(np.arange(24).reshape(6,4),index=dates,columns=['A','B','C','D'])
The output is
             A   B   C   D
2023-10-29   0   1   2   3
2023-10-30   4   5   6   7
2023-10-31   8   9  10  11
2023-11-01  12  13  14  15
2023-11-02  16  17  18  19
2023-11-03  20  21  22  23
The data of a row/column can be extracted based on the name of the row/column.
Column:
print(df['A']) # Equivalent to print(df.A)
The result is
2023-10-29     0
2023-10-30     4
2023-10-31     8
2023-11-01    12
2023-11-02    16
2023-11-03    20
Freq: D, Name: A, dtype: int32
Row:
print(df.loc['20231031'])
print(df.loc['20231031',['A','B']])
The result is
A     8
B     9
C    10
D    11
Name: 2023-10-31 00:00:00, dtype: int32
A    8
B    9
Name: 2023-10-31 00:00:00, dtype: int32
4.2 Selection by location
print(df.iloc[3:5,1:3]) # Rows 4 and 5, columns 2 and 3
print(df.iloc[[1,3,5],1:3]) # Rows 2, 4, 6, columns 2 and 3
A slice start:stop includes the start position but excludes the stop position.
If there is no number on one side of the colon, everything on that side is taken.
The result is
             B   C
2023-11-01  13  14
2023-11-02  17  18
             B   C
2023-10-30   5   6
2023-11-01  13  14
2023-11-03  21  22
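One point that is easy to trip over: position slices with iloc exclude the stop, while label slices with loc include both ends. A short sketch using the same df as above, assuming the string labels are parsed as dates by the DatetimeIndex:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20231029', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=['A', 'B', 'C', 'D'])

# Position-based: rows 3 and 4, columns 1 and 2 (the stop position is excluded).
by_position = df.iloc[3:5, 1:3]

# Label-based: both ends of each label slice are included.
by_label = df.loc['20231101':'20231102', 'B':'C']

print(by_position.equals(by_label))
```

Both selections pick the same four cells, which is why the comparison holds.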
5. How to delete data?
DataFrame.drop(labels=None, axis=0, index=None, columns=None, inplace=False)
labels: the row or column labels to delete; pass a list to delete several
axis: the default is 0, which deletes rows. To delete columns with labels, axis must be set to 1.
index: directly specifies the rows to delete. To delete multiple rows, pass a list.
columns: directly specifies the columns to delete. To delete multiple columns, pass a list.
inplace: the default is False, in which case the original data is unchanged and a new DataFrame is returned; when inplace=True, the original data is modified and nothing is returned
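The effect of inplace is easiest to see side by side. A minimal sketch on a small made-up frame (the frame and variable names are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['A', 'B', 'C', 'D'])

# inplace=False (the default): a new DataFrame is returned, df is untouched.
copy_without_a = df.drop(columns='A')
print(list(df.columns))              # still has all four columns

# inplace=True: df itself is modified and the call returns None.
result = df.drop(columns='B', inplace=True)
print(result, list(df.columns))
```

Assigning the return value of an inplace=True call is a common mistake, since that value is always None.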
5.1 Specify through the parameters labels and axis
dates=pd.date_range('20231029',periods=6)
df=pd.DataFrame(np.arange(24).reshape(6,4),index=dates,columns=['A','B','C','D'])
df_1=df.drop('A',axis=1,inplace=False)
df_2=df.drop('20231030',axis=0,inplace=False)
The result is
             B   C   D
2023-10-29   1   2   3
2023-10-30   5   6   7
2023-10-31   9  10  11
2023-11-01  13  14  15
2023-11-02  17  18  19
2023-11-03  21  22  23
             A   B   C   D
2023-10-29   0   1   2   3
2023-10-31   8   9  10  11
2023-11-01  12  13  14  15
2023-11-02  16  17  18  19
2023-11-03  20  21  22  23
5.2 Specify through index and columns
df_3=df.drop(columns=['A','C'],inplace=False)
df_4=df.drop(index=['20231101','20231102'],inplace=False)
The result is
             B   D
2023-10-29   1   3
2023-10-30   5   7
2023-10-31   9  11
2023-11-01  13  15
2023-11-02  17  19
2023-11-03  21  23
             A   B   C   D
2023-10-29   0   1   2   3
2023-10-30   4   5   6   7
2023-10-31   8   9  10  11
2023-11-03  20  21  22  23
6. How to check, delete and fill missing values in data
Create a DataFrame with missing values
dates=pd.date_range('20231029',periods=6)
df=pd.DataFrame(np.arange(24).reshape(6,4),index=dates,columns=['A','B','C','D'])
df.iloc[1,2]=np.nan
df.iloc[0,1]=np.nan
The output is
             A     B     C   D
2023-10-29   0   NaN   2.0   3
2023-10-30   4   5.0   NaN   7
2023-10-31   8   9.0  10.0  11
2023-11-01  12  13.0  14.0  15
2023-11-02  16  17.0  18.0  19
2023-11-03  20  21.0  22.0  23
6.1 Inspection
6.1.1 Return results in matrix form
print(df.isnull())
Each position is True if the value there is NaN
                A      B      C      D
2023-10-29  False   True  False  False
2023-10-30  False  False   True  False
2023-10-31  False  False  False  False
2023-11-01  False  False  False  False
2023-11-02  False  False  False  False
2023-11-03  False  False  False  False
6.1.2 Return a single value
Returns True if at least one value is NaN
print(np.any(df.isnull()))
The result is
True
6.2 Delete
print(df.dropna(axis=0,how='any')) # Delete the rows that contain missing data
print(df.dropna(axis=1,how='any')) # Delete the columns that contain missing data
dropna checks for missing data and deletes the rows/columns where it occurs.
how='any' deletes a row/column if it contains at least one NaN; how='all' deletes it only if every value in it is NaN
The result is
             A     B     C   D
2023-10-31   8   9.0  10.0  11
2023-11-01  12  13.0  14.0  15
2023-11-02  16  17.0  18.0  19
2023-11-03  20  21.0  22.0  23
             A   D
2023-10-29   0   3
2023-10-30   4   7
2023-10-31   8  11
2023-11-01  12  15
2023-11-02  16  19
2023-11-03  20  23
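In the frame above no row is entirely NaN, so how='all' would remove nothing. The difference between the two settings shows up in a small made-up frame like the following sketch:

```python
import numpy as np
import pandas as pd

# Row 0 is complete, row 1 has one NaN, row 2 is entirely NaN.
df = pd.DataFrame({'A': [1.0, np.nan, np.nan],
                   'B': [2.0, 5.0, np.nan]})

any_dropped = df.dropna(axis=0, how='any')  # drops rows 1 and 2
all_dropped = df.dropna(axis=0, how='all')  # drops only row 2

print(any_dropped.index.tolist())
print(all_dropped.index.tolist())
```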
6.3 Filling
6.3.1 Fill with specified value
print(df.fillna(value=0))
The result is
             A     B     C   D
2023-10-29   0   0.0   2.0   3
2023-10-30   4   5.0   0.0   7
2023-10-31   8   9.0  10.0  11
2023-11-01  12  13.0  14.0  15
2023-11-02  16  17.0  18.0  19
2023-11-03  20  21.0  22.0  23
6.3.2 Replace missing values with the value before them
If axis=1, the previous value in the same row (to the left) replaces the missing value. If axis=0, the previous value in the same column (above) replaces the missing value.
print(df.ffill(axis=1))
The result is
               A     B     C     D
2023-10-29   0.0   0.0   2.0   3.0
2023-10-30   4.0   5.0   5.0   7.0
2023-10-31   8.0   9.0  10.0  11.0
2023-11-01  12.0  13.0  14.0  15.0
2023-11-02  16.0  17.0  18.0  19.0
2023-11-03  20.0  21.0  22.0  23.0
6.3.3 Fill different columns with different values
trans={'B':99,'C':88}
print(df.fillna(value=trans))
The result is
             A     B     C   D
2023-10-29   0  99.0   2.0   3
2023-10-30   4   5.0  88.0   7
2023-10-31   8   9.0  10.0  11
2023-11-01  12  13.0  14.0  15
2023-11-02  16  17.0  18.0  19
2023-11-03  20  21.0  22.0  23
In addition, the limit parameter can be used to cap the number of missing values that are replaced in each column.
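A short sketch of limit with a scalar fill value, on a made-up frame with more than one NaN per column. With limit=1, at most one NaN in each column is filled, starting from the top:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'B': [np.nan, np.nan, 3.0],
                   'C': [np.nan, 2.0, np.nan]})

# Fill at most one missing value per column; later NaNs are left as they are.
filled = df.fillna(value=0, limit=1)

print(filled)
```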
6.3.4 Use the mean() method to fill in missing values
If the mean of a single column is passed, it is a single number, so all missing values are filled with the mean of that column.
print(df.fillna(df.mean()['A']))
The result is
             A     B     C   D
2023-10-29   0  10.0   2.0   3
2023-10-30   4   5.0  10.0   7
2023-10-31   8   9.0  10.0  11
2023-11-01  12  13.0  14.0  15
2023-11-02  16  17.0  18.0  19
2023-11-03  20  21.0  22.0  23
If the means of several columns are passed, each of those columns is filled with its own mean; columns outside the selection are left unfilled.
print(df.fillna(df.mean()['A':'C']))
The result is
             A     B     C   D
2023-10-29   0  13.0   2.0   3
2023-10-30   4   5.0  13.2   7
2023-10-31   8   9.0  10.0  11
2023-11-01  12  13.0  14.0  15
2023-11-02  16  17.0  18.0  19
2023-11-03  20  21.0  22.0  23
If no columns are selected, every column is filled with its own mean.
print(df.fillna(df.mean()))
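The column means used for filling can be checked by hand. The sketch below rebuilds the example frame, with one assumption that differs from the text: the data is created as float from the start, so that assigning NaN does not depend on how a given pandas version handles NaN in integer columns.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20231029', periods=6)
# Assumption for this sketch: float data, so NaN can be assigned directly.
df = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[1, 2] = np.nan
df.iloc[0, 1] = np.nan

filled = df.fillna(df.mean())  # mean() skips NaN by default

# B: (5+9+13+17+21)/5 = 13.0, C: (2+10+14+18+22)/5 = 13.2
print(filled.iloc[0, 1], filled.iloc[1, 2])
```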
7. How to find the sum, average, maximum and minimum value of data
import numpy as np
d=np.random.random((2,4))
d is a 2x4 array of random numbers between 0 and 1
The output is, for example,
[[0.3069221  0.5959544  0.13120364 0.24391419]
 [0.47460634 0.29857938 0.45013492 0.59576954]]
print(np.sum(d,axis=0)) # Sum of each column
print(np.sum(d,axis=1)) # Sum of each row
print(np.sum(d)) # Sum of all data
print(np.average(d,axis=0)) # Average of each column
print(np.average(d,axis=1)) # Average of each row
print(np.average(d)) # Average of all data
The output is
[0.78152843 0.89453377 0.58133856 0.83968373]
[1.27799433 1.81909017]
3.0970845024684874
[0.39076422 0.44726689 0.29066928 0.41984187]
[0.31949858 0.45477254]
0.38713556280856093
print(np.max(d,axis=0)) # Maximum value of each column
print(np.max(d,axis=1)) # Maximum value of each row
print(np.max(d)) # Maximum value of all data
print(np.min(d,axis=0)) # Minimum value of each column
print(np.min(d,axis=1)) # Minimum value of each row
print(np.min(d)) # Minimum value of all data
The output is
[0.47460634 0.5959544  0.45013492 0.59576954]
[0.5959544  0.59576954]
0.5959543966467485
[0.3069221  0.29857938 0.13120364 0.24391419]
[0.13120364 0.29857938]
0.13120364028532616
Index of maximum and minimum values
Without an axis argument, argmin and argmax return the index into the flattened (row by row) array.
A=np.arange(2,14).reshape(3,4)
print(np.argmin(A)) # Flattened index of the minimum value in the matrix
print(np.argmax(A)) # Flattened index of the maximum value in the matrix
The result is
0
11
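The flattened index can be converted back to a (row, column) pair with np.unravel_index, and passing axis gives one index per column or row instead. A brief sketch using the same A:

```python
import numpy as np

A = np.arange(2, 14).reshape(3, 4)

flat = np.argmax(A)                         # index into the flattened array
row, col = np.unravel_index(flat, A.shape)  # convert to (row, column)
print(flat, row, col)

per_column = np.argmax(A, axis=0)           # row index of the maximum in each column
print(per_column)
```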
8. How to merge two DataFrames
8.1 concat
8.1.1 Column names are the same
df_1=pd.DataFrame(np.ones((3,4))*0,columns=['a','b','c','d'])
df_2=pd.DataFrame(np.ones((3,4))*1,columns=['a','b','c','d'])
df_3=pd.DataFrame(np.ones((3,4))*2,columns=['a','b','c','d'])
# axis=0 stacks the frames vertically, axis=1 joins them side by side
# ignore_index=True discards the original index and renumbers the rows
res=pd.concat([df_1,df_2,df_3],axis=0,ignore_index=True)
The result is
     a    b    c    d
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
3  1.0  1.0  1.0  1.0
4  1.0  1.0  1.0  1.0
5  1.0  1.0  1.0  1.0
6  2.0  2.0  2.0  2.0
7  2.0  2.0  2.0  2.0
8  2.0  2.0  2.0  2.0
8.1.2 Column names are different
df_4=pd.DataFrame(np.ones((3,4))*0,columns=['a','b','c','d'],index=[1,2,3])
df_5=pd.DataFrame(np.ones((3,4))*1,columns=['b','c','d','e'],index=[2,3,4])
res_1=pd.concat([df_4,df_5],join='outer')
res_2=pd.concat([df_4,df_5],join='inner',ignore_index=True)
join takes 'inner' or 'outer'
join='outer' keeps all columns and fills the positions a frame does not have with NaN
join='inner' keeps only the columns that appear in every frame
The result is
     a    b    c    d    e
1  0.0  0.0  0.0  0.0  NaN
2  0.0  0.0  0.0  0.0  NaN
3  0.0  0.0  0.0  0.0  NaN
2  NaN  1.0  1.0  1.0  1.0
3  NaN  1.0  1.0  1.0  1.0
4  NaN  1.0  1.0  1.0  1.0
     b    c    d
0  0.0  0.0  0.0
1  0.0  0.0  0.0
2  0.0  0.0  0.0
3  1.0  1.0  1.0
4  1.0  1.0  1.0
5  1.0  1.0  1.0
8.2 merge
In merge, the on parameter names the column (or columns) used as the key for matching rows.
8.2.1 Merge on one key
left=pd.DataFrame({'key':['K0','K1','K2','K3'],
                   'A':['A0','A1','A2','A3'],
                   'B':['B0','B1','B2','B3']})
right=pd.DataFrame({'key':['K0','K1','K2','K3'],
                    'C':['C0','C1','C2','C3'],
                    'D':['D0','D1','D2','D3']})
At this point, left and right are respectively
  key   A   B
0  K0  A0  B0
1  K1  A1  B1
2  K2  A2  B2
3  K3  A3  B3
  key   C   D
0  K0  C0  D0
1  K1  C1  D1
2  K2  C2  D2
3  K3  C3  D3
Merge
res_1=pd.merge(left,right,on='key')
The result is
  key   A   B   C   D
0  K0  A0  B0  C0  D0
1  K1  A1  B1  C1  D1
2  K2  A2  B2  C2  D2
3  K3  A3  B3  C3  D3
If the keys in the two frames are not exactly the same
For example, change right to
right=pd.DataFrame({'key':['K0','K1','K2','K4'],
                    'C':['C0','C1','C2','C3'],
                    'D':['D0','D1','D2','D3']})
res_2=pd.merge(left,right,on='key',how='inner')
res_3=pd.merge(left,right,on='key',how='outer')
The result is
  key   A   B   C   D
0  K0  A0  B0  C0  D0
1  K1  A1  B1  C1  D1
2  K2  A2  B2  C2  D2
  key    A    B    C    D
0  K0   A0   B0   C0   D0
1  K1   A1   B1   C1   D1
2  K2   A2   B2   C2   D2
3  K3   A3   B3  NaN  NaN
4  K4  NaN  NaN   C3   D3
8.2.2 Merge on two keys
left=pd.DataFrame({'key1':['K0','K0','K1','K2'],
                   'key2':['K0','K1','K0','K1'],
                   'A':['A0','A1','A2','A3'],
                   'B':['B0','B1','B2','B3']})
right=pd.DataFrame({'key1':['K0','K1','K1','K2'],
                    'key2':['K0','K0','K0','K0'],
                    'C':['C0','C1','C2','C3'],
                    'D':['D0','D1','D2','D3']})
The output is
  key1 key2   A   B
0   K0   K0  A0  B0
1   K0   K1  A1  B1
2   K1   K0  A2  B2
3   K2   K1  A3  B3
  key1 key2   C   D
0   K0   K0  C0  D0
1   K1   K0  C1  D1
2   K1   K0  C2  D2
3   K2   K0  C3  D3
how takes 'left', 'right', 'outer' or 'inner'
left: keep every (key1, key2) combination found in left
right: keep every (key1, key2) combination found in right
outer: keep every combination found in either frame
inner: keep only the combinations found in both frames
res=pd.merge(left,right,on=['key1','key2'],how='right')
The result is
  key1 key2    A    B   C   D
0   K0   K0   A0   B0  C0  D0
1   K1   K0   A2   B2  C1  D1
2   K1   K0   A2   B2  C2  D2
3   K2   K0  NaN  NaN  C3  D3
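To compare the four how settings at a glance, one can count the rows each produces on the two-key frames above. This is only an illustrative sketch; the counts dictionary is a helper introduced here.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# Number of result rows for each join type.
counts = {how: len(pd.merge(left, right, on=['key1', 'key2'], how=how))
          for how in ['inner', 'left', 'right', 'outer']}
print(counts)
```

The (K1, K0) combination appears once in left and twice in right, so it contributes two rows to every join type; the remaining differences come from the combinations that exist on only one side.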
8.2.3 Show where each row comes from
The default is indicator=False. With indicator=True, an extra column records whether each row comes from the left frame only, the right frame only, or both.
df_1=pd.DataFrame({'col1':[0,1],'col_left':['a','b']})
df_2=pd.DataFrame({'col1':[1,2,2],'col_right':[2,2,2]})
The output is
   col1 col_left
0     0        a
1     1        b
   col1  col_right
0     1          2
1     2          2
2     2          2
Merge
res=pd.merge(df_1,df_2,on='col1',how='outer',indicator=True)
The output is
   col1 col_left  col_right      _merge
0     0        a        NaN   left_only
1     1        b        2.0        both
2     2      NaN        2.0  right_only
3     2      NaN        2.0  right_only
You can change the name of the indicator column, for example from '_merge' to 'indicator_column'
res=pd.merge(df_1,df_2,on='col1',how='outer',indicator='indicator_column')
The output is
   col1 col_left  col_right indicator_column
0     0        a        NaN        left_only
1     1        b        2.0             both
2     2      NaN        2.0       right_only
3     2      NaN        2.0       right_only
8.2.4 Merge on row names (the index)
left_1=pd.DataFrame({'A':['A0','A1','A2'],
                     'B':['B0','B1','B2']},
                    index=['K0','K1','K2'])
right_1=pd.DataFrame({'C':['C0','C2','C3'],
                      'D':['D0','D2','D3']},
                     index=['K0','K2','K3'])
The output is
     A   B
K0  A0  B0
K1  A1  B1
K2  A2  B2
     C   D
K0  C0  D0
K2  C2  D2
K3  C3  D3
Merge
res_5=pd.merge(left_1,right_1,left_index=True,right_index=True,how='outer')
The result is
      A    B    C    D
K0   A0   B0   C0   D0
K1   A1   B1  NaN  NaN
K2   A2   B2   C2   D2
K3  NaN  NaN   C3   D3
8.2.5 Give columns that share a name in the original data different names in the result
boys=pd.DataFrame({'k':['K0','K1','K2'],'age':[1,2,3]})
girls=pd.DataFrame({'k':['K0','K0','K3'],'age':[4,5,6]})
The output is
    k  age
0  K0    1
1  K1    2
2  K2    3
    k  age
0  K0    4
1  K0    5
2  K3    6
Merge
res=pd.merge(boys,girls,on='k',suffixes=['_boy','_girl'],how='outer')
The result is
    k  age_boy  age_girl
0  K0      1.0       4.0
1  K0      1.0       5.0
2  K1      2.0       NaN
3  K2      3.0       NaN
4  K3      NaN       6.0
9. How to sort data
df=pd.DataFrame({'A':1.,
                 'B':pd.Timestamp('20231025'),
                 'C':np.array([3]*4,dtype='int64'),
                 'D':pd.Categorical(['test','train','test','train']),
                 'E':[2,5,1,3]})
The output is
     A          B  C      D  E
0  1.0 2023-10-25  3   test  2
1  1.0 2023-10-25  3  train  5
2  1.0 2023-10-25  3   test  1
3  1.0 2023-10-25  3  train  3
9.1 Sort by row name or column name
axis=0 sorts by the row names and axis=1 sorts by the column names; ascending=True sorts in ascending order and ascending=False in descending order
print(df.sort_index(axis=1,ascending=False))
print(df.sort_index(axis=0,ascending=False))
The output is
   E      D  C          B    A
0  2   test  3 2023-10-25  1.0
1  5  train  3 2023-10-25  1.0
2  1   test  3 2023-10-25  1.0
3  3  train  3 2023-10-25  1.0
     A          B  C      D  E
3  1.0 2023-10-25  3  train  3
2  1.0 2023-10-25  3   test  1
1  1.0 2023-10-25  3  train  5
0  1.0 2023-10-25  3   test  2
9.2 Sort by the values of a specific column
print(df.sort_values(by='E'))
The result is
     A          B  C      D  E
2  1.0 2023-10-25  3   test  1
0  1.0 2023-10-25  3   test  2
3  1.0 2023-10-25  3  train  3
1  1.0 2023-10-25  3  train  5
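sort_values also accepts a list of columns, with a separate direction for each. A small sketch on a reduced version of the frame (only columns D and E, with D as plain strings, which is a simplification made for this example):

```python
import pandas as pd

df = pd.DataFrame({'D': ['test', 'train', 'test', 'train'],
                   'E': [2, 5, 1, 3]})

# Sort by D ascending first, then by E descending within each D group.
ordered = df.sort_values(by=['D', 'E'], ascending=[True, False])
print(ordered)
```

The original row labels are kept after sorting, so they show where each row came from.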
9.3 Extract a sorted column
print(df.sort_values(by='E')['E'])
The result is
2    1
0    2
3    3
1    5
Name: E, dtype: int64
10. How to compute grouped sums of a DataFrame
df=pd.DataFrame({"Fruits":["apple","banana","apple","pear","apple","banana"],
                 "Numbers_1":[5,8,9,3,4,5],
                 "Numbers_2":[11,22,33,44,55,66]})
The output is
   Fruits  Numbers_1  Numbers_2
0   apple          5         11
1  banana          8         22
2   apple          9         33
3    pear          3         44
4   apple          4         55
5  banana          5         66
10.1 Use the sum() function directly
df_1=df.groupby(['Fruits'])['Numbers_1'].sum()
The result is
Fruits
apple     18
banana    13
pear       3
Name: Numbers_1, dtype: int64
The result has only an index and no column names; it is a Series.
10.2 Use the aggregate function agg
df_2=df.groupby(['Fruits']).agg({'Numbers_2':'sum'})
agg({'column name':'function name'})
The result is
        Numbers_2
Fruits
apple          99
banana         88
pear           44
Printing df_2.columns gives
Index(['Numbers_2'], dtype='object')
The result has both an index and a column name; it is a DataFrame.
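agg can also compute several statistics in one call. The sketch below uses pandas' named aggregation, where each keyword becomes an output column; the output names total_1 and mean_2 are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Fruits": ["apple", "banana", "apple", "pear", "apple", "banana"],
                   "Numbers_1": [5, 8, 9, 3, 4, 5],
                   "Numbers_2": [11, 22, 33, 44, 55, 66]})

# Each keyword is an output column: (input column, aggregation function).
summary = df.groupby("Fruits").agg(total_1=("Numbers_1", "sum"),
                                   mean_2=("Numbers_2", "mean"))
print(summary)
```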
11. Plotting
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
# Change the default font to one that supports Chinese characters
mpl.rcParams['font.sans-serif']=['SimHei']
11.1 Scatter plot
# Scatter plot: scatter() function
# scatter(x, y, s, c, marker, label)
n=1024
x1=np.random.normal(0,1,n)
y1=np.random.normal(0,1,n)
# If no marker is given, small dots are used.
plt.scatter(x1,y1,color='blue',marker='*',label='normal distribution')
plt.title('Standard Normal Distribution',fontsize=20)
# Make minus signs display correctly when a Chinese font is set
plt.rcParams['axes.unicode_minus']=False
# Add text
plt.text(2.5,2.5,'Mean: 0\nStandard deviation: 1')
# Set the axis ranges
plt.xlim(-4,4)
plt.ylim(-4,4)
# Set the axis label text
plt.xlabel('abscissa x',fontsize=14)
plt.ylabel('vertical coordinate y',fontsize=14)
# Uniformly distributed points
x2=np.random.uniform(-4,4,(1,n))
y2=np.random.uniform(-4,4,(1,n))
plt.scatter(x2,y2,color='yellow',label='uniform distribution')
plt.legend()
plt.show()
11.2 Line chart
# Line chart: plot() function
# plot(x, y, color, marker, label, linewidth, markersize)
# Generate random number sequences
n=24
y3=np.random.randint(27,37,n)
y4=np.random.randint(40,60,n)
plt.plot(y3,label='Temperature')
plt.plot(y4,label='Humidity')
# Axis ranges
plt.xlim(0,23)
plt.ylim(20,70)
# Axis label text
plt.xlabel('hour',fontsize=12)
plt.ylabel('Measurement value',fontsize=12)
plt.title('24-hour temperature and humidity statistics',fontsize=16)
plt.legend()
plt.show()
11.3 Bar chart
11.3.1 Ordinary bar chart
# bar(x, height, width, facecolor, edgecolor, label)
y1=[32, 25, 16, 30, 24, 45, 40, 33, 28, 17, 24, 20]
y2=[-23, -35, -26, -35, -45, -43, -35, -32, -23, -17, -22, -28]
# range(len(y1)) gives the x position of each bar
plt.bar(range(len(y1)),y1,width=0.8,facecolor='green',edgecolor='white',label='statistic 1')
plt.bar(range(len(y2)),y2,width=0.8,facecolor='red',edgecolor='white',label='statistic 2')
plt.title('Bar chart',fontsize=20)
plt.legend()
plt.show()
11.3.2 Side-by-side bar chart
label_list=['2020','2021','2022','2023']
num_list_1=[20,30,15,35]
num_list_2=[15,30,40,20]
x=range(len(num_list_1))
rects1=plt.bar(x=x,height=num_list_1,width=0.4,alpha=0.8,color='red',label='First Department')
rects2=plt.bar(x=[i + 0.4 for i in x],height=num_list_2,width=0.4,color='green',label='Second Department')
plt.ylim(0,50) # y-axis value range
plt.ylabel('Quantity')
# Set the x-axis tick labels. Parameter 1: midpoint coordinates. Parameter 2: displayed values
plt.xticks([index + 0.2 for index in x],label_list)
plt.xlabel('Year')
plt.title('XX company')
plt.legend() # Show the legend
# Write the value above each bar
for rect in rects1:
    height=rect.get_height()
    plt.text(rect.get_x() + rect.get_width()/2,height + 1,str(height),ha='center',va='bottom')
for rect in rects2:
    height=rect.get_height()
    plt.text(rect.get_x() + rect.get_width()/2,height + 1,str(height),ha='center',va='bottom')
plt.show()
11.3.3 Stacked bar chart
label_list=['2020','2021','2022','2023']
num_list_1=[20,30,15,35]
num_list_2=[15,30,40,20]
x=range(len(num_list_1))
rects_1=plt.bar(x=x,height=num_list_1,width=0.45,alpha=0.8,color='red',label='First Department')
# bottom=num_list_1 starts the second set of bars on top of the first
rects_2=plt.bar(x=x,height=num_list_2,width=0.45,color='green',label='Second Department',bottom=num_list_1)
plt.ylim(0,80)
plt.ylabel('Quantity')
plt.xticks(x,label_list)
plt.xlabel('Year')
plt.title('XX company')
plt.legend()
plt.show()
11.3.4 Mixed Bar and Line Chart
x=[2,4,6,8]
y=[450,500,200,1000]
# Draw the bar chart
plt.bar(x=x,height=y,label='Book Library Encyclopedia',color='steelblue',alpha=0.8)
# Display the values above the bars. ha controls the horizontal alignment and va the vertical alignment.
for x1,yy in zip(x,y):
    plt.text(x1,yy + 1,str(yy),ha='center',va='bottom',fontsize=20,rotation=0)
# Set the title
plt.title('80 Novel Activity')
# Set names for the two axes
plt.xlabel('Release date')
plt.ylabel('Number of novels')
# Show the legend
plt.legend()
# Draw the line chart
plt.plot(x,y,'r',marker='*',ms=10,label='a')
plt.xticks(rotation=45)
plt.legend(loc='upper left')
plt.savefig('a.jpg')
plt.show()
11.4 Pie Chart
labels=['Entertainment','Parenting','Food','Mortgage','Transportation','Others']
sizes=[2,5,12,70,2,9]
# How far each slice is moved away from the center
explode=(0,0,0,0.1,0,0) # The 4th slice is moved away from the center
colors=['r','g','y','b','r'] # Custom color list, passed to pie as colors=colors; matplotlib cycles through it if there are more slices than colors
# autopct controls the percentage text in the pie chart
# '%1.1f' keeps one digit after the decimal point; '%1.2f%%' keeps two digits and appends a percent sign %
# startangle is the starting angle. By default drawing starts counterclockwise from the positive x-axis. If startangle=90, drawing starts from the positive y-axis.
# counterclock sets the drawing direction; the default True is counterclockwise, False is clockwise
# labeldistance is the position of the labels; if it is less than 1 they are drawn inside the pie, the default value is 1.1
# radius controls the radius of the pie chart, floating point, optional; the default None is treated as 1
# pctdistance sets the position of the autopct text, the default value is 0.6
# textprops sets the format of the label and percentage text, as a dictionary such as textprops={'fontsize':20,'color':'black'}
plt.pie(sizes,explode=explode,labels=labels,colors=colors,autopct='%1.1f%%',shadow=False,startangle=150)
# Add the legend after drawing the pie, so that the slices it describes already exist
# loc='upper right' places it in the upper right corner
# ncol=2 would split it into two columns
# borderaxespad=0.3 is the padding between the legend and the axes
# bbox_to_anchor=(1.1,1.05) shifts the legend toward the top right margin
plt.legend(loc='upper right',fontsize=10,bbox_to_anchor=(1.1,1.05),borderaxespad=0.3)
# Make the length and width of the pie chart equal (it seems to work without it)
plt.axis('equal')
plt.title('Pie Chart Example-Household Expenditure in October')
plt.show()