1. Introduction
Matplotlib is a Python 2D plotting library that produces publication-quality graphics in a variety of hardcopy formats and in a cross-platform interactive environment. (Excerpted from Baidu Encyclopedia)
2. Import
Import the pyplot module before use:
import matplotlib.pyplot as plt
3. Display charts
By default, the figure is not displayed until the plt.show() function is called.
The figure then opens in a new window, which provides buttons for operating on it.
4. Draw a line graph
Specify x and y. You can also pass only y, in which case x defaults to 0~n-1.
plt.plot(x,y)
plt.show()
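As a minimal sketch of the default described above, the following plots a list of y values only and reads back the x values that Matplotlib filled in. The sample data and the Agg backend (chosen so the snippet runs without a display) are assumptions for illustration; in an interactive session you would call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

y = [1, 4, 9, 16]              # illustrative data, not from the original text

plt.figure()
lines = plt.plot(y)            # only y is given, so x is filled in automatically
x_used = [float(v) for v in lines[0].get_xdata()]
print(x_used)                  # the x values Matplotlib generated for the n points
```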
5. Basic settings
1. Set x-axis/y-axis/title name
plt.xlabel('x',fontsize=18)
plt.ylabel('y',fontsize=18)
plt.title('title',fontsize=20)
fontsize: sets the font size
Tick labels can also be rotated:
plt.xticks(rotation=90) #Rotate the x-axis tick labels by 90 degrees
2. Set the display range of x-axis/y-axis
plt.axis([xmin,xmax,ymin,ymax])
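The label and range settings above can be combined in one short script. This is only a sketch: the data and the limits [0, 5, 0, 20] are made-up values, and the Agg backend is assumed so that the code runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

plt.figure()
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])   # illustrative data
plt.xlabel('x', fontsize=18)
plt.ylabel('y', fontsize=18)
plt.title('title', fontsize=20)
plt.xticks(rotation=90)                 # rotate the x-axis tick labels
plt.axis([0, 5, 0, 20])                 # [xmin, xmax, ymin, ymax]

x_range = plt.xlim()                    # read the limits back to check them
y_range = plt.ylim()
print(x_range, y_range)
```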
3. Displaying Chinese characters and minus signs correctly
By default, Chinese characters and minus signs are not rendered correctly. Add the following statements:
plt.rcParams['font.sans-serif']=['SimHei'] #Set a font that can display Chinese characters
plt.rcParams['axes.unicode_minus']=False #Display minus signs correctly
4. Character parameters
(1) Characters that indicate color:
Character parameter | Color |
---|---|
'b' | Blue |
'g' | Green |
'r' | Red |
'c' | Cyan |
'm' | Magenta |
'y' | Yellow |
'k' | Black |
'w' | White |
(2) Characters that indicate the line style or marker type:
Character parameter | Type |
---|---|
'-' | Solid line |
'--' | Dashed line |
'-.' | Dash-dot line |
':' | Dotted line |
'.' | Point |
',' | Pixel |
'o' | Circle |
'v' | Downward triangle |
'^' | Upward triangle |
'<' | Left triangle |
'>' | Right triangle |
's' | Square |
'p' | Pentagon |
'*' | Star |
'h' | Hexagon 1 |
'H' | Hexagon 2 |
'+' | Plus sign |
'x' | Cross (x) |
'D' | Diamond |
'd' | Thin diamond |
'_' | Horizontal line |
'1' | Downward tri marker |
'2' | Upward tri marker |
'3' | Left tri marker |
'4' | Right tri marker |
Example: draw red dots
plt.plot([1,2,3,4],[1,4,9,16],'ro')
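Characters from the two tables can be combined in a single format string. The sketch below uses 'go--' (green, circle markers, dashed line) and reads the resulting style back from the line object. The data is illustrative and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

plt.figure()
# 'g' = green, 'o' = circle marker, '--' = dashed line
(line,) = plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'go--')

# Read back what the format string set on the line
style = (line.get_color(), line.get_marker(), line.get_linestyle())
print(style)
```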
5. Line attributes
(1) Set with character parameters
(2) Use keywords to set
linewidth can change the line width, color can change the line color
example:
plt.plot(x,y,linewidth=4.0,color='r') #Set the line width to 4 and the color to red
(3) Use the return value of plt.plot()
The plot function returns a list of Line2D objects, one for each x, y pair passed in.
example:
line1,=plt.plot(x,y) #Unpack the single Line2D object from the returned list
line1.set_antialiased(False) #Turn off anti-aliasing
plt.show()
(4) Use plt.setp()
example:
line=plt.plot(x,y)
plt.setp(line,color='g',linewidth=4.0)
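To make the relationship between the returned list and plt.setp() concrete, here is a small sketch that plots two x, y pairs in one call. The variable names and data are invented for illustration, and the Agg backend is assumed so that no window is needed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]               # illustrative data
y1 = [0, 1, 4, 9]
y2 = [0, 2, 4, 6]

plt.figure()
lines = plt.plot(x, y1, x, y2)     # two (x, y) pairs give a list of two Line2D objects
lines[0].set_antialiased(False)    # change one line through its own setter
plt.setp(lines, linewidth=4.0)     # setp applies the property to every line in the list
plt.setp(lines[1], color='g')      # or to a single line
```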
6. Subplots
(1)figure() function
The figure function creates a figure with the specified number num
plt.figure(num,figsize=(10,6)) #The figure number is num; figsize sets the size of the figure
Note: figure(1) can be omitted, because a first figure is created by default
(2)subplot() function
plt.subplot(numrows,numcols,fignum)
Here numrows is the number of rows, numcols is the number of columns, and fignum is the index of the subplot, counted from 1.
The total number of subplots is sum=numrows*numcols
When sum<10, the commas between the three numbers can be omitted
example:
import matplotlib.pyplot as plt
plt.figure(figsize=(10,6))
plt.subplot(211) #Equivalent to plt.subplot(2,1,1)
plt.plot([0,1,2,3,4],'r-')
plt.subplot(212)
plt.plot([0,1,2,3,4],[0,3,6,9,12],'b--')
plt.show()
As shown in the figure after running:
7. Suppress warnings
import warnings
warnings.filterwarnings('ignore')
8. Coordinate system setting
a=plt.gca() #Get the current coordinate system (axes)
a.patch.set_facecolor('gray') #Set the background color
a.patch.set_alpha(0.3) #Set the background transparency, the value range is 0~1
Add grid lines to the background:
plt.grid()
9. Add data to the graph
plt.text(x,y,n)
Where (x, y) are the coordinates at which the text is placed and n is the text to add
10. Add annotations
plt.annotate(text,xy=(x,y),xytext=(x + 10,y + 10),arrowprops=dict(facecolor='black',edgecolor='red'))
Note:
text: the annotation text
xy: coordinates of the annotated point
xytext: coordinates of the annotation text
arrowprops=dict(facecolor='black',edgecolor='red'): sets the fill color and edge color of the arrow
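A short sketch of plt.text() and plt.annotate() used together is shown below. The curve, the coordinates, and the label strings are made-up values, and the Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

plt.figure()
plt.plot([0, 1, 2, 3, 4], [0, 1, 4, 9, 16])   # illustrative data

# Plain text placed at the data coordinates (4, 16)
label = plt.text(4, 16, '16')

# Annotation: the arrow points at xy, the text sits at xytext
ann = plt.annotate('y = 9', xy=(3, 9), xytext=(1, 12),
                   arrowprops=dict(facecolor='black', edgecolor='red'))
```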
11. Add legend
plt.legend()
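plt.legend() only has entries to show when the plotted lines carry labels. A minimal sketch, assuming the label keyword of plt.plot() is used to name each line (the data and label strings are invented, and the Agg backend is assumed so no window is needed):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

plt.figure()
plt.plot([0, 1, 2, 3], [0, 1, 4, 9], 'r-', label='squares')
plt.plot([0, 1, 2, 3], [0, 2, 4, 6], 'b--', label='doubles')

legend = plt.legend()                                    # collects the labels given above
entries = [t.get_text() for t in legend.get_texts()]
print(entries)
```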
6. Bar chart (bar)
1. Basic format:
plt.bar(x,y)
2. Add data to the graph
plt.text(x,y,n,ha='center',va='bottom')
Where (x, y) are the coordinates at which the text is placed and n is the text to add
ha='center': the text is centered horizontally on x
va='bottom': the number is above the top of the column
va='top': the number is below the top of the column
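Putting plt.bar() and plt.text() together, the sketch below writes each bar's value just above it. The categories and heights are invented for illustration, and the Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3]                 # illustrative bar positions
heights = [5, 12, 8]          # illustrative bar heights

plt.figure()
bars = plt.bar(x, heights)

# One text per bar: centered on the bar, sitting on top of it
value_labels = [
    plt.text(xi, hi, str(hi), ha='center', va='bottom')
    for xi, hi in zip(x, heights)
]
```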
7. Pie chart (pie)
plt.pie(x,explode=None,labels=None,colors=None,autopct=None,pctdistance=0.6,shadow=False,labeldistance=1.1,startangle=None,radius=None)
Note:
x: the size of each wedge; if sum(x)>1, the values are normalized by sum(x) (recent Matplotlib versions normalize by default)
labels: the explanatory text for each wedge, displayed outside the pie chart
explode: the distance of each wedge from the center
startangle: the angle at which drawing starts. By default, drawing starts from the positive x-axis and proceeds counterclockwise. If set to 90, drawing starts from the positive y-axis.
shadow: whether to draw a shadow
labeldistance: the position at which the labels are drawn, as a proportion of the radius; if <1, they are drawn inside the pie chart
autopct: controls the percentage text inside the pie chart; it can be a format string or a format function
In a format string such as '%1.1f', the numbers give the minimum field width and the number of digits after the decimal point
pctdistance: similar to labeldistance; specifies the position of the autopct text as a proportion of the radius
radius: controls the radius of the pie chart
return value:
If autopct is not set, returns (patches, texts)
If autopct is set, returns (patches, texts, autotexts)
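The difference in return values can be seen in a small sketch like the following, where autopct is set and three values are unpacked. The wedge sizes and labels are invented, and the Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

sizes = [15, 30, 45, 10]              # illustrative values, sum to 100
names = ['A', 'B', 'C', 'D']          # illustrative labels

plt.figure(figsize=(6, 6))
# autopct is set, so three values come back
patches, texts, autotexts = plt.pie(
    sizes, labels=names, autopct='%.1f%%', startangle=90
)

label_texts = [t.get_text() for t in texts]        # the labels outside the pie
percent_texts = [t.get_text() for t in autotexts]  # the percentages inside the pie
print(label_texts, percent_texts)
```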
example:
import matplotlib.pyplot as plt
import pandas as pd
df=pd.read_excel(r'C:\Users\Gong Xihui\Desktop\data used in the course\film.xlsx')
plt.figure(figsize=(10,10))
data=pd.cut(df['rating'],[0,3,5,7,9,10]).value_counts() #Count the ratings in each interval
y=data.values
y=y/sum(y) #Convert the counts to proportions
plt.title('Movie rating ratio')
plt.pie(y,labels=data.index,autopct='%.1f%%',colors='bygr')
plt.show()
The output result is
8. Frequency distribution histogram (hist)
plt.hist(arr)
hist has many parameters; only the first is required and the rest are optional
arr: the one-dimensional array whose histogram is to be computed
bins: the number of bars (bins) in the histogram; optional, default is 10
normed: whether to normalize the resulting histogram vector; default is 0 (newer Matplotlib versions use density instead)
facecolor: histogram color
edgecolor: histogram border color
alpha: transparency
histtype: histogram type, one of bar, barstacked, step, stepfilled
return value:
n: the histogram vector (the value of each bin); whether it is normalized is set by the parameter normed
bins: the edges of the bins
patches: a list of the graphical objects (one per bar) that make up the histogram
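The three return values can be inspected with a small sketch such as the one below. The data list and the choice of bins=4 are made up for illustration, and the Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]    # illustrative values

plt.figure()
n, bins, patches = plt.hist(data, bins=4, edgecolor='k')

counts = [float(v) for v in n]        # number of values in each bin
edges = [float(v) for v in bins]      # bin edges: one more entry than there are bins
print(counts, edges, len(patches))
```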
example:
import matplotlib.pyplot as plt
import pandas as pd
df=pd.read_excel(r'C:\Users\Gong Xihui\Desktop\data used in the course\film.xlsx')
plt.figure(figsize=(10,10))
plt.hist(df['rating'],bins=20,edgecolor='k')
plt.show()
The output is:
9. Dual axis chart
Using twinx():
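import matplotlib.pyplot as plt
plt.figure(figsize=(10,10))
a=plt.plot([0,1,2,3,4,5],'b-') #Plotted against the left y-axis
b=plt.twinx() #Create a second y-axis that shares the x-axis
b.plot([0,1,4,9,16,25],'r--') #Plotted against the right y-axis
plt.show()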
The output is:
10. Scatter
plt.scatter(x,y,marker='.')
marker sets the shape of the scatter points
The mapping between marker shapes and character parameters is given in the table above.
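As a brief sketch, the following draws four points with the '^' marker from the table. The coordinates and the red color are illustrative choices, and the Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]              # illustrative coordinates
y = [2, 4, 1, 3]

plt.figure()
points = plt.scatter(x, y, marker='^', color='r')   # upward triangles in red
n_points = len(points.get_offsets())                # one (x, y) offset per point
print(n_points)
```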
11. Box plot (boxplot)
1. Introduction
A box plot, also known as a box-and-whisker plot, is a statistical chart used to display the dispersion of a set of data. It is named after its box-like shape. It is used in many fields and is common in quality management. It mainly reflects the distribution characteristics of the original data and can also compare the distributions of several groups of data. To draw a box plot, first find the median, the two quartiles, and the upper and lower edge lines of a data set; then connect the two quartiles to draw the box; finally, connect the upper and lower edge lines to the box. The median lies inside the box.
2. Drawing steps
(1) Calculate the upper quartile (Q3), median, and lower quartile (Q1)
(2) Calculate the difference between the upper quartile and the lower quartile, that is, the interquartile range (IQR): Q3-Q1
(3) Draw the box, with its upper edge at the upper quartile and its lower edge at the lower quartile. Draw a horizontal line at the median position inside the box
(4) Values greater than Q3 + 1.5 times the IQR, or less than Q1 - 1.5 times the IQR, are classified as outliers
(5) Excluding outliers, draw horizontal lines at the largest and smallest remaining values and connect them to the box; these are the whiskers of the box plot
(6) Extreme outliers, that is, values more than 3 times the IQR beyond the quartiles, are represented by solid points; milder outliers, between 1.5 times and 3 times the IQR beyond the quartiles, are represented by hollow points
(7) Add a name, number axis, etc. to the box plot
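The numeric part of the steps above can be sketched in plain Python. The data values are invented, and the quartiles are computed with statistics.quantiles using method='inclusive', which is one of several common quartile conventions and an assumption here; other conventions give slightly different quartiles.

```python
from statistics import quantiles

data = [2, 4, 5, 7, 8, 9, 10, 12, 13, 40]   # illustrative values

# Step 1: lower quartile, median, upper quartile
q1, median, q3 = quantiles(data, n=4, method='inclusive')

# Step 2: interquartile range
iqr = q3 - q1

# Step 4: anything beyond 1.5 * IQR from the quartiles is an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

# Step 5: whiskers end at the most extreme values that are not outliers
inside = [v for v in data if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(q1, median, q3, iqr, outliers, whisker_low, whisker_high)
```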
3. Syntax
plt.boxplot(x, notch=None, sym=None, vert=None, whis=None, positions=None, widths=None, patch_artist=None, meanline=None, showmeans=None, showcaps=None, showbox=None, showfliers=None, boxprops=None, labels=None, flierprops=None, medianprops=None, meanprops=None, capprops=None, whiskerprops=None)
x: Specify the data to be drawn as a box plot;
notch: Whether to draw the box plot with a notch, the default is no notch;
sym: Specify the shape of the outlier points, the default is a + sign;
vert: Whether the box plot is drawn vertically, the default is vertical;
whis: Specify the distance between the whiskers and the upper and lower quartiles, the default is 1.5 times the interquartile range;
positions: Specify the positions of the boxes, the default is [1, 2, ..., n];
widths: Specify the width of the boxes, the default is 0.5;
patch_artist: Whether to fill the box with color;
meanline: Whether to show the mean as a line, by default it is shown as a point;
meanprops: Set the properties of the mean, such as point size, color, etc.;
capprops: Set the properties of the caps at the ends of the whiskers, such as color, thickness, etc.;
whiskerprops: Set the properties of the whiskers, such as color, thickness, line type, etc.
x can contain several data sets, which are drawn as separate boxes
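A sketch of passing several data sets at once is shown below. The three groups are invented, the tick names 'A', 'B', 'C' are hypothetical, and the Agg backend is assumed so the code runs without a display. The call returns a dictionary of the drawn parts, which the sketch uses to check what was produced.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

group_a = [1, 2, 3, 4, 5, 6, 7]          # illustrative data
group_b = [2, 4, 4, 5, 6, 8, 20]         # 20 lies far above the rest
group_c = [3, 3, 4, 4, 5, 5, 6]

plt.figure()
# A list of data sets gives one box per data set
result = plt.boxplot([group_a, group_b, group_c],
                     patch_artist=True, showmeans=True, widths=0.5)
plt.xticks([1, 2, 3], ['A', 'B', 'C'])   # name the boxes at positions 1, 2, 3

n_boxes = len(result['boxes'])
n_whiskers = len(result['whiskers'])     # two whiskers per box
outliers_b = [float(v) for v in result['fliers'][1].get_ydata()]
print(n_boxes, n_whiskers, outliers_b)
```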
12. Correlation coefficient matrix graph
pandas itself also provides plotting functions
1. scatter_matrix()
It draws a scatter plot for each pair of attributes, with the distribution of each attribute on the diagonal.
%pylab inline #Display images inline (IPython/Jupyter magic command)
result=pd.plotting.scatter_matrix(data,diagonal='hist',color='k',alpha=0.3,figsize=(10,10)) #pd.scatter_matrix in older pandas versions
diagonal='hist': the diagonal distribution chart is a histogram
diagonal='kde': the diagonal distribution chart is a density curve
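Outside a notebook, the same call can be sketched as a plain script. The DataFrame below, with its column names and values, is invented for illustration, and the Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with three numeric attributes
df = pd.DataFrame({
    'height': [150, 160, 165, 170, 175, 180],
    'weight': [45, 55, 60, 68, 72, 80],
    'age':    [31, 22, 45, 28, 39, 25],
})

# One scatter plot per pair of columns, histograms on the diagonal
axes = pd.plotting.scatter_matrix(df, diagonal='hist', color='k',
                                  alpha=0.3, figsize=(6, 6))
print(axes.shape)   # a grid of axes, one row and one column per attribute
```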
2. seaborn
seaborn is a streamlined Python library for creating statistical charts, and it understands pandas' DataFrame type.
seaborn.heatmap(data, vmin=None, vmax=None, cmap=None, center=None, robust=False, annot=None, fmt='.2g', annot_kws=None, linewidths=0, linecolor='white', cbar=True, cbar_kws=None, cbar_ax=None, square=False, xticklabels='auto', yticklabels='auto', mask=None, ax=None, **kwargs)
(1) Heat map input data parameters:
data: Matrix data set, which can be a numpy array or a pandas DataFrame. If it is a DataFrame, its index and column information correspond to the rows and columns of the heat map respectively, that is, df.index gives the row labels of the heat map and df.columns gives the column labels.
(2) Heat map matrix block color parameters:
vmax, vmin: the maximum and minimum of the color value range of the heat map. By default they are determined from the values in data.
cmap: The mapping from numbers to color space. The value is a colormap name or colormap object from the matplotlib package, or a list of colors. If it is not given, the default depends on whether the center parameter is set.
center: The value on which the colors of the heat map are centered when the data values diverge. Changing center shifts the overall depth of the colors in the generated image. When center is set and the data fall outside the range, manually set vmax and vmin may be adjusted automatically.
robust: Default value False. If True, and vmin or vmax is not set, the color range is computed from robust quantiles instead of the extreme values.
(3) Heat map matrix block annotation parameters:
annot (abbreviation of annotate): Default value is False. If True, the data value is written in each square of the heat map. If it is a matrix, the corresponding entry of that matrix is written in each square.
fmt: String format code that sets the format of the numbers written on the matrix, such as how many digits to keep after the decimal point.
annot_kws: Default value None. When annot is True, this dictionary sets the size, color, and font of the numbers on the heat map matrix, using the font settings of the text class of the matplotlib package.
(4) Spacing and spacing line parameters between heat map matrix blocks:
linewidths: Define the size of the gap between the matrix patches that represent pairwise feature relationships in the heat map
linecolor: The color of the line that divides each matrix patch on the heat map. The default value is ‘white’
(5) Heat map color scale bar parameters:
cbar: Whether to draw a color scale bar on the side of the heat map, the default value is True
cbar_kws: Settings (such as font settings) used when drawing the color scale bar on the side of the heat map, the default value is None
cbar_ax: The axes in which to draw the color scale bar on the side of the heat map, which sets its position. The default value is None.
xticklabels, yticklabels: xticklabels controls the label names of the columns; yticklabels controls the label names of the rows. The default value is auto. If True, the column (or row) names of the DataFrame are used as the label names. If False, no label names are added. If it is a list, the label names are set to the contents of the list. If it is an integer K, every K-th label is drawn on the graph.
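To tie the parameters to the correlation coefficient matrix this section is about, here is a minimal sketch that computes a correlation matrix with DataFrame.corr() and passes it to seaborn.heatmap(). The DataFrame, the 'coolwarm' colormap, and the other parameter values are illustrative assumptions, and the Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: b is exactly 2 * a, c tends to fall as a rises
df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [2, 4, 6, 8, 10],
    'c': [5, 3, 4, 1, 2],
})
corr = df.corr()    # correlation coefficient matrix (3 x 3 DataFrame)

plt.figure(figsize=(6, 5))
ax = sns.heatmap(
    corr,
    vmin=-1, vmax=1, center=0,        # correlations lie in [-1, 1], centered on 0
    cmap='coolwarm',                  # a matplotlib colormap name
    annot=True, fmt='.2f',            # write each coefficient with 2 decimals
    linewidths=0.5, linecolor='white',
    square=True,
)
n_annotations = len(ax.texts)         # one text per cell when annot=True
print(n_annotations)
```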