Pandas+Matplotlib, an in-depth introduction to Python data analysis

Exploring data with visual charts

1. Data visualization and exploratory graphs

Data visualization means presenting data as graphics or tables. A chart can clearly show the nature of the data and the relationships between records or attributes, which makes the data easy to interpret. With exploratory graphs, users can grasp the characteristics of the data, spot trends in it, and lower the barrier to understanding it.

2. Common chart examples

This chapter mainly draws charts with Pandas rather than calling the Matplotlib plotting functions directly. Pandas integrates Matplotlib's drawing methods into DataFrame, so in practice most charts can be drawn straight from a DataFrame; Matplotlib is then only needed for steps such as displaying the figure.
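To make this integration concrete, here is a minimal sketch showing that a Pandas plotting call hands back an ordinary Matplotlib Axes object, which can then be customized with Matplotlib methods. The DataFrame `df_demo` and its columns are made up for illustration, and the `Agg` backend is selected only so the sketch runs without a display; drop that line for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive sessions
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical toy data, not taken from the article
df_demo = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1, 4, 9, 16]})

# The pandas call draws with Matplotlib and returns a Matplotlib Axes
ax = df_demo.plot.line(x="x", y="y")

# Any Matplotlib Axes method can be used for further tweaks
ax.set_title("pandas plot, Matplotlib Axes")
ax.set_ylabel("y")

is_matplotlib_axes = isinstance(ax, Axes)
print(is_matplotlib_axes)
```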

1. Line chart

A line chart is the most basic chart and is used to show how continuous data changes across a field. The plot.line() method draws a line chart, and parameters such as color and line style can be set. Because the method inherits Matplotlib's usage, a script should also call plt.show() at the end to display the figure, as shown in Figure 8.4.
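The snippets in this section use a DataFrame named df_iris that is not defined in this excerpt. Its column names ('sepal length (cm)', 'target') match the Iris data set shipped with scikit-learn, so one plausible way to build it, assuming scikit-learn 0.23 or later is installed, is sketched here. This is an assumption about the setup, not necessarily how the author loaded the data.

```python
from sklearn.datasets import load_iris

# Assumption: df_iris is the scikit-learn Iris data as a DataFrame.
# as_frame=True returns pandas objects; .frame holds the four
# measurement columns plus the integer 'target' class column.
df_iris = load_iris(as_frame=True).frame

print(df_iris.shape)
print(list(df_iris.columns))
```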

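import matplotlib.pyplot as plt

# Default line chart of sepal length
df_iris[['sepal length (cm)']].plot.line()
plt.show()

# Custom color, title and line style; plot.line() returns a Matplotlib Axes
ax = df_iris[['sepal length (cm)']].plot.line(color='green', title="Demo", style='--')
ax.set(xlabel="index", ylabel="length")
plt.show()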

2. Scatter chart

A scatter chart plots each observation as a separate point and is used to view the relationship between two fields. Scatter charts are drawn with df.plot.scatter(), as shown in Figure 8.5.

df = df_iris
df.plot.scatter(x='sepal length (cm)', y='sepal width (cm)')
plt.show()

# Marker size scales with petal length; color encodes the target class
df.plot.scatter(x='sepal length (cm)',
                y='sepal width (cm)',
                s=df['petal length (cm)'] * 20,
                c='target',
                cmap='Spectral',
                title='different circle size by petal length (cm)')
plt.show()

3. Histogram and bar graph

A histogram is usually applied to a single column to show the distribution of continuous data. A bar chart looks similar, but it shows the count of each discrete category within a column, as shown in Figure 8.6.

# Histogram of the four continuous measurement columns
df[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']].plot.hist()
plt.show()
# Bar chart of the number of samples in each target class
df.target.value_counts().plot.bar()
plt.show()
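The difference between the two charts comes down to what is counted. The sketch below uses a few made-up values (not the Iris data) to show that a histogram first bins a continuous column and then counts the values per bin, while a bar chart counts the occurrences of each category directly. The `Agg` backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive sessions
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data: one continuous column and one categorical column
lengths = pd.Series([4.9, 5.0, 5.1, 5.8, 6.0, 6.3, 6.4, 7.1], name="length")
labels = pd.Series([0, 0, 0, 1, 1, 1, 1, 2], name="target")

# Histogram logic: cut the continuous values into 3 equal-width bins, then count
binned = pd.cut(lengths, bins=3).value_counts().sort_index()

# Bar chart logic: count how often each category occurs
counts = labels.value_counts().sort_index()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
lengths.plot.hist(bins=3, ax=ax1, title="histogram")
counts.plot.bar(ax=ax2, title="bar chart")

print(binned)
print(counts)
```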

4. Pie chart and box plot

A pie chart shows the proportion of each category within a single field, while a box plot shows the distribution of a field or compares the distributions of different fields, as shown in Figure 8.7.

df.target.value_counts().plot.pie(legend=True)
plt.show()
df.boxplot(column=['target'], figsize=(10, 5))
plt.show()
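As a small illustration of comparing distributions across fields, the sketch below draws one box per column of a made-up DataFrame and computes the quartiles that each box summarizes: the box spans the first to the third quartile, with a line at the median. The column names `field_a` and `field_b` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive sessions
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data with two fields on different scales
df_demo = pd.DataFrame({
    "field_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "field_b": [2.0, 4.0, 6.0, 8.0, 10.0],
})

# Quartiles per field: the numbers a box plot is built from
quartiles = df_demo.quantile([0.25, 0.5, 0.75])

# One box per column, drawn side by side for comparison
ax = df_demo.plot.box(title="distribution by field")
ax.set_ylabel("value")

print(quartiles)
```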

Practical sharing of data exploration

This section uses two real data sets to demonstrate several data exploration techniques.

1. 2013 American Community Survey

In the American Community Survey, approximately 3.5 million families are asked each year detailed questions about who they are and how they live. The survey covers a number of topics including ancestry, education, work, transport, internet use and residence.

Data source: https://www.kaggle.com/census/2013-american-community-survey.

Data name: 2013 American Community Survey.

First look at the shape and characteristics of the data, as well as the meaning, type and value range of each field.

import pandas as pd

# Read data
df = pd.read_csv("./ss13husa.csv")
# Number of rows and columns
df.shape
# (756065, 231)

# Summary statistics (value range) of each numeric field
df.describe()

Next, concatenate the two person-level files, ss13pusa.csv and ss13pusb.csv. Together they contain roughly 3 million records, of which 3 fields are used here: SCHL (School Level), PINCP (Income) and ESR (Work Status).

pusa = pd.read_csv("ss13pusa.csv")
pusb = pd.read_csv("ss13pusb.csv")
# Concatenate the two files, keeping only the three fields of interest
col = ['SCHL', 'PINCP', 'ESR']
ac_survey = pd.concat([pusa[col], pusb[col]], axis=0)

Group the data by education level (SCHL), look at how many respondents fall into each level, and then calculate the average income of each level.

group = ac_survey.groupby(by=['SCHL'])
print('Education distribution:')
print(group.size())
print('Average income:')
print(group['PINCP'].mean())
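Since the real files are large, the concatenate-then-group pattern above can be tried first on a tiny stand-in. The sketch below uses two made-up frames with the same three column names; the values are invented for illustration and are not survey data. It also shows one way to turn the group sizes into proportions.

```python
import pandas as pd

# Hypothetical stand-ins for the two survey files (values are invented)
part_a = pd.DataFrame({"SCHL": [16, 21, 21],
                       "PINCP": [20000, 50000, 70000],
                       "ESR": [1, 1, 1]})
part_b = pd.DataFrame({"SCHL": [16, 22],
                       "PINCP": [30000, 90000],
                       "ESR": [6, 1]})

col = ["SCHL", "PINCP", "ESR"]
# Stack the rows of both parts; ignore_index renumbers the rows 0..n-1
survey = pd.concat([part_a[col], part_b[col]], axis=0, ignore_index=True)

group = survey.groupby("SCHL")
share = group.size() / len(survey)    # proportion of records per education level
mean_income = group["PINCP"].mean()   # average income per education level

print(share)
print(mean_income)
```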

2. Boston housing data set

The Boston House Price Dataset contains information about housing in the Boston area, including 506 data samples and 13 feature dimensions.

Data source: https://archive.ics.uci.edu/ml/machine-learning-databases/housing/.

Data name: Boston House Price Dataset.

First look at the shape and characteristics of the data, as well as the meaning, type and value range of each field.

The distribution of house prices (MEDV) can be plotted in the form of a histogram, as shown in Figure 8.8.

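import pandas as pd
import matplotlib.pyplot as plt

# The UCI housing.data file is whitespace-delimited and has no header row,
# so the 14 column names are supplied explicitly
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS',
         'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
df = pd.read_csv("./housing.data", sep=r"\s+", header=None, names=names)
# Number of rows and columns
df.shape
# (506, 14)

# Summary statistics (value range) of each field
df.describe()

# Distribution of house prices
df[['MEDV']].plot.hist()
plt.show()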

Note: The labels in the figure are the names specified in the code or the data. In practice, readers can replace them with whatever wording they need.

The next step is to find out which features are strongly related to house prices. Start with a scatter chart, as shown in Figure 8.9.

# draw scatter chart
df.plot.scatter(x='MEDV', y='RM')
plt.show()

Finally, the correlation coefficients are calculated and presented visually as a heatmap, as shown in Figure 8.10.

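# compute pearson correlation
corr = df.corr()
# draw heatmap; a diverging colormap centered at 0 shows
# positive values in red, negative in blue and values near 0 in white
import seaborn as sns
sns.heatmap(corr, cmap='RdBu_r', vmin=-1, vmax=1)
plt.show()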

Red indicates a positive correlation, blue indicates a negative correlation, and white indicates little or no linear correlation. The cell for RM and house prices leans toward red, indicating a positive relationship; the cells for LSTAT and PTRATIO lean toward dark blue, indicating a negative relationship; the cells for CRIM, RAD and AGE are paler, indicating a comparatively weak relationship with house prices.
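Colors are easy to misjudge by eye, so it can help to read the numbers behind the heatmap as well. The sketch below ranks features by their Pearson correlation with the price column. It uses a small made-up table that borrows the column names RM, LSTAT and MEDV purely for readability; the values, and the NOISE column, are invented and are not the Boston data.

```python
import pandas as pd

# Hypothetical toy table (invented values, not the Boston data set)
df_demo = pd.DataFrame({
    "RM":    [5.0, 5.5, 6.0, 6.5, 7.0, 7.5],
    "LSTAT": [30.0, 24.0, 19.0, 12.0, 8.0, 4.0],
    "NOISE": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # made-up weakly related column
    "MEDV":  [12.0, 16.0, 20.0, 25.0, 31.0, 38.0],
})

# Pearson correlation matrix, the same call used for the heatmap
corr = df_demo.corr()

# Correlation of every feature with the price column, most negative first
ranked = corr["MEDV"].drop("MEDV").sort_values()

print(ranked)
```

On the real data, the same two lines applied to df would list which fields sit at the strongly negative and strongly positive ends.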

syntaxbug.com © 2021 All Rights Reserved.