The Three Musketeers of Python Data Analysis: Pandas, Matplotlib and Seaborn

Python has powerful data analysis and processing capabilities. To use Python for data analysis, you need to master three Python packages: pandas, matplotlib, and seaborn. Mastering Python data analysis helps us discover the patterns and trends behind the data and provides support for business decisions.

Reading data using Pandas

First, import the pandas library, which has powerful data processing capabilities. Use the read_excel function to load the data by passing it the file path. The head method previews the first 5 rows of data.

import pandas as pd
df = pd.read_excel(r'C:\Desktop\e-commerce sales data-August 23.xlsx')
df.head()  # Preview the first 5 rows of data


The info method shows information about each field, including the non-null count and the data type. For example, here the date is datetime data, the region is string data, and the customer's age is numeric data.

df.info()  # Preview data information

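As a complement to info, the data types and missing values can also be inspected directly. The following is a minimal sketch on a few hypothetical rows standing in for the Excel file; the column names and values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Excel file
df = pd.DataFrame({
    'Date': ['2023-08-01', '2023-08-02', '2023-08-03'],
    'Region': ['South China', 'Northwest', None],
    'Customer age': [42, 22, 62],
})
df['Date'] = pd.to_datetime(df['Date'])  # parse the text dates into datetime values

dtypes = df.dtypes         # data type of each column
missing = df.isna().sum()  # number of missing values in each column
print(dtypes)
print(missing)
```

Checking the types early is useful because later steps such as sorting by date or plotting a time series behave differently for text and datetime columns.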

Use the describe method to compute descriptive statistics for the data, including the count, mean, standard deviation, minimum, maximum, and so on. For example, the descriptive statistics for customer age show that the average customer is 42 years old, the youngest is 22, and the oldest is 62.

df.describe()  # Descriptive statistics

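By default, describe summarizes only the numeric columns. The sketch below, on hypothetical data chosen so that the ages match the figures quoted above, shows the default output next to include='all', which also covers text columns.

```python
import pandas as pd

# Hypothetical sample data; the ages are chosen to give mean 42, min 22, max 62
df = pd.DataFrame({
    'Customer gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Customer age': [22, 35, 42, 49, 62],
})

numeric_summary = df.describe()           # numeric columns only
full_summary = df.describe(include='all')  # adds unique/top/freq for text columns
print(numeric_summary)
print(full_summary)
```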

Sorting data using Pandas

The sort_values method sorts the data by the column passed to the by parameter. The default order is ascending.

df.sort_values(by='Sales', inplace=True)  # Sort df in ascending order (the default)


Setting ascending=False sorts in descending order.

df.sort_values(by='Sales', inplace=True, ascending=False)  # Sort df in descending order


To sort by several columns, for example ascending by product category and descending by sales, pass a list of columns to by and a matching list to ascending, where True means ascending and False means descending.

df.sort_values(by=['Product category', 'Sales'], ascending=[True, False], inplace=True)  # Sort df by several columns

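The multi-column sort can be tried on a small frame. This is a sketch with hypothetical rows; it omits inplace=True so that sort_values returns a new DataFrame and leaves the original untouched, which is one possible style rather than the only one.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'Product category': ['Bedding', 'Bedding', 'Furniture', 'Furniture'],
    'Sales': [5, 9, 3, 7],
})

# Category ascending, then sales descending within each category
sorted_df = (
    df.sort_values(by=['Product category', 'Sales'], ascending=[True, False])
      .reset_index(drop=True)  # renumber the rows after sorting
)
print(sorted_df)
```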

Filtering data using Pandas

Filter data by writing conditions inside the data frame's [] brackets. == tests whether a value equals a specific value, and & keeps rows where all conditions are met. The following filters for data where the customer's gender is male and the customer's age is greater than 60.

df[(df['Customer gender'] == 'Male') & (df['Customer age'] > 60)]  # Filter with & (and)


The following filters for data where the region is “Northwest-Gansu Province-Baiyin” or the product category is “Computer hardware”. & keeps rows where all conditions are met, while | keeps rows where at least one condition is met.

df[(df['Region'] == 'Northwest-Gansu Province-Baiyin') | (df['Product category'] == 'Computer hardware')]  # Filter with | (or)

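The difference between & and | is easiest to see when each condition is stored as its own boolean mask. The sketch below uses hypothetical rows. Note that when conditions are written inline, each comparison needs its own parentheses, because & and | bind more tightly than == and >.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'Customer gender': ['Male', 'Female', 'Male', 'Male', 'Female'],
    'Customer age': [65, 70, 30, 61, 25],
})

is_male = df['Customer gender'] == 'Male'  # boolean mask, one value per row
over_60 = df['Customer age'] > 60

both = df[is_male & over_60]    # rows meeting all conditions
either = df[is_male | over_60]  # rows meeting at least one condition
print(both)
print(either)
```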

isin filters for rows whose values appear in a given list. Just write the values to match inside the parentheses. For example, the following filters for specific order numbers.

df[df['Order number'].isin(['10021296335','10021669688','10021250896','10021434444','10021412817'])]

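One thing to watch with isin is the column's data type: a list of strings does not match a column of integers. The sketch below assumes, purely for illustration, that the order numbers were read as integers; it then converts them to text before matching and uses ~ to select the complement.

```python
import pandas as pd

# Hypothetical sample data; here the order numbers are stored as integers
df = pd.DataFrame({
    'Order number': [10021296335, 10021669688, 10021250896],
    'Sales': [3, 1, 4],
})
wanted = ['10021296335', '10021250896']  # order numbers written as strings

no_match = df[df['Order number'].isin(wanted)]  # integers never equal strings

as_text = df['Order number'].astype(str)  # convert so both sides are strings
matched = df[as_text.isin(wanted)]        # rows whose order number is listed
others = df[~as_text.isin(wanted)]        # ~ inverts the mask
print(matched)
print(others)
```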

Splitting data using Pandas

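Use the str.split method to split the data in a column, and set the expand parameter to True to return a DataFrame containing the split parts. Here each region value, such as “Northwest-Gansu Province-Baiyin”, is split on “-” into three parts: the region column keeps the first part, and the second and third parts become the new “Province” and “City” columns.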

df_split = df['Region'].str.split(pat='-', expand=True)  # Split the data
df['Region'] = df_split.iloc[:, 0]
df['Province'] = df_split.iloc[:, 1]
df['City'] = df_split.iloc[:, 2]
df

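The split can be checked on a couple of values. In this sketch the second row is a hypothetical value in the same format, and n=2 is an optional safeguard that limits the result to at most three parts.

```python
import pandas as pd

# The first value appears in the filtering example; the second is hypothetical
df = pd.DataFrame({
    'Region': [
        'Northwest-Gansu Province-Baiyin',
        'South China-Guangdong Province-Shenzhen',
    ],
})

# expand=True returns one column per part; n=2 allows at most two splits
parts = df['Region'].str.split(pat='-', n=2, expand=True)
parts.columns = ['Region', 'Province', 'City']  # name the resulting columns
print(parts)
```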

Statistical operations using Pandas

Perform statistical operations on the data. count counts the non-null values in a column. For example, counting the order numbers here gives 7409 orders.

df['Order number'].count()  # Count

7409

To count distinct product categories, first use unique to return the distinct values, then use len to count them. There are 8 unique product categories.

len(df['Product category'].unique())  # Count unique values

8
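
pandas also offers nunique, which counts distinct values in one call. The two approaches differ when the column contains missing values: unique keeps the missing value as an entry, while nunique ignores it by default. A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data with one missing category
categories = pd.Series(
    ['Bedding', 'Furniture', 'Bedding', None, 'Furniture', 'Appliances']
)

via_len = len(categories.unique())               # missing value counted as an entry
via_nunique = categories.nunique()               # missing value ignored by default
with_missing = categories.nunique(dropna=False)  # missing value counted again
print(via_len, via_nunique, with_missing)
```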

Use sum to total the sales numbers. The result here shows that the total number of sales is 48354.

df['Sales'].sum()  # Sum

48354

To count the orders in each product category, group by product category with groupby and count the order numbers.

df.groupby(['Product category'])['Order number'].count().reset_index()  # Grouped count

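When several statistics are needed per group, the count and the sum can be computed in a single pass with agg. The sketch below uses hypothetical rows; the output column names orders and total_sales are chosen here for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'Product category': ['Bedding', 'Bedding', 'Furniture'],
    'Order number': ['A1', 'A2', 'A3'],
    'Sales': [5, 9, 3],
})

# Named aggregation: new_column=(source_column, function)
summary = (
    df.groupby('Product category')
      .agg(orders=('Order number', 'count'), total_sales=('Sales', 'sum'))
      .reset_index()
)
print(summary)
```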

Plotting data using Python

The matplotlib and seaborn libraries in Python have powerful data visualization functions. To chart the sales in each region, import the matplotlib package, group the sales data by region, and pass the resulting column to a pie chart. After setting the chart parameters, the chart shows that South China has the largest share of sales at 36.3%, and the Southwest has the smallest at 3.1%.

import matplotlib.pyplot as plt
import matplotlib.style as psl
plt.rcParams['font.sans-serif'] = ['SimHei']  # Display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False  # Display minus signs correctly
psl.use('ggplot')
# count() tallies the rows in each region; use sum() instead to total the Sales values
df_QY = df.groupby(['Region'])['Sales'].count().reset_index()
# Pie chart
labels = df_QY['Region'].tolist()
explode = [0.05, 0.05, 0, 0, 0, 0]  # Offset the first two slices to highlight them
df_QY['Sales'].plot(kind='pie', figsize=(9, 6),
    autopct='%.1f%%',  # Data labels
    labels=labels,
    startangle=260,  # Initial angle
    explode=explode,  # Highlight data
    pctdistance=0.87,  # Distance between the percentage label and the center of the circle
    textprops={'fontsize': 12, 'color': 'k'},  # Text label properties
)
plt.title("Sales proportion of each region")
plt.show()

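The percentages printed on a pie chart are each slice's share of the total, so they can be reproduced without drawing anything. The sketch below uses hypothetical figures and sums the sales per region; whether sum or count is appropriate depends on whether the share of units sold or the share of orders is wanted.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'Region': ['South China', 'South China', 'Southwest', 'North China'],
    'Sales': [30, 20, 10, 40],
})

totals = df.groupby('Region')['Sales'].sum()        # total sales per region
shares = (totals / totals.sum() * 100).round(1)     # percentage of the overall total
print(shares.sort_values(ascending=False))
```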

Make a boxplot of profits with the boxplot function, here for the Northwest region, and set its parameters to see how the profit data is distributed. A number of profit values fall beyond the upper and lower whiskers of the boxplot and are drawn as outliers.

import matplotlib.pyplot as plt
import matplotlib.style as psl
plt.rcParams['font.sans-serif'] = ['SimHei']  # Display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False  # Display minus signs correctly
psl.use('ggplot')
plt.title('Profit Boxplot')
df_XB = df[df['Region'] == 'Northwest']
# Boxplot
plt.boxplot(x=df_XB['Profit'],  # Data for the boxplot
    whis=1.5,  # Whiskers reach up to 1.5 times the interquartile range
    widths=0.1,  # Width of the box
    showmeans=True,  # Display the mean
    # patch_artist=True,  # Fill the box with color
    # boxprops={'facecolor': 'RoyalBlue'},  # Fill the box in royal blue
    flierprops={'markerfacecolor': 'red', 'markeredgecolor': 'red', 'markersize': 3},  # Fill color, border color and size of outliers
    meanprops={'marker': 'h', 'markerfacecolor': 'black', 'markersize': 8},  # Marker symbol (hexagon), fill color and size of the mean
    medianprops={'linestyle': '--', 'color': 'orange'},  # Line style (dashed) and color of the median
    labels=['Northwest'],  # matplotlib 3.9 and later rename this parameter to tick_labels
)
plt.show()

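The whis=1.5 setting corresponds to a simple rule: values more than 1.5 times the interquartile range below the first quartile or above the third quartile are drawn as outliers, and each whisker ends at the most extreme value inside those bounds. The sketch below applies that rule by hand to hypothetical profit figures.

```python
import pandas as pd

# Hypothetical profit figures with one extreme value
profit = pd.Series([10, 12, 13, 14, 15, 16, 18, 60])

q1 = profit.quantile(0.25)  # first quartile
q3 = profit.quantile(0.75)  # third quartile
iqr = q3 - q1               # interquartile range

lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = profit[(profit < lower_bound) | (profit > upper_bound)]
print(lower_bound, upper_bound)
print(outliers)
```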

Make a line chart of the sales number using the seaborn library. The date column is used as the X-axis and the sales number as the Y-axis. The line chart shows how the sales number fluctuates over time.

import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']  # Display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False  # Display minus signs correctly
plt.figure(figsize=(10, 6))
# Use Seaborn to draw a line chart
sns.lineplot(data=df, x='Date', y='Sales', color='blue')
# Set the chart title and axis labels
plt.title('Sales line chart')
plt.xlabel('Date')
plt.ylabel('Sales')
# Display the chart
plt.show()

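When several rows share the same date, seaborn's lineplot by default draws the mean of those rows with a confidence band rather than their total. If daily totals are what the chart should show, one option is to aggregate first and plot the result. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data with two orders on the same day
df = pd.DataFrame({
    'Date': ['2023-08-01', '2023-08-01', '2023-08-02'],
    'Sales': [3, 5, 4],
})
df['Date'] = pd.to_datetime(df['Date'])  # treat the dates as real datetimes

# One row per date, holding that day's total sales
daily = df.groupby('Date', as_index=False)['Sales'].sum()
print(daily)
```

The daily frame can then be passed to the plotting call in place of df.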

For word cloud analysis of product categories, the wordcloud library can be used to create word cloud charts. Use a dictionary to count the number of records in each product category. After creating a word cloud object, matplotlib can draw the word cloud chart. The chart shows that bedding sets are the most frequent category and office furniture is the least frequent.

from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Product category list
product_categories = df['Product category'].tolist()
# Use a dictionary to count the occurrences of each product category
category_counts = dict()
for category in product_categories:
    if category in category_counts:
        category_counts[category] += 1
    else:
        category_counts[category] = 1
# Create the word cloud object
wordcloud = WordCloud(font_path='simhei.ttf', background_color='white').generate_from_frequencies(category_counts)
# Use matplotlib to draw the word cloud
plt.figure(figsize=(9, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

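The counting loop can also be written with pandas itself. As a sketch on hypothetical categories, value_counts followed by to_dict produces the same kind of frequency dictionary that generate_from_frequencies expects.

```python
import pandas as pd

# Hypothetical sample data
categories = pd.Series(['Bedding', 'Furniture', 'Bedding', 'Bedding'])

# value_counts tallies each distinct value; to_dict turns the tally into a dictionary
category_counts = categories.value_counts().to_dict()
print(category_counts)
```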

Using Python for data analysis requires proficiency in pandas, matplotlib, seaborn and other Python libraries, as well as basic skills in programming and data analysis. If you want to learn more about Python data analysis, you can follow me; I will continue to share data analysis knowledge to help you better master Python!


syntaxbug.com © 2021 All Rights Reserved.