What modules in Python do you think are awesome?

Here are 10 recommended Python modules. NumPy and pandas are used for data cleaning. Matplotlib is used for data visualization and is easy to get started with. For more advanced work, seaborn builds on Matplotlib with improved chart styles. To create interactive charts, you can use the Pyecharts library. The list also includes some libraries used for office automation.

Python is not just for data cleaning and data visualization. It has many other uses, some of them unexpected: it can process data efficiently, visualize it, and automate office work. The packages below are only some of the most common examples, and everyone is welcome to add to the list. Let's learn together.

1. NumPy

Chinese documentation site: https://www.numpy.org.cn/ (official site: https://numpy.org/)

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects, and a range of routines for fast operations on arrays, including mathematical and logical operations, shape manipulation, sorting, selecting, input and output, discrete Fourier transforms, basic linear algebra, basic statistical operations, and random simulation.

The core of the NumPy package is the ndarray object. It encapsulates n-dimensional arrays of a single data type, and many operations are performed in compiled code to ensure good performance.

NumPy's main object is the homogeneous multidimensional array: a table of elements, all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.
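As a minimal sketch of these ideas, the snippet below builds a small two-dimensional ndarray and inspects its axes. The array values and variable names are made up for illustration.

```python
import numpy as np

# A 2-D array (2 rows, 3 columns); every element has the same data type
a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.ndim)   # number of axes: 2
print(a.shape)  # size along each axis: (2, 3)
print(a[1, 2])  # indexed by a tuple of non-negative integers: 6

col_sums = a.sum(axis=0)  # sum along axis 0 (down the rows): [5 7 9]
row_sums = a.sum(axis=1)  # sum along axis 1 (across the columns): [ 6 15]
doubled = a * 2           # element-wise operation, no Python loop needed

print(col_sums, row_sums)
print(doubled)
```

Operations such as a.sum(axis=0) and a * 2 run in compiled code over the whole array, which is where most of NumPy's speed comes from.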

2. Pandas

Chinese documentation site: https://www.pypandas.cn/ (official site: https://pandas.pydata.org/)

Pandas is Python's core data analysis library. It provides fast, flexible, and expressive data structures designed to make working with relational and labeled data simple and intuitive, and it is widely used in data analysis. Pandas is well suited to tabular data similar to Excel sheets, as well as ordered and unordered time series data.

The main data structures of Pandas are Series (one-dimensional) and DataFrame (two-dimensional). These two structures handle most typical use cases in finance, statistics, social science, engineering, and other fields. A data analysis workflow with pandas includes stages such as organizing and cleaning data, analysis and modeling, and visualization and tabulation.

Flexible grouping (group by): data grouping, aggregation, and transformation;
Intuitive merging (merge): joining datasets;
Flexible reshaping (reshape): changing the shape of the data;
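The three features above can be sketched on a small example. The DataFrames, column names, and values below are hypothetical and exist only to show the calls.

```python
import pandas as pd

# Hypothetical sample data
sales = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "amount": [10, 20, 30, 40],
})
regions = pd.DataFrame({"city": ["A", "B"], "region": ["North", "South"]})

# Grouping: total amount per city
totals = sales.groupby("city")["amount"].sum()

# Merging: attach the region of each city, like a SQL left join
merged = sales.merge(regions, on="city", how="left")

# Reshaping: one row per city, one column per quarter
wide = sales.pivot(index="city", columns="quarter", values="amount")

print(totals)
print(merged)
print(wide)
```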

3. Matplotlib

Chinese documentation site: https://www.matplotlib.org.cn/ (official site: https://matplotlib.org/)

Matplotlib is a Python 2D plotting library that produces publication-quality graphics in a variety of hardcopy formats and in a cross-platform interactive environment. Matplotlib is available for Python scripts, Python and IPython shells, Jupyter notebooks, web application servers and four graphical user interface toolkits.

Matplotlib tries to make easy things easier and hard things possible, allowing you to generate charts, histograms, power spectra, bar graphs, error plots, scatter plots, and more with just a few lines of code.

For simple plotting, the pyplot module provides a MATLAB-like interface, especially when combined with IPython. Advanced users have complete control over line styles, font properties, axis properties, and so on, through an object-oriented interface or a set of functions familiar to MATLAB users.
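The two interfaces can be compared side by side. This is a minimal sketch: the data and output file names are made up, and the Agg backend is selected only so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

# pyplot interface: MATLAB-like functions that act on the current figure
plt.figure()
plt.plot(x, y)
plt.title("pyplot interface")
plt.savefig("pyplot_example.png")
plt.close()

# Object-oriented interface: explicit Figure and Axes objects give
# direct control over line style, fonts, and axis properties
fig, ax = plt.subplots()
(line,) = ax.plot(x, y, linestyle="--", linewidth=2, color="tab:red")
ax.set_xlabel("x", fontsize=12)
ax.set_ylabel("x squared")
ax.set_title("object-oriented interface")
fig.savefig("oo_example.png")
plt.close(fig)
```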

4. Seaborn

Official website http://seaborn.pydata.org/

Seaborn is a Python data visualization library built on top of matplotlib and tightly integrated with Pandas data structures to provide a high-level interface for drawing attractive and informative statistical graphics.

Seaborn can be used to explore data. Its plotting functions operate on data frames and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus on what the different elements of a plot mean, rather than on the details of how to draw them.

Matplotlib has a comprehensive and powerful API that allows you to change almost any attribute of a figure to your liking. The combination of seaborn's high-level interface and matplotlib's deep customizability lets you both explore data quickly and create graphics that can be tailored into a publication-quality final product.
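A minimal sketch of this combination, assuming seaborn, pandas, and matplotlib are installed. The DataFrame and file name are hypothetical. Seaborn aggregates the data (by default barplot shows the mean per group) and returns a regular Matplotlib Axes that can be customized further.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical sample data: two measurements per group
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1, 3, 2, 6],
})

# Seaborn maps the columns to the axes and aggregates each group for us
ax = sns.barplot(data=df, x="group", y="value")

# The returned object is a Matplotlib Axes, so Matplotlib's API still applies
ax.set_title("Mean value per group")
ax.figure.savefig("seaborn_bar.png")
```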

5. Pyecharts

Official website https://pyecharts.org/#/

ECharts is a data visualization library open-sourced by Baidu. It has been recognized by many developers for its good interactivity and polished chart design. Python is an expressive language and is well suited to data processing. When data analysis meets data visualization, pyecharts was born.

Pyecharts has a simple API that supports chained calls, includes more than 30 common chart types, supports the mainstream notebook environments (Jupyter Notebook and JupyterLab), and offers highly flexible configuration options that make it easy to put together attractive charts.

Its data interaction features make the information in a chart more vivid and add human-computer interaction. Charts can be exported directly to HTML files, which keeps the results interactive and makes them easier to share.

Pyecharts also offers a rich set of chart types, including geographic charts that display data spatially.

6. wordcloud

To draw a word cloud, you can use the wordcloud library. First, install it with pip install wordcloud. After loading the text data, create a WordCloud object, set the background color, width, and height of the image, and pass the text to the generate() method to build the word cloud. Finally, use Matplotlib's imshow() function to display the image and axis("off") to hide the coordinate axes.

import matplotlib.pyplot as plt
from wordcloud import WordCloud
  
text = "This is some sample text for generating a word cloud."
  
#Create word cloud object
wordcloud = WordCloud(background_color='white', width=800, height=600).generate(text)
  
# Display word cloud image
plt.figure(figsize=(9, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

7. Faker

Faker is a handy library for generating simulated data. It lets us meet the testing needs of data analysis as far as possible while keeping real data secure. It can generate simulated text, numbers, dates, and other fields.

After importing Faker, create a generator with locale="zh_CN" to produce Chinese data. The example below generates a record with several fields: name, mobile phone number, ID number, date of birth, email, address, company, and job title.

#Display running results in multiple lines
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

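from faker import Faker
faker = Faker(locale="zh_CN")  # Generate simulated data in Chinese

faker.name()             # name
faker.phone_number()     # mobile phone number
id_number = faker.ssn()  # ID number, stored so the date of birth matches it
id_number
id_number[6:14]          # date of birth (YYYYMMDD) embedded in the ID number
faker.email()            # email
faker.address()          # address
faker.company()          # company
faker.job()              # job title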

8. PySimpleGUI

To make the program interactive while it runs, a graphical interface is built here with the PySimpleGUI library. layout defines the arrangement of the window, window defines the window itself, and the while True: loop keeps reading the window: whenever a specific event occurs, the corresponding values are returned, which provides the human-computer interaction.

import PySimpleGUI as sg

# Window layout with two rows: the first row selects the folder that contains
# the Excel files to merge, the second row holds the button that starts the merge.
layout = [
    [
        sg.Text("Please select the directory where the Excel file is located:"),
        sg.Input(size=(25, 1), enable_events=True, key="file path"),
        sg.FolderBrowse(button_text="Browse documents"),
    ],
    [sg.Button("Start merging", enable_events=True, key="start")],
]

window = sg.Window("Batch data merging: By Dahua data analysis", layout)  # Define the window

while True:
    event, values = window.read()
    if event is None:  # The window was closed
        break
    elif event == "start":  # Must match the key given to the button
        if values["file path"]:  # Must match the key given to the input box
            print(values["file path"])
            sg.popup("Data merging completed!")
        else:
            sg.popup("Please enter the path where the Excel file is located first!")

window.close()

The resulting interface works as follows: click the browse button to find the folder that contains the files to merge, then click the start button, and the result is output. values["file path"] holds the folder path of the Excel files to merge, and that path is what gets printed.

9. pipenv

Once the interactive program is written, how can it be shared with others? And can the data merging function run on a computer that does not have Python installed? This is where packaging a Python program comes in. We package inside a virtual environment, so first enter the following command on the command line to install pipenv.

# Install pipenv (using the Tsinghua PyPI mirror)
pip install pipenv -i https://pypi.tuna.tsinghua.edu.cn/simple

Press Win + R, enter CMD to open a command prompt, and run the pipenv shell command to enter the virtual environment. If no virtual environment exists, one is created automatically.

# Enter the virtual environment (created automatically if it does not exist)
pipenv shell

Packaging bundles every Python package installed in the current environment, so here we install only the modules the program uses inside the virtual environment, which reduces the size of the packaged file. Note that xlrd is pinned to the older version 1.2.0: pip installs a newer version by default, and the newer version causes an error when the program runs (xlrd 2.0 and later no longer read .xlsx files).

# Install only the modules the program uses
pip install pandas xlrd==1.2.0 id-validator PySimpleGUI pyinstaller -i https://pypi.tuna.tsinghua.edu.cn/simple

Save the data merging code with the interactive interface as a .py file, then enter the following packaging command on the command line, specifying the path of the file to package.

# Package into a single executable (-F) without a console window (-w)
pyinstaller -F -w C:\Desktop\combine.py

Wait a few minutes and a dist folder appears in the current working directory. If you do not know the working directory, you can check it with os.getcwd().

The dist folder contains combine.exe, the packaged program.

10. pandasql

The pandasql library lets you write SQL in Python and run it against pandas DataFrames. After importing pandasql, run a query with sql.sqldf(""" *** """), where *** is the SQL statement you want to execute. Writing SQL is not difficult and is easy to pick up: put the statement inside the parentheses and the query runs.

import pandasql as sql

After importing pandasql, suppose the cumulative box office of each movie needs to be divided into 'ultra-low box office', 'low box office', 'medium box office', 'high box office', and 'super-high box office'. Use CASE WHEN to define the groups and close the expression with END. This runs a CASE WHEN query against a pandas DataFrame.

# Group the cumulative box office of each movie with CASE WHEN.
# df is assumed to be a DataFrame already in scope with the columns named below;
# column names that contain spaces must be wrapped in double quotes.
sql.sqldf("""
select "movie name", "movie director", "movie starring", "cumulative box office",
case
    when "cumulative box office" < 100000 then 'ultra-low box office'
    when "cumulative box office" < 200000 then 'low box office'
    when "cumulative box office" < 300000 then 'medium box office'
    when "cumulative box office" < 400000 then 'high box office'
    else 'super-high box office'
end as "movie box office grouping"
from df
where "cumulative box office" is not null;
""")
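pandasql runs its queries on SQLite by default, so the same CASE WHEN grouping can be tried with only pandas and the standard-library sqlite3 module. The sketch below is not pandasql itself but a stand-in that needs no extra install. The sample DataFrame, the underscore column names, and the box_office_group alias are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; the last movie has no box office figure
df = pd.DataFrame({
    "movie_name": ["A", "B", "C", "D", "E", "F"],
    "cumulative_box_office": [50000, 150000, 250000, 350000, 900000, None],
})

query = """
select movie_name, cumulative_box_office,
case
    when cumulative_box_office < 100000 then 'ultra-low box office'
    when cumulative_box_office < 200000 then 'low box office'
    when cumulative_box_office < 300000 then 'medium box office'
    when cumulative_box_office < 400000 then 'high box office'
    else 'super-high box office'
end as box_office_group
from df
where cumulative_box_office is not null
order by cumulative_box_office;
"""

# Copy the DataFrame into an in-memory SQLite table named df, then query it
conn = sqlite3.connect(":memory:")
df.to_sql("df", conn, index=False)
result = pd.read_sql_query(query, conn)
conn.close()

print(result)
```

The WHEN branches are checked in order, so each movie lands in the first group whose upper bound it falls under, and rows with a missing box office are filtered out before grouping.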
