Python is an important tool in data science and machine learning. This guide explains how to configure a Python environment and install the libraries commonly used for data science and machine learning research.
In data science and machine learning, we can use the standard Python environment or we can use Anaconda. Anaconda supports not only Python but also other data science tools, such as the R language. It is also a package distribution platform from which libraries can be downloaded and installed. In the Anaconda environment, we use the conda command to install libraries. If we don't need the large set of preinstalled packages, we can use its minimal version, Miniconda. In addition, data scientists are accustomed to using Jupyter for research. Jupyter is a web-based development tool that runs code interactively, one step at a time, in documents called Notebooks. JupyterLab is the next generation of Jupyter. These are the basic concepts we need to understand before we start. They are summarized in the following table:
Name | Description |
---|---|
Anaconda/Miniconda | A data science distribution and package platform; its package repository plays a role similar to PyPI for Python. |
conda | Anaconda's command-line tool; it plays a role similar to the pip command. |
Jupyter/JupyterLab | A web-based interactive development tool |
Notebook | A file that mixes code, explanatory text, and execution results |
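To make the last row of the table more concrete, here is a minimal sketch of what a Notebook file looks like on disk. It assumes the JSON layout of notebook format version 4; the cell contents are made up for illustration, and real notebooks usually carry more metadata than shown here.

```python
import json

# A hand-written, minimal notebook: one text cell and one code cell with its output.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",  # explanatory text
            "metadata": {},
            "source": ["# My first notebook"],
        },
        {
            "cell_type": "code",  # code plus the result it produced
            "execution_count": 1,
            "metadata": {},
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}

# Serialize to the text that would be stored in an .ipynb file, then read it back.
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```

The point is only that code, text, and results live side by side in one file, which is why a Notebook can be shared together with its output.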
The Jupyter interface looks roughly like this:
You can also try it directly at jupyter.org/try-jupyter…
Environment installation
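You can install Miniconda by downloading the installer and running it (the installer has to be run as a file, so piping it straight into sh does not work):
    curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh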
After miniconda is installed, you can create and use a virtual environment like this:
    # Best practice, use an environment rather than install in the base env
    conda create -n my-env
    conda activate my-env
    # If you want to install from conda-forge
    conda config --env --add channels conda-forge
    # The actual install command
    conda install numpy
This works similarly to Python’s virtual environment:
    python3 -m venv .venv
    source .venv/bin/activate
    pip install numpy
Of course, you can use pip directly in the conda environment:
    (my-env) [game404@y ~]$ pip list
    Package    Version
    ---------- -------
    numpy      1.24.1
    pip        22.3.1
    setuptools 65.6.3
    wheel      0.38.4
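When several environments are installed, it is easy to lose track of which one is active. The following sketch shows one way to check from inside Python. The helper name describe_environment is made up for this example; it assumes that conda sets the CONDA_DEFAULT_ENV variable when an environment is activated, and that a venv can be recognized by comparing sys.prefix with sys.base_prefix.

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the interpreter that is running this code."""
    return {
        # Full path of the python binary in use
        "executable": sys.executable,
        # Root directory of the active environment
        "prefix": sys.prefix,
        # True inside a venv-style virtual environment
        "in_venv": sys.prefix != sys.base_prefix,
        # Name of the activated conda environment, or None if not set
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

If the executable path points inside my-env, then pip and python are both working against that environment.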
We can install JupyterLab using either of the following two commands:
    conda install jupyterlab
    # or
    pip install jupyterlab
- conda installs packages from the Anaconda repository; pip installs them from PyPI. Both commands serve the same purpose here, so you can pick whichever is faster on your network.
Start jupyter-lab
After installing jupyter-lab, you can use the following command to open it:
    (my-env) [game404@y ~]$ jupyter-lab
    [I 2023-01-08 21:26:50.250 ServerApp] jupyter_server_terminals | extension was successfully linked.
    [I 2023-01-08 21:26:50.257 ServerApp] jupyterlab | extension was successfully linked.
    [I 2023-01-08 21:26:50.262 ServerApp] nbclassic | extension was successfully linked.
    ...
    [I 2023-01-08 21:26:50.842 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    [W 2023-01-08 21:26:50.847 ServerApp] No web browser found: could not locate runnable browser.
    [C 2023-01-08 21:26:50.848 ServerApp]
        To access the server, open this file in a browser:
            file:///home/yuanzuxiang/.local/share/jupyter/runtime/jpserver-3540-open.html
        Or copy and paste one of these URLs:
            http://localhost:8889/lab?token=f5028b1978baa74512cec56cff7c4f9e2dbbc4592cdf5b69
         or http://127.0.0.1:8889/lab?token=f5028b1978baa74512cec56cff7c4f9e2dbbc4592cdf5b69
- Note that the token in the URL authorizes access; it is required when you visit the page for the first time.
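To see where the token sits in the address, we can take one of the URLs from the log above apart with Python's standard library. This is only an illustration of the URL structure; JupyterLab does not require you to do this.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the startup log above
url = "http://localhost:8889/lab?token=f5028b1978baa74512cec56cff7c4f9e2dbbc4592cdf5b69"

parts = urlsplit(url)
# The token is an ordinary query parameter after the "?"
token = parse_qs(parts.query)["token"][0]

print("host:", parts.hostname)
print("port:", parts.port)
print("path:", parts.path)
print("token:", token)
```

The host and port tell you where the server is listening, and the token is the part you need to keep private.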
Then we open JupyterLab in the browser, create a Notebook, and test the Python environment directly:
- The red Notebook icon is the same as the one on the launcher page
- Use the [->] button in the top toolbar to run code
- The main feature of a notebook is that code is executed cell by cell.
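Executing code cell by cell can be pictured with a small simulation. The sketch below is a simplified model and not how Jupyter is actually implemented: each string plays the role of one cell, and a shared dictionary plays the role of the kernel's memory, so a name defined in one cell stays available to the next.

```python
# Three "cells", run one after another (contents are made up for illustration).
cells = [
    "x = 21",
    "y = x * 2",
    "result = f'y is {y}'",
]

# Shared state that survives between cells, like a running kernel
namespace = {}

for number, source in enumerate(cells, start=1):
    # Run a single cell; whatever it defines is kept in the namespace
    exec(source, namespace)
    print(f"In [{number}]: {source}")

print(namespace["result"])
```

This is why the order in which you run cells matters: the second cell only works because the first one has already defined x.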
Install common libraries
After installing the Python environment and JupyterLab, we install the commonly used libraries, mainly the following seven:
- numpy: the fundamental package for scientific computing with Python.
- pandas: a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
- matplotlib: a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible.
- seaborn: a Python data visualization library based on matplotlib.
- scipy: fundamental algorithms for scientific computing in Python.
- statsmodels: statistical models, hypothesis tests, and data exploration.
- scikit-learn (imported as sklearn): machine learning in Python.
These libraries also depend on each other. numpy provides the basic array and matrix implementation; pandas provides the core data table operations; matplotlib handles data visualization, and seaborn builds on it; scipy and statsmodels provide statistical methods; and scikit-learn provides machine learning algorithms, including linear regression. We can install them in this order:
    conda install numpy
    conda install pandas
    conda install matplotlib
    conda install seaborn
    conda install scipy
    conda install statsmodels
    conda install scikit-learn
You can also install them directly using the pip command:
    pip install numpy
    pip install pandas
    pip install matplotlib
    pip install seaborn
    pip install scipy
    pip install statsmodels
    pip install scikit-learn
Generally we import them like this:
    import numpy as np
    import pandas as pd
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    import seaborn as sns
    import seaborn.objects as so
    from scipy import stats
    from sklearn import linear_model
    import statsmodels.api as sm
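After installing, it can be handy to confirm which of the seven libraries are actually present in the current environment. Here is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later). The helper name installed_versions is made up for this example, and the names in the list are the install names, so scikit-learn is used rather than sklearn.

```python
from importlib import metadata

# Names as used by pip/conda when installing
PACKAGES = [
    "numpy",
    "pandas",
    "matplotlib",
    "seaborn",
    "scipy",
    "statsmodels",
    "scikit-learn",
]


def installed_versions(names):
    """Return {name: version string, or None if the package is missing}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(PACKAGES)
for name, version in report.items():
    print(f"{name:<14} {version or 'not installed'}")
```

Any line that says "not installed" tells you which conda or pip command still needs to be run in this environment.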
You can also use the pip command to install a library from within JupyterLab:
- Note that the leading ! is required, as in !pip install numpy
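One thing to watch for when installing from a notebook is that the pip being called may belong to a different Python than the one running the notebook. A common way around this is to call pip through the current interpreter. The sketch below shows the idea in plain Python; the helper names pip_install_command and pip_install are made up for this example.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Run the install; raises CalledProcessError if pip reports a failure."""
    subprocess.run(pip_install_command(package), check=True)


# Show the command without running it
print(pip_install_command("numpy"))
```

Calling pip_install("numpy") in a cell would then install numpy into the same environment the notebook is using.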