The Python language has always been famous for its rich ecosystem of third-party libraries. Today I will introduce a few very nice ones, which are interesting, fun, and powerful!
Data collection
In today's Internet era, data matters more than ever, so let's start with several excellent data collection projects.
AKShare
AKShare is a Python-based financial data interface library. Its goal is to provide a set of tools covering fundamental data, real-time and historical market data, and derivative data, all the way from data collection and cleaning to data storage. It is mainly intended for academic research.
```python
import akshare as ak

stock_zh_a_hist_df = ak.stock_zh_a_hist(symbol="000001", period="daily", start_date="20170301", end_date="20210907", adjust="")
print(stock_zh_a_hist_df)
```
Output:
```
            Date  Opening  Closing  Highest  ...  Amplitude  Change  Change Amount  Turnover Rate
0     2017-03-01     9.49     9.49     9.55  ...       0.84    0.11           0.01           0.21
1     2017-03-02     9.51     9.43     9.54  ...       1.26   -0.63          -0.06           0.24
2     2017-03-03     9.41     9.40     9.43  ...       0.74   -0.32          -0.03           0.20
3     2017-03-06     9.40     9.45     9.46  ...       0.74    0.53           0.05           0.24
4     2017-03-07     9.44     9.45     9.46  ...       0.63    0.00           0.00           0.17
...          ...      ...      ...      ...  ...        ...     ...            ...            ...
1100  2021-09-01    17.48    17.88    17.92  ...       5.11    0.45           0.08           1.19
1101  2021-09-02    18.00    18.40    18.78  ...       5.48    2.91           0.52           1.25
1102  2021-09-03    18.50    18.04    18.50  ...       4.35   -1.96          -0.36           0.72
1103  2021-09-06    17.93    18.45    18.60  ...       4.55    2.27           0.41           0.78
1104  2021-09-07    18.60    19.24    19.56  ...       6.56    4.28           0.79           0.84

[1105 rows x 11 columns]
```
https://github.com/akfamily/akshare
TuShare
TuShare is a tool that implements the whole pipeline for financial data such as stocks and futures, from data collection, cleaning, and processing to data storage. It meets the data-acquisition needs of financial quantitative analysts and people studying data analysis, and is characterized by wide data coverage, simple interface calls, and fast responses.

Note, however, that some features of this project are paid, so choose what you use accordingly.
```python
import tushare as ts

ts.get_hist_data('600848')  # Get all historical data at once
```
Output:
```
             open   high  close    low    volume  p_change    ma5  \
date
2012-01-11  6.880  7.380  7.060  6.880  14129.96      2.62  7.060
2012-01-12  7.050  7.100  6.980  6.900   7895.19     -1.13  7.020
2012-01-13  6.950  7.000  6.700  6.690   6611.87     -4.01  6.913
2012-01-16  6.680  6.750  6.510  6.480   2941.63     -2.84  6.813
2012-01-17  6.660  6.880  6.860  6.460   8642.57      5.38  6.822
2012-01-18  7.000  7.300  6.890  6.880  13075.40      0.44  6.788
2012-01-19  6.690  6.950  6.890  6.680   6117.32      0.00  6.770
2012-01-20  6.870  7.080  7.010  6.870   6813.09      1.74  6.832

             ma10   ma20     v_ma5    v_ma10    v_ma20  turnover
date
2012-01-11  7.060  7.060  14129.96  14129.96  14129.96      0.48
2012-01-12  7.020  7.020  11012.58  11012.58  11012.58      0.27
2012-01-13  6.913  6.913   9545.67   9545.67   9545.67      0.23
2012-01-16  6.813  6.813   7894.66   7894.66   7894.66      0.10
2012-01-17  6.822  6.822   8044.24   8044.24   8044.24      0.30
2012-01-18  6.833  6.833   7833.33   8882.77   8882.77      0.45
2012-01-19  6.841  6.841   7477.76   8487.71   8487.71      0.21
2012-01-20  6.863  6.863   7518.00   8278.38   8278.38      0.23
```
https://github.com/waditu/tushare
GoPUP
The data collected by GoPUP comes from public data sources and does not involve any personal privacy or non-public data. That said, some interfaces require registering a TOKEN before they can be used.
```python
import gopup as gp

df = gp.weibo_index(word="epidemic", time_type="1hour")
print(df)
```
Output:
```
                     Epidemic index
2022-12-17 18:15:00          18544
2022-12-17 18:20:00          14927
2022-12-17 18:25:00          13004
2022-12-17 18:30:00          13145
2022-12-17 18:35:00          13485
2022-12-17 18:40:00          14091
2022-12-17 18:45:00          14265
2022-12-17 18:50:00          14115
2022-12-17 18:55:00          15313
2022-12-17 19:00:00          14346
2022-12-17 19:05:00          14457
2022-12-17 19:10:00          13495
2022-12-17 19:15:00          14133
```
https://github.com/justinzm/gopup
GeneralNewsExtractor
This project is based on the paper "Webpage Text Extraction Method Based on Text and Symbol Density". It implements the text extractor in Python and can be used to extract the body content, author, and title of a news article from its HTML.
```python
>>> from gne import GeneralNewsExtractor
>>> html = '''Rendered web page HTML code'''
>>> extractor = GeneralNewsExtractor()
>>> result = extractor.extract(html, noise_node_list=['//div[@class="comment-list"]'])
>>> print(result)
```
Output:
```json
{
    "title": "xxxx",
    "publish_time": "2019-09-10 11:12:13",
    "author": "yyy",
    "content": "zzzz",
    "images": ["/xxx.jpg", "/yyy.png"]
}
```
News page extraction example
https://github.com/GeneralNewsExtractor/GeneralNewsExtractor
Crawler
Crawling is another major application area of Python, and many people take their first steps with crawlers. Let's look at a few excellent crawler projects.
playwright-python
playwright-python is Microsoft's open-source browser automation tool, which lets you drive browsers from Python. Chromium, Firefox, and WebKit are supported on Linux, macOS, and Windows.
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    for browser_type in [p.chromium, p.firefox, p.webkit]:
        browser = browser_type.launch()
        page = browser.new_page()
        page.goto('http://whatsmyuseragent.org/')
        page.screenshot(path=f'example-{browser_type.name}.png')
        browser.close()
```
https://github.com/microsoft/playwright-python
awesome-python-login-model
This project collects the login flows of major websites, along with crawler programs for some of them. The login implementations include Selenium-based login and direct simulated login based on packet capture, which is helpful for novices studying and writing crawlers.

However, as everyone knows, crawlers require heavy maintenance. This project has not been updated for a long time, so whether its various login interfaces still work is doubtful. Use it at your own discretion, or build on it yourself.
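The packet-capture style of simulated login mentioned above usually boils down to replaying the login form's POST request with a cookie-aware client. Here is a minimal standard-library sketch of that pattern; the URL and form field names are hypothetical placeholders you would replace with values captured from your browser's network panel:

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical login endpoint and form fields -- replace them with the
# request details captured from the target site.
LOGIN_URL = "https://example.com/login"


def build_opener():
    """Return an opener that keeps session cookies across requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_login_request(username, password):
    """Build the POST request a captured login flow would replay."""
    form = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(LOGIN_URL, data=form.encode(), method="POST")


opener = build_opener()  # opener.open(request) would perform the login
request = build_login_request("alice", "secret")
print(request.get_method(), request.full_url)
```

After a successful login, the cookie jar carries the session, so subsequent requests through the same opener stay authenticated.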
https://github.com/Kr1s77/awesome-python-login-model
DecryptLogin
Compared with the previous project, this one is still being updated continuously. It also simulates logins to major websites and is very valuable for beginners.
```python
from DecryptLogin import login

# the instanced Login class object
lg = login.Login()
# use the provided api function to login in the target website (e.g., twitter)
infos_return, session = lg.twitter(username='Your Username', password='Your Password')
```
https://github.com/CharlesPikachu/DecryptLogin
Scylla
Scylla is a high-quality free proxy IP pooling tool that currently only supports Python 3.6.
Visit the stats endpoint:

```
http://localhost:8899/api/v1/stats
```

Output:

```json
{
    "median": 181.2566407083,
    "valid_count": 1780,
    "total_count": 9528,
    "mean": 174.3290085201
}
```
https://github.com/imWildCat/scylla
ProxyPool
This crawler proxy IP pool project mainly collects free proxies published on the Internet, validates them, and stores the valid ones; stored proxies are re-validated periodically to guarantee availability. It provides both an API and a CLI, and you can also add proxy sources to improve the quality and quantity of the pool. The project's design documentation is detailed and its module structure is concise and easy to understand, making it a good project for crawler novices to learn from.
```python
import requests


def get_proxy():
    return requests.get("http://127.0.0.1:5010/get/").json()


def delete_proxy(proxy):
    requests.get("http://127.0.0.1:5010/delete/?proxy={}".format(proxy))


# your spider code
def getHtml():
    # ....
    retry_count = 5
    proxy = get_proxy().get("proxy")
    while retry_count > 0:
        try:
            # use proxy to access
            html = requests.get('http://www.example.com', proxies={"http": "http://{}".format(proxy)})
            return html
        except Exception:
            retry_count -= 1
    # Delete the proxy from the proxy pool
    delete_proxy(proxy)
    return None
```
https://github.com/Python3WebSpider/ProxyPool
getproxy
getproxy is a program that crawls proxy websites and aggregates http/https proxies, updating its data every 15 minutes.
```
(test2.7) ➜  ~ getproxy
INFO:getproxy.getproxy:[*] Init
INFO:getproxy.getproxy:[*] Current Ip Address: 1.1.1.1
INFO:getproxy.getproxy:[*] Load input proxies
INFO:getproxy.getproxy:[*] Validate input proxies
INFO:getproxy.getproxy:[*] Load plugins
INFO:getproxy.getproxy:[*] Grab proxies
INFO:getproxy.getproxy:[*] Validate web proxies
INFO:getproxy.getproxy:[*] Check 6666 proxies, Got 666 valid proxies
...
```
https://github.com/fate0/getproxy
freeproxy
It is also a project for grabbing free proxies, which supports many proxy websites and is easy to use.
```python
from freeproxy import freeproxy

proxy_sources = ['proxylistplus', 'kuaidali']
fp_client = freeproxy.FreeProxy(proxy_sources=proxy_sources)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
}
response = fp_client.get('https://space.bilibili.com/406756145', headers=headers)
print(response.text)
```
https://github.com/CharlesPikachu/freeproxy
fake-useragent
fake-useragent fakes the browser identity (User-Agent) and is often used in crawlers. The project has very little code; it is worth a read to see how ua.random returns a random browser identity.
```python
from fake_useragent import UserAgent

ua = UserAgent()
ua.ie
# Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US);
ua.msie
# Mozilla/5.0 (compatible; MSIE 10.0; Macintosh; Intel Mac OS X 10_7_3; Trident/6.0)
ua['Internet Explorer']
# Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)
ua.opera
# Opera/9.80 (X11; Linux i686; U; ru) Presto/2.8.131 Version/11.11
ua.chrome
# Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2
ua.google
# Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1290.1 Safari/537.13
ua['google chrome']
# Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11
ua.firefox
# Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1
ua.ff
# Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0.1
ua.safari
# Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25

# and the best one, get a random browser user-agent string
ua.random
```
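The core idea behind ua.random is simple enough to sketch in a few lines: keep a pool of known user-agent strings grouped by browser and pick one at random. The sketch below is illustrative only, not fake-useragent's actual implementation (which draws from a much larger, real dataset):

```python
import random

# Illustrative sample pool; fake-useragent ships a far larger dataset.
USER_AGENTS = {
    "chrome": [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36",
    ],
    "firefox": [
        "Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0",
    ],
}


def random_ua():
    """Flatten the pool and return one user-agent string at random."""
    pool = [ua for group in USER_AGENTS.values() for ua in group]
    return random.choice(pool)


print(random_ua())
```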
https://github.com/fake-useragent/fake-useragent
Web related
Python has too many excellent, well-established web libraries: Django and Flask need no introduction, since everyone knows them. Instead, here are a few niche but easy-to-use ones.
streamlit
streamlit is a Python framework that can quickly turn data into visual, interactive pages, producing charts from our data in minutes.
```python
import streamlit as st

x = st.slider('Select a value')
st.write(x, 'squared is', x * x)
```
Output: an interactive page with a slider; moving it updates the displayed square of the selected value.
https://github.com/streamlit/streamlit
wagtail
wagtail is a powerful open-source Django CMS (content management system). First, the project is actively updated and iterated on. Second, the features listed on the project homepage are all free, with nothing locked behind payment. It focuses on content management without binding you to a particular front-end implementation.
https://github.com/wagtail/wagtail
fastapi
fastapi is a high-performance web framework based on Python 3.6+. As the name suggests, writing interfaces with FastAPI is fast, and debugging is easy. Python keeps making progress, and FastAPI builds on that progress to make web development faster and stronger.
```python
from typing import Union

from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def read_root():
    return {"Hello": "World"}


@app.get("/items/{item_id}")
def read_item(item_id: int, q: Union[str, None] = None):
    return {"item_id": item_id, "q": q}
```
https://github.com/tiangolo/fastapi
django-blog-tutorial
This is a Django tutorial that takes us step by step through developing a personal blog system with Django from scratch, mastering Django development skills through practice.
https://github.com/jukanntenn/django-blog-tutorial
dash
dash is a web framework designed specifically for machine learning; with it, a machine-learning app can be built quickly.
https://github.com/plotly/dash
PyWebIO
PyWebIO is another very good Python web framework. It lets you build a complete web page without writing any front-end code, which is really convenient.
https://github.com/pywebio/PyWebIO
Python Tutorial
practical-python
A very popular Python learning resource: a tutorial in Markdown format, which is very beginner-friendly.
https://github.com/dabeaz-course/practical-python
learn-python3
A Python 3 tutorial in the form of Jupyter notebooks, easy to run and read. It also includes exercises, making it friendly to novices.
https://github.com/jerry-git/learn-python3
python-guide
An introductory Python tutorial written by Kenneth Reitz, author of the Requests library. It goes beyond syntax to cover project structure, code style, advanced topics, tooling, and more. A chance to appreciate how a master approaches Python~
https://github.com/realpython/python-guide
Other
pytools
This is a toolset-style project written by a prolific developer, containing many interesting gadgets. What is shown here is just the tip of the iceberg; the full picture is yours to explore.
```python
import random

from pytools import pytools

tool_client = pytools.pytools()
all_supports = tool_client.getallsupported()
tool_client.execute(random.choice(list(all_supports.values())))
```
https://github.com/CharlesPikachu/pytools
amazing-qr
amazing-qr is an interesting library that can generate dynamic, colored QR codes in a variety of styles.
```shell
# 3: -n, -d
amzqr https://github.com -n github_qr.jpg -d .../paths/
```
https://github.com/x-hw/amazing-qr
sh
sh is a mature library that replaces subprocess and lets us call any program as if it were a function.
```shell
$> ./run.sh FunctionalTests.test_unicode_arg
```
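For comparison, here is roughly what sh hides behind its function-call syntax (e.g. sh.ls('-l')), sketched with the standard library's subprocess. The helper below is an illustration of the idea, not sh's API:

```python
import subprocess
import sys


def run(*argv):
    """Call a program and return its stdout, subprocess-style."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


# Invoke the current Python interpreter as if it were a function.
out = run(sys.executable, "-c", "print('hello from a subprocess')")
print(out.strip())
```

sh does this wiring (argument marshalling, output capture, error raising) for you, which is why calls to external programs end up looking like ordinary Python functions.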
https://github.com/amoffat/sh
tqdm
Powerful, fast, and easily extensible progress bar library for Python.
```python
from tqdm import tqdm

for i in tqdm(range(10000)):
    ...
```
https://github.com/tqdm/tqdm
loguru
A library that makes logging in Python easy.
```python
from loguru import logger

logger.debug("That's it, beautiful and simple logging!")
```
https://github.com/Delgan/loguru
click
click is a third-party Python library for quickly creating command-line interfaces. It supports decorator-based commands, multiple parameter types, automatic help generation, and more.
```python
import click


@click.command()
@click.option("--count", default=1, help="Number of greetings.")
@click.option("--name", prompt="Your name", help="The person to greet.")
def hello(count, name):
    """Simple program that greets NAME for a total of COUNT times."""
    for _ in range(count):
        click.echo(f"Hello, {name}!")


if __name__ == '__main__':
    hello()
```
Output:
```shell
$ python hello.py --count=3
Your name: Click
Hello, Click!
Hello, Click!
Hello, Click!
```
KeymouseGo
KeymouseGo is a simplified, portable "button wizard" implemented in Python. It records the user's mouse and keyboard operations, then replays them automatically, with a configurable number of repetitions. For simple, monotonous, repetitive operations, this software saves a lot of trouble: record once, and KeymouseGo does the rest.
https://github.com/taojy123/KeymouseGo
Well, that is all of today's sharing. If you liked it, give it a like~
—END—