Interesting, fun, and powerful Python libraries

The Python language has always been famous for its rich ecosystem of third-party libraries. Today I will introduce a few very nice ones that are interesting, fun, and powerful!

Data collection

In today's Internet era, data matters more than ever, so let's start with several excellent data collection projects.

AKShare

AKShare is a Python-based financial data interface library. It aims to provide a set of tools covering fundamental data, real-time and historical market data, and derivatives data, from data collection and cleaning through to data storage, and is mainly intended for academic research.

import akshare as ak

# daily, unadjusted A-share history for 000001 (Ping An Bank)
stock_zh_a_hist_df = ak.stock_zh_a_hist(symbol="000001", period="daily", start_date="20170301", end_date="20210907", adjust="")
print(stock_zh_a_hist_df)

Output:

            Date   Open  Close   High ... Amplitude  Change %  Change Amount  Turnover Rate
0 2017-03-01 9.49 9.49 9.55 ... 0.84 0.11 0.01 0.21
1 2017-03-02 9.51 9.43 9.54 ... 1.26 -0.63 -0.06 0.24
2 2017-03-03 9.41 9.40 9.43 ... 0.74 -0.32 -0.03 0.20
3 2017-03-06 9.40 9.45 9.46 ... 0.74 0.53 0.05 0.24
4 2017-03-07 9.44 9.45 9.46 ... 0.63 0.00 0.00 0.17
          ... ... ... ... ... ... ... ... ... ...
1100 2021-09-01 17.48 17.88 17.92 ... 5.11 0.45 0.08 1.19
1101 2021-09-02 18.00 18.40 18.78 ... 5.48 2.91 0.52 1.25
1102 2021-09-03 18.50 18.04 18.50 ... 4.35 -1.96 -0.36 0.72
1103 2021-09-06 17.93 18.45 18.60 ... 4.55 2.27 0.41 0.78
1104 2021-09-07 18.60 19.24 19.56 ... 6.56 4.28 0.79 0.84
[1105 rows x 11 columns]


https://github.com/akfamily/akshare

TuShare

TuShare is a tool for financial data such as stocks and futures, covering the whole process from data collection, cleaning, and processing through to storage. It meets the data acquisition needs of financial quantitative analysts and data analysis learners, and is characterized by wide data coverage, simple interface calls, and fast responses.

Note, however, that some of this project's features are paid, so choose according to your needs.

import tushare as ts

ts.get_hist_data('600848')  # get all the data at once

Output:

 open high close low volume p_change ma5 \
date
2012-01-11 6.880 7.380 7.060 6.880 14129.96 2.62 7.060
2012-01-12 7.050 7.100 6.980 6.900 7895.19 -1.13 7.020
2012-01-13 6.950 7.000 6.700 6.690 6611.87 -4.01 6.913
2012-01-16 6.680 6.750 6.510 6.480 2941.63 -2.84 6.813
2012-01-17 6.660 6.880 6.860 6.460 8642.57 5.38 6.822
2012-01-18 7.000 7.300 6.890 6.880 13075.40 0.44 6.788
2012-01-19 6.690 6.950 6.890 6.680 6117.32 0.00 6.770
2012-01-20 6.870 7.080 7.010 6.870 6813.09 1.74 6.832

ma10 ma20 v_ma5 v_ma10 v_ma20 turnover
date
2012-01-11 7.060 7.060 14129.96 14129.96 14129.96 0.48
2012-01-12 7.020 7.020 11012.58 11012.58 11012.58 0.27
2012-01-13 6.913 6.913 9545.67 9545.67 9545.67 0.23
2012-01-16 6.813 6.813 7894.66 7894.66 7894.66 0.10
2012-01-17 6.822 6.822 8044.24 8044.24 8044.24 0.30
2012-01-18 6.833 6.833 7833.33 8882.77 8882.77 0.45
2012-01-19 6.841 6.841 7477.76 8487.71 8487.71 0.21
2012-01-20 6.863 6.863 7518.00 8278.38 8278.38 0.23


https://github.com/waditu/tushare

GoPUP

The data collected by the GoPUP project comes from public data sources and does not involve any personal privacy or non-public data. Likewise, though, some interfaces require registering a TOKEN before use.

import gopup as gp
df = gp.weibo_index(word="epidemic", time_type="1hour")
print(df)

Output:

 Epidemic
index
2022-12-17 18:15:00 18544
2022-12-17 18:20:00 14927
2022-12-17 18:25:00 13004
2022-12-17 18:30:00 13145
2022-12-17 18:35:00 13485
2022-12-17 18:40:00 14091
2022-12-17 18:45:00 14265
2022-12-17 18:50:00 14115
2022-12-17 18:55:00 15313
2022-12-17 19:00:00 14346
2022-12-17 19:05:00 14457
2022-12-17 19:10:00 13495
2022-12-17 19:15:00 14133


https://github.com/justinzm/gopup

GeneralNewsExtractor

This project is based on the paper "Webpage Text Extraction Method Based on Text and Symbol Density". It implements a general news extractor in Python that can extract the body content, author, and title from HTML.

>>> from gne import GeneralNewsExtractor

>>> html = '''Rendered web page HTML code'''

>>> extractor = GeneralNewsExtractor()
>>> result = extractor.extract(html, noise_node_list=['//div[@class="comment-list"]'])
>>> print(result)

Output:

{"title": "xxxx", "publish_time": "2019-09-10 11:12:13", "author": "yyy", "content": "zzzz", "images": [ "/xxx.jpg", "/yyy.png"]}

News page extraction example


https://github.com/GeneralNewsExtractor/GeneralNewsExtractor

Crawlers

Crawling is another major application area of Python, and many people start learning the language through crawlers. Let's look at some excellent crawler projects.

playwright-python

playwright-python is Microsoft's open-source browser automation tool, which lets you drive browsers from Python. Chromium, Firefox, and WebKit are supported on Linux, macOS, and Windows.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    for browser_type in [p.chromium, p.firefox, p.webkit]:
        browser = browser_type.launch()
        page = browser.new_page()
        page.goto('http://whatsmyuseragent.org/')
        page.screenshot(path=f'example-{browser_type.name}.png')
        browser.close()


https://github.com/microsoft/playwright-python

awesome-python-login-model

This project collects the login flows of major websites, plus crawler programs for some sites. The login implementations include Selenium-based login, direct simulated login worked out through packet capture, and so on. It is helpful for novices studying and writing crawlers.

However, as we all know, crawlers require constant maintenance. This project has not been updated for a long time, so it is doubtful whether the various login interfaces still work; use it selectively, or build your own on top of it.


https://github.com/Kr1s77/awesome-python-login-model

DecryptLogin

Compared with the previous project, this one is still continuously updated. It likewise simulates logging in to major websites and is very valuable for beginners.

from DecryptLogin import login

# the instanced Login class object
lg = login.Login()
# use the provided api function to login in the target website (e.g., twitter)
infos_return, session = lg.twitter(username='Your Username', password='Your Password')


https://github.com/CharlesPikachu/DecryptLogin

Scylla

Scylla is a high-quality free proxy IP pool tool; note that it currently supports Python 3.6 only. Once the service is running, you can check its statistics through its JSON API:

http://localhost:8899/api/v1/stats

Output:

{
    "median": 181.2566407083,
    "valid_count": 1780,
    "total_count": 9528,
    "mean": 174.3290085201
}
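Besides stats, Scylla also exposes an endpoint for retrieving validated proxies. Here is a minimal sketch of pulling proxies from it with requests; the /api/v1/proxies endpoint and the ip/port field names are taken from Scylla's documented API and may differ in your version:

import requests

# fetch a page of validated proxies from a locally running Scylla
resp = requests.get("http://localhost:8899/api/v1/proxies")
for item in resp.json().get("proxies", []):
    # each entry describes one proxy; print it as host:port
    print("{}:{}".format(item["ip"], item["port"]))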


https://github.com/imWildCat/scylla

ProxyPool

The main function of this crawler proxy IP pool project is to regularly collect free proxies published on the Internet, validate and store them, and then periodically re-validate the stored proxies to ensure their availability. It provides both an API and a CLI. You can also add extra proxy sources to improve the quality and quantity of the pool. The design documentation is detailed and the module structure is simple and easy to follow, making it a good project for crawler novices to learn from.

import requests

def get_proxy():
    return requests.get("http://127.0.0.1:5010/get/").json()

def delete_proxy(proxy):
    requests.get("http://127.0.0.1:5010/delete/?proxy={}".format(proxy))

# your spider code

def getHtml():
    #....
    retry_count = 5
    proxy = get_proxy().get("proxy")
    while retry_count > 0:
        try:
            html = requests.get('http://www.example.com', proxies={"http": "http://{}".format(proxy)})
            # the request goes through the proxy
            return html
        except Exception:
            retry_count -= 1
    # all retries failed: remove this proxy from the pool
    delete_proxy(proxy)
    return None


https://github.com/Python3WebSpider/ProxyPool

getproxy

getproxy is a program that crawls and aggregates proxy websites to obtain http/https proxies, updating its data every 15 minutes.

(test2.7) ➜ ~ getproxy
INFO:getproxy.getproxy:[*] Init
INFO:getproxy.getproxy:[*] Current Ip Address: 1.1.1.1
INFO:getproxy.getproxy:[*] Load input proxies
INFO:getproxy.getproxy:[*] Validate input proxies
INFO:getproxy.getproxy:[*] Load plugins
INFO:getproxy.getproxy:[*] Grab proxies
INFO:getproxy.getproxy:[*] Validate web proxies
INFO:getproxy.getproxy:[*] Check 6666 proxies, Got 666 valid proxies
...


https://github.com/fate0/getproxy

freeproxy

This is another project for grabbing free proxies; it supports many proxy sites and is easy to use.

from freeproxy import freeproxy

proxy_sources = ['proxylistplus', 'kuaidaili']
fp_client = freeproxy.FreeProxy(proxy_sources=proxy_sources)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
}
response = fp_client.get('https://space.bilibili.com/406756145', headers=headers)
print(response.text)


https://github.com/CharlesPikachu/freeproxy

fake-useragent

fake-useragent fakes a browser identity (User-Agent), which is often needed in crawlers. The project has very little code; it is worth reading to see how ua.random returns a random browser User-Agent string.

from fake_useragent import UserAgent
ua = UserAgent()

ua.ie
# Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US);
ua.msie
# Mozilla/5.0 (compatible; MSIE 10.0; Macintosh; Intel Mac OS X 10_7_3; Trident/6.0)
ua['Internet Explorer']
# Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)
ua.opera
# Opera/9.80 (X11; Linux i686; U; ru) Presto/2.8.131 Version/11.11
ua.chrome
# Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2
ua.google
# Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1290.1 Safari/537.13
ua['google chrome']
# Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11
ua.firefox
# Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1
ua.ff
# Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0.1
ua.safari
# Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25

# and the best one, get a random browser user-agent string
ua.random


https://github.com/fake-useragent/fake-useragent

Web related

The Python web world has plenty of excellent, well-established libraries; Django and Flask need no introduction. Instead, let's look at a few niche but easy-to-use ones.

streamlit

streamlit is a Python framework that quickly turns data into visual, interactive pages, letting you go from data to charts in minutes.

import streamlit as st

x = st.slider('Select a value')
st.write(x, 'squared is', x * x)
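To try it, save the snippet as app.py (the filename is just an example) and launch it with streamlit's CLI; the result is an interactive page with a slider:

streamlit run app.py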


https://github.com/streamlit/streamlit

wagtail

Wagtail is a powerful open-source Django CMS (content management system). The project is actively updated and iterated on; the features described on its homepage are all free, with nothing locked behind payment; and it focuses on content management without binding you to a particular front-end implementation.
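Getting started takes only a few commands, following Wagtail's standard installation flow ("mysite" is a placeholder project name):

pip install wagtail
wagtail start mysite
cd mysite
pip install -r requirements.txt
python manage.py migrate
python manage.py createsuperuser
python manage.py runserver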


https://github.com/wagtail/wagtail

fastapi

FastAPI is a high-performance web framework based on Python 3.6+. As the name suggests, writing interfaces with FastAPI is fast, and debugging is easy. It builds on modern Python features such as type hints to make web development faster and more robust.

from typing import Union

from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def read_root():
    return {"Hello": "World"}


@app.get("/items/{item_id}")
def read_item(item_id: int, q: Union[str, None] = None):
    return {"item_id": item_id, "q": q}
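If you save this as main.py, you can serve it with uvicorn (an ASGI server, installed separately):

uvicorn main:app --reload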


https://github.com/tiangolo/fastapi

django-blog-tutorial

This is a Django tutorial project that takes us step by step through developing a personal blog system from scratch, building Django development skills through practice.


https://github.com/jukanntenn/django-blog-tutorial

dash

dash is a web framework geared toward data and machine learning applications; with it, a machine learning app can be built very quickly, as in the sketch below.
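A minimal sketch of a Dash app, using the library's standard Dash/html layout API (the page content here is invented for illustration):

from dash import Dash, html

app = Dash(__name__)
# the page layout is declared as a tree of Python components
app.layout = html.Div([html.H1("Hello Dash")])

if __name__ == "__main__":
    app.run_server(debug=True)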


https://github.com/plotly/dash

PyWebIO

PyWebIO is also a very good Python web framework: it lets you build an entire web page without writing any front-end code, which is genuinely convenient.
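A minimal sketch based on PyWebIO's basic input/output API (the prompt text is just an example):

from pywebio.input import input
from pywebio.output import put_text

# ask for a name in the browser, then render a greeting on the page
name = input("What's your name?")
put_text("Hello,", name)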


https://github.com/pywebio/PyWebIO

Python tutorials

practical-python

A very popular Python learning resource: a course-style tutorial in Markdown format that is very beginner-friendly.


https://github.com/dabeaz-course/practical-python

learn-python3

A Python 3 tutorial in the form of Jupyter notebooks, easy to run and read. It also includes exercises, which makes it friendly to novices.


https://github.com/jerry-git/learn-python3

python-guide

An introductory Python tutorial started by Kenneth Reitz, author of the Requests library. It goes beyond syntax to cover project structure, code style, advanced topics, tooling, and more. A great chance to appreciate how the masters do it~


https://github.com/realpython/python-guide

Other

pytools

This is a toolbox-style project by CharlesPikachu (also the author of DecryptLogin and freeproxy above) that contains many interesting gadgets.

What is shown here is just the tip of the iceberg; the full picture is yours to explore.

import random
from pytools import pytools

tool_client = pytools.pytools()
all_supports = tool_client.getallsupported()
tool_client.execute(random.choice(list(all_supports.values())))


https://github.com/CharlesPikachu/pytools

amazing-qr

amazing-qr is a fun library that can generate ordinary, artistic, and even animated QR codes in a variety of styles.

# -n sets the output filename, -d the output directory
amzqr https://github.com -n github_qr.jpg -d .../paths/
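It also has a Python API. A minimal sketch using the documented amzqr.run entry point (only the target text is required; everything else is optional):

from amzqr import amzqr

# generate a plain QR code for a URL in the current directory
version, level, qr_name = amzqr.run("https://github.com")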


https://github.com/x-hw/amazing-qr

sh

sh is a mature library that replaces subprocess: it lets us call any program as if it were a Python function.

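For instance, a minimal sketch of sh's import-style API (any program on your PATH can be imported this way):

from sh import ls

# run `ls -l` as if it were a Python function and print its stdout
print(ls("-l"))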


https://github.com/amoffat/sh

tqdm

Powerful, fast, and easily extensible progress bar library for Python.

from tqdm import tqdm
for i in tqdm(range(10000)):
    ...


https://github.com/tqdm/tqdm

loguru

A library that makes logging in Python easy.

from loguru import logger

logger.debug("That's it, beautiful and simple logging!")


https://github.com/Delgan/loguru

click

A third-party Python library for quickly building command-line interfaces. It supports decorator-based definitions, multiple parameter types, automatic generation of help messages, and more.

import click

@click.command()
@click. option("--count", default=1, help="Number of greetings.")
@click. option("--name", prompt="Your name", help="The person to greet.")
def hello(count, name):
    """Simple program that greets NAME for a total of COUNT times."""
    for _ in range(count):
        click.echo(f"Hello, {name}!")

if __name__ == '__main__':
    hello()

Output:

$ python hello.py --count=3
Your name: Click
Hello, Click!
Hello, Click!
Hello, Click!

KeymouseGo

KeymouseGo is a simplified, portable "Button Wizard" implemented in Python. It records your mouse and keyboard operations and replays them automatically, with a configurable number of repetitions. For simple, monotonous, repetitive tasks it saves a lot of trouble: record once, and KeymouseGo does the rest.


https://github.com/taojy123/KeymouseGo

Well, that's all for today's share. If you like it, give it a like~

END