BeautifulSoup4 quick overview

Beautiful Soup is a Python library for extracting data from HTML or XML files. It is implemented in Python, so it is relatively slower than XPath-based extraction, but it is very powerful. This article introduces the basic usage of the library to help you get started quickly. There are a lot of learning materials online. The super detailed […]

21.8 Using the BeautifulSoup Library in Python

The BeautifulSoup library is used to extract data from HTML or XML files. It automatically converts complex HTML documents into a tree structure and provides simple methods for searching nodes in the document, making it easy to traverse and modify the content of an HTML document. It is widely used in web crawling and […]
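The tree model described in this excerpt can be illustrated with a minimal sketch. The sample HTML and all variable names below are hypothetical and are not taken from the article; the built-in "html.parser" backend is assumed so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document (an assumption, not from the article).
html = """
<html><body>
  <h1 id="title">Weather report</h1>
  <ul>
    <li class="city">Beijing</li>
    <li class="city">Shanghai</li>
  </ul>
</body></html>
"""

# Parse the markup into a tree of Tag objects.
soup = BeautifulSoup(html, "html.parser")

# Search: collect the text of every <li> whose class is "city".
cities = [li.get_text() for li in soup.find_all("li", class_="city")]

# Traverse: step from the first <li> up to its parent tag.
parent_name = soup.find("li").parent.name

# Modify: replace the text inside the <h1> element in place.
soup.find("h1").string = "Updated report"
new_heading = soup.find("h1").get_text()
```

Here `cities` holds the two list entries in document order, `parent_name` is the name of the enclosing list tag, and `new_heading` reflects the modified tree.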

urllib+BeautifulSoup crawls and parses 2345 Weather King historical weather data

Website: Dongcheng Historical Weather Query_Historical Weather Forecast Query_2345 Weather Forecast 1. Code: import json import logging import urllib.parse from datetime import date, datetime from random import randint from time import sleep import pymysql from bs4 import BeautifulSoup # Define target URL import requests def weather_req(): […]

[UCAS Natural Language Processing Assignment 1] Use BeautifulSoup to crawl Chinese and English data, calculate entropy, and verify Zipf's law

Article directory: Preface; Chinese (data crawling: crawl interface, crawl code; data cleaning; data analysis; experimental results); English (data crawling: crawl interface, dynamic crawling; data cleaning; data analysis; experimental results); Conclusion. Preface: This article crawls Chinese and English corpora separately and calculates the entropy of each language to verify Zipf's law. github: ShiyuNee/python-spider […]
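The two measurements named in this excerpt can be sketched as follows. This is not the article's code: the function names, the whitespace tokenization, and the toy sentence are all assumptions made for illustration. Entropy is computed as H = -Σ p·log2(p) over token frequencies, and the rank-frequency table is what one would inspect to check Zipf's law (frequency roughly proportional to 1/rank on a large corpus).

```python
import math
from collections import Counter


def entropy(tokens):
    """Shannon entropy in bits of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, with the most frequent token at rank 1."""
    counts = Counter(tokens)
    return [(rank, freq)
            for rank, (_, freq) in enumerate(counts.most_common(), start=1)]


# Toy corpus (hypothetical); a real check needs a large crawled corpus.
sample = "the cat and the dog and the bird".split()
h = entropy(sample)
table = rank_frequency(sample)
```

On a real corpus, plotting log(frequency) against log(rank) from `table` should give an approximately straight line if Zipf's law holds.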

Application of BeautifulSoup in data collection

Table of Contents: 1. Installation and import of the BeautifulSoup library. 2. Parsing of HTML or XML documents: (1) directly pass the HTML text string as a parameter to the BeautifulSoup function; (2) load HTML or XML documents through file paths or URLs. 3. Navigation and search: (1) find() method: find an element in the document. […]
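The parsing and search steps listed in this table of contents might look roughly like the sketch below. The markup, the temporary file name, and the variable names are hypothetical. Loading from a URL is left out because it needs network access; the fetched text would be passed to `BeautifulSoup` the same way as the string here.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical markup used for both parsing routes.
html = '<div><a href="/first" class="nav">First</a><a href="/second">Second</a></div>'

# Route 1: pass the HTML text string directly to BeautifulSoup.
soup_from_text = BeautifulSoup(html, "html.parser")

# Route 2: load the document from a file path by passing an open file handle.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "page.html"
    path.write_text(html, encoding="utf-8")
    with path.open(encoding="utf-8") as fh:
        soup_from_file = BeautifulSoup(fh, "html.parser")

# find() returns the first matching element, or None when nothing matches.
first_link = soup_from_text.find("a")
href = first_link["href"]
missing = soup_from_text.find("table")

# find_all() returns every match in document order.
all_hrefs = [a["href"] for a in soup_from_file.find_all("a")]
```

Because `find()` can return `None`, code that indexes its result should check for a missing element first.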

Add BeautifulSoup4 module for Qemu aarch32

Environment: Qemu 2.8.0; development board: vexpress-ca9. Overview: The previous blog post enabled our development board to ping Baidu successfully. Python's networking capabilities are also said to be very powerful, and Beautiful Soup is a Python library. It is not part of the standard library, however, so it needs to be installed separately. The most […]

Basic use of BeautifulSoup

Installation: pip install beautifulsoup4 Basic usage: from bs4 import BeautifulSoup html = ''' <div data-v-9db1ecac="" class="el-carousel__item" style="transform: translateX(-8790px) scale(1);"> <div data-v-9db1ecac="" class="el-image surface"> <img src="http://47.108.93.211:8080/file/download/d39ab85078e44021b78562ea56cec475.jpg" alt="Landscape 3" class="el-image__inner"> </div> </div> <div data-v-9db1ecac="" class="el-carousel__item" style="transform: translateX(-7325px) scale(1);"> <div data-v-9db1ecac="" class="el-image surface"> <img src="http://47.108.93.211:8080/file/download/6ea2725186d443e0b527ca56b1ab151b.jpg" alt="Landscape 2" class="el-image__inner"> </div> </div> <div data-v-9db1ecac="" class="el-carousel__item" style="transform: translateX(-5860px) scale(1);"> <div data-v-9db1ecac="" […]
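Since the excerpt is cut off before any extraction step, the following is only a sketch of one way to pull the image data out of carousel markup like this. It is not the article's code. The HTML is a trimmed stand-in with hypothetical example.com URLs, and the CSS selector is an assumption based on the class names visible in the excerpt.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the carousel markup; the URLs are placeholders.
html = '''
<div class="el-carousel__item">
  <div class="el-image surface">
    <img src="http://example.com/landscape3.jpg" alt="Landscape 3" class="el-image__inner">
  </div>
</div>
<div class="el-carousel__item">
  <div class="el-image surface">
    <img src="http://example.com/landscape2.jpg" alt="Landscape 2" class="el-image__inner">
  </div>
</div>
'''

soup = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector; here it matches every inner image
# nested anywhere under a carousel item.
images = [
    (img.get("alt"), img.get("src"))
    for img in soup.select("div.el-carousel__item img.el-image__inner")
]
```

Using `img.get("src")` rather than `img["src"]` avoids a `KeyError` when an image tag lacks the attribute.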