Use Scrapy to build your own dataset

1. Description: When I first started working in industry, one of the things I quickly realized was that sometimes you have to collect, organize, and clean your own data. In this tutorial, we will collect data from a crowdfunding site called FundRazr. Like many websites, this site has its own structure and form, and […]
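To give a taste of what this kind of tutorial builds, here is a minimal Scrapy spider sketch. The start URL points at FundRazr's listing page, but every CSS selector below is an assumption for illustration, not the tutorial's verified markup.

```python
# Minimal spider sketch for a FundRazr-style listing page.
# All selectors are hypothetical placeholders.
import scrapy


class FundrazrSpider(scrapy.Spider):
    name = "fundrazr"
    start_urls = ["https://fundrazr.com/find"]

    def parse(self, response):
        # Assumed selector: each campaign card on the listing page.
        for campaign in response.css("div.campaign"):
            yield {
                "title": campaign.css("h2 a::text").get(),
                "raised": campaign.css("span.raised::text").get(),
                "url": response.urljoin(campaign.css("h2 a::attr(href)").get()),
            }
        # Follow pagination if a next link exists (selector assumed).
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```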

Problems encountered when writing Scrapy in PyCharm

Table of contents: background · creating a Scrapy project · an awkward start · specifying the spider type · modifying the template and specifying it when generating · running Scrapy. Background: Is there really a Python task that the almighty PyCharm cannot handle? Creating a Scrapy project: since PyCharm has no option to create a Scrapy project directly, create the project using the command […]
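The usual workaround, sketched below, is to create the project from a terminal and then add a small launcher script that PyCharm can run and debug like any ordinary Python file. The project and spider names here are placeholders.

```python
# run.py -- put this next to scrapy.cfg at the project root so PyCharm
# can run or debug the crawl as an ordinary script.
# The project itself is created from a terminal first, e.g.:
#   scrapy startproject myproject
#   scrapy genspider -t basic myspider example.com
# "myspider" is a placeholder for your spider's name.
from scrapy.cmdline import execute

if __name__ == "__main__":
    execute(["scrapy", "crawl", "myspider"])
```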

Scrapy crawls asynchronously loaded data

Use of Scrapy middleware. Contents: foreword · 1 Classification and function of Scrapy middleware · 1.1 Classification of Scrapy middleware · 1.2 The role of Scrapy middleware · 2 Downloader middleware methods: process_request(request, spider), process_response(request, response, spider), process_exception(request, exception, spider) · 3 Grabbing some news: 3.1 pre-crawl analysis, 3.2 code configuration, 3.3 printed results · summary. Foreword: What should […]
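For reference, here is a minimal skeleton of a downloader middleware with the three hooks the contents list, with the return-value contract Scrapy expects noted in comments. The class name and logging line are illustrative.

```python
# Skeleton of a Scrapy downloader middleware. Drop it into
# middlewares.py and enable it in DOWNLOADER_MIDDLEWARES.
class DemoDownloaderMiddleware:
    def process_request(self, request, spider):
        # Called for every request passing through this middleware.
        # Return None to continue processing, a Response to short-circuit
        # the download, or a Request to reschedule.
        return None

    def process_response(self, request, response, spider):
        # Called with the downloader's result; must return a Response
        # (possibly a new one) or a Request.
        return response

    def process_exception(self, request, exception, spider):
        # Called when the download (or process_request) raises.
        # Return None, a Response, or a Request.
        spider.logger.warning("Download failed: %s", exception)
        return None
```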

It’s “3202” and you’re still using Selenium? How to use Scrapy + DrissionPage to crawl 51job and pass the slider captcha

Contents: foreword · 1. What is DrissionPage? · 2. Scrapy + DrissionPage crawls 51job: 1. create the Scrapy project, 2. rewrite middlewares.py, 3. write a_51job.py · summary. Foreword: When crawling website data, we often encounter encrypted data or various kinds of verification codes. Using requests directly means spending a lot of time on JS reverse engineering, but it is […]
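To show the flavor of step 2 (rewriting middlewares.py), here is a hedged sketch of a downloader middleware that hands the request to DrissionPage and returns the rendered page to Scrapy. It assumes DrissionPage's ChromiumPage interface (page.get() / page.html); the article's actual slider-captcha handling is elided.

```python
# Sketch of a middlewares.py rewrite, assuming DrissionPage's
# ChromiumPage API. The slider-verification logic is omitted.
from DrissionPage import ChromiumPage
from scrapy.http import HtmlResponse


class DrissionPageMiddleware:
    def __init__(self):
        # One real browser instance shared across requests.
        self.page = ChromiumPage()

    def process_request(self, request, spider):
        # Let the browser fetch the page so JS-rendered content and
        # captcha flows happen in a genuine browser context.
        self.page.get(request.url)
        # ... slider verification would be driven here ...
        # Hand the rendered HTML back to Scrapy as the response.
        return HtmlResponse(
            url=request.url,
            body=self.page.html,
            encoding="utf-8",
            request=request,
        )
```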

MongoDB aggregation operations for the Scrapy framework

Contents: MongoDB aggregation operations · basic syntax for aggregate operations · common aggregation operations · the $group pipeline command (grouping by a field, detailed explanation, calculating the average of a field in a collection, common expressions) · the $match pipeline command with an example · the $sort pipeline command · the $skip and $limit pipeline commands · the $project pipeline command. MongoDB aggregation operations […]
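The pipeline stages listed above compose like this. The sketch below uses pymongo against an assumed "items" collection (the database, collection, and field names are illustrative, e.g. something a Scrapy pipeline writes to).

```python
# Sketch of the aggregation stages the post covers, via pymongo.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["scrapy_db"]["items"]  # assumed names

pipeline = [
    {"$match": {"category": "news"}},               # filter documents
    {"$group": {                                    # group by a field
        "_id": "$source",
        "avg_price": {"$avg": "$price"},            # average of a field
        "count": {"$sum": 1},
    }},
    {"$sort": {"count": -1}},                       # sort the groups
    {"$skip": 0},                                   # paging: skip ...
    {"$limit": 10},                                 # ... and limit
    {"$project": {"_id": 0, "source": "$_id",       # reshape the output
                  "avg_price": 1, "count": 1}},
]

for doc in collection.aggregate(pipeline):
    print(doc)
```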

Scrapy framework: Request and FormRequest

Contents: the Request object (principle, parameters, passing additional data to the callback function, sample code) · FormRequest (concept, parameters, request usage example) · the Response object (parameters). Request object, principle: Request and Response are the most common objects in a crawler. The Request object is generated in the spider and passed to the downloader, which executes the […]
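Here is a minimal sketch of the two objects the post covers: a Request that passes extra data to its callback via cb_kwargs, and a FormRequest that submits form data. The URLs and form fields are placeholders.

```python
# Sketch: Request with callback data, and FormRequest for a POST.
import scrapy


class DemoSpider(scrapy.Spider):
    name = "demo"
    start_urls = ["https://example.com/list"]

    def parse(self, response):
        # Pass additional data to the callback via cb_kwargs.
        yield scrapy.Request(
            "https://example.com/detail/1",
            callback=self.parse_detail,
            cb_kwargs={"source": "list page"},
        )
        # FormRequest sends form-encoded POST data (e.g. a login form).
        yield scrapy.FormRequest(
            "https://example.com/login",
            formdata={"user": "demo", "pass": "demo"},
            callback=self.after_login,
        )

    def parse_detail(self, response, source):
        yield {"url": response.url, "source": source}

    def after_login(self, response):
        self.logger.info("Logged in, status %s", response.status)
```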

Scrapy middleware: setting the User-Agent and proxy

This article mainly covers Scrapy's downloader middleware and its processing flow. Downloader middleware sits between the downloader and the engine; it is where you set the User-Agent, cookies, and proxy, or drive Selenium. To use a downloader middleware, first enable it in the settings.py file; as with pipelines, the smaller the weight value, […]
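A sketch of both halves, the middleware and the settings.py entry that enables it. The User-Agent strings, proxy address, and "myproject" module path are placeholders.

```python
# middlewares.py -- rotate the User-Agent and route through a proxy.
import random

USER_AGENTS = [  # placeholder strings; use real UA values
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...",
]


class RandomUserAgentProxyMiddleware:
    def process_request(self, request, spider):
        # Pick a User-Agent per request and set the proxy via meta.
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        request.meta["proxy"] = "http://127.0.0.1:8888"  # placeholder
        return None


# settings.py -- enable the middleware; like pipelines, the smaller
# the weight value, the closer it runs to the engine.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.RandomUserAgentProxyMiddleware": 543,
}
```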