Problems Encountered Writing Scrapy in PyCharm

Table of Contents

  • Background
  • Create a Scrapy project
  • An uncomfortable start
  • Specify the type
  • Modify the template and use it
  • Run Scrapy

Background

Surely there is no Python project that the almighty PyCharm cannot handle???

Create a Scrapy project

PyCharm has no option to create a Scrapy project directly, so create one from the command line.

Install Scrapy:

pip install scrapy

Check the version; if a version number is printed, the installation succeeded:

scrapy version

Create a Scrapy project:

scrapy startproject yourprojectname

Then, following the prompt printed by startproject, create a spider:

cd yourprojectname
scrapy genspider example example.com

With that, the project and spider are created successfully.
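For reference, the generated layout (with the project name yourprojectname and the example spider) looks roughly like this:

```text
yourprojectname/
├── scrapy.cfg              # deploy configuration
└── yourprojectname/        # the project's Python module
    ├── __init__.py
    ├── items.py            # item definitions
    ├── middlewares.py      # spider/downloader middlewares
    ├── pipelines.py        # item pipelines
    ├── settings.py         # project settings
    └── spiders/
        ├── __init__.py
        └── example.py      # created by genspider
```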

An uncomfortable start

Open the project in PyCharm, and you will find that the parse method in the spider is highlighted with a gray warning:

Signature of method 'ExampleSpider.parse()' does not match signature of the base method in class 'Spider'

Since we are overriding a method of the parent class, take a look at how parse is defined there; in PyCharm, just Ctrl+click the method name.

Go into the parent class and you can see that its parse takes an extra **kwargs parameter.

Add **kwargs to your own spider's parse and the gray warning disappears. Much easier on the eyes.
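The warning is purely about signature compatibility, and you can reproduce it without Scrapy at all. A minimal plain-Python sketch (the class names here are invented for illustration):

```python
import inspect

class Base:
    # Stand-in for scrapy.Spider.parse, which accepts **kwargs.
    def parse(self, response, **kwargs):
        raise NotImplementedError

class BadSpider(Base):
    # Missing **kwargs: PyCharm flags this override as incompatible
    # with the base method's signature.
    def parse(self, response):
        pass

class GoodSpider(Base):
    # With **kwargs the signature matches the base method, so no warning.
    def parse(self, response, **kwargs):
        pass

print(list(inspect.signature(GoodSpider.parse).parameters))
# ['self', 'response', 'kwargs']
```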

Specify the type

There is still a problem, though. The gray warning was merely ugly to look at; working without code completion is the real pain. Type a dot after response and nothing useful comes up.

Run the spider and print the type of response: it turns out to be scrapy.http.response.html.HtmlResponse.

scrapy crawl example # example is your spider's name, i.e. the name attribute in the class


If you do not want to wade through all the noisy log output, set the log level in settings.py:

LOG_LEVEL = 'WARNING'

Re-run it to see the effect: the world goes quiet.

Now that you know the type, just annotate the parameter with it:

from scrapy.http.response import Response

def parse(self, response: Response, **kwargs):

The dot completion now works. Comfortable.

Modify the template and use it

Finally there are proper code hints, but you do not want to type this by hand every time. Checking the genspider command shows that the -t option accepts a custom template.

You can see that by default genspider creates spiders from the basic template.

Look for this folder under your interpreter's path:

\Python39\Lib\site-packages\scrapy\templates\spiders

The template files can be found under this path.

If you use PyCharm, there is an even faster way that takes you straight to the right interpreter, so you cannot get the path wrong when working in a virtual environment.

Import the scrapy package, hold Ctrl, and left-click the name to jump to its source code.

This jumps to __init__.py; it does not actually matter where you land, as long as it is inside the scrapy package.

From there you can find the template files.


You can view a template's contents this way, and you can also right-click it to open its containing folder.

I chose to copy it out (you could also edit the basic template in place, which saves passing -t to genspider later). Name the copy whatever you like, e.g. mytemplate.tmpl:

import scrapy
from scrapy.http.response import Response


class $classname(scrapy.Spider):
    name = "$name"
    allowed_domains = ["$domain"]
    start_urls = ["$url"]

    def parse(self, response: Response, **kwargs):
        pass
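Incidentally, the $-placeholders in the template ($classname, $name, $domain, $url) use the syntax of Python's standard string.Template, which is what Scrapy uses to render template files. A stdlib-only sketch of the substitution (the template text here is shortened for illustration):

```python
from string import Template

# A shortened version of the spider template, for illustration only.
tmpl = Template(
    'class $classname(scrapy.Spider):\n'
    '    name = "$name"\n'
    '    allowed_domains = ["$domain"]\n'
)

# genspider derives these values from the name and domain you pass it.
rendered = tmpl.substitute(classname="TestSpider", name="test", domain="test.com")
print(rendered)
```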

Now you can create spiders from the custom template:

# scrapy genspider -t <template name> <spider name> <domain>
scrapy genspider -t mytemplate test test.com

The newly created spider is fine: proper code completion, comfortable.

Note that the template name must be spelled correctly, otherwise genspider reports an error.

If you forget your template name, you can list the available templates:

scrapy genspider --list
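With a stock install, plus the custom template added above, the output should list something like the four built-in templates and yours:

```text
Available templates:
  basic
  crawl
  csvfeed
  xmlfeed
  mytemplate
```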

Run Scrapy

As mentioned above, starting Scrapy requires running this command in the terminal:

scrapy crawl example # example is your spider's name, i.e. the name attribute in the class

Isn't that tedious? I am using the almighty PyCharm; surely there is an easy way to run it and debug with breakpoints?
Can't I just right-click to run or debug? What about the almighty Ctrl+Shift+F10?

Of course there is.

In the project root, at the same level as scrapy.cfg, create a main.py file (name it whatever you like) with the following code:

main.py

import os
import sys

from scrapy.cmdline import execute

if __name__ == '__main__':
    # Make sure the project root is importable.
    sys.path.append(os.path.dirname(os.path.abspath(__file__)))
    # Replace 'example' with your own spider's name.
    execute(['scrapy', 'crawl', 'example'])

Run this file directly and Scrapy runs.

After running it once, you can re-run it with Ctrl+F5.

You can also run it under DEBUG, and you can see that execution stops at your breakpoints.