scrapy — middleware — setting the User-Agent and proxy

This article covers Scrapy middleware and walks through the middleware processing flow.

Downloader middleware

Downloader middleware sits between the engine and the downloader. It is where the User-Agent, cookies, and proxies are set, and it is also where selenium can be plugged in.
To use a downloader middleware, first enable it in the settings.py file.
As with pipelines, the smaller the priority value, the earlier the middleware runs.

DOWNLOADER_MIDDLEWARES = {
    "Mid.middlewares.MidDownloaderMiddleware": 543,
}

The process_request() method: set the User-Agent, cookies, and proxy here. Across a chain of middlewares, the one with the smaller priority value runs this method first.

'''Called automatically before the engine hands the request to the downloader.
        :param request: the current request
        :param spider: the spider that issued the request
        :return: the return value is constrained; it cannot be arbitrary.
                Note: the return value of process_request must be one of the following.
                1. Return None (or omit the return): nothing is intercepted and execution continues down the middleware chain. (There is a chain of middlewares between the engine and the downloader; with no interception they run in priority order until the request finally reaches the downloader.)
                2. Return a request: the remaining middlewares are skipped, the request goes back to the engine, and the engine re-queues it with the scheduler. The downloader never sees the URL.
                3. Return a response: the remaining middlewares are skipped, the response goes back to the engine, and the engine hands it to the spider for parsing. (Meaning: the request never reaches the downloader; the response produced here travels back through the middlewares to the engine and then to the spider.)
                '''
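
For example, a minimal sketch of case 3 (the middleware name, the URL check, and the canned HTML are invented for illustration): a middleware can answer a request by itself and keep it away from the downloader.

from scrapy.http import HtmlResponse

class CachedShortCircuitMiddleware:
    """Hypothetical middleware: answers certain requests without downloading."""

    def process_request(self, request, spider):
        if request.url.endswith('/ping'):
            # case 3: returning a Response skips the remaining middlewares and
            # the downloader; the engine hands it straight to the spider
            return HtmlResponse(url=request.url, body=b'<html>pong</html>',
                                encoding='utf-8', request=request)
        # case 1: returning None lets the request continue toward the downloader
        return None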

The process_response() method runs on the way back from the downloader to the engine. Across a chain of middlewares, the one with the larger priority value runs this method first.

Returning the response intercepts nothing; it keeps being passed back up the chain.
Returning a request intercepts the response: the request is fed back to the scheduler (through the engine), and the remaining process_response() methods never receive the response.
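
For example, a sketch of the interception case (the status check and the dont_filter retry are illustrative additions, not from the original code):

class RetryOnBlockMiddleware:
    """Hypothetical middleware: re-queues requests that came back blocked."""

    def process_response(self, request, response, spider):
        if response.status in (403, 429):
            # returning a Request intercepts the response: the engine sends the
            # request back to the scheduler, and any later process_response()
            # never sees this response
            return request.replace(dont_filter=True)
        # returning the response lets it continue back toward the engine
        return response

Scrapy's generated middleware skeleton shows all of these hooks: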
class MidDownloaderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_request(self, request, spider):  # set User-Agent, cookie
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        # installed downloader middleware will be called
        
        print('I am ware, process_request')
        return None

    def process_response(self, request, response, spider):  # runs on the way back from the downloader to the engine
        # Called with the response returned from the downloader.

        # Must either;
        # - return a Response object
        # - return a Request object
        # - or raise IgnoreRequest
        '''
        :param request: the current request
        :param response: the response returned from the downloader
        :param spider: the spider that issued the request
        :return:
               Returning the response passes it on (through the engine) to the next process_response() or other components; nothing is intercepted.
               Returning a request intercepts the response: it is fed back to the scheduler (through the engine), and the remaining process_response() methods never receive the response. '''
        print('I am ware, process_response')
        return response

    def process_exception(self, request, exception, spider):  # called automatically when an exception occurs while processing the request
        # Called when a download handler or a process_request()
        # (from other downloader middleware) raises an exception.

        # Must either:
        # - return None: continue processing this exception
        # - return a Response object: stops process_exception() chain
        # - return a Request object: stops process_exception() chain
        pass

    def spider_opened(self, spider):  # runs when the spider is opened; fires first
        print('I am ware, spider_opened')
        #spider.logger.info("Spider opened: %s" % spider.name)


Run result:

I am ware, spider_opened
I am ware, process_request
I am ware, process_response
Baidu, you will know   (the Baidu homepage title, printed by the spider)

If there are multiple downloader middlewares in the file, what is their execution order?

Again, first enable both in the settings.py file:

DOWNLOADER_MIDDLEWARES = {
    "Mid.middlewares.MidDownloaderMiddleware1": 543,
    "Mid.middlewares.MidDownloaderMiddleware2": 544,
}

With multiple downloader middlewares:

# Downloader middleware: sits between the engine and the downloader; sets User-Agent, cookie
class MidDownloaderMiddleware1:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_request(self, request, spider):  # set User-Agent, cookie
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        # installed downloader middleware will be called
        
        print('I am ware1, process_request')
        return None

    def process_response(self, request, response, spider):  # runs on the way back from the downloader to the engine
        # Called with the response returned from the downloader.

        # Must either;
        # - return a Response object
        # - return a Request object
        # - or raise IgnoreRequest
       
        print('I am ware1, process_response')
        return response

    def process_exception(self, request, exception, spider):  # called automatically when an exception occurs while processing the request
        # Called when a download handler or a process_request()
        # (from other downloader middleware) raises an exception.

        # Must either:
        # - return None: continue processing this exception
        # - return a Response object: stops process_exception() chain
        # - return a Request object: stops process_exception() chain
        pass

    def spider_opened(self, spider):  # runs when the spider is opened; fires first
        print('I am ware1, spider_opened')
        #spider.logger.info("Spider opened: %s" % spider.name)


class MidDownloaderMiddleware2:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_request(self, request, spider): # Set user-agent, cookie
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        # installed downloader middleware will be called
      
        print('I am ware2, process_request')
        return None

    def process_response(self, request, response, spider):  # runs on the way back from the downloader to the engine
        # Called with the response returned from the downloader.

        # Must either;
        # - return a Response object
        # - return a Request object
        # - or raise IgnoreRequest
      
        print('I am ware2, process_response')
        return response

    def process_exception(self, request, exception, spider):  # called automatically when an exception occurs while processing the request
        # Called when a download handler or a process_request()
        # (from other downloader middleware) raises an exception.

        # Must either:
        # - return None: continue processing this exception
        # - return a Response object: stops process_exception() chain
        # - return a Request object: stops process_exception() chain
        pass

    def spider_opened(self, spider):  # runs when the spider is opened; fires first
        print('I am ware2, spider_opened')
        # spider.logger.info("Spider opened: %s" % spider.name)

Run result:

I am ware1, spider_opened
I am ware2, spider_opened
I am ware1, process_request
I am ware2, process_request
I am ware2, process_response
I am ware1, process_response
Baidu, you will know   (the Baidu homepage title, printed by the spider)

Summary: for process_request(), the middleware with the smaller priority value runs first; for process_response(), the one with the larger priority value runs first.

Spider middleware sits between the spider and the engine. (Not covered in detail here.)

class MidSpiderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the spider middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_spider_input(self, response, spider):
        # Called for each response that goes through the spider
        # middleware and into the spider.

        # Should return None or raise an exception.
        return None

    def process_spider_output(self, response, result, spider):
        # Called with the results returned from the Spider, after
        # it has processed the response.

        # Must return an iterable of Request, or item objects.
        for i in result:
            yield i

    def process_spider_exception(self, response, exception, spider):
        # Called when a spider or process_spider_input() method
        # (from other spider middleware) raises an exception.

        # Should return either None or an iterable of Request or item objects.
        pass

    def process_start_requests(self, start_requests, spider):
        # Called with the start requests of the spider, and works
        # similarly to the process_spider_output() method, except
        # that it doesn’t have a response associated.

        # Must return only requests (not items).
        for r in start_requests:
            yield r

    def spider_opened(self, spider):
        spider.logger.info("Spider opened: %s" % spider.name)

User-Agent Settings

There are two methods.

First, a fixed User-Agent: set it in the settings.py file.

USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"

Second, a dynamic, random User-Agent: add a list of User-Agents to the settings.py file and pick from it in the middleware.

settings.py

USER_AGENT_list = ['Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.54 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36', # 2021.10
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36', # 2021.11
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36', # 2021.12
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36', # 2022.01
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.81 Safari/537.36', # 2022.02
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36', # 2022.03
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36', # 2022.04
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.41 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36', # 2022.05
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.115 Safari/537.36', # 2022.06
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.53 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.66 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36', # 2022.07
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.81 Safari/537.36', # 2022.08
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.5195.54 Safari/537.36', # 2022.09
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.5195.102 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.5195.127 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.5249.91 Safari/537.36', # 2022.10
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.5249.103 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.5249.119 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.5304.63 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.5304.88 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.5304.106 Safari/537.36', # 2022.11
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.5304.107 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.5304.122 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.72 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.95 Safari/537.36', # 2022.12
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.99 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.100 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.125 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.75 Safari/537.36', # 2023.01
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.120 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.5481.78 Safari/537.36', # 2023.02
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.5481.104 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.5481.105 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.5481.178 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.5481.180 Safari/537.36', # 2023.03
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.5563.64 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.5563.65 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.5563.111 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.5563.112 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.5563.147 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.5615.50 Safari/537.36', # 2023.04
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.5615.87 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.5615.121 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.5615.138 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.5672.64 Safari/537.36', # 2023.05
]

These User-Agents need to go into the request headers, so the work is done in the downloader middleware.
Enable the downloader middleware in settings.py:

DOWNLOADER_MIDDLEWARES = {
   "douban.middlewares.DoubanDownloaderMiddleware": 543,
}

The downloader middleware. Only the process_request() method is used here.

from random import choice

from scrapy import signals

from douban.settings import USER_AGENT_list  # the list defined above


class DoubanDownloaderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_request(self, request, spider):
        # pick a random User-Agent
        ua = choice(USER_AGENT_list)
        # put it into the request headers
        request.headers['User-Agent'] = ua
        return None  # must return None here; any other return would intercept the request

    def process_response(self, request, response, spider):
        return response

    def process_exception(self, request, exception, spider):
        pass

    def spider_opened(self, spider):
        spider.logger.info("Spider opened: %s" % spider.name)

Proxy IP

Free proxy IP

Free proxies are short-lived and slow; they are not recommended, but they do work.
Proxy IP website: https://www.kuaidaili.com

settings.py

PROXY_IP_LIST = [
    # your ip list
]
DOWNLOADER_MIDDLEWARES = {
    "douban.middlewares.DoubanDownloaderMiddleware": 543,
    "douban.middlewares.ProxyDoubanDownloaderMiddleware": 544,
}

The downloader middleware. Free proxies generally have a low success rate.

class ProxyDoubanDownloaderMiddleware:

    def process_request(self, request, spider):
        # pick a proxy IP
        ip = choice(PROXY_IP_LIST)
        # put it into request.meta
        request.meta['proxy'] = 'https://' + ip
        return None  # must return None here; any other return would intercept the request
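
Because free proxies fail so often, a hedged extension is to swap in a fresh proxy and retry when the download raises; the process_exception() hook below is a sketch to add to the same class, not part of the original article:

    def process_exception(self, request, exception, spider):
        # the proxy probably died: pick another one and re-queue the request;
        # returning a Request stops the process_exception() chain
        request.meta['proxy'] = 'https://' + choice(PROXY_IP_LIST)
        return request.replace(dont_filter=True)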

Paid proxy IP: a proxy IP server.

https://www.kuaidaili.com/tps, tunnel proxy.
The documentation and code examples are provided there.

The tunnel host, port, username, and password from the provider's console are required.
Following the provided code example, the setup goes straight into the downloader middleware.

class MoneyProxyDoubanDownloaderMiddleware:
    _proxy = ('XXX.XXX.com', '15818')

    def process_request(self, request, spider):
        # username/password authentication
        username = "username"
        password = "password"
        request.meta['proxy'] = "http://%(user)s:%(pwd)s@%(proxy)s/" % {
            "user": username,
            "pwd": password,
            "proxy": ':'.join(MoneyProxyDoubanDownloaderMiddleware._proxy),
        }

        # whitelist authentication
        # request.meta['proxy'] = "http://%(proxy)s/" % {"proxy": proxy}

        request.headers["Connection"] = "close"
        return None

Using selenium in middleware

# Because selenium will do the downloading, the original downloader is effectively
# replaced, and the built-in downloader middlewares are of no use to us here.
# The built-in middlewares start at priority 100, so the selenium middleware
# should be registered below 100 so that it runs first.

DOWNLOADER_MIDDLEWARES = {
    "boss.middlewares.BossSeleniumDownloaderMiddleware": 99,
}

Steps:

Note: the spiders folder may contain several crawler .py files, so the requests
that should go through selenium must be told apart from ordinary ones; that
means two kinds of requests. Design flow (steps 1 and 2 are sketched below):
1. Create a new request.py file in the project, with a class that inherits Request; the resulting SeleniumRequest is functionally identical to Request and only acts as a marker.
2. Override start_requests(self) in the crawler file so it yields SeleniumRequest objects.
3. Branch on the request type in the downloader middleware's process_request().
4. Start selenium when the program starts, inside spider_opened().
5. Handle the request from step 3 with selenium.
6. Wrap the page source into a response object.
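
A minimal sketch of steps 1 and 2 (the module path boss.request, the spider name, and the URL are placeholders, not verbatim from the article):

# request.py  (step 1): a marker subclass, functionally identical to Request
from scrapy import Request

class SeleniumRequest(Request):
    pass


# in the crawler file (step 2): yield SeleniumRequest so the middleware
# can tell these requests apart
import scrapy

from boss.request import SeleniumRequest

class BossSpider(scrapy.Spider):
    name = 'boss'

    def start_requests(self):
        yield SeleniumRequest(url='https://www.zhipin.com/', callback=self.parse)

    def parse(self, response):
        pass

The downloader middleware then branches on the request type (steps 3 to 6):
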
import time

from scrapy import signals
from scrapy.http import HtmlResponse
from selenium import webdriver

from boss.request import SeleniumRequest  # the marker class from step 1


class BossSeleniumDownloaderMiddleware:

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(s.spider_closed, signal=signals.spider_closed)
        return s

    def process_request(self, request, spider):
        # Every request passes through here. For SeleniumRequest objects,
        # drive selenium and return a response built from the page source
        # (steps 3, 5 and 6 above); plain Requests fall through to the
        # normal downloader.
        if isinstance(request, SeleniumRequest):  # is this one of our marker requests?
            # Note: process_request() may return None, a request, or a
            # response; here the page source is wrapped into a response object.
            self.browser.get(request.url)
            time.sleep(2)  # crude wait for the page to render
            page_source = self.browser.page_source

            return HtmlResponse(url=request.url, status=200, body=page_source,
                                request=request, encoding='utf-8')
        else:
            return None

    def spider_opened(self, spider):
        self.options = webdriver.ChromeOptions()
        # modern selenium uses the options= keyword (chrome_options is deprecated)
        self.browser = webdriver.Chrome(options=self.options)

    def spider_closed(self, spider):
        self.browser.close()

How to build the returned HtmlResponse follows from its source code.
HtmlResponse itself defines nothing:

"""
This module implements the HtmlResponse class which adds encoding
discovering through HTML encoding declarations to the TextResponse class.

See documentation in docs/topics/request-response.rst
"""

from scrapy.http.response.text import TextResponse


class HtmlResponse(TextResponse):
    pass

Looking into TextResponse, we find:

class TextResponse(Response):
    def __init__(self, *args, **kwargs):
        self._encoding = kwargs.pop("encoding", None)
        self._cached_benc = None
        self._cached_ubody = None
        self._cached_selector = None
        super().__init__(*args, **kwargs)

Still nothing that lists the parameters. Following TextResponse(Response) up to Response, we find:

    def __init__(
        self,
        url: str,
        status=200,
        headers=None,
        body=b"",
        flags=None,
        request=None,
        certificate=None,
        ip_address=None,
        protocol=None,
    ):
        self.headers = Headers(headers or {})
        self.status = int(status)
        self._set_body(body)
        self._set_url(url)
        self.request = request
        self.flags = [] if flags is None else list(flags)
        self.certificate = certificate
        self.ip_address = ip_address
        self.protocol = protocol

From this we can see how to write the HtmlResponse:

HtmlResponse(url=request.url, status=200, body=page_source, request=request, encoding='utf-8')