Make use of Scrapy's standard HttpProxyMiddleware by specifying the proxy meta value and the authorization header in a Scrapy Request, for example:
import scrapy
from w3lib.http import basic_auth_header

yield scrapy.Request(
    url=url,
    callback=self.parse,
    meta={'proxy': 'https://<PROXY_IP_OR_URL>:<PROXY_PORT>'},
    headers={
        'Proxy-Authorization': basic_auth_header(
            '<PROXY_USERNAME>', '<PROXY_PASSWORD>')
    }
)
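For context, here is a minimal, self-contained spider built around that same request; the spider name and start URL are illustrative, and the proxy placeholders must be replaced with your own details:

import scrapy
from w3lib.http import basic_auth_header


class ProxySpider(scrapy.Spider):
    # Hypothetical spider for illustration only.
    name = 'proxy_example'
    start_urls = ['https://example.com']

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                # Route this single request through the proxy
                meta={'proxy': 'https://<PROXY_IP_OR_URL>:<PROXY_PORT>'},
                headers={
                    'Proxy-Authorization': basic_auth_header(
                        '<PROXY_USERNAME>', '<PROXY_PASSWORD>')
                },
            )

    def parse(self, response):
        self.logger.info('Fetched %s via proxy', response.url)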
To route all of a spider's requests through the proxy automatically, isolate the proxy details in a middleware by adding this example class to the project's middlewares.py file:
from w3lib.http import basic_auth_header


class CustomProxyMiddleware(object):

    def process_request(self, request, spider):
        request.meta['proxy'] = "https://<PROXY_IP_OR_URL>:<PROXY_PORT>"
        request.headers['Proxy-Authorization'] = basic_auth_header(
            '<PROXY_USERNAME>', '<PROXY_PASSWORD>')
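If you prefer not to hard-code credentials in the middleware, one common variation is to read them from Scrapy settings via the from_crawler hook. A sketch, assuming you define PROXY_URL, PROXY_USER, and PROXY_PASS in settings.py (these are custom setting names used here for illustration, not standard Scrapy settings):

from w3lib.http import basic_auth_header


class SettingsProxyMiddleware(object):
    # Variant that pulls proxy details from settings.py;
    # PROXY_URL, PROXY_USER, and PROXY_PASS are custom settings.

    def __init__(self, proxy_url, proxy_user, proxy_pass):
        self.proxy_url = proxy_url
        self.auth = basic_auth_header(proxy_user, proxy_pass)

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        return cls(
            settings.get('PROXY_URL'),
            settings.get('PROXY_USER'),
            settings.get('PROXY_PASS'),
        )

    def process_request(self, request, spider):
        request.meta['proxy'] = self.proxy_url
        request.headers['Proxy-Authorization'] = self.auth

Either variant is registered in the project settings the same way.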
Then reference it in the DOWNLOADER_MIDDLEWARES setting of the project's settings.py, ordering it before the standard HttpProxyMiddleware:
DOWNLOADER_MIDDLEWARES = {
    '<PROJECT_NAME>.middlewares.CustomProxyMiddleware': 350,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
}
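If only one spider in the project should use the proxy, the same setting can instead be scoped with the spider's custom_settings attribute. A sketch, with an illustrative spider name:

import scrapy


class SingleProxySpider(scrapy.Spider):
    name = 'single_proxy'
    # Enable the proxy middleware for this spider only
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            '<PROJECT_NAME>.middlewares.CustomProxyMiddleware': 350,
            'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
        }
    }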
There is also a ready-made random proxy middleware for Scrapy available here.
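The idea behind such a middleware is straightforward. Below is a minimal sketch of rotating requests across a list of proxies; the class name and the PROXY_LIST setting are illustrative and not taken from that middleware:

import random


class RandomProxyMiddleware(object):
    # Illustrative sketch only; see the linked middleware for a
    # production-ready implementation.

    def __init__(self, proxies):
        self.proxies = proxies

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical custom setting, e.g.
        # ['https://user:pass@proxy1:8000', 'https://user:pass@proxy2:8000']
        return cls(crawler.settings.getlist('PROXY_LIST'))

    def process_request(self, request, spider):
        if self.proxies:
            # Pick a random proxy for each outgoing request
            request.meta['proxy'] = random.choice(self.proxies)

When credentials are embedded in the proxy URL like this, Scrapy's built-in HttpProxyMiddleware extracts them and sets the Proxy-Authorization header itself.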
For more information on Scrapy and proxies, check out our blog post about how to set up a custom proxy in Scrapy.