Using a custom proxy in a Scrapy spider

Modified on Wed, 3 Feb, 2021 at 7:53 AM

Make use of Scrapy's standard HttpProxyMiddleware by specifying the proxy meta value and the authorization header in a Scrapy Request, for example:


import scrapy
from w3lib.http import basic_auth_header

# inside a spider callback, e.g. start_requests() or parse()
yield scrapy.Request(
    url=url, callback=self.parse,
    meta={'proxy': 'https://<PROXY_IP_OR_URL>:<PROXY_PORT>'},
    headers={
        'Proxy-Authorization': basic_auth_header(
            '<PROXY_USERNAME>', '<PROXY_PASSWORD>')
    }
)
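For reference, basic_auth_header just builds an RFC 7617 Basic credentials string ("Basic " followed by the base64-encoded "username:password" pair). A standard-library-only sketch of what it returns:

```python
import base64

def basic_auth_header(username, password):
    # RFC 7617 Basic scheme: b"Basic " + base64("<username>:<password>")
    creds = f"{username}:{password}".encode("latin-1")
    return b"Basic " + base64.b64encode(creds)

print(basic_auth_header("user", "pass"))  # b'Basic dXNlcjpwYXNz'
```

This is why the header value must carry the literal username and password: the proxy decodes it on its end.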


To route all of a spider's requests through the proxy automatically, isolate the proxy details in a middleware by adding this example class to the project's middlewares.py file:


from w3lib.http import basic_auth_header

class CustomProxyMiddleware:
    def process_request(self, request, spider):
        # set the proxy and its credentials on every outgoing request
        request.meta['proxy'] = "https://<PROXY_IP_OR_URL>:<PROXY_PORT>"
        request.headers['Proxy-Authorization'] = basic_auth_header(
            '<PROXY_USERNAME>', '<PROXY_PASSWORD>')
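The class above proxies every request unconditionally. If some requests should go out directly, the middleware could honor an opt-out flag in the request meta (the `skip_proxy` key and the minimal stand-in Request class below are illustrative assumptions, included only so this sketch runs without Scrapy):

```python
import base64

def basic_auth_header(username, password):
    # stdlib stand-in for w3lib.http.basic_auth_header
    return b"Basic " + base64.b64encode(f"{username}:{password}".encode("latin-1"))

class Request:
    # minimal stand-in for scrapy.Request, just for this sketch
    def __init__(self, url, meta=None):
        self.url = url
        self.meta = meta or {}
        self.headers = {}

class CustomProxyMiddleware:
    def process_request(self, request, spider):
        # hypothetical opt-out flag: requests carrying skip_proxy bypass the proxy
        if request.meta.get("skip_proxy"):
            return
        request.meta["proxy"] = "https://<PROXY_IP_OR_URL>:<PROXY_PORT>"
        request.headers["Proxy-Authorization"] = basic_auth_header(
            "<PROXY_USERNAME>", "<PROXY_PASSWORD>")

mw = CustomProxyMiddleware()
r1 = Request("https://example.com")
r2 = Request("https://example.com", meta={"skip_proxy": True})
mw.process_request(r1, None)
mw.process_request(r2, None)
print("proxy" in r1.meta, "proxy" in r2.meta)  # True False
```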


Then reference it in the DOWNLOADER_MIDDLEWARES setting of the project's settings.py, giving it a priority lower than the standard HttpProxyMiddleware so that it runs first:


DOWNLOADER_MIDDLEWARES = {
    '<PROJECT_NAME>.middlewares.CustomProxyMiddleware': 350,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
}
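The numbers are priorities: for process_request, Scrapy calls downloader middlewares in ascending priority order, so the custom middleware at 350 has already set the proxy meta key by the time the built-in HttpProxyMiddleware at 400 sees the request. A toy model of that ordering:

```python
# toy model of Scrapy's downloader-middleware ordering:
# process_request hooks run in ascending priority order
middlewares = {
    "CustomProxyMiddleware": 350,
    "HttpProxyMiddleware": 400,
}
order = [name for name, prio in sorted(middlewares.items(), key=lambda kv: kv[1])]
print(order)  # ['CustomProxyMiddleware', 'HttpProxyMiddleware']
```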


A ready-made random proxy middleware for Scrapy is also available.


For more information on Scrapy and proxies, check out our blog post about how to set up a custom proxy in Scrapy.
