Make use of Scrapy's standard HttpProxyMiddleware by specifying the proxy meta value and the authorization header in a Scrapy Request, for example:
import scrapy
from w3lib.http import basic_auth_header

yield scrapy.Request(
    url=url,
    callback=self.parse,
    meta={'proxy': 'https://<PROXY_IP_OR_URL>:<PROXY_PORT>'},
    headers={
        'Proxy-Authorization': basic_auth_header(
            '<PROXY_USERNAME>', '<PROXY_PASSWORD>')
    }
)
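To see the snippet in context, here is a minimal sketch of a complete spider that sets the proxy on each request it yields. The spider name, the example.com URL and the logging in parse are illustrative assumptions, not part of the snippet above:

import scrapy
from w3lib.http import basic_auth_header


class ProxiedSpider(scrapy.Spider):
    name = 'proxied_example'

    def start_requests(self):
        # Each request carries the proxy endpoint in its meta and the
        # credentials in the Proxy-Authorization header.
        yield scrapy.Request(
            url='https://example.com',
            callback=self.parse,
            meta={'proxy': 'https://<PROXY_IP_OR_URL>:<PROXY_PORT>'},
            headers={
                'Proxy-Authorization': basic_auth_header(
                    '<PROXY_USERNAME>', '<PROXY_PASSWORD>')
            },
        )

    def parse(self, response):
        # Log which proxy the response came through.
        self.logger.info('Fetched %s via %s',
                         response.url, response.meta.get('proxy'))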
To route all of a spider's requests through the proxy automatically, isolate the proxy details in a middleware by adding this example class to the project's middlewares.py file:
from w3lib.http import basic_auth_header


class CustomProxyMiddleware(object):
    def process_request(self, request, spider):
        request.meta['proxy'] = "https://<PROXY_IP_OR_URL>:<PROXY_PORT>"
        request.headers['Proxy-Authorization'] = basic_auth_header(
            '<PROXY_USERNAME>', '<PROXY_PASSWORD>')
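If hardcoding credentials in the middleware is undesirable, a common variant reads them from the project settings via from_crawler instead. This is a sketch under assumptions: PROXY_URL, PROXY_USER and PROXY_PASSWORD are hypothetical custom settings you would define yourself in settings.py, not built-in Scrapy settings:

from w3lib.http import basic_auth_header


class CustomProxyMiddleware(object):
    def __init__(self, proxy_url, proxy_user, proxy_password):
        self.proxy_url = proxy_url
        self.auth = basic_auth_header(proxy_user, proxy_password)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_URL, PROXY_USER and PROXY_PASSWORD are assumed custom
        # settings added to the project's settings.py.
        return cls(
            crawler.settings.get('PROXY_URL'),
            crawler.settings.get('PROXY_USER'),
            crawler.settings.get('PROXY_PASSWORD'),
        )

    def process_request(self, request, spider):
        # Attach the proxy endpoint and credentials to every request.
        request.meta['proxy'] = self.proxy_url
        request.headers['Proxy-Authorization'] = self.auth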
Then reference it in the downloader middlewares section of the project's settings.py, placing it before the standard HttpProxyMiddleware. A lower order number means the custom middleware runs earlier, so the proxy meta key and authorization header are already set by the time HttpProxyMiddleware processes the request:
DOWNLOADER_MIDDLEWARES = {
    '<PROJECT_NAME>.middlewares.CustomProxyMiddleware': 350,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
}
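If you want the proxy enabled for only one spider rather than project-wide, the same mapping can be supplied through the spider's custom_settings attribute instead of settings.py. A minimal sketch, assuming the same <PROJECT_NAME> placeholder and an illustrative spider:

import scrapy


class SingleProxiedSpider(scrapy.Spider):
    name = 'single_proxied'
    start_urls = ['https://example.com']

    # Enable the proxy middleware for this spider only, leaving the
    # project-wide DOWNLOADER_MIDDLEWARES setting untouched.
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            '<PROJECT_NAME>.middlewares.CustomProxyMiddleware': 350,
            'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
        }
    }

    def parse(self, response):
        self.logger.info('Fetched %s via %s',
                         response.url, response.meta.get('proxy'))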
There's also a ready-made random proxy middleware for Scrapy here.
For more information on Scrapy and proxies, check out our blog post about how to set up a custom proxy in Scrapy.