Make use of Scrapy's standard HttpProxyMiddleware by specifying the proxy meta value and the authorization header in a Scrapy Request, for example:
import scrapy
from w3lib.http import basic_auth_header

yield scrapy.Request(
    url=url,
    callback=self.parse,
    meta={'proxy': 'https://<PROXY_IP_OR_URL>:<PROXY_PORT>'},
    headers={
        'Proxy-Authorization': basic_auth_header(
            '<PROXY_USERNAME>', '<PROXY_PASSWORD>')
    }
)
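For context, here is a minimal, self-contained spider built around that same request; the spider name and start URL are illustrative, and the proxy placeholders must be replaced with your own details:

import scrapy
from w3lib.http import basic_auth_header


class ProxySpider(scrapy.Spider):
    # Hypothetical spider for illustration only.
    name = 'proxy_example'
    start_urls = ['https://example.com']

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                # Route this single request through the proxy
                meta={'proxy': 'https://<PROXY_IP_OR_URL>:<PROXY_PORT>'},
                headers={
                    'Proxy-Authorization': basic_auth_header(
                        '<PROXY_USERNAME>', '<PROXY_PASSWORD>')
                },
            )

    def parse(self, response):
        self.logger.info('Fetched %s via proxy', response.url)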
To route all of a spider's requests through the proxy automatically, isolate the proxy details in a middleware by adding this example class to the project's middlewares.py file:
from w3lib.http import basic_auth_header


class CustomProxyMiddleware(object):

    def process_request(self, request, spider):
        request.meta['proxy'] = "https://<PROXY_IP_OR_URL>:<PROXY_PORT>"
        request.headers['Proxy-Authorization'] = basic_auth_header(
            '<PROXY_USERNAME>', '<PROXY_PASSWORD>')
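If you prefer not to hard-code credentials in the middleware, one common variation is to read them from Scrapy settings via the from_crawler hook. A sketch, assuming you define PROXY_URL, PROXY_USER, and PROXY_PASS in settings.py (these are custom setting names used here for illustration, not standard Scrapy settings):

from w3lib.http import basic_auth_header


class SettingsProxyMiddleware(object):
    # Variant that pulls proxy details from settings.py;
    # PROXY_URL, PROXY_USER, and PROXY_PASS are custom settings.

    def __init__(self, proxy_url, proxy_user, proxy_pass):
        self.proxy_url = proxy_url
        self.auth = basic_auth_header(proxy_user, proxy_pass)

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        return cls(
            settings.get('PROXY_URL'),
            settings.get('PROXY_USER'),
            settings.get('PROXY_PASS'),
        )

    def process_request(self, request, spider):
        request.meta['proxy'] = self.proxy_url
        request.headers['Proxy-Authorization'] = self.auth

Either variant is registered in the project settings the same way.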
Then reference it in the DOWNLOADER_MIDDLEWARES setting of the project's settings.py, ordering it before the standard HttpProxyMiddleware:
DOWNLOADER_MIDDLEWARES = {
    '<PROJECT_NAME>.middlewares.CustomProxyMiddleware': 350,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
}
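If only one spider in the project should use the proxy, the same setting can instead be scoped with the spider's custom_settings attribute. A sketch, with an illustrative spider name:

import scrapy


class SingleProxySpider(scrapy.Spider):
    name = 'single_proxy'
    # Enable the proxy middleware for this spider only
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            '<PROJECT_NAME>.middlewares.CustomProxyMiddleware': 350,
            'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
        }
    }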
There is also a ready-made random proxy middleware for Scrapy available here.
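The idea behind such a middleware is straightforward. Below is a minimal sketch of rotating requests across a list of proxies; the class name and the PROXY_LIST setting are illustrative and not taken from that middleware:

import random


class RandomProxyMiddleware(object):
    # Illustrative sketch only; see the linked middleware for a
    # production-ready implementation.

    def __init__(self, proxies):
        self.proxies = proxies

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical custom setting, e.g.
        # ['https://user:pass@proxy1:8000', 'https://user:pass@proxy2:8000']
        return cls(crawler.settings.getlist('PROXY_LIST'))

    def process_request(self, request, spider):
        if self.proxies:
            # Pick a random proxy for each outgoing request
            request.meta['proxy'] = random.choice(self.proxies)

When credentials are embedded in the proxy URL like this, Scrapy's built-in HttpProxyMiddleware extracts them and sets the Proxy-Authorization header itself.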
For more information on Scrapy and proxies, check out our blog post about how to set up a custom proxy in Scrapy.