I would like to know if it is possible to crawl HTTPS pages using Scrapy + Crawlera. So far I have been using Python requests with the following settings:
```python
import requests

proxy_host = 'proxy.crawlera.com'
proxy_port = '8010'
proxy_auth = 'MY_KEY'
proxies = {
    "https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
    "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)
}
# Crawlera CA certificate, used to verify HTTPS responses through the proxy
ca_cert = 'crawlera-ca.crt'
res = requests.get(url='https://www.google.com/',
                   proxies=proxies,
                   verify=ca_cert)
```
I want to move to async execution with Scrapy. I know there is the [scrapy-crawlera](https://github.com/scrapy-plugins/scrapy-crawlera) plugin, but I do not know how to configure it when I have the certificate.
Also, one thing bothers me. Crawlera comes with different pricing plans. The basic one is C10, which allows 10 concurrent requests. What does that mean? Do I need to set `CONCURRENT_REQUESTS=10` in settings.py?
nestor posted almost 6 years ago Admin Best Answer
The only thing you need to configure is the scrapy-crawlera settings in your settings.py: https://scrapy-crawlera.readthedocs.io/en/v1.4.0/settings.html
The certificate is not needed with Scrapy because Scrapy doesn't use the CONNECT method.
And yes, the pricing plan means exactly that: set `CONCURRENT_REQUESTS` to the number of concurrent requests your plan allows (10 for C10).
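For illustration, a settings.py along those lines might look roughly like the sketch below. It assumes the scrapy-crawlera plugin is installed and a C10 plan; the API key is a placeholder, and the middleware priority 610 is the value suggested in the plugin's documentation rather than something specific to this thread.

```python
# settings.py (sketch, assuming scrapy-crawlera is installed)

# Enable the Crawlera downloader middleware shipped with scrapy-crawlera.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_crawlera.CrawleraMiddleware": 610,
}

# Turn the middleware on and supply the API key (placeholder value).
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = "MY_KEY"

# Match the concurrency limit of the plan: C10 allows 10 concurrent requests.
CONCURRENT_REQUESTS = 10
```

No certificate setting appears here, in line with the answer above.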