lawli3t said about 7 years ago
When trying to use the CRAWLERA_PRESERVE_DELAY = True option for my spiders, I get an error in the logs (see below). Whenever I remove this option, the spiders run perfectly fine. Is this option no longer supported, or is this just a bug? Is there any other way to achieve this?
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 37, in process_request
    response = yield method(request=request, spider=spider)
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 127, in process_request
    request.headers['Proxy-Authorization'] = self._proxyauth
AttributeError: 'CrawleraMiddleware' object has no attribute '_proxyauth'

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/lib/python3.5/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 53, in open_spider
    setattr(self, k, self._get_setting_value(spider, k, type_))
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 95, in _get_setting_value
    type_, 'HUBPROXY_' + k.upper(), o))
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 73, in _settings_get
    return self.crawler.settings.getbool(*a, **kw)
  File "/usr/local/lib/python3.5/site-packages/scrapy/settings/__init__.py", line 129, in getbool
    return bool(int(self.get(name, default)))
ValueError: invalid literal for int() with base 10: 'True'
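As far as I can tell, the second traceback is the root cause: getbool() does bool(int(value)), and int() cannot parse the string 'True'. The first traceback then looks like a knock-on effect, since open_spider() fails before _proxyauth is ever set. A minimal reproduction of the failing call, assuming the setting presumably reaches Scrapy as the string 'True' (for example when set outside settings.py):

    # what scrapy/settings/__init__.py line 129 effectively does with this value
    value = 'True'    # CRAWLERA_PRESERVE_DELAY arriving as a string
    bool(int(value))  # ValueError: invalid literal for int() with base 10: 'True'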
nestor said about 7 years ago
Hi,
Try setting the value of CRAWLERA_PRESERVE_DELAY to 1.
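In settings terms, that is a one-line change; a minimal sketch (the CRAWLERA_ENABLED line is just the usual scrapy-crawlera setting, shown for context):

    # settings.py
    CRAWLERA_ENABLED = True
    CRAWLERA_PRESERVE_DELAY = 1  # 1 instead of True: getbool() parses '1'/'0', not the string 'True'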
lawli3t said about 7 years ago
Hi,
The error is gone now, but it is still not crawling at the desired speed.
nestor said about 7 years ago
What speed are you trying to achieve?
lawli3t said about 7 years ago
I have set DOWNLOAD_DELAY to 300, but it still only scrapes about 0.5 requests per minute, which works out to a delay of about 120 seconds if I'm not mistaken. I have tried it with several settings, but the speed remains the same.
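For reference, the arithmetic behind that estimate (numbers taken from my run above):

    observed_rate = 0.5                   # requests per minute, from the crawl stats
    effective_delay = 60 / observed_rate  # = 120.0 seconds between requests
    # DOWNLOAD_DELAY is set to 300 seconds, so something is overriding it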
nestor said about 7 years ago
You have AUTOTHROTTLE turned on.
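If so, turning it off should let the fixed delay apply; a sketch, assuming AutoThrottle is indeed what is adjusting the delay here:

    # settings.py
    AUTOTHROTTLE_ENABLED = False  # stop AutoThrottle from adjusting delays dynamically
    DOWNLOAD_DELAY = 300          # fixed 300-second delay between requests
    CRAWLERA_PRESERVE_DELAY = 1   # keep the middleware from resetting the delay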