When trying to use the CRAWLERA_PRESERVE_DELAY = True option for my spiders, I get an error in the logs (see below). Whenever I remove this option, the spiders run perfectly fine. Is this option no longer supported, or is this just a bug? Is there any other way to achieve this?
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 37, in process_request
    response = yield method(request=request, spider=spider)
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 127, in process_request
    request.headers['Proxy-Authorization'] = self._proxyauth
AttributeError: 'CrawleraMiddleware' object has no attribute '_proxyauth'

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/lib/python3.5/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 53, in open_spider
    setattr(self, k, self._get_setting_value(spider, k, type_))
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 95, in _get_setting_value
    type_, 'HUBPROXY_' + k.upper(), o))
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 73, in _settings_get
    return self.crawler.settings.getbool(*a, **kw)
  File "/usr/local/lib/python3.5/site-packages/scrapy/settings/__init__.py", line 129, in getbool
    return bool(int(self.get(name, default)))
ValueError: invalid literal for int() with base 10: 'True'
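The last frame shows the likely cause: this Scrapy version coerces boolean settings with bool(int(value)), which accepts a Python True or the numbers 0/1 but raises ValueError on the string 'True' (the form a settings UI typically stores). A minimal sketch of that coercion, reproducing the error:

    # Sketch of the getbool() coercion from the last traceback frame.
    def getbool(value):
        return bool(int(value))

    getbool(True)    # True  -- int(True) == 1
    getbool('1')     # True
    getbool('True')  # ValueError: invalid literal for int() with base 10: 'True'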
nestor posted over 7 years ago · Admin · Best Answer
You have AUTOTHROTTLE turned on.
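If AutoThrottle is interfering, a minimal settings sketch for disabling it (AUTOTHROTTLE_ENABLED is the standard Scrapy setting name):

    # settings.py -- turn off Scrapy's AutoThrottle extension so it does
    # not adjust delays on top of Crawlera's own request pacing.
    AUTOTHROTTLE_ENABLED = False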
lawli3t posted over 7 years ago
I have set DOWNLOAD_DELAY to 300, but it still scrapes about 0.5 reqs/min, which would equate to 120 if I'm not mistaken. I have tried several settings, but the speed remains the same.
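For reference, Scrapy's DOWNLOAD_DELAY is measured in seconds, so a sketch of the rate that setting should produce (assuming a single slot and no other throttling):

    # settings.py -- DOWNLOAD_DELAY is in seconds, not milliseconds.
    DOWNLOAD_DELAY = 300   # one request every 300 s per slot
    # => 60 / 300 = 0.2 requests per minute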
nestor posted over 7 years ago · Admin
What speed are you trying to achieve?
lawli3t posted over 7 years ago
Hi,
I got no error now, but it is still not crawling at the desired speed.
nestor posted over 7 years ago · Admin
Hi,
Try setting the value of CRAWLERA_PRESERVE_DELAY to 1.
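A minimal settings sketch of that suggestion; the numeric 1 survives the bool(int(...)) coercion shown in the traceback, unlike the string 'True':

    # settings.py -- 1 parses cleanly through getbool() and is truthy,
    # so the middleware preserves DOWNLOAD_DELAY instead of resetting it.
    CRAWLERA_PRESERVE_DELAY = 1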