When trying to use the CRAWLERA_PRESERVE_DELAY = True option for my spiders, I get an error in the logs (see below). Whenever I remove this option, the spiders run perfectly fine. Is this option no longer supported, or is this just a bug? Is there any other way to achieve this?
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 37, in process_request
    response = yield method(request=request, spider=spider)
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 127, in process_request
    request.headers['Proxy-Authorization'] = self._proxyauth
AttributeError: 'CrawleraMiddleware' object has no attribute '_proxyauth'

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/lib/python3.5/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 53, in open_spider
    setattr(self, k, self._get_setting_value(spider, k, type_))
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 95, in _get_setting_value
    type_, 'HUBPROXY_' + k.upper(), o))
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 73, in _settings_get
    return self.crawler.settings.getbool(*a, **kw)
  File "/usr/local/lib/python3.5/site-packages/scrapy/settings/__init__.py", line 129, in getbool
    return bool(int(self.get(name, default)))
ValueError: invalid literal for int() with base 10: 'True'
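The last frame shows the likely cause: this Scrapy version coerces boolean settings with bool(int(value)), which accepts a Python True or the numbers 0/1 but raises ValueError on the string 'True' (the form a settings UI typically stores). A minimal sketch of that coercion, reproducing the error:

    # Sketch of the getbool() coercion from the last traceback frame.
    def getbool(value):
        return bool(int(value))

    getbool(True)    # True  -- int(True) == 1
    getbool('1')     # True
    getbool('True')  # ValueError: invalid literal for int() with base 10: 'True'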
nestor posted over 7 years ago · Admin · Best Answer
You have AUTOTHROTTLE turned on.
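If AutoThrottle is interfering, a minimal settings sketch for disabling it (AUTOTHROTTLE_ENABLED is the standard Scrapy setting name):

    # settings.py -- turn off Scrapy's AutoThrottle extension so it does
    # not adjust delays on top of Crawlera's own request pacing.
    AUTOTHROTTLE_ENABLED = False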
lawli3t posted over 7 years ago
I have set DOWNLOAD_DELAY to 300, but it still scrapes about 0.5 reqs/min, which would equate to 120 if I'm not mistaken. I have tried several settings, but the speed remains the same.
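For reference, Scrapy's DOWNLOAD_DELAY is measured in seconds, so a sketch of the rate that setting should produce (assuming a single slot and no other throttling):

    # settings.py -- DOWNLOAD_DELAY is in seconds, not milliseconds.
    DOWNLOAD_DELAY = 300   # one request every 300 s per slot
    # => 60 / 300 = 0.2 requests per minute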
nestor posted over 7 years ago · Admin
What speed are you trying to achieve?
lawli3t posted over 7 years ago
Hi,
I got no error now, but it is still not crawling at the desired speed.
nestor posted over 7 years ago · Admin
Hi,
Try setting the value of CRAWLERA_PRESERVE_DELAY to 1.
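A minimal settings sketch of that suggestion; the numeric 1 survives the bool(int(...)) coercion shown in the traceback, unlike the string 'True':

    # settings.py -- 1 parses cleanly through getbool() and is truthy,
    # so the middleware preserves DOWNLOAD_DELAY instead of resetting it.
    CRAWLERA_PRESERVE_DELAY = 1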