
CRAWLERA_PRESERVE_DELAY leads to error

When I use the CRAWLERA_PRESERVE_DELAY = True option in my spiders, I get the errors below in the logs. When I remove this option, the spiders run perfectly fine. Is this option no longer supported, or is this a bug? Is there another way to achieve this?

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 37, in process_request
    response = yield method(request=request, spider=spider)
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 127, in process_request
    request.headers['Proxy-Authorization'] = self._proxyauth
AttributeError: 'CrawleraMiddleware' object has no attribute '_proxyauth'

  

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/lib/python3.5/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 53, in open_spider
    setattr(self, k, self._get_setting_value(spider, k, type_))
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 95, in _get_setting_value
    type_, 'HUBPROXY_' + k.upper(), o))
  File "/usr/local/lib/python3.5/site-packages/scrapy_crawlera.py", line 73, in _settings_get
    return self.crawler.settings.getbool(*a, **kw)
  File "/usr/local/lib/python3.5/site-packages/scrapy/settings/__init__.py", line 129, in getbool
    return bool(int(self.get(name, default)))
ValueError: invalid literal for int() with base 10: 'True'
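The second traceback shows why the value is rejected: in the installed Scrapy version, `getbool()` converts the setting with `bool(int(...))`, and `int()` cannot parse the string 'True'. A Python `True` written in settings.py would pass (`int(True)` is 1), so the value presumably arrived as a string, for example from a `-s` command-line option or a web settings form (an assumption about this setup). A plausible but unconfirmed reading of the first traceback is that this failure in `open_spider` stops the middleware's setup before `_proxyauth` is assigned. The sketch below reproduces only the conversion; `old_getbool` is a hypothetical stand-in, not Scrapy's API.

```python
def old_getbool(value, default=False):
    """Hypothetical stand-in for the getbool() line in the traceback:
    return bool(int(self.get(name, default)))"""
    if value is None:
        value = default
    return bool(int(value))


# Try the kinds of values a setting might arrive as.
for raw in [True, 1, "1", "0", "True"]:
    try:
        print(repr(raw), "->", old_getbool(raw))
    except ValueError as exc:
        # "True" fails here with the same message as in the log above
        print(repr(raw), "-> ValueError:", exc)
```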

 


Best Answer

You have AUTOTHROTTLE turned on.
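Acting on this answer means switching AutoThrottle off so that the fixed DOWNLOAD_DELAY, rather than AutoThrottle's dynamically adjusted delay, paces the spider. A minimal settings.py sketch using the delay value mentioned in this thread; `AUTOTHROTTLE_ENABLED` is Scrapy's standard switch for the extension, and whether anything else in the project (such as a spider's `custom_settings`) turns it back on is not known from the thread.

```python
# settings.py (sketch; the delay value is the one mentioned in this thread)

# Turn off the AutoThrottle extension so it no longer adjusts the delay itself.
AUTOTHROTTLE_ENABLED = False

# Seconds to wait between consecutive requests to the same website.
DOWNLOAD_DELAY = 300
```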


Hi,


Try setting the value of CRAWLERA_PRESERVE_DELAY to 1.
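In settings.py this is a one-line change. The surrounding scrapy-crawlera lines are the usual setup as I understand it (middleware path, priority 610, `CRAWLERA_ENABLED`, `CRAWLERA_APIKEY`) and should be checked against the README of the installed version; the API key is a placeholder. If the setting is passed on the command line instead, the equivalent would be `-s CRAWLERA_PRESERVE_DELAY=1`.

```python
# settings.py (sketch; verify names and priority against your scrapy-crawlera version)
DOWNLOADER_MIDDLEWARES = {"scrapy_crawlera.CrawleraMiddleware": 610}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = "<your API key>"  # placeholder, not a real key

# Use 1 (or 0) rather than the string "True": the getbool() seen in the
# traceback converts the value with int(), which accepts 1 but not "True".
CRAWLERA_PRESERVE_DELAY = 1
```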

Hi,


I no longer get the error, but the spider still isn't crawling at the desired speed.

What speed are you trying to achieve?

I have set DOWNLOAD_DELAY to 300, but it still scrapes about 0.5 requests per minute, which corresponds to a delay of about 120 seconds, if I'm not mistaken. I have tried several settings, but the speed stays the same.
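The arithmetic behind that comparison: with a fixed delay of d seconds between requests, the rate is 60/d requests per minute. So DOWNLOAD_DELAY = 300 should give about 0.2 requests per minute, while the observed 0.5 requests per minute implies a delay of about 120 seconds. A small sketch; the function names are illustrative, not part of Scrapy, and it ignores Scrapy's default randomization of the delay (which varies individual waits between 0.5x and 1.5x but averages to the configured value).

```python
def requests_per_minute(delay_seconds: float) -> float:
    """Request rate implied by a fixed delay between consecutive requests."""
    return 60.0 / delay_seconds


def delay_for_rate(req_per_min: float) -> float:
    """Delay in seconds implied by an observed request rate."""
    return 60.0 / req_per_min


expected_rate = requests_per_minute(300)  # DOWNLOAD_DELAY = 300 s
observed_delay = delay_for_rate(0.5)      # observed ~0.5 requests/min
print(f"expected {expected_rate:.1f} req/min; observed rate implies {observed_delay:.0f} s delay")
```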

