Hi,
Sometimes I receive the `noslaves` error from Crawlera, and after that I'm not able to scrape anymore. Yesterday it happened during the afternoon, and I was not able to scrape again for the rest of the day. Today it was working again, but after a while the same thing happened.
What is the proper way to minimize this error and to deal with it?
Retry and wait?
Thank you.
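(For illustration, "retry and wait" could look like the minimal sketch below, written against the plain requests library rather than Scrapy; the proxy endpoint, the API-key placeholder, the 20-second pause, and the 10-attempt cap are all assumptions, not confirmed values:)

import time
import requests

# Hypothetical Crawlera proxy endpoint and API key; substitute real values.
PROXY = "http://<API_KEY>:@proxy.crawlera.com:8010"

def fetch_with_retry(url, max_attempts=10, wait_seconds=20):
    """Retry a URL while Crawlera keeps answering with 'noslaves'."""
    for attempt in range(max_attempts):
        response = requests.get(
            url,
            proxies={"http": PROXY, "https": PROXY},
            verify=False,  # Crawlera re-signs HTTPS responses
        )
        if response.headers.get("X-Crawlera-Error") != "noslaves":
            return response
        time.sleep(wait_seconds)  # wait before asking for a slave again
    raise RuntimeError(f"still no slaves after {max_attempts} attempts: {url}")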
PTaylour posted almost 5 years ago
(The indentation in my code example is off)
PTaylour posted almost 5 years ago
I've been running into the same issue recently. No changes to the code on my end, but within the last couple of weeks we've been seeing a lot of `X-Crawlera-Error: noslaves`.
Very anecdotal example:
- Arrive in the morning, set off a scrape. Successfully fetch a few hundred pages and finish, with no 503s of any kind.
- Immediately after the first scrape has finished, set off another one, either pointing at the same URLs or at different URLs from the same site. This time, and seemingly on all subsequent attempts: `X-Crawlera-Error: noslaves` and zero pages downloaded.
I've been trying the retry-and-wait approach.
At the moment I'm watching a scrape that uses the custom version of `RetryMiddleware` shown below, and I'm getting about one page every 45 minutes.
Are we doing something wrong, or are there just "noslaves" for most of the day at the moment?
import math
import time

from scrapy.downloadermiddlewares.retry import RetryMiddleware
from scrapy.utils.response import response_status_message


class CustomRetryMiddleware(RetryMiddleware):
    def process_response(self, request, response, spider):
        """Handle "noslaves" errors with custom code; otherwise use the
        same implementation as the original RetryMiddleware.
        """
        no_slaves = response.headers.get("X-Crawlera-Error") == b"noslaves"
        if no_slaves:
            # There are no servers available, so Crawlera won't try again.
            # By default we'd return a 503, which Scrapy would immediately
            # retry (and there would probably still be no slave available).
            # Instead of returning the 503, trigger another retry after
            # sleeping for a while (hopefully this approach won't get
            # rate-limited or banned by Crawlera). We bypass the normal
            # max_retry_times used for other requests, as we don't want to
            # stop crawling until we've had access to some servers.
            spider.logger.debug("noslaves: sleeping before retrying")
            time.sleep(20)
            new_request = request.copy()
            new_request.meta["max_retry_times"] = math.inf
            return (
                self._retry(new_request, f"noslaves for {request.url}", spider)
                or response
            )

        # Normal behaviour, as in the stock RetryMiddleware.
        if request.meta.get("dont_retry", False):
            return response
        if response.status in self.retry_http_codes:
            reason = response_status_message(response.status)
            return self._retry(request, reason, spider) or response
        return response
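For anyone wiring this up: a sketch of how such a middleware might be enabled in a Scrapy project's settings, assuming the class lives at `myproject.middlewares.CustomRetryMiddleware` (a hypothetical module path); the stock retry middleware is disabled so the custom one replaces it at its default priority:

# settings.py (sketch; the module path is an assumption)
DOWNLOADER_MIDDLEWARES = {
    # Disable the built-in retry middleware...
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": None,
    # ...and slot the custom version in at the same priority (550).
    "myproject.middlewares.CustomRetryMiddleware": 550,
}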