
Clarification on errors and request limits when using Scrapy Cloud API


We have recently moved to using Scrapy Cloud (and its API) as a decoupled crawling engine. We use the Scrapy Cloud API to:

- start crawl jobs for a site,

- check that a crawl has finished, and then

- retrieve the content for a particular job.
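A minimal Java sketch of the three calls above, using the JDK's `java.net.http` client (Java 11+). The endpoint URLs (`run.json`, `jobs/list.json`, and the `items` storage endpoint) and the "API key as the HTTP Basic username, empty password" scheme are assumptions based on the public Scrapy Cloud HTTP API documentation and should be checked against the current docs. The class `ScrapyCloudRequests` and its method names are hypothetical. The sketch only builds the requests, so it can be inspected without network access.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; endpoint URLs are assumptions to verify against the Scrapy Cloud docs.
class ScrapyCloudRequests {
    static final String APP_API = "https://app.scrapinghub.com/api";
    static final String STORAGE_API = "https://storage.scrapinghub.com";

    // Basic auth with the API key as username and an empty password ("KEY:").
    static String basicAuth(String apiKey) {
        String token = Base64.getEncoder()
                .encodeToString((apiKey + ":").getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // 1. Start a crawl job: POST run.json with project and spider as form fields.
    static HttpRequest startJob(String apiKey, String project, String spider) {
        String form = "project=" + enc(project) + "&spider=" + enc(spider);
        return HttpRequest.newBuilder(URI.create(APP_API + "/run.json"))
                .header("Authorization", basicAuth(apiKey))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    // 2. Check job state: GET jobs/list.json filtered to one job key ("project/spider/job").
    static HttpRequest jobInfo(String apiKey, String project, String jobKey) {
        String query = "?project=" + enc(project) + "&job=" + enc(jobKey);
        return HttpRequest.newBuilder(URI.create(APP_API + "/jobs/list.json" + query))
                .header("Authorization", basicAuth(apiKey))
                .GET()
                .build();
    }

    // 3. Retrieve the scraped items for a finished job from the storage API.
    static HttpRequest items(String apiKey, String jobKey) {
        return HttpRequest.newBuilder(URI.create(STORAGE_API + "/items/" + jobKey))
                .header("Authorization", basicAuth(apiKey))
                .GET()
                .build();
    }
}
```

Each request would then be sent with `HttpClient.send(request, HttpResponse.BodyHandlers.ofString())`; keeping request construction separate from sending makes it easy to add timeouts, throttling, and retries in one place.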


Since we are integrating from a Java application, I have questions about the response times we should expect for these requests and about the API request limits, as I think we may be hitting them.
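Until the actual limits are known, one client-side hedge is to cap our own request rate. A minimal sketch of a hypothetical `Throttle` class that spaces calls at least `1 / maxPerSecond` seconds apart; the rate is an assumed value to tune, not a documented Scrapy Cloud quota. The clock is injected so the spacing can be checked without sleeping.

```java
import java.util.function.LongSupplier;

// Hypothetical client-side throttle; maxPerSecond is an assumption, not a documented limit.
class Throttle {
    private final long minIntervalNanos;
    private final LongSupplier clock; // e.g. System::nanoTime in production
    private long nextAllowed;

    Throttle(double maxPerSecond, LongSupplier clock) {
        this.minIntervalNanos = (long) (1_000_000_000L / maxPerSecond);
        this.clock = clock;
        this.nextAllowed = clock.getAsLong();
    }

    // Reserves the next request slot and returns how many nanoseconds to wait for it.
    synchronized long reserve() {
        long now = clock.getAsLong();
        long start = Math.max(now, nextAllowed);
        nextAllowed = start + minIntervalNanos;
        return start - now;
    }

    // Blocks the calling thread until its reserved slot arrives.
    void acquire() throws InterruptedException {
        long waitNanos = reserve();
        if (waitNanos > 0) {
            Thread.sleep(waitNanos / 1_000_000L, (int) (waitNanos % 1_000_000L));
        }
    }
}
```

In use, one shared `new Throttle(2.0, System::nanoTime)` would be called with `acquire()` before every API request, so polling for job state from several threads still stays under the chosen rate.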

Is there any documentation on what errors look like, expected error status codes, or request limits?

A sample integration diagram can be found below, along with the errors.


Below are some of the Java errors reported in our logs, along with what I believe is the cause of each. Any help you can provide would be appreciated.

On Crawl Submission:

- Connection prematurely closed BEFORE response

I believe this happens because our client times out after N seconds, while the Scrapy Cloud run API took more than N seconds to respond.
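If the cause is a client-side timeout as suspected, it helps to set the connect timeout and the response timeout separately, so a slow `run.json` response is not cut off by the same short limit used for opening connections. A minimal sketch with the JDK's `java.net.http` client (not necessarily the client that produced the log message above); the 10 s and 60 s values and the `Timeouts` class are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical timeout settings; both durations are assumptions to tune.
class Timeouts {
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(10);
    static final Duration RESPONSE_TIMEOUT = Duration.ofSeconds(60);

    // Limits only how long establishing the connection may take.
    static HttpClient client() {
        return HttpClient.newBuilder()
                .connectTimeout(CONNECT_TIMEOUT)
                .build();
    }

    // Per-request limit on waiting for the response; exceeding it
    // raises java.net.http.HttpTimeoutException from send().
    static HttpRequest.Builder requestWithTimeout(URI uri) {
        return HttpRequest.newBuilder(uri).timeout(RESPONSE_TIMEOUT);
    }
}
```

A crawl submission would then be built as `Timeouts.requestWithTimeout(runUri).POST(...).build()`, while cheap calls such as job-state checks could use a shorter per-request timeout on the same client.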

On Crawl Validation:

- connection timed out:

I believe this means we could not connect to the Scrapy Cloud API.
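Connection failures are the safest errors to retry, because the request never reached the server. Retrying a POST to `run.json` after a response timeout is riskier, since the job may already have been scheduled. A minimal sketch of a hypothetical `ConnectRetry.withRetry` helper that retries only `ConnectException` and `HttpConnectTimeoutException`, with exponential backoff; the attempt count and base delay are assumptions.

```java
import java.net.ConnectException;
import java.net.http.HttpConnectTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical helper: retries only failures where no connection was established.
class ConnectRetry {
    static <T> T withRetry(Callable<T> call, int maxAttempts, long baseDelayMillis) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (ConnectException | HttpConnectTimeoutException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the last connection error
                }
                // Wait base, 2 x base, 4 x base, ... before the next attempt.
                Thread.sleep(baseDelayMillis << (attempt - 1));
            }
        }
    }
}
```

Usage would look like `ConnectRetry.withRetry(() -> client.send(request, HttpResponse.BodyHandlers.ofString()), 4, 500)`; any other exception, including a response timeout, propagates immediately.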

- Unexpected character ('<' (code 60)): expected a valid value (JSON String, Number, Array, Object or token 'null', 'true' or 'false')

- Failed to decode:Unrecognized token 'Too':

I believe these two errors occur because the API request for job stats returned plain text instead of JSON, something like '<some error text>' or 'Too many requests', which our JSON parser then failed to decode.
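One way to turn these parser errors into actionable ones is to check the HTTP status and the first character of the body before decoding. A minimal sketch of a hypothetical `ResponseGuard.requireJson` that passes through only 2xx bodies starting with `{` or `[`, and otherwise throws an exception carrying the status, a body snippet, and any `Retry-After` value. Whether Scrapy Cloud answers rate limiting with HTTP 429 and a `Retry-After` header is an assumption; only the delta-seconds form of that header is parsed.

```java
import java.util.OptionalLong;

// Hypothetical guard run before handing a response body to Jackson (or any JSON parser).
class ResponseGuard {

    static final class ApiException extends RuntimeException {
        final int status;
        final OptionalLong retryAfterSeconds;

        ApiException(int status, OptionalLong retryAfterSeconds, String message) {
            super(message);
            this.status = status;
            this.retryAfterSeconds = retryAfterSeconds;
        }

        boolean isRateLimited() {
            return status == 429; // assumed rate-limit status
        }
    }

    // retryAfterHeader may be null, e.g. response.headers().firstValue("Retry-After").orElse(null).
    static String requireJson(int status, String retryAfterHeader, String body) {
        String trimmed = body == null ? "" : body.strip();
        boolean looksLikeJson = trimmed.startsWith("{") || trimmed.startsWith("[");
        if (status >= 200 && status < 300 && looksLikeJson) {
            return body;
        }
        String snippet = trimmed.length() > 200 ? trimmed.substring(0, 200) + "..." : trimmed;
        throw new ApiException(status, parseRetryAfter(retryAfterHeader),
                "HTTP " + status + " with error or non-JSON body: " + snippet);
    }

    static OptionalLong parseRetryAfter(String header) {
        if (header == null) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(header.trim()));
        } catch (NumberFormatException e) {
            return OptionalLong.empty(); // HTTP-date form is not handled in this sketch
        }
    }
}
```

With this in place, a 'Too many requests' response surfaces as an `ApiException` whose `isRateLimited()` is true, which the caller can back off on instead of logging a decode failure. Note that an empty 2xx body is also rejected here, which may need relaxing for endpoints that can legitimately return no content.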
