
Clarification on errors and request limits when using Scrapy Cloud API

Hello,


We have recently moved to using Scrapy Cloud (and its API) as a decoupled crawling engine. We are using the Scrapy Cloud API to:


- start crawl jobs for a site

- check that a crawl has finished, and then

- retrieve content for a particular job.


Question


Since we are integrating from a Java application, I have questions about the response times we should expect for these requests and about API request limits (I think we may be hitting them).


Is there any documentation on what errors look like, expected error status codes, or request limits?





A sample integration diagram can be found below, along with the errors we are seeing.


image





Below are some of the Java errors we are seeing in our logs, along with what I believe is the cause of each. Any help you can provide would be appreciated.


On Crawl Submission:


Connection prematurely closed BEFORE response


I believe this happens because our client times out after N seconds while the Scrapy Cloud run API takes longer than N seconds to respond.
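To make the timeout question concrete, here is a minimal sketch of how a connect timeout and a response timeout could be set separately on the Java side. It assumes Java 11's `java.net.http.HttpClient`, which may not be the client in use; the class name `CrawlClientConfig`, the URL, and the form body are hypothetical placeholders, not real Scrapy Cloud values.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical helper: keeps the two timeouts separate so logs can tell
// "could not connect" apart from "connected, but the response was slow".
public final class CrawlClientConfig {

    // Limit on establishing the TCP/TLS connection.
    public static HttpClient newClient(Duration connectTimeout) {
        return HttpClient.newBuilder()
                .connectTimeout(connectTimeout)
                .build();
    }

    // Limit on waiting for the response to one request.
    // The url and formBody values are placeholders supplied by the caller.
    public static HttpRequest newSubmitRequest(String url, String formBody, Duration responseTimeout) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(responseTimeout)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
    }
}
```

With this split, a short connect timeout and a longer response timeout can be tuned independently once the expected response times for the run API are known.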


On Crawl Validation:


- connection timed out: app.scrapinghub.com/136.243.72.243:443


I believe this means our client could not connect to the Scrapy Cloud API.
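If connection timeouts like this turn out to be transient, one option on our side would be to retry with a growing delay. The sketch below only computes the delay; the class name `Backoff` and the base and cap values are hypothetical, and whether retries are appropriate depends on the request limits asked about above.

```java
import java.time.Duration;

// Hypothetical helper: capped exponential backoff between retries.
public final class Backoff {

    // attempt 0 waits base, attempt 1 waits 2 * base, and so on, never more than cap.
    public static Duration delayFor(int attempt, Duration base, Duration cap) {
        int exponent = Math.max(0, Math.min(attempt, 20)); // clamp to avoid overflow
        long millis = Math.min(cap.toMillis(), base.toMillis() * (1L << exponent));
        return Duration.ofMillis(millis);
    }
}
```

For example, with a 500 ms base and a 30 s cap, the first retries would wait 500 ms, 1 s, 2 s, and 4 s.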


- Unexpected character ('<' (code 60)): expected a valid value (JSON String, Number, Array, Object or token 'null', 'true' or 'false')

- Failed to decode: Unrecognized token 'Too':


I believe this is because the API request for getting job stats returned a non-JSON body, such as '<some error text>' or 'Too many requests', which our JSON decoder then failed to parse.
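One way to avoid the decode failures would be to inspect the response before handing it to the JSON parser. The sketch below is a guess at what that check could look like: it assumes throttling is signalled with HTTP 429 or a plain-text body starting with "Too many requests", which the 'Too' token suggests but which I have not confirmed. The class name `ResponseGuard` and the `Kind` values are hypothetical.

```java
// Hypothetical helper: decide what a response is before JSON decoding.
public final class ResponseGuard {

    public enum Kind { JSON, RATE_LIMITED, NON_JSON_ERROR }

    // Assumed throttling message; not taken from any documentation.
    private static final String THROTTLE_PREFIX = "Too many requests";

    public static Kind classify(int status, String body) {
        String trimmed = body == null ? "" : body.strip();
        boolean throttleText = trimmed.regionMatches(
                true, 0, THROTTLE_PREFIX, 0, THROTTLE_PREFIX.length());
        if (status == 429 || throttleText) {
            return Kind.RATE_LIMITED;
        }
        boolean success = status >= 200 && status < 300;
        boolean looksLikeJson = trimmed.startsWith("{") || trimmed.startsWith("[");
        // Anything else (for example an HTML error page starting with '<')
        // is reported as an error instead of being parsed.
        return (success && looksLikeJson) ? Kind.JSON : Kind.NON_JSON_ERROR;
    }
}
```

Only responses classified as `JSON` would go to the decoder; `RATE_LIMITED` could trigger a delayed retry, and `NON_JSON_ERROR` could be logged with the raw status and body.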
