Clarification on errors and request limits when using Scrapy Cloud API

Posted almost 4 years ago by Julio Villalta III

Unanswered

Hello,

We have recently moved to using Scrapy Cloud (and its API) as a decoupled crawling engine. We are using the Scrapy Cloud API to:

- start crawl jobs for a site

- check that a crawl has finished (and then)

- retrieve the content for a particular job.

Question

Since we are integrating from a Java application, I have questions about the response times we should expect for these requests and about the API request limits (I think we may be hitting them).

Is there any documentation on what errors look like, expected error status codes, or request limits?

A sample integration diagram can be found below, along with the errors.

Please see some of the Java errors we are seeing in our logs and what I believe causes each one. Any help you can provide would be appreciated.

On Crawl Submission:

- Connection prematurely closed BEFORE response

I believe this happens because we time out after N seconds while the Scrapy Cloud Run API takes more than N seconds to respond.
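One way to test the timeout hypothesis is to configure the connect timeout and the response timeout separately, so the logs show which of the two fired. The sketch below assumes Java 11's built-in `java.net.http.HttpClient`, which is not necessarily the client our application uses; the class name `TimeoutConfig` and both timeout values are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

class TimeoutConfig {
    // Hypothetical values; tune them to the latency actually observed.
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(10);
    static final Duration RESPONSE_TIMEOUT = Duration.ofSeconds(60);

    // Limits how long establishing the TCP/TLS connection may take.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(CONNECT_TIMEOUT)
                .build();
    }

    // Limits how long we wait for the response to this one request.
    static HttpRequest newGet(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(RESPONSE_TIMEOUT)
                .GET()
                .build();
    }
}
```

With this client, a slow connect surfaces as `HttpConnectTimeoutException` and a slow response as `HttpTimeoutException`, which should make it easier to tell whether N is simply too small for the run request.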

On Crawl Validation:

- connection timed out: app.scrapinghub.com/136.243.72.243:443

I believe this means we could not connect to the Scrapy Cloud API.

- Unexpected character ('<' (code 60)): expected a valid value (JSON String, Number, Array, Object or token 'null', 'true' or 'false')

- Failed to decode:Unrecognized token 'Too':

I believe this is because the API request for job stats returned a non-JSON body, something like '<some error text>' or 'Too many requests'.
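If that is the cause, a guard in front of the JSON parser would turn both decode errors into something we can act on. The following is a minimal sketch, not a description of how Scrapy Cloud actually behaves: it assumes rate limiting shows up either as HTTP 429 or as a body starting with "Too many" (a guess based on the 'Too' token in our logs), and the names `ResponseCheck`, `classify` and `backoffMillis` are hypothetical.

```java
class ResponseCheck {
    enum Kind { JSON, RATE_LIMITED, NON_JSON_ERROR }

    // Decide what to do with a response BEFORE handing it to the JSON parser.
    static Kind classify(int status, String body) {
        String trimmed = body == null ? "" : body.stripLeading();
        // Assumption: throttling is signalled by 429 or a "Too many ..." body.
        if (status == 429 || trimmed.startsWith("Too many")) {
            return Kind.RATE_LIMITED;
        }
        boolean looksLikeJson = trimmed.startsWith("{") || trimmed.startsWith("[");
        if (status >= 200 && status < 300 && looksLikeJson) {
            return Kind.JSON;
        }
        // HTML error pages ('<' as first character) and anything else end up here.
        return Kind.NON_JSON_ERROR;
    }

    // Exponential backoff: base * 2^attempt, capped at capMillis.
    static long backoffMillis(int attempt, long baseMillis, long capMillis) {
        int shift = Math.min(Math.max(attempt, 0), 20); // clamp the exponent to 0..20
        return Math.min(capMillis, baseMillis << shift);
    }
}
```

Only `Kind.JSON` responses would be parsed; `RATE_LIMITED` would wait `backoffMillis(attempt, 500, 30_000)` before polling again, and `NON_JSON_ERROR` would be logged with its status code and raw body so we can see what the API really returned.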
