
404 Not Found while querying items from Scrapy Cloud

Hi,


I am getting this strange error:

`GET https://storage.scrapinghub.com/items/` resulted in a `404 Not Found`

This comes from Guzzle in PHP and only appears sometimes; the rest of the time the data loads fine. When the error does occur, refreshing the page clears it and the data is visible again. Is this a problem with the Scrapy Cloud API, or am I doing something wrong?


What I do is first query:

https://app.scrapinghub.com/api/jobs/list.json?project=xxxxxx&spider=abcdefg&state=finished&count=1

That gives me the id of the latest job. I then query the items API, passing the job id as a parameter:

https://storage.scrapinghub.com/items/[jobid]

This is where the above error is thrown intermittently; it goes away on refresh. What could the problem be?


Many thanks in advance,


It seems your app is requesting "https://storage.scrapinghub.com/items/" without the job id, which results in a 404. Either the parameter is not being taken into account, or no parameter was saved (perhaps the requests fire in too quick succession?). If that's not the case, you can use this instead: https://support.scrapinghub.com/support/solutions/articles/22000200409-fetching-latest-spider-data
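
One way to confirm the missing-id theory is to refuse to build the items URL when the first response carries no job id, so the failure shows up as an explicit error instead of a 404. This is only a sketch in Python rather than PHP; the function name is hypothetical, and the "jobs" and "id" keys are assumptions about the shape of the jobs/list.json response:

```python
from typing import Any, Mapping

ITEMS_BASE = "https://storage.scrapinghub.com/items/"


def items_url_for_latest_job(jobs_response: Mapping[str, Any]) -> str:
    """Build the items URL from a parsed jobs/list.json response.

    Raises ValueError rather than returning the bare base URL, which is
    the request that produced the 404 in the question.
    """
    jobs = jobs_response.get("jobs") or []  # "jobs" key is an assumption
    if not jobs:
        raise ValueError("jobs list is empty; no job id to query")
    job_id = str(jobs[0].get("id") or "").strip()  # "id" key is an assumption
    if not job_id:
        raise ValueError("first job has no id")
    return ITEMS_BASE + job_id
```

If the ValueError fires on the occasions where the 404 used to appear, the first request is the one to investigate.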

Thanks for your reply. The code is sequential, so the requests definitely run in order; maybe the first request is not returning the last job id properly.


I tried the request given in the link and it worked, so I am hopeful the error will not come back. However, I ran into another problem: the last job finished with an error because the remote URL was down, so I only receive an empty dataset (no items were retrieved because the job failed). How can I make sure I retrieve items only from the last successful job, i.e. one with no errors?


Many thanks for your help, 
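
A possible approach to the last question is to request several finished jobs instead of one and pick the first that looks healthy. The sketch below is in Python and works on the already parsed job list; the function name is hypothetical, and the field names "close_reason", "errors_count" and "items_scraped", as well as the newest-first ordering, are assumptions about the jobs/list.json response that should be checked against real output:

```python
from typing import Any, Mapping, Optional, Sequence


def latest_successful_job_id(jobs: Sequence[Mapping[str, Any]]) -> Optional[str]:
    """Return the id of the first job that finished cleanly with items.

    Assumes `jobs` is ordered newest first. Returns None when no job
    in the list qualifies.
    """
    for job in jobs:
        finished_cleanly = job.get("close_reason") == "finished"  # assumed field
        no_errors = (job.get("errors_count") or 0) == 0           # assumed field
        has_items = (job.get("items_scraped") or 0) > 0           # assumed field
        if finished_cleanly and no_errors and has_items:
            return str(job["id"])
    return None
```

For this to have anything to fall back on, the first query would need a larger `count` than 1, for example `count=10`.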
