
One spider doesn't return anything on Scrapinghub, but works fine locally

I have a project with three spiders. On my local machine, all three work fine.

However, when I deploy the three Scrapy spiders to Scrapinghub, one of them always fails to return anything (the other two work fine).


Since all three spiders work fine locally and two of them still work on Scrapinghub, I'm fairly sure this is an issue on the Scrapinghub side (is that website blocking Scrapinghub?).

How can I debug this (file attached)?
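One way to start debugging is to compare the log of the failing cloud run with the log of a local run. If the cloud run shows 403/503 responses, retries, or robots.txt blocks where the local run shows 200s, that would suggest the site treats the cloud's requests differently. Below is a minimal sketch that tallies a few of Scrapy's standard DEBUG messages; it assumes Scrapy's default log message format, and `summarize_log` is a hypothetical helper, not part of Scrapy.

```python
import re
import sys
from collections import Counter

# Patterns for a few of Scrapy's standard log messages (assumes the default format).
CRAWLED = re.compile(r"Crawled \((\d{3})\) <")
RETRY = re.compile(r"(Gave up retrying|Retrying) <")
ROBOTS = "Forbidden by robots.txt"


def summarize_log(lines):
    """Tally response statuses, retries and robots.txt blocks in Scrapy log lines."""
    summary = Counter()
    for line in lines:
        crawled = CRAWLED.search(line)
        if crawled:
            summary[f"status {crawled.group(1)}"] += 1
        elif RETRY.search(line):
            summary["retry/give-up"] += 1
        elif ROBOTS in line:
            summary["blocked by robots.txt"] += 1
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_log.py spider.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for key, count in sorted(summarize_log(log).items()):
            print(f"{key}: {count}")
```

Running it once on the local log and once on the cloud log gives two short tallies that are easier to compare than the full logs.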


Best Answer

Hi Scraper,


Could you describe this issue in a bit more detail? In particular:


  • What kind of errors you are seeing (with details, if possible)
  • The domains you are trying to crawl
  • Your project ID, so support can check it
Best,

Pablo



Hi,


I'm having a similar issue.


My spider tries to access a site that has a robots.txt file, but it gets a timeout error, as if USER_AGENT weren't set. I have set it to:


USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'


It works fine locally (and was working in the cloud a month ago), but when I deploy it and run it in the cloud, I get a timeout error.


I've attached the log with details.


Thanks.
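When `ROBOTSTXT_OBEY` is enabled, Scrapy requests a site's robots.txt before fetching its pages. Separately from the timeout itself, it may be worth confirming what that file's rules say for the configured user agent. A minimal sketch using Python's standard `urllib.robotparser`, assuming you paste in the robots.txt text yourself; the helper name `allowed`, the sample rules and the URLs are hypothetical. Recent Scrapy versions use their own robots.txt parser by default, which may treat edge cases differently, so treat this only as a rough check.

```python
from urllib.robotparser import RobotFileParser

# The USER_AGENT value from the post above.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36"
)


def allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical robots.txt content; paste the real file's text here instead.
    sample = "User-agent: *\nDisallow: /private/\n"
    print(allowed(sample, "https://example.com/private/page"))  # False
    print(allowed(sample, "https://example.com/catalog"))  # True
```

Note that the standard-library parser matches a `User-agent:` line against the part of the user-agent string before the first `/` (here `Mozilla`), so a rule block addressed to `Mozilla` would apply to this browser-like string.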
