I have a project with 3 spiders. Locally on my machine, all 3 spiders work fine.
However, when I deploy the 3 Scrapy spiders on Scrapinghub, one of the spiders always fails to return anything (the other 2 spiders work fine).
Since all 3 spiders work fine locally, and 2 of them still work on Scrapinghub, I'm quite sure this is an issue on Scrapinghub (is that website blocking Scrapinghub?).
How can I debug this (file attached)?
Attachments (1)
logbestbuy.com8.txt
3.11 KB
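For anyone debugging the same symptom, here is a minimal diagnostic spider sketch; the spider name and URL are placeholders (not taken from this project), and the only goal is to surface response statuses and request errors in the cloud job log so you can see whether the site is refusing or timing out on Scrapy Cloud requests:

import scrapy


class DebugSpider(scrapy.Spider):
    # Hypothetical spider, only for comparing a local run against a Scrapy Cloud run.
    name = "debug_check"
    start_urls = ["https://www.example.com/"]  # placeholder; put the failing site's URL here

    custom_settings = {
        "LOG_LEVEL": "DEBUG",          # surface retry/timeout details in the job log
        "HTTPERROR_ALLOW_ALL": True,   # let non-200 responses reach parse() instead of being dropped
    }

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, callback=self.parse, errback=self.on_error, dont_filter=True)

    def parse(self, response):
        # A 403/503 status or a CAPTCHA page here usually means the cloud IPs are being blocked.
        self.logger.info("Got %s from %s (%d bytes)", response.status, response.url, len(response.body))

    def on_error(self, failure):
        # Timeouts and connection errors land here instead of parse().
        self.logger.error("Request failed: %r", failure)

If the same request succeeds locally but logs a block page, error status, or timeout on the cloud, that points at the target site treating the cloud IPs differently rather than at the spider code.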
vaz posted over 7 years ago Best Answer
Hi Scraper,
Could you describe a bit more about this issue?
mindlessbrain posted about 7 years ago
Hi,
I'm having a similar issue.
My spider tries to access a link that has a robots.txt file but gets a timeout error, as if USER_AGENT weren't set. I have set it to:
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
It works fine locally (and was working in the cloud a month ago), but when I deploy and run it in the cloud, I get a timeout error.
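For reference, a minimal settings.py sketch along those lines; the USER_AGENT string is the one above, while ROBOTSTXT_OBEY and DOWNLOAD_TIMEOUT are illustrative values meant to help separate a robots.txt fetch timeout from actual blocking, not a confirmed fix:

# settings.py -- rough sketch; only the USER_AGENT string comes from the post above,
# the other values are illustrative.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/55.0.2883.95 Safari/537.36"
)

# With this enabled, Scrapy fetches robots.txt before the first page request;
# if that fetch itself times out on the cloud, the whole crawl stalls.
ROBOTSTXT_OBEY = True

# Per-request timeout in seconds; raising it helps tell "slow" apart from "blocked".
DOWNLOAD_TIMEOUT = 60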
I've attached the log with details.
Thanks.
Attachments (1)
logfarnell26.txt
7.34 KB