I have a project with 3 spiders. Locally on my machine, all 3 spiders work fine.
However, when I deploy the 3 Scrapy spiders on Scrapinghub, one of the spiders always fails to return anything (the other 2 spiders work fine).
Since all 3 spiders work fine locally, and 2 of them still work on Scrapinghub, I'm quite sure this is an issue on Scrapinghub (is that website blocking Scrapinghub?).
How can I debug this (file attached)?
Attachments (1)
logbestbuy.com8.txt
3.11 KB
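For anyone debugging the same symptom, here is a minimal diagnostic spider sketch; the spider name and URL are placeholders (not taken from this project), and the only goal is to surface response statuses and request errors in the cloud job log so you can see whether the site is refusing or timing out on Scrapy Cloud requests:

import scrapy


class DebugSpider(scrapy.Spider):
    # Hypothetical spider, only for comparing a local run against a Scrapy Cloud run.
    name = "debug_check"
    start_urls = ["https://www.example.com/"]  # placeholder; put the failing site's URL here

    custom_settings = {
        "LOG_LEVEL": "DEBUG",          # surface retry/timeout details in the job log
        "HTTPERROR_ALLOW_ALL": True,   # let non-200 responses reach parse() instead of being dropped
    }

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, callback=self.parse, errback=self.on_error, dont_filter=True)

    def parse(self, response):
        # A 403/503 status or a CAPTCHA page here usually means the cloud IPs are being blocked.
        self.logger.info("Got %s from %s (%d bytes)", response.status, response.url, len(response.body))

    def on_error(self, failure):
        # Timeouts and connection errors land here instead of parse().
        self.logger.error("Request failed: %r", failure)

If the same request succeeds locally but logs a block page, error status, or timeout on the cloud, that points at the target site treating the cloud IPs differently rather than at the spider code.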
vaz posted over 7 years ago Best Answer
Hi Scraper,
Could you describe a bit more about this issue?
mindlessbrain posted about 7 years ago
Hi,
I'm having a similar issue.
My spider tries to access a link that has a robots.txt file but gets a timeout error, as if USER_AGENT weren't set. I have set it to:
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
It works fine locally (and was working in the cloud a month ago), but when I deploy and run it in the cloud, I get a timeout error.
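For reference, a minimal settings.py sketch along those lines; the USER_AGENT string is the one above, while ROBOTSTXT_OBEY and DOWNLOAD_TIMEOUT are illustrative values meant to help separate a robots.txt fetch timeout from actual blocking, not a confirmed fix:

# settings.py -- rough sketch; only the USER_AGENT string comes from the post above,
# the other values are illustrative.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/55.0.2883.95 Safari/537.36"
)

# With this enabled, Scrapy fetches robots.txt before the first page request;
# if that fetch itself times out on the cloud, the whole crawl stalls.
ROBOTSTXT_OBEY = True

# Per-request timeout in seconds; raising it helps tell "slow" apart from "blocked".
DOWNLOAD_TIMEOUT = 60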
I've attached the log with details.
Thanks.
Attachments (1)
logfarnell26.txt
7.34 KB