One spider doesn't return anything on Scrapinghub, but works fine locally
imrans
started a topic
about 7 years ago
I have a project with 3 spiders. Locally on my machine, all 3 spiders work fine.
However, when I deploy the 3 Scrapy spiders on Scrapinghub, one of the spiders always fails to return anything (the other 2 work fine).
Since all 3 spiders work fine locally, and 2 of them still work on Scrapinghub, I'm quite sure this is an issue specific to Scrapinghub (is the target website blocking Scrapinghub?).
How can I debug this (file attached)?
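Since the question is how to debug this, here is a framework-free sketch (the `probe` helper and its parameters are my own illustration, not from the posts) that fetches a URL with or without a browser User-Agent, so the responses can be compared between a local run and a cloud job:

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def probe(url, user_agent=None, timeout=30):
    """Fetch `url` with an optional User-Agent; return (status, body_length)."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    req = Request(url, headers=headers)
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status, len(resp.read())
    except HTTPError as err:
        # The server answered but refused us (e.g. a 403 from a UA block).
        return err.code, 0
    except URLError:
        # No answer at all: DNS failure, refused connection, or timeout.
        return None, 0
```

Running this both locally and from the cloud, with and without a browser user agent, quickly shows whether the target host treats the cloud's IP range or the default user agent differently.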
vaz
Hi Scraper,
Could you describe a bit more about this issue?
mindlessbrain
Hi,
I'm having a similar issue.
My spider tries to access a link governed by a robots.txt file, but it gets a timeout error as if USER_AGENT weren't set. I have set it to:
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
It works fine locally (and was working in the cloud a month ago), but when I deploy and run it in the cloud, I get a timeout error.
I've attached the log with details.
Thanks.
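For reference, a sketch of the project-wide `settings.py` values under discussion. The user-agent string is the one quoted above; `DOWNLOAD_TIMEOUT` and `ROBOTSTXT_OBEY` are assumptions added to illustrate the timeout and robots.txt angles, not values from the poster's project:

```python
# settings.py (sketch) — worth double-checking in the deployed project,
# since Scrapy Cloud runs the settings that were deployed, not local ones.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/55.0.2883.95 Safari/537.36"
)

# Assumed addition: a longer timeout helps tell a slow response apart
# from an outright block when requests come from the cloud's IP range.
DOWNLOAD_TIMEOUT = 60

# Assumed addition: Scrapy's default of honouring robots.txt, which the
# poster mentions the target site has.
ROBOTSTXT_OBEY = True
```

Comparing the deployed settings (visible in the job's log header on Scrapy Cloud) against the local ones is a quick way to confirm the user agent actually made it into the cloud run.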