I have 6 spiders that I have set up locally and they all work perfectly. But when I upload them to run on Scrapinghub, only 4 work. It looks like all the requests return 403 responses. The first request, to robots.txt, also yields a 403 error.
I've set up:

custom_settings = {
    'ROBOTSTXT_OBEY': False,
    'USER_AGENT': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
}
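For context on how a setting like ROBOTSTXT_OBEY can end up overridden even when a spider sets it: Scrapy resolves every setting by priority (defaults < project settings.py < spider custom_settings < command line), and a value injected at a higher priority silently wins. Below is a minimal sketch of that resolution model; the priority names and numbers mirror Scrapy's SETTINGS_PRIORITIES, but the Settings class here is a simplified stand-in, and the platform-level override scenario is an assumption, not something confirmed by this thread:

```python
# Sketch of Scrapy-style settings resolution: the highest priority wins.
# Priority names/values mirror Scrapy's SETTINGS_PRIORITIES.
SETTINGS_PRIORITIES = {'default': 0, 'command': 10, 'project': 20,
                       'spider': 30, 'cmdline': 40}

class Settings:
    """Simplified stand-in for scrapy.settings.Settings."""
    def __init__(self):
        self._values = {}  # name -> (value, priority)

    def set(self, name, value, priority='project'):
        prio = SETTINGS_PRIORITIES[priority]
        current = self._values.get(name)
        # A new value only sticks if its priority is >= the stored one.
        if current is None or prio >= current[1]:
            self._values[name] = (value, prio)

    def get(self, name):
        return self._values[name][0]

settings = Settings()
settings.set('ROBOTSTXT_OBEY', True, 'default')   # Scrapy's built-in default
settings.set('ROBOTSTXT_OBEY', False, 'spider')   # your custom_settings
settings.set('ROBOTSTXT_OBEY', True, 'cmdline')   # hypothetical platform-injected value
print(settings.get('ROBOTSTXT_OBEY'))             # True: custom_settings is silently lost
```

This is why checking the job log (as suggested in the answer below the fold of this thread) matters: the log shows the *effective* settings after all priorities are merged.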
Best Answer
nestor said, over 5 years ago:
Check log line 10 of your job: ROBOTSTXT_OBEY is being overridden, and the 403 is most likely the website blocking requests from the IP. You might need a proxy service like Crawlera.
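If the blocking is indeed IP-based, routing requests through Crawlera is typically done with the scrapy-crawlera downloader middleware. A sketch of the settings involved, assuming the scrapy-crawlera package is installed (`pip install scrapy-crawlera`); the API key is a placeholder, and the middleware order value is the one commonly shown in its docs:

```python
# settings.py (or per-spider custom_settings) - configuration sketch,
# assuming the scrapy-crawlera package is installed.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_crawlera.CrawleraMiddleware': 610,
}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = '<your-crawlera-api-key>'  # placeholder, not a real key
```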
Nikhil
Here's a log from Scrapinghub:
109aa2f7fa3f609e74579c88b593f1fdbbbc7837  GET  2043 bytes  403  2018-12-12 17:57:19 UTC  https://www.@#$%^&*.com/robots.txt
49dddb7196bff0efdfc43250ce81610150696e3f  GET  1914 bytes  403  2018-12-12 17:57:28 UTC  https://www.@#$%^&*(.com/#$%^&*(*&^%$#$%^&*().html