Hi, I was trying to deploy the Splash example you provide (https://learn.scrapinghub.com/scrapy/) on Scrapinghub, but when I run the spider there, it returns the error
[scrapy.downloadermiddlewares.robotstxt] Error downloading <GET http://192.168.99.100:8050/robots.txt>: TCP connection timed out: 110: Connection timed out.
again and again, until the process dies with the error
[scrapy.core.scraper] Error downloading <GET http://quotes.toscrape.com/js via http://192.168.99.100:8050/render.html>: TCP connection timed out: 110: Connection timed out.
The code of the spider is:
import scrapy
from scrapy_splash import SplashRequest


class QuotesJSSpider(scrapy.Spider):
    name = 'quotesjs'

    # all these settings can be put in your project's settings.py file
    custom_settings = {
        'SPLASH_URL': 'http://localhost:8050',
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_splash.SplashCookiesMiddleware': 723,
            'scrapy_splash.SplashMiddleware': 725,
            'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
        },
        'SPIDER_MIDDLEWARES': {
            'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
        },
        'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
    }

    def start_requests(self):
        yield SplashRequest(
            url='http://quotes.toscrape.com/js',
            callback=self.parse,
        )

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                'text': quote.css("span.text::text").extract_first(),
                'author': quote.css("small.author::text").extract_first(),
                'tags': quote.css("div.tags > a.tag::text").extract(),
            }
Should I change the SPLASH_URL variable, or is there some other action I should take to use scrapy_splash on Scrapinghub?
Thanks a lot.
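Both errors above are TCP timeouts against port 8050, so one way to narrow this down is to check whether the Splash address is reachable at all from the machine that runs the spider. A minimal sketch using only the Python standard library; the helper name `splash_reachable` is hypothetical and not part of Scrapy or scrapy_splash, and it only tests that a TCP connection opens, not that Splash itself responds:

```python
import socket
from urllib.parse import urlparse


def splash_reachable(splash_url, timeout=5.0):
    """Return True if a TCP connection to the host/port in splash_url opens.

    Hypothetical helper for illustration; it does not talk to Splash itself.
    """
    parsed = urlparse(splash_url)
    host = parsed.hostname
    if host is None:
        # The URL has no host part, so there is nothing to connect to.
        return False
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name-resolution failures.
        return False
```

Calling `splash_reachable('http://192.168.99.100:8050')` from the environment where the spider runs should show whether the address in the log can be reached from there. Note that 192.168.99.100 is a private-network address, so a job running on a remote host would generally not be able to connect to it.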
Did you find the solution?
rodrigohumanitec