
Issue deploying spider using scrapy_splash

Hi, I was trying to deploy the Splash example you provide (https://learn.scrapinghub.com/scrapy/) on Scrapinghub, but when I run the spider there, it returns the error:

 


[scrapy.downloadermiddlewares.robotstxt] Error downloading <GET http://192.168.99.100:8050/robots.txt>: TCP connection timed out: 110: Connection timed out.


 

again and again, until the process dies with the error:


[scrapy.core.scraper] Error downloading <GET http://quotes.toscrape.com/js via http://192.168.99.100:8050/render.html>: TCP connection timed out: 110: Connection timed out.

 

The code of the spider is:

 

import scrapy
from scrapy_splash import SplashRequest


class QuotesJSSpider(scrapy.Spider):
    name = 'quotesjs'

    # all these settings can be put in your project's settings.py file
    custom_settings = {
        'SPLASH_URL': 'http://localhost:8050',
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_splash.SplashCookiesMiddleware': 723,
            'scrapy_splash.SplashMiddleware': 725,
            'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
        },
        'SPIDER_MIDDLEWARES': {
            'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
        },
        'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
    }

    def start_requests(self):
        yield SplashRequest(
            url='http://quotes.toscrape.com/js',
            callback=self.parse,
        )

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                'text': quote.css("span.text::text").extract_first(),
                'author': quote.css("small.author::text").extract_first(),
                'tags': quote.css("div.tags > a.tag::text").extract(),
            }

 

Should I change the SPLASH_URL setting, or is there some other step I need to take to use scrapy_splash on Scrapinghub?
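
For reference, both Splash addresses in this post (localhost in the settings, 192.168.99.100 in the logs) look like local-only addresses, which a job running on a remote platform presumably cannot reach. Below is a minimal sketch for checking that, using only the Python standard library. The helper name is_local_only is hypothetical and is not part of Scrapy or scrapy_splash, and the sketch does not resolve DNS names.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_only(url):
    """Return True if the URL's host is localhost or a private/loopback IP.

    Hypothetical helper for illustration; not part of Scrapy or scrapy_splash.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # The host is a DNS name, not an IP literal; this sketch does not resolve it.
        return False
    return ip.is_private or ip.is_loopback


# The two Splash addresses that appear in this post.
for url in ("http://localhost:8050", "http://192.168.99.100:8050"):
    print(url, "local-only" if is_local_only(url) else "possibly reachable")
```

If the check reports local-only, the SPLASH_URL would likely need to point at a Splash instance that is reachable from wherever the spider actually runs.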
 

 

Thanks a lot.


1 Comment

Did you find the solution?
