Issue deploying spider using scrapy_splash

Posted about 6 years ago by rodrigohumanitec


Unanswered

rodrigohumanitec

Hi, I was trying to deploy your Splash example (https://learn.scrapinghub.com/scrapy/) to Scrapinghub, but when I run the spider there, it returns the error

[scrapy.downloadermiddlewares.robotstxt] Error downloading <GET http://192.168.99.100:8050/robots.txt>: TCP connection timed out: 110: Connection timed out.

again and again, until the process dies with the error

[scrapy.core.scraper] Error downloading <GET http://quotes.toscrape.com/js via http://192.168.99.100:8050/render.html>: TCP connection timed out: 110: Connection timed out.
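Both errors are TCP timeouts against 192.168.99.100:8050. That is a private LAN address (commonly the docker-machine / Docker Toolbox VM address), so it presumably exists only on the developer's own network and cannot be reached from Scrapinghub's servers. For the same reason, `localhost` on Scrapy Cloud would refer to the job's own environment, not your computer. The sketch below flags such SPLASH_URL values before deploying. It uses only the standard library, and `looks_local` is a hypothetical helper name.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_local(splash_url):
    """Return True if splash_url points at this machine or a private LAN address.

    Hypothetical pre-deploy check: such addresses are not reachable from a
    remote host like Scrapy Cloud.
    """
    host = urlsplit(splash_url).hostname or ""  # hostname is lower-cased, port stripped
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: the string alone can't tell us where it resolves, so don't flag it.
        return False
    return ip.is_loopback or ip.is_private or ip.is_link_local
```

For example, `looks_local("http://192.168.99.100:8050")` and `looks_local("http://localhost:8050")` both return `True`, so neither is a usable SPLASH_URL for a deployed spider.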

The spider code is:

    import scrapy
    from scrapy_splash import SplashRequest


    class QuotesJSSpider(scrapy.Spider):
        name = 'quotesjs'

        # all these settings can be put in your project's settings.py file
        custom_settings = {
            'SPLASH_URL': 'http://localhost:8050',
            'DOWNLOADER_MIDDLEWARES': {
                'scrapy_splash.SplashCookiesMiddleware': 723,
                'scrapy_splash.SplashMiddleware': 725,
                'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
            },
            'SPIDER_MIDDLEWARES': {
                'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
            },
            'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
        }

        def start_requests(self):
            yield SplashRequest(
                url='http://quotes.toscrape.com/js',
                callback=self.parse,
            )

        def parse(self, response):
            for quote in response.css("div.quote"):
                yield {
                    'text': quote.css("span.text::text").extract_first(),
                    'author': quote.css("small.author::text").extract_first(),
                    'tags': quote.css("div.tags > a.tag::text").extract(),
                }
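One detail to keep in mind: in Scrapy, `custom_settings` has spider-level priority, which is higher than the project's settings.py. While SPLASH_URL is hard-coded in the spider as above, editing it in settings.py alone would therefore have no effect. The minimal sketch below builds the same settings dict but reads SPLASH_URL from an environment variable. The variable name `SPLASH_URL` in the environment and the helper `splash_custom_settings` are hypothetical, not a Scrapy or scrapy_splash convention. How you supply that variable in a deployment is up to you.

```python
import os

# Hypothetical environment variable; not defined by Scrapy or scrapy_splash.
SPLASH_URL_ENV = "SPLASH_URL"
# Default for local development only: not reachable from a remote deployment.
DEFAULT_SPLASH_URL = "http://localhost:8050"


def splash_custom_settings(environ=os.environ):
    """Build the spider's custom_settings, taking SPLASH_URL from the environment."""
    return {
        "SPLASH_URL": environ.get(SPLASH_URL_ENV, DEFAULT_SPLASH_URL),
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_splash.SplashCookiesMiddleware": 723,
            "scrapy_splash.SplashMiddleware": 725,
            "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
        },
        "SPIDER_MIDDLEWARES": {
            "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
        },
        "DUPEFILTER_CLASS": "scrapy_splash.SplashAwareDupeFilter",
    }
```

Inside the spider class this would be used as `custom_settings = splash_custom_settings()`. The same code then runs locally against `localhost` and, with the variable set, against a publicly reachable Splash instance.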

Should I change the SPLASH_URL setting, or is there some other step I need to take to use scrapy_splash on Scrapinghub?
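Whatever SPLASH_URL ends up being, it has to be reachable from wherever the spider actually runs. A rough way to confirm this is to query Splash's `/_ping` endpoint, which as far as I know returns JSON containing `"status": "ok"` when the server is up. The sketch below is a hypothetical helper (`splash_is_up`) using only the standard library. Its result is only meaningful when it is run from the same environment as the spider.

```python
import json
import urllib.request


def splash_is_up(splash_url, timeout=5.0):
    """Return True if GET <splash_url>/_ping answers with {"status": "ok"}."""
    ping_url = splash_url.rstrip("/") + "/_ping"
    try:
        with urllib.request.urlopen(ping_url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, DNS failure, HTTP error, or a timeout like the
        # "TCP connection timed out" errors above; ValueError covers a non-JSON body.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"
```

If this returns `False` for the configured URL, the SplashRequest calls will fail the same way the log shows, regardless of the spider code.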

Thanks a lot.


1 Comment

JumbikTank1337 posted 11 months ago

Did you find the solution?
