Posted over 5 years ago by rodrigohumanitec
Hi, I was trying to deploy the Splash example you provide (https://learn.scrapinghub.com/scrapy/) using Scrapinghub, but when I run the spider on Scrapinghub it returns this error:
[scrapy.downloadermiddlewares.robotstxt] Error downloading <GET http://192.168.99.100:8050/robots.txt>: TCP connection timed out: 110: Connection timed out.
again and again, until the process dies with the error:
[scrapy.core.scraper] Error downloading <GET http://quotes.toscrape.com/js via http://192.168.99.100:8050/render.html>: TCP connection timed out: 110: Connection timed out.
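Both errors are TCP timeouts against 192.168.99.100:8050. That address lies in the private 192.168.0.0/16 range (it is typically the default Docker Toolbox / docker-machine VM address), so one likely explanation is that a job running on Scrapinghub's servers simply cannot reach it. The sketch below uses only the Python standard library to flag a SPLASH_URL whose host is `localhost` or a private, loopback or link-local IP. The function name `splash_host_is_local` is hypothetical and not part of scrapy-splash.

```python
import ipaddress
from urllib.parse import urlsplit


def splash_host_is_local(splash_url: str) -> bool:
    """Return True if splash_url points at localhost or a private/loopback/
    link-local IP address, i.e. a host a cloud-hosted crawl job is unlikely
    to be able to reach. (Hypothetical helper, not part of scrapy-splash.)"""
    host = urlsplit(splash_url).hostname  # hostname is already lowercased
    if host is None:
        raise ValueError(f"no host in {splash_url!r}")
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name other than "localhost": we can't tell without resolving it.
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local
```

With this check, the address from the log above, `http://192.168.99.100:8050`, and the `http://localhost:8050` value in the spider's settings would both be flagged.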
The spider code is:
import scrapy
from scrapy_splash import SplashRequest


class QuotesJSSpider(scrapy.Spider):
    name = 'quotesjs'
    # all these settings can be put in your project's settings.py file
    custom_settings = {
        'SPLASH_URL': 'http://localhost:8050',
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_splash.SplashCookiesMiddleware': 723,
            'scrapy_splash.SplashMiddleware': 725,
            'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
        },
        'SPIDER_MIDDLEWARES': {
            'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
        },
        'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
    }

    def start_requests(self):
        yield SplashRequest(
            url='http://quotes.toscrape.com/js',
            callback=self.parse,
        )

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                'text': quote.css("span.text::text").extract_first(),
                'author': quote.css("small.author::text").extract_first(),
                'tags': quote.css("div.tags > a.tag::text").extract(),
            }
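As the comment in the spider says, these settings can live in the project's settings.py instead. Here is a minimal sketch of that, assuming you want the Splash address to differ between a local run and a cloud run: `SPLASH_URL` is read from an environment variable of the same name (a hypothetical convention, not something scrapy-splash or Scrapinghub reads on its own) and falls back to the local Docker address.

```python
# settings.py: the spider's custom_settings moved into project settings.
import os

# Hypothetical convention: take the Splash address from an environment
# variable named SPLASH_URL, falling back to a local Docker Splash.
SPLASH_URL = os.environ.get("SPLASH_URL", "http://localhost:8050")

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

If you take this approach, remove `custom_settings` from the spider, because spider-level settings take precedence over settings.py. For a one-off run, a value passed on the command line overrides both, e.g. `scrapy crawl quotesjs -s SPLASH_URL=http://splash.example.com:8050`. Here the host is only a placeholder for a Splash instance that is reachable from wherever the crawl runs.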
Should I change the SPLASH_URL setting, or is there some other step I need to take to use scrapy_splash on Scrapinghub?
Thanks a lot.
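Whatever SPLASH_URL ends up pointing at, it can help to confirm that Splash actually answers from where the crawl runs before you start the spider. This is a minimal standard-library sketch that assumes Splash's `_ping` endpoint, which returns a JSON body with `"status": "ok"` when the service is up. `ping_url` and `splash_alive` are hypothetical helper names.

```python
import http.client
import json
import urllib.request


def ping_url(splash_url: str) -> str:
    """Build the URL of Splash's _ping endpoint from a SPLASH_URL value."""
    return splash_url.rstrip("/") + "/_ping"


def splash_alive(splash_url: str, timeout: float = 5.0) -> bool:
    """Return True if the Splash instance at splash_url answers _ping with
    status "ok". A short timeout makes an unreachable host fail fast
    instead of hanging the way the crawl's TCP connections did."""
    try:
        with urllib.request.urlopen(ping_url(splash_url), timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError/HTTPError, refused connections and timeouts;
        # ValueError covers malformed URLs and non-JSON bodies;
        # HTTPException covers malformed HTTP responses.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"
```

For example, `splash_alive("http://192.168.99.100:8050", timeout=3)` run from a machine that cannot reach that address should return `False` after about three seconds, instead of retrying the way the crawl did.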
JumbikTank1337 posted 4 months ago
Did you find the solution?