How to connect Scrapy/Splash to Crawlera

Posted almost 5 years ago by Simone Gabbriellini

Answered

Apart from setting:

SPLASH_URL = SCRAPINGHUB_SPLASH_URL
SPLASH_APIKEY = SCRAPINGHUB_SPLASH_KEY
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = SCRAPINGHUB_CRAWLERA_KEY


and then writing a scrapy.Spider with:

from scrapy_splash import SplashRequest

# inside the spider class
def start_requests(self):
    yield SplashRequest(
        url=self.start_urls[0],
        endpoint='execute',
        callback=self.parse,
        args={
            'lua_source': self.script,
            'crawlera_user': self.settings['CRAWLERA_APIKEY'],
            'timeout': 3600,
        },
        cache_args=['lua_source'],
    )


what exactly is needed to connect to Crawlera and scrape through it?

I am kind of lost at this point.




nestor posted almost 5 years ago Admin Best Answer

CRAWLERA_ENABLED should not be set to True, because that activates the scrapy-crawlera middleware and changes the order of requests to Scrapy > Crawlera > Splash > Website instead of the intended Scrapy > Splash > Crawlera > Website.

Other than that, the rest looks fine. You can refer to this article for more information:  https://support.scrapinghub.com/support/solutions/articles/22000188428-using-crawlera-with-splash-scrapy 
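
For illustration, here is a rough sketch of how the pieces could fit together once the middleware is switched off. It is not taken from the linked article: the SETTINGS dict, the splash_request_kwargs helper, and the Lua script body are hypothetical, and the proxy host and port inside the script are assumptions that should be checked against your own account. The idea is that Scrapy only talks to Splash, and the Lua script tells Splash to route each outgoing request through Crawlera.

```python
# Sketch only: settings values are the placeholders from the question.
SETTINGS = {
    "SPLASH_URL": "SCRAPINGHUB_SPLASH_URL",
    "SPLASH_APIKEY": "SCRAPINGHUB_SPLASH_KEY",
    # Off, so scrapy-crawlera does not sit between Scrapy and Splash.
    "CRAWLERA_ENABLED": False,
    # Still needed: the key is handed to the Lua script as 'crawlera_user'.
    "CRAWLERA_APIKEY": "SCRAPINGHUB_CRAWLERA_KEY",
}

# Lua script for Splash's 'execute' endpoint (hypothetical, minimal).
LUA_SOURCE = """
function main(splash)
    -- Assumed proxy endpoint; replace with the values for your account.
    local host = "proxy.crawlera.com"
    local port = 8010
    local user = splash.args.crawlera_user

    -- Send every request Splash makes through the proxy.
    splash:on_request(function(request)
        request:set_proxy{host, port, username=user, password=""}
    end)

    assert(splash:go(splash.args.url))
    return splash:html()
end
"""


def splash_request_kwargs(url, settings, callback=None):
    """Build keyword arguments suitable for scrapy_splash.SplashRequest."""
    return {
        "url": url,
        "endpoint": "execute",
        "callback": callback,
        "args": {
            "lua_source": LUA_SOURCE,
            "crawlera_user": settings["CRAWLERA_APIKEY"],
            "timeout": 3600,
        },
        # Avoid resending the unchanged script with every request.
        "cache_args": ["lua_source"],
    }
```

Inside the spider, start_requests could then yield SplashRequest(**splash_request_kwargs(self.start_urls[0], self.settings, self.parse)).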
