How to connect Scrapy/Splash to Crawlera

Posted over 5 years ago by Simone Gabbriellini


Answered

Simone Gabbriellini

Apart from setting:

    SPLASH_URL = SCRAPINGHUB_SPLASH_URL
    SPLASH_APIKEY = SCRAPINGHUB_SPLASH_KEY
    CRAWLERA_ENABLED = True
    CRAWLERA_APIKEY = SCRAPINGHUB_CRAWLERA_KEY

and then making a scrapy.Spider with:

    def start_requests(self):
        yield SplashRequest(
            url=self.start_urls[0],
            endpoint='execute',
            callback=self.parse,
            args={
                'lua_source': self.script,
                'crawlera_user': self.settings['CRAWLERA_APIKEY'],
                'timeout': 3600,
            },
            cache_args=['lua_source'],
        )

What exactly is needed to connect to Crawlera and start scraping?

I am kind of lost at this point...

0 Votes

nestor posted over 5 years ago Admin Best Answer

CRAWLERA_ENABLED should not be set to True, as that activates the scrapy-crawlera middleware and changes the order of requests to Scrapy > Crawlera > Splash > Website instead of the intended Scrapy > Splash > Crawlera > Website.

Other than that, the rest looks fine. You can refer to this article for more information: https://support.scrapinghub.com/support/solutions/articles/22000188428-using-crawlera-with-splash-scrapy
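For illustration, here is a minimal sketch of the setup described above, with the middleware left off and the Crawlera key passed to Splash as a script argument. Several things here are assumptions and not taken from this thread: the helper name build_splash_kwargs is hypothetical, the Lua script is only one possible way to route Splash traffic through a proxy (via splash:on_request and request:set_proxy), and the proxy host and port (proxy.crawlera.com, 8010) are the values commonly documented for Crawlera at the time, so check them against your account.

```python
# settings.py fragment: keep the scrapy-crawlera middleware disabled so the
# request order stays Scrapy > Splash > Crawlera > Website.
CRAWLERA_ENABLED = False
CRAWLERA_APIKEY = "<your Crawlera API key>"  # placeholder, not a real key

# Lua script for Splash's 'execute' endpoint (illustrative, not from the thread).
# It sends every request made by Splash through the proxy given in splash.args.
LUA_SOURCE = """
function main(splash)
    splash:on_request(function(request)
        request:set_proxy{
            host = splash.args.crawlera_host,
            port = splash.args.crawlera_port,
            username = splash.args.crawlera_user,
            password = '',
        }
    end)
    assert(splash:go(splash.args.url))
    return splash:html()
end
"""


def build_splash_kwargs(url, callback, lua_source, crawlera_apikey, timeout=60):
    """Hypothetical helper: build keyword arguments for scrapy_splash.SplashRequest."""
    return {
        "url": url,
        "endpoint": "execute",
        "callback": callback,
        "args": {
            "lua_source": lua_source,
            # The API key is used as the proxy username inside the Lua script.
            "crawlera_user": crawlera_apikey,
            # Assumed proxy endpoint; adjust to what your account documents.
            "crawlera_host": "proxy.crawlera.com",
            "crawlera_port": 8010,
            "timeout": timeout,
        },
        # Avoid re-sending the (unchanging) script body with every request.
        "cache_args": ["lua_source"],
    }
```

Inside the spider's start_requests, this would be used as yield SplashRequest(**build_splash_kwargs(self.start_urls[0], self.parse, LUA_SOURCE, self.settings['CRAWLERA_APIKEY'])), with SplashRequest imported from scrapy_splash.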

0 Votes
