Answered

How to connect Scrapy/Splash to Crawlera

Apart from setting:

SPLASH_URL = SCRAPINGHUB_SPLASH_URL
SPLASH_APIKEY = SCRAPINGHUB_SPLASH_KEY
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = SCRAPINGHUB_CRAWLERA_KEY


and then making a scrapy.Spider with


def start_requests(self):
    # Scrapy calls start_requests() (plural) to get the initial requests.
    yield SplashRequest(
        url=self.start_urls[0],
        endpoint='execute',
        callback=self.parse,
        args={
            'lua_source': self.script,
            'crawlera_user': self.settings['CRAWLERA_APIKEY'],
            'timeout': 3600,
        },
        cache_args=['lua_source'],
    )


what exactly is needed to connect to Crawlera and scrape pages?

I am kind of lost at this point...




Best Answer

CRAWLERA_ENABLED should not be set to True, because that activates the scrapy-crawlera middleware and changes the order of requests to Scrapy > Crawlera > Splash > Website instead of the intended Scrapy > Splash > Crawlera > Website.

Other than that, the rest looks fine. You can refer to this article for more information: https://support.scrapinghub.com/support/solutions/articles/22000188428-using-crawlera-with-splash-scrapy
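
For illustration, here is a minimal sketch of how the pieces could fit together so that Splash, not Scrapy, talks to Crawlera. It is not taken from the linked article. The placeholder values, the helper name build_splash_args and the constant CRAWLERA_LUA are hypothetical, and the proxy host and port (proxy.crawlera.com:8010) are an assumption that should be checked against the article. The setting names are the ones used in the question.

```python
# settings.py (sketch; the values are placeholders, not real credentials)
SPLASH_URL = "https://your-splash-instance.example"  # hypothetical URL
SPLASH_APIKEY = "your-splash-api-key"                # name as used in the question
CRAWLERA_APIKEY = "your-crawlera-api-key"
CRAWLERA_ENABLED = False  # keep the scrapy-crawlera middleware switched off

# Lua script run by Splash's 'execute' endpoint. It sets Crawlera as the
# proxy for every request Splash makes, which gives the order
# Scrapy > Splash > Crawlera > Website.
CRAWLERA_LUA = """
function main(splash)
    local host = "proxy.crawlera.com"  -- assumed Crawlera endpoint
    local port = 8010                  -- assumed Crawlera port
    local user = splash.args.crawlera_user

    splash:on_request(function(request)
        -- The API key is the proxy username; the password stays empty.
        request:set_proxy{host, port, username = user, password = ""}
    end)

    assert(splash:go(splash.args.url))
    return splash:html()
end
"""


def build_splash_args(api_key, lua_source=CRAWLERA_LUA, timeout=3600):
    """Build the `args` dict for SplashRequest(endpoint='execute', ...)."""
    return {
        "lua_source": lua_source,
        "crawlera_user": api_key,  # read in Lua as splash.args.crawlera_user
        "timeout": timeout,
    }
```

In the spider from the question, this would replace the inline dict: args=build_splash_args(self.settings['CRAWLERA_APIKEY']), with self.script no longer needed if CRAWLERA_LUA is used instead.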
