Using Scrapy with Splash

Modified on Wed, 10 Mar, 2021 at 4:56 AM

The recommended way to integrate Scrapy and Splash is to use the scrapy-splash library. When using it, there are two ways to authenticate to your Splash instance.
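Both methods below assume scrapy-splash is already enabled in your project. Here is a minimal sketch of the settings.py entries, following the setup described in the scrapy-splash README. Check the README of the version you install, since it is the authoritative source. <SPLASH_URL> is a placeholder for the URL of your Splash instance.

```python
# settings.py: minimal scrapy-splash setup, as described in the scrapy-splash README.
# Placeholder: replace with the URL of your Splash instance.
SPLASH_URL = "<SPLASH_URL>"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    # Moved to priority 810, as the scrapy-splash README recommends
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Make duplicate filtering and the HTTP cache aware of Splash arguments
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```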



1. Using HttpAuthMiddleware

If every request from your spider goes through Splash, you can use Scrapy's HttpAuthMiddleware (enabled by default) to authenticate all of them. Simply add the following attribute to your spider class:


http_user = '<APIKEY>'


Here, <APIKEY> is your Splash API key (see "Where are my credentials?" below). You don't need to set http_pass, because the password is left empty.


Check out an example spider



2. Using splash_headers

If only some of your requests should go through Splash, you can send the authorization header manually with the splash_headers argument of SplashRequest. For example:


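from scrapy_splash import SplashRequest
from w3lib.http import basic_auth_header

...

# Inside a spider method, such as start_requests() or a parse callback:
yield SplashRequest(
    'http://target.website.com/',
    # splash_headers are sent to Splash only, not to the target website
    splash_headers={'Authorization': basic_auth_header('<APIKEY>', '')},
)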


Note that you have to build the HTTP Basic authorization header yourself, with your API key as the username and an empty password, which is what basic_auth_header('<APIKEY>', '') does.
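To see what that header contains, here is a standard-library sketch of the HTTP Basic scheme that basic_auth_header implements. The credentials "user:password" (here "<APIKEY>:", since the password is empty) are Base64-encoded and prefixed with "Basic ". The helper name basic_auth_value is hypothetical; in a real spider, keep using basic_auth_header.

```python
import base64


def basic_auth_value(api_key: str) -> bytes:
    """Build an HTTP Basic Authorization value: API key as user, empty password."""
    # "user:password" with an empty password, i.e. "<APIKEY>:"
    credentials = f"{api_key}:".encode("latin-1")
    return b"Basic " + base64.b64encode(credentials)


header = basic_auth_value("<APIKEY>")  # placeholder key
print(header)
```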


Check out an example spider



Where are my credentials?

You can find the API key (the username) and the URL of your Splash instance on your organization's Splash > Setup page, as shown below:


If you haven't signed up for Splash yet, have a look at this article on how to do it.
