CrawlSpider and Splash

Hi there,

I wrote a normal spider using Splash and your great samples on GitHub, but I couldn't get a CrawlSpider to work with Splash.

Could someone post a sample showing how to use Splash with the CrawlSpider class?

Alternatively, I wrote a normal spider that does a similar job to a CrawlSpider, but the handy LinkExtractor rules are missing. I replaced them with a custom-built link extractor, but I still lack a separate rule to parse only specific links:

# Working manual alternative to CrawlSpider

from scrapy.spiders import Spider
from scrapy_splash import SplashRequest
from w3lib.http import basic_auth_header
from CrawlSpiderSplashTest.items import CrawlspidersplashtestItem

from scrapy.http import Request
import re

class MySpider(Spider):
    name = 'reccrawler'
    allowed_domains = [""]
    start_urls = [""]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url, self.parse,  # assumed arguments: URL to render and callback
                splash_headers={
                    'Authorization': basic_auth_header(self.settings['APIKEY'], ''),
                },
            )

    def parse(self, response):
        links = response.xpath('//a/@href').extract()

        # Links already followed from this page
        crawledLinks = set()

        # Pattern a link must match to be followed
        linkPattern = re.compile(".*/js/.*")

        for link in links:
            # Follow the link only if it matches the pattern and has not been seen yet
            if linkPattern.match(link) and link not in crawledLinks:
                crawledLinks.add(link)
                yield SplashRequest(
                    response.urljoin(link), self.parse,  # assumed arguments
                    splash_headers={
                        'Authorization': basic_auth_header(self.settings['APIKEY'], ''),
                    },
                )

        for quote in response.css('div.quote'):
            item = CrawlspidersplashtestItem()
            item["text"] = quote.css('span.text::text').extract_first()
            yield item
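
For the missing "only follow specific links" rule, one option might be to pull the filtering into a small helper. The sketch below is plain Python with no Scrapy dependency. `select_links` and its parameters are hypothetical names, not part of Scrapy or scrapy-splash, and the `/js/` pattern and example.com URLs are only placeholders.

```python
import re
from urllib.parse import urljoin


def select_links(hrefs, base_url, allow, seen):
    """Return absolute URLs from `hrefs` that match `allow` and are not in `seen`.

    `seen` is a set that is updated in place, so links returned by an
    earlier call are skipped by later calls.
    """
    pattern = re.compile(allow)
    selected = []
    for href in hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        if pattern.search(url) and url not in seen:
            seen.add(url)
            selected.append(url)
    return selected


if __name__ == "__main__":
    seen = set()
    hrefs = ["/js/page/2/", "/login", "/js/page/2/", "http://example.com/js/"]
    print(select_links(hrefs, "http://example.com/", r"/js/", seen))
```

Inside `parse`, the helper could be fed with `response.xpath('//a/@href').extract()` and `response.url`, with a `SplashRequest` yielded for each returned URL. Keeping `seen` as an attribute on the spider would make the check apply across pages rather than per page.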


Thanks in advance


I would also appreciate this: a CrawlSpider (with rules) using the Splash and Crawlera combination. I couldn't get this to work. An example would help a lot.

Looking forward to it.
