Could someone post an example of how to use Splash with the CrawlSpider class?
Alternatively, I wrote a normal Spider that does a similar job to a CrawlSpider, but the handy LinkExtractor rules are missing. I replaced them with a custom-built link extractor, but I still lack a separate rule to parse only specific links:
# Working manual equivalent of a CrawlSpider
import re

from scrapy.spiders import Spider
from scrapy_splash import SplashRequest
from w3lib.http import basic_auth_header

from CrawlSpiderSplashTest.items import CrawlspidersplashtestItem


class MySpider(Spider):
    name = 'reccrawler'
    allowed_domains = [""]
    start_urls = [""]

    # Pattern a link must match to be followed
    linkPattern = re.compile(".*/js/.*")

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Already crawled links, kept on the spider so they persist across parse() calls
        self.crawledLinks = set()

    def start_requests(self):
        for url in self.start_urls:
            # Assumption: the API key is sent to Splash via splash_headers
            yield SplashRequest(
                url,
                self.parse,
                splash_headers={
                    'Authorization': basic_auth_header(self.settings['APIKEY'], ''),
                },
            )

    def parse(self, response):
        links = response.xpath('//a/@href').extract()
        for link in links:
            url = response.urljoin(link)  # resolve relative hrefs
            # If it is a proper link and has not been seen yet, yield it to the spider
            if self.linkPattern.match(url) and url not in self.crawledLinks:
                self.crawledLinks.add(url)
                yield SplashRequest(
                    url,
                    self.parse,
                    splash_headers={
                        'Authorization': basic_auth_header(self.settings['APIKEY'], ''),
                    },
                )

        for quote in response.css('div.quote'):
            item = CrawlspidersplashtestItem()
            item["text"] = quote.css('span.text::text').extract_first()
            yield item
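For the missing "separate rule" part, here is a minimal sketch in plain Python of how the link filtering could be split into rules, loosely modelled on what CrawlSpider rules do. It is not Scrapy's API: `LinkRule`, `classify_links`, the `RULES` list, and the URL patterns are all hypothetical names and values made up for illustration. Each rule says whether a matching page should only be followed or also parsed for items, and the first matching rule wins.

```python
import re
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass(frozen=True)
class LinkRule:
    """Hypothetical stand-in for a crawl rule: a URL pattern plus two flags."""
    pattern: str
    follow: bool = True   # keep extracting links from the matching page
    parse: bool = False   # also hand the matching page to the item callback

    def matches(self, url: str) -> bool:
        return re.search(self.pattern, url) is not None


def classify_links(base_url, hrefs, rules, seen):
    """Return (url, rule) pairs for links not seen before; first matching rule wins."""
    selected = []
    for href in hrefs:
        url = urljoin(base_url, href)  # resolve relative hrefs
        if url in seen:
            continue
        for rule in rules:
            if rule.matches(url):
                seen.add(url)
                selected.append((url, rule))
                break
    return selected


# Example rules; the paths are invented for illustration.
RULES = [
    LinkRule(r"/js/page/\d+/", follow=True, parse=True),
    LinkRule(r"/js/", follow=True, parse=False),
]
```

Inside `parse`, each returned pair could then be turned into a `SplashRequest`, choosing the callback based on `rule.parse`, with `seen` being a set stored on the spider. This only covers the link selection, so the Splash and Crawlera wiring stays as in the spider above.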
Alessandro Eren posted over 3 years ago
looking forward
Nickolas Verdegem posted almost 6 years ago
I would also appreciate this: a CrawlSpider (with rules) using the Splash and Crawlera combination. I couldn't get this working. An example would help a lot.
Sebastian Pachl posted over 7 years ago
