Creating a Scrapy spider

Modified on Wed, 3 Feb, 2021 at 6:43 AM

Here we will show you how to create your first Scrapy spider. We strongly recommend you also read the Scrapy tutorial for a more in-depth guide.

This guide assumes you already have Scrapy installed. If you do not, please refer to the Scrapy installation guide first.

For this example, we will build a spider to scrape famous quotes from this website: http://quotes.toscrape.com/

We begin by creating a Scrapy project, which we will call quotes_crawler:

$ scrapy startproject quotes_crawler

Then we change into the new project directory and create a spider for quotes.toscrape.com:

$ cd quotes_crawler
$ scrapy genspider quotes-toscrape quotes.toscrape.com

Created spider 'quotes-toscrape' using template 'basic' in module:
quotes_crawler.spiders.quotes_toscrape

Then we edit the spider. This command opens the spider file in the editor defined by the EDITOR environment variable:

$ scrapy edit quotes-toscrape

Here is the code:

import scrapy


class QuotesToScrapeSpider(scrapy.Spider):
    name = "quotes-toscrape"
    allowed_domains = ["quotes.toscrape.com"]
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        # Each quote on the page sits in its own <div class="quote">.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text ::text").extract_first(),
                "author": quote.css("small.author ::text").extract_first(),
                "tags": quote.css("div.tags > a.tag ::text").extract(),
            }
        # Follow the "next" pagination link, if there is one.
        # parse() is the default callback for the new request.
        next_page_url = response.css("nav > ul > li.next > a ::attr(href)").extract_first()
        if next_page_url:
            yield scrapy.Request(response.urljoin(next_page_url))

For more information about Scrapy, please refer to the Scrapy documentation.