This guide shows how to create your first Scrapy spider. For a more in-depth walkthrough, we strongly recommend that you also read the Scrapy tutorial.
This guide assumes that Scrapy is already installed. If it is not, please refer to the Scrapy installation guide.
For this example, we will build a spider to scrape famous quotes from this website: http://quotes.toscrape.com/
We begin by creating a Scrapy project, which we will call quotes_crawler:
$ scrapy startproject quotes_crawler
Then, from inside the new quotes_crawler project directory, we create a spider for quotes.toscrape.com:
$ scrapy genspider quotes-toscrape quotes.toscrape.com
Created spider 'quotes-toscrape' using template 'basic' in module:
  quotes_crawler.spiders.quotes_toscrape
Then we edit the spider:
$ scrapy edit quotes-toscrape
Here is the code:
import scrapy


class QuotesToScrapeSpider(scrapy.Spider):
    name = "quotes-toscrape"
    allowed_domains = ["quotes.toscrape.com"]
    start_urls = ['http://quotes.toscrape.com/', ]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                'text': quote.css("span.text ::text").extract_first(),
                'author': quote.css("small.author ::text").extract_first(),
                'tags': quote.css("div.tags > a.tag ::text").extract()
            }

        next_page_url = response.css("nav > ul > li.next > a ::attr(href)").extract_first()
        if next_page_url:
            yield scrapy.Request(response.urljoin(next_page_url))
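The last two lines of the spider handle pagination: the href of the "next" link may be relative, so it is resolved against the URL of the current page before a new request is made. The sketch below illustrates that resolution step using only the standard library's urllib.parse.urljoin, which response.urljoin is understood to build on. The helper name next_page_absolute and the sample href are assumptions made for illustration, not values captured from the site.

```python
from urllib.parse import urljoin


def next_page_absolute(page_url, href):
    """Resolve a possibly relative href against the URL of the page it came from.

    Hypothetical helper, for illustration only; in the spider this step
    is performed by response.urljoin(next_page_url).
    """
    return urljoin(page_url, href)


# Assumed example values: the href is a guess at what a "next" link
# might contain, not real output from quotes.toscrape.com.
current_page = "http://quotes.toscrape.com/"
relative_href = "/page/2/"

print(next_page_absolute(current_page, relative_href))
```

Because the resulting URL is absolute, it can be passed straight to scrapy.Request. When no callback is given, as in the spider above, Scrapy sends the response to the spider's parse method again, so the same extraction logic runs on every page until no "next" link is found.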
For more information about Scrapy please refer to the Scrapy documentation.