What are the differences between running a spider locally and on Scrapy Cloud?

Modified on Wed, 18 Oct, 2023 at 12:46 PM

Here are a few things that work differently on Scrapy Cloud compared to a default Scrapy configuration:

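- The AutoThrottle extension is enabled, so websites are crawled politely (it is disabled by default in Scrapy).
- JOBDIR is set, so the scheduler's request queue is persisted to disk instead of being kept in memory, which saves memory.
- LOG_LEVEL is set to INFO (the Scrapy default is DEBUG).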
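If you want a local run to behave more like Scrapy Cloud, you can set the same three settings yourself. The snippet below is a minimal sketch of a project's settings.py, assuming a standard Scrapy project layout. The directory name "crawls/myspider-1" is a hypothetical placeholder, and any other values Scrapy Cloud may use (for example, AutoThrottle tuning) are not covered here.

```python
# settings.py: a sketch that approximates the listed Scrapy Cloud differences
# in a local Scrapy project. Only these three settings are reproduced.

# Enable the AutoThrottle extension (disabled by default in Scrapy), which
# adjusts the download delay based on how quickly the server responds.
AUTOTHROTTLE_ENABLED = True

# Persist the scheduler's request queue to disk instead of keeping it in memory.
# "crawls/myspider-1" is a hypothetical path: use a separate directory for each
# spider and each job, because a job directory must not be shared between them.
JOBDIR = "crawls/myspider-1"

# Log INFO and above, instead of Scrapy's default of "DEBUG".
LOG_LEVEL = "INFO"
```

Because a job directory is tied to a single spider run, it is often more convenient to pass JOBDIR on the command line for each run instead of hardcoding it, e.g. `scrapy crawl myspider -s JOBDIR=crawls/myspider-1`.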

Note that Scrapy Cloud servers are located in Germany. This may be an obstacle when targeting websites that restrict access based on geolocation. In such cases, using a proxy service is recommended, e.g. Zyte Smart Proxy Manager (formerly Crawlera).
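How you connect to a proxy depends on the service, so check its own documentation for the endpoint and credentials. As a generic illustration, Scrapy's built-in HttpProxyMiddleware (enabled by default at order 750) routes a request through the proxy given in `request.meta["proxy"]`. The sketch below assumes a hypothetical custom downloader middleware, `FixedProxyMiddleware`, and a hypothetical custom setting, `PROXY_URL`. Neither is part of Scrapy or of any Zyte product.

```python
# middlewares.py: a hypothetical downloader middleware that sends every
# request through one proxy by setting request.meta["proxy"].


class FixedProxyMiddleware:
    """Set a default proxy on each request (hypothetical, for illustration).

    Scrapy's built-in HttpProxyMiddleware reads meta["proxy"], so this
    middleware must be ordered before it (i.e. a number lower than 750).
    """

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_URL is a hypothetical custom setting, not a built-in Scrapy one.
        return cls(crawler.settings.get("PROXY_URL"))

    def process_request(self, request, spider):
        # Do not override a proxy that a spider already set on this request.
        if self.proxy_url:
            request.meta.setdefault("proxy", self.proxy_url)
        # Returning None lets Scrapy continue processing the request normally.
        return None


# Example settings.py entries (placeholder values):
# PROXY_URL = "http://user:password@proxy.example.com:8080"
# DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.FixedProxyMiddleware": 350}
```

Using `setdefault` keeps the middleware from overriding a proxy that a spider deliberately sets on an individual request.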