Scrapy Cloud Advanced Topics

Here you'll find articles on advanced settings and features of Scrapy Cloud.

Deploying Python Dependencies for Your Projects in Scrapy Cloud
The environment where your spiders run on Scrapy Cloud comes with a set of pre-installed packages. However, sometimes you'll need some extra packages that m...
Thu, 11 Feb, 2021 at 6:34 PM
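A minimal sketch of how extra dependencies are usually supplied, assuming the project is deployed with shub and a requirements file (the project ID and package versions below are placeholders):

    # scrapinghub.yml
    projects:
      default: 12345
    requirements:
      file: requirements.txt

    # requirements.txt (pin versions for reproducible deploys)
    beautifulsoup4==4.9.3
    boto3==1.17.0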
Deploying Python 3 spiders to Scrapy Cloud
Since March 2020 all new projects on Scrapy Cloud will use a Scrapy Stack that runs on top of Python 3 by default. If your existing project is still runn...
Wed, 3 Feb, 2021 at 7:19 AM
Deploying non-code files
You need to declare the files in the package_data section of your setup.py file. For example, if your Scrapy project has the following structure: ...
Wed, 3 Feb, 2021 at 7:20 AM
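For illustration, a setup.py that ships a data file alongside the spider code might look like this (the package name and file paths are hypothetical):

    from setuptools import setup, find_packages

    setup(
        name='project',
        version='1.0',
        packages=find_packages(),
        # include non-code files from the package directory in the deployed egg
        package_data={'myproject': ['resources/*.json']},
        entry_points={'scrapy': ['settings = myproject.settings']},
    )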
Changing the Deploy Environment With Scrapy Cloud Stacks
You can select the runtime environment for your spiders from a list of pre-defined stacks. Each stack is a runtime environment containing certain versions o...
Fri, 12 Feb, 2021 at 1:54 AM
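Stacks are selected in scrapinghub.yml when deploying with shub; a sketch, where the project ID and stack version are examples (check the list of available stacks for currently supported values):

    # scrapinghub.yml
    projects:
      default: 12345
    stacks:
      default: scrapy:2.11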
Running custom Python scripts
In addition to Scrapy spiders, you can also run custom, standalone Python scripts on Scrapy Cloud. They need to be declared in the scripts section of your p...
Wed, 25 Oct, 2023 at 10:41 AM
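As a hedged example, a standalone script is typically placed under a bin/ directory in the project and declared in setup.py like this (the script name and package name are illustrative):

    from setuptools import setup, find_packages

    setup(
        name='project',
        version='1.0',
        packages=find_packages(),
        # standalone scripts become runnable "py:" jobs on Scrapy Cloud
        scripts=['bin/update_feeds.py'],
        entry_points={'scrapy': ['settings = myproject.settings']},
    )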
Exporting scraped items to an AWS/S3 account (UI mode)
To store your items in an S3 bucket on AWS, you need to enable certain Scrapy settings in Scrapy Cloud. First, go to your Spider sett...
Wed, 3 Feb, 2021 at 11:38 AM
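For reference, the settings added in the UI typically look like the following (the bucket name and credentials are placeholders, and botocore must be available in the runtime):

    FEED_URI               s3://my-bucket/%(name)s/%(time)s.json
    FEED_FORMAT            json
    AWS_ACCESS_KEY_ID      <your access key>
    AWS_SECRET_ACCESS_KEY  <your secret key>

On recent Scrapy versions the FEEDS setting can be used instead of FEED_URI/FEED_FORMAT.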
Sharing data between spiders
If you need to provide data to a spider within a given project, you can use the API or the python-scrapinghub library to store the data in collections. ...
Mon, 28 Nov, 2022 at 3:22 PM
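A short sketch with python-scrapinghub, assuming an API key, a project ID of 12345 and a collection named 'shared_config' (all illustrative):

    from scrapinghub import ScrapinghubClient

    client = ScrapinghubClient('YOUR_API_KEY')
    project = client.get_project(12345)
    store = project.collections.get_store('shared_config')

    # one spider stores a value under a key...
    store.set({'_key': 'last_run', 'value': '2021-02-03'})

    # ...and another spider reads it back
    last_run = store.get('last_run')['value']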
Scrapy Cloud API
The Scrapy Cloud API (often also referred to as the Zyte API) is an HTTP API that you can use to control your spiders and consume the scraped data, among other ...
Wed, 3 Feb, 2021 at 7:34 AM
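As a quick illustration, scheduling a spider through the HTTP API with Python's requests library might look like this (the project ID and spider name are placeholders; see the API reference for the exact endpoints):

    import requests

    resp = requests.post(
        'https://app.scrapinghub.com/api/run.json',
        auth=('YOUR_API_KEY', ''),   # API key as the username, empty password
        data={'project': 12345, 'spider': 'myspider'},
    )
    print(resp.json())               # e.g. {'status': 'ok', 'jobid': '12345/1/56'}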
Fetching latest spider data
To get the scraped items, you can use the Items API. In some cases, it's convenient to have a static URL that points to the last job, in a specific for...
Wed, 3 Feb, 2021 at 7:34 AM
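Alternatively, a sketch with python-scrapinghub that fetches the items of the most recently finished job (the project ID and spider name are illustrative):

    from scrapinghub import ScrapinghubClient

    client = ScrapinghubClient('YOUR_API_KEY')
    project = client.get_project(12345)

    # take the key of the most recently finished job for this spider
    last = next(project.jobs.iter(spider='myspider', state='finished', count=1))

    for item in client.get_job(last['key']).items.iter():
        print(item)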
Understanding Job Outcomes
The job outcome indicates whether the job succeeded or failed. By default, it contains the value of the spider close reason from Scrapy. It’s available in t...
Fri, 12 Feb, 2021 at 2:00 AM
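For completeness, a small sketch reading a job's outcome (the spider close reason) with python-scrapinghub (the job key is a placeholder):

    from scrapinghub import ScrapinghubClient

    client = ScrapinghubClient('YOUR_API_KEY')
    job = client.get_job('12345/1/56')
    print(job.metadata.get('close_reason'))   # e.g. 'finished' or 'cancelled'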