Scrapy Cloud Advanced Topics
Here you'll find articles on advanced settings and features of Scrapy Cloud.
The environment where your spiders run on Scrapy Cloud comes with a set of pre-installed packages. However, sometimes you'll need extra packages that m...
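For example, a minimal sketch of wiring a requirements file into a deploy via scrapinghub.yml (the project ID and the pinned package below are placeholders):

```yaml
# scrapinghub.yml -- lives at the root of your project
projects:
  default: 12345          # hypothetical project ID
requirements:
  file: requirements.txt  # extra packages installed on top of the stack
```

The referenced requirements.txt should pin exact versions (e.g. `beautifulsoup4==4.9.3`) so deploys stay reproducible.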
Since March 2020, all new projects on Scrapy Cloud use a Scrapy stack that runs on top of Python 3 by default. If your existing project is still runn...
You need to declare the files in the package_data section of your setup.py file. For example, if your Scrapy project has the following structure: ...
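As a sketch, assuming a project package named myproject with data files under myproject/resources/, the setup.py could declare them like this:

```python
# setup.py
from setuptools import setup, find_packages

setup(
    name='myproject',        # hypothetical package name
    version='1.0',
    packages=find_packages(),
    # non-Python files to bundle into the deployed egg
    package_data={'myproject': ['resources/*.json']},
    # standard Scrapy Cloud entry point pointing at the project settings
    entry_points={'scrapy': ['settings = myproject.settings']},
)
```

At runtime the bundled files can then be read with, for example, `pkgutil.get_data('myproject', 'resources/cities.json')` rather than a filesystem path.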
You can select the runtime environment for your spiders from a list of pre-defined stacks. Each stack is a runtime environment containing certain versions o...
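A hedged sketch of pinning one of those stacks in scrapinghub.yml (both the project ID and the stack tag are illustrative; pick a tag from the published stack list):

```yaml
# scrapinghub.yml
projects:
  default: 12345        # hypothetical project ID
stacks:
  default: scrapy:2.4   # illustrative tag; use one from the available stacks
```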
In addition to Scrapy spiders, you can also run custom, standalone Python scripts on Scrapy Cloud. They need to be declared in the scripts section of your p...
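As a minimal sketch, assuming a script at bin/check_prices.py in your repository, the declaration in setup.py would look like:

```python
# setup.py
from setuptools import setup, find_packages

setup(
    name='myproject',
    version='1.0',
    packages=find_packages(),
    entry_points={'scrapy': ['settings = myproject.settings']},
    # standalone scripts deployed alongside the spiders; once deployed,
    # they can be scheduled and run from Scrapy Cloud like spiders
    scripts=['bin/check_prices.py'],   # hypothetical script path
)
```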
To store your items in an Amazon S3 bucket, you need to enable certain Scrapy settings in Scrapy Cloud. First, go to your Spider sett...
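The settings involved are Scrapy's standard feed-export settings; here is a sketch with placeholder credentials and bucket name (on Scrapy Cloud you would typically enter these in the spider settings UI rather than commit them to settings.py):

```python
# Placeholder AWS credentials -- never commit real keys
AWS_ACCESS_KEY_ID = 'AKIA...'
AWS_SECRET_ACCESS_KEY = '...'

# Scrapy 2.1+ FEEDS syntax; %(name)s and %(time)s expand per job
FEEDS = {
    's3://my-bucket/%(name)s/%(time)s.jl': {'format': 'jsonlines'},
}
```

Note that Scrapy's S3 feed storage needs botocore (or boto3) available in the runtime, e.g. through the requirements file shown above.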
If you need to provide data to a spider within a given project, you can use the API or the python-scrapinghub library to store the data in collections. ...
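A minimal sketch with python-scrapinghub (the API key and project ID are placeholders):

```python
from scrapinghub import ScrapinghubClient

client = ScrapinghubClient('<YOUR_API_KEY>')   # placeholder API key
project = client.get_project(12345)            # hypothetical project ID

# Write input data into a named collection store...
store = project.collections.get_store('input_urls')
store.set({'_key': 'item-1', 'value': 'https://example.com/page'})

# ...and read it back later, e.g. from a spider in the same project.
record = store.get('item-1')
print(record['value'])
```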
The Scrapy Cloud API (often also referred to as the Zyte API) is an HTTP API that you can use to control your spiders and consume the scraped data, among other ...
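A hedged sketch of scheduling a spider run over HTTP (the project ID and spider name are placeholders; the API key is passed as the HTTP Basic username, and the host may be app.scrapinghub.com on older setups):

```python
import requests

API_KEY = '<YOUR_API_KEY>'   # placeholder; taken from your account settings

response = requests.post(
    'https://app.zyte.com/api/run.json',
    data={'project': 12345, 'spider': 'myspider'},
    auth=(API_KEY, ''),
)
print(response.json())       # e.g. {'status': 'ok', 'jobid': '12345/1/7'}
```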
To get the scraped items, you can use the Items API. In some cases, it's convenient to have a static URL that points to the last job, in a specific for...
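One way to do this is to resolve the latest finished job and stream its items with python-scrapinghub; a sketch with placeholder IDs:

```python
from scrapinghub import ScrapinghubClient

client = ScrapinghubClient('<YOUR_API_KEY>')            # placeholder key
project = client.get_project(12345)                     # hypothetical ID
spider = project.spiders.get('myspider')                # illustrative name

# Most recent finished job first, capped at one result
last = next(spider.jobs.iter(state='finished', count=1))

for item in client.get_job(last['key']).items.iter():
    print(item)
```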
The job outcome indicates whether the job succeeded or failed. By default, it contains the value of the spider close reason from Scrapy. It’s available in t...
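In a spider, a custom close reason can be set by raising CloseSpider; a minimal sketch (the spider name, URL, and status handling are illustrative):

```python
import scrapy
from scrapy.exceptions import CloseSpider

class MySpider(scrapy.Spider):
    name = 'myspider'                  # hypothetical spider
    start_urls = ['https://example.com']
    handle_httpstatus_list = [403]     # let 403 responses reach the callback

    def parse(self, response):
        if response.status == 403:
            # the reason string becomes the job outcome in Scrapy Cloud
            raise CloseSpider('banned')
        yield {'url': response.url}
```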