Scrapy Cloud Advanced Topics

Here you'll find articles on advanced settings and features of Scrapy Cloud.

Publishing and sharing datasets
Note: Portia is no longer available for new users. It has been disabled for all new organisations from August 20, 2018 onward. You’ve gone through the...
Wed, 3 Feb, 2021 at 11:39 AM
Deploying Custom Docker images on Scrapy Cloud
⚠ Note: this is an advanced feature in beta stage. Use with care. Scrapy Cloud runs your spiders in Docker containers and allows you to build custom images...
Wed, 3 Feb, 2021 at 7:41 AM
Errors while deploying Custom Image to Scrapy Cloud
There are some known issues when deploying custom Docker images to Scrapy Cloud. We are actively working on getting them resolved, but until it's co...
Wed, 3 Feb, 2021 at 7:43 AM
Inspecting your spider's runtime environment with the Job Console
With the job console you can open a Unix shell directly into the container where your job is running. Once in the console, you can perform tasks such as: ...
Wed, 3 Feb, 2021 at 7:44 AM
Configuring scraped fields
On the Job page you will find the Fields box, which is also available in the items browser (but hidden by default). It looks like this: The Fields ...
Thu, 11 Feb, 2021 at 10:44 PM
Versioning your deploys to Zyte Developer Tool Scrapy Cloud
Shub assigns a version number to your project every time you make a deploy to Zyte Developer Tool Scrapy Cloud. The version assigned depends on whether you...
Wed, 3 Feb, 2021 at 7:49 AM
Reset db using DeltaFetch Add-on
On some occasions you may experience errors using DeltaFetch due to its interactions with files in S3. Your output may show errors like this: DBRunReco...
Wed, 3 Feb, 2021 at 11:40 AM
Using a custom proxy in a Scrapy spider
Make use of Scrapy's standard HttpProxyMiddleware by specifying the proxy meta value and the authorization header in a Scrapy Request, for example: imp...
Wed, 3 Feb, 2021 at 7:53 AM
Incremental crawls with Scrapy and DeltaFetch in Scrapy Cloud
NOT TO BE CONFUSED WITH THE DELTAFETCH AND DOTSCRAPY PERSISTENCE ADDONS The purpose of this is to avoid requesting pages that have already scraped items...
Wed, 3 Feb, 2021 at 7:54 AM
Downloading and processing images
NOT TO BE CONFUSED WITH THE IMAGES ADDON Scrapy provides reusable item pipelines for downloading images attached to a particular item (for example, when...
Wed, 3 Feb, 2021 at 7:55 AM