Page Storage addon

Modified on Wed, 3 Feb, 2021 at 9:40 AM

If viewing the logs is not enough, the Page Storage Addon can help you inspect the responses Scrapy Cloud receives during a job's crawl.

1 - Go to https://app.zyte.com/p/<PROJECT_ID>/addons/page_storage, enable the addon, and configure its settings:

Page storage mode:

Cache: Items expire after a month
Versioned Cache: Multiple copies are retained, and each one expires after a month

2 - Stored pages are found as collections at https://app.zyte.com/p/<PROJECT_ID>/collections/.

3 - Each stored page can be downloaded as a JSON object or viewed in Dash. To check the HTML in a browser, save the contents of the body field to a new HTML file and open that file in any browser.

Fields available per stored page as JSON:

body: HTML code of the page
_encoding: 
cookies:
url: URL of the response
_jobid: job id where the response came from
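
Saving the body field to an HTML file, as described in step 3, can also be scripted. The following is a minimal sketch rather than an official tool. It assumes the downloaded file contains a single JSON object with the fields listed above and that body holds the page's HTML as plain text. The function name save_stored_page_as_html and the file names are hypothetical.

```python
import json
from pathlib import Path


def save_stored_page_as_html(json_path, html_path):
    """Write the `body` field of a downloaded stored page to an HTML file.

    Assumption: the JSON file holds one object whose `body` field is the
    page's HTML as plain text. Returns the page's `url` field, if present.
    """
    page = json.loads(Path(json_path).read_text(encoding="utf-8"))
    Path(html_path).write_text(page["body"], encoding="utf-8")
    return page.get("url")


if __name__ == "__main__":
    # Hypothetical file names: adjust to wherever the JSON was downloaded.
    source = Path("stored_page.json")
    if source.exists():
        url = save_stored_page_as_html(source, "stored_page.html")
        print(f"Saved HTML for {url} to stored_page.html")
```

The resulting stored_page.html can then be opened in any browser. If the download turns out to contain several pages, or the body is stored in another form, the sketch would need to be adapted accordingly.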