This addon keeps the contents of the .scrapy directory in a persistent store, which is loaded when the spider starts and saved when the spider finishes. It allows spiders to share data between different runs, keeping state or any other kind of data that needs to be persisted.
The .scrapy directory is well known in Scrapy, and a few extensions use it to keep state between runs. The canonical way to work with the .scrapy directory is by calling the scrapy.utils.project.data_path function, as illustrated in the following example:
```python
import ast
import os

import scrapy
from scrapy.utils.project import data_path

filename = 'data.txt'
mydata_path = data_path(filename)
# in a local project mydata_path will be /<SCRAPY_PROJECT>/.scrapy/data.txt
# on Scrapy Cloud mydata_path will be /Zyte/.scrapy/data.txt

# use mydata_path to store or read data which will be persisted among runs;
# for instance, inside a spider callback:
if os.path.exists(mydata_path) and os.path.getsize(mydata_path) > 0:
    with open(mydata_path, 'r') as f:
        canned_cookie_jar = f.read()
    cookies_to_send = ast.literal_eval(canned_cookie_jar)
    yield scrapy.Request(
        url='<SOME_URL>',
        callback=self.parse,
        cookies=cookies_to_send,
    )
```
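The example above only covers reading previously persisted cookies. The write side is the mirror image: serialize the cookies in a form that ast.literal_eval can parse back on the next run. The sketch below is stdlib-only and uses a temporary file in place of the data_path result (which requires a Scrapy project); the cookie values are illustrative:

```python
import ast
import os
import tempfile

# Sketch of the write/read round trip that complements the example above.
# In a real spider, mydata_path would come from
# scrapy.utils.project.data_path('data.txt'); a temp file stands in here.
mydata_path = os.path.join(tempfile.mkdtemp(), 'data.txt')

cookies = {'sessionid': 'abc123', 'csrftoken': 'xyz'}  # illustrative values

# write side (e.g. in the spider's closed() method): repr() emits a Python
# literal that ast.literal_eval can safely parse on the next run
with open(mydata_path, 'w') as f:
    f.write(repr(cookies))

# read side (next run): restore the dict without resorting to eval()
with open(mydata_path, 'r') as f:
    restored = ast.literal_eval(f.read())
```

Using repr() with ast.literal_eval keeps the round trip safe: unlike eval(), literal_eval refuses anything that is not a plain Python literal.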
Supported settings:
DOTSCRAPY_ENABLED
-- enables or disables the DotScrapy Persistence addon (either project-wide or per spider)
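As an illustration, the setting can be toggled in the project's settings.py; to change it for a single spider, Scrapy's standard custom_settings class attribute overrides the project-wide value (shown as a comment below, with a hypothetical spider name):

```python
# settings.py -- enable DotScrapy persistence project-wide
DOTSCRAPY_ENABLED = True

# To disable it for one spider only, override it on the spider class:
# class MySpider(scrapy.Spider):          # hypothetical spider
#     name = 'myspider'
#     custom_settings = {'DOTSCRAPY_ENABLED': False}
```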