This addon keeps the contents of the .scrapy directory in a persistent store, which is loaded when the spider starts and saved when the spider finishes. It allows spiders to share data between different runs, keeping state or any other kind of data that needs to be persisted.
The .scrapy directory is well known in Scrapy, and a few extensions use it to keep state between runs. The canonical way to work with the .scrapy directory is by calling the scrapy.utils.project.data_path function, as illustrated in the following example:
```python
import ast
import os

import scrapy
from scrapy.utils.project import data_path

filename = 'data.txt'
mydata_path = data_path(filename)
# in a local project mydata_path will be /<SCRAPY_PROJECT>/.scrapy/data.txt
# on Scrapy Cloud mydata_path will be /Zyte/.scrapy/data.txt

# use mydata_path to store or read data which will be persisted among runs;
# for instance, inside a spider callback:
if os.path.exists(mydata_path) and os.path.getsize(mydata_path) > 0:
    with open(mydata_path, 'r') as f:
        canned_cookie_jar = f.read()
    cookies_to_send = ast.literal_eval(canned_cookie_jar)
    yield scrapy.Request(
        url='<SOME_URL>',
        callback=self.parse,
        cookies=cookies_to_send,
    )
```
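The example above only covers reading previously persisted cookies. The write side is the mirror image: serialize the cookies in a form that ast.literal_eval can parse back on the next run. The sketch below is stdlib-only and uses a temporary file in place of the data_path result (which requires a Scrapy project); the cookie values are illustrative:

```python
import ast
import os
import tempfile

# Sketch of the write/read round trip that complements the example above.
# In a real spider, mydata_path would come from
# scrapy.utils.project.data_path('data.txt'); a temp file stands in here.
mydata_path = os.path.join(tempfile.mkdtemp(), 'data.txt')

cookies = {'sessionid': 'abc123', 'csrftoken': 'xyz'}  # illustrative values

# write side (e.g. in the spider's closed() method): repr() emits a Python
# literal that ast.literal_eval can safely parse on the next run
with open(mydata_path, 'w') as f:
    f.write(repr(cookies))

# read side (next run): restore the dict without resorting to eval()
with open(mydata_path, 'r') as f:
    restored = ast.literal_eval(f.read())
```

Using repr() with ast.literal_eval keeps the round trip safe: unlike eval(), literal_eval refuses anything that is not a plain Python literal.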
Supported settings:
DOTSCRAPY_ENABLED
-- enables or disables the DotScrapy Persistence addon (either project-wide or per spider)
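As an illustration, the setting can be toggled in the project's settings.py; to change it for a single spider, Scrapy's standard custom_settings class attribute overrides the project-wide value (shown as a comment below, with a hypothetical spider name):

```python
# settings.py -- enable DotScrapy persistence project-wide
DOTSCRAPY_ENABLED = True

# To disable it for one spider only, override it on the spider class:
# class MySpider(scrapy.Spider):          # hypothetical spider
#     name = 'myspider'
#     custom_settings = {'DOTSCRAPY_ENABLED': False}
```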