Sharing data between spiders


If you need to provide data to a spider within a given project, you can use the API or the python-scrapinghub library to store the data in collections.


You can use collections to store an arbitrary number of records which are indexed by a key. Projects often use them as a single location to write data from multiple jobs.


The example below shows how you can create a collection and add some data:


$ curl -u APIKEY: -X POST -d '{"_key": "first_name", "value": "John"}{"_key": "last_name", "value": "Doe"}' https://storage.zyte.com/collections/79855/s/form_filling


To retrieve the data, you would then simply do:


$ curl -u APIKEY: -X GET "https://storage.zyte.com/collections/79855/s/form_filling?key=first_name&key=last_name"
{"value":"John"}
{"value":"Doe"}


And finally, you can delete the data by sending a DELETE request:


$ curl -u APIKEY: -X DELETE "https://storage.zyte.com/collections/79855/s/form_filling"
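
If you would rather make the same HTTP calls from Python instead of curl, the sketch below mirrors the three requests above. It assumes the requests library (not part of the original examples); the collection ID, keys and API key placeholder are the same ones used above.


import requests

API_KEY = 'APIKEY'  # your API key, same placeholder as in the curl examples
BASE_URL = 'https://storage.zyte.com/collections/79855/s/form_filling'

# Store two records (mirrors the curl POST above)
requests.post(
    BASE_URL,
    auth=(API_KEY, ''),  # equivalent to curl's "-u APIKEY:"
    data='{"_key": "first_name", "value": "John"}{"_key": "last_name", "value": "Doe"}',
)

# Retrieve them by key (mirrors the curl GET above)
response = requests.get(
    BASE_URL,
    auth=(API_KEY, ''),
    params=[('key', 'first_name'), ('key', 'last_name')],
)
for line in response.text.splitlines():
    print(line)  # {"value":"John"} then {"value":"Doe"}

# Delete the data (mirrors the curl DELETE above)
requests.delete(BASE_URL, auth=(API_KEY, ''))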


Using python-scrapinghub programmatically


As mentioned before, the python-scrapinghub library can be used to handle these API calls programmatically. Here's a sample script that shows how to use the library from a simple Python script:


from scrapinghub import ScrapinghubClient

API_KEY = 'APIKEY'
PROJECT_ID = '12345'
COLLECTION = 'collection-name'

client = ScrapinghubClient(API_KEY)
project = client.get_project(PROJECT_ID)

# Get (or create) the store named COLLECTION within the project
collection = project.collections.get_store(COLLECTION)

# Write a record: each item needs a unique '_key' plus the data to store
collection.set({
  '_key': '002d050ee3ff6192dcbecc4e4b4457d7',
  'value': '1447221694537'
})

# Read a single record back by its key
collection.get('002d050ee3ff6192dcbecc4e4b4457d7')
# Returns {'value': '1447221694537'}

# Iterate over every record in the collection
collection.iter()
# Returns a generator object
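
Since the point of collections here is to share data between spiders, a spider running in the same project can read those records back at runtime. The sketch below is an illustration only: the spider name, target URL and form field names are hypothetical, but the python-scrapinghub calls are the same ones shown above, and the keys match the form_filling collection from the curl examples.


import scrapy
from scrapinghub import ScrapinghubClient

API_KEY = 'APIKEY'
PROJECT_ID = '12345'


class FormFillingSpider(scrapy.Spider):
    # Hypothetical spider that fills a form with values shared via a collection
    name = 'form_filling_example'

    def start_requests(self):
        client = ScrapinghubClient(API_KEY)
        collection = client.get_project(PROJECT_ID).collections.get_store('form_filling')

        # Read the values another job stored under these keys
        first_name = collection.get('first_name')['value']
        last_name = collection.get('last_name')['value']

        yield scrapy.FormRequest(
            'https://example.com/form',  # hypothetical target URL
            formdata={'first_name': first_name, 'last_name': last_name},
            callback=self.parse,
        )

    def parse(self, response):
        yield {'status': response.status}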


You can find more information about the library's full API in its documentation.
