If you need to provide data to a spider within a given project, you can use the API or the python-scrapinghub library to store the data in collections.
You can use collections to store an arbitrary number of records which are indexed by a key. Projects often use them as a single location to write data from multiple jobs.
The example below shows how you can create a collection and add some data:
$ curl -u APIKEY: -X POST -d '{"_key": "first_name", "value": "John"}{"_key": "last_name", "value": "Doe"}' https://storage.zyte.com/collections/79855/s/form_filling
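If you'd rather make the same call from Python without the client library, a plain HTTP request works too. The following is a minimal sketch using the third-party requests package; the API key, project ID (79855), and collection name (form_filling) are the same placeholders as in the curl example above.

import requests

API_KEY = "APIKEY"  # placeholder: your API key
URL = "https://storage.zyte.com/collections/79855/s/form_filling"

# The payload mirrors the curl example: concatenated JSON objects,
# one per record, each carrying a _key and a value.
payload = (
    '{"_key": "first_name", "value": "John"}'
    '{"_key": "last_name", "value": "Doe"}'
)

# The API key is sent as the basic-auth username with an empty
# password, just like `curl -u APIKEY:`.
response = requests.post(URL, data=payload, auth=(API_KEY, ""))
response.raise_for_status()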
To retrieve the data, you would then simply do:
$ curl -u APIKEY: -X GET "https://storage.zyte.com/collections/79855/s/form_filling?key=first_name&key=last_name"
{"value":"John"}
{"value":"Doe"}
And finally, you can delete the data by sending a DELETE request:
$ curl -u APIKEY: -X DELETE "https://storage.zyte.com/collections/79855/s/form_filling"
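And the same deletion from Python, under the same assumptions as the sketches above:

import requests

API_KEY = "APIKEY"  # placeholder: your API key
URL = "https://storage.zyte.com/collections/79855/s/form_filling"

# Sends the same DELETE request as the curl example above.
response = requests.delete(URL, auth=(API_KEY, ""))
response.raise_for_status()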
Using python-scrapinghub programmatically
As mentioned before, the python-scrapinghub library can be used to handle the API calls programmatically. Here's a sample script that shows how to use the library in a simple Python program:
from scrapinghub import ScrapinghubClient

API_KEY = 'APIKEY'
PROJECT_ID = '12345'
COLLECTION = 'collection-name'

client = ScrapinghubClient(API_KEY)
project = client.get_project(PROJECT_ID)
collection = project.collections.get_store(COLLECTION)

# Store a record under the given key
collection.set({
    '_key': '002d050ee3ff6192dcbecc4e4b4457d7',
    'value': '1447221694537'
})

collection.get('002d050ee3ff6192dcbecc4e4b4457d7')  # Returns {'value': '1447221694537'}

collection.iter()  # Returns a generator object
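Since iter() returns a generator, you'll typically loop over it rather than call it on its own. The short sketch below assumes the same collection object from the script above, that each yielded item is a dict carrying _key and value fields, and that the library's delete() method is available for removing a record by key:

# Iterate over every record in the collection
for item in collection.iter():
    print(item['_key'], item['value'])

# Remove a single record by its key
collection.delete('002d050ee3ff6192dcbecc4e4b4457d7')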
You can find more information about the library's full API in its documentation.