
Aggregate results from several spiders

I have four spiders that crawl different websites and collect the same kind of information (same Items) but in very different ways. 


Is it possible to get the results aggregated into one dataset? Even if I select all four spiders when adding a job, they are split into separate jobs, each with its own result set.


Best Answer

You can write the items from all of the spiders to a Collection, as described in https://support.scrapinghub.com/solution/articles/22000200420-sharing-data-between-spiders.
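A minimal sketch of such a shared item pipeline follows. It assumes the python-scrapinghub client API (`ScrapinghubClient`, `get_project`, `collections.get_store`, `store.set` with `_key`/`value` fields) and that a collection value may hold a JSON object. The settings `COLLECTION_APIKEY`, `COLLECTION_PROJECT_ID` and `COLLECTION_NAME`, the default name `all_items`, and the content-hash key scheme are all hypothetical choices for illustration, not Scrapy or Scrapy Cloud built-ins. Enabling the same pipeline in all four spiders sends every item to the same collection.

```python
import hashlib
import json


class CollectionWriterPipeline:
    """Write every scraped item to one shared collection.

    Hypothetical settings (not built into Scrapy): COLLECTION_APIKEY,
    COLLECTION_PROJECT_ID and COLLECTION_NAME.
    """

    def __init__(self, apikey, project_id, collection_name, store=None):
        self.apikey = apikey
        self.project_id = project_id
        self.collection_name = collection_name
        self.store = store  # can be injected, e.g. a fake store in tests

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        return cls(
            settings.get("COLLECTION_APIKEY"),
            settings.get("COLLECTION_PROJECT_ID"),
            settings.get("COLLECTION_NAME", "all_items"),
        )

    def open_spider(self, spider):
        if self.store is None:
            # Imported lazily: this is the import that fails if the
            # scrapinghub package is not installed where the job runs.
            from scrapinghub import ScrapinghubClient

            client = ScrapinghubClient(self.apikey)
            project = client.get_project(self.project_id)
            self.store = project.collections.get_store(self.collection_name)

    @staticmethod
    def make_key(spider_name, data):
        # Hypothetical key scheme: spider name plus a hash of the item's
        # content, so identical items from one spider overwrite each other.
        payload = json.dumps(data, sort_keys=True, default=str)
        digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
        return f"{spider_name}_{digest}"

    def process_item(self, item, spider):
        data = dict(item)  # works for both dicts and scrapy.Item objects
        self.store.set({"_key": self.make_key(spider.name, data), "value": data})
        return item
```

Each spider would then list the pipeline in `ITEM_PIPELINES`, so all four write to the same collection name.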


You can also export the data from a collection through the UI. Once a collection has been created, you will find the Collections option in the left sidebar.





Then navigate to the collection and click Export. You can export in JSON, JSON Lines, or XML format.
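If you would rather pull the combined dataset from a script than from the UI, a rough sketch is below. It assumes a python-scrapinghub collection object whose `iter()` yields records with `_key` and `value` fields; the function name `export_collection` and the output file name are hypothetical.

```python
import json


def export_collection(store, path):
    """Write the value of every record in a collection to a JSON Lines file.

    `store` is assumed to be a python-scrapinghub collection, e.g.
        from scrapinghub import ScrapinghubClient
        store = ScrapinghubClient("APIKEY").get_project(12345) \
            .collections.get_store("all_items")
    (API key, project id and collection name are placeholders.)
    Returns the number of records written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for record in store.iter():
            # Each record is assumed to look like {"_key": ..., "value": ...}.
            fh.write(json.dumps(record["value"], ensure_ascii=False) + "\n")
            count += 1
    return count
```

The result is a single JSON Lines file containing items from every spider that wrote to the collection.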




Regards,

Thriveni Patil





I don't want to run anything locally. Setting up a Collection is fine, but I'm running the jobs on Scrapinghub, so I want the data to be collected when they run there.


I created a pipeline that writes to the collection. It works when I run it locally, but not on Scrapinghub Cloud. There, the error is that the scrapinghub Python package can't be imported, even though I added it to requirements.txt. How can this be resolved?
