Aggregate results from several spiders

Posted almost 7 years ago by Tomas Walch

I have four spiders that crawl different websites and collect the same kind of information (same Items) but in very different ways. 


Is it possible to get the results aggregated as one dataset? Even if I select all four spiders when adding a job to run, they get split into different jobs with different result sets.


thriveni posted almost 7 years ago · Admin · Best Answer

You can add the data of the spiders to Collections as described in https://support.scrapinghub.com/solution/articles/22000200420-sharing-data-between-spiders.
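
For example, here is a minimal sketch of writing to a shared collection with the python-scrapinghub client; the API key, project ID, and store name below are placeholders:

    from scrapinghub import ScrapinghubClient

    # Placeholder credentials and project ID; substitute your own.
    client = ScrapinghubClient("YOUR_API_KEY")
    project = client.get_project(12345)

    # All four spiders write to the same named store, so their
    # items end up aggregated in one place.
    store = project.collections.get_store("aggregated_items")
    store.set({"_key": "site1-item-42", "value": {"title": "...", "price": 9.99}})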


You can also export the data from a collection through the UI. Once the collections are created, you will find the Collections option in the left sidebar.



[Screenshot: Collections option in the left sidebar]


Then navigate to the collection and click Export. You can export in JSON, JSON Lines, or XML format.


[Screenshot: Export dialog for a collection]
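
If you would rather pull the aggregated data programmatically than through the UI, a sketch along these lines should work with the same client (the store name is again a placeholder):

    from scrapinghub import ScrapinghubClient

    client = ScrapinghubClient("YOUR_API_KEY")
    store = client.get_project(12345).collections.get_store("aggregated_items")

    # Iterate over every item the spiders wrote to the store.
    for item in store.iter():
        print(item["_key"], item["value"])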


Regards,

Thriveni Patil





Comments

Tomas Walch posted almost 7 years ago

I don't want to run anything locally. Setting up a Collection is fine, but I'm running the jobs from Scrapinghub, so I want the data to be collected when they run there.


I created a pipeline that writes to the collection. This works when I run it locally, but not from within Scrapinghub cloud: the scrapinghub Python package can't be imported, even though I added it to requirements.txt. How can this be resolved?
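
A likely cause, for reference: Scrapy Cloud only installs the packages from requirements.txt when that file is declared in scrapinghub.yml, so the file alone is not picked up at deploy time. A minimal sketch of the expected configuration, with a placeholder project ID:

    # scrapinghub.yml
    projects:
      default: 12345
    requirements:
      file: requirements.txt

with requirements.txt listing the client package:

    # requirements.txt
    scrapinghub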

