I have four spiders that crawl different websites and collect the same kind of information (same Items) but in very different ways.
Is it possible to get the results aggregated as one dataset? Even if I select all four spiders when adding a job to run, they get split up into different jobs with different result sets.
thriveni (Admin) posted almost 7 years ago
Best Answer
You can add the data of the spiders to Collections, as described in https://support.scrapinghub.com/solution/articles/22000200420-sharing-data-between-spiders.
You can also export the data from a collection through the UI. Once the collections are created, you will find the Collections option in the left sidebar.
Then navigate to the collection and click Export. You can export in JSON, JSON Lines, or XML format.
Regards,
Thriveni Patil
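A minimal sketch of such a pipeline, assuming the python-scrapinghub client; the class name, the "aggregated_items" store name, the SH_APIKEY and SH_PROJECT_ID environment variables, and the use of item["url"] as the key are placeholder assumptions, not details from this thread:

    import os

    from scrapinghub import ScrapinghubClient


    class CollectionExportPipeline:
        """Write every item from every spider into one shared Collection."""

        def open_spider(self, spider):
            # Placeholder environment variables: supply your own API key
            # and numeric project id.
            client = ScrapinghubClient(os.environ["SH_APIKEY"])
            project = client.get_project(int(os.environ["SH_PROJECT_ID"]))
            # All four spiders write to the same store, so the results
            # end up aggregated in one dataset.
            self.store = project.collections.get_store("aggregated_items")

        def process_item(self, item, spider):
            # Collections are key/value stores: '_key' must be unique per
            # item. Here we assume each item has an identifying 'url' field.
            self.store.set({"_key": item["url"], "value": dict(item)})
            return item

Enabling it for all four spiders in settings.py, for example with ITEM_PIPELINES = {"myproject.pipelines.CollectionExportPipeline": 300}, makes every job append to the same collection instead of its own result set.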
Tomas Walch posted almost 7 years ago
I don't want to run anything locally. Setting up a Collection is fine, but I'm running the jobs on Scrapinghub, so I want the data to be collected when they run there.
I created a pipeline that writes to the collection. This works when I run it locally, but not from within Scrapinghub cloud. The error is that the scrapinghub Python package can't be imported, even though I added it to requirements.txt. How can this be resolved?
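One possible cause, offered as an assumption rather than a confirmed fix: when deploying with the shub tool, packages in requirements.txt are only installed on Scrapy Cloud if scrapinghub.yml references the file. A minimal sketch, with 12345 standing in for your project id:

    # scrapinghub.yml in the project root
    projects:
      default: 12345
    requirements:
      file: requirements.txt

with scrapinghub itself listed inside requirements.txt.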