Tomas Walch
I have four spiders that crawl different websites and collect the same kind of information (same Items), but in very different ways.
Is it possible to get the results aggregated as one dataset? Even if I select all four spiders when adding a job to run, they get split up into different jobs with separate result sets.
Thriveni Patil
You can add the data of the spiders to Collections, as described in https://support.scrapinghub.com/solution/articles/22000200420-sharing-data-between-spiders.
You can also export the data from a collection through the UI. Once the collections are created, you can find the Collections option in the left sidebar. Then navigate to the collection and click on Export. You can export in JSON, JSON Lines, or XML format.
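For example, an item pipeline along these lines can write every item into one shared store (a rough sketch, assuming the python-scrapinghub client; the collection name, the setting names, and the url-based key are illustrative):

```python
# A rough sketch: everything here except the python-scrapinghub API itself
# (ScrapinghubClient, get_project, collections.get_store, store.set)
# is an illustrative placeholder.
import hashlib

from scrapinghub import ScrapinghubClient


class SharedCollectionPipeline:
    def open_spider(self, spider):
        client = ScrapinghubClient(spider.settings["SH_APIKEY"])
        project = client.get_project(spider.settings["SH_PROJECT_ID"])
        # Every spider writes to the same store, so the four result
        # sets end up aggregated in one place.
        self.store = project.collections.get_store("aggregated_items")

    def process_item(self, item, spider):
        # Collections are key/value stores: a stable key means that
        # re-runs overwrite an item instead of duplicating it.
        key = hashlib.sha1(item["url"].encode("utf-8")).hexdigest()
        self.store.set({"_key": key, "value": dict(item)})
        return item
```

Enabling the pipeline in ITEM_PIPELINES for all four spiders makes them all feed the same collection.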
Regards,
Thriveni Patil
Tomas Walch
I don't want to run anything locally. Setting up a Collection is fine, but I'm running the jobs from Scrapinghub, so I want the data to be collected when run there.
I created a pipeline that writes to the collection. This works when I run it locally, but not from within Scrapinghub cloud. The error is that the scrapinghub Python package can't be imported, even though I added it to requirements.txt. How can this be resolved?
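A likely cause is that Scrapy Cloud only installs requirements.txt when the project's scrapinghub.yml points at it. A minimal sketch of such a config, where 12345 stands in for the real project ID:

```yaml
# scrapinghub.yml (minimal sketch; 12345 is a placeholder project ID)
projects:
  default: 12345
requirements:
  file: requirements.txt
```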
- Unable to select Scrapy project in GitHub
- ScrapyCloud can't call spider?
- Unhandled error in Deferred
- Item API - Filtering
- newbie to web scraping but need data from zillow
- ValueError: Invalid control character
- Cancelling account
- Best Practices
- Beautifulsoup with ScrapingHub
- Delete a project in ScrapingHub