Is it possible to send a request to a scraper on Scrapinghub to start it, and have the results continuously returned to the frontend from which I called the scraper?
Hey,
Is it possible to send a request to a scraper on Scrapinghub to start it, and have the results continuously returned to the frontend from which I called the scraper?
peixoto posted over 5 years ago Admin
Hi,
Sounds like you need to integrate two different systems: one crawler that scrapes the content from the web and produces a "database", and one consumer job that uses that data.
If you wish to run your crawler inside Scrapy Cloud, we offer a few APIs that may help you create this integration.
Using the Scrapy Cloud Jobs API, you can control running jobs: https://doc.scrapinghub.com/api/jobs.html#jobs-api
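As a minimal sketch of starting a spider through the Jobs API from your own backend, the snippet below uses only the Python standard library. It assumes the `run.json` endpoint on `app.scrapinghub.com`, with the API key sent as the HTTP Basic username and an empty password; the project id `123` and spider name `hotel_reviews` used in testing are hypothetical. Check the Jobs API page above for the current host and parameters.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumed endpoint for starting a job; verify the host in the Jobs API docs.
RUN_URL = "https://app.scrapinghub.com/api/run.json"


def build_run_request(api_key: str, project_id: int, spider: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Scrapy Cloud to run `spider`."""
    body = urllib.parse.urlencode({"project": project_id, "spider": spider}).encode()
    # The API key is used as the HTTP Basic username with an empty password.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    req = urllib.request.Request(RUN_URL, data=body, method="POST")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def start_job(api_key: str, project_id: int, spider: str) -> str:
    """Send the request and return the job id (e.g. "123/1/5") from the JSON reply."""
    req = build_run_request(api_key, project_id, spider)
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    if payload.get("status") != "ok":
        raise RuntimeError(f"run.json failed: {payload}")
    return payload["jobid"]
```

The returned job id is what you would then use to read that job's results from storage.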
We also offer our set of Storage APIs, which provide several endpoints for working with the results of jobs and spiders: https://doc.scrapinghub.com/scrapy-cloud.html#storage-scrapinghub-com
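To read results while a job is still running, the backend can repeatedly ask the Storage API for items it has not seen yet. This sketch assumes the items endpoint `storage.scrapinghub.com/items/<job_id>`, the `format=jl` (JSON Lines) option, and a `start` parameter that takes a 0-based item key of the form `<job_id>/<index>`; treat all three as assumptions to verify against the Storage docs.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumed Storage API base URL; see the Storage docs for the exact form.
ITEMS_BASE = "https://storage.scrapinghub.com/items"


def items_url(job_id: str, seen: int) -> str:
    """URL for the items of `job_id` after the first `seen` ones (0-based keys assumed)."""
    query = urllib.parse.urlencode({"format": "jl", "start": f"{job_id}/{seen}"})
    return f"{ITEMS_BASE}/{job_id}?{query}"


def parse_jl(body: str) -> list:
    """Parse a JSON Lines body into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def fetch_new_items(api_key: str, job_id: str, seen: int) -> list:
    """Fetch the items stored after the first `seen` ones, using Basic auth."""
    req = urllib.request.Request(items_url(job_id, seen))
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_jl(resp.read().decode("utf-8"))
```

Calling `fetch_new_items` every few seconds and adding the length of each returned batch to `seen` gives you the new reviews incrementally.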
If you need real-time data, I also recommend checking out ScrapyRT. It is not yet supported as a product and not yet integrated into Scrapy Cloud, but it's worth mentioning: https://github.com/scrapinghub/scrapyrt
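For comparison, ScrapyRT exposes your spiders over HTTP. The sketch below assumes a ScrapyRT server running on `localhost` at its default port 9080, and its `crawl.json` endpoint with `spider_name` and `url` query parameters; the spider name and hotel URL are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed ScrapyRT location (default port 9080).
SCRAPYRT = "http://localhost:9080/crawl.json"


def crawl_url(spider_name: str, start_url: str) -> str:
    """GET URL asking ScrapyRT to run `spider_name` starting from `start_url`."""
    query = urllib.parse.urlencode({"spider_name": spider_name, "url": start_url})
    return f"{SCRAPYRT}?{query}"


def items_from_response(payload: dict) -> list:
    """Return the scraped items, raising if ScrapyRT reported an error."""
    if payload.get("status") != "ok":
        raise RuntimeError(payload.get("message", "ScrapyRT error"))
    return payload.get("items", [])


def crawl(spider_name: str, start_url: str) -> list:
    """Run one crawl synchronously and return its items."""
    with urllib.request.urlopen(crawl_url(spider_name, start_url), timeout=120) as resp:
        return items_from_response(json.load(resp))
```

As far as I know, `crawl.json` answers once the crawl has finished, with all items in one JSON body, so it suits short per-page crawls rather than a continuous stream.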
Let me know if these help your current project.
Jakub Bares posted over 5 years ago
The business problem is as follows.
On a hotel reviews website, the user opens a hotel profile. A browser extension is triggered and the scraper begins downloading all reviews for the hotel. A summary of the reviews is displayed immediately, and as more reviews are scraped, the summary updates every few seconds.
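A summary that updates every few seconds can be kept incrementally, so each new batch of reviews only adds to running totals instead of recomputing everything. This is a minimal sketch; the `rating` field is a hypothetical item field, so adapt it to whatever your spider actually yields.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewSummary:
    """Running summary refreshed after every batch of scraped reviews."""

    count: int = 0
    total: float = 0.0
    histogram: dict = field(default_factory=dict)  # rating -> number of reviews

    def update(self, reviews: list) -> None:
        # Fold a new batch into the running totals.
        for review in reviews:
            rating = review["rating"]  # hypothetical field name
            self.count += 1
            self.total += rating
            self.histogram[rating] = self.histogram.get(rating, 0) + 1

    def snapshot(self) -> dict:
        # What the extension would display; mean is None until a review arrives.
        mean = self.total / self.count if self.count else None
        return {"reviews": self.count, "mean_rating": mean, "histogram": dict(self.histogram)}
```

Calling `update` with each new batch and sending `snapshot()` to the extension keeps the displayed summary current without reprocessing earlier reviews.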
Jakub Bares posted over 5 years ago
Actually, I need something better, because I need to process the scraped data on the backend with a deep learning model.
So a better solution is to start a process on the backend, which starts the scraper on Scrapinghub and processes the results it continuously returns. The frontend will then ask the backend for the final results every x seconds.
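The backend flow described above (keep pulling new results, run the model on them, let the frontend poll) might be organized roughly like this. It is a sketch under stated assumptions: `fetch_new` is a hypothetical hook returning the items after the first `seen` ones (for example via the Storage API), `is_finished` is a hypothetical check on the job state, and `score` is a placeholder for the deep learning model.

```python
import threading
import time
from typing import Callable, List


class ReviewPipeline:
    """Pull new items, score them, and keep results ready for the frontend."""

    def __init__(self, fetch_new: Callable[[int], List[dict]], score: Callable[[dict], float]):
        self.fetch_new = fetch_new  # hypothetical hook: items after the first `seen`
        self.score = score          # placeholder for the deep learning model
        self.seen = 0
        self.scores: List[float] = []
        self.lock = threading.Lock()  # results() may be called from another thread

    def poll_once(self) -> int:
        """Fetch and score one batch; return how many new items it contained."""
        new_items = self.fetch_new(self.seen)
        new_scores = [self.score(item) for item in new_items]
        with self.lock:
            self.seen += len(new_items)
            self.scores.extend(new_scores)
        return len(new_items)

    def run(self, is_finished: Callable[[], bool], interval: float = 2.0) -> None:
        """Poll while the job runs, then drain whatever is left."""
        while not is_finished():
            self.poll_once()
            time.sleep(interval)
        while self.poll_once():
            pass

    def results(self) -> dict:
        """What the frontend receives when it polls the backend every x seconds."""
        with self.lock:
            n = len(self.scores)
            return {"processed": n, "mean_score": sum(self.scores) / n if n else None}
```

In a real service, `run` would live in a background thread or worker, and a small HTTP endpoint would return `results()` to the frontend.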