Continuously sending scraped data to the frontend

Hey,

Is it possible to send a request to a scraper on Scrapinghub to start it, and have the results returned continuously to the frontend from which I called the scraper?


Hi,


It sounds like you need to integrate two different systems: a crawler that scrapes content from the web and produces a "database", and a consumer job that uses that data.

If you wish to run your crawler inside Scrapy Cloud, we offer a few APIs that may help you build this integration.


With the Scrapy Cloud Jobs API you can start and control jobs: https://doc.scrapinghub.com/api/jobs.html#jobs-api
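As a rough sketch of what "starting the scraper" from your own backend could look like, the snippet below builds (but does not send) a POST request to the Jobs API's `run.json` endpoint. The endpoint URL, passing the API key as the HTTP basic-auth username, and treating extra form fields as spider arguments are my reading of the Jobs API docs above, so please double-check them there. The spider name `hotel_reviews` and the `hotel_url` argument are hypothetical.

```python
import base64
import urllib.parse
import urllib.request

# Assumed "run" endpoint of the Scrapy Cloud Jobs API; verify against the docs.
RUN_ENDPOINT = "https://app.scrapinghub.com/api/run.json"


def build_run_request(api_key, project_id, spider, **spider_args):
    """Build (but do not send) a POST request that schedules a spider run.

    Extra keyword arguments are sent as additional form fields, which the
    Jobs API is assumed to forward to the spider as spider arguments.
    """
    fields = {"project": str(project_id), "spider": spider}
    fields.update({name: str(value) for name, value in spider_args.items()})
    body = urllib.parse.urlencode(fields).encode("ascii")
    # API key as the basic-auth username, with an empty password.
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return urllib.request.Request(
        RUN_ENDPOINT,
        data=body,
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )
```

You would then send it with `urllib.request.urlopen(build_run_request("YOUR_API_KEY", 12345, "hotel_reviews", hotel_url="https://..."))` and read the job ID from the JSON response, which your backend needs later to fetch that job's items.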

We also offer the Storage APIs, which provide several endpoints for working with the results of jobs and spiders: https://doc.scrapinghub.com/scrapy-cloud.html#storage-scrapinghub-com
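One way to read results while a job is still running is to poll the job's items endpoint in the Storage API and ask only for items stored after the last one you have already seen. The sketch below only builds the URL and parses a JSON-lines response. The `format=jl`, `meta=_key` and `startafter` query parameters reflect my understanding of the Items API and should be checked against the storage docs above. Requests are authenticated with your API key, as with the Jobs API.

```python
import json
import urllib.parse

# Assumed base URL of the Items API; a job key looks like "<project>/<spider>/<job>".
STORAGE_ITEMS = "https://storage.scrapinghub.com/items/"


def items_url(job_key, after_key=None):
    """Build a URL that reads a job's items as JSON lines, with each item's key.

    When after_key is given, only items stored after that key are requested
    (assumed `startafter` parameter), so repeated polls return new items only.
    """
    params = [("format", "jl"), ("meta", "_key")]
    if after_key is not None:
        params.append(("startafter", after_key))
    # safe="/" keeps item keys such as "123/1/7/4" readable in the query string.
    return STORAGE_ITEMS + job_key + "?" + urllib.parse.urlencode(params, safe="/")


def parse_items(body):
    """Parse a JSON-lines response body into (items, key_of_last_item)."""
    items = [json.loads(line) for line in body.splitlines() if line.strip()]
    last_key = items[-1]["_key"] if items else None
    return items, last_key
```

Keeping the last key as a cursor means each poll costs roughly as much as the number of new items, rather than re-downloading everything the job has produced so far.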


If you need real-time data, I also recommend checking out ScrapyRT. It is not yet supported as a product and is not yet integrated into Scrapy Cloud, but it's worth mentioning: https://github.com/scrapinghub/scrapyrt
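ScrapyRT exposes an HTTP endpoint (by default `GET /crawl.json` on port 9080) that runs a spider against a given URL and returns the scraped items in its JSON response. A minimal sketch of building such a request and reading the items, assuming a ScrapyRT instance on localhost and a hypothetical spider called `hotel_reviews`:

```python
import json
import urllib.parse

# ScrapyRT's default port is 9080; the host here is an assumption.
SCRAPYRT_BASE = "http://localhost:9080/crawl.json"


def scrapyrt_url(spider_name, url):
    """Build a GET URL asking ScrapyRT to run spider_name against one start URL."""
    query = urllib.parse.urlencode({"spider_name": spider_name, "url": url})
    return f"{SCRAPYRT_BASE}?{query}"


def extract_items(response_body):
    """Return the scraped items from a ScrapyRT JSON response, or raise on error."""
    payload = json.loads(response_body)
    if payload.get("status") != "ok":
        raise RuntimeError(f"ScrapyRT returned status {payload.get('status')!r}")
    return payload.get("items", [])
```

Note that the HTTP response only arrives once that crawl has finished, so ScrapyRT by itself does not stream partial results. It fits best when each request scrapes a small, bounded amount of data.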


Let me know if these help your current project.

Actually, I need something more than that, because I need to process the scraped data on the backend with a deep learning model.
So a better solution would be: I start a process on the backend, which starts the scraper on Scrapinghub and processes the results as they are continuously returned. The frontend then asks the backend for the latest results every x seconds.
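The backend process described above boils down to a polling loop: fetch items stored after a cursor, hand each batch to the model, and stop once the job has finished and nothing new is left. Here is a sketch with the I/O injected as callables. `fetch_new`, `process_batch` and `is_finished` are hypothetical hooks; in practice `fetch_new` might wrap the Storage items endpoint and `is_finished` a job-state check through the Jobs API.

```python
import time
from typing import Callable, Optional


def follow_job(
    fetch_new: Callable[[Optional[str]], tuple],
    process_batch: Callable[[list], None],
    is_finished: Callable[[], bool],
    poll_seconds: float = 2.0,
) -> int:
    """Consume a running job's items incrementally; return how many were processed.

    fetch_new(cursor) must return (items, last_key) for items stored after cursor
    (cursor is None on the first call).
    """
    cursor = None
    total = 0
    while True:
        # Read the job state *before* fetching, so items written just before the
        # job finished are still returned by this fetch.
        finished = is_finished()
        items, last_key = fetch_new(cursor)
        if items:
            process_batch(items)  # e.g. run the model on the new reviews
            total += len(items)
            cursor = last_key
            continue  # more items may be waiting; fetch again right away
        if finished:
            return total  # job done and nothing new: every item was consumed
        time.sleep(poll_seconds)  # idle: wait before polling again
```

Draining without sleeping while items keep arriving, and sleeping only when a poll comes back empty, keeps latency low without hammering the API while the spider is between pages.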

The business problem is the following.
On a hotel reviews website, a user opens a hotel profile. A browser extension is triggered and the scraper begins downloading all reviews for that hotel. A summary of the reviews is displayed immediately, and as more reviews are scraped, the summary updates every few seconds.
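For the "summary updates every few seconds" part, one option is for the backend to keep a running summary that grows with each batch, and for the frontend's periodic requests to read a snapshot of it. In the sketch below, `classify` is a hypothetical stand-in for the deep learning model, and `rating` is an assumed field name in the scraped review items. A lock is used because the updating loop and the request handlers would typically run in different threads.

```python
import threading
from collections import Counter


class ReviewSummary:
    """Running review summary that can be updated and read concurrently."""

    def __init__(self, classify):
        self._classify = classify  # hypothetical model: review dict -> label string
        self._lock = threading.Lock()
        self._count = 0
        self._rated = 0
        self._rating_sum = 0.0
        self._labels = Counter()

    def update(self, reviews):
        # Run the (potentially slow) model outside the lock so readers are not blocked.
        labels = [self._classify(r) for r in reviews]
        with self._lock:
            for review, label in zip(reviews, labels):
                self._count += 1
                self._labels[label] += 1
                rating = review.get("rating")
                if rating is not None:  # reviews without a rating don't skew the average
                    self._rated += 1
                    self._rating_sum += float(rating)

    def snapshot(self):
        """Return a JSON-serialisable view for the frontend's periodic requests."""
        with self._lock:
            avg = self._rating_sum / self._rated if self._rated else None
            return {"reviews": self._count, "avg_rating": avg, "labels": dict(self._labels)}
```

In the setup described earlier in this thread, the backend's polling loop would call `update` with each new batch of reviews, and the endpoint the extension polls every few seconds would return `snapshot()`.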
