
Dynamically register a spider to periodically scrape IG post links

We want to scrape IG post pages to gather insights by registering a periodic spider job for a specific post link that runs every X timespan.

We also need to remove the job in some cases. 


The reason is that we might have hundreds of posts to scrape, and this needs to happen dynamically, because our platform's business logic has to register a spider upon an event.
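To illustrate the kind of dynamic behaviour we mean, here is a minimal sketch of the register/remove flow on our side. APScheduler is used purely as an example scheduler, and `on_post_registered`, `on_post_removed` and `scrape_post` are hypothetical names from our platform, not anything Scrapy Cloud provides:

```python
from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()
scheduler.start()

def scrape_post(post_url):
    # Would trigger a Scrapy Cloud job for this link
    # (see the python-scrapinghub sketch further down).
    ...

def on_post_registered(post_url, every_minutes):
    # Platform event: start scraping this link every X minutes.
    scheduler.add_job(scrape_post, 'interval',
                      minutes=every_minutes,
                      args=[post_url],
                      id=post_url)  # keyed by link so it can be removed later

def on_post_removed(post_url):
    # Platform event: stop scraping this link.
    scheduler.remove_job(post_url)
```

Ideally Scrapy Cloud itself would handle the "every X timespan" part, so that we only have to register and remove jobs.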


Workflow (rough code sketch after the list):

1. Register a spider with the post link in Scrapy Cloud and start scraping it every X timespan

2. If applicable, delete the spider job

3. Retrieve the extracted data, or be notified that the job has finished and get the data (API, webhook)
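
For reference, this is roughly how we picture steps 1-3 with the python-scrapinghub client, assuming that is the right tool for this. The spider name `ig_posts`, the `post_url` argument and the placeholder key/id values are from our own setup, not anything Scrapy Cloud defines:

```python
from scrapinghub import ScrapinghubClient

APIKEY = 'YOUR_SC_APIKEY'   # Scrapy Cloud API key (placeholder)
PROJECT_ID = 123456         # Scrapy Cloud project id (placeholder)

client = ScrapinghubClient(APIKEY)
project = client.get_project(PROJECT_ID)

# 1. Start a job for one specific post link; a scheduler on our side
#    would call this every X timespan for each registered link.
job = project.jobs.run('ig_posts',
                       job_args={'post_url': 'https://www.instagram.com/p/...'})
print(job.key)  # e.g. '123456/1/42' -- we would persist this key

# 2. If the post should no longer be tracked, cancel the running job
#    (and stop scheduling new ones on our side).
job.cancel()

# 3. When jobs have finished, pull the items they scraped.
for summary in project.jobs.iter(spider='ig_posts', state='finished', count=10):
    finished = project.jobs.get(summary['key'])
    for item in finished.items.iter():
        print(item)
```

What we have not found in the docs is whether the periodic part (the Periodic Jobs we see in the Scrapy Cloud UI) can also be created and deleted through an API, or whether we are expected to drive the schedule ourselves as above, and what the recommended way is to be notified (webhook?) when a job finishes with its data.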


Any documentation that can help us accomplish the above scenario would be helpful!
