We want to scrape Instagram (IG) post pages to gather insights by registering a periodic spider job for a specific post link, running every X timespan.
We also need to remove a job in some cases.
The reason is that we might have hundreds of posts to scrape, and this needs to happen dynamically: our platform's business logic has to register a spider job in response to an event.
1. Register a spider with the post link in Scrapy Cloud and start scraping every X timespan
2. Delete the spider job when it is no longer needed
3. Retrieve the extracted data, or be notified that the job has finished with the data (via API or webhook)
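For reference, here is a minimal sketch of how the three steps above could look with the official `python-scrapinghub` client (`pip install scrapinghub`). The project ID, the spider name `ig_post`, the `post_url` spider argument, and the `build_job_args` helper are all placeholders we would define ourselves, not part of the library:

```python
def build_job_args(post_url: str, interval_minutes: int) -> dict:
    """Build the spider arguments passed to Scrapy Cloud for one post.

    Spider argument names ("post_url", "interval") are our own convention;
    the spider itself must read them.
    """
    return {"post_url": post_url, "interval": str(interval_minutes)}


def schedule_post(client, project_id: int, post_url: str):
    # Step 1: start a job for the spider that scrapes a single post URL.
    # `client` is a scrapinghub.ScrapinghubClient(apikey) instance.
    project = client.get_project(project_id)
    return project.jobs.run("ig_post", job_args=build_job_args(post_url, 60))


def cancel_job(client, job_key: str):
    # Step 2: cancel a pending/running job when the post no longer
    # needs scraping. job_key looks like "123456/1/7".
    client.get_job(job_key).cancel()


def fetch_items(client, job_key: str):
    # Step 3: pull the scraped items once the job has finished.
    return list(client.get_job(job_key).items.iter())
```

For the *periodic* part, Scrapy Cloud's Periodic Jobs feature can run the spider on a schedule, or our own scheduler (cron, Celery beat, etc.) can call `schedule_post` per post; Scrapy Cloud has no built-in webhook on job completion as far as we know, so step 3 would likely mean polling the job state or fetching items from the Items API after the job finishes.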
Any documentation that can help accomplish the above scenario would be appreciated!