Dynamically register spider to periodically scrape IG post links
george8 started a topic over 3 years ago
We want to scrape Instagram (IG) post pages to gather insights by registering a periodic spider job for a specific post link that runs every X time interval.
We also need to be able to remove the job in some cases.
The reason is that we might have hundreds of posts to scrape, and this needs to happen dynamically, since our platform's business logic has to register a spider job in response to an event.
Workflow:
1. Register a spider with the post link in Scrapy Cloud and start scraping it every X time interval
2. Delete the spider job if it is no longer needed
3. Retrieve the extracted data, or be notified that the job has finished along with the data (API, webhook)
Any documentation that can help accomplish the above scenario would be helpful!
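
For reference, here is a minimal sketch of how this workflow could be driven with the python-scrapinghub client. The project ID, API key, the spider name `ig_post`, and its `post_url` argument are hypothetical placeholders; the periodic re-run every X interval would come from your own scheduler (cron, Celery beat, etc.) or from Scrapy Cloud's periodic jobs feature, not from this snippet itself.

```python
# Minimal sketch using the python-scrapinghub client (pip install scrapinghub).
# PROJECT_ID, APIKEY, the spider name "ig_post", and its "post_url" argument
# are hypothetical placeholders, not values taken from the question.
from scrapinghub import ScrapinghubClient

APIKEY = "YOUR_SCRAPY_CLOUD_API_KEY"
PROJECT_ID = 123456

client = ScrapinghubClient(APIKEY)
project = client.get_project(PROJECT_ID)

# 1. Register/run a job for a specific post link. The post URL is passed as a
#    spider argument; an external scheduler would call this every X interval.
def schedule_post(post_url):
    job = project.jobs.run("ig_post", job_args={"post_url": post_url})
    return job.key  # store the key so the job can be managed later

# 2. Stop or remove a job when it is no longer needed.
def remove_job(job_key):
    job = project.jobs.get(job_key)
    if job.metadata.get("state") == "running":
        job.cancel()   # ask Scrapy Cloud to stop the running job
    else:
        job.delete()   # delete an already finished job from the project

# 3. Check for completion and retrieve the extracted items.
def fetch_items(job_key):
    job = project.jobs.get(job_key)
    if job.metadata.get("state") == "finished":
        return list(job.items.iter())
    return None  # not finished yet; poll again later or have the spider notify you
```

As far as I know, Scrapy Cloud does not push a completion webhook on its own, so step 3 is usually handled by polling the job state as above, or by having the spider itself notify your platform (for example from its close handler) when it finishes.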