Simon Mosk-Aoyama
Hello,
Is there any way to create or modify periodic jobs programmatically? I'd like to set up a process where I autogenerate spiders and commit them to GitHub, from where they are pulled down automatically by Scrapinghub.
Then I'd like to script changes to a periodic job so that new spiders are added to it and run on a schedule.
Is this possible? The Jobs API only seems to be for one-off jobs (https://doc.scrapinghub.com/api/jobs.html).
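(For reference, a minimal sketch of what that Jobs API covers today: triggering a single one-off run. The API key, project ID and spider name below are placeholders, not real values.)

import requests

# One-off spider run via the Scrapinghub Jobs API
# (https://doc.scrapinghub.com/api/jobs.html). Replace the placeholders
# with your own API key, project ID and spider name.
resp = requests.post(
    "https://app.scrapinghub.com/api/run.json",
    data={"project": 123456, "spider": "myspider"},
    auth=("APIKEY", ""),  # HTTP basic auth: API key as username, empty password
)
print(resp.json())  # the response should include the id of the newly started job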
thanks!
Hareesh Kadali
Any update on this requirement?
1 person likes this
george8
We are also looking at something similar.
We want to scrape an Instagram (IG) post page by registering a spider for a specific post link that runs every X time interval. We also need to be able to remove the job in some cases, because we can have hundreds of posts to scrape and this needs to happen dynamically.
Workflow:
1. Register a spider in Scrapy Cloud and start scraping every X time interval
2. If applicable, delete the spider job
3. Retrieve the extracted data, or be notified that the job has finished and the data is available
Any documentation on the above that would help accomplish this scenario would be appreciated.
1 person likes this
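A minimal sketch of how steps 1-3 above could be approximated with the python-scrapinghub client, assuming the "every X time interval" part is handled by an external scheduler (cron or similar); the API key, project ID, spider name and post_url argument are placeholders:

from scrapinghub import ScrapinghubClient

client = ScrapinghubClient("APIKEY")      # placeholder API key
project = client.get_project(123456)      # placeholder project id

# 1. Start a one-off run of the spider for a given post
#    (repeat this from an external scheduler to get "every X").
job = project.jobs.run("ig_post_spider", job_args={"post_url": "https://example.com/p/abc"})
print("started", job.key)

# 2. If the post no longer needs scraping, cancel the running job
#    (a finished job can be removed with job.delete()).
job.cancel()

# 3. Find finished runs of the spider and iterate over the scraped items.
for finished in project.jobs.iter(spider="ig_post_spider", state="finished", count=10):
    finished_job = client.get_job(finished["key"])
    for item in finished_job.items.iter():
        print(item)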
Aaron Cowper
Hi, any update on when this feature will be available?
1 person likes this
nestor
This is a very popular request and we are already working on an API for Periodic Jobs, it should be ready sometime this year. For now, the only option is to do it via the UI.
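Until that API exists, one possible interim workaround (an assumption on my part, not an official Scrapy Cloud feature) is to keep the schedule outside Scrapy Cloud: have a script, invoked by cron or another scheduler, list the project's spiders and trigger one-off runs, so newly deployed spiders are picked up automatically. The API key and project ID are placeholders:

from scrapinghub import ScrapinghubClient

# Run this script from cron (or any scheduler) every X hours to
# approximate a periodic job for all spiders in the project.
client = ScrapinghubClient("APIKEY")      # placeholder API key
project = client.get_project(123456)      # placeholder project id

# Trigger a one-off run for every spider currently deployed to the project,
# including spiders that were auto-generated and deployed via GitHub.
for spider in project.spiders.list():
    job = project.jobs.run(spider["id"])
    print("scheduled", spider["id"], "->", job.key)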
Aaron Cowper
Any update on this? Been waiting 2+ years...