Matthew Eby
Ok, thanks for confirming.
We run CI/CD pipelines to automatically stand up and tear down our environments (development, QA, production, etc.). Those pipelines auto-create S3 buckets and access keys for us to use. We'd like to set the bucket and AWS keys as part of the deployment process. The keys are difficult to remember, and once we're deployed we don't have an easy way to look the key up to kick off a job.
I think I've worked around this by generating a config file that settings.py can import.
Thanks.
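The generated-config workaround might look roughly like the sketch below. The module name generated_settings, the setting names, and the values are hypothetical, since the thread doesn't show the actual file; the idea is that the CI/CD pipeline writes the module at deploy time and settings.py star-imports it.

```python
import importlib.util
import os
import tempfile

# Deploy-time step (run by the CI/CD pipeline): write a module holding the
# per-environment values. The setting names and values are hypothetical examples.
GENERATED = '''\
AWS_ACCESS_KEY_ID = "AKIA-example"
AWS_SECRET_ACCESS_KEY = "example-secret"
FEED_URI = "s3://example-qa-bucket/%(name)s/%(time)s.json"
'''

path = os.path.join(tempfile.gettempdir(), "generated_settings.py")
with open(path, "w") as f:
    f.write(GENERATED)

# In settings.py the import would simply be:
#     try:
#         from generated_settings import *
#     except ImportError:
#         pass  # no generated file; fall back to UI-configured settings
# Loaded explicitly here so the sketch runs from any working directory:
spec = importlib.util.spec_from_file_location("generated_settings", path)
generated_settings = importlib.util.module_from_spec(spec)
spec.loader.exec_module(generated_settings)
```

Because the generated module is imported by settings.py at deploy time, the values apply to every future run, whether the job is started programmatically or from the UI.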
nestor said about 5 years ago
There's no API to edit the settings that are stored in the UI, but once you set them in the UI they will be used for all future runs, so I'm not sure why you would need to modify them once they're already set (perhaps you could elaborate on that).
What options are not easy to remember? The settings in the UI are the same settings you would set in settings.py.
Matthew Eby said about 5 years ago
Thanks for pointing those out. These look like they configure the options when running or scheduling a job, but the settings would have to be passed each time a job is run.
What I'm trying to accomplish is configuring these settings one time, as part of deploying the spiders, so they will be available for all future runs. One reason I'd like to do it this way is that the options are not easy to remember. The job-level APIs should be fine for jobs that are kicked off programmatically; however, we also sometimes kick jobs off manually through the UI.
This can be done through the UI as described in this article: https://support.scrapinghub.com/support/solutions/articles/22000200670-customizing-scrapy-settings-in-scrapy-cloud
Can the same options as described in that article be set through the API?
Matthew Eby
Is there a programmatic way to configure the spider settings described here: https://support.scrapinghub.com/support/solutions/articles/22000200670-customizing-scrapy-settings-in-scrapy-cloud
We're building out our CI/CD pipeline and would like to auto configure these settings as part of our deployment.
nestor
You can supply a job_settings dict when using python-scrapinghub: https://python-scrapinghub.readthedocs.io/en/latest/client/overview.html#running-jobs
If you use shub to schedule, you can use the -s option: https://shub.readthedocs.io/en/stable/scheduling.html
And last but not least, job_settings via the API: https://doc.scrapinghub.com/api/jobs.html#run-json
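As a sketch of the first option, passing a job_settings dict with python-scrapinghub might look like this. The project ID, spider name, setting values, and the SH_APIKEY environment variable are placeholders for illustration, and the network call is skipped unless the library and a key are actually present.

```python
import json
import os

# Hypothetical per-environment values; in the poster's setup these would come
# from the CI/CD pipeline that created the bucket and keys.
job_settings = {
    "AWS_ACCESS_KEY_ID": "AKIA-example",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
    "FEED_URI": "s3://example-qa-bucket/%(name)s/%(time)s.json",
}

# With python-scrapinghub installed (pip install scrapinghub), the dict is
# passed straight to jobs.run(); guarded here so the sketch runs without the
# library or an API key. Project ID and spider name are placeholders.
try:
    from scrapinghub import ScrapinghubClient
    client = ScrapinghubClient(os.environ["SH_APIKEY"])
    client.get_project(123456).jobs.run("myspider", job_settings=job_settings)
except (ImportError, KeyError):
    pass

# The raw run.json endpoint accepts the same dict as a JSON string in the
# job_settings form field; shub's -s option sets entries one at a time.
encoded = json.dumps(job_settings)
```

Note that all three mechanisms are per-job: the settings apply only to the run being scheduled, which is why they don't help with jobs started manually from the UI.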