Matthew Eby
Ok, thanks for confirming.
We run CI/CD pipelines to automatically stand up and tear down our environments (development, QA, production, etc.). Those pipelines auto-create S3 buckets and access keys for us to use. We'd like to set the bucket and AWS keys as part of the deployment process. The keys are difficult to remember, and once we're deployed we don't have an easy way to look the key up to kick off a job.
I think I've worked around this by generating a config file that settings.py can import.
Thanks.
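The generated-config workaround might look roughly like the sketch below. The module name generated_settings, the setting names, and the values are hypothetical, since the thread doesn't show the actual file; the idea is that the CI/CD pipeline writes the module at deploy time and settings.py star-imports it.

```python
import importlib.util
import os
import tempfile

# Deploy-time step (run by the CI/CD pipeline): write a module holding the
# per-environment values. The setting names and values are hypothetical examples.
GENERATED = '''\
AWS_ACCESS_KEY_ID = "AKIA-example"
AWS_SECRET_ACCESS_KEY = "example-secret"
FEED_URI = "s3://example-qa-bucket/%(name)s/%(time)s.json"
'''

path = os.path.join(tempfile.gettempdir(), "generated_settings.py")
with open(path, "w") as f:
    f.write(GENERATED)

# In settings.py the import would simply be:
#     try:
#         from generated_settings import *
#     except ImportError:
#         pass  # no generated file; fall back to UI-configured settings
# Loaded explicitly here so the sketch runs from any working directory:
spec = importlib.util.spec_from_file_location("generated_settings", path)
generated_settings = importlib.util.module_from_spec(spec)
spec.loader.exec_module(generated_settings)
```

Because the generated module is imported by settings.py at deploy time, the values apply to every future run, whether the job is started programmatically or from the UI.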
nestor said about 5 years ago
There's no API to edit the settings that are stored in the UI, but once you set them in the UI they will be used for all future runs, so I'm not sure why you would need to modify them once they're already set (perhaps you could elaborate on that).
What options are not easy to remember? The settings in the UI are the same settings you would set in settings.py.
Matthew Eby said about 5 years ago
Thanks for pointing those out. These look like they configure the options when running or scheduling a job, but the settings would have to be passed each time a job is run.
What I'm trying to accomplish is configuring these settings one time, as part of deploying the spiders, so they will be available for all future runs. One reason I'd like to do it this way is that the options are not easy to remember. The job-level APIs should be fine for jobs that are kicked off programmatically; however, we also sometimes kick jobs off manually through the UI.
This can be done through the UI as described in this article: https://support.scrapinghub.com/support/solutions/articles/22000200670-customizing-scrapy-settings-in-scrapy-cloud
Can the same options as described in that article be set through the API?
Matthew Eby
Is there a programmatic way to configure the spider settings described here: https://support.scrapinghub.com/support/solutions/articles/22000200670-customizing-scrapy-settings-in-scrapy-cloud
We're building out our CI/CD pipeline and would like to auto configure these settings as part of our deployment.
nestor
You can supply a job_settings dict when using python-scrapinghub: https://python-scrapinghub.readthedocs.io/en/latest/client/overview.html#running-jobs
If you use shub to schedule, you can use the -s option: https://shub.readthedocs.io/en/stable/scheduling.html
And last but not least, job_settings via the API: https://doc.scrapinghub.com/api/jobs.html#run-json
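As a sketch of the first option, passing a job_settings dict with python-scrapinghub might look like this. The project ID, spider name, setting values, and the SH_APIKEY environment variable are placeholders for illustration, and the network call is skipped unless the library and a key are actually present.

```python
import json
import os

# Hypothetical per-environment values; in the poster's setup these would come
# from the CI/CD pipeline that created the bucket and keys.
job_settings = {
    "AWS_ACCESS_KEY_ID": "AKIA-example",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
    "FEED_URI": "s3://example-qa-bucket/%(name)s/%(time)s.json",
}

# With python-scrapinghub installed (pip install scrapinghub), the dict is
# passed straight to jobs.run(); guarded here so the sketch runs without the
# library or an API key. Project ID and spider name are placeholders.
try:
    from scrapinghub import ScrapinghubClient
    client = ScrapinghubClient(os.environ["SH_APIKEY"])
    client.get_project(123456).jobs.run("myspider", job_settings=job_settings)
except (ImportError, KeyError):
    pass

# The raw run.json endpoint accepts the same dict as a JSON string in the
# job_settings form field; shub's -s option sets entries one at a time.
encoded = json.dumps(job_settings)
```

Note that all three mechanisms are per-job: the settings apply only to the run being scheduled, which is why they don't help with jobs started manually from the UI.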