Answered

Scrapinghub ignoring spider arguments

When running spiders locally, I can override custom values that I defined in settings.py from the command line. For example, I can run:
scrapy crawl spider_name -s MY_SETTING=False

And this will work fine.


When launching spiders on Scrapinghub, I can supply job arguments in the web interface, but these are ignored in my case. I can also use the Spider Settings page on Scrapinghub, but it doesn't let me edit custom settings, so my only choice is to edit the job's arguments, and these are ignored.


Is this intended behaviour? How can I pass command line arguments to Scrapinghub jobs?


Best Answer

You can override Scrapy settings as described in the article Customizing Scrapy settings in Scrapy Cloud. That article covers customizing built-in settings.


To provide settings other than the built-in ones, you can use the Scrapy Raw settings tab.


To override settings for a specific spider, navigate to that spider's page (for example, https://app.scrapinghub.com/p/projectid/spider#) and edit the settings there. These override the project settings.


I think you may be confusing Scrapy settings with spider arguments. Spider arguments are passed via the `-a` option, not via the `-s` option, which is for settings. For example:

 

scrapy crawl spider_name -a argument=value

  

And that is how arguments are passed from the job running UI in the Scrapinghub dashboard: as spider arguments, not as settings.
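
Since the job dialog only accepts spider arguments, one possible workaround for per-job overrides is to let the spider prefer an argument and fall back to the setting when no argument is given. The following is only a rough sketch, not an official Scrapinghub or Scrapy mechanism: the helper names `parse_bool` and `resolve_flag` are hypothetical, and it assumes the value is a boolean flag like MY_SETTING from the question. Spider arguments arrive as strings, so the value has to be parsed.

```python
def parse_bool(value):
    """Interpret a command-line style value ("False", "1", "yes", ...) as a bool."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ("1", "true", "yes", "on"):
        return True
    if text in ("0", "false", "no", "off"):
        return False
    raise ValueError("cannot interpret %r as a boolean" % (value,))


def resolve_flag(argument, setting_value, default=False):
    """Prefer the per-job spider argument, then the setting, then the default.

    `argument` is what was passed with -a (or in the job dialog), or None.
    `setting_value` is what settings.py / Raw Settings provide, or None.
    """
    if argument is not None:
        return parse_bool(argument)
    if setting_value is not None:
        return parse_bool(setting_value)
    return default


# A job started with the argument my_setting=False overrides a setting of True.
print(resolve_flag("False", True))
# Without an argument, the setting is used.
print(resolve_flag(None, "true"))
```

Inside a spider, this could be called with something like getattr(self, "my_setting", None) for the argument and self.settings.get("MY_SETTING") for the setting, where `my_setting` is an assumed argument name. The job would then be started with my_setting=False in the arguments field, or locally with `-a my_setting=False`.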




Apparently, yes. I thought that "argument" in the web interface was for overriding Scrapy settings.

To clarify, I want to override a custom Scrapy setting, but the Scrapy Settings page on Scrapinghub doesn't let me do so; it only lets me choose one of the default Scrapy options.

Therefore my questions are:
  1. Can I override a custom Scrapy setting in the Scrapinghub interface somehow?
  2. Can I override it per job run (instead of globally, for all spiders)?

 


I tried both approaches and they both work - thanks!


It would probably be more convenient to override some Scrapy settings right from the job starting dialog, since I need to do this frequently and I want to override settings per job only. Modifying Raw Settings (or the settings for a specific spider) before each run is a little tedious, but at least it works.


Could someone update these links? They are broken and I'm looking for a similar guide.
