Scrapinghub ignoring spider arguments

Posted about 7 years ago by park_gorev

Answered

When running spiders locally, I can override custom values defined in settings.py from the command line. For example, I can run:
scrapy crawl spider_name -s MY_SETTING=False

And this will work fine.
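One detail worth noting: values passed with `-s` arrive as strings, so a custom boolean setting should be read with `Settings.getbool` rather than plain truthiness. A minimal sketch of that coercion (a simplified stand-in, not Scrapy's actual implementation):

```python
def getbool(value):
    """Simplified stand-in for scrapy.settings.Settings.getbool
    (assumption: only the common string/bool cases are modeled)."""
    if isinstance(value, bool):
        return value
    if str(value).lower() in ("1", "true"):
        return True
    if str(value).lower() in ("0", "false"):
        return False
    raise ValueError(f"unsupported boolean value: {value!r}")

# -s MY_SETTING=False delivers the *string* "False":
print(bool("False"))     # True  -- plain truthiness is misleading
print(getbool("False"))  # False -- parsed as intended
```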


When launching spiders on Scrapinghub, I can supply job arguments in the web interface, but in my case these are ignored. I can also use the Spider Settings page on Scrapinghub, but it doesn't let me edit custom settings, so my only option is to edit the job's arguments, and those are ignored.


Is this intended behaviour? How can I pass command-line arguments to Scrapinghub jobs?

0 Votes

thriveni

thriveni posted about 7 years ago Admin Best Answer

You can override Scrapy settings as described in the article Customizing Scrapy settings in Scrapy Cloud. That article covers customizing built-in settings.


To provide settings other than the built-in ones, you can use the Scrapy Raw settings tab.


To override settings for a specific spider, navigate to that spider's page (e.g. https://app.scrapinghub.com/p/projectid/spider#) and edit the settings there. These override the project-level settings.

1 Votes


5 Comments


Pablo Hoffman posted about 7 years ago Admin

I think you may be confusing Scrapy settings with spider arguments. Spider arguments are passed via the `-a` option, not the `-s` option. For example:

scrapy crawl spider_name -a argument=value

And that is how arguments are passed from the job-running UI in the Scrapinghub dashboard.
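For context, Scrapy's base `Spider` simply copies each `-a key=value` pair onto the spider instance as an attribute. A minimal stdlib sketch of that mechanism (a stand-in class, not the real `scrapy.Spider`):

```python
class MiniSpider:
    """Simplified stand-in for scrapy.Spider's argument handling:
    keyword arguments coming from -a become instance attributes."""
    name = "spider_name"

    def __init__(self, **kwargs):
        # scrapy crawl spider_name -a argument=value
        # reaches the spider as kwargs={"argument": "value"}
        self.__dict__.update(kwargs)

spider = MiniSpider(argument="value")
print(spider.argument)  # value
```

Note that, like `-s` settings, `-a` values always arrive as strings, so the spider must do any type conversion itself.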



1 Votes


park_gorev posted about 7 years ago

Apparently, yes: I thought that the "argument" field in the web interface was for overriding Scrapy settings.

To clarify, I want to override a custom Scrapy setting, but the Spider Settings page on Scrapinghub doesn't let me do so; it only lets me choose from the default Scrapy options.

Therefore my questions are:
  1. Can I override a custom Scrapy setting in the Scrapinghub interface somehow?
  2. Can I override it per job run (instead of globally, for all spiders)?

0 Votes



park_gorev posted about 7 years ago

I tried both approaches and they both work - thanks!


It would probably be more convenient to override some Scrapy settings right from the job-starting dialog, since I need to do this frequently and want to override settings per job only. Modifying Raw Settings (or a specific spider's settings) before each run is a little tedious, but at least it works.
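For per-job overrides without touching the dashboard, jobs can also be started through Scrapy Cloud's run endpoint, which reportedly accepts per-job setting overrides. A hedged sketch that only builds the request body (assumption: the `job_settings` JSON field of `api/run.json`; the project id and setting name are placeholders, and nothing is actually sent here):

```python
import json
from urllib.parse import urlencode

# Hypothetical payload for POST https://app.scrapinghub.com/api/run.json
# (assumption: job_settings carries JSON-encoded per-job overrides;
# project id 12345 and MY_SETTING are placeholders).
payload = {
    "project": 12345,
    "spider": "spider_name",
    "job_settings": json.dumps({"MY_SETTING": "False"}),
}
body = urlencode(payload)
print(body)
```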

2 Votes


Lee Prevost posted about 1 year ago

Could someone update these links? They are broken, and I'm looking for a similar guide.

0 Votes
