park_gorev said:
When running spiders locally, I can override custom values that I defined in settings.py from the command line. For example, I can run:
scrapy crawl spider_name -s MY_SETTING=False
And this will work fine.
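For context, a minimal sketch of that setup; MY_SETTING, the spider name and the URL are just placeholder examples:

    # settings.py -- a project-specific custom setting (example name)
    MY_SETTING = True

    # spiders/myspider.py
    import scrapy

    class MySpider(scrapy.Spider):
        name = "spider_name"
        start_urls = ["https://example.com"]

        def parse(self, response):
            # self.settings reflects any command-line override, e.g. -s MY_SETTING=False
            if self.settings.getbool("MY_SETTING"):
                yield {"title": response.css("title::text").get()}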
When launching spiders on Scrapinghub, I can supply job arguments in the web interface, but these are ignored in my case. I can also use the Spider Settings page on Scrapinghub, but it doesn't let me edit custom settings, so my only choice is to edit the job's arguments, and those are ignored.
Is this intended behaviour? How can I pass command-line arguments to Scrapinghub jobs?
Pablo Hoffman said:
I think you may be confusing Scrapy settings with spider arguments. Spider arguments are passed via the `-a` option, not the `-s` option. For example:
scrapy crawl spider_name -a argument=value
And that is how arguments are passed from the job-running UI in the Scrapinghub dashboard.
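As a sketch of what those arguments look like on the spider side (not part of the original reply; the names are only examples), an argument passed with `-a`, or typed into the job dialog, becomes an attribute on the spider instance:

    import scrapy

    class MySpider(scrapy.Spider):
        name = "spider_name"

        def start_requests(self):
            # With the default Spider.__init__, `-a argument=value` (or the
            # "argument" field in the job dialog) ends up as self.argument.
            # Spider arguments always arrive as strings.
            value = getattr(self, "argument", "default")
            yield scrapy.Request(f"https://example.com/?q={value}")

        def parse(self, response):
            yield {"status": response.status}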
1 person likes this
park_gorev said, almost 7 years ago:
Apparently, yes, I thought that "argument" in the web interface was for overriding Scrapy settings.
To clarify, I want to override a custom Scrapy setting, but the Scrapy Settings page on Scrapinghub doesn't let me do so; it only lets me choose one of the default Scrapy options.
Therefore my questions are:
Can I override a custom Scrapy setting in the Scrapinghub interface somehow?
Can I override it per job run (instead of globally, for all spiders)?
thriveni said:
You can override Scrapy settings as described in the article Customizing Scrapy settings in Scrapy Cloud; that article covers customizing the built-in settings. To provide settings other than the built-in ones, you can use the Scrapy Raw settings tab.
To override settings for a specific spider, you would need to navigate to that spider's page, for example https://app.scrapinghub.com/p/projectid/spider#, and edit the settings there. This will override the project settings.
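As an aside not mentioned in this reply, the same kind of per-spider override can also be set in code, via Scrapy's custom_settings class attribute; a minimal sketch with example values:

    import scrapy

    class MySpider(scrapy.Spider):
        name = "spider_name"

        # Per-spider overrides: these take precedence over the project's
        # settings.py, but are still beaten by -s overrides on the command line.
        custom_settings = {
            "MY_SETTING": False,
            "DOWNLOAD_DELAY": 1.0,
        }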
1 person likes this
park_gorev said, almost 7 years ago:
I tried both approaches and they both work - thanks!
It would probably be more convenient to override some Scrapy settings right from the job-start dialog, since I need to do this frequently and I want to override settings per job only. Modifying Raw Settings (or the settings for a specific spider) before each run is a little tedious, but at least it works.
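One possible way to get per-job overrides without editing the UI before every run is to schedule the job programmatically and attach settings to that run only. This is an assumption based on the Scrapy Cloud API rather than anything confirmed in this thread, so the parameter names should be checked against the python-scrapinghub docs:

    from scrapinghub import ScrapinghubClient

    # API key and project id are placeholders.
    client = ScrapinghubClient("YOUR_API_KEY")
    project = client.get_project(12345)

    # job_settings should apply to this run only; job_args are the usual
    # spider arguments (the -a equivalents).
    job = project.jobs.run(
        "spider_name",
        job_settings={"MY_SETTING": "False"},
        job_args={"argument": "value"},
    )
    print(job.key)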
2 people like this
Lee Prevost said, 10 months ago:
Could someone update these links? The links are broken and I'm looking for a similar guide.