I have a bunch of spiders running on Scrapy Cloud on a periodic basis. I need to be able to write the scraped data to multiple locations.
I am able to do so when I run the spiders on my local machine, using the FEEDS setting that I define in custom_settings:
    custom_settings = {
        "FEEDS": {
            "s3://systems_data/systems_sample_results_page/FILE1.jsonl": {"format": "jsonlines"},
            "s3://systems_data/systems_historical_results/FILE2.jsonl": {"format": "jsonlines"},
        }
    }
In Scrapy Cloud, the settings drop-down lists FEED_URI but not FEEDS. As far as I can tell, FEED_URI only allows writing to one location.
How do I write scraped data to multiple locations in Scrapy Cloud?
Locally, I am using macOS, Scrapy 2.9.0, and Python 3.8.8.
You can use FEEDS in Scrapy Cloud as well. It accepts arbitrary settings, not only the ones listed in the drop-down.
To enter an arbitrary setting name, select the first entry in the drop-down list, “Custom Name”.
You can also use the “Raw Settings” tab to edit all your settings as plain text, which is sometimes easier.
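If you enter FEEDS by hand, it may help to generate the value as single-line JSON first. The sketch below is only an illustration: it assumes Scrapy Cloud hands the setting value to Scrapy as a string and that Scrapy decodes a JSON string for dict-type settings such as FEEDS. The helper name feeds_as_setting_value is hypothetical, and the exact syntax expected by the "Raw Settings" tab is not shown here, so check it in the UI.

```python
import json

# The two destinations from the question; the bucket and file names are the asker's.
FEEDS = {
    "s3://systems_data/systems_sample_results_page/FILE1.jsonl": {"format": "jsonlines"},
    "s3://systems_data/systems_historical_results/FILE2.jsonl": {"format": "jsonlines"},
}


def feeds_as_setting_value(feeds: dict) -> str:
    """Serialize a FEEDS dict to compact, single-line JSON (hypothetical helper)."""
    return json.dumps(feeds, separators=(",", ":"))


value = feeds_as_setting_value(FEEDS)

# Round-trip check: decoding the string gives back the same dict,
# so nothing is lost when the setting is stored as text.
assert json.loads(value) == FEEDS

# Paste this string as the value of a "Custom Name" setting called FEEDS.
print(value)
```

Keeping the dict in Python and serializing it avoids quoting mistakes that are easy to make when typing nested JSON into a form field.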
Aaron McGarvey
Adrian Chaves