The `stockInfo.py` contains:
```python
import scrapy
import re
import pkgutil


class QuotesSpider(scrapy.Spider):
    name = "stockInfo"
    # Read the URL list bundled with the project package.
    data = pkgutil.get_data("tutorial", "resources/urls.txt")
    data = data.decode()
    start_urls = data.split("\r\n")

    def parse(self, response):
        # The six-digit company code is taken from the URL.
        company = re.findall("[0-9]{6}", response.url)[0]
        filename = '%s_info.html' % company
        # Save the raw page to the current working directory.
        with open(filename, 'wb') as f:
            f.write(response.body)
```
To execute the spider `stockInfo` in Windows cmd:

```
d:
cd tutorial
scrapy crawl stockInfo
```
Now every webpage for the URLs in `resources/urls.txt` is downloaded to the local directory `d:/tutorial`.
Then I deploy the spider to Scrapinghub and run the `stockInfo` spider.
No error occurs, but where are the downloaded webpages?
How are the following lines executed on Scrapinghub?

```python
with open(filename, 'wb') as f:
    f.write(response.body)
```

How can I save the data on Scrapinghub and download it from Scrapinghub when the job is finished?
thriveni (Admin) posted almost 6 years ago · Best Answer
There's no write access to Scrapy Cloud. You do have access to the /scrapinghub and /tmp folders, but they are cleared after the job run. Instead, you'll need to use one of the supported remote storages, S3 or GCS, either through the Files pipeline or through Feed Exports, as described in https://docs.scrapy.org/en/latest/topics/feed-exports.html#storages and https://docs.scrapy.org/en/latest/topics/media-pipeline.html?highlight=gcs#supported-storage.
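A minimal sketch of the Feed Export route, assuming Scrapy 2.1+ (older releases use the `FEED_URI`/`FEED_FORMAT` settings instead of `FEEDS`) and an S3 bucket; the bucket name and credentials below are placeholders, and s3:// feed storage requires `botocore` to be installed. Instead of writing files to disk, the spider yields the page content as items and the feed export uploads them when the job finishes:

```python
import re
import pkgutil
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "stockInfo"
    data = pkgutil.get_data("tutorial", "resources/urls.txt").decode()
    start_urls = data.split("\r\n")

    # Per-spider feed configuration; bucket name and credentials are placeholders.
    custom_settings = {
        "FEEDS": {
            "s3://my-example-bucket/stockInfo/%(time)s.jl": {"format": "jsonlines"},
        },
        "AWS_ACCESS_KEY_ID": "<your-access-key>",
        "AWS_SECRET_ACCESS_KEY": "<your-secret-key>",
    }

    def parse(self, response):
        company = re.findall("[0-9]{6}", response.url)[0]
        # Each yielded item becomes one JSON line in the exported feed,
        # so the pages can be downloaded from the bucket after the job ends.
        yield {"company": company, "html": response.text}
```

A gs:// URI can be used similarly for GCS on newer Scrapy versions, and rather than hard-coding credentials you can usually supply them as spider settings in the Scrapy Cloud UI.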