The `stockInfo.py` contains:
import scrapy
import re
import pkgutil

class QuotesSpider(scrapy.Spider):
    name = "stockInfo"

    # Load the start URLs from a text file packaged inside the project.
    data = pkgutil.get_data("tutorial", "resources/urls.txt")
    data = data.decode()
    start_urls = data.split("\r\n")

    def parse(self, response):
        # The six-digit stock code embedded in the URL.
        company = re.findall("[0-9]{6}", response.url)[0]
        filename = '%s_info.html' % company
        # Write the raw page body to a local file.
        with open(filename, 'wb') as f:
            f.write(response.body)
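Note that `pkgutil.get_data("tutorial", "resources/urls.txt")` reads the file relative to the installed `tutorial` package, so `urls.txt` must be shipped inside that package. A hypothetical project layout that would make this work (only `urls.txt` and `stockInfo.py` are taken from the question, the rest is the usual Scrapy skeleton):

    d:/tutorial/
        scrapy.cfg
        tutorial/
            __init__.py
            resources/
                urls.txt        # one URL per line
            spiders/
                __init__.py
                stockInfo.py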
To execute the spider `stockInfo` in the Windows cmd:
d:
cd tutorial
scrapy crawl stockInfo
Now the webpage for every URL in `resources/urls.txt` will be downloaded to the local directory `d:/tutorial`.
Then I deploy the spider to `Scrapinghub` and run the `stockInfo` spider.
No error occurs, but where are the downloaded webpages?
How are the following lines executed on `Scrapinghub`?
with open(filename, 'wb') as f:
    f.write(response.body)
How can I save the data on Scrapinghub, and how can I download it from Scrapinghub once the job is finished?
thriveni posted over 5 years ago (Admin, Best Answer)
There is no write access on Scrapy Cloud. You do have access to the /scrapinghub and /tmp folders, but they are cleared after the job run. Instead, you'll need to use one of the supported file storages provided by the Files pipeline, S3, or GCS, using Feed Exports as described in https://docs.scrapy.org/en/latest/topics/feed-exports.html#storages and https://docs.scrapy.org/en/latest/topics/media-pipeline.html?highlight=gcs#supported-storage.
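As a concrete illustration of that advice, the sketch below yields the scraped page as an item and lets a feed export deliver it to S3. It assumes Scrapy 2.1+ (where the `FEEDS` setting was introduced) and uses a placeholder bucket name and credentials rather than real values:

    # In the spider: yield an item instead of writing to the local filesystem.
    def parse(self, response):
        company = re.findall("[0-9]{6}", response.url)[0]
        # Feed exports serialize each yielded item and upload the result
        # to the storage configured in settings.py.
        yield {"company": company, "html": response.text}

    # In settings.py: export all items to an S3 bucket as JSON lines
    # ("my-bucket" and the credentials below are placeholders).
    FEEDS = {
        "s3://my-bucket/stockInfo/%(time)s.jl": {"format": "jsonlines"},
    }
    AWS_ACCESS_KEY_ID = "<your-access-key>"
    AWS_SECRET_ACCESS_KEY = "<your-secret-key>"

S3 feed storage requires the botocore library to be installed; on Scrapy versions older than 2.1 the equivalent settings are `FEED_URI` and `FEED_FORMAT`. Once the job finishes, the exported `.jl` file can be downloaded straight from the bucket.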