
Items not returned in Scrapy Cloud

Hi all,


Sorry if this is a silly question. 


I'm not getting any items returned when I run the spider on Scrapy Cloud. It works fine on my local machine.


spider.py file

import scrapy
from ..items import AmazonItem

class AmazonSpiderSpider(scrapy.Spider):
    name = 'amazon_reviews'
    domain = 'https://amazon.com.au/'
    asin = 'B07KSCJNPB'
    product = 'LOr-Espresso-Coffee-Lungo-Estremo'
    start_urls = [domain+product+'/product-reviews/'+asin]

    def parse(self, response):
        reviews = response.xpath("//div[@class='a-section review aok-relative']")

        for review in reviews:
            # Build a fresh item per review so yielded items do not share state.
            item = AmazonItem()
            item['review_rating'] = review.css('.review-rating').css('::text').extract()
            item['review_title'] = review.css('.a-text-bold span').css('::text').extract()
            item['review_date'] = review.css('.review-date').css('::text').extract()
            item['review_text'] = review.css('.review-text-content span').css('::text').extract()
            item['review_asin'] = self.asin
            item['review_product'] = self.product

            yield item

        # Follow pagination once per page, after all reviews have been yielded.
        next_page = response.css('li.a-last a::attr(href)').get()

        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

scrapinghub.yml


requirements_file: requirements.txt
stacks:
  default: scrapy:1.7-py3

  

settings.py


# See also autothrottle settings and docs
DOWNLOAD_DELAY = 3

# Disable cookies (enabled by default)
COOKIES_ENABLED = False

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
AUTOTHROTTLE_ENABLED = True

I have tried enabling and disabling AutoThrottle. The spider returned items the first time I disabled it, but it has not returned any since.


The log file is

2020-04-30 00:50:18 INFO Log opened.
2020-04-30 00:50:18 INFO [scrapy.utils.log] Scrapy 1.7.4 started (bot: amazon)
2020-04-30 00:50:18 INFO [scrapy.utils.log] Versions: lxml 4.4.1.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.7.0, Python 3.7.3 (default, Mar 27 2019, 23:40:30) - [GCC 6.3.0 20170516], pyOpenSSL 19.0.0 (OpenSSL 1.1.1c  28 May 2019), cryptography 2.7, Platform Linux-4.15.0-65-generic-x86_64-with-debian-9.8
2020-04-30 00:50:18 INFO [scrapy.crawler] Overridden settings: {'AUTOTHROTTLE_ENABLED': True, 'BOT_NAME': 'amazon', 'COOKIES_ENABLED': False, 'DOWNLOAD_DELAY': 3, 'LOG_ENABLED': False, 'LOG_LEVEL': 'INFO', 'MEMUSAGE_LIMIT_MB': 950, 'NEWSPIDER_MODULE': 'amazon.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['amazon.spiders'], 'STATS_CLASS': 'sh_scrapy.stats.HubStorageStatsCollector', 'TELNETCONSOLE_HOST': '0.0.0.0'}
2020-04-30 00:50:18 INFO [scrapy.extensions.telnet] Telnet Password: 4e6bdc1b1882c95d
2020-04-30 00:50:19 INFO [scrapy.middleware] Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.spiderstate.SpiderState',
 'scrapy.extensions.throttle.AutoThrottle',
 'scrapy.extensions.debug.StackTraceDump',
 'sh_scrapy.extension.HubstorageExtension']
2020-04-30 00:50:20 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Android
2020-04-30 00:50:21 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:21 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:21 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:22 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Sony BDV13, Brand: Sony, Model: BDV13
2020-04-30 00:50:22 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Nintendo DSi, Brand: Nintendo, Model: DSi
2020-04-30 00:50:22 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:22 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Other
2020-04-30 00:50:22 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: LYF F90M, Brand: LYF, Model: F90M
2020-04-30 00:50:22 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Sony BDV14, Brand: Sony, Model: BDV14
2020-04-30 00:50:23 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: LG Web0S SmartTV, Brand: LG, Model: Web0S SmartTV
2020-04-30 00:50:23 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:23 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:23 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Sony BDV11, Brand: Sony, Model: BDV11
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:25 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Nintendo DSi, Brand: Nintendo, Model: DSi
2020-04-30 00:50:25 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: LYF LF-2403N, Brand: LYF, Model: LF-2403N
2020-04-30 00:50:25 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: LYF F90M, Brand: LYF, Model: F90M
2020-04-30 00:50:26 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:26 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:26 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Applebot
2020-04-30 00:50:26 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:27 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Applebot
2020-04-30 00:50:27 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Other
2020-04-30 00:50:27 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Other
2020-04-30 00:50:28 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: PhantomJS
2020-04-30 00:50:28 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: SMTBot
2020-04-30 00:50:28 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: PhantomJS
2020-04-30 00:50:28 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: PhantomJS
2020-04-30 00:50:28 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: WebKit Nightly
2020-04-30 00:50:29 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: PhantomJS
2020-04-30 00:50:29 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:29 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: PhantomJS
2020-04-30 00:50:29 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: PhantomJS
2020-04-30 00:50:29 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Robot
2020-04-30 00:50:29 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:30 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Zune
2020-04-30 00:50:30 INFO [scrapy.middleware] Enabled downloader middlewares:
['sh_scrapy.diskquota.DiskQuotaDownloaderMiddleware',
 'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy_user_agents.middlewares.RandomUserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats',
 'sh_scrapy.middlewares.HubstorageDownloaderMiddleware']
2020-04-30 00:50:30 INFO [scrapy.middleware] Enabled spider middlewares:
['sh_scrapy.diskquota.DiskQuotaSpiderMiddleware',
 'sh_scrapy.middlewares.HubstorageSpiderMiddleware',
 'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-04-30 00:50:30 INFO [scrapy.middleware] Enabled item pipelines:
[]
2020-04-30 00:50:30 INFO [scrapy.core.engine] Spider opened
2020-04-30 00:50:30 INFO [scrapy.extensions.logstats] Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-04-30 00:50:30 INFO TelnetConsole starting on 6023
2020-04-30 00:50:30 INFO [scrapy.extensions.telnet] Telnet console listening on 0.0.0.0:6023
2020-04-30 00:50:44 INFO [scrapy.core.engine] Closing spider (finished)
2020-04-30 00:50:44 INFO [scrapy.statscollectors] Dumping Scrapy stats:
{'downloader/request_bytes': 1594,
 'downloader/request_count': 5,
 'downloader/request_method_count/GET': 5,
 'downloader/response_bytes': 4465,
 'downloader/response_count': 5,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/301': 2,
 'elapsed_time_seconds': 14.140176,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 4, 30, 0, 50, 44, 478885),
 'log_count/INFO': 10,
 'log_count/WARNING': 45,
 'memusage/max': 61747200,
 'memusage/startup': 61747200,
 'response_received_count': 3,
 'robotstxt/request_count': 2,
 'robotstxt/response_count': 2,
 'robotstxt/response_status_count/200': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/disk': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/disk': 2,
 'start_time': datetime.datetime(2020, 4, 30, 0, 50, 30, 338709)}
2020-04-30 00:50:44 INFO [scrapy.core.engine] Spider closed (finished)
2020-04-30 00:50:44 INFO (TCP Port 6023 Closed)
2020-04-30 00:50:44 INFO Main loop terminated.
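
In case it helps anyone reading the stats dump, here is a rough sketch of how I am interpreting the numbers. The helper name `summarize_stats` is made up for illustration, and it assumes that robots.txt responses are counted in `response_received_count` and that a missing `item_scraped_count` key means zero items were scraped. The values are copied from the log above.

```python
def summarize_stats(stats):
    """Summarize a Scrapy stats dict (hypothetical helper, not part of Scrapy)."""
    responses = stats.get("response_received_count", 0)
    robots = stats.get("robotstxt/response_count", 0)
    return {
        # Assumption: robots.txt responses are included in the received count.
        "page_responses": responses - robots,
        # Scrapy only records this key once at least one item is scraped.
        "items_scraped": stats.get("item_scraped_count", 0),
        "redirects": stats.get("downloader/response_status_count/301", 0),
        "total_response_bytes": stats.get("downloader/response_bytes", 0),
    }


# Values copied from the stats dump in the log.
stats = {
    "downloader/response_bytes": 4465,
    "downloader/response_count": 5,
    "downloader/response_status_count/200": 3,
    "downloader/response_status_count/301": 2,
    "response_received_count": 3,
    "robotstxt/request_count": 2,
    "robotstxt/response_count": 2,
}

summary = summarize_stats(stats)
print(summary)
```

Read this way, only one non-robots.txt response reached the spider and no items were scraped from it. I can't tell from the log alone whether that response was the real review page.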

 


I welcome any thoughts and help from the community!


Thanks,

Adi

