Hi all,
Sorry if this is a silly question.
I'm not getting any items returned when I run the spider on Scrapy Cloud, but it works fine on my local machine.
spider.py file
import scrapy
from ..items import AmazonItem


class AmazonSpiderSpider(scrapy.Spider):
    name = 'amazon_reviews'
    domain = 'https://amazon.com.au/'
    asin = 'B07KSCJNPB'
    product = 'LOr-Espresso-Coffee-Lungo-Estremo'
    start_urls = [domain + product + '/product-reviews/' + asin]

    def parse(self, response):
        reviews = response.xpath("//div[@class='a-section review aok-relative']")
        review_asin = self.asin
        review_product = self.product

        for review in reviews:
            review_rating = review.css('.review-rating').css('::text').extract()
            review_title = review.css('.a-text-bold span').css('::text').extract()
            review_date = review.css('.review-date').css('::text').extract()
            review_text = review.css('.review-text-content span').css('::text').extract()

            # Create a new item for every review instead of reusing one object
            item = AmazonItem()
            item['review_rating'] = review_rating
            item['review_title'] = review_title
            item['review_date'] = review_date
            item['review_text'] = review_text
            item['review_asin'] = review_asin
            item['review_product'] = review_product
            yield item

        # Follow the "Next page" link, if there is one
        next_page = response.css('li.a-last a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

scrapinghub.yml
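Separately from the Scrapy Cloud problem, the spider should create a new AmazonItem for every review rather than a single one before the loop. Each `yield` hands out a reference to the same object, so anything that keeps the yielded items around ends up with several references to one object holding only the last review's fields. This cannot explain why no items come back at all, but it is worth fixing. The sketch below assumes plain dicts behave like AmazonItem for field assignment and uses hypothetical sample reviews instead of real selectors, so it runs without Scrapy.

```python
# Plain dicts stand in for AmazonItem (assumption); the reviews are hypothetical sample data.
reviews = [
    {"rating": "5.0 out of 5 stars", "title": "Great"},
    {"rating": "2.0 out of 5 stars", "title": "Too bitter"},
]


def parse_shared(reviews):
    """One item created before the loop and mutated for each review."""
    item = {}
    for review in reviews:
        item["review_rating"] = review["rating"]
        item["review_title"] = review["title"]
        yield item  # the same object is yielded every time


def parse_fresh(reviews):
    """A new item created inside the loop for each review."""
    for review in reviews:
        item = {}
        item["review_rating"] = review["rating"]
        item["review_title"] = review["title"]
        yield item


shared = list(parse_shared(reviews))
fresh = list(parse_fresh(reviews))

print([i["review_title"] for i in shared])  # ['Too bitter', 'Too bitter']
print([i["review_title"] for i in fresh])   # ['Great', 'Too bitter']
```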
settings.py
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 3
# Disable cookies (enabled by default)
COOKIES_ENABLED = False
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
AUTOTHROTTLE_ENABLED = True

I have tried enabling and disabling AutoThrottle. The spider returned items the first time I disabled it, but it has not returned any since.
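Since AUTOTHROTTLE_ENABLED and DOWNLOAD_DELAY interact, here is a simplified model of how AutoThrottle adjusts the delay, following the throttling algorithm described in the AutoThrottle docs linked in settings.py. It assumes the documented defaults for AUTOTHROTTLE_START_DELAY (5.0), AUTOTHROTTLE_MAX_DELAY (60.0) and AUTOTHROTTLE_TARGET_CONCURRENCY (1.0). The function names are made up for illustration, and the real extension keeps a separate delay per download slot.

```python
# Simplified model of AutoThrottle's delay adjustment (not Scrapy's actual code).
# Assumed defaults: AUTOTHROTTLE_START_DELAY = 5.0, AUTOTHROTTLE_MAX_DELAY = 60.0,
# AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0. DOWNLOAD_DELAY = 3 comes from settings.py.
DOWNLOAD_DELAY = 3.0
START_DELAY = 5.0
MAX_DELAY = 60.0
TARGET_CONCURRENCY = 1.0


def start_delay():
    """The initial delay never starts below DOWNLOAD_DELAY."""
    return max(DOWNLOAD_DELAY, START_DELAY)


def next_delay(current, latency, status):
    """Return the delay for the next request after receiving one response."""
    target = latency / TARGET_CONCURRENCY
    # Average of the previous delay and the target delay
    new = (current + target) / 2.0
    # Clamp between DOWNLOAD_DELAY and AUTOTHROTTLE_MAX_DELAY
    new = min(max(DOWNLOAD_DELAY, new), MAX_DELAY)
    # Non-200 responses may raise the delay but never lower it
    if status != 200 and new <= current:
        return current
    return new


delay = start_delay()                 # 5.0
delay = next_delay(delay, 1.0, 200)   # (5.0 + 1.0) / 2 = 3.0
delay = next_delay(delay, 1.0, 200)   # 2.0 is clamped up to 3.0
delay = next_delay(delay, 10.0, 301)  # (3.0 + 10.0) / 2 = 6.5
print(delay)  # 6.5
```

In this model, DOWNLOAD_DELAY = 3 is the lower bound for the delay AutoThrottle computes. Either way, AutoThrottle only changes the pacing of requests, not which URLs are requested.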
The log file is:
2020-04-30 00:50:18 INFO Log opened.
2020-04-30 00:50:18 INFO [scrapy.utils.log] Scrapy 1.7.4 started (bot: amazon)
2020-04-30 00:50:18 INFO [scrapy.utils.log] Versions: lxml 4.4.1.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.7.0, Python 3.7.3 (default, Mar 27 2019, 23:40:30) - [GCC 6.3.0 20170516], pyOpenSSL 19.0.0 (OpenSSL 1.1.1c 28 May 2019), cryptography 2.7, Platform Linux-4.15.0-65-generic-x86_64-with-debian-9.8
2020-04-30 00:50:18 INFO [scrapy.crawler] Overridden settings: {'AUTOTHROTTLE_ENABLED': True, 'BOT_NAME': 'amazon', 'COOKIES_ENABLED': False, 'DOWNLOAD_DELAY': 3, 'LOG_ENABLED': False, 'LOG_LEVEL': 'INFO', 'MEMUSAGE_LIMIT_MB': 950, 'NEWSPIDER_MODULE': 'amazon.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['amazon.spiders'], 'STATS_CLASS': 'sh_scrapy.stats.HubStorageStatsCollector', 'TELNETCONSOLE_HOST': '0.0.0.0'}
2020-04-30 00:50:18 INFO [scrapy.extensions.telnet] Telnet Password: 4e6bdc1b1882c95d
2020-04-30 00:50:19 INFO [scrapy.middleware] Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.logstats.LogStats', 'scrapy.extensions.spiderstate.SpiderState', 'scrapy.extensions.throttle.AutoThrottle', 'scrapy.extensions.debug.StackTraceDump', 'sh_scrapy.extension.HubstorageExtension']
2020-04-30 00:50:20 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Android
2020-04-30 00:50:21 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:21 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:21 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:22 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Sony BDV13, Brand: Sony, Model: BDV13
2020-04-30 00:50:22 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Nintendo DSi, Brand: Nintendo, Model: DSi
2020-04-30 00:50:22 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:22 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Other
2020-04-30 00:50:22 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: LYF F90M, Brand: LYF, Model: F90M
2020-04-30 00:50:22 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Sony BDV14, Brand: Sony, Model: BDV14
2020-04-30 00:50:23 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: LG Web0S SmartTV, Brand: LG, Model: Web0S SmartTV
2020-04-30 00:50:23 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:23 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:23 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Sony BDV11, Brand: Sony, Model: BDV11
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:24 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:25 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Nintendo DSi, Brand: Nintendo, Model: DSi
2020-04-30 00:50:25 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: LYF LF-2403N, Brand: LYF, Model: LF-2403N
2020-04-30 00:50:25 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: LYF F90M, Brand: LYF, Model: F90M
2020-04-30 00:50:26 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:26 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:26 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Applebot
2020-04-30 00:50:26 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:27 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Applebot
2020-04-30 00:50:27 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Other
2020-04-30 00:50:27 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Other
2020-04-30 00:50:28 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: PhantomJS
2020-04-30 00:50:28 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: SMTBot
2020-04-30 00:50:28 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: PhantomJS
2020-04-30 00:50:28 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: PhantomJS
2020-04-30 00:50:28 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: WebKit Nightly
2020-04-30 00:50:29 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: PhantomJS
2020-04-30 00:50:29 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:29 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: PhantomJS
2020-04-30 00:50:29 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: PhantomJS
2020-04-30 00:50:29 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Robot
2020-04-30 00:50:29 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-04-30 00:50:30 WARNING [scrapy_user_agents.user_agent_picker] [UnsupportedBrowserType] Family: Zune
2020-04-30 00:50:30 INFO [scrapy.middleware] Enabled downloader middlewares: ['sh_scrapy.diskquota.DiskQuotaDownloaderMiddleware', 'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy_user_agents.middlewares.RandomUserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats', 'sh_scrapy.middlewares.HubstorageDownloaderMiddleware']
2020-04-30 00:50:30 INFO [scrapy.middleware] Enabled spider middlewares: ['sh_scrapy.diskquota.DiskQuotaSpiderMiddleware', 'sh_scrapy.middlewares.HubstorageSpiderMiddleware', 'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-04-30 00:50:30 INFO [scrapy.middleware] Enabled item pipelines: []
2020-04-30 00:50:30 INFO [scrapy.core.engine] Spider opened
2020-04-30 00:50:30 INFO [scrapy.extensions.logstats] Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-04-30 00:50:30 INFO TelnetConsole starting on 6023
2020-04-30 00:50:30 INFO [scrapy.extensions.telnet] Telnet console listening on 0.0.0.0:6023
2020-04-30 00:50:44 INFO [scrapy.core.engine] Closing spider (finished)
2020-04-30 00:50:44 INFO [scrapy.statscollectors] Dumping Scrapy stats:
{'downloader/request_bytes': 1594,
 'downloader/request_count': 5,
 'downloader/request_method_count/GET': 5,
 'downloader/response_bytes': 4465,
 'downloader/response_count': 5,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/301': 2,
 'elapsed_time_seconds': 14.140176,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 4, 30, 0, 50, 44, 478885),
 'log_count/INFO': 10,
 'log_count/WARNING': 45,
 'memusage/max': 61747200,
 'memusage/startup': 61747200,
 'response_received_count': 3,
 'robotstxt/request_count': 2,
 'robotstxt/response_count': 2,
 'robotstxt/response_status_count/200': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/disk': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/disk': 2,
 'start_time': datetime.datetime(2020, 4, 30, 0, 50, 30, 338709)}
2020-04-30 00:50:44 INFO [scrapy.core.engine] Spider closed (finished)
2020-04-30 00:50:44 INFO (TCP Port 6023 Closed)
2020-04-30 00:50:44 INFO Main loop terminated.

I welcome any thoughts and help from the community!
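To make the "Dumping Scrapy stats" counters in the log easier to read, the sketch below splits the downloader's 200 responses into robots.txt fetches and page fetches. The values are copied from the log (datetime fields omitted); the `summarize` function and its output keys are hypothetical names. It assumes robots.txt downloads are included in the downloader totals, which the matching `robotstxt/*` counters suggest. Comparing the same numbers from a local run may help narrow down where the two runs diverge.

```python
# Counters copied from the "Dumping Scrapy stats" entry of the log (datetime fields omitted).
stats = {
    "downloader/request_count": 5,
    "downloader/response_bytes": 4465,
    "downloader/response_count": 5,
    "downloader/response_status_count/200": 3,
    "downloader/response_status_count/301": 2,
    "robotstxt/request_count": 2,
    "robotstxt/response_status_count/200": 2,
    "scheduler/dequeued": 2,
}


def summarize(stats):
    """Separate robots.txt responses from page responses (hypothetical helper)."""
    ok_total = stats.get("downloader/response_status_count/200", 0)
    ok_robots = stats.get("robotstxt/response_status_count/200", 0)
    return {
        # 200 responses that were not robots.txt fetches
        "page_200s": ok_total - ok_robots,
        "redirects_301": stats.get("downloader/response_status_count/301", 0),
        # Scrapy only records item_scraped_count once an item is scraped
        "items_scraped": stats.get("item_scraped_count", 0),
        "avg_bytes_per_response": stats["downloader/response_bytes"]
        / stats["downloader/response_count"],
    }


print(summarize(stats))
# {'page_200s': 1, 'redirects_301': 2, 'items_scraped': 0, 'avg_bytes_per_response': 893.0}
```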
Thanks,
Adi