Spider takes 30 seconds to start


I've got a spider on Scrapy Cloud that I request to start at 2021-04-10 12:05:45 UTC.

However, my logs show that the spider doesn't even start until ~30 seconds later each time, and then spends another ~15 seconds initializing before it actually starts crawling. For my application, I really need to shorten this delay.

Is there any way to minimize this delay, especially the time between my start command and the process actually starting?


2021-04-10 12:06:21 INFO Log opened.
2021-04-10 12:06:21 INFO [scrapy.log] Scrapy 1.3.3 started
2021-04-10 12:06:21 INFO [scrapy.utils.log] Scrapy 1.3.3 started (bot: stocknews)
2021-04-10 12:06:21 INFO [scrapy.utils.log] Overridden settings: {'NEWSPIDER_MODULE': 'stocknews.spiders', 'STATS_CLASS': 'sh_scrapy.stats.HubStorageStatsCollector', 'LOG_LEVEL': 'INFO', 'SPIDER_MODULES': ['stocknews.spiders'], 'AUTOTHROTTLE_ENABLED': True, 'LOG_ENABLED': False, 'MEMUSAGE_LIMIT_MB': 950, 'TELNETCONSOLE_HOST': '', 'BOT_NAME': 'stocknews', 'MEMUSAGE_ENABLED': True}
2021-04-10 12:06:21 INFO [scrapy_dotpersistence] Syncing .scrapy directory from s3://scrapinghub-app-dash-addons/org-125731/184920/dot-scrapy/sec/
2021-04-10 12:06:34 INFO [scrapy.middleware] Enabled extensions:
2021-04-10 12:06:34 INFO [scrapy.middleware] Enabled downloader middlewares:
2021-04-10 12:06:34 INFO [scrapy.middleware] Enabled spider middlewares:
2021-04-10 12:06:34 INFO [scrapy.middleware] Enabled item pipelines:
2021-04-10 12:06:34 INFO [scrapy.core.engine] Spider opened
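
To put exact numbers on the two delays, here is a minimal sketch that computes them from the timestamps quoted in this post. The variable and function names are only illustrative, and it assumes the log timestamps are in UTC like the requested start time.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the post and the log above (assumed to all be UTC).
requested = datetime.strptime("2021-04-10 12:05:45", FMT)      # start requested
log_opened = datetime.strptime("2021-04-10 12:06:21", FMT)     # "Log opened."
spider_opened = datetime.strptime("2021-04-10 12:06:34", FMT)  # "Spider opened"


def seconds_between(start, end):
    """Return the whole number of seconds from start to end."""
    return int((end - start).total_seconds())


startup_delay = seconds_between(requested, log_opened)   # request -> process start
init_delay = seconds_between(log_opened, spider_opened)  # process start -> spider opened

print(f"request to log opened: {startup_delay}s")
print(f"log opened to spider opened: {init_delay}s")
print(f"total before crawling: {startup_delay + init_delay}s")
```

For this run the gaps come out to 36 seconds before the process starts and 13 seconds between "Log opened" and "Spider opened". Within that second gap, the log shows nothing between the scrapy_dotpersistence sync line at 12:06:21 and the next entry at 12:06:34.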