Why Are Not All Scrapy Cloud Units Used to Run Jobs?

Modified on Tue, 20 Apr, 2021 at 2:42 PM

Because it may take up to 2 minutes for the spider run to stop completely and the Unit's resources to be released.


This issue is often observed in projects running very fast jobs, with runtimes of around 1 minute. A job may appear as finished while its resources are still being released, i.e. the Unit is not yet available.


We are aware of the issue, and improving this process is on our internal roadmap.


In the interim, we recommend aiming for longer-running jobs (within reasonable limits). For instance, it may be possible to aggregate a number of extraction tasks into a single spider run, rather than scheduling a multitude of spider runs with one task each.
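
As a rough illustration of the aggregation idea, the sketch below groups a list of extraction tasks (here, URLs) into batches, so that each scheduled job handles several tasks instead of one. It is only a minimal example under stated assumptions: the helper names `batch_tasks` and `build_job_args` and the `start_urls` argument name are hypothetical, and your spider would need to accept such an argument and split it itself. Adjust the batch size so that each job runs for comfortably longer than the release delay described above.

```python
from typing import Dict, Iterator, List, Sequence


def batch_tasks(tasks: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` tasks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(tasks), batch_size):
        yield list(tasks[start:start + batch_size])


def build_job_args(tasks: Sequence[str], batch_size: int) -> List[Dict[str, str]]:
    """Return one argument dict per job.

    Each dict carries a comma-separated batch of URLs under the
    hypothetical spider argument name "start_urls".
    """
    return [
        {"start_urls": ",".join(batch)}
        for batch in batch_tasks(tasks, batch_size)
    ]


if __name__ == "__main__":
    # Ten tasks become two jobs of five tasks each, instead of ten short jobs.
    urls = [f"https://example.com/page/{i}" for i in range(1, 11)]
    for job_args in build_job_args(urls, batch_size=5):
        print(job_args)
```

Each resulting dict could then be passed as the arguments of one scheduled job, so that fewer, longer jobs occupy the Units.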
