Understanding Job Outcomes

Modified on Fri, 12 Feb, 2021 at 2:00 AM

The job outcome indicates whether the job succeeded or failed. By default, it contains the value of the spider close reason from Scrapy. It is shown in the table of completed jobs.

Available job outcomes

Here is a summary of the most common job outcomes.

finished

The job finished successfully. However, it may have produced errors, which you can inspect through the logs.

failed

The job failed to start, typically due to a bug in the spider’s code. Check the last lines of the job log for more information.

cancelled

The job was cancelled from the dashboard, through the API, or by the system because it became inactive and produced nothing (not even log entries) for an hour.

cancelled (24h limit)

The job was cancelled because it exceeded the 24-hour time limit imposed on free organizations. The number in the outcome may differ, for example if the limit changes in the future.

cancelled (stalled)

The job was cancelled because it was in the running state but produced no logs, requests, or items for 1 hour.

cancel_timeout

The job failed to shut down gracefully after cancellation (it took more than 5 minutes).

shutdown

The spider was cancelled prematurely, typically from code. shutdown is the default close reason (outcome) used by Scrapy for such cases. It is what you get, for example, when you cancel a Scrapy spider by pressing Ctrl-C.

memusage_exceeded

The job consumed too much memory, exceeding the limit (1 GB per unit), and was cancelled by the system. This typically happens with spiders that don’t use memory efficiently (keeping state or references that grow quickly over time), and it most often shows up on long runs that crawl many pages. This outcome is triggered by Scrapy’s Memory Usage Extension.
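To illustrate what "keeping state or references that grow quickly over time" can look like, here is a minimal sketch in plain Python. The function names and the item structure are hypothetical and not taken from any real spider; the point is only the contrast between accumulating results and yielding them one at a time.

```python
from typing import Dict, Iterable, Iterator, List


def parse_accumulating(pages: Iterable[str]) -> List[Dict[str, object]]:
    # Pattern to avoid: every item stays referenced until the function
    # returns, so memory grows with the number of pages processed.
    items = []
    for page in pages:
        items.append({"url": page, "length": len(page)})
    return items


def parse_streaming(pages: Iterable[str]) -> Iterator[Dict[str, object]]:
    # Each item is handed to the caller as soon as it is built;
    # this function keeps no reference to items it has already yielded.
    for page in pages:
        yield {"url": page, "length": len(page)}
```

Both functions produce the same items. In a Scrapy callback, the streaming form corresponds to yielding items and requests as they are created instead of collecting them on the spider instance.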

killed by oom

The job was killed because it tried to consume more memory than was available to the process. This may happen if Scrapy’s Memory Usage Extension is disabled, or when memory usage grows so fast that the extension cannot finish the process gracefully and set the memusage_exceeded outcome. Available memory is proportional to the number of units used to run the job.

banned

The job was terminated because the spider got banned from the target website. This outcome is often set by the Zyte Smart Proxy Manager (formerly Crawlera) extension.

slybot_fewitems_scraped

This outcome is specific to Portia spiders. The job was cancelled because it wasn’t scraping enough items. This is used in Portia to prevent infinite crawling loops. See Minimum items threshold for more details.

Note: Portia is no longer available for new users. It has been disabled for all new organisations from August 20, 2018 onward.

closespider_*

The closespider_errorcount, closespider_pagecount, closespider_timeout and closespider_itemcount outcomes are set by Scrapy’s CloseSpider extension. Refer to its documentation for more details.
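As a hedged illustration, the fragment below shows how these conditions might be enabled in a project's settings.py. The setting names are Scrapy's own; the threshold values are arbitrary placeholders chosen for this example. In Scrapy, each of these settings defaults to 0, which leaves the corresponding condition disabled.

```python
# settings.py (fragment); the values are illustrative assumptions only

# Close the spider after it has been open for this many seconds.
# Resulting outcome: closespider_timeout
CLOSESPIDER_TIMEOUT = 3600

# Close the spider after this many items have passed the item pipeline.
# Resulting outcome: closespider_itemcount
CLOSESPIDER_ITEMCOUNT = 1000

# Close the spider after this many responses have been crawled.
# Resulting outcome: closespider_pagecount
CLOSESPIDER_PAGECOUNT = 5000

# Close the spider after this many errors have been raised.
# Resulting outcome: closespider_errorcount
CLOSESPIDER_ERRORCOUNT = 10
```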

project_deleted

The job was killed because the project was deleted from Scrapy Cloud.
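A monitoring script that reads job outcomes might want to group them into broader categories. The following is a minimal sketch, assuming the outcome strings appear exactly as listed above. The function name classify_outcome and the category labels are hypothetical and not part of Scrapy Cloud. Because the number in "cancelled (24h limit)" may change, the sketch matches any "cancelled (...)" variant by pattern instead of by exact string.

```python
import re

# Matches "cancelled (24h limit)", "cancelled (stalled)", and similar variants.
_CANCELLED_WITH_DETAIL = re.compile(r"^cancelled \(.+\)$")


def classify_outcome(outcome: str) -> str:
    """Map a job outcome string to a coarse, hypothetical category."""
    if outcome == "finished":
        return "ok"
    if outcome == "failed":
        return "failed"
    if outcome == "banned":
        return "banned"
    if outcome in ("memusage_exceeded", "killed by oom"):
        return "memory"
    if outcome in ("cancelled", "cancel_timeout", "shutdown"):
        return "cancelled"
    if _CANCELLED_WITH_DETAIL.match(outcome):
        return "cancelled"
    if outcome.startswith("closespider_"):
        return "limit"
    # Custom close reasons and anything not listed end up here.
    return "other"
```

Note that "ok" only reflects the finished outcome. As described above, a finished job may still have logged errors, so a script built on this would likely check the logs as well.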

Deprecated job outcomes

The following outcomes were used in Scrapy Cloud but have been removed and should no longer be set.

no_reason

The job finished successfully but did not set an outcome explicitly. For Scrapy jobs, the outcome is taken from the spider close reason (which defaults to finished). Non-Scrapy jobs have no such default, so they received this outcome instead.