Can I Configure My Periodic Job To Run a Spider Every Minute?

Modified on Wed, 3 Feb, 2021 at 8:36 AM

When a job is launched in Scrapy Cloud, a container is created first and then the job starts. Creating the container takes some time, up to a minute or two. If you configure a periodic job to run a spider every minute, you may find that it doesn't start every minute and skips a run now and then. This happens because the job scheduler detects that the "same" job is still running and does not schedule a new one, since the same job cannot be scheduled more than once at the same time.
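The skipping behaviour can be illustrated with a small toy model. This is only a sketch, not Scrapy Cloud's actual scheduler: the function `simulate` and the chosen numbers (1.5 minutes of container startup, 0.5 minutes of spider runtime) are assumptions made for illustration.

```python
def simulate(interval_min, startup_min, runtime_min, total_min):
    """Return the minutes at which a periodic job actually starts.

    Toy model (assumption): a tick is skipped while the previous job,
    meaning container startup plus spider runtime, is still running.
    """
    started = []
    busy_until = 0.0  # minute at which the previous job finishes
    for tick in range(0, total_min, interval_min):
        if tick >= busy_until:
            started.append(tick)
            busy_until = tick + startup_min + runtime_min
        # Otherwise the "same" job is still running, so this tick is skipped.
    return started


# Every-minute schedule, 1.5 min startup, 0.5 min runtime, 10 minutes total.
launches = simulate(interval_min=1, startup_min=1.5, runtime_min=0.5, total_min=10)
print(launches)  # [0, 2, 4, 6, 8]
```

In this model, every second tick is skipped even though the spider itself runs for only half a minute, because the container startup time counts toward the period in which the job is considered running.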

Even if a job's runtime is under a minute, it's not possible to guarantee that the periodic job will execute every minute. We are working on reducing the time it takes to "spawn" a container; however, we can't provide an ETA at this stage.

Here are some things you can try to work around this limitation:

Launch jobs that run longer than a couple of minutes (recommended)
Schedule a job every 3 minutes (a 3-minute interval may not be 100% consistent either, but it will be more reliable than every minute)
Launch jobs with different arguments (the same job can be scheduled more than once only when different arguments are passed to it)
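The last workaround can be pictured with another toy model. It assumes, based on the description above, that two jobs count as the "same" job when both the spider name and the arguments match. The function `schedule`, the spider name `myspider`, and the argument `slot` are hypothetical names used only for illustration; this is not Scrapy Cloud's API.

```python
def schedule(requests, running):
    """Accept each (spider, args) request unless an identical job is running.

    Toy model (assumption): job identity is the spider name plus its arguments.
    """
    accepted = []
    for spider, args in requests:
        key = (spider, tuple(sorted(args.items())))
        if key in running:
            continue  # the "same" job is already running, so it is not scheduled
        running.add(key)
        accepted.append((spider, args))
    return accepted


running = set()
requests = [
    ("myspider", {"slot": "a"}),
    ("myspider", {"slot": "a"}),  # identical spider and arguments: rejected
    ("myspider", {"slot": "b"}),  # same spider, different arguments: accepted
]
accepted = schedule(requests, running)
print(len(accepted))  # 2
```

Under this assumption, alternating an argument value between runs lets a second job start while the first is still running.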