
Cancelled (stalled) job outcome because scrapy_dotpersistence syncing takes over an hour

My job outcome is repeatedly cancelled (stalled) after the scraping is over, while the scrapy_dotpersistence addon stores the .scrapy directory to S3:


[scrapy_dotpersistence] Syncing .scrapy directory to s3://scrapinghub-app-dash-addons/org-176226/[...]/dot-scrapy/immo[...]/
1090: 2017-12-26 17:50:02 INFO [scrapy.crawler] Received SIGTERM, shutting down gracefully. Send again to force

I tried to delete the httpcache folder in the console, but the sync still takes over an hour and the job gets cancelled anyway.
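
Since the long sync is usually driven by a large .scrapy/httpcache directory, one way to attack the root cause is to keep that cache from being written in the first place, using standard Scrapy settings. A minimal settings.py sketch, assuming the cache is not actually needed between runs:

# settings.py -- sketch: keep the persisted .scrapy directory small so the
# post-crawl S3 sync has little to upload.

# Disabling the HTTP cache means .scrapy/httpcache is not written at all,
# so there is almost nothing for scrapy_dotpersistence to sync.
HTTPCACHE_ENABLED = False

# If the cache is genuinely needed, at least avoid storing error responses:
# HTTPCACHE_ENABLED = True
# HTTPCACHE_IGNORE_HTTP_CODES = [500, 502, 503, 504]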


How can I solve this issue? Can I "reset" the S3 folder directly?


Best Answer

Jobs will get cancelled if they're not doing anything for an hour. You could add a log message every hour or so, so that the job doesn't get cancelled.



Thanks for your answer. The spider is closed before the syncing starts. Where can I add this logging?
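
One place such logging can live is a small Scrapy extension that emits a heartbeat from a background daemon thread, so messages keep appearing even while the post-crawl S3 sync blocks the main thread. A minimal sketch, assuming that any log output counts as job activity; the names HeartbeatLogger and HEARTBEAT_INTERVAL are illustrative, not part of any addon:

import logging
import threading
import time

from scrapy import signals

logger = logging.getLogger(__name__)


class HeartbeatLogger:
    """Log a heartbeat from a daemon thread so the job keeps producing output
    even while a blocking task (such as the .scrapy sync) runs after the
    spider has closed."""

    def __init__(self, interval):
        self.interval = interval  # seconds between heartbeat messages

    @classmethod
    def from_crawler(cls, crawler):
        # HEARTBEAT_INTERVAL is an illustrative setting name (default: 10 minutes)
        ext = cls(crawler.settings.getfloat("HEARTBEAT_INTERVAL", 600))
        crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
        return ext

    def spider_opened(self, spider):
        # A daemon thread keeps logging while the main thread is busy and does
        # not prevent the process from exiting once the sync is finished.
        threading.Thread(target=self._beat, daemon=True).start()

    def _beat(self):
        while True:
            time.sleep(self.interval)
            logger.info("Heartbeat: job still alive")

It would be enabled through the project's EXTENSIONS setting, for example EXTENSIONS = {"myproject.extensions.HeartbeatLogger": 0}, with the module path adjusted to wherever the class lives.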

 

Do let us know if you are still facing the issue. I do not see any jobs getting stalled in the account.

I'm no longer facing this issue, because I've deleted the old project and created a new one.


Is it possible to use my own S3 credentials for scrapy_dotpersistence?
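
If the addon reads its bucket and credentials from the project settings, they could in principle be pointed at a bucket of your own. A sketch of what that might look like; every setting name here is an assumption to be verified against the scrapy_dotpersistence source or README before relying on it:

# settings.py -- sketch only; the setting names are assumptions, check them
# against the scrapy_dotpersistence documentation/source.
DOTSCRAPY_ENABLED = True
ADDONS_S3_BUCKET = "my-own-bucket"        # hypothetical bucket name
ADDONS_AWS_ACCESS_KEY_ID = "AKIA..."      # hypothetical credentials; prefer
ADDONS_AWS_SECRET_ACCESS_KEY = "..."      # injecting them as secrets rather than committing them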

 
