
Cancelled (stalled) job outcome because scrapy_dotpersistence syncing takes over an hour

My job outcome is repeatedly cancelled (stalled) after the scraping is over, while the scrapy_dotpersistence addon stores the .scrapy directory to S3:


[scrapy_dotpersistence] Syncing .scrapy directory to s3://scrapinghub-app-dash-addons/org-176226/[...]/dot-scrapy/immo[...]/
1090: 2017-12-26 17:50:02 INFO [scrapy.crawler] Received SIGTERM, shutting down gracefully. Send again to force

I tried to delete the httpcache folder in the console, but the sync still takes over an hour and the job gets cancelled anyway.
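
Since the long sync is usually driven by a large .scrapy/httpcache directory, one way to attack the root cause is to keep that cache from being written in the first place, using standard Scrapy settings. A minimal settings.py sketch, assuming the cache is not actually needed between runs:

# settings.py -- sketch: keep the persisted .scrapy directory small so the
# post-crawl S3 sync has little to upload.

# Disabling the HTTP cache means .scrapy/httpcache is not written at all,
# so there is almost nothing for scrapy_dotpersistence to sync.
HTTPCACHE_ENABLED = False

# If the cache is genuinely needed, at least avoid storing error responses:
# HTTPCACHE_ENABLED = True
# HTTPCACHE_IGNORE_HTTP_CODES = [500, 502, 503, 504]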


How can I solve this issue? Can I "reset" the S3 folder directly?


Best Answer

Jobs will get cancelled if they're not doing anything for an hour. You could add a log message every hour or so, so that the job doesn't get cancelled.



Thanks for your answer. The spider is closed before the syncing starts. Where can I add this logging?
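
One place such logging can live is a small Scrapy extension that emits a heartbeat from a background daemon thread, so messages keep appearing even while the post-crawl S3 sync blocks the main thread. A minimal sketch, assuming that any log output counts as job activity; the names HeartbeatLogger and HEARTBEAT_INTERVAL are illustrative, not part of any addon:

import logging
import threading
import time

from scrapy import signals

logger = logging.getLogger(__name__)


class HeartbeatLogger:
    """Log a heartbeat from a daemon thread so the job keeps producing output
    even while a blocking task (such as the .scrapy sync) runs after the
    spider has closed."""

    def __init__(self, interval):
        self.interval = interval  # seconds between heartbeat messages

    @classmethod
    def from_crawler(cls, crawler):
        # HEARTBEAT_INTERVAL is an illustrative setting name (default: 10 minutes)
        ext = cls(crawler.settings.getfloat("HEARTBEAT_INTERVAL", 600))
        crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
        return ext

    def spider_opened(self, spider):
        # A daemon thread keeps logging while the main thread is busy and does
        # not prevent the process from exiting once the sync is finished.
        threading.Thread(target=self._beat, daemon=True).start()

    def _beat(self):
        while True:
            time.sleep(self.interval)
            logger.info("Heartbeat: job still alive")

It would be enabled through the project's EXTENSIONS setting, for example EXTENSIONS = {"myproject.extensions.HeartbeatLogger": 0}, with the module path adjusted to wherever the class lives.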

 

Do let us know if you are still facing the issue. I do not see any jobs getting stalled in the account.

I'm no longer facing this issue, because I've deleted the old project and created a new one.


Is it possible to use my own S3 credentials for scrapy_dotpersistence?
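
If the addon reads its bucket and credentials from the project settings, they could in principle be pointed at a bucket of your own. A sketch of what that might look like; every setting name here is an assumption to be verified against the scrapy_dotpersistence source or README before relying on it:

# settings.py -- sketch only; the setting names are assumptions, check them
# against the scrapy_dotpersistence documentation/source.
DOTSCRAPY_ENABLED = True
ADDONS_S3_BUCKET = "my-own-bucket"        # hypothetical bucket name
ADDONS_AWS_ACCESS_KEY_ID = "AKIA..."      # hypothetical credentials; prefer
ADDONS_AWS_SECRET_ACCESS_KEY = "..."      # injecting them as secrets rather than committing them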

 
