Cancelled (stalled) Job outcome because of scrapy_dotpersistence syncing over an hour

Posted almost 7 years ago by chops

Answered

My job outcome is repeatedly "cancelled (stalled)" after the scraping is finished, while the scrapy_dotpersistence addon is storing the .scrapy directory to S3:


[scrapy_dotpersistence] Syncing .scrapy directory to s3://scrapinghub-app-dash-addons/org-176226/[...]/dot-scrapy/immo[...]/
1090: 2017-12-26 17:50:02 INFO [scrapy.crawler] Received SIGTERM, shutting down gracefully. Send again to force unclean shutdown

 

I tried to delete the httpcache folder in the console, but the sync still takes over an hour and the job gets cancelled anyway.


How can I solve this issue? Can I "reset" the S3 folder directly?
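
(A minimal sketch of possible workarounds, assuming the cached responses do not need to survive between runs: HTTPCACHE_ENABLED is a standard Scrapy setting, DOTSCRAPY_ENABLED is the enable flag documented for scrapy-dotpersistence, and the AWS CLI line only applies to a bucket you hold credentials for.)

# settings.py -- sketch only; verify setting names against your Scrapy and
# scrapy-dotpersistence versions.

# Stop writing responses to .scrapy/httpcache in the first place,
# so there is little left to sync:
HTTPCACHE_ENABLED = False

# Or disable the .scrapy-to-S3 sync entirely:
DOTSCRAPY_ENABLED = False

# If the addon is pointed at a bucket you control (see the credentials
# question further down), the synced folder can be wiped with the AWS CLI:
#   aws s3 rm --recursive s3://YOUR_BUCKET/YOUR_PREFIX/dot-scrapy/YOUR_SPIDER/
# The default scrapinghub-app-dash-addons bucket is managed by the platform.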

0 Votes

nestor posted almost 7 years ago · Admin · Best Answer

Jobs get cancelled if they aren't doing anything for an hour. You could add a log message every hour or so, so that the job doesn't get cancelled.
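
(A minimal sketch of such a heartbeat, written as a small Scrapy extension that logs from a background thread so output keeps appearing while the post-crawl S3 sync runs. The module path, class name and HEARTBEAT_INTERVAL setting are invented for this example, and whether Scrapy Cloud counts these lines as job activity is an assumption.)

# heartbeat.py -- hypothetical helper, not part of Scrapy or scrapy_dotpersistence
import logging
import threading

from scrapy import signals

logger = logging.getLogger(__name__)


class Heartbeat:
    """Log a message at a fixed interval from a daemon thread so the job
    keeps producing output during long post-crawl work."""

    def __init__(self, interval):
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    @classmethod
    def from_crawler(cls, crawler):
        # HEARTBEAT_INTERVAL (seconds) is a custom setting made up for this sketch.
        ext = cls(crawler.settings.getint("HEARTBEAT_INTERVAL", 30 * 60))
        crawler.signals.connect(ext.engine_started, signal=signals.engine_started)
        return ext

    def engine_started(self):
        self._thread.start()

    def _run(self):
        # The daemon thread simply exits when the process does.
        while not self._stop.wait(self.interval):
            logger.info("Heartbeat: job still running")

Enable it in settings.py (the "myproject" path is a placeholder):

EXTENSIONS = {
    "myproject.heartbeat.Heartbeat": 0,
}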

0 Votes


4 Comments


chops posted almost 7 years ago

I'm no longer facing this issue, because I've deleted the old project and created a new one.


Is it possible to use my own S3 credentials with scrapy_dotpersistence?
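
(For reference, a sketch of the settings the scrapy-dotpersistence README documents for pointing the sync at your own bucket; the names below are quoted from memory of that README and the values are placeholders, so check them against the addon version in your project.)

# settings.py -- placeholder values, not working credentials
DOTSCRAPY_ENABLED = True
ADDONS_S3_BUCKET = "my-own-bucket"
ADDONS_AWS_ACCESS_KEY_ID = "AKIA..."      # IAM key with access to the bucket
ADDONS_AWS_SECRET_ACCESS_KEY = "..."
ADDONS_AWS_USERNAME = "my-prefix"         # optional folder prefix inside the bucket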

 

0 Votes

thriveni posted almost 7 years ago · Admin

Do let us know if you are still facing the issue. I do not see any jobs getting stalled in the account.

0 Votes

chops posted almost 7 years ago

Thanks for your answer. The spider is closed before the syncing starts. Where can I add this logging?

 

0 Votes

