periodic jobs and duplicate data

Posted almost 5 years ago by xiaobie

Answered
xiaobie

Hello. Could I ask a question about periodic jobs on Scrapinghub? Does Scrapinghub automatically remove duplicate data scraped across runs?


nestor

nestor posted almost 5 years ago Admin Best Answer

Scrapinghub doesn't remove duplicate data automatically. I would suggest trying DeltaFetch, which avoids re-crawling items that were already scraped in previous jobs: https://support.scrapinghub.com/support/solutions/articles/22000221912-incremental-crawls-with-scrapy-and-deltafetch-in-scrapy-cloud
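For reference, here is a minimal `settings.py` sketch of what enabling DeltaFetch typically looks like. It assumes the `scrapy-deltafetch` package is installed and, for Scrapy Cloud, that the `scrapy-dotpersistence` extension is used to keep the `.scrapy` directory (where DeltaFetch stores its database) between jobs. Check the linked article for the exact setup your project needs.

```python
# settings.py (sketch, assuming scrapy-deltafetch is installed)

# DeltaFetch is a spider middleware: it remembers requests that produced
# items in earlier runs and skips them in later runs.
SPIDER_MIDDLEWARES = {
    "scrapy_deltafetch.DeltaFetch": 100,
}
DELTAFETCH_ENABLED = True

# On Scrapy Cloud the .scrapy directory is not kept between jobs by default,
# so the DeltaFetch database would be lost. Assumption: the
# scrapy-dotpersistence extension is used to persist it across runs.
EXTENSIONS = {
    "scrapy_dotpersistence.DotScrapyPersistence": 0,
}
DOTSCRAPY_ENABLED = True
```

Note that DeltaFetch filters at crawl time. It prevents new duplicates from being scraped, but it does not remove duplicates already stored from earlier jobs.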


xiaobie posted almost 5 years ago

Thanks so much.
