
periodic jobs and duplicate data

Hello. Could I ask a question about periodic jobs on Scrapinghub? Will duplicate scraped data be removed automatically by Scrapinghub?

Best Answer

Scrapinghub doesn't remove duplicate data automatically. I would suggest trying DeltaFetch, which skips pages whose items were already scraped in previous jobs: https://support.scrapinghub.com/support/solutions/articles/22000221912-incremental-crawls-with-scrapy-and-deltafetch-in-scrapy-cloud
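
As a minimal sketch, assuming the scrapy-deltafetch package is installed in the project's environment (for example, listed in requirements.txt), enabling DeltaFetch comes down to two settings in the project's settings.py. DeltaFetch records the requests whose responses produced items and ignores those requests in later runs of the same spider, so it prevents re-scraping rather than deleting duplicates after the fact. By default its state is stored under the project's .scrapy directory, so that directory has to persist between periodic jobs for this to work; see the linked article for how that is handled on Scrapy Cloud.

```python
# settings.py (Scrapy project): minimal DeltaFetch setup, assuming
# the scrapy-deltafetch package is installed.

# Register DeltaFetch as a spider middleware; the number sets its order
# relative to the other spider middlewares.
SPIDER_MIDDLEWARES = {
    "scrapy_deltafetch.DeltaFetch": 100,
}

# The middleware is inactive unless this is enabled.
DELTAFETCH_ENABLED = True

# Optional: clear the stored state to force a full re-crawl on the next run.
# DELTAFETCH_RESET = True
```

Because the skip happens per request, items scraped again from new or changed URLs can still repeat; for those, a deduplicating item pipeline in the project is a common complement.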


Thanks so much.