Answered

periodic jobs and duplicate data

Hello. I have a question about periodic jobs on Scrapinghub: does Scrapinghub automatically remove duplicate scraped data?

Best Answer

Scrapinghub doesn't remove duplicate data automatically. I would suggest trying DeltaFetch, which skips requests to pages that already yielded items in previous jobs: https://support.scrapinghub.com/support/solutions/articles/22000221912-incremental-crawls-with-scrapy-and-deltafetch-in-scrapy-cloud
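
For reference, here is a minimal sketch of what enabling DeltaFetch in a project's settings.py might look like. It assumes the scrapy-deltafetch package is installed and listed in the project's requirements; the middleware order value of 100 is only an illustrative choice, so check the linked article for the setup that applies to your project.

```python
# settings.py (sketch, assuming the scrapy-deltafetch package is installed)

# Register the DeltaFetch spider middleware.
# The order value 100 is an illustrative assumption, not a requirement.
SPIDER_MIDDLEWARES = {
    "scrapy_deltafetch.DeltaFetch": 100,
}

# Turn the middleware on; without this flag it stays disabled.
DELTAFETCH_ENABLED = True
```

DeltaFetch keeps its record of seen pages in the project's .scrapy directory, so on Scrapy Cloud that directory most likely has to be persisted between jobs (for example with the DotScrapy Persistence addon) for the filtering to carry over from one periodic job to the next. Note that it filters requests, not items, so pages whose content changed since the last job will be skipped as well.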



Thanks so much.