Scrapinghub doesn't remove duplicate data automatically. I would suggest trying DeltaFetch, which skips requests for pages whose items were already scraped in previous jobs: https://support.scrapinghub.com/support/solutions/articles/22000221912-incremental-crawls-with-scrapy-and-deltafetch-in-scrapy-cloud
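For reference, here is a minimal sketch of what enabling DeltaFetch might look like in a project's settings.py. It assumes the scrapy-deltafetch and scrapy-dotpersistence packages are installed and listed in the project's requirements; the priority values are the ones commonly shown in those packages' READMEs, not something stated in this thread, so check the linked article for the setup that applies to your account.

```python
# settings.py (sketch; assumes scrapy-deltafetch and scrapy-dotpersistence are installed)

# Enable the DeltaFetch spider middleware, which skips requests to pages
# that already produced items in an earlier run of the same spider.
SPIDER_MIDDLEWARES = {
    "scrapy_deltafetch.DeltaFetch": 100,  # priority value is an assumption
}
DELTAFETCH_ENABLED = True

# On Scrapy Cloud the .scrapy directory is not kept between jobs by default,
# so DeltaFetch's state has to be persisted; this extension is one way to do it.
EXTENSIONS = {
    "scrapy_dotpersistence.DotScrapyPersistence": 0,  # priority value is an assumption
}
DOTSCRAPY_ENABLED = True
```

Note that DeltaFetch works at the request level: it avoids re-visiting pages that already yielded items, so it won't remove duplicates that are already stored from earlier jobs.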
xiaobie
nestor