Best Answer
vaz said almost 7 years ago
Hey Balder-man,
I think it can be done through these steps:
1. Use the DeltaFetch add-on to avoid collecting the same items again between runs N and N+1 of the spider.
2. The difference will then be just the items collected in run N+1, for any N ≥ 1, because every item collected in run N+1 was not present in the previous run.
Best regards,
Pablo
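For reference, a minimal sketch of enabling DeltaFetch in a plain Scrapy project, assuming the open-source scrapy-deltafetch package is installed (on Scrapy Cloud the add-on is switched on in the project settings instead). Note that DeltaFetch works at the request level: it skips requests to pages that already produced items in an earlier run, rather than comparing the items themselves.

```python
# settings.py (sketch; assumes `pip install scrapy-deltafetch`)

# Register the DeltaFetch spider middleware.
SPIDER_MIDDLEWARES = {
    "scrapy_deltafetch.DeltaFetch": 100,
}

# Turn the middleware on; without this it stays inactive.
DELTAFETCH_ENABLED = True
```

The state DeltaFetch keeps between runs has to survive from one job to the next, so it only helps if that storage is persistent.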
Avishay Balderman
I have a periodic job that collects items.
Let's assume that job number N found the items:
A, B, C, D
Let's assume that job number N+1 found the items:
B, C, D, Z, T
So we can see that item A is gone and items Z and T were found in run N+1.
Is there a way to calculate the item difference between jobs N and N+1?
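One way to picture the difference being asked for is a plain set comparison, done after both jobs have finished. The sketch below assumes each item can be reduced to a unique, hashable key (for example a URL or product ID); `diff_jobs` is a hypothetical helper name, and loading the keys from the two jobs' exported items is left out.

```python
def diff_jobs(previous_keys, current_keys):
    """Return (added, removed) keys between two job runs."""
    previous = set(previous_keys)
    current = set(current_keys)
    added = current - previous    # found in N+1 but not in N
    removed = previous - current  # found in N but gone in N+1
    return added, removed


# The example from the question: job N vs. job N+1.
added, removed = diff_jobs(["A", "B", "C", "D"], ["B", "C", "D", "Z", "T"])
print(sorted(added))    # ['T', 'Z']
print(sorted(removed))  # ['A']
```

Unlike filtering during the crawl, this also reports items that disappeared (A in the example).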
Avishay Balderman
Thanks.
I will check this add-on.
Avishay Balderman
Hi
I was reading https://blog.scrapinghub.com/2016/07/20/scrapy-tips-from-the-pros-july-2016/ and I am not sure it can work for me.
I want the spider to do all requests in run N but drop all items that were found in run N-1.
In my example above, only Z and T should be valid items, since they are "new".
How can I do that?
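One possible approach, sketched here under stated assumptions rather than as a confirmed solution, is an item pipeline that lets the spider make every request but drops any item whose key was recorded in the previous run. The class name `NewItemsOnlyPipeline`, the state file `seen_keys.json`, and the key field `id` are all hypothetical; the state file must live somewhere that persists between jobs, and keys are assumed to be strings.

```python
import json
from pathlib import Path

try:
    from scrapy.exceptions import DropItem
except ImportError:  # fallback so the sketch also runs without Scrapy
    class DropItem(Exception):
        pass


class NewItemsOnlyPipeline:
    """Drop items whose key was already seen in the previous run."""

    def __init__(self, state_path="seen_keys.json", key_field="id"):
        self.state_path = Path(state_path)  # hypothetical location
        self.key_field = key_field          # hypothetical unique field
        self.previous = set()
        self.current = set()

    def open_spider(self, spider):
        # Load the keys recorded by the previous run, if any.
        if self.state_path.exists():
            self.previous = set(json.loads(self.state_path.read_text()))

    def process_item(self, item, spider):
        key = item[self.key_field]
        self.current.add(key)  # remember every key seen in this run
        if key in self.previous:
            raise DropItem(f"already seen in previous run: {key}")
        return item

    def close_spider(self, spider):
        # Store this run's keys so the next run can compare against them.
        self.state_path.write_text(json.dumps(sorted(self.current)))
```

In a Scrapy project this would be enabled through the `ITEM_PIPELINES` setting. Because the state file is overwritten with the current run's keys, each run is compared only against the run immediately before it, which matches the N versus N-1 comparison described above.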