Hi,
I am using the scrapinghub Python library. I ran a spider using spider.jobs.run() and then called job.items.iter() in my Python script. The problem is that the request to run the spider does not wait for the job to complete, so it gives me an empty list as output.
Please suggest how to make the request wait for the scraping to finish before returning results. Is there a method that waits for the job to complete and then provides the scraped items as output?
Thanks
0 Votes
4 Comments
gauravinvolvesoft posted over 6 years ago
Hi Scrapinghub Team,
Could you please provide your technical support contact number? It would help me a lot.
Thanks
0 Votes
gauravinvolvesoft posted over 6 years ago
Please send the suggestion as soon as possible.
0 Votes
gauravinvolvesoft posted over 6 years ago
Okay, I have already done this when I was using Scrapy. But how do I do it with the scrapinghub library?
In the case of Scrapinghub:
from scrapinghub import ScrapinghubClient

client = ScrapinghubClient('API KEY')
project = client.get_project(PROJECT_ID)
spider = project.spiders.get(spider_name)
job = spider.jobs.run(site_id=site_id, url=url, scraper_id=scraper_id)

item_list = []
for item in job.items.iter():
    if 'opportunities' in item:
        item_list.append(item['opportunities'])
print(item_list)
Now, when I run this, it shows an empty list. The request is not waiting for the job to complete.
How do I get the result?
0 Votes
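One common workaround with the python-scrapinghub client is to poll the job's metadata until its state becomes 'finished' before iterating over the items. A minimal sketch (the helper name and timeout values are illustrative, not part of the library):

```python
import time

def wait_for_job(job, poll_interval=5, timeout=600):
    """Poll a python-scrapinghub Job until it reaches the 'finished' state.

    Returns True if the job finished within `timeout` seconds, else False.
    """
    waited = 0
    while waited < timeout:
        # Job states progress through 'pending' -> 'running' -> 'finished'.
        if job.metadata.get('state') == 'finished':
            return True
        time.sleep(poll_interval)
        waited += poll_interval
    return False
```

With this helper, the original script would call wait_for_job(job) after spider.jobs.run(...) and only then build item_list from job.items.iter(), which should no longer be empty.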
jwaterschoot posted over 6 years ago
When do you call spiders.job.run()? Are you using a pipeline, so that you can use the `close_spider` function? In the pipeline you could create an empty item list when the spider starts, append the items to it, and iterate through them in `close_spider`.
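The pipeline approach suggested above could be sketched like this (the class name is illustrative; in a real project it would also need to be registered in the Scrapy ITEM_PIPELINES setting):

```python
class ItemCollectorPipeline:
    """Collect scraped items in memory and handle them when the spider closes."""

    def open_spider(self, spider):
        # Called when the spider starts: initialise an empty item list.
        self.items = []

    def process_item(self, item, spider):
        # Called for every scraped item: keep a copy and pass the item on.
        self.items.append(item)
        return item

    def close_spider(self, spider):
        # Called once scraping is complete: every item is available here.
        for item in self.items:
            print(item)
```

Because `close_spider` runs only after the crawl has finished, iterating over self.items there avoids the empty-list problem.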
0 Votes