Hi,
I am using the scrapinghub Python library. I ran a spider using spider.jobs.run() and then called job.items.iter() in my Python script. The problem is that the request to run the spider does not wait for the job to complete, so it gives me an empty list as output.
Please suggest how to make the request wait for the scraping to finish before returning results. Is there a method that waits for the job to complete and then provides the scraped items as output?
Thanks
0 Votes
4 Comments
gauravinvolvesoft posted over 6 years ago
Hi Scrapinghub Team,
Could you please provide your technical support contact number? It would help me a lot.
Thanks
0 Votes
gauravinvolvesoft posted over 6 years ago
Please send the suggestion as soon as possible.
0 Votes
gauravinvolvesoft posted over 6 years ago
Okay, I have already done this when I was using Scrapy. But how do I do it with the scrapinghub library?
In the case of Scrapinghub:
from scrapinghub import ScrapinghubClient

client = ScrapinghubClient('API KEY')
project = client.get_project(PROJECT_ID)
spider = project.spiders.get(spider_name)
job = spider.jobs.run(site_id=site_id, url=url, scraper_id=scraper_id)

item_list = []
for item in job.items.iter():
    if 'opportunities' in item:
        item_list.append(item['opportunities'])
print(item_list)
Now, when I run this, it shows an empty list. The request is not waiting for the job to complete.
How do I get the result?
0 Votes
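One common workaround with the python-scrapinghub client is to poll the job's metadata until its state becomes 'finished' before iterating over the items. A minimal sketch (the helper name and timeout values are illustrative, not part of the library):

```python
import time

def wait_for_job(job, poll_interval=5, timeout=600):
    """Poll a python-scrapinghub Job until it reaches the 'finished' state.

    Returns True if the job finished within `timeout` seconds, else False.
    """
    waited = 0
    while waited < timeout:
        # Job states progress through 'pending' -> 'running' -> 'finished'.
        if job.metadata.get('state') == 'finished':
            return True
        time.sleep(poll_interval)
        waited += poll_interval
    return False
```

With this helper, the original script would call wait_for_job(job) after spider.jobs.run(...) and only then build item_list from job.items.iter(), which should no longer be empty.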
jwaterschoot posted over 6 years ago
When do you call spiders.job.run()? Are you using a pipeline, so that you can use the `close_spider` function? In the pipeline you could create an empty item list when the spider starts, append the items to it, and iterate through them in `close_spider`.
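The pipeline approach suggested above could be sketched like this (the class name is illustrative; in a real project it would also need to be registered in the Scrapy ITEM_PIPELINES setting):

```python
class ItemCollectorPipeline:
    """Collect scraped items in memory and handle them when the spider closes."""

    def open_spider(self, spider):
        # Called when the spider starts: initialise an empty item list.
        self.items = []

    def process_item(self, item, spider):
        # Called for every scraped item: keep a copy and pass the item on.
        self.items.append(item)
        return item

    def close_spider(self, spider):
        # Called once scraping is complete: every item is available here.
        for item in self.items:
            print(item)
```

Because `close_spider` runs only after the crawl has finished, iterating over self.items there avoids the empty-list problem.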
0 Votes