Wait for scraped item

Hi,

I am using the scrapinghub Python library. I ran a spider with spider.jobs.run() and then called job.items.iter() in my script. The problem is that after the script requests the spider run, it does not wait for the job to complete, so job.items.iter() gives me an empty list as output.

Could you please suggest how the script can wait for the scraping to complete before reading the results? Is there a method that waits for the job to finish and then returns the scraped items?


Thanks


When do you call spider.jobs.run()? Are you using a pipeline, so that you can use the `close_spider` method? In the pipeline you could create an empty item list when the spider starts, append items to it as they are scraped, and iterate through them in `close_spider`. A sketch of that approach is below.
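Here is a minimal sketch of that pipeline approach, assuming a standard Scrapy item pipeline; the class name and module path are only placeholders:

class ItemCollectorPipeline(object):
    """Collects every scraped item and exposes them when the spider closes."""

    def open_spider(self, spider):
        # Create an empty list when the spider starts
        self.items = []

    def process_item(self, item, spider):
        # Append each item as it is scraped
        self.items.append(item)
        return item

    def close_spider(self, spider):
        # All items are available here once the spider has finished
        for item in self.items:
            spider.logger.info("Scraped item: %s", item)

Enable it in settings.py, for example ITEM_PIPELINES = {'myproject.pipelines.ItemCollectorPipeline': 300} (the module path is a placeholder for your project).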

Okay, I have already done this when I was using Scrapy directly. But how do I do it with the scrapinghub library?
In the case of Scrapinghub:

from scrapinghub import ScrapinghubClient

client = ScrapinghubClient('API KEY')
project = client.get_project(PROJECT_ID)
spider = project.spiders.get(spider_name)
job = spider.jobs.run(site_id=site_id, url=url, scraper_id=scraper_id)

item_list = []
for item in job.items.iter():
    if 'opportunities' in item:
        item_list.append(item['opportunities'])

print(item_list)

Now, when I run this, it prints an empty list. The request is not waiting for the job to finish its scraping before the items are read.
How can I get the results once the job has completed?

Please send a suggestion as soon as possible.
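One way to make the script wait is to poll the job state and only read the items once the job reports that it has finished. This is a minimal sketch, assuming the python-scrapinghub client where client.get_job() and job.metadata.get('state') are available and the state moves from 'pending'/'running' to 'finished'; the placeholders (API KEY, PROJECT_ID, spider_name and the run arguments) are the same as in the snippet above:

import time

from scrapinghub import ScrapinghubClient

client = ScrapinghubClient('API KEY')
project = client.get_project(PROJECT_ID)
spider = project.spiders.get(spider_name)
job = spider.jobs.run(site_id=site_id, url=url, scraper_id=scraper_id)

# Re-fetch the job and check its state until it is reported as finished
while client.get_job(job.key).metadata.get('state') != 'finished':
    time.sleep(10)

item_list = []
for item in job.items.iter():
    if 'opportunities' in item:
        item_list.append(item['opportunities'])

print(item_list)

Polling like this is simple but crude; in practice you would probably add a timeout so the script does not loop forever if the job errors out or is cancelled.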

Hi Scrapinghub Team,

Could you please provide your technical support contact number? It would help me a lot.

Thanks
