Hi,
I have a Scrapy project which works great. I am trying to migrate it to ScrapingHub.
I want to be able to launch spiders from a script (see code below), but it is not working (the spider's parse() function is never reached):
SCRIPT:
def main():
    ...
    yield crawler.crawl(quotes_spider.QuotesSpider)
    crawler.start()
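For context, the working local version follows Scrapy's documented "run from a script" approach, roughly like this sketch (the import path for the spider module is just illustrative):

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from myproject.spiders import quotes_spider  # illustrative import path

def main():
    # CrawlerProcess picks up the project settings and runs the spider in-process
    process = CrawlerProcess(get_project_settings())
    process.crawl(quotes_spider.QuotesSpider)
    process.start()  # blocks until the crawl is finished

if __name__ == '__main__':
    main()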
Is it possible to do it this way? If so, how? If not, how can I run a script that calls spiders?
Thank you
0 Votes
nestor posted almost 7 years ago Admin Best Answer
You should probably use the python-scrapinghub library: https://python-scrapinghub.readthedocs.io/en/latest/quickstart.html
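For reference, a minimal sketch of what that quickstart boils down to (the API key, project ID 12345 and spider name 'quotes' below are placeholders for your own values); jobs.run() schedules the spider as a job on Scrapy Cloud and returns immediately:

from scrapinghub import ScrapinghubClient

client = ScrapinghubClient('YOUR_API_KEY')   # placeholder API key
project = client.get_project(12345)          # placeholder project ID

# schedule the spider as a job on Scrapy Cloud
job = project.jobs.run('quotes')
print(job.key)                               # e.g. '12345/1/1'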
1 Votes
2 Comments
xavierdurandsmet posted almost 7 years ago
Thank you, it works great!
However, in my script I need the spiders to run in sync with the rest of the code, like in Scrapy (because I wait for the spider to finish crawling to get the scraped data).
How can I do the following in ScrapingHub?
Example:
# do some stuff
process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished
# do other stuff after the spiders are done running
0 Votes
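With python-scrapinghub, one way to reproduce that blocking behaviour is to schedule the job and poll its state until it reaches 'finished', then read the scraped items back from the job. A minimal sketch, again with a placeholder API key, project ID and spider name:

import time
from scrapinghub import ScrapinghubClient

client = ScrapinghubClient('YOUR_API_KEY')   # placeholder API key
project = client.get_project(12345)          # placeholder project ID

# do some stuff
job = project.jobs.run('quotes')             # schedule the spider on Scrapy Cloud

# block here until the crawling is finished by polling the job state
while client.get_job(job.key).metadata.get('state') != 'finished':
    time.sleep(10)

# do other stuff after the spider is done running, e.g. read the scraped items
items = list(job.items.iter())
print('scraped %d items' % len(items))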