Cannot run spiders from a script on Scrapinghub

Posted almost 7 years ago by xavierdurandsmet

Answered

Hi,


I have a Scrapy project that works well locally, and I am trying to migrate it to Scrapinghub.

I want to launch spiders from a script (see the code below), but it is not working: the spider's parse() function is never called.


SCRIPT:

def main():
    ...
    yield crawler.crawl(quotes_spider.QuotesSpider)
    crawler.start()


Is it possible to do it this way? If so, how? If not, how can I run a script that launches spiders?


Thank you

0 Votes


nestor posted almost 7 years ago Admin Best Answer

You should probably use the python-scrapinghub library: https://python-scrapinghub.readthedocs.io/en/latest/quickstart.html
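
As a rough illustration of what that looks like, here is a minimal sketch that schedules a spider as a job through python-scrapinghub. It assumes the project is already deployed; the API key, the project ID 123 and the spider name "quotes" are placeholders, not values from this thread.

```python
def make_client(api_key):
    """Build a client for the Scrapinghub API (requires: pip install scrapinghub)."""
    from scrapinghub import ScrapinghubClient
    return ScrapinghubClient(api_key)


def schedule_spider(client, project_id, spider_name, spider_args=None):
    """Schedule one run of a deployed spider and return the job key.

    project_id and spider_name are whatever your own project uses;
    spider_args is an optional dict of spider arguments.
    """
    project = client.get_project(project_id)
    job = project.jobs.run(spider_name, job_args=spider_args)
    return job.key


# Hypothetical usage (placeholder values):
#   client = make_client("YOUR_API_KEY")
#   print(schedule_spider(client, 123, "quotes"))
```

Note that the spider runs on Scrapinghub's side as a job, so the call returns as soon as the job is scheduled rather than when the crawl ends.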

1 Votes


2 Comments


xavierdurandsmet posted almost 7 years ago

Thank you, it works great!


However, in my script I need the spiders to run in step with the rest of the code, as in Scrapy: the script has to wait for the spider to finish crawling before it can use the scraped data.

How can I do the following on Scrapinghub?

Example:


# do some stuff

process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished
# do other stuff after the spiders are done running

0 Votes
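
One possible way to get this blocking behaviour with python-scrapinghub is to poll the job's state until it finishes and only then read the items. The sketch below is an assumption about how one might do it, not an official recipe: wait_for_job and run_and_collect are hypothetical helper names, and the polling interval and timeout are arbitrary defaults.

```python
import time


def wait_for_job(job, poll_seconds=10.0, timeout=3600.0, sleep=time.sleep):
    """Block until the job reaches a terminal state, then return that state."""
    deadline = time.monotonic() + timeout
    while True:
        state = job.metadata.get("state")  # pending / running / finished / deleted
        if state in ("finished", "deleted"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("job still %r after %s seconds" % (state, timeout))
        sleep(poll_seconds)


def run_and_collect(api_key, project_id, spider_name):
    """Schedule a spider, wait for it to finish, and return its scraped items."""
    from scrapinghub import ScrapinghubClient  # pip install scrapinghub

    client = ScrapinghubClient(api_key)
    job = client.get_project(project_id).jobs.run(spider_name)
    wait_for_job(job)  # the script blocks here until the crawl is done
    return list(job.items.iter())
```

Code placed after the run_and_collect call plays the role of the "do other stuff" step: it only runs once the job has finished and its items have been fetched.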

