
Cannot run spiders from a script on Scrapinghub

Hi,


I have a Scrapy project that works great, and I am trying to migrate it to Scrapinghub.

I want to launch spiders from a script (see the code below), but it is not working: the spider's parse() method is never called.


SCRIPT:

def main():
    ...
    yield crawler.crawl(quotes_spider.QuotesSpider)
    crawler.start()


Is it possible to do it this way? If so, how? If not, how can I run a script that launches spiders?


Thank you


Best Answer

You should probably use the python-scrapinghub library: https://python-scrapinghub.readthedocs.io/en/latest/quickstart.html
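For illustration, here is a minimal sketch of scheduling a deployed spider with python-scrapinghub. It assumes the project is already deployed to Scrapinghub; the API key, the project ID (123456) and the spider name ("quotes") are placeholders, and schedule_spider is a hypothetical helper name, not part of the library.

```python
def schedule_spider(client, project_id, spider_name, job_args=None):
    """Schedule one run of a deployed spider and return its job key.

    `client` is expected to be a scrapinghub.ScrapinghubClient instance.
    """
    project = client.get_project(project_id)
    # jobs.run() only schedules the job on the platform; it does not wait for it.
    job = project.jobs.run(spider_name, job_args=job_args)
    return job.key


def main():
    # Imported here so the helper above can be reused without the library loaded.
    from scrapinghub import ScrapinghubClient  # pip install scrapinghub

    client = ScrapinghubClient("YOUR_API_KEY")  # placeholder API key
    # 123456 and "quotes" are placeholders for your project ID and spider name.
    print(schedule_spider(client, 123456, "quotes"))
```

Calling main() from your own script schedules the run. Note that the spider then runs on Scrapinghub, not inside the script's process.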



Thank you, it works great!


However, my script needs to wait for the spiders to finish crawling before it continues, as it does in Scrapy, because it then uses the scraped data.

How can I do the following on Scrapinghub?

Example:


# do some stuff

process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished
# do other stuff after the spiders are done running
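One possible way to get the same blocking behaviour is to poll the job until it finishes and then read its items. The sketch below is not an official recipe: wait_for_job is a hypothetical helper, the polling interval and timeout are arbitrary, and it assumes job.metadata.get("state") returns the job's current state on each call.

```python
import time


def wait_for_job(job, poll_seconds=10.0, timeout=3600.0):
    """Block until a scheduled job reports 'finished', then return its items.

    `job` is expected to be the object returned by project.jobs.run(...).
    Raises TimeoutError if the job has not finished within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    # Assumption: each metadata.get() call fetches the current state.
    while job.metadata.get("state") != "finished":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job.key} did not finish within {timeout}s")
        time.sleep(poll_seconds)
    # The job is done, so every scraped item is now available.
    return list(job.items.iter())
```

Usage would look like items = wait_for_job(project.jobs.run("quotes")), where "quotes" is a placeholder spider name, followed by whatever the script does with the scraped data.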