Hi,
I have a Scrapy project which works great. I am trying to migrate it to ScrapingHub.
I want to be able to launch spiders from a script (see code below), but it is not working (the spider's parse() function is never reached):
SCRIPT:
def main():
    ...
    yield crawler.crawl(quotes_spider.QuotesSpider)
    crawler.start()
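For context, the working local version follows Scrapy's documented "run from a script" approach, roughly like this sketch (the import path for the spider module is just illustrative):

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from myproject.spiders import quotes_spider  # illustrative import path

def main():
    # CrawlerProcess picks up the project settings and runs the spider in-process
    process = CrawlerProcess(get_project_settings())
    process.crawl(quotes_spider.QuotesSpider)
    process.start()  # blocks until the crawl is finished

if __name__ == '__main__':
    main()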
Is it possible to do it this way? If so, how? If not, how can I run a script that calls spiders?
Thank you
0 Votes
nestor posted almost 7 years ago Admin Best Answer
You should probably use the python-scrapinghub library: https://python-scrapinghub.readthedocs.io/en/latest/quickstart.html
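For reference, a minimal sketch of what that quickstart boils down to (the API key, project ID 12345 and spider name 'quotes' below are placeholders for your own values); jobs.run() schedules the spider as a job on Scrapy Cloud and returns immediately:

from scrapinghub import ScrapinghubClient

client = ScrapinghubClient('YOUR_API_KEY')   # placeholder API key
project = client.get_project(12345)          # placeholder project ID

# schedule the spider as a job on Scrapy Cloud
job = project.jobs.run('quotes')
print(job.key)                               # e.g. '12345/1/1'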
1 Votes
2 Comments
xavierdurandsmet posted almost 7 years ago
Thank you, it works great!
However, in my script I need the spiders to run in sync with the rest of the code, like in Scrapy (because I wait for the spider to finish crawling to get the scraped data).
How can I do the following in ScrapingHub?
Example:
# do some stuff
process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished
# do other stuff after the spiders are done running
0 Votes
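With python-scrapinghub, one way to reproduce that blocking behaviour is to schedule the job and poll its state until it reaches 'finished', then read the scraped items back from the job. A minimal sketch, again with a placeholder API key, project ID and spider name:

import time
from scrapinghub import ScrapinghubClient

client = ScrapinghubClient('YOUR_API_KEY')   # placeholder API key
project = client.get_project(12345)          # placeholder project ID

# do some stuff
job = project.jobs.run('quotes')             # schedule the spider on Scrapy Cloud

# block here until the crawling is finished by polling the job state
while client.get_job(job.key).metadata.get('state') != 'finished':
    time.sleep(10)

# do other stuff after the spider is done running, e.g. read the scraped items
items = list(job.items.iter())
print('scraped %d items' % len(items))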