I am using Scrapinghub Cloud with a Splash instance to scrape content and images from a large list of URLs that are provided with the spider. There are around 50 000 URLs that I wish to crawl.
The first time I ran it, the spider ran for just under 13 hours and then closed after scraping only 11k URLs. The next time I ran it, it ran for only 2 hours and scraped 2k URLs.
The only message I got was the following:
(TCP Port 6023 Closed)
Please let me know of any possible solutions, or what further information I can provide.
Update: This has happened repeatedly, with the spider closing after only 1-2 hours each time.