Answered

'>' string is added as url when spider starts

Hi, 


I am encountering this issue when running the spider:


[scrapy.core.scraper] Error downloading <GET http://www.yalwa.com>: Connection was refused by other side: 111: Connection refused.


As you can see, the string ">" appears as part of the start URL. How should I fix this?


My spider works when I run it on my local machine, so I am confused about why it is not working on Scrapinghub.


Can you help me please? 


Thank you.

Attachment: error.JPG (120 KB)

Best Answer

Hi,


The ">" at the end is a known bug in how the logs are displayed. The "connection refused" error actually means that the target domain has blocked the Scrapy Cloud IP(s), so the solution would be to use Crawlera as a proxy.
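
For reference, here is a minimal sketch of what enabling Crawlera could look like in the project's settings.py. It assumes the scrapy-crawlera plugin is installed, and the API key shown is a placeholder, not a real value. Treat it as a starting point rather than the definitive setup for your project.

```python
# settings.py (sketch): route the spider's requests through Crawlera
# using the scrapy-crawlera plugin.
# Assumption: the plugin is installed (pip install scrapy-crawlera).

# Register the Crawlera downloader middleware.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_crawlera.CrawleraMiddleware": 610,
}

# Turn the middleware on and supply the account's API key.
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = "<your Crawlera API key>"  # placeholder, replace with your own key
```

When deploying to Scrapy Cloud, the plugin would presumably also need to be listed in the project's requirements file so it is available at run time.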
