
Read timeout

We have been scraping a site on the C10 plan and have been getting the error below constantly:


Error:

requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.amazon.com', port=443): Read timed out. (read timeout=20)


Sample Code:

import requests

proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "{key}:"  # Make sure to include ':' at the end

proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
           "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}

url = "https://www.example.com"
page = requests.get(url, proxies=proxies, verify=False, timeout=20)


This error sometimes goes away on retry. Today, however, after we had processed a couple of thousand requests, every request after that started failing with this error.
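
In the meantime we work around it with a simple retry wrapper. This is a minimal sketch building on the sample above; the get_with_retries helper, the attempt count, and the backoff values are our own choices, not anything recommended by Crawlera:

import time
import requests

def get_with_retries(url, proxies, max_attempts=5, timeout=20):
    # Retry the proxied GET with exponential backoff when the read times out.
    for attempt in range(1, max_attempts + 1):
        try:
            return requests.get(url, proxies=proxies, verify=False, timeout=timeout)
        except requests.exceptions.ReadTimeout:
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # wait 2, 4, 8, ... seconds between attempts

page = get_with_retries("https://www.example.com", proxies)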


Is there any way to fix that?


1 Comment

Here are more details:


HTTPSConnectionPool(host='www.amazon.com', port=443): Max retries exceeded with url: {url} (Caused by ProxyError('Cannot connect to proxy.', error('Tunnel connection failed: 503 Crawlera Server Unavailable',)))
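
On the client this 503 surfaces as requests.exceptions.ProxyError rather than ReadTimeout, so a retry loop like the one above has to catch it separately. A minimal sketch, reusing the url and proxies from the original sample:

import requests

try:
    page = requests.get(url, proxies=proxies, verify=False, timeout=20)
except requests.exceptions.ProxyError as exc:
    # The "Tunnel connection failed: 503 Crawlera Server Unavailable" error
    # lands here; back off and retry instead of failing the whole job.
    print("Proxy unavailable, will retry later:", exc)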

