
Read timeout

We have been scraping a site on the C10 plan and have been getting the error below constantly:


Error:

requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.amazon.com', port=443): Read timed out. (read timeout=20)


Sample Code:

import requests

proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "{key}:"  # Make sure to include ':' at the end

proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
           "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}

url = "https://www.example.com"
page = requests.get(url, proxies=proxies, verify=False, timeout=20)


This error sometimes goes away on retry. Today, however, after we had processed a couple of thousand requests, every request after that started failing with this error.
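
In the meantime we work around it with a simple retry wrapper. This is a minimal sketch building on the sample above; the get_with_retries helper, the attempt count, and the backoff values are our own choices, not anything recommended by Crawlera:

import time
import requests

def get_with_retries(url, proxies, max_attempts=5, timeout=20):
    # Retry the proxied GET with exponential backoff when the read times out.
    for attempt in range(1, max_attempts + 1):
        try:
            return requests.get(url, proxies=proxies, verify=False, timeout=timeout)
        except requests.exceptions.ReadTimeout:
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # wait 2, 4, 8, ... seconds between attempts

page = get_with_retries("https://www.example.com", proxies)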


Is there any way to fix that?


1 Comment

Here are more details:


HTTPSConnectionPool(host='www.amazon.com', port=443): Max retries exceeded with url: {url} (Caused by ProxyError('Cannot connect to proxy.', error('Tunnel connection failed: 503 Crawlera Server Unavailable',)))
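
On the client this 503 surfaces as requests.exceptions.ProxyError rather than ReadTimeout, so a retry loop like the one above has to catch it separately. A minimal sketch, reusing the url and proxies from the original sample:

import requests

try:
    page = requests.get(url, proxies=proxies, verify=False, timeout=20)
except requests.exceptions.ProxyError as exc:
    # The "Tunnel connection failed: 503 Crawlera Server Unavailable" error
    # lands here; back off and retry instead of failing the whole job.
    print("Proxy unavailable, will retry later:", exc)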

