
Crawlera API is not working with Python requests

I have been using Crawlera for two months and it was working fine, but now I get this error:

"/home/vocso/.local/lib/python3.6/site-packages/urllib3/connection.py:362: SubjectAltNameWarning: Certificate for www.google.com has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)"

Here is my code:

import certifi
import requests
from bs4 import BeautifulSoup

# url is defined elsewhere (the Google search URL shown further down)
proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "<key>:"
proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
           "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}

photon_requests_session = requests.sessions.Session()
photon_requests_session.verify = certifi.where()

# Note: this call bypasses the session above, so its certifi bundle is unused;
# verification is done against Crawlera's CA certificate instead.
r = requests.get(url, proxies=proxies, verify="crawlera-ca.crt")
soup = BeautifulSoup(r.text, 'html5lib')


Best Answer

That's just a warning about urllib3 feature support, not an error. The request still goes through the proxy and gets a response, so it can be safely ignored.
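If the warning is noisy, it can also be silenced explicitly. A minimal sketch, assuming a urllib3 1.x release that still ships the SubjectAltNameWarning class:

import urllib3

# Suppress only this specific warning; other urllib3 warnings still surface.
urllib3.disable_warnings(urllib3.exceptions.SubjectAltNameWarning)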



Hi,

I am getting the response on localhost, but not getting any response on the AWS server. The code is the same; on the server I get nothing:
url = "https://www.google.com/search?q="+entry_keyword+"&gl="+entry_gl+"&start="+str(i*10)+"&as_qdr=y15"

proxy_host = "proxy.crawlera.com"

proxy_port = "8010"

proxy_auth = "<key>:"

proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),

"http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}

photon_requests_session = requests.sessions.Session()

photon_requests_session.verify = certifi.where()

r = photon_requests_session.get(url,proxies=proxies,verify='crawlera-ca.crt')

print(r.text)

[Screenshot attached: whole.png]

Please print a more verbose response, like in the sample: https://support.scrapinghub.com/solution/articles/22000203567-using-crawlera-with-python-requests (e.g. response headers).

 

            url = "https://www.google.com/search?q="+entry_keyword+"&gl="+entry_gl+"&start="+str(i*10)+"&as_qdr=y15"

proxy_host = "proxy.crawlera.com"

proxy_port = "8010"

proxy_auth = "<key>:"

proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),

"http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}

photon_requests_session = requests.sessions.Session()

photon_requests_session.verify = certifi.where()

r = photon_requests_session.get(url,proxies=proxies,verify='crawlera-ca.crt')

soup = BeautifulSoup(r.text,'html5lib')

print("""

 

Requesting [{}]

through proxy [{}]


Request Headers:

{}


Response Time: {}

Response Code: {}

Response Headers:

{}


""".format(url, proxy_host, r.request.headers, r.elapsed.total_seconds(),

r.status_code, r.headers, r.text))

[Screenshot attached: new.png]

It is showing bad_proxy_auth on the server but working perfectly on my local machine.

Are you sure you're using the same script both locally and on AWS?

Yes sir, I am 100% sure.

Bad authentication is a client-side error. If the API key being used is the correct one, then the only thing I can think of is the installed python requests version, which might be causing problems.
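As a quick sanity check, you can print the installed version from the same environment that runs the script (this is plain Python, nothing Crawlera-specific); if it is old, `pip install --upgrade requests` brings it current:

import requests

# The thread below identifies an outdated requests release as the culprit,
# so confirm which version each machine is actually running.
print(requests.__version__)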

Problem solved... the problem was the python requests version.

Thank you so much for the help.

No problem.

import requests

url = "http://httpbin.org/ip"
proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "<APIKEY>:" # Make sure to include ':' at the end
proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
           "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}

r = requests.get(url, proxies=proxies, verify=False)

print("""
Requesting [{}]
through proxy [{}]

Request Headers:
{}

Response Time: {}
Response Code: {}
Response Headers:
{}

""".format(url, proxy_host, r.request.headers, r.elapsed.total_seconds(),
           r.status_code, r.headers, r.text))

OUTPUT:

Requesting [http://httpbin.org/ip]
through proxy [proxy.crawlera.com]

Request Headers:
{'User-Agent': 'python-requests/2.19.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

Response Time: 0.894725
Response Code: 407
Response Headers:
{'X-Crawlera-Error': 'bad_proxy_auth', 'Proxy-Authenticate': 'Basic realm="Crawlera"', 'Content-Length': '0', 'Date': 'Fri, 16 Oct 2020 05:48:29 GMT', 'Proxy-Connection': 'close', 'Connection': 'close'}

PLEASE HELP: I am using requests version 2.19.0.

Can you please help me with the above question? I am getting a 'bad_proxy_auth' error while trying to test-run the code above, using requests version 2.19.0.

url = "http://httpbin.org/ip"
proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "<APIKEY>:" # Make sure to include ':' at the end
proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
"http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}

r = requests.get(url, proxies=proxies,
verify=False)
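As resolved earlier in this thread, the first thing to try is upgrading the requests package itself. If upgrading is not an option, a minimal sketch of a workaround for plain-HTTP URLs is to build the Proxy-Authorization header by hand instead of embedding the key in the proxy URL; the header name and Basic scheme are standard, but treat this as an assumption rather than an official Crawlera recipe:

import base64
import requests

url = "http://httpbin.org/ip"
proxies = {"http": "http://proxy.crawlera.com:8010/",
           "https": "http://proxy.crawlera.com:8010/"}

# Encode "<APIKEY>:" (trailing colon included) as standard Basic credentials.
token = base64.b64encode(b"<APIKEY>:").decode()
headers = {"Proxy-Authorization": "Basic " + token}

# For an http:// URL, all headers (including Proxy-Authorization) are sent to
# the proxy; for https:// URLs only the credentials in the proxy URL reach
# the CONNECT request, so this workaround does not apply there.
r = requests.get(url, headers=headers, proxies=proxies)
print(r.status_code, r.headers.get("X-Crawlera-Error"))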