Crawlera API is not working with Python requests

Posted over 5 years ago by deepakchauhan

Answered

I have been using Crawlera for 2 months. It was working fine, but then I got this error:

/home/vocso/.local/lib/python3.6/site-packages/urllib3/connection.py:362: SubjectAltNameWarning: Certificate for www.google.com has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)

Here is my code:

import requests
import certifi
from bs4 import BeautifulSoup

proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "<key>:"  # API key followed by ':'
proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
           "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}

photon_requests_session = requests.sessions.Session()
photon_requests_session.verify = certifi.where()

# url is defined earlier in the script; note the session above is
# configured but this call uses requests.get directly
r = requests.get(url, proxies=proxies, verify="crawlera-ca.crt")
soup = BeautifulSoup(r.text, 'html5lib')

0 Votes


nestor posted over 5 years ago Admin Best Answer

That's just a warning from urllib3 about deprecated certificate handling, not an error. The request still goes through the proxy and gets a response, so it can be safely ignored.
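If the warning is noisy in your logs, it can also be silenced explicitly. A minimal sketch, assuming urllib3 1.x (where SubjectAltNameWarning lives in urllib3.exceptions):

import urllib3

# Suppress only this specific warning; other urllib3 warnings still show
urllib3.disable_warnings(urllib3.exceptions.SubjectAltNameWarning)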

0 Votes


14 Comments


Godfrey Jean posted about 4 years ago

Can you please help me with the question above?

I am getting a 'bad_proxy_auth' error when I test-run the code. I am using requests version 2.19.0.

import requests

url = "http://httpbin.org/ip"
proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "<APIKEY>:"  # Make sure to include ':' at the end
proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
           "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}

r = requests.get(url, proxies=proxies, verify=False)

0 Votes


Godfrey Jean posted about 4 years ago

import requests

url = "http://httpbin.org/ip"
proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "<APIKEY>:"  # Make sure to include ':' at the end
proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
           "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}

r = requests.get(url, proxies=proxies, verify=False)

print("""
Requesting [{}]
through proxy [{}]

Request Headers:
{}

Response Time: {}
Response Code: {}
Response Headers:
{}
""".format(url, proxy_host, r.request.headers, r.elapsed.total_seconds(),
           r.status_code, r.headers))

OUTPUT:

Requesting [http://httpbin.org/ip]
through proxy [proxy.crawlera.com]

Request Headers:
{'User-Agent': 'python-requests/2.19.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

Response Time: 0.894725
Response Code: 407
Response Headers:
{'X-Crawlera-Error': 'bad_proxy_auth', 'Proxy-Authenticate': 'Basic realm="Crawlera"', 'Content-Length': '0', 'Date': 'Fri, 16 Oct 2020 05:48:29 GMT', 'Proxy-Connection': 'close', 'Connection': 'close'}
Please help: I am using requests version 2.19.0.
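For reference, a 407 with 'X-Crawlera-Error: bad_proxy_auth' means the proxy rejected the credentials, so it is worth ruling out how they are being sent. One sketch is to build the Proxy-Authorization header yourself instead of embedding the key in the proxy URL; note this only reaches the proxy for plain-HTTP targets like http://httpbin.org/ip (for HTTPS targets, request headers go to the site, not the proxy). The <APIKEY> placeholder is assumed as above:

import base64
import requests

url = "http://httpbin.org/ip"
proxies = {"http": "http://proxy.crawlera.com:8010/",
           "https": "http://proxy.crawlera.com:8010/"}

# HTTP Basic credentials: API key as username, empty password
token = base64.b64encode(b"<APIKEY>:").decode("ascii")
headers = {"Proxy-Authorization": "Basic " + token}

r = requests.get(url, proxies=proxies, headers=headers)
print(r.status_code, r.headers.get("X-Crawlera-Error"))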

0 Votes


nestor posted over 5 years ago Admin

No problem.

0 Votes


deepakchauhan posted over 5 years ago

And thank you so much for the help.

0 Votes


deepakchauhan posted over 5 years ago

Problem solved... the problem was the Python requests version.
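For anyone who lands here with the same error: the fix in this thread was updating the requests package. A quick check of what is installed (the thread does not pin an exact minimum version, so upgrading to the latest is the safe bet):

import requests

# Compare this against the latest release, then upgrade if it's old:
#   pip install --upgrade requests
print(requests.__version__)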

0 Votes


nestor posted over 5 years ago Admin

Bad authentication is a client-side error. If the API key being used is the correct one, then the only thing I can think of is the installed Python requests version, which might be causing problems.
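If the key itself is correct, one more thing worth ruling out (an assumption, not something confirmed in this thread) is URL encoding: credentials embedded in a proxy URL must be percent-encoded if they contain reserved characters. A sketch with a hypothetical key:

from urllib.parse import quote

api_key = "<APIKEY>"  # hypothetical placeholder
proxy_auth = quote(api_key, safe="") + ":"  # keep the trailing ':'
proxies = {"https": "https://{}@proxy.crawlera.com:8010/".format(proxy_auth),
           "http": "http://{}@proxy.crawlera.com:8010/".format(proxy_auth)}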

0 Votes


deepakchauhan posted over 5 years ago

Yes sir, I am 100% sure.

0 Votes


nestor posted over 5 years ago Admin

Are you sure you're using the same script both locally and on AWS?

0 Votes


deepakchauhan posted over 5 years ago

It's showing bad proxy auth on the server but working perfectly on my local machine.
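"Works locally but fails on the server" usually comes down to an environment difference. A generic diagnostic (not from this thread, though the requests version did turn out to be the culprit here) is to print the relevant versions on both machines and compare:

import sys
import requests
import urllib3

# Run on both the local machine and the AWS server, then compare
print("python  :", sys.version)
print("requests:", requests.__version__)
print("urllib3 :", urllib3.__version__)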

0 Votes


deepakchauhan posted over 5 years ago

 

import requests
import certifi
from bs4 import BeautifulSoup

# entry_keyword, entry_gl and i are defined earlier in the script
url = ("https://www.google.com/search?q=" + entry_keyword + "&gl=" + entry_gl
       + "&start=" + str(i * 10) + "&as_qdr=y15")

proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "<key>:"
proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
           "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}

photon_requests_session = requests.sessions.Session()
photon_requests_session.verify = certifi.where()

r = photon_requests_session.get(url, proxies=proxies, verify='crawlera-ca.crt')
soup = BeautifulSoup(r.text, 'html5lib')

print("""
Requesting [{}]
through proxy [{}]

Request Headers:
{}

Response Time: {}
Response Code: {}
Response Headers:
{}
""".format(url, proxy_host, r.request.headers, r.elapsed.total_seconds(),
           r.status_code, r.headers))


0 Votes


nestor posted over 5 years ago Admin

Please add a more verbose response like in the sample: https://support.scrapinghub.com/solution/articles/22000203567-using-crawlera-with-python-requests (e.g. include the response headers).

0 Votes


deepakchauhan posted over 5 years ago

 

I got nothing on the server:

import requests
import certifi

# entry_keyword, entry_gl and i are defined earlier in the script
url = ("https://www.google.com/search?q=" + entry_keyword + "&gl=" + entry_gl
       + "&start=" + str(i * 10) + "&as_qdr=y15")

proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "<key>:"
proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
           "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}

photon_requests_session = requests.sessions.Session()
photon_requests_session.verify = certifi.where()

r = photon_requests_session.get(url, proxies=proxies, verify='crawlera-ca.crt')
print(r.text)


0 Votes


deepakchauhan posted over 5 years ago

Hi,

I am getting the response on localhost but not getting any response on the AWS server. The code is the same.

0 Votes

