Answered

Crawlera API is not working with Python requests

I have been using Crawlera for 2 months. It was working fine, but now I get this error:

/home/vocso/.local/lib/python3.6/site-packages/urllib3/connection.py:362: SubjectAltNameWarning: Certificate for www.google.com has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)

Here is my code:

proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "<key>:"
proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
           "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}
photon_requests_session = requests.sessions.Session()
photon_requests_session.verify = certifi.where()
r = requests.get(url, proxies=proxies, verify="crawlera-ca.crt")
soup = BeautifulSoup(r.text, 'html5lib')


Best Answer

That's just a urllib3 warning about a deprecated certificate check, not an error. The request still goes through the proxy and gets a response, so the warning can be safely ignored.
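If the warning clutters the logs, it can be filtered by its message text with the standard library. This is only a minimal sketch: the helper name `silence_subject_alt_name_warning` is hypothetical, and it assumes the warning text begins exactly as quoted in the question. Matching on the message avoids depending on a particular urllib3 warning class.

```python
import warnings


def silence_subject_alt_name_warning():
    """Ignore only warnings whose text matches the message quoted in the question."""
    # `message` is a regular expression matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        message=r"Certificate for .* has no `subjectAltName`",
    )


# Call once, before making any requests through the proxy.
silence_subject_alt_name_warning()
```

Other warnings are left untouched, so unrelated problems still show up.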



Hi,

I am getting a response on localhost but no response on the AWS server. The code is the same.

This is the code that returns nothing on the server:
url = "https://www.google.com/search?q="+entry_keyword+"&gl="+entry_gl+"&start="+str(i*10)+"&as_qdr=y15"
proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "<key>:"
proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
           "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}
photon_requests_session = requests.sessions.Session()
photon_requests_session.verify = certifi.where()
r = photon_requests_session.get(url, proxies=proxies, verify='crawlera-ca.crt')
print(r.text)

[Attachment: whole.png (95.7 KB)]

Please post more verbose output (e.g. the response headers), as in this sample: https://support.scrapinghub.com/solution/articles/22000203567-using-crawlera-with-python-requests

 

url = "https://www.google.com/search?q="+entry_keyword+"&gl="+entry_gl+"&start="+str(i*10)+"&as_qdr=y15"
proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "<key>:"
proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
           "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}
photon_requests_session = requests.sessions.Session()
photon_requests_session.verify = certifi.where()
r = photon_requests_session.get(url, proxies=proxies, verify='crawlera-ca.crt')
soup = BeautifulSoup(r.text, 'html5lib')

print("""
Requesting [{}]
through proxy [{}]

Request Headers:
{}

Response Time: {}
Response Code: {}
Response Headers:
{}

""".format(url, proxy_host, r.request.headers, r.elapsed.total_seconds(),
           r.status_code, r.headers, r.text))

[Attachment: new.png (88.7 KB)]

It shows a bad proxy auth error on the server, but works perfectly on my local machine.

Are you sure you're using the same script in both local and AWS?

Yes, sir, I am 100% sure.

Bad authentication is a client-side error. If the API key being used is the correct one, then the only thing I can think of is that the installed version of Python requests might be causing problems.
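One way to check this is to print the installed package versions on both machines and compare them. A minimal sketch, not an official diagnostic: the helper name `installed_versions` and the list of packages are assumptions, and `importlib.metadata` requires Python 3.8 or newer (on older interpreters, `requests.__version__` gives the same information for requests).

```python
import platform
from importlib import metadata


def installed_versions(packages=("requests", "urllib3", "certifi")):
    """Return {name: version or None} for this environment (hypothetical helper)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return versions


if __name__ == "__main__":
    # Run this on both the local machine and the server, then compare the output.
    for name, version in installed_versions().items():
        print("{}: {}".format(name, version))
```

If the two machines report different versions, aligning the server with the working local environment is a reasonable first step.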

Problem solved. The problem was the Python requests version.

Thank you so much for your help.

No problem.

import requests

url = "http://httpbin.org/ip"
proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "<APIKEY>:" # Make sure to include ':' at the end
proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
           "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}

r = requests.get(url, proxies=proxies,
                 verify=False)

print("""
Requesting [{}]
through proxy [{}]

Request Headers:
{}

Response Time: {}
Response Code: {}
Response Headers:
{}

""".format(url, proxy_host, r.request.headers, r.elapsed.total_seconds(),
           r.status_code, r.headers, r.text))

OUTPUT

Requesting [http://httpbin.org/ip]
through proxy [proxy.crawlera.com]
Request Headers:
{'User-Agent': 'python-requests/2.19.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
Response Time: 0.894725
Response Code: 407
Response Headers:
{'X-Crawlera-Error': 'bad_proxy_auth', 'Proxy-Authenticate': 'Basic realm="Crawlera"', 'Content-Length': '0', 'Date': 'Fri, 16 Oct 2020 05:48:29 GMT', 'Proxy-Connection': 'close', 'Connection': 'close'}

PLEASE HELP: I AM USING REQUESTS VERSION 2.19.0

Can you please help me with the question above?

I am getting a 'bad_proxy_auth' error when I test-run the code. I am using requests version 2.19.0.

url = "http://httpbin.org/ip"
proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "<APIKEY>:" # Make sure to include ':' at the end
proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
           "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}

r = requests.get(url, proxies=proxies,
                 verify=False)
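For anyone debugging the same response: HTTP 407 means the proxy did not accept the credentials it received, and the `Proxy-Authenticate: Basic` header in the output says it expects Basic authentication. One thing that can be checked locally is what that Basic value should look like for the API key with an empty password. This is only a sketch: the helper names `clean_api_key` and `basic_proxy_auth_value` are hypothetical, and the guess that stray whitespace in the key could be a culprit is an assumption, not something the thread confirms.

```python
import base64


def clean_api_key(raw_key):
    """Strip whitespace or newlines that may sneak in when a key is read from a file or env var."""
    return raw_key.strip()


def basic_proxy_auth_value(api_key):
    """Return the Basic auth value for '<api_key>:' (API key as user, empty password)."""
    credentials = "{}:".format(clean_api_key(api_key)).encode("ascii")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    # "abc" is a placeholder, not a real key.
    print(basic_proxy_auth_value("abc"))
```

Comparing this value between the working and failing machines can help tell a key problem apart from a library problem such as the requests version mentioned earlier in the thread.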