Posted almost 3 years ago by Muzzi Aldean
I am trying to scrape some data from 2 URLs on the domain https://fedsfm.ru/:
1) https://fedsfm.ru/documents/terrorists-catalog-portal-act
2) https://fedsfm.ru/documents/omu-list
Before I integrated Zyte, I was hitting 403 HTTP errors every time. Now that I am using Zyte Proxy Manager, I am hitting 504s a lot of the time. The odd request is successful, roughly 1 in 5.
Is there a way I can get past this block?
My rough implementation is:
import os

import urllib3

# Browser-like request headers (copied from a Chrome 96 session on Linux).
HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
    "Cache-Control": "max-age=0",
    "Connection": "keep-alive",
    "Host": "fedsfm.ru",
    "sec-ch-ua": '" Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"',
    "sec-ch-ua-platform": "Linux",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "cross-site",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",  # header values are sent as strings
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
}

# proxy_basic_auth takes a "username:password" string; this assumes the
# environment variable already holds the API key in that form.
PROXY_HEADERS = urllib3.make_headers(proxy_basic_auth=os.getenv("ZYTE_PROXY_API_KEY"))

# Route all requests through the proxy; CERT_NONE disables TLS certificate verification.
http_manager = urllib3.ProxyManager(
    "http://proxy.crawlera.com:8010/",
    cert_reqs="CERT_NONE",
    proxy_headers=PROXY_HEADERS,
    headers=HEADERS,
)

http_manager.request("GET", "https://fedsfm.ru/documents/terrorists-catalog-portal-act")
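Since roughly 1 request in 5 does succeed, one thing that may be worth trying is simply retrying the 504s with a growing delay. The sketch below is not a confirmed fix for the block, only an illustration of that idea. The helper name `fetch_with_retries` and its parameters (`retry_statuses`, `max_attempts`, `backoff_seconds`, `sleep`) are hypothetical and not part of urllib3 or Zyte; `fetch` is assumed to be any zero-argument callable that returns a `(status, body)` pair.

```python
import time
from typing import Callable, Iterable, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, response body)


def fetch_with_retries(
    fetch: Callable[[], Response],
    retry_statuses: Iterable[int] = (504,),
    max_attempts: int = 5,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes, int]:
    """Call fetch() until it returns a status outside retry_statuses.

    Returns (status, body, attempts_used). If every attempt returns a
    retryable status, the last response is returned as-is.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    retry_on = set(retry_statuses)
    status, body = 0, b""
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status not in retry_on:
            return status, body, attempt
        if attempt < max_attempts:
            # Exponential backoff: 1x, 2x, 4x, ... the base delay.
            sleep(backoff_seconds * 2 ** (attempt - 1))
    return status, body, max_attempts
```

With the `http_manager` from my implementation, `fetch` could be a small function that calls `http_manager.request("GET", url)` and returns `(response.status, response.data)`. urllib3 also ships its own `Retry` class, whose `status_forcelist` and `backoff_factor` options cover the same ground and can be passed to the manager through `retries=`.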