Intermittent 504 Errors

I am trying to scrape some data from two URLs on the domain https://fedsfm.ru/:

1) https://fedsfm.ru/documents/terrorists-catalog-portal-act
2) https://fedsfm.ru/documents/omu-list


Before I integrated Zyte, every request returned an HTTP 403. Now that I am using Zyte Proxy Manager, most requests return a 504 instead; only about 1 in 5 succeeds.


Is there a way I can get past this block?

My rough implementation is:


   

import os

import urllib3

# Browser-like headers sent with every request to the target site.
HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
    "Cache-Control": "max-age=0",
    "Connection": "keep-alive",
    "Host": "fedsfm.ru",
    "sec-ch-ua": '" Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"',
    "sec-ch-ua-platform": "Linux",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "cross-site",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",  # header values should be strings
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
}

# make_headers() expects proxy_basic_auth in "username:password" form; the
# environment variable is assumed to already hold the key in that form.
PROXY_HEADERS = urllib3.make_headers(proxy_basic_auth=os.getenv("ZYTE_PROXY_API_KEY"))

# Route all requests through the proxy. CERT_NONE disables TLS verification.
http_manager = urllib3.ProxyManager(
    "http://proxy.crawlera.com:8010/",
    cert_reqs="CERT_NONE",
    proxy_headers=PROXY_HEADERS,
    headers=HEADERS,
)

http_manager.request("GET", "https://fedsfm.ru/documents/terrorists-catalog-portal-act")
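
Since roughly 1 in 5 requests does get through, one stopgap I could use is to retry on 504 with a growing delay. The sketch below is only that: the helper name `fetch_with_retries`, the attempt count, and the delays are my own placeholders, and it assumes the 504s are transient rather than a hard block. It works with any callable that returns an object with a `.status` attribute, which urllib3 responses have.

```python
import time

# Assumption: only gateway timeouts are worth retrying.
RETRY_STATUSES = {504}


def fetch_with_retries(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-retryable status or attempts run out.

    fetch is any zero-argument callable returning an object with a .status
    attribute. The delay doubles after each failed attempt.
    """
    response = None
    for attempt in range(max_attempts):
        response = fetch()
        if response.status not in RETRY_STATUSES:
            return response
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # Every attempt failed; hand back the last response for inspection.
    return response
```

With the manager above, this would be called as `fetch_with_retries(lambda: http_manager.request("GET", url))`, where `url` is one of the two pages.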

   


