Issues using Crawlera with Selenium

Hi, 


I followed the instructions to implement Crawlera with Selenium. 


https://support.scrapinghub.com/support/solutions/articles/22000203564-using-crawlera-with-selenium


When using Chrome, there is a warning NET::ERR_CERT_AUTHORITY_INVALID.


When using Firefox, it goes to the URL with a security exception, but the page loads very slowly and then times out.


My code is below:


from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.proxy import Proxy, ProxyType

# Local proxy address used by both browsers
headless_proxy = "localhost:3128"

# Chrome
proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'httpProxy': headless_proxy,
    'ftpProxy': headless_proxy,
    'sslProxy': headless_proxy,
    'noProxy': ''
})

chrome_options = Options()
chrome_options.add_argument('--start-fullscreen')
chrome_options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors"])
capabilities = dict(DesiredCapabilities.CHROME)
proxy.add_to_capabilities(capabilities)
driver = webdriver.Chrome(desired_capabilities=capabilities, executable_path='chromedriver', options=chrome_options)
driver.set_page_load_timeout(600)

# Firefox
# Copy the defaults so the shared DesiredCapabilities.FIREFOX dict is not modified
firefox_capabilities = dict(webdriver.DesiredCapabilities.FIREFOX)
firefox_capabilities['marionette'] = True

firefox_capabilities['proxy'] = {
    "proxyType": "MANUAL",
    "httpProxy": headless_proxy,
    "ftpProxy": headless_proxy,
    "sslProxy": headless_proxy
}

driver = webdriver.Firefox(capabilities=firefox_capabilities)
driver.set_page_load_timeout(600)
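
To help rule out the proxy itself as the cause of the slow loads and timeouts, here is a minimal sketch of a check that something is actually accepting connections on the proxy address before a browser is started. It uses only the Python standard library. The helper names split_host_port and proxy_is_listening are hypothetical (they are not part of Selenium or Crawlera), and the check only confirms that the port is open, not that the proxy is configured correctly or that its certificate is trusted.

```python
import socket


def split_host_port(address):
    """Split a 'host:port' string such as 'localhost:3128' into (host, port)."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def proxy_is_listening(address, timeout=3.0):
    """Return True if a TCP connection to the proxy address can be opened."""
    host, port = split_host_port(address)
    try:
        # create_connection raises OSError if the port is closed or unreachable
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Same address as headless_proxy in the code above
    headless_proxy = "localhost:3128"
    print(headless_proxy, "reachable:", proxy_is_listening(headless_proxy))
```

If this reports that the address is not reachable, the browser settings are probably not the problem, and the proxy process would be the first thing to look at.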


I am also using Scrapy with Crawlera and tried Splash + scrapy-splash, but I get the warning "scrapy-splash Call to deprecated function to_native_str. Use to_unicode instead". I followed these instructions: https://support.scrapinghub.com/support/solutions/articles/22000234854-how-to-use-crawlera-with-headless-browsers


I just need a solution that works for either Scrapy + Crawlera + rendering or Crawlera + Selenium.

