Scrapy crawl yahoo sites redirect to guce.oath.com/collectConsent

Posted over 6 years ago by Leon Liang

Post a topic

Un Answered

Leon Liang

When I trying to crawl yahoo shop: https://hk.shop.yahoo.com/shop/CityLink-%E9%A0%98%E5%9F%9F-11756.

The code runs properly on my local machine. But when deployed to scrapinghub, the page will be redirected.

This might because when using local machine, there are cookies existing. But scrapinghub is using dynamic IP address. Then when I try to using splash to click the OK button on the redirected page, it seems not working. Here is what I have found https://stackoverflow.com/questions/51085067/using-scrapy-splash-clicking-a-button

And below is what I have tried to fix this issue.

function main(splash)
    splash:wait(1)
    splash:runjs('document.querySelector("button.primary").click()')
    splash:wait(1)
    return {
        html = splash:html(),
    }
end

Attachments (1)

yahoo.py
2.89 KB

0 Votes

0 Comments