
Scrapy crawl of Yahoo sites redirects to guce.oath.com/collectConsent

I am trying to crawl a Yahoo shop page: https://hk.shop.yahoo.com/shop/CityLink-%E9%A0%98%E5%9F%9F-11756


The code runs properly on my local machine, but when it is deployed to Scrapinghub, the request is redirected to the consent page.
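One way to confirm the redirect from inside a spider callback is to check the final response URL. This is a minimal sketch, assuming the consent page always sits on guce.oath.com with a path containing collectConsent (as in the title); the function name is_consent_redirect is hypothetical.

```python
from urllib.parse import urlparse

# Assumption: host and path marker are taken from the redirect URL in the title.
CONSENT_HOST = "guce.oath.com"
CONSENT_PATH_MARKER = "collectConsent"


def is_consent_redirect(url: str) -> bool:
    """Return True if the URL looks like the consent page (hypothetical helper)."""
    parsed = urlparse(url)
    return parsed.hostname == CONSENT_HOST and CONSENT_PATH_MARKER in parsed.path
```

In a Scrapy callback this could be called as is_consent_redirect(response.url) to log or retry when the crawl lands on the consent page instead of the shop page.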



This might be because cookies already exist on my local machine, whereas Scrapinghub uses dynamic IP addresses. I then tried using Splash to click the OK button on the redirected page, but it does not seem to work. Here is what I found: https://stackoverflow.com/questions/51085067/using-scrapy-splash-clicking-a-button


Below is what I have tried in order to fix this issue.

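function main(splash, args)
  -- Load the target page first (assumption: nothing else loads it before the click).
  assert(splash:go(args.url))
  splash:wait(1)
  -- Click the consent button on the redirected page.
  splash:runjs('document.querySelector("button.primary").click()')
  -- Give the page time to navigate after the click.
  splash:wait(1)
  return {
    html = splash:html(),
  }
end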
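To test the script outside Scrapy, it can be sent directly to Splash's /execute HTTP endpoint as a JSON POST. The sketch below only builds the request. It assumes a Splash instance at http://localhost:8050 (not stated in the post), the helper name build_execute_request is hypothetical, and the embedded Lua is a variant of the script above that also loads the page with splash:go.

```python
import json
from urllib.request import Request

# Variant of the script above; it additionally loads the page via splash:go.
LUA_SOURCE = """
function main(splash, args)
  assert(splash:go(args.url))
  splash:wait(1)
  splash:runjs('document.querySelector("button.primary").click()')
  splash:wait(1)
  return {html = splash:html()}
end
"""


def build_execute_request(target_url, splash_base="http://localhost:8050"):
    """Build (but do not send) a POST request for Splash's /execute endpoint.

    splash_base is an assumed address; point it at your own Splash instance.
    """
    # Keys other than lua_source are exposed to the script as args.<key>.
    payload = {"lua_source": LUA_SOURCE, "url": target_url}
    return Request(
        splash_base.rstrip("/") + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen should return a JSON body whose html key holds the rendered page, which makes it easy to see whether the click actually left the consent page.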