Different page served to Scrapinghub IPs

Posted about 6 years ago by ma

Answered

I'm starting to suspect that some websites shadowban Scrapinghub.

An example is:

https://www.myrecipes.com/recipe/chocolate-cream-martini


If I fetch the page by hand with Scrapy and extract the JSON-LD content, I get one JSON document. If I let Scrapinghub fetch the same page, I get something else.


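The simplest test is a callback that yields the raw page:

def parse(self, response):
    # Dump the URL and the full HTML so the two environments can be compared.
    yield {
        'url': response.url,
        'body': response.text,  # decoded str; response.body would be raw bytes
    }

In scrapy shell, the body contains a long JSON-LD block.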

To narrow it down, this selector extracts just the JSON-LD blocks:

results = response.css("script[type='application/ld+json']").extract()
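To compare what each environment actually receives, one option is to save the HTML from both runs (scrapy shell locally and the job on Scrapinghub) and diff the parsed JSON-LD instead of eyeballing raw bodies. Below is a minimal stdlib-only sketch, assuming you already have both captures as strings; extract_jsonld and compare_jsonld are hypothetical helper names, not part of Scrapy.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased by the parser.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return the parsed JSON-LD objects found in an HTML string (hypothetical helper)."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    results = []
    for block in parser.blocks:
        try:
            results.append(json.loads(block))
        except json.JSONDecodeError:
            # Keep malformed blocks as raw text so they still show up in a diff.
            results.append(block)
    return results


def compare_jsonld(local_html, cloud_html):
    """Return True if both captures carry identical JSON-LD (hypothetical helper)."""
    return extract_jsonld(local_html) == extract_jsonld(cloud_html)
```

If compare_jsonld returns False for the same URL, printing both extract_jsonld results side by side shows exactly which fields the site serves differently to each environment.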


What can I do? In my opinion, it's not a matter of user agent.


nestor posted about 6 years ago Admin Best Answer

You probably need a proxy like Crawlera: https://scrapinghub.com/crawlera
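A minimal settings sketch for routing a Scrapy project through Crawlera, assuming the scrapy-crawlera plugin is installed (pip install scrapy-crawlera) and that you have a Crawlera API key. The setting names follow that plugin's README; the key value is a placeholder, and it is worth checking the names against the plugin version you actually install.

```python
# settings.py (sketch): send all requests through Crawlera via scrapy-crawlera.
DOWNLOADER_MIDDLEWARES = {
    # Middleware provided by the scrapy-crawlera plugin.
    "scrapy_crawlera.CrawleraMiddleware": 610,
}

# Turn the middleware on for every spider in the project.
CRAWLERA_ENABLED = True

# Placeholder: replace with your own Crawlera API key (keep it out of version control).
CRAWLERA_APIKEY = "<your-crawlera-api-key>"
```

With this in place, the same spider run locally and on Scrapinghub should both reach the site through Crawlera's IPs rather than the datacenter IPs the site may be treating differently.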
