Posted over 6 years ago by ma
I am starting to suspect that some websites shadowban Scrapinghub.
An example is
https://www.myrecipes.com/recipe/chocolate-cream-martini
If I fetch the page by hand with Scrapy and extract the JSON-LD content, I get one JSON document. If I let Scrapinghub fetch it, I get something else.
The simplest check is to yield the URL and the raw body:
yield {
    'url': response.url,
    'body': response.body,
}
This shows that the body fetched in scrapy shell contains a long JSON-LD block.
The following selector makes the difference easier to spot:
results = response.css("script[type='application/ld+json']").extract()
What could I do? It's not a matter of user agent imho.
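To compare the two bodies outside of Scrapy, the JSON-LD can be pulled out of the raw HTML with the standard library alone. This is a minimal sketch, not part of the original spider: the names `JsonLdExtractor` and `extract_jsonld` are made up for illustration, and it assumes the body is UTF-8 encoded HTML such as the `body` value yielded above.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        # Script content may arrive in several pieces, so buffer it.
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._chunks))
            self._in_jsonld = False


def extract_jsonld(body):
    """Return the parsed JSON-LD objects found in an HTML body (bytes or str)."""
    if isinstance(body, bytes):
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        body = body.decode("utf-8", errors="replace")
    parser = JsonLdExtractor()
    parser.feed(body)
    parser.close()
    parsed = []
    for raw in parser.blocks:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            # A blocked or truncated page may carry invalid JSON; skip it.
            continue
    return parsed
```

Running `extract_jsonld` on the body saved from scrapy shell and on the body saved by the Scrapinghub job shows whether the second one is missing the JSON-LD entirely or just carries different data.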
nestor posted over 6 years ago Admin Best Answer
You probably need a proxy like Crawlera: https://scrapinghub.com/crawlera
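As a rough sketch of what routing requests through a proxy looks like in Scrapy: the built-in HttpProxyMiddleware reads the `proxy` key from `Request.meta`. The helper name `proxy_meta`, the host `proxy.example.com`, the port and the credentials below are placeholders, not Crawlera's actual endpoint; check the service's documentation for the real values.

```python
from urllib.parse import quote


def proxy_meta(host, port, username="", password=""):
    """Build a Request.meta dict that routes a request through an HTTP proxy.

    Hypothetical helper: host, port and credentials are whatever the
    proxy provider gives you.
    """
    credentials = ""
    if username:
        # Percent-encode so special characters do not break the proxy URL.
        credentials = quote(username, safe="") + ":" + quote(password, safe="") + "@"
    return {"proxy": "http://{}{}:{}".format(credentials, host, port)}


# Usage inside a spider (placeholder values):
#     yield scrapy.Request(url, meta=proxy_meta("proxy.example.com", 8010, "APIKEY"))
```

If the body fetched through the proxy contains the same JSON-LD as the one from scrapy shell, the difference most likely comes from the IP range of the platform rather than from the user agent.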