Redirects outside of allowed_domains

Posted about 4 years ago by A.

Unanswered

Hi,

When scraping a site, I noticed many redirects to URLs outside of allowed_domains. They mostly go to sites requesting authentication, such as api.twitter.com and Facebook. Other off-site requests, on the other hand, do get filtered by the OffsiteMiddleware.

This is my spider:

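from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class ScriptScrapy(CrawlSpider):
    name = 'scriptscrapy'
    allowed_domains = ['eldorado.ru']
    start_urls = ['http://eldorado.ru']

    # Follow every extracted link; parse_item (not shown) handles each response
    rules = (Rule(LinkExtractor(), callback='parse_item', follow=True),)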


And this is a sample redirect I get:

2020-11-04 15:00:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://api.twitter.com/oauth/authorize?oauth_token=2NS7MgAAAAAA7tOrAAABdZNYb9Q> (referer: https://www.eldorado.ru/cat/detail/smartfon-apple-iphone-12-pro-256gb-pacific-blue-mgmt3ru-a/?show=response)


When I visit the same URL in my browser, no redirect happens.
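
To make the expected behaviour concrete, here is a minimal sketch of the domain check I would expect to apply to redirect targets as well as to extracted links. It assumes a plain hostname match (the domain itself or any subdomain of it); `is_offsite` and `ALLOWED_DOMAINS` are hypothetical names for illustration, not part of Scrapy.

```python
from urllib.parse import urlparse

# Assumption: mirrors allowed_domains from the spider above
ALLOWED_DOMAINS = ["eldorado.ru"]


def is_offsite(url, allowed_domains=ALLOWED_DOMAINS):
    """Return True if the URL's host is neither an allowed domain nor a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return not any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


if __name__ == "__main__":
    # URL taken from the log line above
    print(is_offsite("https://api.twitter.com/oauth/authorize?oauth_token=2NS7MgAAAAAA7tOrAAABdZNYb9Q"))
    # Start URL from the spider
    print(is_offsite("http://eldorado.ru"))
```

By this check the api.twitter.com URL from the log counts as off-site, while www.eldorado.ru does not, so I would not expect the former to be crawled.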
