Redirects outside of allowed_domains

Posted almost 5 years ago by A.

Hi,

While scraping a site, I noticed many requests being redirected to URLs outside of allowed_domains. Most of them go to sites that request authentication, such as api.twitter.com, Facebook, etc. Other offsite links, however, do get filtered by the OffsiteMiddleware.

This is my spider:

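from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class ScriptScrapy(CrawlSpider):
    name = 'scriptscrapy'
    allowed_domains = ['eldorado.ru']
    start_urls = ['http://eldorado.ru']

    # Follow every extracted link; the parse_item callback is omitted here
    rules = (Rule(LinkExtractor(), callback='parse_item', follow=True),)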

And this is a sample redirect I get:

2020-11-04 15:00:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://api.twitter.com/oauth/authorize?oauth_token=2NS7MgAAAAAA7tOrAAABdZNYb9Q> (referer: https://www.eldorado.ru/cat/detail/smartfon-apple-iphone-12-pro-256gb-pacific-blue-mgmt3ru-a/?show=response)

When I visit the same URL in my browser, no redirect happens.
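
To make the expected behaviour concrete, here is a minimal sketch of the domain check I would expect to be applied to redirect targets as well. It uses only the standard library; `is_offsite` is a hypothetical helper written for illustration, not a Scrapy function, and it may not match exactly how Scrapy itself compares hosts against allowed_domains.

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains):
    """Return True if the URL's host is neither an allowed domain nor a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    for domain in allowed_domains:
        domain = domain.lower()
        # Accept the domain itself and any of its subdomains (e.g. www.eldorado.ru)
        if host == domain or host.endswith("." + domain):
            return False
    return True


if __name__ == "__main__":
    allowed = ["eldorado.ru"]
    # Redirect target from the log above: outside the allowed domain
    print(is_offsite("https://api.twitter.com/oauth/authorize", allowed))
    # Referring page from the log above: inside the allowed domain
    print(is_offsite("https://www.eldorado.ru/cat/detail/", allowed))
```

Inside a spider callback, something like this could presumably be called with response.url and self.allowed_domains to skip pages that were reached through an offsite redirect, although that only hides the symptom and does not prevent the request itself.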
