
Redirects outside of allowed_domains


While scraping a site, I noticed many redirect attempts to URLs outside of allowed_domains. They mostly go to sites that request authentication, such as twitter.api, facebook, etc. Requests to other offsite domains, on the other hand, do get filtered by the OffsiteMiddleware.

This is my spider:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class ScriptScrapy(CrawlSpider):
    name = 'scriptscrapy'
    allowed_domains = ['']
    start_urls = ['']

    rules = (
        Rule(LinkExtractor(), callback='parse_item', follow=True),
    )

And this is a sample redirect I get:

2020-11-04 15:00:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET> (referer:

When I visit the same URL in my browser, no redirects happen.
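
To narrow down which redirect targets actually fall outside allowed_domains, a small standalone check like the one below may help. It is only a sketch: the function name is_offsite and the example domains are hypothetical, and it approximates the usual "same domain or subdomain" matching rather than reproducing Scrapy's own filter. Something along these lines could presumably be called from a custom downloader middleware's process_request, raising scrapy.exceptions.IgnoreRequest for offsite URLs, but that part is an assumption and is not shown here.

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains):
    """Return True if the URL's host is neither an allowed domain nor a subdomain of one.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    host = (urlparse(url).hostname or "").lower()
    for domain in allowed_domains:
        domain = domain.lower()
        if not domain:
            # Skip empty entries such as '' so they never match anything.
            continue
        if host == domain or host.endswith("." + domain):
            return False
    return True


if __name__ == "__main__":
    # Example domains are placeholders, not the ones from the real spider.
    allowed = ["example.com"]
    for target in ("https://blog.example.com/post", "https://api.twitter.com/oauth"):
        print(target, "offsite" if is_offsite(target, allowed) else "allowed")
```

Logging the result of such a check for every redirect target would show whether the unfiltered requests really are offsite, or whether they match an entry in allowed_domains in some unexpected way.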
