How to handle 302 redirects?

I read that Crawlera treats a 302 redirect as a successful request, but what if it's actually an anti-spider response from the server? This happened to me when I tried to use the POST method, only to be rebuffed and redirected to an authentication page. Is there a way to manually ask Crawlera to retry with a new IP address when that happens?

2020-06-24 20:37:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://wenshuapp.court.gov.cn/authenticatin/require> from <POST http://wenshuapp.court.gov.cn/appinterface/rest.q4w/>
2020-06-24 20:38:00 [scrapy.core.engine] DEBUG: Crawled (401) <GET http://wenshuapp.court.gov.cn/authenticatin/require> (referer: None)

 

1 Comment

Hello,


We can add this case as a ban rule, so that when such a response is received, Crawlera treats it as a ban and retries.

However, we would first need to check the flow of the pages and whether the redirects are valid: does the page need authentication, and does retrying the request give a successful response?
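
Until a ban rule is in place on the Crawlera side, a client-side workaround might look roughly like the sketch below. It is not Crawlera's ban-rule mechanism, just a plain Scrapy downloader middleware that re-issues the original request when a redirect points at the authentication page seen in the log above. The names `BAN_PATH_HINTS`, `MAX_BAN_RETRIES`, `ban_retries`, and `BanRedirectRetryMiddleware` are made up for illustration, and whether the retry actually leaves through a different IP depends on how Crawlera rotates addresses for your account.

```python
from urllib.parse import urlparse

# Assumption: path copied verbatim from the log lines in the question.
BAN_PATH_HINTS = ("/authenticatin/require",)
# Hypothetical cap so a genuinely protected page does not loop forever.
MAX_BAN_RETRIES = 3
REDIRECT_STATUSES = (301, 302, 303, 307)


def looks_like_ban_redirect(status, location):
    """Return True if a redirect response points at a known 'ban' page."""
    if status not in REDIRECT_STATUSES:
        return False
    path = urlparse(location or "").path
    return any(path.startswith(hint) for hint in BAN_PATH_HINTS)


class BanRedirectRetryMiddleware:
    """Sketch of a downloader middleware that retries 'ban' redirects.

    To see the 302 before Scrapy's RedirectMiddleware (order 600) follows
    it, this would need a higher order number in settings.py, e.g.:

        DOWNLOADER_MIDDLEWARES = {
            "myproject.middlewares.BanRedirectRetryMiddleware": 650,
        }

    ("myproject.middlewares" is a placeholder module path.)
    """

    def process_response(self, request, response, spider):
        location = response.headers.get("Location")
        if isinstance(location, bytes):
            # Scrapy stores header values as bytes.
            location = location.decode("latin-1")

        if not looks_like_ban_redirect(response.status, location):
            return response  # normal response, pass it along unchanged

        retries = request.meta.get("ban_retries", 0)
        if retries >= MAX_BAN_RETRIES:
            return response  # give up and let the redirect be handled as usual

        # Returning a Request from process_response reschedules it;
        # dont_filter=True keeps the dupefilter from dropping the retry.
        retry = request.replace(dont_filter=True)
        retry.meta["ban_retries"] = retries + 1
        return retry
```

This only covers the "retry" half of the answer above. It cannot tell whether the page legitimately requires authentication, so if every retry ends in the same redirect, the redirect is probably valid and not a ban.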