Specific site using HTTP 404 to circumvent ban detection
xingzhouliu started a topic over 6 years ago
I've recently seen some important sites returning HTTP 404s instead of other status codes to ban IPs. This behavior appears only when I use Crawlera, not with other proxies or IPs using the same headers, randomized intervals, etc., and it circumvents Crawlera's ban detection.
Has anyone else run into this, and are there any possible fixes down the road?
Best Answer
nestor said over 6 years ago
I've added a ban rule to handle these 404 cases, so Crawlera will retry the request with a different IP if it receives this response.
xingzhouliu
About 50-75% of pages need to be retried multiple times before they succeed.
nestor
I've added a ban rule to handle these 404 cases, so Crawlera will retry the request with a different IP if it receives this response.
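For anyone who wants to see the idea behind such a rule, here is a minimal client-side sketch of "treat 404 as a possible ban and retry from another IP". It is only an illustration, not Crawlera's actual implementation: `fetch_with_ban_retry`, `BAN_STATUSES`, `fake_fetch`, and the `ip-N` proxy names are all hypothetical, and `fetch` stands in for whatever HTTP client you use.

```python
from typing import Callable, Iterable, Optional, Tuple

Response = Tuple[int, str]  # (status code, body)

# Assumption: this particular site signals a ban with a 404.
BAN_STATUSES = frozenset({404})


def fetch_with_ban_retry(
    fetch: Callable[[str, str], Response],
    url: str,
    proxies: Iterable[str],
    ban_statuses=BAN_STATUSES,
) -> Optional[Response]:
    """Try each proxy in turn and return the first non-ban response.

    If every proxy returns a ban status, the last response is returned,
    so a genuine 404 still reaches the caller. Returns None only when
    no proxies were supplied.
    """
    last: Optional[Response] = None
    for proxy in proxies:
        last = fetch(url, proxy)
        if last[0] not in ban_statuses:
            return last
        # Looks like a ban: fall through and try the next IP.
    return last


def fake_fetch(url: str, proxy: str) -> Response:
    # Stand-in for a real HTTP request: only "ip-3" is not banned.
    return (200, "ok") if proxy == "ip-3" else (404, "")


print(fetch_with_ban_retry(fake_fetch, "https://example.com/page",
                           ["ip-1", "ip-2", "ip-3"]))  # (200, 'ok')
```

One caveat with this approach: a page that really does not exist will be requested once per proxy before the 404 is accepted, so it is worth limiting the rule to the sites where you have seen this behavior.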