I'm using Crawlera's C50 plan to crawl a number of websites. Most of them work fine, but some return a captcha in the response body with a 200 status code. In other words, those sites can tell the request comes from a crawler scraping data, not a real user.
Shouldn't Crawlera be able to handle this on its own?
How are these websites detecting that I am using a crawler to scrape data, when every request goes out through a different proxy with a valid user agent supplied by Crawlera?
As far as I know, websites can detect bots based on:
IP
User Agent
Access Frequency
But since I'm using Crawlera, IP and user agent should not be an issue. As for access frequency, each website is crawled at most once every 10 minutes, so the delay between consecutive hits to any one server should be more than enough.
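Because the captcha pages come back with HTTP 200, my crawler currently counts them as successful fetches. Here is a minimal sketch of the check I use to catch them instead; the marker strings are my own guesses based on the captcha pages I have seen, not an official or exhaustive list:

```python
# Marker strings I have spotted in the captcha pages; these are assumptions,
# not a complete list of what such pages can contain.
CAPTCHA_MARKERS = (
    "funcaptcha",               # FunCaptcha script/iframe references
    "arkoselabs",               # FunCaptcha is served by Arkose Labs
    "verify you are a human",
)

def looks_like_captcha(status_code: int, body: str) -> bool:
    """Return True if a nominally successful (200) response is really a captcha page."""
    if status_code != 200:
        return False
    lowered = body.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)
```

Any response this flags is treated as a failed fetch and queued for retry rather than stored.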
These websites are using FunCaptcha.
I am facing this issue with the following websites:
How am I supposed to bypass this situation?
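One thing I have tried so far, based on my reading of Crawlera's session headers (treat the exact header names and values as my assumption; please correct me if they differ on the C50 plan): when a captcha comes back, discard the current state and ask Crawlera for a brand-new session, which should also mean a fresh outgoing IP on the retry.

```python
def headers_for_retry() -> dict:
    """Build the Crawlera headers I send when retrying after a captcha page.

    Header names are taken from the Crawlera docs as I understand them;
    they are an assumption, not verified behaviour on every plan.
    """
    return {
        # "create" asks Crawlera to open a new session instead of reusing one
        "X-Crawlera-Session": "create",
        # disable cookie handling so stale cookies don't re-flag the client
        "X-Crawlera-Cookies": "disable",
    }
```

I merge these into the request headers on the retry attempt, but some sites still serve the captcha, so I suspect detection is happening through something other than IP, user agent, or frequency.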