I'm scanning a major retailer that has ~400k SKUs and would like to be able to scrape all of these daily in the future.
Right now, I'm doing a small subset of that. After the first few days of scraping without errors, I'm now getting a lot of 429s. I'm using 4 Heroku worker dynos that scan 100 URLs in a row.
Is this just an issue of not having enough proxy IPs, or is there something I can do to get around these errors?
Best Answer
thriveni said about 6 years ago
Yes, 429 errors are thrown when the parallel connection limit for the plan has been reached, and the limit is cumulative across all domains.
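For reference, here is a minimal Scrapy settings sketch that keeps a single crawl process within a 10-connection plan. It assumes the scrapy-crawlera downloader middleware; the API key is a placeholder and the values are illustrative, not taken from this thread.

```python
# settings.py -- illustrative values for a C10 (10 concurrent connections) plan,
# assuming the scrapy-crawlera downloader middleware is installed.

CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = "<your Crawlera API key>"  # placeholder

DOWNLOADER_MIDDLEWARES = {
    "scrapy_crawlera.CrawleraMiddleware": 610,
}

# The plan limit is cumulative across every process and every domain you crawl.
# With a single Scrapy process, cap its in-flight requests at the plan ceiling;
# with several processes (e.g. multiple dynos), divide the ceiling between them.
CONCURRENT_REQUESTS = 10
CONCURRENT_REQUESTS_PER_DOMAIN = 10

# Crawlera handles throttling on its side, so Scrapy's own delays are usually
# turned off and the download timeout is raised to tolerate slower proxied
# responses.
AUTOTHROTTLE_ENABLED = False
DOWNLOAD_DELAY = 0
DOWNLOAD_TIMEOUT = 600
```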
atomant
It looks like the 429 is coming from Crawlera, not the site?
I currently have the C10 plan, which says 10 concurrent connections. I'm using 4 dynos, so I'm not sure why this wouldn't work.
atomant
I think I figured it out: each dyno had multiple threads.
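That diagnosis fits the arithmetic: a 10-connection plan shared by 4 dynos leaves roughly 2 simultaneous connections per dyno, so extra threads on each dyno push the total over the cap. Below is a minimal sketch of one way to keep each worker inside its share, assuming plain requests calls through the standard Crawlera proxy endpoint; the API key and limits are placeholders.

```python
import threading
import requests

PLAN_LIMIT = 10      # C10 plan: 10 concurrent connections in total
DYNO_COUNT = 4       # worker processes sharing that total
PER_DYNO_LIMIT = max(1, PLAN_LIMIT // DYNO_COUNT)  # 2 slots per dyno here

# Placeholder proxy configuration; substitute a real API key.
PROXIES = {
    "http": "http://<APIKEY>:@proxy.crawlera.com:8010",
    "https": "http://<APIKEY>:@proxy.crawlera.com:8010",
}

# A bounded semaphore caps this process's in-flight requests regardless of how
# many threads the worker spawns.
_slots = threading.BoundedSemaphore(PER_DYNO_LIMIT)

def fetch(url):
    # Blocks until one of this dyno's connection slots is free.
    with _slots:
        return requests.get(url, proxies=PROXIES, timeout=60)
```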
thriveni
Yes, 429 errors are thrown when the parallel connection limit for the plan has been reached, and the limit is cumulative across all domains.
Glad to know that you could resolve the issue.
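Because the limit is shared across every domain being crawled, occasional 429s can still appear when workers briefly overlap at the cap. A common client-side mitigation, sketched here generically rather than as anything Crawlera-specific, is to back off and retry when a 429 comes back (the helper name and delays are hypothetical):

```python
import time
import requests

def get_with_backoff(url, proxies, max_retries=5, base_delay=2.0):
    """Retry a GET with exponential backoff whenever the response is a 429."""
    response = None
    for attempt in range(max_retries):
        response = requests.get(url, proxies=proxies, timeout=60)
        if response.status_code != 429:
            return response
        # Too many parallel connections at the moment: wait, then try again.
        time.sleep(base_delay * (2 ** attempt))
    return response  # still 429 after max_retries; let the caller decide
```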
Khoa Nguyen
Hi,
I am on a C10 plan and I already specified CONCURRENT_REQUESTS = 10, so why am I still getting 429 errors?
Related topics:
- Crawlera 503 Ban
- Amazon scraping speed
- Website redirects
- Bing
- Subscribed to Crawlera but saying Not Subscribed
- Selenium with c#
- Using Crawlera with browsermob
- CRAWLERA_PRESERVE_DELAY leads to error
- How to connect Selenium PhantomJS to Crawlera?