I'm scanning a major retailer that has ~400k SKUs, and I'd like to be able to scrape all of them daily in the future.
Right now, I'm only doing a small subset of that. After the first few days of scraping without errors, I'm now getting a lot of 429s. I'm using 4 Heroku worker dynos that scan 100 URLs in a row.
Is this just an issue of there not being enough proxy IPs? Or is there something I can do to get around these errors?
0 Votes
thriveni posted over 7 years ago Admin Best Answer
Yes, 429 errors are thrown when the parallel connection limit for the plan has been reached. The limit is cumulative across all domains.
Glad to know that you could resolve the issue.
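To make the cumulative limit concrete, here is a rough sketch of the arithmetic. The function names are made up for illustration, and the thread counts are assumptions, since the thread does not say how many threads each dyno ran. The idea is only that every thread on every dyno draws from the same plan-wide pool of connections.

```python
def total_connections(dynos: int, threads_per_dyno: int, per_thread: int = 1) -> int:
    """Worst-case number of simultaneous connections opened across all workers."""
    return dynos * threads_per_dyno * per_thread


def per_thread_budget(plan_limit: int, dynos: int, threads_per_dyno: int) -> int:
    """Largest per-thread concurrency that keeps the total within plan_limit.

    A result of 0 means there are already more threads than the plan allows,
    so some threads would have to be removed or made to wait.
    """
    workers = dynos * threads_per_dyno
    if workers <= 0:
        raise ValueError("need at least one dyno and one thread per dyno")
    return plan_limit // workers


if __name__ == "__main__":
    plan_limit = 10        # C10 plan: 10 concurrent connections
    dynos = 4              # from the question
    threads_per_dyno = 3   # assumption, not stated in the thread

    print(total_connections(dynos, threads_per_dyno))              # 12, over the limit
    print(per_thread_budget(plan_limit, dynos, threads_per_dyno))  # 0
    print(per_thread_budget(plan_limit, dynos, 1))                 # 2
```

With one thread per dyno, 4 dynos could each hold 2 connections and stay under 10. With 3 threads per dyno (again, an assumed number), even one connection per thread would exceed the plan limit.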
1 Votes
4 Comments
Khoa Nguyen posted about 7 years ago
Hi, I am on a C10 plan and I have already set CONCURRENT_REQUESTS = 10, so why am I still getting 429 errors?
0 Votes
thriveni posted over 7 years ago Admin Answer
Yes, 429 errors are thrown when the parallel connection limit for the plan has been reached. The limit is cumulative across all domains.
Glad to know that you could resolve the issue.
1 Votes
atomant posted over 7 years ago
I think I figured it out: each dyno was running multiple threads.
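For anyone hitting the same thing, one possible way to keep a multi-threaded worker under a fixed number of simultaneous requests is a shared semaphore. This is only a sketch: `ConnectionCap`, `crawl`, and the `fetch` callable are hypothetical names, not part of Crawlera or any library, and the cap only applies within one process. Across several dynos, each process would still need its own share of the plan limit.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ConnectionCap:
    """Caps how many calls run at once across all threads in this process."""

    def __init__(self, limit: int) -> None:
        # BoundedSemaphore also raises if it is released more often than acquired.
        self._slots = threading.BoundedSemaphore(limit)

    def run(self, fn, *args, **kwargs):
        # Blocks until a slot is free, then frees it again when fn returns or raises.
        with self._slots:
            return fn(*args, **kwargs)


def crawl(urls, fetch, cap: ConnectionCap, threads: int = 8):
    """Fetch every URL with a thread pool, never exceeding the cap at one time.

    `fetch` is whatever function performs the request (an assumption here);
    results come back in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda url: cap.run(fetch, url), urls))
```

For example, on a C10 plan split over 4 dynos, each process might use `ConnectionCap(2)` so the total stays at 8, regardless of how many threads the pool starts.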
0 Votes
atomant posted over 7 years ago
It looks like the 429 is coming from Crawlera, not the site?
I currently have the C10 plan, which says 10 concurrent connections. I'm using 4 dynos, so I'm not sure why this wouldn't work.