My spider has high AUTOTHROTTLE_TARGET_CONCURRENCY and CONCURRENT_REQUESTS settings, so the number of concurrently transferring requests should be high (on my local PC it is indeed very high). But on Scrapinghub, I found that the number of concurrently transferring requests never goes above 4; all other active requests are waiting to be transferred.
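For reference, the relevant part of my settings looks roughly like this (the numbers below are only illustrative, not my exact values):

    # settings.py (illustrative values)
    CONCURRENT_REQUESTS = 64                # global concurrency limit
    CONCURRENT_REQUESTS_PER_DOMAIN = 64     # per-domain concurrency limit
    AUTOTHROTTLE_ENABLED = True
    AUTOTHROTTLE_TARGET_CONCURRENCY = 32.0  # average parallel requests per remote site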
So is there any limit on the maximum number of socket connections for a one-unit container?
Currently I am using the free plan.
THANKS a lot.
thriveni posted about 7 years ago (Admin)
From the logs I can see that AUTOTHROTTLE_ENABLED is set to True. For the concurrent request settings to take full effect, AutoThrottle needs to be disabled.
But do take care not to crawl websites too fast, as that would increase the probability of being blocked by the target site or of your Scrapinghub account being suspended.
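In settings.py that would look something like the sketch below (values are illustrative; choose limits appropriate for the target site):

    # settings.py -- sketch of the suggested change, values are illustrative
    AUTOTHROTTLE_ENABLED = False         # disable AutoThrottle so the fixed limits apply
    CONCURRENT_REQUESTS = 32             # global concurrency limit
    CONCURRENT_REQUESTS_PER_DOMAIN = 16  # per-domain concurrency limit
    DOWNLOAD_DELAY = 0.25                # keep some delay to stay polite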
Regards,
Thriveni Patil
jxltom posted about 7 years ago
Hi thriveni,
Thanks for your reply.
But I have set a high value for AUTOTHROTTLE_TARGET_CONCURRENCY, so even with AUTOTHROTTLE_ENABLED set to True I should still get a high number of concurrent requests, roughly equal to AUTOTHROTTLE_TARGET_CONCURRENCY.
Can you explain why I should disable AutoThrottle rather than simply set a high AUTOTHROTTLE_TARGET_CONCURRENCY to achieve high concurrency?
Or does Scrapinghub treat AutoThrottle specially in a way that does not respect Scrapy's default behaviour?
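For what it is worth, my understanding of how these settings interact is roughly the following (illustrative values, not my exact configuration):

    # AutoThrottle adjusts the download delay dynamically, starting from
    # AUTOTHROTTLE_START_DELAY, and aims for AUTOTHROTTLE_TARGET_CONCURRENCY
    # parallel requests per remote site, while CONCURRENT_REQUESTS and
    # CONCURRENT_REQUESTS_PER_DOMAIN remain hard upper limits.
    AUTOTHROTTLE_ENABLED = True
    AUTOTHROTTLE_START_DELAY = 5.0
    AUTOTHROTTLE_TARGET_CONCURRENCY = 32.0
    CONCURRENT_REQUESTS = 64
    CONCURRENT_REQUESTS_PER_DOMAIN = 64

With settings like these I would expect far more than 4 concurrent requests once AutoThrottle has ramped up.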