My spider sets high values for AUTOTHROTTLE_TARGET_CONCURRENCY and CONCURRENT_REQUESTS, so the number of requests being transferred concurrently should be high (on my local PC it is indeed very high). On Scrapinghub, however, I found that the number of concurrently transferring requests never exceeds 4. All other active requests are waiting to be transferred.
So is there a limit on the maximum number of socket connections for a one-unit container?
Currently I am using the free plan.
THANKS a lot.
2 Comments
jxltom posted almost 7 years ago
Hi thriveni,
Thanks for your reply.
But I have set a high value for AUTOTHROTTLE_TARGET_CONCURRENCY, so even with AUTOTHROTTLE_ENABLED turned on I should still get a high number of concurrent requests, equal to the value of AUTOTHROTTLE_TARGET_CONCURRENCY.
Can you explain why I should disable autothrottle rather than set a high AUTOTHROTTLE_TARGET_CONCURRENCY to achieve concurrent requests?
Or does Scrapinghub treat autothrottle in some special way that does not respect Scrapy's default behaviour?
THANKS.
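For reference, the delay adjustment described in Scrapy's AutoThrottle documentation can be sketched roughly as below. This is a simplified illustration, not Scrapy's actual code and not anything specific to Scrapinghub: the function names `next_delay` and `simulate` are hypothetical, the latency of 0.5 s and target concurrency of 16 are assumed values, and the special handling of non-200 responses is left out.

```python
def next_delay(current_delay, latency, target_concurrency,
               min_delay=0.0, max_delay=60.0):
    """Return the next download delay after one response (simplified sketch)."""
    # Delay that would give `target_concurrency` parallel requests on average.
    target_delay = latency / target_concurrency
    # Move halfway from the current delay towards the target delay.
    new_delay = (current_delay + target_delay) / 2.0
    # Never drop below the target delay in a single step.
    new_delay = max(target_delay, new_delay)
    # Clamp to the configured bounds (DOWNLOAD_DELAY and AUTOTHROTTLE_MAX_DELAY).
    return min(max(min_delay, new_delay), max_delay)


def simulate(start_delay=5.0, latency=0.5, target_concurrency=16.0, responses=10):
    """Apply next_delay once per response, starting from AUTOTHROTTLE_START_DELAY."""
    delay = start_delay
    history = []
    for _ in range(responses):
        delay = next_delay(delay, latency, target_concurrency)
        history.append(delay)
    return history


if __name__ == "__main__":
    for i, delay in enumerate(simulate(), start=1):
        print(f"after response {i}: delay = {delay:.4f} s")
```

Under these assumptions the delay starts at the start delay and only approaches `latency / target_concurrency` gradually, one response at a time, so a high AUTOTHROTTLE_TARGET_CONCURRENCY would not necessarily produce high concurrency right away. Limits such as CONCURRENT_REQUESTS_PER_DOMAIN would still apply on top of this.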
thriveni posted almost 7 years ago Admin
From the logs I can see that AUTOTHROTTLE_ENABLED is set to True. For the configured concurrent requests to be used, autothrottle needs to be disabled.
But take care not to crawl websites too fast, as that increases the probability of being blocked by the target site or of your Scrapinghub account being suspended.
Regards,
Thriveni Patil
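A minimal sketch of the settings change suggested above, assuming the values are set in the project's `settings.py`. The numbers are illustrative assumptions, not recommendations from this thread, and the small DOWNLOAD_DELAY is only there to reflect the warning about crawling too fast.

```python
# settings.py (illustrative values only)

# Turn AutoThrottle off so the fixed concurrency settings below are used.
AUTOTHROTTLE_ENABLED = False

# Global cap on requests handled concurrently by the downloader.
CONCURRENT_REQUESTS = 32

# Per-domain cap; Scrapy's default is 8, so raising only CONCURRENT_REQUESTS
# may not help when all requests go to a single site.
CONCURRENT_REQUESTS_PER_DOMAIN = 16

# Assumed small fixed delay (seconds) to avoid hammering the target site.
DOWNLOAD_DELAY = 0.25
```

Whether a container on a given plan imposes any further connection limit is not answered in this thread, so these settings may not be the whole story.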