Scrapy Cloud Technical Details

 I'm currently working on a fairly heavy web scraping project, which involves a blind traversal of a few thousand domains in order to find certain downloadable files somewhere within them.  My questions are as follows:

  • I noticed that Smart Proxy Manager is automatically integrated with spiders deployed through Scrapy Cloud.  Are there restrictions on the number of concurrent requests permitted for a given spider?  I would need to sustain at least 25-50 requests per minute.
  • On a similar note, is there a bandwidth limit on requests?
  • I traverse websites using an informed heuristic based on keywords: each link is scored, and I perform a best-first search through the site so as to minimize the work needed to find the files I want.  Is Scrapy capable of traversing sites in this atypical way?  Scrapy spiders seem to have a somewhat rigid structure, and I'm skeptical as to whether they can work the way I want.
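
To make the last point concrete, the traversal order I mean can be sketched roughly as below. This is only an illustration in plain Python, not Scrapy code and not a real implementation: the keyword weights, the `score` function, the `.pdf` target test, and the toy `SITE` link graph are all hypothetical stand-ins.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical keyword weights; real values would depend on the files sought.
KEYWORD_WEIGHTS: Dict[str, float] = {"download": 3.0, "report": 2.0, "pdf": 2.0}


def score(url: str) -> float:
    """Sum the weights of every keyword found in the URL (higher is better)."""
    lowered = url.lower()
    return sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in lowered)


def best_first_search(
    start: str,
    get_links: Callable[[str], Iterable[str]],
    is_target: Callable[[str], bool],
    max_pages: int = 1000,
) -> Tuple[Optional[str], List[str]]:
    """Visit the highest-scoring unvisited URL first.

    Returns the first target found (or None) and the order of visits.
    """
    # heapq is a min-heap, so scores are negated; the counter breaks ties
    # in insertion order.
    frontier: List[Tuple[float, int, str]] = [(-score(start), 0, start)]
    seen = {start}
    counter = 1
    order: List[str] = []
    while frontier and len(order) < max_pages:
        _, _, url = heapq.heappop(frontier)
        order.append(url)
        if is_target(url):
            return url, order
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score(link), counter, link))
                counter += 1
    return None, order


# Toy link graph standing in for a real website (hypothetical URLs).
SITE: Dict[str, List[str]] = {
    "https://example.com/": [
        "https://example.com/about",
        "https://example.com/reports",
        "https://example.com/contact",
    ],
    "https://example.com/about": ["https://example.com/team"],
    "https://example.com/reports": ["https://example.com/reports/download/q1.pdf"],
}

found, visited = best_first_search(
    "https://example.com/",
    get_links=lambda u: SITE.get(u, []),
    is_target=lambda u: u.endswith(".pdf"),
)
print(found)
print(visited)
```

The point of the sketch is that low-scoring pages such as the "about" and "contact" links stay in the queue and are never fetched once a higher-scoring path leads to a file, which is the behaviour I would want to reproduce in a spider.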
If anyone can advise me on these points, I'd appreciate it a lot.

Many thanks.
