This is my spider code:
import re

import scrapy


class TwitterSpiderSpider(scrapy.Spider):
    name = 'twitter_spider'
    allowed_domains = ['twitter.com']
    start_urls = ['https://twitter.com']

    def start_requests(self):
        url = 'https://twitter.com/account/begin_password_reset?account_identifier=starksaya'
        yield scrapy.Request(url, callback=self.parse, dont_filter=True)

    def parse(self, response):
        token = response.xpath("//input[@type='hidden']/@value").extract_first()
        print(token)
        print("&" * 100)
        re_name = re.match(r".*account_identifier=(.*)", response.url)
        if re_name:
            name = re_name.group(1)
            post_data = {
                "authenticity_token": token,
                "account_identifier": name,
            }
            yield scrapy.FormRequest(
                "https://twitter.com/account/begin_password_reset",
                formdata=post_data,
                callback=self.parse_detail,
                dont_filter=True,
            )

    def parse_detail(self, response):
        print(response.text)
Settings file:

DOWNLOAD_DELAY = 5
COOKIES_ENABLED = True
DOWNLOADER_MIDDLEWARES = {'scrapy_crawlera.CrawleraMiddleware': 300}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = 'ad9defxxxxxxxxxxxx'
Start the crawler:
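I run it with the standard Scrapy command (assuming the spider lives in a normal Scrapy project that uses the settings above):

    scrapy crawl twitter_spider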
No request ever completes; every request just hangs until the download timeout is reached.
But with other proxy services I can crawl normally.
Where is my configuration wrong?
This problem has plagued me for several days, and I hope someone can help me solve it.
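In case it helps, here is a minimal sketch of how the Crawlera endpoint could be tested outside Scrapy with requests. The proxy.crawlera.com:8010 host/port and the "APIKEY:" proxy-auth form are taken from Crawlera's documentation (not verified against this account), and verify=False is only there because Crawlera serves its own certificate for HTTPS responses:

    # Sanity check: does the Crawlera API key answer at all, independent of Scrapy?
    import requests

    API_KEY = "ad9defxxxxxxxxxxxx"  # same key as in settings.py
    proxy = "http://{}:@proxy.crawlera.com:8010".format(API_KEY)
    proxies = {"http": proxy, "https": proxy}

    resp = requests.get(
        "https://twitter.com/account/begin_password_reset?account_identifier=starksaya",
        proxies=proxies,
        verify=False,   # Crawlera re-signs HTTPS responses with its own certificate
        timeout=60,
    )
    print(resp.status_code)

If this call also hangs, the problem is on the Crawlera side rather than in the Scrapy configuration.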
Remove the use-https header (X-Crawlera-Use-HTTPS); that header is deprecated.
Environment: Python 3.6.3, requests 2.20
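Following that suggestion, here is a hedged sketch of what the settings could look like. The 610 middleware priority, the DOWNLOAD_TIMEOUT increase, the DOWNLOAD_DELAY of 0, and disabling AutoThrottle are recommendations from the scrapy-crawlera documentation, not a verified fix for this particular account:

    # settings.py - sketch based on the scrapy-crawlera docs
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_crawlera.CrawleraMiddleware': 610,  # priority used in the docs' example
    }
    CRAWLERA_ENABLED = True
    CRAWLERA_APIKEY = 'ad9defxxxxxxxxxxxx'

    # Crawlera throttles requests itself, so the docs suggest removing the
    # spider-side delay, disabling AutoThrottle, and allowing slower responses.
    DOWNLOAD_DELAY = 0
    AUTOTHROTTLE_ENABLED = False
    DOWNLOAD_TIMEOUT = 600

    COOKIES_ENABLED = True  # kept from the original settings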