I deployed Splash (in Docker) about a month ago on my dedicated server.
I am trying to scrape [this site](https://www.businesswire.com/portal/site/home/news/subject/?vnsId=31333) with Scrapy Splash, but I get the following error no matter how many times I try that URL:
`([scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.businesswire.com/portal/site/home/news/subject/?vnsId=31333 via http://localhost:8050/render.html> (failed 1 times): User timeout caused connection failure: Getting http://localhost:8050/render.html took longer than 80.0 seconds..)`
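For reference, the request goes through scrapy-splash against the `render.html` endpoint shown in the error. A minimal sketch of the kind of spider I'm running (the spider name and args are placeholders, not my exact code):

```python
import scrapy
from scrapy_splash import SplashRequest


class BusinesswireSpider(scrapy.Spider):
    name = "businesswire_splash"  # placeholder name

    def start_requests(self):
        # Render the page through the local Splash instance (render.html endpoint)
        yield SplashRequest(
            "https://www.businesswire.com/portal/site/home/news/subject/?vnsId=31333",
            callback=self.parse,
            endpoint="render.html",
            args={"wait": 2},  # illustrative arg; the 80s timeout comes from Scrapy settings
        )

    def parse(self, response):
        self.logger.info("Rendered %s (%d bytes)", response.url, len(response.body))
```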
Meanwhile, the same Splash server successfully scrapes every other site I try.
If I cURL or `scrapy.Request` the above URL from my server, it works; the site does not block me no matter how many times I scrape it via cURL or `scrapy.Request`.
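For example, a plain Scrapy spider (no Splash) against the same URL comes back fine every time. Roughly (spider name is a placeholder):

```python
import scrapy


class BusinesswirePlainSpider(scrapy.Spider):
    name = "businesswire_plain"  # placeholder name

    def start_requests(self):
        # Same URL, but fetched directly by Scrapy's downloader instead of Splash
        yield scrapy.Request(
            "https://www.businesswire.com/portal/site/home/news/subject/?vnsId=31333",
            callback=self.parse,
        )

    def parse(self, response):
        # This returns a response consistently, unlike the Splash-rendered request
        self.logger.info("Got %s with status %s", response.url, response.status)
```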
Then I had the idea to see if there are some headers Splash is sending. I debugged Splash's request headers via http://httpbin.org/get and found out that it automatically adds a few headers.
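The debugging itself was just pointing Splash's `render.html` endpoint at httpbin, which echoes back whatever headers it receives. Something like this (a sketch using the `requests` library; not my exact call):

```python
import requests

# Ask the local Splash instance to render httpbin.org/get;
# httpbin echoes the request headers it received, so the rendered
# page shows the headers Splash adds on its own.
resp = requests.get(
    "http://localhost:8050/render.html",
    params={"url": "http://httpbin.org/get"},
)
print(resp.text)
```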
So now I know that Splash is sending `"Host": "businesswire.com"` to the target site, and that seems to be what makes the website refuse to be scraped.
Question is, how do I make Splash not send any headers automatically? Or at least, how do I stop Splash from sending the `Host` header?