How to prevent Splash from sending its default headers, e.g. 'Host'?

I deployed Splash (in Docker) about a month ago on my dedicated server.


I am trying to scrape this site with Scrapy Splash, but I get the following error no matter how many times I try that URL:


`([scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET via http://localhost:8050/render.html> (failed 1 times): User timeout caused connection failure: Getting http://localhost:8050/render.html took longer than 80.0 seconds..)`


Meanwhile, the same Splash server successfully scrapes every other site I try.


If I fetch the above URL from my server with cURL or `scrapy.Request`, it works; the site does not block me no matter how many times I request it that way.


That made me wonder whether Splash sends some headers of its own. I inspected the request headers Splash sends and found that it automatically adds a few headers.
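
For anyone who wants to reproduce the header check, here is a minimal sketch of a header-echo endpoint using only the Python standard library. It is not the tool I used; the names `HeaderEchoHandler`, `start_echo_server` and `fetch_echoed_headers` are made up for this illustration. The idea is to point Splash at this server instead of the real site and read back exactly which headers arrive.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HeaderEchoHandler(BaseHTTPRequestHandler):
    """Reply to any GET with the received request headers as JSON."""

    def do_GET(self):
        # self.headers holds the headers exactly as the client sent them.
        body = json.dumps(dict(self.headers.items()), indent=2).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress the default per-request logging to stderr.
        pass


def start_echo_server(host="127.0.0.1", port=0):
    """Start the echo server in a background thread (port 0 = any free port)."""
    server = HTTPServer((host, port), HeaderEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_echoed_headers(url):
    """Fetch the echo endpoint with urllib and return the headers it saw."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To compare clients, start the server, request it once directly (for example with `fetch_echoed_headers`) and once through Splash, then diff the two outputs. Since Splash runs in Docker here, the server would probably need to bind to `0.0.0.0` and be addressed by an IP the container can reach, because `127.0.0.1` inside the container is the container itself.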


So now I know that Splash is sending `"Host": ""` to the target site, which prevents that website from being scraped.


The question is: how do I make Splash not send any headers automatically? Or at least stop Splash from sending the `Host` header?
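
For context, the closest thing I have found so far is the `headers` argument that Splash's HTTP API accepts on JSON POST requests to `render.html`. The sketch below only builds such a request so the payload is easy to inspect. `build_render_request` is a hypothetical helper, the Splash address is taken from the log above, and I have not confirmed that this argument can remove or override `Host`; it only sets headers explicitly.

```python
import json
import urllib.request

# Splash endpoint as it appears in the retry log; adjust for your deployment.
SPLASH_RENDER_URL = "http://localhost:8050/render.html"


def build_render_request(target_url, headers, timeout=60):
    """Build a JSON POST to render.html that sets request headers explicitly.

    `headers` is a dict of header names to values that Splash should use
    for the outgoing request to `target_url`.
    """
    payload = {"url": target_url, "headers": headers, "timeout": timeout}
    return urllib.request.Request(
        SPLASH_RENDER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the result with `urllib.request.urlopen(req)` against a header-echo endpoint would show whether the explicit headers replace the automatic ones or are merely added to them.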

