You can use Splash with Zyte Smart Proxy Manager (formerly Crawlera) to render JavaScript and proxy all requests issued from Splash. This may be necessary if your crawler uses Splash heavily and the target website throttles or blocks requests from Splash.
How to do it?
You need to send your requests to Splash, and Splash must in turn proxy its own requests through Smart Proxy Manager.
This is best achieved with the Splash /execute endpoint: create a Lua script that tells Splash to use the proxy for every request. Splash provides the splash:on_request callback function for this purpose.
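```lua
function main(splash)
    -- Smart Proxy Manager connection settings; replace <API key> with your key.
    local host = "proxy.zyte.com"
    local port = 8010
    local user = "<API key>"
    local password = ""
    local session_header = "X-Crawlera-Session"
    -- "create" asks Smart Proxy Manager to start a new session.
    local session_id = "create"

    -- Route every request Splash makes (the page and its subresources) through the proxy.
    splash:on_request(function (request)
        request:set_header("X-Crawlera-Profile", "desktop")
        request:set_header(session_header, session_id)
        request:set_proxy{host, port, username=user, password=password}
    end)

    -- Once Smart Proxy Manager returns a session ID, reuse it for later requests.
    splash:on_response_headers(function (response)
        if response.headers[session_header] ~= nil then
            session_id = response.headers[session_header]
        end
    end)

    -- Fail with an error if the page cannot be loaded.
    assert(splash:go(splash.args.url))
    return splash:png()
end
```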
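To make the session handling in the script easier to follow, here is a minimal Python model of the same logic. It is an illustration only and is not part of the Splash setup: SessionTracker and its methods are hypothetical names, and the session ID "1234" in the usage lines is made up.

```python
SESSION_HEADER = "X-Crawlera-Session"


class SessionTracker:
    """Python model of the session logic in the Lua script (illustration only)."""

    def __init__(self):
        # The first request asks Smart Proxy Manager to create a new session.
        self.session_id = "create"

    def request_headers(self):
        # Headers the Lua on_request callback attaches to every request.
        return {
            "X-Crawlera-Profile": "desktop",
            SESSION_HEADER: self.session_id,
        }

    def on_response_headers(self, headers):
        # Mirrors on_response_headers: adopt the session ID once one is returned.
        # headers is a plain dict here, so unlike real HTTP headers the lookup
        # is case-sensitive.
        if headers.get(SESSION_HEADER) is not None:
            self.session_id = headers[SESSION_HEADER]


tracker = SessionTracker()
print(tracker.request_headers()[SESSION_HEADER])  # create
tracker.on_response_headers({SESSION_HEADER: "1234"})  # hypothetical session ID
print(tracker.request_headers()[SESSION_HEADER])  # 1234
```

The point of the model is that every request after the first response carrying the header goes out with the same session ID, so Smart Proxy Manager can keep using the same outgoing IP for the whole page render.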
The script above renders the page as a PNG image, and Splash returns the binary image data in the HTTP response. The /execute endpoint reads the automation script from the lua_source parameter, a string containing the full script.
An example using the Python Requests library, assuming the script above is saved as crawlera-splash.lua:
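Besides lua_source, other fields of the JSON body are made available to the script through splash.args, the same way the script reads splash.args.url. This can be used to avoid hard-coding values such as the API key in the Lua file. The sketch below builds such a body; build_execute_payload and the crawlera_user argument are hypothetical names, and the Lua script would have to read splash.args.crawlera_user instead of the hard-coded user for this to take effect.

```python
def build_execute_payload(lua_source, url, **splash_args):
    """Build the JSON body for a POST to Splash's /execute endpoint.

    lua_source is the full Lua script as a string. url and any extra
    keyword arguments become fields of the JSON body, which the script
    can read through splash.args.
    """
    if not lua_source.strip():
        raise ValueError("lua_source must contain the full Lua script")
    payload = {"lua_source": lua_source, "url": url}
    # Extra arguments (for example a hypothetical crawlera_user) are passed through.
    payload.update(splash_args)
    return payload


script = "function main(splash)\n    return splash.args.url\nend\n"
body = build_execute_payload(script, "https://example.com", crawlera_user="<API key>")
print(sorted(body))  # ['crawlera_user', 'lua_source', 'url']
```

The resulting dictionary can be passed as the json argument of requests.post.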
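```python
import requests

splash_server = 'http://0.0.0.0:8050'
url = "https://twitter.com"

# Read the whole Lua script as a single string.
with open('crawlera-splash.lua') as lua:
    lua_source = lua.read()

splash_url = '{}/execute'.format(splash_server)
r = requests.post(
    splash_url,
    json={
        'lua_source': lua_source,  # the full script, as a string
        'url': url,                # read by the script as splash.args.url
    },
    timeout=100,
)
# Stop if Splash reported an error instead of returning the image.
r.raise_for_status()

# Save the PNG bytes returned by the script.
with open("crawlera-splash.png", "wb") as fp:
    fp.write(r.content)
```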
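If the Lua script fails, Splash typically replies with a JSON error description rather than image data, so writing r.content straight to disk could produce a file with a .png name that is not an image. A minimal sketch of a guard, assuming a PNG result as in the script above; save_png is a hypothetical helper that checks the standard 8-byte PNG file signature before writing.

```python
# Every PNG file starts with these 8 bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png(content, path):
    """Write content to path only if it starts with the PNG signature."""
    if not content.startswith(PNG_SIGNATURE):
        # Show the start of the body, which usually explains the error.
        raise ValueError("response is not a PNG image: %r" % content[:80])
    with open(path, "wb") as fp:
        fp.write(content)
```

It can replace the final file-writing step, for example save_png(r.content, "crawlera-splash.png").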
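Note: the Python script above connects to Splash at 0.0.0.0:8050, which works when Splash runs in a local Docker container with its port published (for example, started with docker run -p 8050:8050 scrapinghub/splash). If Splash runs on another host, change splash_server accordingly.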
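Before sending a script, you may want to check that Splash is reachable at the configured address. Splash exposes a /_ping endpoint that returns a small JSON status document; the sketch below assumes that document contains "status": "ok" when Splash is healthy. splash_is_up is a hypothetical helper, and it uses the standard library's urllib instead of Requests so that it has no dependencies.

```python
import json
import urllib.error
import urllib.request


def splash_is_up(splash_server, timeout=5.0):
    """Return True if Splash answers GET /_ping with {"status": "ok", ...}."""
    ping_url = splash_server.rstrip("/") + "/_ping"
    try:
        with urllib.request.urlopen(ping_url, timeout=timeout) as resp:
            data = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False
    return isinstance(data, dict) and data.get("status") == "ok"
```

For example, splash_is_up("http://0.0.0.0:8050") returns False when nothing is listening on that port, which points at a container that is not running or a port that was not published.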