Here's how far down the rabbit hole I've been so far:
The best I've come up with is below, but it's untested, as I've yet to install Docker locally to try building it. It was adapted mainly from two sources: a Zyte support article on deploying custom images (here) and a post (here) on the Private Internet Access (PIA) forums discussing how to configure OpenVPN so that it can connect and authenticate (unprompted) with PIA credentials.
FROM scrapinghub/scrapinghub-stack-scrapy:2.3
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y openvpn \
 && mkdir /pia
COPY ./us_new_york.ovpn /pia/us_new_york.ovpn
COPY ./.secrets /pia/.secrets
RUN chmod 600 .secrets \
 && openvpn /pia/us_new_york.ovpn
ENV TERM xterm
ENV SCRAPY_SETTINGS_MODULE daapy.settings
RUN mkdir -p /app
WORKDIR /app
COPY ./scrapinghub_requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
RUN python setup.py install
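One thing I suspect about the Dockerfile above: a RUN instruction executes while the image is being built, so the openvpn call there would not leave a VPN running when the container later starts. Below is a minimal sketch of starting the client at runtime from Python instead. The helper names `build_openvpn_command` and `start_vpn` are my own (hypothetical), and I'm assuming `/pia/.secrets` holds the username on the first line and the password on the second, which is the format `--auth-user-pass` reads. Whether the Scrapy Cloud container is actually allowed to bring up a tunnel is something I haven't been able to check.

```python
import subprocess
from pathlib import Path


def build_openvpn_command(config_path, auth_path):
    """Return the argv list for an unattended OpenVPN client start.

    --config          points at the .ovpn profile
    --auth-user-pass  reads username/password from a file (no prompt)
    --daemon          detaches once the client has initialised
    """
    return [
        "openvpn",
        "--config", str(config_path),
        "--auth-user-pass", str(auth_path),
        "--daemon",
    ]


def start_vpn(config_path="/pia/us_new_york.ovpn", auth_path="/pia/.secrets"):
    """Check that both files exist, then launch openvpn in the background.

    The default paths mirror the COPY destinations in the Dockerfile;
    they are assumptions, not something the Zyte docs prescribe.
    """
    for path in (config_path, auth_path):
        if not Path(path).is_file():
            raise FileNotFoundError(path)
    # check=True raises CalledProcessError if openvpn exits non-zero.
    subprocess.run(build_openvpn_command(config_path, auth_path), check=True)
```

The idea would be to call `start_vpn()` from whatever runs at container start, before the spider makes its first request, rather than from a build step.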
Michael Hill
I came across this Docker image, itsdaspecialk/pia-openvpn, that looks like it will allow me to use my existing Private Internet Access VPN subscription to mask the IP of my Scrapy Cloud instances, since some sites naturally block Scrapinghub/Zyte's IPs. I've been reviewing the shub documentation on Deploying Custom Docker Images and the Custom Images Contract. However, I'm not well-versed in working with Docker images, so I was hoping that someone could provide some sort of instructions-for-dummies walkthrough.
My current scrapinghub.yml for the project deployment looks like this: