
How can I use pia-openvpn Docker Image to incorporate PIA VPN into my Scrapy project?

I came across the Docker image itsdaspecialk/pia-openvpn, which looks like it would let me use my existing Private Internet Access VPN subscription to mask the IP of my Scrapy Cloud instances, since some sites naturally block Scrapinghub/Zyte IPs. I've been reviewing the shub documentation on Deploying custom Docker Images and the Custom Images Contract. However, I'm not well-versed in working with Docker images, so I was hoping someone could provide a step-by-step, instructions-for-dummies walkthrough.


My current scrapinghub.yml for the project deployment looks like this:

project: myprojectid
apikey: myapikey
image: false
stack: scrapy:2.3
requirements:
  file: ./scrapinghub_requirements.txt
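
If I'm reading the shub documentation on custom images correctly, switching to a custom image would mean dropping the `stack` and `requirements` keys (dependencies get installed in the Dockerfile instead) and enabling `image`. A minimal sketch of what I think the file would become, assuming the image is pushed to Scrapy Cloud's default registry rather than one of my own:

```yaml
# Sketch only, untested. Assumes Scrapy Cloud's default image registry.
project: myprojectid
apikey: myapikey
# Replaces `stack` and `requirements`; the Dockerfile now defines the environment.
image: true
```

My understanding is that `image` can also be set to a repository name when a different registry is used, but I haven't tried either form.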

The best Dockerfile I've come up with so far is below, but it's untested, as I've yet to install Docker locally to try building it. It was adapted mainly from two sources: a Zyte support article on deploying custom images (here) and a post (here) on the Private Internet Access (PIA) forums discussing how to configure OpenVPN so that it can connect and authenticate (unprompted) with PIA credentials.

 

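FROM scrapinghub/scrapinghub-stack-scrapy:2.3

# Update and install OpenVPN in a single layer so the package index is never stale.
RUN apt-get update \
    && apt-get upgrade -y \
    && apt-get install -y openvpn \
    && mkdir /pia
COPY ./us_new_york.ovpn /pia/us_new_york.ovpn
COPY ./.secrets /pia/.secrets
# Restrict the credentials file (absolute path, since the working directory is not /pia).
RUN chmod 600 /pia/.secrets
# `openvpn /pia/us_new_york.ovpn` cannot be a RUN step: RUN executes at build time,
# so the tunnel would not exist in the running container. It has to be started
# when the container starts, and OpenVPN typically needs /dev/net/tun and the
# NET_ADMIN capability to do so.

ENV TERM=xterm
ENV SCRAPY_SETTINGS_MODULE=daapy.settings
RUN mkdir -p /app
WORKDIR /app
COPY ./scrapinghub_requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
RUN python setup.py install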
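
For reference, the unprompted login relies on OpenVPN's `auth-user-pass` directive, which accepts a file holding the username on the first line and the password on the second. A minimal sketch of the relevant line in the profile, assuming the `/pia/.secrets` path used in the Dockerfile above; the rest of the PIA-supplied profile is left as downloaded:

```
# /pia/us_new_york.ovpn (excerpt)
# Read credentials from a file instead of prompting on the console.
# The file holds two lines: the PIA username, then the PIA password.
auth-user-pass /pia/.secrets
```

Whether this is enough for the connection to come up inside a Scrapy Cloud container is exactly what I haven't been able to test.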

 
