See shub documentation for the custom Docker image deployment instructions.
Chrome
Dockerfile Example for Chrome
Replace <PROJECT_NAME> in ENV SCRAPY_SETTINGS_MODULE instruction with the actual name of the Scrapy project being deployed.
Spider Code Example for Chrome
Firefox
Dockerfile Example for Firefox
Alternatively, for using Firefox with Python 3:
FROM scrapinghub/scrapinghub-stack-scrapy:1.8-py3
RUN apt-get update
RUN apt-get install -y unzip procps
RUN printf "deb [trusted=yes] http://archive.debian.org/debian/ jessie main\ndeb-src [trusted=yes] http://archive.debian.org/debian/ jessie main\ndeb [trusted=yes] http://security.debian.org jessie/updates main\ndeb-src [trusted=yes] http://security.debian.org jessie/updates main" > /etc/apt/sources.list
#============================================
# Firefox and Geckodriver
#============================================
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
ca-certificates curl firefox-esr \
&& rm -fr /var/lib/apt/lists/* \
&& curl -L https://github.com/mozilla/geckodriver/releases/download/v0.29.1/geckodriver-v0.29.1-linux64.tar.gz | tar xz -C /usr/local/bin \
&& apt-get purge -y ca-certificates curl ENV TERM xterm ENV SCRAPY_SETTINGS_MODULE <PROJECT_NAME>.settings RUN mkdir -p /app WORKDIR /app COPY ./requirements.txt /app/requirements.txt RUN pip install --no-cache-dir -r requirements.txt COPY . /app RUN python setup.py install
Replace <PROJECT_NAME> in ENV SCRAPY_SETTINGS_MODULE instruction with the actual name of the Scrapy project being deployed.
Spider Code Example for Firefox
Quick Start Steps
- For a jump start, download https://github.com/scrapinghub/sample-projects/tree/master/sc_custom_image.
- Create a Dockerfile in sc_custom_image root folder (where scrapy.cfg is), copy/paste the content of either Dockerfile example above, and replace <PROJECT_NAME> with sc_custom_image.
- Update scrapinghub.yml with the numerical ID of the Scrapy Cloud project that will contain the spider being deployed.
- Create setup.py file, taking this one as an example: https://github.com/scrapinghub/sample-projects/blob/master/splash_crawlera_example/setup.py (omit package_data declaration line).
- Replace demo.py located at sc_custom_image/sc_custom_image/spiders with the content of either spider code example above.
- Run shub deploy.
Was this article helpful?
That’s Great!
Thank you for your feedback
Sorry! We couldn't be helpful
Thank you for your feedback
Feedback sent
We appreciate your effort and will try to fix the article