Deploying non-code files

Modified on Wed, 3 Feb, 2021 at 7:20 AM

To deploy non-code files (for example, text or data files) along with your project, declare them in the package_data section of your setup.py file.


For example, if your Scrapy project has the following structure:


myproject/
  __init__.py
  settings.py
  resources/
    cities.txt
scrapy.cfg
setup.py


You would use the following in your setup.py to include the cities.txt file:


from setuptools import setup, find_packages

setup(
    name='myproject',
    version='1.0',
    packages=find_packages(),
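    # Keys are package names; patterns are relative to that package's directory.
    package_data={
        'myproject': ['resources/*.txt']
    },
    # Points Scrapy at the project's settings module.
    entry_points={
        'scrapy': ['settings = myproject.settings']
    },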
    zip_safe=False,
)


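NOTE 1: The zip_safe flag is set to False so that the package is installed as a regular directory rather than as a zipped archive. This may be needed in some cases, for example when code opens resource files by their filesystem path.

NOTE 2: Please ensure that the name of your project doesn't clash with the name of an existing package, such as scrapy. A crawler is a regular Python package, so a name clash may cause conflicts or unwanted side effects when deploying.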


Now you can access the content of the cities.txt file in your spider code like this:


import pkgutil

# get_data() returns the file content as bytes; decode it if you need text.
data = pkgutil.get_data("myproject", "resources/cities.txt")
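
Because pkgutil.get_data returns raw bytes, spider code usually needs to decode and split the content before using it. The following is a minimal sketch of one way to do that. The helper name load_lines and the UTF-8 default are assumptions made for illustration, not part of the original project.

```python
import pkgutil


def load_lines(package, resource, encoding="utf-8"):
    """Return the non-empty, stripped lines of a text resource bundled in a package.

    `load_lines` is a hypothetical helper; the encoding default is an assumption.
    """
    # get_data() returns bytes, or None if the package cannot be located.
    raw = pkgutil.get_data(package, resource)
    if raw is None:
        raise FileNotFoundError(f"{resource!r} not found in package {package!r}")
    # Decode to text, then drop blank lines and surrounding whitespace.
    return [line.strip() for line in raw.decode(encoding).splitlines() if line.strip()]


# Example call inside a spider, assuming the project layout shown earlier:
# cities = load_lines("myproject", "resources/cities.txt")
```

With the example project, the call in the final comment would return the entries of cities.txt as a list of strings, one per non-empty line.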


Note that this code works for the example Scrapy project structure defined at the beginning of the article. If your project has a different structure, you will need to adjust the package_data section and your code accordingly.
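
When adapting package_data to a different layout, it can help to check locally which files a set of patterns would pick up before deploying. The sketch below is one rough way to do that. The helper matched_package_data is hypothetical, and it assumes each package directory sits directly under the project root (no package_dir remapping) and that simple glob matching is close enough to what the packaging tool does.

```python
from pathlib import Path


def matched_package_data(project_root, package_data):
    """Map each package name to the files its package_data patterns match.

    Hypothetical helper: assumes packages live directly under `project_root`.
    """
    matches = {}
    for package, patterns in package_data.items():
        # 'myproject.sub' is expected at <project_root>/myproject/sub
        pkg_dir = Path(project_root, *package.split("."))
        found = set()
        for pattern in patterns:
            for path in pkg_dir.glob(pattern):
                if path.is_file():
                    # Report paths relative to the package directory,
                    # the same form used in package_data and get_data().
                    found.add(path.relative_to(pkg_dir).as_posix())
        matches[package] = sorted(found)
    return matches


# Example, run from the directory that contains setup.py:
# print(matched_package_data(".", {"myproject": ["resources/*.txt"]}))
```

The relative paths it reports are the same strings you would pass as the second argument to pkgutil.get_data for that package.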


For advanced resource access, take a look at the setuptools pkg_resources module or, on newer Python versions, the standard-library importlib.resources module.
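
As an alternative on Python 3.9 or newer, the standard-library importlib.resources module can read the same file. This is a minimal sketch under that version assumption. The helper name read_text_resource is hypothetical, and note that files() imports the named package in order to locate it.

```python
from importlib.resources import files  # files() requires Python 3.9+


def read_text_resource(package, *parts, encoding="utf-8"):
    """Read a text resource from a package, given its path components.

    `read_text_resource` is a hypothetical helper for illustration.
    """
    resource = files(package)  # imports `package` to locate its resources
    for part in parts:
        resource = resource / part  # descend one path component at a time
    return resource.read_text(encoding=encoding)


# Example, assuming the project layout shown earlier:
# text = read_text_resource("myproject", "resources", "cities.txt")
```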
