You need to declare the files in the `package_data` section of your `setup.py` file.
For example, if your Scrapy project has the following structure:

```
myproject/
    __init__.py
    settings.py
    resources/
        cities.txt
scrapy.cfg
setup.py
```
You would use the following in your `setup.py` to include the `cities.txt` file:

```python
from setuptools import setup, find_packages

setup(
    name='myproject',
    version='1.0',
    packages=find_packages(),
    package_data={
        'myproject': ['resources/*.txt']
    },
    entry_points={
        'scrapy': ['settings = myproject.settings']
    },
    zip_safe=False,
)
```
NOTE 1: The `zip_safe` flag is set to `False`, as this may be needed in some deployments, for example when the data files must exist as real files on disk rather than inside a zipped egg.
NOTE 2: Please ensure that the name of your project doesn't overlap with an existing package name, such as scrapy: a crawler is a regular Python package, and a name collision may cause conflicts or unwanted side effects when deploying.
Now you can access the `cities.txt` file content in the spider code like this:

```python
import pkgutil

data = pkgutil.get_data("myproject", "resources/cities.txt")
```
Note that this code works for the example Scrapy project structure defined at the beginning of the article. If your project has a different structure, you will need to adjust the `package_data` section and your code accordingly.
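Keep in mind that `pkgutil.get_data` returns raw bytes (or `None` if the resource cannot be found), so you usually want to decode and split before using the contents. A minimal sketch of that step, using a bytes literal as a stand-in for the real `cities.txt` contents since the example project is not installed here:

```python
# Placeholder for: raw = pkgutil.get_data("myproject", "resources/cities.txt")
raw = b"London\nParis\nTokyo\n"

# Decode the bytes and split into one city name per line.
cities = raw.decode("utf-8").splitlines()
print(cities)  # ['London', 'Paris', 'Tokyo']
```

In a spider you would typically do this once, e.g. in `start_requests`, and build one request per city.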
For advanced resource access, take a look at the setuptools `pkg_resources` module.
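As a sketch of the `pkg_resources` approach: `resource_string` reads a packaged file as bytes, much like `pkgutil.get_data`. The `myproject` call only works once the project is installed, so it is shown commented out; the executed line reads a file from setuptools itself purely to demonstrate the API.

```python
import pkg_resources

# For the example project (requires myproject to be installed):
# data = pkg_resources.resource_string("myproject", "resources/cities.txt")

# Demonstration against a package that is present wherever pkg_resources is:
data = pkg_resources.resource_string("setuptools", "__init__.py")
print(type(data))  # <class 'bytes'>
```

Note that `pkg_resources` also offers `resource_filename` when a tool needs a real filesystem path rather than the file's contents.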