You need to declare the files in the package_data section of your setup.py file.
For example, if your Scrapy project has the following structure:
myproject/
    __init__.py
    settings.py
    resources/
        cities.txt
scrapy.cfg
setup.py
You would use the following in your setup.py to include the cities.txt file:
from setuptools import setup, find_packages

setup(
    name='myproject',
    version='1.0',
    packages=find_packages(),
    package_data={
        'myproject': ['resources/*.txt']
    },
    entry_points={
        'scrapy': ['settings = myproject.settings']
    },
    zip_safe=False,
)
NOTE 1: The zip_safe flag is set to False so that the package is installed unzipped; reading data files out of a zipped egg can fail in some deployment environments.
NOTE 2: Please ensure that your project's name doesn't clash with an existing package name such as scrapy. A crawler is a regular Python package, and a name collision can cause conflicts or unwanted side effects when deploying.
Now you can access the cities.txt file content in the spider code like this:
import pkgutil

data = pkgutil.get_data("myproject", "resources/cities.txt")
Note that this code works for the example Scrapy project structure defined at the beginning of the article. If your project has a different structure, you will need to adjust the package_data section and your code accordingly.
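As a sketch of how you might use the loaded bytes in practice, the helper below decodes the data and keeps one city name per non-empty line. The function and parameter names (load_cities, parse_cities) are illustrative, not part of the article's project:

```python
import pkgutil

def parse_cities(data):
    # Decode the raw bytes and keep one stripped, non-empty line per city.
    return [line.strip() for line in data.decode("utf-8").splitlines() if line.strip()]

def load_cities(package="myproject", resource="resources/cities.txt"):
    # pkgutil.get_data() returns the file contents as bytes,
    # or None if the loader cannot find the resource.
    data = pkgutil.get_data(package, resource)
    return parse_cities(data) if data is not None else []
```

A spider can then call load_cities() once (for example in start_requests) and build one request per city.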
For more advanced resource access, take a look at the setuptools pkg_resources module.
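For instance, pkg_resources.resource_string() reads a file inside any importable package and returns its raw bytes. The wrapper name below and the "myproject" / "resources/cities.txt" arguments are just the names from this article's example; substitute your own package and path:

```python
import pkg_resources

def read_packaged_text(package, path):
    # resource_string() returns the resource's contents as bytes;
    # decode them to get text.
    return pkg_resources.resource_string(package, path).decode("utf-8")

# e.g. cities = read_packaged_text("myproject", "resources/cities.txt").splitlines()
```

Unlike pkgutil, pkg_resources also offers resource_filename() and resource_stream() when you need a real file path or a file-like object.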