I want to ensure everything works using the Starter plan before I purchase the Professional plan.
My settings.py file in my Scrapy project opens a file (user_agent_list.txt) to read a list of user agents, which is then used to populate the USER_AGENT_LIST setting. The file lives in a resources directory that sits in the same directory as settings.py. Here's my code snippet:
from pathlib import Path
from typing import TextIO

# Resolve the resources directory relative to this settings.py file.
user_agent_list_directory: Path = Path(__file__).parent / "resources"
user_agent_list_file: Path = user_agent_list_directory / "user_agent_list.txt"
file: TextIO
with user_agent_list_file.open() as file:
    USER_AGENT_LIST = [user_agent.rstrip('\n') for user_agent in file]
I know I can probably define the USER_AGENT_LIST values directly in my settings.py file, but it would be a little cleaner to read the user agent list from a file.
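For reference, the inline alternative I mean would look roughly like this. It is only a sketch: the first entry is the user agent string I use further down in this post, and the second is a made-up placeholder, not a value from my real file.

```python
# Inline alternative: define the list directly in settings.py
# instead of reading it from resources/user_agent_list.txt.
USER_AGENT_LIST = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    # Placeholder entry for illustration only; replace with a real user agent.
    "example-user-agent/1.0",
]
```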
I get the following error when I run the shub deploy <project number> command. Note, I've changed some strings to remove any identifying information:
Packing version 99be743-master
Deploying to Scrapy Cloud project "######"
Deploy log last 30 lines:
File "/usr/local/lib/python3.8/site-packages/sh_scrapy/crawl.py", line 209, in shub_image_info
_run_usercode(None, ['scrapy', 'shub_image_info'] + sys.argv[1:],
File "/usr/local/lib/python3.8/site-packages/sh_scrapy/crawl.py", line 138, in _run_usercode
settings = populate_settings(apisettings_func(), spider)
File "/usr/local/lib/python3.8/site-packages/sh_scrapy/settings.py", line 243, in populate_settings
return _populate_settings_base(apisettings, _load_default_settings, spider)
File "/usr/local/lib/python3.8/site-packages/sh_scrapy/settings.py", line 172, in _populate_settings_base
settings = get_project_settings().copy()
File "/usr/local/lib/python3.8/site-packages/scrapy/utils/project.py", line 69, in get_project_settings
settings.setmodule(settings_module_path, priority='project')
File "/usr/local/lib/python3.8/site-packages/scrapy/settings/__init__.py", line 287, in setmodule
module = import_module(module)
File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/tmp/unpacked-eggs/__main__.egg/my_project_name/settings.py", line 24, in <module>
with user_agent_list_file.open() as file:
File "/usr/local/lib/python3.8/pathlib.py", line 1213, in open
return io.open(self, mode, buffering, encoding, errors, newline,
File "/usr/local/lib/python3.8/pathlib.py", line 1069, in _opener
return self._accessor.open(self, flags, mode)
NotADirectoryError: [Errno 20] Not a directory: '/tmp/unpacked-eggs/__main__.egg/my_project_name/settings.py/my_project_name/resources/user_agent_list.txt'
{"message": "shub-image-info exit code: 1", "details": null, "error": "image_info_error"}
{"status": "error", "message": "Internal error"}
Deploy log location: /tmp/shub_deploy_xzmb457v.log
Error: Deploy failed: b'{"status": "error", "message": "Internal error"}'
This seems to be the only issue with my deploy, because when I replace the file-reading snippet above with the following line, which sets USER_AGENT instead:
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
The entire deploy works. Please help. Thanks!
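One thing I have not tried yet: the path in the traceback suggests settings.py is being imported from a packaged egg, so reading the file as package data rather than by filesystem path might behave differently. Below is a minimal sketch of that idea. It assumes the resources directory is actually shipped inside the package (for example via package_data in setup.py, which I have not verified). The helper names parse_user_agents and load_user_agents are my own and purely illustrative.

```python
import pkgutil
from typing import List, Optional


def parse_user_agents(raw: Optional[bytes]) -> List[str]:
    """Turn the raw bytes of the user agent file into a list,
    one entry per non-empty line."""
    if raw is None:
        # pkgutil.get_data returns None when the package cannot be
        # located or its loader does not support get_data.
        return []
    lines = raw.decode("utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def load_user_agents(package: str, resource: str) -> List[str]:
    """Read a resource through the package's loader instead of
    building a filesystem path from __file__."""
    return parse_user_agents(pkgutil.get_data(package, resource))


# Hypothetical usage in settings.py (package name taken from the traceback):
# USER_AGENT_LIST = load_user_agents("my_project_name", "resources/user_agent_list.txt")
```

I would expect this to fail with an OSError if the file is not included in the package at all, so it would at least tell me whether the file is missing from the deployed egg.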