Hi all,

Do you know how to add pyquery to my scraper when deploying to the cloud? Please see my error log below:
Login succeeded
Building an image:
Step 1/12 : FROM scrapinghub/scrapinghub-stack-scrapy:1.8-py3
# Executing 5 build trigger ---> Using cache
---> Using cache
---> Using cache
---> Using cache
---> Using cache
---> 2683923a2567
Step 2/12 : ENV PYTHONUSERBASE=/app/python
---> Using cache
---> 661d895651d3
Step 3/12 : ADD eggbased-entrypoint /usr/local/sbin/
---> Using cache
---> a3f30f89c482
Step 4/12 : ADD run-pipcheck /usr/local/bin/
---> Using cache
---> 30b672976273
Step 5/12 : RUN chmod +x /usr/local/bin/run-pipcheck
---> Using cache
---> 4a25dd48718a
Step 6/12 : RUN chmod +x /usr/local/sbin/eggbased-entrypoint && ln -sf /usr/local/sbin/eggbased-entrypoint /usr/local/sbin/start-crawl && ln -sf /usr/local/sbin/eggbased-entrypoint /usr/local/sbin/scrapy-list && ln -sf /usr/local/sbin/eggbased-entrypoint /usr/local/sbin/shub-image-info && ln -sf /usr/local/sbin/eggbased-entrypoint /usr/local/sbin/run-pipcheck
---> Using cache
---> de378e69d7ac
Step 7/12 : ADD requirements.txt /app/requirements.txt
---> 9a8ff3134203
Step 8/12 : RUN mkdir /app/python && chown nobody:nogroup /app/python
---> Running in 3feb1775d517
Removing intermediate container 3feb1775d517
---> c0c7faf25571
Step 9/12 : RUN sudo -u nobody -E PYTHONUSERBASE=$PYTHONUSERBASE -E PIP_NO_CACHE_DIR=0 pip install --user --no-cache-dir -r /app/requirements.txt
---> Running in 6579d51f87c2
WARNING: You are using pip version 19.3.1; however, version 20.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Removing intermediate container 6579d51f87c2
---> d7a0c5607ed9
Step 10/12 : COPY *.egg /app/
---> 4b9d5a800db5
Step 11/12 : RUN if [ -d "/app/addons_eggs" ]; then rm -f /app/*.dash-addon.egg; fi
---> Running in a87887cf5a01
Removing intermediate container a87887cf5a01
---> e4802252a6a3
Step 12/12 : ENV PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
---> Running in 7cf281b783b5
Removing intermediate container 7cf281b783b5
---> 51d81f351328
Successfully built 51d81f351328
Successfully tagged i.scrapinghub.com/kumo_project/447128:10
Step 1/3 : FROM alpine:3.5
---> f80194ae2e0c
Step 2/3 : ADD kumo-entrypoint /kumo-entrypoint
---> Using cache
---> e21044a6c922
Step 3/3 : RUN chmod +x /kumo-entrypoint
---> Using cache
---> 5704fbd802f9
Successfully built 5704fbd802f9
Successfully tagged kumo-entrypoint:latest
Entrypoint container is created successfully
>>> Checking python dependencies
Requirement already up-to-date: pip<20.0,>=9.0.3 in /usr/local/lib/python3.8/site-packages (19.3.1)
No broken requirements found.
>>> Getting spiders list:
>>> Trying to get spiders from shub-image-info command
WARNING: There're some errors on shub-image-info call:
ERROR:root:Job runtime exception
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/sh_scrapy/crawl.py", line 148, in _run_usercode
_run(args, settings)
File "/usr/local/lib/python3.8/site-packages/sh_scrapy/crawl.py", line 103, in _run
_run_scrapy(args, settings)
File "/usr/local/lib/python3.8/site-packages/sh_scrapy/crawl.py", line 111, in _run_scrapy
execute(settings=settings)
File "/usr/local/lib/python3.8/site-packages/scrapy/cmdline.py", line 145, in execute
cmd.crawler_process = CrawlerProcess(settings)
File "/usr/local/lib/python3.8/site-packages/scrapy/crawler.py", line 267, in __init__
super(CrawlerProcess, self).__init__(settings)
File "/usr/local/lib/python3.8/site-packages/scrapy/crawler.py", line 145, in __init__
self.spider_loader = _get_spider_loader(settings)
File "/usr/local/lib/python3.8/site-packages/scrapy/crawler.py", line 347, in _get_spider_loader
return loader_cls.from_settings(settings.frozencopy())
File "/usr/local/lib/python3.8/site-packages/scrapy/spiderloader.py", line 61, in from_settings
return cls(settings)
File "/usr/local/lib/python3.8/site-packages/scrapy/spiderloader.py", line 25, in __init__
self._load_all_spiders()
File "/usr/local/lib/python3.8/site-packages/scrapy/spiderloader.py", line 47, in _load_all_spiders
for module in walk_modules(name):
File "/usr/local/lib/python3.8/site-packages/scrapy/utils/misc.py", line 73, in walk_modules
submod = import_module(fullpath)
File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/app/__main__.egg/centris/spiders/fscr.py", line 3, in <module>
from pyquery import PyQuery
ModuleNotFoundError: No module named 'pyquery'
Traceback (most recent call last):
File "/usr/local/bin/shub-image-info", line 8, in <module>
sys.exit(shub_image_info())
File "/usr/local/lib/python3.8/site-packages/sh_scrapy/crawl.py", line 209, in shub_image_info
_run_usercode(None, ['scrapy', 'shub_image_info'] + sys.argv[1:],
File "/usr/local/lib/python3.8/site-packages/sh_scrapy/crawl.py", line 148, in _run_usercode
_run(args, settings)
File "/usr/local/lib/python3.8/site-packages/sh_scrapy/crawl.py", line 103, in _run
_run_scrapy(args, settings)
File "/usr/local/lib/python3.8/site-packages/sh_scrapy/crawl.py", line 111, in _run_scrapy
execute(settings=settings)
File "/usr/local/lib/python3.8/site-packages/scrapy/cmdline.py", line 145, in execute
cmd.crawler_process = CrawlerProcess(settings)
File "/usr/local/lib/python3.8/site-packages/scrapy/crawler.py", line 267, in __init__
super(CrawlerProcess, self).__init__(settings)
File "/usr/local/lib/python3.8/site-packages/scrapy/crawler.py", line 145, in __init__
self.spider_loader = _get_spider_loader(settings)
File "/usr/local/lib/python3.8/site-packages/scrapy/crawler.py", line 347, in _get_spider_loader
return loader_cls.from_settings(settings.frozencopy())
File "/usr/local/lib/python3.8/site-packages/scrapy/spiderloader.py", line 61, in from_settings
return cls(settings)
File "/usr/local/lib/python3.8/site-packages/scrapy/spiderloader.py", line 25, in __init__
self._load_all_spiders()
File "/usr/local/lib/python3.8/site-packages/scrapy/spiderloader.py", line 47, in _load_all_spiders
for module in walk_modules(name):
File "/usr/local/lib/python3.8/site-packages/scrapy/utils/misc.py", line 73, in walk_modules
submod = import_module(fullpath)
File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/app/__main__.egg/centris/spiders/fscr.py", line 3, in <module>
ModuleNotFoundError: No module named 'pyquery'
{"message": "shub-image-info exit code: 1", "details": null, "error": "image_info_error"}