Answered

Scrapy Cloud can't call spider?

For some reason, the build is looking specifically in a Python 3.6 directory, when it should be able to use any Python 3.x installation. My spider is written in Python 3.5, so this is an issue. Scrapinghub says that specifying the "scrapy:1.4-py3" stack should work for any Python 3.x setup, but this does not seem to be true.
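(For reference, the stack is chosen in scrapinghub.yml. A minimal sketch, assuming shub's standard config keys, with the project ID taken from the deploy log below:

# scrapinghub.yml
projects:
  default: 205357
stacks:
  default: scrapy:1.4-py3
requirements_file: requirements.txt

Judging from the paths in the traceback below, the scrapy:1.4-py3 stack bundles Python 3.6 specifically rather than searching for an interpreter; code written for 3.5 generally runs unchanged on 3.6.)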

Also, for some reason, it can't seem to find my spider in the project. Is this related to the 3.6 directory issue?

Finally, I have installed everything needed from the requirements file. Below is my deploy output:

C:\Users\martin.ortega\Desktop\Empery Code\YahooScrape>shub deploy
Packing version 1.0
Deploying to Scrapy Cloud project "205357"
Deploy log last 30 lines:

    _run(args, settings)
  File "/usr/local/lib/python3.6/site-packages/sh_scrapy/crawl.py", line 103, in _run
    _run_scrapy(args, settings)
  File "/usr/local/lib/python3.6/site-packages/sh_scrapy/crawl.py", line 111, in _run_scrapy
    execute(settings=settings)
  File "/usr/local/lib/python3.6/site-packages/scrapy/cmdline.py", line 148, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 243, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 134, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 330, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/usr/local/lib/python3.6/site-packages/scrapy/spiderloader.py", line 61, in from_settings
    return cls(settings)
  File "/usr/local/lib/python3.6/site-packages/scrapy/spiderloader.py", line 25, in __init__
    self._load_all_spiders()
  File "/usr/local/lib/python3.6/site-packages/scrapy/spiderloader.py", line 47, in _load_all_spiders
    for module in walk_modules(name):
  File "/usr/local/lib/python3.6/site-packages/scrapy/utils/misc.py", line 63, in walk_modules
    mod = import_module(path)
  File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 948, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'YahooScrape.spiders'
{"message": "list-spiders exit code: 1", "details": null, "error": "build_error"}
{"status": "error", "message": "Internal build error"}

Deploy log location: C:\Users\MARTIN~1.ORT\AppData\Local\Temp\shub_deploy_of5_m4qg.log
Error: Deploy failed: b'{"status": "error", "message": "Internal build error"}'

Best Answer
$ tree
.
├── YahooScrape
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── yahoo.py
│       └── __init__.py
├── requirements.txt
├── scrapinghub.yml
├── scrapy.cfg
└── setup.py

Pay special attention to YahooScrape/spiders/. It should contain an __init__.py file (an empty one is fine) and your different spiders, usually as separate .py files. Otherwise YahooScrape.spiders cannot be imported as a Python package, hence the "ModuleNotFoundError: No module named 'YahooScrape.spiders'" message.
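
That missing file also explains why the project can look fine locally yet fail during the build. shub deploy packages the project into an egg using setup.py; a sketch of the default setup.py that shub generates (the entry point is assumed to match this project's settings module):

# setup.py -- sketch of a shub-generated default
from setuptools import setup, find_packages

setup(
    name='project',
    version='1.0',
    # find_packages() only collects directories that contain an
    # __init__.py, so without one YahooScrape/spiders/ is silently
    # left out of the egg, and the deployed code then fails with
    # ModuleNotFoundError when list-spiders tries to import it.
    packages=find_packages(),
    entry_points={'scrapy': ['settings = YahooScrape.settings']},
)

After adding an empty YahooScrape/spiders/__init__.py, run shub deploy again.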
