
Can't deploy to Scrapy Cloud, problem with custom modules

I am attempting to deploy a spider to Scrapy Cloud, but I am repeatedly running into requirements problems. I am using Python 3.5.

My `scrapinghub.yml` file contains these lines:

projects:
  default: 358310
stacks:
  default: scrapy:1.3-py3
requirements:
  file: requirements.txt

My `requirements.txt` file contains these lines:

geth_doc_miner==1.0
geth_feature_detector==1.0
geth_indexer==1.0
geth_synset==1.0
indexer==1.0
comparse==1.0
file_io==1.0

This is the error I keep getting:

Collecting geth_doc_miner-python==1.0 (from -r /app/requirements.txt (line 1))
  Could not find a version that satisfies the requirement geth_doc_miner-python==1.0 (from -r /app/requirements.txt (line 1)) (from versions: )
No matching distribution found for geth_doc_miner-python==1.0 (from -r /app/requirements.txt (line 1))
You are using pip version 9.0.3, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
{"message": "The command '/bin/sh -c sudo -u nobody -E PYTHONUSERBASE=$PYTHONUSERBASE     pip install --user --no-cache-dir -r /app/requirements.txt' returned a non-zero code: 1", "details": {"message": "The command '/bin/sh -c sudo -u nobody -E PYTHONUSERBASE=$PYTHONUSERBASE     pip install --user --no-cache-dir -r /app/requirements.txt' returned a non-zero code: 1", "code": 1}, "error": "requirements_error"}


Where am I going wrong? By the way, I have the latest version of pip installed locally (contrary to what the error message states).


Those are all custom Python scripts that contain methods and functions that the spider calls. For instance, if the spider comes across a PDF file, it calls a PDF parser from a separate script to parse the file.
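That kind of dispatch can be sketched as follows (a minimal illustration; all names here are hypothetical, not taken from the original project):

```python
# Minimal sketch (all names hypothetical): a spider deciding which
# helper module should handle a downloaded file.
def pick_parser(url):
    """Return the name of the helper that should handle this URL."""
    if url.lower().endswith(".pdf"):
        return "pdf_parser"   # e.g. a function imported from a custom script
    return "html_parser"

print(pick_parser("https://example.com/report.PDF"))  # -> pdf_parser
```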


I read the help files about deploying custom Python scripts to Scrapinghub, but they don't explain how to import one custom script from another. For instance, say I had a custom Python script called module_B which called a method from module_A like this:

from module_A import method_A 

Do I simply import modules like above, or is there anything else I have to do?
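For reference, a plain import like that should be enough as long as both modules end up on the Python path together (shub packages the project directory when deploying, so sibling modules are importable from each other). A self-contained simulation of that situation, with all file names assumed:

```python
# Simulation (all names hypothetical): two sibling modules where
# module_b imports a function from module_a -- the same situation
# as two custom scripts deployed together in one Scrapy project.
import pathlib
import sys
import tempfile

tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "module_a.py").write_text("def method_A():\n    return 'parsed'\n")
(tmp / "module_b.py").write_text(
    "from module_a import method_A\n\n"
    "def run():\n"
    "    return method_A()\n"
)

sys.path.insert(0, str(tmp))  # stand-in for the deployed project directory

import module_b
print(module_b.run())  # -> parsed
```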

I couldn't find any of those packages on PyPI; that's why pip fails when trying to install them in your Scrapy Cloud project.
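Assuming those are local scripts rather than published packages, requirements.txt should list only pip-installable dependencies; local modules can be shipped inside the project itself (next to scrapy.cfg). The shub deployment docs also describe listing private dependencies as prebuilt eggs in scrapinghub.yml, roughly like this (project id kept from the original post; the egg name is hypothetical, so double-check against your shub version's docs):

```yaml
projects:
  default: 358310
stacks:
  default: scrapy:1.3-py3
requirements:
  file: requirements.txt   # PyPI-installable packages only
  eggs:
    - geth_doc_miner.egg   # hypothetical: a locally built egg
```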


Note: I've trimmed your log to the relevant part; please use a code block next time.
