I use the NLTK package in my spider's pipeline file. However, the NLTK dependency data is not downloaded in Scrapinghub cloud. In local Python we just call nltk.download() to fetch it. Is there any way to download the NLTK data on Scrapinghub? I paste the processing error below.
Traceback (most recent call last):
File "/app/python/lib/python3.6/site-packages/sumy/nlp/tokenizers.py", line 79, in _get_sentence_tokenizer
return nltk.data.load(path)
File "/app/python/lib/python3.6/site-packages/nltk/data.py", line 836, in load
opened_resource = _open(resource_url)
File "/app/python/lib/python3.6/site-packages/nltk/data.py", line 954, in _open
return find(path_, path + ['']).open()
File "/app/python/lib/python3.6/site-packages/nltk/data.py", line 675, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Searched in:
- '/scrapinghub/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/usr/local/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
Juan Idrobo posted over 4 years ago
Hi, I have the same problem; the error output is:
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Searched in:
- '/scrapinghub/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/usr/local/nltk_data'
- '/usr/local/lib/nltk_data'
I am using requirements.txt. Where do I put this command to install this NLTK module?
jwaterschoot posted about 6 years ago
How are you deploying? With a requirements.txt file or did you make your own Docker image containing this data?
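A requirements.txt can only install packages; it cannot run nltk.download(). When deploying that way, one option is to trigger the download from the pipeline itself, so the data is fetched once when the spider opens. A hedged sketch (the pipeline class name and data directory are hypothetical; only nltk.download and nltk.data.path are the library's actual API):

```python
import nltk


class SummaryPipeline:
    """Hypothetical Scrapy pipeline: fetch the NLTK 'punkt' data once
    in open_spider, before any item reaches process_item."""

    # Assumed writable path on the cloud container.
    NLTK_DATA_DIR = "/tmp/nltk_data"

    def open_spider(self, spider):
        # Download the tokenizer data and register its location with NLTK.
        nltk.download("punkt", download_dir=self.NLTK_DATA_DIR, quiet=True)
        nltk.data.path.append(self.NLTK_DATA_DIR)

    def process_item(self, item, spider):
        # Tokenization (e.g. via sumy) can now resolve 'punkt'.
        return item
```

The alternative jwaterschoot mentions, a custom Docker image with the nltk_data directory baked in, avoids the per-job download but requires maintaining your own image.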