
Cannot deploy spider using urllib

Hi, I am trying to deploy a spider that uses the urllib Python library (Python 2.7). I referenced the requirements.txt file in scrapinghub.yml and added "urllib" to requirements.txt, but I am getting the error below when I deploy.



Configs-MacBook-Pro:eric_spider bpecheux$ shub deploy 375014
Packing version 1.0
Deploying to Scrapy Cloud project "375014"
Deploy log last 30 lines:
 ---> Using cache
 ---> 3327d201e5b8
Step 4/12 : ADD run-pipcheck /usr/local/bin/
 ---> Using cache
 ---> 0f1bb3083976
Step 5/12 : RUN chmod +x /usr/local/bin/run-pipcheck
 ---> Using cache
 ---> 7ef8537c8057
Step 6/12 : RUN chmod +x /usr/local/sbin/eggbased-entrypoint &&     ln -sf /usr/local/sbin/eggbased-entrypoint /usr/local/sbin/start-crawl &&     ln -sf /usr/local/sbin/eggbased-entrypoint /usr/local/sbin/scrapy-list &&     ln -sf /usr/local/sbin/eggbased-entrypoint /usr/local/sbin/shub-image-info &&     ln -sf /usr/local/sbin/eggbased-entrypoint /usr/local/sbin/run-pipcheck
 ---> Using cache
 ---> 87d8a61b7e28
Step 7/12 : ADD requirements.txt /app/requirements.txt
 ---> 066d9f3f675c
Step 8/12 : RUN mkdir /app/python && chown nobody:nogroup /app/python
 ---> Running in 583fa173b54c
Removing intermediate container 583fa173b54c
 ---> 0b229e6d34d8
Step 9/12 : RUN sudo -u nobody -E PYTHONUSERBASE=$PYTHONUSERBASE     pip install --user --no-cache-dir -r /app/requirements.txt
 ---> Running in 07fbc5d93822
Collecting urllib (from -r /app/requirements.txt (line 1))
  Could not find a version that satisfies the requirement urllib (from -r /app/requirements.txt (line 1)) (from versions: )

No matching distribution found for urllib (from -r /app/requirements.txt (line 1))

You are using pip version 18.1, however version 19.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

{"message": "The command '/bin/sh -c sudo -u nobody -E PYTHONUSERBASE=$PYTHONUSERBASE     pip install --user --no-cache-dir -r /app/requirements.txt' returned a non-zero code: 1", "details": {"message": "The command '/bin/sh -c sudo -u nobody -E PYTHONUSERBASE=$PYTHONUSERBASE     pip install --user --no-cache-dir -r /app/requirements.txt' returned a non-zero code: 1", "code": 1}, "error": "requirements_error"}

{"status": "error", "message": "Requirements error"}
Deploy log location: /var/folders/wj/0tf8pc215x9f_ybkv4mr0p040000gp/T/shub_deploy_SE6Vqg.log
Error: Deploy failed: {"status": "error", "message": "Requirements error"}

 


I am not quite sure what is going on. I am guessing a different version of Python is being used on Scrapinghub, but urllib should be available. The spider works well locally on a Mac using Python 2.7.15 (Anaconda):


 

Configs-MacBook-Pro:eric_spider bpecheux$ scrapy crawl journal_spider -t csv -o test.csv
2019-02-17 12:36:34 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: eric_spider)
2019-02-17 12:36:34 [scrapy.utils.log] INFO: Versions: lxml 4.3.0.0, libxml2 2.9.9, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 17.5.0, Python 2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 13:10:39) - [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1a  20 Nov 2018), cryptography 2.5, Platform Darwin-18.2.0-x86_64-i386-64bit
2019-02-17 12:36:34 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'eric_spider.spiders', 'FEED_URI': 'test.csv', 'SPIDER_MODULES': ['eric_spider.spiders'], 'BOT_NAME': 'eric_spider', 'ROBOTSTXT_OBEY': True, 'FEED_FORMAT': 'csv', 'AUTOTHROTTLE_ENABLED': True}
2019-02-17 12:36:34 [scrapy.extensions.telnet] INFO: Telnet Password: bfce59df74f65d4e
2019-02-17 12:36:34 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.throttle.AutoThrottle']
2019-02-17 12:36:35 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-02-17 12:36:35 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-02-17 12:36:35 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2019-02-17 12:36:35 [scrapy.core.engine] INFO: Spider opened
2019-02-17 12:36:35 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-02-17 12:36:35 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-02-17 12:36:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/robots.txt> (referer: None)
2019-02-17 12:36:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?journals> (referer: None)
2019-02-17 12:36:42 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?nonjournals> from <GET http://eric.ed.gov/?nonjournals>
2019-02-17 12:36:45 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22AASA+Journal+of+Scholarship+%26+Practice%22> from <GET http://eric.ed.gov/?q=source%3A%22AASA+Journal+of+Scholarship+%26+Practice%22>
2019-02-17 12:36:47 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22Applied+Linguistics%22> from <GET http://eric.ed.gov/?q=source%3A%22Applied+Linguistics%22>
2019-02-17 12:36:49 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22Applied+Language+Learning%22> from <GET http://eric.ed.gov/?q=source%3A%22Applied+Language+Learning%22>
2019-02-17 12:36:51 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22Applied+Environmental+Education+and+Communication%22> from <GET http://eric.ed.gov/?q=source%3A%22Applied+Environmental+Education+and+Communication%22>
2019-02-17 12:36:54 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22Applied+Developmental+Science%22> from <GET http://eric.ed.gov/?q=source%3A%22Applied+Developmental+Science%22>
2019-02-17 12:36:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22Anthropology+%26+Education+Quarterly%22> from <GET http://eric.ed.gov/?q=source%3A%22Anthropology+%26+Education+Quarterly%22>
2019-02-17 12:36:59 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source:%22Annual+Review+of+Economics%22> from <GET http://eric.ed.gov/?q=source:%22Annual+Review+of+Economics%22>
2019-02-17 12:37:01 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22Annals+of+Dyslexia%22> from <GET http://eric.ed.gov/?q=source%3A%22Annals+of+Dyslexia%22>
2019-02-17 12:37:03 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22Anatomical+Sciences+Education%22> from <GET http://eric.ed.gov/?q=source%3A%22Anatomical+Sciences+Education%22>
2019-02-17 12:37:04 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22Analysis+of+Verbal+Behavior%22> from <GET http://eric.ed.gov/?q=source%3A%22Analysis+of+Verbal+Behavior%22>
2019-02-17 12:37:07 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22American+Journal+on+Intellectual+and+Developmental+Disabilities%22> from <GET http://eric.ed.gov/?q=source%3A%22American+Journal+on+Intellectual+and+Developmental+Disabilities%22>
2019-02-17 12:37:10 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22American+Journal+of+Sexuality+Education%22> from <GET http://eric.ed.gov/?q=source%3A%22American+Journal+of+Sexuality+Education%22>
2019-02-17 12:37:12 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22American+Journal+of+Play%22> from <GET http://eric.ed.gov/?q=source%3A%22American+Journal+of+Play%22>
2019-02-17 12:37:15 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22American+Journal+of+Health+Education%22> from <GET http://eric.ed.gov/?q=source%3A%22American+Journal+of+Health+Education%22>
2019-02-17 12:37:17 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22American+Journal+of+Evaluation%22> from <GET http://eric.ed.gov/?q=source%3A%22American+Journal+of+Evaluation%22>
2019-02-17 12:37:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?nonjournals> (referer: None)
2019-02-17 12:37:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22AASA+Journal+of+Scholarship+%26+Practice%22> (referer: None)
2019-02-17 12:37:22 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET https://eric.ed.gov/?q=source%3A%22AASA+Journal+of+Scholarship+%26+Practice%22> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
2019-02-17 12:37:23 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22Applied+Linguistics%22> (referer: None)
2019-02-17 12:37:23 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22Applied+Language+Learning%22> (referer: None)
2019-02-17 12:37:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22Applied+Environmental+Education+and+Communication%22> (referer: None)
2019-02-17 12:37:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22Applied+Developmental+Science%22> (referer: None)
2019-02-17 12:37:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22Anthropology+%26+Education+Quarterly%22> (referer: None)
2019-02-17 12:37:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source:%22Annual+Review+of+Economics%22> (referer: None)
2019-02-17 12:37:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22Annals+of+Dyslexia%22> (referer: None)
2019-02-17 12:37:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22Anatomical+Sciences+Education%22> (referer: None)
2019-02-17 12:37:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22Analysis+of+Verbal+Behavior%22> (referer: None)
2019-02-17 12:37:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22American+Journal+on+Intellectual+and+Developmental+Disabilities%22> (referer: None)
2019-02-17 12:37:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22American+Journal+of+Sexuality+Education%22> (referer: None)
2019-02-17 12:37:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22American+Journal+of+Play%22> (referer: None)
2019-02-17 12:37:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22American+Journal+of+Health+Education%22> (referer: None)
2019-02-17 12:37:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22Zero+To+Three%22> (referer: https://eric.ed.gov/?journals)
2019-02-17 12:37:27 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://eric.ed.gov/?q=source%3A%22ZERO+TO+THREE%22+-ISSN-0736-8038> from <GET http://eric.ed.gov/?q=source%3A%22ZERO+TO+THREE%22+-ISSN-0736-8038>
2019-02-17 12:37:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22American+Journal+of+Evaluation%22> (referer: None)
2019-02-17 12:37:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22AASA+Journal+of+Scholarship+%26+Practice%22&id=EJ1137642> (referer: https://eric.ed.gov/?q=source%3A%22AASA+Journal+of+Scholarship+%26+Practice%22)
2019-02-17 12:37:27 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22AASA+Journal+of+Scholarship+%26+Practice%22&id=EJ1137642
2019-02-17 12:37:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22AASA+Journal+of+Scholarship+%26+Practice%22&id=EJ1137642>
{'Accession_Number': u' EJ1137642',
 'Link': u'http://www.aasa.org/jsp.aspx',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Research',
 'Record_Type': u' Journal',
 'Source': u'AASA Journal of Scholarship & Practice',
 'Title': u"Secondary School Administrators' Perceptions of Louisiana's Compass System as a Framework for Teacher Evaluation"}
2019-02-17 12:37:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22Applied+Linguistics%22&id=EJ1142817> (referer: https://eric.ed.gov/?q=source%3A%22Applied+Linguistics%22)
2019-02-17 12:37:28 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22Applied+Linguistics%22&id=EJ1142817
2019-02-17 12:37:28 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22Applied+Linguistics%22&id=EJ1142817>
{'Accession_Number': u' EJ1142817',
 'Link': u'http://dx.doi.org/10.1093/applin/amu079',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Research',
 'Record_Type': u' Journal',
 'Source': u'Applied Linguistics',
 'Title': u'Comprehension and Knowledge Components That Predict L2 Reading: A Latent-Trait Approach'}
2019-02-17 12:37:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22Applied+Language+Learning%22&id=EJ1087293> (referer: https://eric.ed.gov/?q=source%3A%22Applied+Language+Learning%22)
2019-02-17 12:37:28 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22Applied+Language+Learning%22&id=EJ1087293
2019-02-17 12:37:28 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22Applied+Language+Learning%22&id=EJ1087293>
{'Accession_Number': u' EJ1087293',
 'Link': u'http://www.dliflc.edu/academic-journals-applied-language-learning/',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Research',
 'Record_Type': u' Journal',
 'Source': u'Applied Language Learning',
 'Title': u'A Comparison of Two Approaches for Assessing L2 Writing: Process-\xadBased and Impromptu Timed Writing Exams'}
2019-02-17 12:37:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22Applied+Environmental+Education+and+Communication%22&id=EJ1176738> (referer: https://eric.ed.gov/?q=source%3A%22Applied+Environmental+Education+and+Communication%22)
2019-02-17 12:37:28 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22Applied+Environmental+Education+and+Communication%22&id=EJ1176738
2019-02-17 12:37:28 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22Applied+Environmental+Education+and+Communication%22&id=EJ1176738>
{'Accession_Number': u' EJ1176738',
 'Link': u'http://dx.doi.org/10.1080/1533015X.2017.1366882',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Research',
 'Record_Type': u' Journal',
 'Source': u'Applied Environmental Education and Communication',
 'Title': u'Utilizing Project-Based Learning to Increase Sustainability Attitudes among Students'}
^C2019-02-17 12:37:28 [scrapy.crawler] INFO: Received SIGINT, shutting down gracefully. Send again to force 
2019-02-17 12:37:28 [scrapy.core.engine] INFO: Closing spider (shutdown)
2019-02-17 12:37:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22Applied+Developmental+Science%22&id=EJ1172535> (referer: https://eric.ed.gov/?q=source%3A%22Applied+Developmental+Science%22)
2019-02-17 12:37:29 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22Applied+Developmental+Science%22&id=EJ1172535
2019-02-17 12:37:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22Applied+Developmental+Science%22&id=EJ1172535>
{'Accession_Number': u' EJ1172535',
 'Link': u'http://dx.doi.org/10.1080/10888691.2016.1231579',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Research',
 'Record_Type': u' Journal',
 'Source': u'Applied Developmental Science',
 'Title': u'Evaluation of a Leadership Program for First Nations, M\xe9tis, and Inuit Youth: Stories of Positive Youth Development and Community Engagement'}
2019-02-17 12:37:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22Anthropology+%26+Education+Quarterly%22&id=EJ1196776> (referer: https://eric.ed.gov/?q=source%3A%22Anthropology+%26+Education+Quarterly%22)
2019-02-17 12:37:29 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22Anthropology+%26+Education+Quarterly%22&id=EJ1196776
2019-02-17 12:37:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22Anthropology+%26+Education+Quarterly%22&id=EJ1196776>
{'Accession_Number': u' EJ1196776',
 'Link': u'http://dx.doi.org/10.1111/aeq.12268',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Descriptive',
 'Record_Type': u' Journal',
 'Source': u'Anthropology & Education Quarterly',
 'Title': u"A Mam\xe1 No la Vas a Llevar en la Maleta: Undocumented Mothers Crossing and Contesting Borders for Their Children's Education"}
2019-02-17 12:37:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22Annual+Review+of+Economics%22&id=EJ1072173> (referer: https://eric.ed.gov/?q=source:%22Annual+Review+of+Economics%22)
2019-02-17 12:37:29 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22Annual+Review+of+Economics%22&id=EJ1072173
2019-02-17 12:37:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22Annual+Review+of+Economics%22&id=EJ1072173>
{'Accession_Number': u' EJ1072173',
 'Link': u'http://dx.doi.org/10.1146/annurev-economics-080614-115748',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Research; Information Analyses',
 'Record_Type': u' Journal',
 'Source': u'Annual Review of Economics',
 'Title': u'Knowledge-Based Hierarchies: Using Organizations to Understand the Economy'}
2019-02-17 12:37:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22Annals+of+Dyslexia%22&id=EJ1177966> (referer: https://eric.ed.gov/?q=source%3A%22Annals+of+Dyslexia%22)
2019-02-17 12:37:29 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22Annals+of+Dyslexia%22&id=EJ1177966
2019-02-17 12:37:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22Annals+of+Dyslexia%22&id=EJ1177966>
{'Accession_Number': u' EJ1177966',
 'Link': u'http://dx.doi.org/10.1007/s11881-018-0155-0',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Research',
 'Record_Type': u' Journal',
 'Source': u'Annals of Dyslexia',
 'Title': u'Bias in Dyslexia Screening in a Dutch Multicultural Population'}
2019-02-17 12:37:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22Anatomical+Sciences+Education%22&id=EJ1165673> (referer: https://eric.ed.gov/?q=source%3A%22Anatomical+Sciences+Education%22)
2019-02-17 12:37:30 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22Anatomical+Sciences+Education%22&id=EJ1165673
2019-02-17 12:37:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22Anatomical+Sciences+Education%22&id=EJ1165673>
{'Accession_Number': u' EJ1165673',
 'Link': u'http://dx.doi.org/10.1002/ase.1709',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Research',
 'Record_Type': u' Journal',
 'Source': u'Anatomical Sciences Education',
 'Title': u'Student and Recent Graduate Perspectives on Radiological Imaging Instruction during Basic Anatomy Courses'}
2019-02-17 12:37:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22Analysis+of+Verbal+Behavior%22&id=EJ1163091> (referer: https://eric.ed.gov/?q=source%3A%22Analysis+of+Verbal+Behavior%22)
2019-02-17 12:37:30 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22Analysis+of+Verbal+Behavior%22&id=EJ1163091
2019-02-17 12:37:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22Analysis+of+Verbal+Behavior%22&id=EJ1163091>
{'Accession_Number': u' EJ1163091',
 'Link': u'http://dx.doi.org/10.1007/s40616-017-0090-x',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Research',
 'Record_Type': u' Journal',
 'Source': u'Analysis of Verbal Behavior',
 'Title': u'The Generalization of Mands'}
2019-02-17 12:37:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22American+Journal+on+Intellectual+and+Developmental+Disabilities%22&id=EJ1197250> (referer: https://eric.ed.gov/?q=source%3A%22American+Journal+on+Intellectual+and+Developmental+Disabilities%22)
2019-02-17 12:37:30 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22American+Journal+on+Intellectual+and+Developmental+Disabilities%22&id=EJ1197250
2019-02-17 12:37:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22American+Journal+on+Intellectual+and+Developmental+Disabilities%22&id=EJ1197250>
{'Accession_Number': u' EJ1197250',
 'Link': u'https://doi.org/10.1352/1944-7558-123.6.529',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Information Analyses',
 'Record_Type': u' Journal',
 'Source': u'American Journal on Intellectual and Developmental Disabilities',
 'Title': u"The Relationship between Children's Exposure to Intimate Partner Violence and Intellectual and Developmental Disabilities: A Systematic Review of the Literature"}
2019-02-17 12:37:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22American+Journal+of+Sexuality+Education%22&id=EJ1180235> (referer: https://eric.ed.gov/?q=source%3A%22American+Journal+of+Sexuality+Education%22)
2019-02-17 12:37:30 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22American+Journal+of+Sexuality+Education%22&id=EJ1180235
2019-02-17 12:37:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22American+Journal+of+Sexuality+Education%22&id=EJ1180235>
{'Accession_Number': u' EJ1180235',
 'Link': u'http://dx.doi.org/10.1080/15546128.2018.1457462',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Descriptive',
 'Record_Type': u' Journal',
 'Source': u'American Journal of Sexuality Education',
 'Title': u"Let's Talk about Sex \u2026 Education"}
2019-02-17 12:37:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22American+Journal+of+Play%22&id=EJ1192055> (referer: https://eric.ed.gov/?q=source%3A%22American+Journal+of+Play%22)
2019-02-17 12:37:30 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22American+Journal+of+Play%22&id=EJ1192055
2019-02-17 12:37:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22American+Journal+of+Play%22&id=EJ1192055>
{'Accession_Number': u' EJ1192055',
 'Link': u'http://files.eric.ed.gov/fulltext/EJ1192055.pdf',
 'Link_Type': u' Download full text',
 'Publication_Type': u' Journal Articles; Reports - Evaluative; Information Analyses',
 'Record_Type': u' Journal',
 'Source': u'American Journal of Play',
 'Title': u'Problem Gaming: A Short Primer'}
2019-02-17 12:37:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22American+Journal+of+Health+Education%22&id=EJ1176732> (referer: https://eric.ed.gov/?q=source%3A%22American+Journal+of+Health+Education%22)
2019-02-17 12:37:30 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22American+Journal+of+Health+Education%22&id=EJ1176732
2019-02-17 12:37:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22American+Journal+of+Health+Education%22&id=EJ1176732>
{'Accession_Number': u' EJ1176732',
 'Link': u'http://dx.doi.org/10.1080/19325037.2018.1449683',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Research; Information Analyses',
 'Record_Type': u' Journal',
 'Source': u'American Journal of Health Education',
 'Title': u'The Relationship between Stress and Maladaptive Weight-Related Behaviors in College Students: A Review of the Literature'}
2019-02-17 12:37:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3A%22ZERO+TO+THREE%22+-ISSN-0736-8038> (referer: None)
2019-02-17 12:37:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22Zero+To+Three%22&id=EJ1166858> (referer: https://eric.ed.gov/?q=source%3A%22Zero+To+Three%22)
2019-02-17 12:37:31 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22Zero+To+Three%22&id=EJ1166858
2019-02-17 12:37:31 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22Zero+To+Three%22&id=EJ1166858>
{'Accession_Number': u' EJ1166858',
 'Link': u'https://www.zerotothree.org/resources/series/journal-archive',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Descriptive',
 'Record_Type': u' Journal',
 'Source': u'ZERO TO THREE',
 'Title': u'Effective Mental Health Interventions and Treatments for Young Children with Diverse Needs'}
2019-02-17 12:37:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22Zero+To+Three%22&id=EJ1166844> (referer: https://eric.ed.gov/?q=source%3A%22Zero+To+Three%22)
2019-02-17 12:37:31 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22Zero+To+Three%22&id=EJ1166844
2019-02-17 12:37:31 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22Zero+To+Three%22&id=EJ1166844>
{'Accession_Number': u' EJ1166844',
 'Link': u'https://www.zerotothree.org/resources/series/journal-archive',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Descriptive',
 'Record_Type': u' Journal',
 'Source': u'ZERO TO THREE',
 'Title': u'The Missing Ingredients in Reflective Supervision: Helping Staff Members Learn about and Fully Participate in the Supervisory Process'}
2019-02-17 12:37:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22American+Journal+of+Evaluation%22&id=EJ1179468> (referer: https://eric.ed.gov/?q=source%3A%22American+Journal+of+Evaluation%22)
2019-02-17 12:37:31 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22American+Journal+of+Evaluation%22&id=EJ1179468
2019-02-17 12:37:31 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22American+Journal+of+Evaluation%22&id=EJ1179468>
{'Accession_Number': u' EJ1179468',
 'Link': u'http://dx.doi.org/10.1177/1098214017727364',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Research',
 'Record_Type': u' Journal',
 'Source': u'American Journal of Evaluation',
 'Title': u'Twinning "Practices of Change" with "Theory of Change": Room for Emergence in Advocacy Evaluation'}
2019-02-17 12:37:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22American+Journal+of+Evaluation%22&id=EJ1179432> (referer: https://eric.ed.gov/?q=source%3A%22American+Journal+of+Evaluation%22)
2019-02-17 12:37:32 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22American+Journal+of+Evaluation%22&id=EJ1179432
2019-02-17 12:37:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22American+Journal+of+Evaluation%22&id=EJ1179432>
{'Accession_Number': u' EJ1179432',
 'Link': u'http://dx.doi.org/10.1177/1098214017722857',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Reports - Research',
 'Record_Type': u' Journal',
 'Source': u'American Journal of Evaluation',
 'Title': u'Defining "Community" in Community Health Evaluation: Perspectives from a Sample of Nonprofit Appalachian Hospitals'}
2019-02-17 12:37:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://eric.ed.gov/?q=source%3a%22American+Journal+of+Evaluation%22&id=EJ1179428> (referer: https://eric.ed.gov/?q=source%3A%22American+Journal+of+Evaluation%22)
2019-02-17 12:37:32 [journal_spider] INFO: Hi, this is an item page! https://eric.ed.gov/?q=source%3a%22American+Journal+of+Evaluation%22&id=EJ1179428
2019-02-17 12:37:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://eric.ed.gov/?q=source%3a%22American+Journal+of+Evaluation%22&id=EJ1179428>
{'Accession_Number': u' EJ1179428',
 'Link': u'http://dx.doi.org/10.1177/1098214017720066',
 'Link_Type': u'Direct link',
 'Publication_Type': u' Journal Articles; Opinion Papers; Reports - Descriptive',
 'Record_Type': u' Journal',
 'Source': u'American Journal of Evaluation',
 'Title': u'The Oral History of Evaluation: The Professional Development of Thomas D. Cook'}
2019-02-17 12:37:32 [scrapy.extensions.feedexport] INFO: Stored csv feed (19 items) in: test.csv
2019-02-17 12:37:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 16154,
 'downloader/request_count': 56,
 'downloader/request_method_count/GET': 56,
 'downloader/response_bytes': 290973,
 'downloader/response_count': 56,
 'downloader/response_status_count/200': 39,
 'downloader/response_status_count/301': 17,
 'dupefilter/filtered': 51,
 'finish_reason': 'shutdown',
 'finish_time': datetime.datetime(2019, 2, 17, 19, 37, 32, 303502),
 'item_scraped_count': 19,
 'log_count/DEBUG': 76,
 'log_count/INFO': 30,
 'memusage/max': 53043200,
 'memusage/startup': 53043200,
 'request_depth_max': 2,
 'response_received_count': 39,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/200': 1,
 'scheduler/dequeued': 55,
 'scheduler/dequeued/memory': 55,
 'scheduler/enqueued': 4374,
 'scheduler/enqueued/memory': 4374,
 'start_time': datetime.datetime(2019, 2, 17, 19, 36, 35, 55950)}
2019-02-17 12:37:32 [scrapy.core.engine] INFO: Spider closed (shutdown)

 Thanks




I tried it again using "urllib==1.17" in the requirements.txt file, but got the same error.

I also tried using the six library, with no luck.


 

Configs-MacBook-Pro:eric_spider bpecheux$ shub deploy
Packing version 1.0
Deploying to Scrapy Cloud project "375014"
Deploy log last 30 lines:
 ---> Using cache
 ---> 3327d201e5b8
Step 4/12 : ADD run-pipcheck /usr/local/bin/
 ---> Using cache
 ---> 0f1bb3083976
Step 5/12 : RUN chmod +x /usr/local/bin/run-pipcheck
 ---> Using cache
 ---> 7ef8537c8057
Step 6/12 : RUN chmod +x /usr/local/sbin/eggbased-entrypoint &&     ln -sf /usr/local/sbin/eggbased-entrypoint /usr/local/sbin/start-crawl &&     ln -sf /usr/local/sbin/eggbased-entrypoint /usr/local/sbin/scrapy-list &&     ln -sf /usr/local/sbin/eggbased-entrypoint /usr/local/sbin/shub-image-info &&     ln -sf /usr/local/sbin/eggbased-entrypoint /usr/local/sbin/run-pipcheck
 ---> Using cache
 ---> 87d8a61b7e28
Step 7/12 : ADD requirements.txt /app/requirements.txt
 ---> 6a49831a76fb
Step 8/12 : RUN mkdir /app/python && chown nobody:nogroup /app/python
 ---> Running in 2c3a135d86f7
Removing intermediate container 2c3a135d86f7
 ---> b1b440b8584b
Step 9/12 : RUN sudo -u nobody -E PYTHONUSERBASE=$PYTHONUSERBASE     pip install --user --no-cache-dir -r /app/requirements.txt
 ---> Running in b688be527329
Collecting six.moves.urllib.parse (from -r /app/requirements.txt (line 1))
  Could not find a version that satisfies the requirement six.moves.urllib.parse (from -r /app/requirements.txt (line 1)) (from versions: )

No matching distribution found for six.moves.urllib.parse (from -r /app/requirements.txt (line 1))

You are using pip version 18.1, however version 19.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

{"message": "The command '/bin/sh -c sudo -u nobody -E PYTHONUSERBASE=$PYTHONUSERBASE     pip install --user --no-cache-dir -r /app/requirements.txt' returned a non-zero code: 1", "details": {"message": "The command '/bin/sh -c sudo -u nobody -E PYTHONUSERBASE=$PYTHONUSERBASE     pip install --user --no-cache-dir -r /app/requirements.txt' returned a non-zero code: 1", "code": 1}, "error": "requirements_error"}

{"status": "error", "message": "Requirements error"}
Deploy log location: /var/folders/wj/0tf8pc215x9f_ybkv4mr0p040000gp/T/shub_deploy_xdPqVD.log
Error: Deploy failed: {"status": "error", "message": "Requirements error"}
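
A note on the two deploy logs above: both stop at the same step, where pip cannot find a distribution for the name on line 1 of requirements.txt. urllib ships with Python itself, and six.moves.urllib.parse is an import path provided by the six package rather than a package of its own, so neither name is something pip can install. The sketch below is only an illustration of checking a requirements file for this kind of entry before deploying. The helper name suspicious_requirements and the short STDLIB_NAMES list are hypothetical and not part of shub or pip.

```python
import re

# Hypothetical, deliberately short list of standard-library module names.
# It is an assumption for this sketch; extend it to suit your project.
STDLIB_NAMES = {"urllib", "urllib2", "urlparse", "json", "os", "re", "sys"}


def suspicious_requirements(lines):
    """Return (name, reason) pairs for entries that pip is unlikely to find."""
    flagged = []
    for line in lines:
        # Drop comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        # Keep only the name part, before any version specifier or extras.
        name = re.split(r"[=<>!~\[; ]", entry, maxsplit=1)[0]
        if name in STDLIB_NAMES:
            flagged.append((name, "standard library module"))
        elif name.startswith("six.moves"):
            flagged.append((name, "import path, not a package name"))
    return flagged


# The entries tried in this thread, plus two ordinary package names.
entries = ["urllib", "urllib==1.17", "six.moves.urllib.parse", "six", "requests>=2.0"]
for name, reason in suspicious_requirements(entries):
    print("%s: %s" % (name, reason))
```

With the entries from this thread, the first three are flagged, while six and requests pass through as ordinary package names.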

 

I tried to use requests.compat.unquote instead of urllib.unquote. It works locally, but still no luck on Scrapinghub...

Answer

OK, it looks like upgrading to the latest version of Scrapy (1.6) and using requests.compat.unquote instead of urllib.unquote made it work.
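
For anyone who lands here and would rather not rely on requests just for unquote, here is a minimal sketch of the same idea using only the standard library. It assumes the spider needs nothing beyond unquote, and the sample string is taken from one of the URLs in the crawl log above. Neither import needs an entry in requirements.txt.

```python
# Works on both Python 2 and Python 3 without installing anything.
try:
    from urllib import unquote  # Python 2: unquote lives in urllib
except ImportError:
    from urllib.parse import unquote  # Python 3: it moved to urllib.parse

# Decode the percent-escapes in a query value from the crawl log.
decoded = unquote("source%3A%22Applied+Linguistics%22")
print(decoded)  # source:"Applied+Linguistics"
```

Note that unquote leaves the "+" characters alone. If you also want them turned into spaces, unquote_plus is available from the same two modules.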

Good to hear you were able to find a solution on your own. For the future though, please note that while you have an active subscription with us, you are able to open support tickets directly from your dashboard, under Help - Contact Support. We respond to support tickets within one business day. The forum is for customers who don't have an active subscription with us.
