Hello Everyone,
Could anyone help me with the following problem?
I deployed a project on Scrapinghub cloud with a spider that scrapes a sports betting website. The targets are the live football games currently in progress (match data, match stats, odds, etc.). I am confident the spider itself works, because when I run it locally from the Anaconda Prompt it gets the job done.
However, when I run it on Scrapinghub cloud, the spider sometimes returns items and sometimes returns none, even though it should: the URL works fine, there are live matches, and every condition appears to be met.
When no items are returned I see this in the logs:
[scrapy.spidermiddlewares.httperror] Ignoring response <410 https://eu-offering.kambicdn.org/offering/v2018/ub/event/live/open.json>: HTTP status code is not handled or not allowed
I believe I took care of any possible user-agent problem in my spider with this:
yield scrapy.Request(url=self.starting_url, callback=self.parse, headers={
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36"
})
Can anyone help me understand what could cause this strange behaviour?
Thank you in advance!