Ignoring response <410 - HTTP status code is not handled or not allowed

Posted over 4 years ago by papamaci90

Hello everyone,

Could anyone help me with the following problem?

I deployed a project to Scrapinghub Cloud with a spider that scrapes a sports betting website. It targets the live football games currently in progress (match data, match stats, odds, etc.). I know for sure that the spider itself works, because when I run it from the Anaconda Prompt terminal it gets the job done.

However, when I run it on Scrapinghub Cloud, the spider sometimes returns items and sometimes returns none, even though it should always return some: the URL works, there are live matches, and every condition should be met.

When no items are returned, I see this in the logs:

[scrapy.spidermiddlewares.httperror] Ignoring response <410 https://eu-offering.kambicdn.org/offering/v2018/ub/event/live/open.json>: HTTP status code is not handled or not allowed
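For context on this log line: Scrapy's `HttpErrorMiddleware` drops any response whose status is outside the 2xx range before it reaches the callback, unless that status is explicitly allowed, and it logs exactly this message when it does so. Below is a minimal, simplified pure-Python model of that decision, not Scrapy's actual source. The functions `reaches_callback` and `ignore_message` and their parameters are hypothetical names for illustration. In Scrapy itself, the equivalent of `spider_allowed=[410]` is setting `handle_httpstatus_list = [410]` on the spider class, or passing `meta={"handle_httpstatus_list": [410]}` on the request. Either lets `parse` receive the 410 response so its body and headers can be inspected. That only makes the 410 visible; it does not explain why the server sends it.

```python
# Simplified, hypothetical model (NOT Scrapy's source) of the check that
# scrapy.spidermiddlewares.httperror.HttpErrorMiddleware applies before a
# response is handed to the spider callback.
from typing import Iterable, Optional

LOG_SUFFIX = "HTTP status code is not handled or not allowed"


def reaches_callback(
    status: int,
    spider_allowed: Iterable[int] = (),
    meta_allowed: Optional[Iterable[int]] = None,
    allow_all: bool = False,
) -> bool:
    """Return True if a response with this status would reach the callback."""
    # 2xx responses always pass through.
    if 200 <= status < 300:
        return True
    # Models the request meta key 'handle_httpstatus_all'.
    if allow_all:
        return True
    # Models precedence: a per-request 'handle_httpstatus_list' in meta
    # overrides the spider's 'handle_httpstatus_list' attribute.
    allowed = meta_allowed if meta_allowed is not None else spider_allowed
    return status in set(allowed)


def ignore_message(url: str, status: int, **kwargs) -> Optional[str]:
    """Return the log line for a dropped response, or None if it is passed on."""
    if reaches_callback(status, **kwargs):
        return None
    # Scrapy renders a Response as "<status url>" in log messages.
    return f"Ignoring response <{status} {url}>: {LOG_SUFFIX}"


if __name__ == "__main__":
    url = "https://eu-offering.kambicdn.org/offering/v2018/ub/event/live/open.json"
    # Default: 410 is not allowed, so the response is dropped and logged.
    print(ignore_message(url, 410))
    # With 410 allowed (like handle_httpstatus_list = [410]), it is passed on.
    print(ignore_message(url, 410, spider_allowed=[410]))  # None
```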

I believe I took care of any possible User-Agent problem in my spider with this:

```python
import scrapy

def start_requests(self):
    # Send the first request with a desktop Chrome User-Agent header
    yield scrapy.Request(
        url=self.starting_url,
        callback=self.parse,
        headers={
            'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36"
        },
    )
```

Can anyone tell me what might cause this strange behaviour?

Thank you in advance!