Ignoring response <410>: HTTP status code is not handled or not allowed

Hello Everyone,


Could anyone help me with the following problem?

I deployed a project on Scrapinghub cloud with a spider that scrapes a sports betting website. The targets are the live football games currently in progress (data about the matches, match stats, odds, etc.). I know the spider itself works properly, because when I run it locally from the Anaconda Prompt it gets the job done.


However, when I run it through Scrapinghub cloud, the spider sometimes returns items and sometimes returns none. It should always return items in these cases: the URL works fine, there are live matches, and every condition is met.


When no items are returned I see this in the logs:


[scrapy.spidermiddlewares.httperror] Ignoring response <410 https://eu-offering.kambicdn.org/offering/v2018/ub/event/live/open.json>: HTTP status code is not handled or not allowed
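As far as I understand, this message comes from Scrapy's HttpError spider middleware, which by default only passes 2xx responses to the callback, and 410 is the HTTP status "Gone". Below is a minimal sketch of that rule as I read it. It is a simplified model, not Scrapy's actual code, and the function name `is_passed_to_callback` is made up for illustration.

```python
from http import HTTPStatus


def is_passed_to_callback(status, allowed_codes=()):
    """Simplified model of the rule behind the log message (not Scrapy's code)."""
    # Successful (2xx) responses always reach the spider callback.
    if 200 <= status < 300:
        return True
    # Any other status reaches the callback only if it is explicitly allowed,
    # e.g. through the spider's handle_httpstatus_list attribute.
    return status in allowed_codes


print(HTTPStatus(410).phrase)             # Gone
print(is_passed_to_callback(410))         # False
print(is_passed_to_callback(410, [410]))  # True
```

So adding 410 to the spider's `handle_httpstatus_list` should at least let the callback see the response and log its body, although that would not explain why the server answers with 410 in the first place.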


I believe I took care of any possible user-agent problem in my spider with this:

 

    def start_requests(self):
        # Send a desktop browser User-Agent with the initial request.
        yield scrapy.Request(url=self.starting_url, callback=self.parse, headers={
            'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36"
        })


Can anyone help me understand what could cause this strange behaviour?

Thank you in advance!
