
How to fix builtins.UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd?

A few pages on a UTF-8 website contain non-UTF-8 characters, which causes this error.

The logs are attached. The problem is that I use an error callback (errback), which I thought would catch the error, but it does not.


How do I handle the error properly? In Python 2 I would decode as UTF-8 with the "ignore" parameter, which just skips the non-UTF-8 characters. How can I do this in Python 3, or through the Scrapy request parameters?
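
For reference, the Python 2 "ignore" behaviour described above still exists in Python 3 as the errors argument of bytes.decode. The sketch below uses made-up sample bytes (not taken from the attached logs) containing one stray 0xfd byte; it only illustrates the decoding step and does not show where in the spider the exception is actually raised.

```python
# Hypothetical sample: valid UTF-8 ("café") plus one stray 0xfd byte.
raw = b"caf\xc3\xa9 \xfd menu"

# Strict decoding is the default and raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="ignore" silently drops bytes that cannot be decoded.
ignored = raw.decode("utf-8", errors="ignore")

# errors="replace" substitutes U+FFFD, so the data loss stays visible.
replaced = raw.decode("utf-8", errors="replace")
```

If the failing decode is in your own callback code, the same call can presumably be applied to the raw bytes in response.body, e.g. response.body.decode("utf-8", errors="ignore").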


yield scrapy.Request(product_url, callback=self.parse_product, errback=self.errback)

    def errback(self, failure):
        # log all failures
        self.logger.error(repr(failure))

        # in case you want to do something special for some errors,
        # you may need the failure's type:
        if failure.check(UnicodeDecodeError):
            # a UnicodeDecodeError has no response attribute (that is
            # specific to HttpError), so take the URL from the failed request
            self.logger.error('UnicodeDecodeError on %s', failure.request.url)

