How to fix? builtins.UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd

Posted almost 6 years ago by andrej.fogelton

A few pages on an otherwise UTF-8 website contain non-UTF-8 characters, which causes the error.

The logs are attached. The problem is that I already use an error callback, which I expected would catch this error.

How do I handle the error properly? In Python 2 I would decode as UTF-8 with the ignore parameter, which simply skips the non-UTF-8 characters. How can I do this in Python 3, or through Scrapy request parameters?

    yield scrapy.Request(product_url, callback=self.parse_product, errback=self.errback)

    def errback(self, failure):
        # log all failures
        self.logger.error(repr(failure))

        # in case you want to do something special for some errors,
        # you may need the failure's type:
        if failure.check(UnicodeDecodeError):
            # these exceptions come from HttpError spider middleware
            # you can get the non-200 response
            response = failure.value.response
            self.logger.error('UnicodeDecodeError on %s', response.url)
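For reference, the Python 2 approach described above carries over to Python 3 largely unchanged: `bytes.decode` accepts an `errors` argument. The following is only a minimal sketch of that idea. The helper name `decode_lenient` and the sample bytes are made up for illustration, and it does not show where in the spider the decoding actually fails.

```python
def decode_lenient(body: bytes, encoding: str = "utf-8", errors: str = "ignore") -> str:
    """Decode bytes, dropping (or replacing) sequences that are invalid in `encoding`."""
    return body.decode(encoding, errors=errors)


# Hypothetical page body: valid UTF-8 except for one stray 0xfd byte.
raw = b"caf\xc3\xa9 \xfd ok"

skipped = decode_lenient(raw)                   # the invalid byte is dropped
marked = decode_lenient(raw, errors="replace")  # the invalid byte becomes U+FFFD
```

Inside a Scrapy callback, `response.body` holds the raw bytes, so the same call could be applied there before parsing. Whether that is the right place depends on where the traceback in the attached log points.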

Attachments (1)

txt

logmojalekar....txt
28.5 KB
