A few pages on an otherwise UTF-8 website contain non-UTF-8 characters, and this causes the error.
The logs are attached. The problem is that I use an error callback which, I thought, should catch the error, but it does not.
How do I handle the error properly? In Python 2 I would decode with 'utf-8' and the ignore parameter, which would simply skip the non-UTF-8 characters. How can I fix this in Python 3, or via Scrapy request parameters?
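(For reference, the Python 2 idiom survives in Python 3: bytes.decode() still accepts an errors argument, so undecodable bytes can be skipped or replaced. A minimal sketch; the raw_bytes value is invented for illustration:)

raw_bytes = b'valid text \xff more text'            # \xff is not valid UTF-8
text = raw_bytes.decode('utf-8', errors='ignore')   # silently drops the bad byte
# text == 'valid text  more text'
text = raw_bytes.decode('utf-8', errors='replace')  # or substitute U+FFFD instead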
yield scrapy.Request(product_url, callback=self.parse_product, errback=self.errback)

def errback(self, failure):
    # log all failures
    self.logger.error(repr(failure))

    # in case you want to do something special for some errors,
    # you may need the failure's type:
    if failure.check(UnicodeDecodeError):
        # a UnicodeDecodeError carries no response object; the original
        # request is available on the failure itself
        request = failure.request
        self.logger.error('UnicodeDecodeError on %s', request.url)
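(If the UnicodeDecodeError is raised while decoding the body rather than during the download itself, the errback may never fire. One workaround is to avoid response.text and decode the raw bytes leniently in the callback. A sketch only, not tested against this site; it assumes the method lives in the same spider class, with scrapy already imported, and the CSS query is a made-up example:)

def parse_product(self, response):
    # response.body is the raw, undecoded bytes; decode it leniently
    # instead of letting response.text raise UnicodeDecodeError
    text = response.body.decode('utf-8', errors='ignore')
    # build a Selector from the cleaned text if XPath/CSS is needed
    selector = scrapy.Selector(text=text)
    title = selector.css('title::text').get()  # hypothetical extraction
    self.logger.info('parsed %s: %s', response.url, title)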
Attachments (1): logmojalekar....txt (28.5 KB)