Why is Python's decode replacing more than the invalid bytes from an encoded string?
- by dangra
Decoding an HTML page that contains invalid UTF-8 gives different results in Python, Firefox and Chrome.
The invalidly encoded fragment from the test page looks like 'PREFIX\xe3\xabSUFFIX':
>>> fragment = 'PREFIX\xe3\xabSUFFIX'
>>> fragment.decode('utf-8', 'strict')
...
UnicodeDecodeError: 'utf8' codec can't decode bytes in position…
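To see exactly which bytes the codec objects to, the exception object itself can be inspected. This is only a rough sketch, and it assumes Python 3, where the fragment has to be written as a bytes literal (the session above is Python 2). The helper name `describe_failure` is made up for illustration. It also prints what the 'replace' and 'ignore' handlers return, so the results can be compared with what the browsers show; the output may differ between Python versions.

```python
# Sketch for Python 3 (an assumption; the session above is Python 2).
fragment = b'PREFIX\xe3\xabSUFFIX'


def describe_failure(data, encoding='utf-8'):
    """Hypothetical helper: report the first strict-decoding failure.

    Returns (start, end, offending_bytes, reason), or None if the
    data decodes cleanly.
    """
    try:
        data.decode(encoding, 'strict')
    except UnicodeDecodeError as exc:
        # exc.start and exc.end delimit the byte range the codec rejected.
        return exc.start, exc.end, data[exc.start:exc.end], exc.reason
    return None


failure = describe_failure(fragment)
print(failure)

# Compare how the lenient error handlers treat the same input.
for handler in ('replace', 'ignore'):
    print(handler, repr(fragment.decode('utf-8', handler)))
```

The range reported by the exception is the part the codec treats as one invalid sequence, which is a useful baseline when checking how many bytes a lenient handler swallowed.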