Steven D'Aprano
At least two standard error handlers are documented as working for
encoding only:
xmlcharrefreplace
backslashreplace
See http://docs.python.org/library/codecs.html#codec-base-classes
and http://docs.python.org/py3k/library/codecs.html
Why is this? I don't see why they shouldn't work for decoding as well.
Consider this example using Python 3.2:
>>> b"aaa--\xe9z--\xe9!--bbb".decode("cp932")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'cp932' codec can't decode bytes in position 9-10: illegal multibyte sequence
The two bytes b'\xe9!' form an illegal multibyte sequence in CP-932 (also
known as MS-KANJI, Microsoft's variant of SHIFT-JIS). Is there some reason
why something like the following shouldn't or can't be supported?
# This doesn't actually work.
b"aaa--\xe9z--\xe9!--bbb".decode("cp932", "backslashreplace")
=> r'aaa--騷--\xe9\x21--bbb'
and similarly for xmlcharrefreplace.
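
In the meantime, the behaviour I have in mind can be approximated with
codecs.register_error. This is only a rough sketch: the handler name
"backslashreplace_decode" and the function below are my own invention, not
part of the standard library, and exactly which bytes get escaped depends on
the range the codec reports in the UnicodeDecodeError.

```python
import codecs


def backslashreplace_decode(exc):
    """Hypothetical handler: escape undecodable bytes as \\xNN text."""
    if not isinstance(exc, UnicodeDecodeError):
        # Only decoding errors are handled here; anything else propagates.
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("\\x{:02x}".format(byte) for byte in bad)
    # Resume decoding immediately after the offending bytes.
    return replacement, exc.end


# The name is an assumption chosen for this example.
codecs.register_error("backslashreplace_decode", backslashreplace_decode)

print(b"abc\xffdef".decode("ascii", "backslashreplace_decode"))  # abc\xffdef

text = b"aaa--\xe9z--\xe9!--bbb".decode("cp932", "backslashreplace_decode")
print(text)
```

The handler returns a (replacement, resume position) pair, which is the
contract codecs.register_error expects, so the same function should work
with any codec that raises UnicodeDecodeError.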