John Perks and Sarah Mount
(My Python uses UTF-16 natively; can someone with a UTF-32 Python let me
know if it behaves differently?)
>>> u'\ud800'
u'\ud800'
>>> codecs.utf_16_be_encode(_)[0]
'\xd8\x00'
>>> codecs.utf_16_be_decode(_)[0]
Traceback (most recent call last):
  File "<input>", line 1, in ?
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 0-1:
unexpected end of data
If the bytes can't be recognized as UTF-16, then surely the codec
shouldn't have allowed them to be encoded in the first place? I could
understand the failure if the bytes were being decoded into (native) UTF-32.
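(For what it's worth, later Pythons made the behaviour symmetric: by default a lone surrogate now fails on encode as well as decode, and the `surrogatepass` error handler, which I believe gained UTF-16 support in Python 3.4, is needed to round-trip one. A minimal sketch of that behaviour:)

```python
lone = '\ud800'  # a lone high surrogate

# Encoding now fails by default, matching the decode behaviour:
try:
    lone.encode('utf-16-be')
except UnicodeEncodeError as exc:
    print('encode failed:', exc.reason)

# The 'surrogatepass' error handler round-trips it explicitly:
raw = lone.encode('utf-16-be', errors='surrogatepass')
print(raw)  # b'\xd8\x00'
back = raw.decode('utf-16-be', errors='surrogatepass')
print(back == lone)  # True
```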
On a similar note, if you are using UTF-32 natively, are you allowed to
have raw surrogate escape sequences (paired or otherwise) in Unicode
literals?
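(On a modern wide-string Python 3, which is what I'm assuming here, such literals are accepted, but a surrogate pair written as two escapes is not joined into a single code point:)

```python
# Surrogate escapes are legal in literals, but a pair stays as
# two separate code points rather than being combined:
pair = '\ud800\udc00'
print(len(pair))  # 2

# The astral character itself must be written with a \U escape:
astral = '\U00010000'
print(len(astral))  # 1
print(pair == astral)  # False
```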
Thanks
John