Edward Loper
I would like to convert an 8-bit string (i.e., a str) into unicode,
treating chars \x00-\x7f as ascii, and converting any chars \x80-\xff
into backslashed escape sequences. I.e., I want something like this:
u'abc \\xff\\xe8 def'
The best I could come up with was:
def decode_with_backslashreplace(s):
    "str -> unicode"
    return (s.decode('latin1')
            .encode('ascii', 'backslashreplace')
            .decode('ascii'))
Surely there's a better way than converting back and forth 3 times? Is
there a reason that the 'backslashreplace' error mode can't be used with
codecs.decode?
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: don't know how to handle UnicodeDecodeError in error callback
-Edward
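
One possible way to avoid the triple conversion, offered only as a sketch: register a custom error handler with codecs.register_error and decode once with the 'ascii' codec. The handler name 'backslashreplace_decode' and the function backslashreplace_decode below are made up for illustration and are not part of the standard library. The sketch assumes the handler is only ever used for decoding.

```python
import codecs

def backslashreplace_decode(exc):
    # Hypothetical handler (not a stdlib name): replaces each undecodable
    # byte with a literal \xNN escape sequence.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc  # this sketch only knows how to handle decode errors
    # bytearray() yields ints whether exc.object is a str (2.x) or bytes.
    bad = bytearray(exc.object[exc.start:exc.end])
    replacement = u''.join(u'\\x%02x' % b for b in bad)
    # Return the replacement text and the position at which to resume.
    return replacement, exc.end

# Register under an illustrative name, distinct from the built-in handler.
codecs.register_error('backslashreplace_decode', backslashreplace_decode)

def decode_with_backslashreplace(s):
    "8-bit string -> unicode, escaping bytes \\x80-\\xff"
    return s.decode('ascii', 'backslashreplace_decode')

result = decode_with_backslashreplace(b'abc \xff\xe8 def')
# result == u'abc \\xff\\xe8 def'
```

This decodes in a single pass: bytes \x00-\x7f go through the ascii codec untouched, and the handler is called only for the bytes that fail.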