unicode question

Edward Loper · Feb 25, 2006

I would like to convert an 8-bit string (i.e., a str) into unicode,
treating chars \x00-\x7f as ASCII and converting any chars \x80-\xff
into backslashed escape sequences. I.e., I want something like this:

    u'abc \\xff\\xe8 def'

The best I could come up with was:

    def decode_with_backslashreplace(s):
        "str -> unicode"
        return (s.decode('latin1')
                 .encode('ascii', 'backslashreplace')
                 .decode('ascii'))

Surely there's a better way than converting back and forth 3 times? Is
there a reason that the 'backslashreplace' error mode can't be used with
codecs.decode?

    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: don't know how to handle UnicodeDecodeError in error callback

-Edward
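
One way the TypeError above might be sidestepped is to register a custom error callback with codecs.register_error that knows how to handle UnicodeDecodeError. The sketch below is not from the original posts: the handler name 'backslashreplace_decode' and both function names are hypothetical, and it is written in Python 3 syntax (bytes literals) rather than the Python 2 used in this thread.

```python
import codecs

def backslashreplace_decode(exc):
    """Error callback (hypothetical name): escape undecodable bytes as \\xNN."""
    if not isinstance(exc, UnicodeDecodeError):
        # Only decoding errors are handled here; re-raise anything else.
        raise exc
    bad = exc.object[exc.start:exc.end]
    # bytearray yields integers, so each byte formats as two hex digits.
    escaped = u''.join(u'\\x%02x' % b for b in bytearray(bad))
    # Return the replacement text and the position at which to resume decoding.
    return escaped, exc.end

codecs.register_error('backslashreplace_decode', backslashreplace_decode)

def decode_with_backslashreplace(data):
    # Single pass: ASCII passes through, everything else goes to the callback.
    return data.decode('ascii', 'backslashreplace_decode')

result = decode_with_backslashreplace(b'abc \xff\xe8 def')
```

With this registered, the conversion needs only one decode call instead of the decode/encode/decode round trip.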

Tim Roberts · Feb 25, 2006

Edward Loper said:
I would like to convert an 8-bit string (i.e., a str) into unicode,
treating chars \x00-\x7f as ASCII and converting any chars \x80-\xff
into backslashed escape sequences. I.e., I want something like this:

u'abc \\xff\\xe8 def'

The best I could come up with was:

    def decode_with_backslashreplace(s):
        "str -> unicode"
        return (s.decode('latin1')
                 .encode('ascii', 'backslashreplace')
                 .decode('ascii'))

Surely there's a better way than converting back and forth 3 times?

I didn't check whether this was faster, although I rather suspect it is
not:

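    # Pass ASCII chars through; render anything else as a two-digit \xNN escape.
    cvt = lambda x: ord(x) < 0x80 and x or '\\x%02x' % ord(x)

    def decode_with_backslashreplace(s):
        # Joining onto u'' yields a unicode result from the pure-ASCII pieces.
        return u''.join(map(cvt, s))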
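
For readers on a newer interpreter than the one this thread was written for: since Python 3.5 the built-in 'backslashreplace' error handler also works when decoding, so the conversion can be a single call. A minimal sketch, assuming Python 3.5 or later (the variable names are only for illustration):

```python
# Assumes Python 3.5+, where 'backslashreplace' is accepted for decoding.
raw = b'abc \xff\xe8 def'

# Bytes outside the ASCII range become literal \xNN escape sequences.
text = raw.decode('ascii', 'backslashreplace')

print(text)  # prints: abc \xff\xe8 def
```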

Kent Johnson · Feb 25, 2006

Edward said:
I would like to convert an 8-bit string (i.e., a str) into unicode,
treating chars \x00-\x7f as ASCII and converting any chars \x80-\xff
into backslashed escape sequences. I.e., I want something like this:

u'abc \\xff\\xe8 def'

Kent
