is this a unicode/string bug?

olsongt

I was going to submit to sourceforge, but my unicode skills are weak.
I was trying to strip characters from a string that contained values
outside of ASCII. I thought I could just encode as 'ascii' in 'replace'
mode but it threw an error. Strangely enough, if I decode via the
ascii codec and then encode via the ascii codec, I get what I want.
That being said, this may be operating correctly.
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 3: ordinal not in range(128)
 
Fredrik Lundh

> I was going to submit to sourceforge, but my unicode skills are weak.
> I was trying to strip characters from a string that contained values
> outside of ASCII. I thought I could just encode as 'ascii' in 'replace'
> mode but it threw an error. Strangely enough, if I decode via the
> ascii codec and then encode via the ascii codec, I get what I want.
> That being said, this may be operating correctly.

encode on 8-bit strings and decode on unicode strings aren't exactly
obvious operations...

encode("ascii") is a unicode operation, so when you do this, Python first
attempts to turn your string into a unicode string, using the default
encoding. that operation fails:

Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 3: ordinal not in range(128)
u'aaa\ufffd'

this converts the encoded stream to Unicode, using a "suitable replacement
character" for characters that cannot be converted. U+FFFD is 'REPLACEMENT
CHARACTER', which, I assume, is about as suitable as you can get.
'aaa?'

this converts the unicode string from the previous step back to ascii, using
a "suitable replacement character" for characters that cannot be converted.
for 8-bit strings, "?" is a suitable character.
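
The three steps described above can be sketched as follows. This is only an illustration written in Python 3 syntax, where the 8-bit string is a bytes object and each step has to be spelled out explicitly; the sample value `b"aaa\xae"` is an assumption inferred from the traceback (byte 0xae at position 3), and the variable names are hypothetical.

```python
# Assumed sample input: three ASCII letters followed by the byte 0xae.
raw = b"aaa\xae"

# Step 1: a strict ASCII decode is roughly what the implicit conversion
# attempts, and it fails on the first byte outside range(128).
try:
    raw.decode("ascii")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# Step 2: decoding with 'replace' substitutes U+FFFD for each bad byte.
as_text = raw.decode("ascii", "replace")

# Step 3: encoding back with 'replace' turns U+FFFD into "?",
# because U+FFFD itself is not an ASCII character.
as_ascii = as_text.encode("ascii", "replace")

print(strict_error)
print(repr(as_text))
print(as_ascii)
```

The point of the sketch is that the 'replace' handler is applied twice, once in each direction, which is why the round trip produces the result the direct encode call could not.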

instead of playing codec games, you could use translate or a simple regular
expression:

outstring = re.sub("[\x80-\xff]", "?", instring)
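
A rough sketch of both suggestions, again in Python 3 syntax and operating on bytes. `instring` and `outstring` follow the line above; the sample value, the translation table, and the other names are assumptions made for illustration. The last variant deletes the non-ASCII bytes instead of replacing them, which may be closer to "stripping".

```python
import re

# Assumed sample input, matching the byte reported in the traceback.
instring = b"aaa\xae"

# Regular expression: replace every byte in the range 0x80-0xff with "?".
outstring = re.sub(b"[\x80-\xff]", b"?", instring)

# translate: a 256-entry table that keeps bytes 0-127 unchanged
# and maps every byte from 128 to 255 to "?".
table = bytes(range(128)) + b"?" * 128
via_translate = instring.translate(table)

# translate with no table and a delete argument drops the bytes entirely.
stripped = instring.translate(None, bytes(range(128, 256)))

print(outstring, via_translate, stripped)
```

Neither approach goes through a codec, so no implicit decode can fail along the way.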

</F>
 
