I'm using code.InteractiveConsole but it doesn't work correctly
with non-ASCII characters. I think it boils down to this problem:
Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
This is utterly useless for diagnostic purposes. What you see is NOT
what you've got. Use repr().
What you've got, as the error message says, is u'\x84', which is not
u"\N{LATIN SMALL LETTER A WITH DIAERESIS}"; it is a control character.
See below.
ä
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<string>", line 1, in ?
File "c:\python24\lib\encodings\cp850.py", line 18, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\x84' in
Why does the exec call fail, and is there a workaround?
Executive summary:
The exec statement didn't fail; it was the print statement trying to
print, to your CP850 console, a Unicode character that doesn't exist in
CP850. This happened because you copied a character whose repr() is
'\x84' from your MS-DOS console and pasted it into 'u"<insert any old
rubbish here>"'.
Details:
Windows XP, in a console screen:
Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
|>> uc = u"\N{LATIN SMALL LETTER A WITH DIAERESIS}"
|>> uc
u'\xe4' <<== agrees with Unicode book
|>> encoded = uc.encode('cp850')
|>> encoded
'\x84' <<== agrees with
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP850.TXT
|>> print uc
ä <<== looks like LATIN SMALL LETTER A WITH DIAERESIS, as expected
|>> print encoded
ä <<== looks like LATIN SMALL LETTER A WITH DIAERESIS, as expected
|>> print u"\x84"
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "c:\python24\lib\encodings\cp850.py", line 18, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\x84' in
position 0: character maps to <undefined>
<<== as expected
Looks like Python is working fine to me ...
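
For anyone replaying this on a current interpreter, here is a rough
Python 3 sketch of the same checks. Python 3 is an assumption here (the
session above is Python 2.4), and the variable names are made up for
illustration. It only relies on the standard cp850 codec and the
unicodedata module.

```python
import unicodedata

a_umlaut = "\N{LATIN SMALL LETTER A WITH DIAERESIS}"  # U+00E4
stray = "\x84"                                         # U+0084

# U+00E4 has a slot in CP850: it encodes to the single byte 0x84.
encoded = a_umlaut.encode("cp850")

# U+0084 is a control character as far as Unicode is concerned.
category = unicodedata.category(stray)

# CP850 has no slot for U+0084, so encoding it fails, just as the
# print statement did in the traceback above.
try:
    stray.encode("cp850")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# ascii() plays the role that repr() played in Python 2.
print(ascii(a_umlaut), encoded, category, encode_failed)
```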
So, what's happening? Look at this:
|>> char1 = u"ä" <<= corresponds to your "print"
|>> char2 = "ä" <<= corresponds to your exec -- which was given a STRING
constant, like this, not a Unicode constant.
Character in char1 was copied from DOS console.
Second line was obtained by DOS console editing of copy of first line.
|>> char1
u'\xe4'
|>> char2
'\x84' <<= Aha!
What you have done is effectively: exec 'print u"\x84"'
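
A small sketch of why the same byte ends up as two different
characters, again assuming Python 3 and illustrative names. Reading the
console's byte as CP850 gives the character that was meant; mapping the
byte straight to the code point with the same number (which is what
Latin-1 decoding does) reproduces the u'\x84' seen above.

```python
raw = b"\x84"  # the byte the CP850 console uses for a-with-diaeresis

# Interpreted as CP850, the byte is the character that was intended.
as_cp850 = raw.decode("cp850")

# Interpreted byte-for-code-point, it becomes U+0084 instead.
as_latin1 = raw.decode("latin-1")

print(ascii(as_cp850), ascii(as_latin1))
```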
Workaround/kludge/bypass:
exec u'print u"ä"'
.....^
Much better: embed non-ASCII characters in source code *ONLY* when you
have a proper coding header:
http://www.python.org/dev/peps/pep-0263/
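
To illustrate what the coding header buys you, here is a hedged
Python 3 sketch (not the original poster's code; the source text and
the name "text" are invented for the demo). It hands compile() a chunk
of source as raw bytes with a PEP 263 header declaring cp850, so the
byte 0x84 inside the string literal is read as the intended character.

```python
# Source code as bytes, the way it would sit in a file saved as CP850.
# The first line is the PEP 263 coding header.
source = b'# -*- coding: cp850 -*-\ntext = "\x84"\n'

namespace = {}
# compile() accepts bytes and honours the coding header when decoding.
exec(compile(source, "<demo>", "exec"), namespace)

text = namespace["text"]
print(ascii(text))
```

The same header as the first or second line of a real source file tells
the interpreter how to decode the non-ASCII bytes in that file.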
HTH,
John