Unicode string in exec()

Shrii

1. I read a Unicode file using the codecs module.
2. I want to pass that string to the exec statement.
3. But one of the characters (U+0950) in that string is not displayed
properly in the output produced by that exec statement.

Could anyone help me get the proper output?


somesh
 
Jeff Epler

First off, I just have to correct your terminology. "exec" is a
statement, and doesn't require parentheses, so talking about "exec()"
invites confusion.

I'll answer your question in terms of eval(), which takes a string
representing a Python expression, interprets it, and returns the result.

In Python 2.3, the following works right: eval(u"u'\u0190'") returns u'\u0190'.
Here, the string passed to eval() contains the literal LATIN CAPITAL
LETTER OPEN E, and the expected Unicode string is returned.

The following behaves "surprisingly": eval(u"'\u0190'") returns '\xc6\x90'.
... you seem to get the UTF-8 encoding of the Unicode character.
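
As a quick check of that last point, the sketch below confirms that the two bytes are the UTF-8 encoding of U+0190. It is meant for a current Python 3 interpreter rather than 2.3, and the variable names are only for illustration.

```python
# U+0190 is LATIN CAPITAL LETTER OPEN E.
ch = u'\u0190'

# Encoding the one-character Unicode string as UTF-8 yields two bytes.
encoded = ch.encode('utf-8')
print(repr(encoded))

# Decoding those two bytes as UTF-8 recovers the original character.
decoded = b'\xc6\x90'.decode('utf-8')
print(decoded == ch)
```

So a byte string holding '\xc6\x90' still carries the character; it has to be decoded as UTF-8 before it is treated as text.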

This is related to PEP 263 (http://www.python.org/peps/pep-0263.html),
but the behavior of compile(), eval() and exec doesn't seem to be spelled
out there.

Jeff

 
John Roth

To expand on Jeff's reply:

In the first example, he's passing a Unicode string to eval(). That
string contains a Unicode string literal (note the u prefix) that holds
the character U+0190. The result is a Unicode string containing a
single Unicode character.

In the second example, he's again passing a Unicode string to eval(),
but this time it contains a ***normal*** string literal that holds the
same character. That character comes out as two characters, the bytes
of its UTF-8 encoding. The result is a ***normal*** string that
contains two characters.
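
For comparison, here is a minimal sketch of the original scenario (read a UTF-8 file, then run its contents with exec) on a current Python 3 interpreter, where exec() is a function and every string literal is Unicode. The temporary file, its contents, and the names script, source and namespace are assumptions made up for illustration; the thread does not show the poster's actual code.

```python
import codecs
import os
import tempfile

# Hypothetical file contents standing in for the poster's Unicode file.
# The assignment holds the literal character U+0950 (DEVANAGARI OM).
script = u"text = u'\u0950'\n"

fd, path = tempfile.mkstemp(suffix='.py')
os.close(fd)
try:
    # Write the file as UTF-8 bytes.
    with open(path, 'wb') as f:
        f.write(codecs.encode(script, 'utf-8'))

    # Step 1 of the question: read the file and decode it to a Unicode string.
    with open(path, 'rb') as f:
        source = codecs.decode(f.read(), 'utf-8')
finally:
    os.remove(path)

# Step 2: run the decoded source in its own namespace.
namespace = {}
exec(source, namespace)

# Step 3: the character should survive as a single code point.
result = namespace['text']
print(repr(result))
print(len(result))
```

Under Python 2, per the explanation above, the literal inside the executed source would need the u prefix to come back as one Unicode character instead of its UTF-8 bytes.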

Is this your problem?

John Roth
 
