What is the file.encoding convention?

Naoki INADA

In document <http://docs.python.org/library/stdtypes.html#file.encoding>:

But in logging.StreamHandler.emit() ::

    try:
        if (isinstance(msg, unicode) and
            getattr(stream, 'encoding', None)):
            #fs = fs.decode(stream.encoding)
            try:
                stream.write(fs % msg)
            except UnicodeEncodeError:
                #Printing to terminals sometimes fails. For example,
                #with an encoding of 'cp1251', the above write will
                #work if written to a stream opened or wrapped by
                #the codecs module, but fail when writing to a
                #terminal even when the codepage is set to cp1251.
                #An extra encoding step seems to be needed.
                stream.write((fs % msg).encode(stream.encoding))
        else:
            stream.write(fs % msg)
    except UnicodeError:
        stream.write(fs % msg.encode("UTF-8"))

And behavior of sys.stdout in Windows::

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)


What is the file.encoding convention?
If I want to write a unicode string to a file(-like) object that has an
encoding attribute, should I do
(1) try: file.write(unicode_str),
(2) except UnicodeEncodeError: file.write(unicode_str.encode(file.encoding))
like logging?
It seems agly.
 
Naoki INADA

What is the file.encoding convention?
If I want to write a unicode string to a file(-like) object that has an
encoding attribute, should I do
(1) try: file.write(unicode_str),
(2) except UnicodeEncodeError: file.write(unicode_str.encode(file.encoding))
like logging?
It seems agly.

s/agly/ugly/
 
Vinay Sajip

In document <http://docs.python.org/library/stdtypes.html#file.encoding>:

But in logging.StreamHandler.emit() ::

    try:
        if (isinstance(msg, unicode) and
            getattr(stream, 'encoding', None)):
            #fs = fs.decode(stream.encoding)
            try:
                stream.write(fs % msg)
            except UnicodeEncodeError:
                #Printing to terminals sometimes fails. For example,
                #with an encoding of 'cp1251', the above write will
                #work if written to a stream opened or wrapped by
                #the codecs module, but fail when writing to a
                #terminal even when the codepage is set to cp1251.
                #An extra encoding step seems to be needed.
                stream.write((fs % msg).encode(stream.encoding))
        else:
            stream.write(fs % msg)
    except UnicodeError:
        stream.write(fs % msg.encode("UTF-8"))

And behavior of sys.stdout in Windows::

    >>> import sys

    u'\u3042\u3044\u3046'
    >>> print >>sys.stdout, u
    あいう

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

What is the file.encoding convention?
If I want to write a unicode string to a file(-like) object that has an
encoding attribute, should I do
(1) try: file.write(unicode_str),
(2) except UnicodeEncodeError: file.write(unicode_str.encode(file.encoding))
like logging?
It seems agly.

If you are writing a Unicode string to a stream which has been opened
with e.g. codecs.open with a specific encoding, then the stream is
actually a wrapper. You can write Unicode strings directly to it, and
the wrapper stream will encode the Unicode to bytes using the specific
encoding and write those bytes to the underlying stream.

In your example you didn't show sys.stderr.encoding - you showed
sys.stdout.encoding and printed out something to it which seemed to
give the correct result, but then wrote to sys.stderr which gave a
UnicodeEncodeError. What is the encoding of sys.stderr in your
example?

Also note that logging had to handle what appeared to be an oddity
with terminals - they (at least sometimes) have an encoding attribute
but appear to expect to have bytes written to them, and not Unicode.
Hence the logging kludge, which should not be needed and so has been
carefully commented.
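
The wrapper behaviour described at the start of this reply can be
sketched roughly as follows. This is only an illustration written for
Python 3 so that it runs on a current interpreter; the in-memory
io.BytesIO buffer and the sample Cyrillic string are assumptions made
for the example and are not taken from the original report.

```python
import codecs
import io

# An in-memory byte stream stands in for the underlying file (assumption).
raw = io.BytesIO()

# codecs.getwriter() returns a StreamWriter class for the named encoding;
# calling it wraps the byte stream.
wrapped = codecs.getwriter("cp1251")(raw)

# Sample text (assumption): the Cyrillic phrase "do svidaniya".
text = "\u0434\u043e \u0441\u0432\u0438\u0434\u0430\u043d\u0438\u044f"

# The wrapper encodes the text and writes the resulting bytes to raw.
wrapped.write(text)

written = raw.getvalue()
print(written == text.encode("cp1251"))
```

The caller never calls encode() itself here; the wrapper does it on
every write.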

Regards,

Vinay Sajip
 
Vinay Sajip

Further to my earlier mail, please have a look at the following
screenshot:

http://imgur.com/FtAi0.png

As you can see, the codepage is set to 1251 (Cyrillic) at the
beginning. A Unicode string is initialised with Cyrillic code points.
Then sys.stdout.encoding shows 'cp1251', but writing the string to it
gives a UnicodeEncodeError. Explicitly encoding the string and writing
it works. Next, we get a wrapper from the codecs module for the same
encoding and use it to wrap sys.stdout. Writing the Unicode string to
the wrapped stream works, too.

So the problem is essentially this: if a stream has an encoding
attribute, sometimes it is a wrapped stream which encodes Unicode
(e.g. a stream obtained via the codecs module) and sometimes it is not
(e.g. sys.stdout, sys.stderr).
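
One way to cope with both kinds of stream is the try-then-encode
approach that logging takes. The sketch below is a hedged illustration
in Python 3, not the logging implementation: write_text and
fallback_encoding are hypothetical names, and the streams are in-memory
stand-ins. In Python 3 a byte stream rejects text with TypeError rather
than the UnicodeEncodeError seen in the Python 2 session, so both are
caught.

```python
import codecs
import io


def write_text(stream, text, fallback_encoding="utf-8"):
    """Write text to stream, encoding by hand only if the stream refuses it.

    write_text and fallback_encoding are hypothetical names for this sketch.
    """
    encoding = getattr(stream, "encoding", None) or fallback_encoding
    try:
        # Works when the stream encodes text itself (e.g. a codecs wrapper).
        stream.write(text)
    except (TypeError, UnicodeEncodeError):
        # The stream wants bytes, or failed to encode: do it explicitly.
        stream.write(text.encode(encoding))


text = "\u3042\u3044\u3046"

# A codecs wrapper around a byte buffer: the wrapper does the encoding.
wrapped_raw = io.BytesIO()
write_text(codecs.getwriter("cp932")(wrapped_raw), text)

# A bare byte stream with no encoding attribute: the fallback is used.
plain_raw = io.BytesIO()
write_text(plain_raw, text)
```

The helper does not decide up front which kind of stream it has; it
only reacts when the direct write fails.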

Regards,

Vinay
 
Naoki INADA

What is the encoding of sys.stderr in your example?

Sorry, I missed it. It is cp932.

So the problem is essentially this: if a stream has an encoding
attribute, sometimes it is a wrapped stream which encodes Unicode
(e.g. a stream obtained via the codecs module) and sometimes it is not
(e.g. sys.stdout, sys.stderr).

Yes! I confused by it.

I feel this doc means "a file object with an encoding attribute encodes
unicode regardless of whether it is a tty or not", and that
sys.stdout/stderr defy the convention.

If this doc means "a file object encodes unicode if it isn't a tty", I
should write something like below::

    encoding = getattr(afile, "encoding", None)
    if encoding is None:
        # No usable encoding: fall back to utf8, defaultencoding,
        # preferredencoding, ...
        afile.write(unicode_str.encode(fallback_encoding))
    elif afile.isatty():
        # A tty with an encoding attribute: encode explicitly.
        afile.write(unicode_str.encode(encoding))
    else:
        # Not a tty: the file object encodes unicode itself.
        afile.write(unicode_str)


"Writing unicode to a file(-like)" is a simple requirement.
Does python have any simple resolution for it?
 
Vinay Sajip

s/I confused/I am confused/


s/resolution/solution/

Of course, Python 3 has much better Unicode support:
---------------------------------------------------------------------
C:\Users\Vinay>chcp 1251
Active code page: 1251

C:\Users\Vinay>\python31\python
ActivePython 3.1.0.1 (ActiveState Software Inc.) based on
Python 3.1 (r31:73572, Jun 28 2009, 19:55:39) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
до свидания
>>> ^Z
---------------------------------------------------------------------

Regards,


Vinay Sajip
 
Piet van Oostrum

Naoki INADA said:
NI> "Writing unicode to a file(-like)" is a simple requirement.
NI> Does python have any simple resolution for it?

Yes, Python 3 will do this. For Python < 3.0 you will have to use a
codecs wrapper or explicitly do the encoding.
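
As a small, hedged illustration of the Python 3 behaviour: a text
stream there always encodes what is written to it. The io.BytesIO
buffer and the cp932 encoding below are assumptions picked for the
example.

```python
import io

# In-memory byte buffer standing in for a real file (assumption).
raw = io.BytesIO()

# A Python 3 text stream: str goes in, bytes in the chosen encoding come out.
stream = io.TextIOWrapper(raw, encoding="cp932")
stream.write("\u3042\u3044\u3046")
stream.flush()  # push the encoded bytes down to raw

encoded = raw.getvalue()
print(encoded == "\u3042\u3044\u3046".encode("cp932"))
```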
 
Gabriel Genellina

On Thu, 23 Jul 2009 21:49:26 -0300, Naoki INADA wrote:

I feel this doc means "a file object with an encoding attribute encodes
unicode regardless of whether it is a tty or not", and that
sys.stdout/stderr defy the convention.

The 'encoding' file attribute is very confusing and mostly unsupported (or
there is a bug in the documentation, at least in Python 2.x). See
http://bugs.python.org/issue4947
 
