codecs latin1 unicode standard output file


Marko Faldix

Hello,

with Python 2.3 I can write umlauts (a,o,u umlaut) to a file with this piece
of code:

import codecs

f = codecs.open("klotentest.txt", "w", "latin-1")
print >>f, unicode("My umlauts are ä, ö, ü", "latin-1")


This works fine, but it is not exactly what I want. I would like to
write this to standard output, so that I can use the same code to print
lines on the console or to redirect them into a file. This was possible before
Python 2.3. Is it no longer possible with the same code?


--
Marko Faldix
M+R Infosysteme
Hubert-Wienen-Str. 24 52070 Aachen
Tel.: 0241-93878-16 Fax.:0241-875095
E-Mail: markopointfaldix@mplusrpointde
 

Michael Hudson

Marko Faldix said:
Hello,

with Python 2.3 I can write umlauts (a,o,u umlaut) to a file with this piece
of code:

import codecs

f = codecs.open("klotentest.txt", "w", "latin-1")
print >>f, unicode("My umlauts are ä, ö, ü", "latin-1")


This works fine, but it is not exactly what I want. I would like to
write this to standard output, so that I can use the same code to print
lines on the console or to redirect them into a file. This was possible before
Python 2.3. Is it no longer possible with the same code?

If your locale is set up in an appropriate way, you should be able
to print latin-1 characters to stdout without any intervention at all.

If that doesn't work, we need more details.

Cheers,
mwh
 

Marko Faldix

Hi,

Michael Hudson said:
If your locale is set up in an appropriate way, you should be able
to print latin-1 characters to stdout without any intervention at all.

If that doesn't work, we need more details.

Cheers,
mwh


I try to describe. It's a Windows machine with Python 2.3.2 installed. Using
command line (cmd). Put these lines of code in a file called klotentest1.py:

# -*- coding: iso-8859-1 -*-

print unicode("My umlauts are ä, ö, ü", "latin-1")
print "My umlauts are ä, ö, ü"

Calling this on command line:

klotentest1.py

Indeed, result of first print is as desired, result of second print delivers
strange letters but no error.
Now I call this on command line:

klotentest1.py > klotentest1.txt

This fails:
Traceback (most recent call last):
  File "C:\home\marko\moeller_port\moeller_port_exec_svn\klotentest1.py", line 3, in ?
    print unicode("My umlauts are õ, ÷, ³", "latin-1")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)


( By the way: error result is same if I call it this way: python
klotentest1.py > klotentest1.txt )

In my view, Python shouldn't behave differently depending on whether the
output is redirected to a file or not.

Marko Faldix
 

Fredrik Lundh

Marko said:
I try to describe. It's a Windows machine with Python 2.3.2 installed. Using
command line (cmd). Put these lines of code in a file called klotentest1.py:

# -*- coding: iso-8859-1 -*-

print unicode("My umlauts are ä, ö, ü", "latin-1")
print "My umlauts are ä, ö, ü"

Calling this on command line:

klotentest1.py

Indeed, result of first print is as desired, result of second print delivers
strange letters but no error.

your console device doesn't use iso-8859-1; it probably uses cp850.
if you print an 8-bit string to the console, Python assumes that you
know what you're doing...

Marko said:
Now I call this on command line:

klotentest1.py > klotentest1.txt

This fails:
Traceback (most recent call last):
  File "C:\home\marko\moeller_port\moeller_port_exec_svn\klotentest1.py", line 3, in ?
    print unicode("My umlauts are õ, ÷, ³", "latin-1")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)

In my view, Python shouldn't behave differently depending on whether the
output is redirected to a file or not.

when you print to a console with a known encoding, Python 2.3 auto-
magically converts Unicode strings to 8-bit strings using the console
encoding.

files don't have an encoding, which is why the second case fails.

also note that in 2.2 and earlier, your example always failed.
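
To see which of the two cases applies, a script can inspect its own standard output. The following is only a minimal sketch: the helper name describe_stdout is made up for illustration, and it relies on the isatty() method and the encoding attribute of file objects (the latter is new in 2.3). On 2.3 the encoding should be None when output is redirected to a file; later Python versions may report a locale-based encoding instead.

```python
import sys

def describe_stdout(stream=sys.stdout):
    """Return (is_console, encoding) for the given stream.

    describe_stdout is a hypothetical helper, not part of Python.
    """
    # isatty() is true only when the stream is attached to a console.
    is_console = hasattr(stream, "isatty") and stream.isatty()
    # The encoding attribute may be missing or None for plain files.
    encoding = getattr(stream, "encoding", None)
    return is_console, encoding

is_console, encoding = describe_stdout()
# Report on stderr so the result is visible even when stdout is redirected.
sys.stderr.write("console: %r, encoding: %r\n" % (is_console, encoding))
```

Running the script once directly and once with "> out.txt" should show the difference between the two situations.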

</F>
 

Marko Faldix

Hi,

Fredrik Lundh said:
your console device doesn't use iso-8859-1; it probably uses cp850.
if you print an 8-bit string to the console, Python assumes that you
know what you're doing...


when you print to a console with a known encoding, Python 2.3 auto-
magically converts Unicode strings to 8-bit strings using the console
encoding.

files don't have an encoding, which is why the second case fails.

also note that in 2.2 and earlier, your example always failed.

</F>

So I just have to use only this:

print "My umlauts are ä, ö, ü"

without any encoding-assignment to use for standard output on console AND
redirecting to file. In latter case, it looks nice with e.g. notepad, just
strange on console, so settings for console are to adjust and not python
code. Right?


Marko Faldix
 

Martin v. Löwis

Marko Faldix said:
print "My umlauts are ä, ö, ü"

without any encoding-assignment to use for standard output on console AND
redirecting to file. In latter case, it looks nice with e.g. notepad, just
strange on console, so settings for console are to adjust and not python
code. Right?

Wrong. On your operating system, notepad.exe and the console use
*different* encodings. If you think this is stupid, please complain to
Microsoft. If you print byte strings, it will come out wrong either in
the terminal, or in notepad - there is *no way* to have the same byte
string show correctly in both encodings.
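
A small sketch of why this is so, assuming the console uses cp850 (as suggested earlier in the thread) and the editor uses latin-1 (Windows' cp1252 agrees with latin-1 for these three characters). The variable names are for illustration only, and the b"" literals assume a Python version newer than 2.3.

```python
text = u"\xe4\xf6\xfc"  # a-umlaut, o-umlaut, u-umlaut

as_latin1 = text.encode("latin-1")  # bytes a latin-1 editor expects
as_cp850 = text.encode("cp850")     # bytes a cp850 console expects

# The two byte strings differ, so no single byte string suits both.
assert as_latin1 == b"\xe4\xf6\xfc"
assert as_cp850 == b"\x84\x94\x81"

# Latin-1 bytes interpreted by a cp850 console become other characters.
misread = as_latin1.decode("cp850")
assert misread == u"\xf5\xf7\xb3"  # o-tilde, division sign, superscript three
```

The misread characters are the same ones that appear in the traceback quoted earlier, where the console displayed the latin-1 source line using its own code page.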

If you want to output to a file, you should open the file in
locale.getpreferredencoding(). If you want to output to a terminal,
Python should automatically find out what the terminal's encoding is
(to make things worse, the user can override the terminal encoding
on Windows, on a per-terminal basis, using chcp.exe).
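
For the file case, a hedged sketch of what that could look like. The temporary directory and the errors="replace" argument are assumptions of this example, not part of the advice above; the latter only keeps the script from failing when the preferred encoding cannot represent the umlauts (for instance plain ASCII in a "C" locale).

```python
import codecs
import locale
import os
import tempfile

# Encoding the user's locale prefers for text files.
encoding = locale.getpreferredencoding()

# Write into a scratch directory so the example leaves no stray files around.
path = os.path.join(tempfile.mkdtemp(), "klotentest.txt")

f = codecs.open(path, "w", encoding, errors="replace")
try:
    # Unicode in, bytes in the chosen encoding out.
    f.write(u"My umlauts are \xe4, \xf6, \xfc\n")
finally:
    f.close()
```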

Regards,
Martin
 

Bengt Richter

Marko said:
I try to describe. It's a Windows machine with Python 2.3.2 installed. Using
command line (cmd). Put these lines of code in a file called klotentest1.py:
                                                 ^^^^[1]

# -*- coding: iso-8859-1 -*-
              ^^^^^^^^^^[2]

print unicode("My umlauts are ä, ö, ü", "latin-1")
print "My umlauts are ä, ö, ü"
      ^^^^^^^^^^^^^^^^^^^^^^^^[3]
[...]
Calling this on command line:

klotentest1.py

Indeed, result of first print is as desired, result of second print delivers
strange letters but no error.

Fredrik Lundh said:
your console device doesn't use iso-8859-1; it probably uses cp850.
if you print an 8-bit string to the console, Python assumes that you
know what you're doing...

I think the OP is suggesting that given [1] & [2], [3] should implicitly carry the [2] info
and be converted for output just like the result of unicode(...) is.

(I know that's not the way it works now, and I know it's not an easy problem ;-)

Fredrik Lundh said:
when you print to a console with a known encoding, Python 2.3 auto-
magically converts Unicode strings to 8-bit strings using the console
encoding.

files don't have an encoding, which is why the second case fails.

I think the OP is thinking files [1] with # -*- coding: iso-8859-1 -*- [2]
_do_ have an encoding, so in some way [3] should be an unambiguous character sequence,
not just a byte sequence (I have to get back to a previous thread with Martin, where
I owe a reply. This same issue is key there). (I realize that's not the way it works now,
and that it's a hard problem, to repeat myself ;-)

Regards,
Bengt Richter
 

Serge Orlov

Marko Faldix said:
So I just have to use only this:

print "My umlauts are ä, ö, ü"

without any encoding-assignment to use for standard output on console AND
redirecting to file. In latter case, it looks nice with e.g. notepad, just
strange on console, so settings for console are to adjust and not python
code. Right?

No, the right code is
=============================
# -*- coding: iso-8859-1 -*-
import locale, codecs, sys

if not sys.stdout.isatty():
    # item 3 of the codecs.lookup() result is the StreamWriter class
    sys.stdout = codecs.lookup(locale.getpreferredencoding())[3](sys.stdout)

print u"My umlauts are ä, ö, ü"
=============================
The difference between console and file output is that while
there's only one way to output ä on cp850 console, there
are many ways to output the same character to file (latin-1,
utf-8, utf-7, utf-16le, utf-16be, cp850 and maybe more).
So python refuses to guess.
Another rule to follow is to store non-ASCII characters in
unicode strings. Otherwise you will either have to track
the encodings yourself or assume that all 8-bit strings
in your program have the same encoding. That's not
a good idea. I'm not sure if you will have proper .upper()
and .lower() methods on 8-bit strings. (don't have python
here to check)
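
A quick sketch of that last point, for anyone who wants to check. The names are illustrative. On an 8-bit string the result of .upper() depends on the Python version and, in Python 2, on the active locale, so the sketch only asserts the unicode case.

```python
uni = u"\xe4"                # a-umlaut as a unicode string
raw = uni.encode("latin-1")  # the same character as an 8-bit string

# Unicode strings know their characters, so case conversion works.
upper_uni = uni.upper()
assert upper_uni == u"\xc4"  # A-umlaut

# An 8-bit string is just bytes: without locale help, and always in
# later Python versions, the byte is left unchanged.
upper_raw = raw.upper()
```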

-- Serge.
 

Martin v. Löwis

Bengt said:
I think the OP is thinking files [1] with # -*- coding: iso-8859-1 -*- [2]
_do_ have an encoding, so in some way [3] should be an unambiguous character sequence,
not just a byte sequence

The OP could easily overcome this aspect of the problem with a Unicode
literal (and in fact, he originally did convert the string literal to
a Unicode object before further processing).

This does not solve the problem, though: Writing the Unicode object to
a file still gives an encoding error, since he did not specify the
encoding of the file.
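
The failure can be reproduced without any file at all, which may make it clearer. This is a minimal sketch, assuming the implicit default encoding is ASCII as in the traceback above; the variable names are for illustration, and the b"" literal assumes a Python version newer than 2.3.

```python
text = u"My umlauts are \xe4, \xf6, \xfc"

# Without an explicit encoding, the conversion falls back to ASCII and fails.
try:
    text.encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

# With an explicit encoding the same Unicode object converts cleanly.
data = text.encode("latin-1")
```

The a-umlaut sits at index 15 of the string, which matches the "position 15" reported in the original error message.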

Regards,
Martin
 
