Encoding problems

Gandalf

Hi All!

I have a program that looks like this:

# -*- coding: iso-8859-2 -*-
s1 = 'néz'
s2 = raw_input('Please type in "néz":')
print repr(s1)
print repr(s2)

I always type the same text ('néz') at the prompt.

On Windows, this is the result:

C:\Temp>test.py
Please type in "nÚz":néz
'n\xe9z'
'n\x82z'

C:\Temp>

On FreeBSD, this is the result:

%python ./test.py
Please type in "néz":néz
'n\xe9z'
'n\xe9z'
%

Apparently the encoding of the Python source file and the encoding used
by the win32 console are different.
I need to write a console-mode program that processes input from the
console, on both UNIX and Windows.
Declaring the encoding of the file does not help: the string returned by
raw_input is still in the console's encoding. Is there a way to specify
an encoding for raw_input? Of course I could convert the input
explicitly, but the conversion depends on the platform. Somehow Python
should know the encoding of the console.
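
To make the mismatch concrete, here is a minimal sketch of what the two repr() outputs mean. It assumes the Windows console uses code page 852 (the value reported later in this thread). It uses b'' and u'' literals so that it runs on Python 2.6+ and Python 3, not on the 2.3-era interpreter discussed here.

```python
# -*- coding: utf-8 -*-
e_acute = u'\xe9'   # the character é

# The same character is stored as a different byte in each encoding.
in_source = e_acute.encode('iso-8859-2')   # encoding declared in the script
in_console = e_acute.encode('cp852')       # assumed console code page

# The prompt is written as iso-8859-2 bytes but displayed as cp852,
# which is why the console shows "nÚz" instead of "néz".
prompt_as_displayed = u'n\xe9z'.encode('iso-8859-2').decode('cp852')

print(repr(in_source))
print(repr(in_console))
```

So the file and the console agree on the text, but not on the bytes used to store it.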

Comments are more than welcome.

Laci 2.0
 
Michel Claveau - abstraction méta-galactique non t

Hi !


On Windows XP / Windows 2000, you can solve the problem as follows:
- change the console font to "Lucida Console"
- change your script to:

# -*- coding: cp1252 -*-

import os
# Switch the console to code page 1252 so that it matches the source encoding.
ecran=os.popen('MODE CON: CP SELECT=1252').readlines()

s1 = 'néz'
s2 = raw_input('Please type in "néz":')
print repr(s1)
print repr(s2)



Note that you can instead force the console to code page 1252 beforehand
and delete the two lines (import os and the os.popen call) from the
script. Personally, I do it via an icon.
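
A small check of why code page 1252 works for this particular string, and where it may stop working. This is only a sketch and the variable names are made up for illustration. 'néz' has the same bytes in cp1252 and iso-8859-2, but a Hungarian letter such as ő exists only in iso-8859-2.

```python
# -*- coding: utf-8 -*-
text = u'n\xe9z'   # 'néz'

# é is byte 0xE9 in both encodings, so the test string survives the switch.
same_bytes = text.encode('cp1252') == text.encode('iso-8859-2')

# ő (U+0151) is part of iso-8859-2 but has no slot in cp1252.
o_double_acute = u'\u0151'
try:
    o_double_acute.encode('cp1252')
    fits_cp1252 = True
except UnicodeEncodeError:
    fits_cp1252 = False

print(same_bytes)
print(fits_cp1252)
```

So if the program has to handle the full iso-8859-2 repertoire, switching the console to 1252 may not be enough.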



*sorry for my bad english*



@-salutations
 
Gandalf

Michel Claveau - abstraction méta-galactique non triviale en fuite perpétuelle. said:
On Windows XP / Windows 2000, you can solve the problem as follows:
- change the console font to "Lucida Console"
- change your script to:

# -*- coding: cp1252 -*-

import os
ecran=os.popen('MODE CON: CP SELECT=1252').readlines()

s1 = 'néz'
s2 = raw_input('Please type in "néz":')
print repr(s1)
print repr(s2)

Okay, I understand now.
This is a fault of the win32 console - it defaults to a different
encoding than other parts of the Windows system.
This is messy but we cannot do anything about it. :-(

Michel Claveau said:
*sorry for my bad english*

Not bad at all.
Thanks for your help.

Laci 2.0
 
Martin v. Löwis

Gandalf said:
This is a fault of the win32 console - it defaults to a different
encoding than other parts of the Windows system.
This is messy but we cannot do anything about it. :-(

It's better than you think. Python, starting with 2.3, will do the
right thing for

# -*- coding: cp1252 -*-
print u"néz"

It determines that this is a Windows console, determines its encoding,
and converts the Unicode string to that encoding. Of course, this
requires the string to be a Unicode literal. So you would expect that

bildschirm = raw_input(u"néz")

works, but unfortunately, it doesn't, as raw_input does not support
Unicode. However, the encoding Python has determined is available
as sys.stdout.encoding, so you can do

import sys
bildschirm = raw_input(u"néz".encode(sys.stdout.encoding))

This works even if the user has run chcp in the window, because Python
queries the window for its encoding during startup.
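
The same idea can be applied in the other direction, to the bytes that raw_input returns. The following is only a sketch: console_to_source is a hypothetical helper, not part of Python, and it assumes that sys.stdin.encoding reports the console code page in the same way as sys.stdout.encoding. It is written with b'' literals so that it also runs on Python 3, where raw_input no longer exists.

```python
import sys

def console_to_source(raw, console_encoding=None, source_encoding='iso-8859-2'):
    """Re-encode bytes read from the console into the source file's encoding.

    Hypothetical helper for illustration only.
    """
    if console_encoding is None:
        # sys.stdin.encoding can be None (e.g. redirected input);
        # the ASCII fallback is an assumption made for this sketch.
        console_encoding = getattr(sys.stdin, 'encoding', None) or 'ascii'
    return raw.decode(console_encoding).encode(source_encoding)

# On the cp852 console from this thread, the input arrived as 'n\x82z'.
s2 = console_to_source(b'n\x82z', console_encoding='cp852')
print(repr(s2))
```

In a Python 2 script the first argument would be the return value of raw_input(...), and console_encoding could be left at its default.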

HTH,
Martin
 
Gandalf

Martin v. Löwis said:
works, but unfortunately, it doesn't, as raw_input does not support
Unicode. However, the encoding Python has determined is available
as sys.stdout.encoding, so you can do

bildschirm = raw_input(u"néz".encode(sys.stdout.encoding))

This works even if the user has done chcp in the window, as Python
queries the window what its encoding is, during Python startup.

'cp852'

This is very strange!
I understand that we need to encode the output, because strings
in the program files need to be converted to the terminal's encoding
before printing.
However, the input (the result of raw_input) will be in the correct
encoding (iso-8859-2 in my case) without any conversion.
I do not understand why that is. The stdin encoding is cp852, not
iso-8859-2.
 
Martin v. Löwis

Gandalf said:
However, the input (the result of raw_input) will be in the correct
encoding (iso-8859-2 in my case) without any conversion.
I do not understand why that is. The stdin encoding is cp852, not
iso-8859-2.

Why do you say that the input will be in the correct encoding? In
your original message, you said that you got this:

'n\xe9z'
'n\x82z'

where the first string was repr(s1) (i.e. in the source encoding,
iso-8859-2). The second string (repr(s2)) is the one that you got
from raw_input, so it is *not* in iso-8859-2. Why do you say it
is?
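
One way to check this is to decode both byte strings and compare the resulting text. The sketch below assumes the console code page is cp852, as reported earlier in the thread, and uses b'' and u'' literals so that it runs on Python 2.6+ and Python 3.

```python
s1 = b'n\xe9z'   # repr(s1) from the Windows run: source encoding, iso-8859-2
s2 = b'n\x82z'   # repr(s2) from the Windows run: what raw_input returned

expected = u'n\xe9z'   # the text 'néz'

# Interpreting s2 as iso-8859-2 does not give 'néz' ...
s2_as_latin2 = s2.decode('iso-8859-2')
# ... while interpreting it as cp852 (the console code page) does.
s2_as_cp852 = s2.decode('cp852')

print(s2_as_latin2 == expected)   # False
print(s2_as_cp852 == expected)    # True
```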

Regards,
Martin
 
Michel Claveau - abstraction méta-galactique non t

Good evening!


The code page for iso-8859-2 should be 912 (according to Google):

iso-8859-2 <==> cp912


@-salutations
 
