Encoding problems

Gandalf · Sep 2, 2004

Hi All!

I have a program that looks like this:

# -*- coding: iso-8859-2 -*-
s1 = 'néz'
s2 = raw_input('Please type in "néz":')
print repr(s1)
print repr(s2)

I always type in the same ('néz') on the input.

On Windows, this is the result:

C:\Temp>test.py
Please type in "nÚz":néz
'n\xe9z'
'n\x82z'

C:\Temp>

On FreeBSD, this is the result:

%python ./test.py
Please type in "néz":néz
'n\xe9z'
'n\xe9z'
%

Apparently, the encoding of the Python file and the encoding used by the
win32 console are different.
I need to write a console-mode program that processes input from the
console, on both UNIX and Windows.
Declaring the encoding of the source file does not help: the raw input will
still be wrong. Is there a way to specify an encoding for raw_input?
Of course, I could convert the input explicitly, but the right conversion
depends on the platform. Somehow Python should know the encoding of the
console.

Comments are more than welcome.

Laci 2.0
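
An editorial aside, not part of the original post: the two repr values differ because the same character, é, has a different byte value in each encoding. The following is a minimal sketch, assuming the Windows console in question uses code page 852 (the value reported further down in this thread). It uses explicit escapes so that it does not depend on how the snippet itself is saved.

```python
# -*- coding: utf-8 -*-
# Illustration only. Both codec names are standard Python codecs; that the
# console uses cp852 is an assumption taken from later in the thread.

E_ACUTE = u"\u00e9"  # the character 'é'

# Same character, two different byte values.
latin2_bytes = E_ACUTE.encode("iso-8859-2")
cp852_bytes = E_ACUTE.encode("cp852")

# Bytes written as iso-8859-2 but displayed by a cp852 console show the
# wrong glyph, which would explain the prompt appearing as "nÚz".
misread_prompt = u"n\u00e9z".encode("iso-8859-2").decode("cp852")

print(repr(latin2_bytes))
print(repr(cp852_bytes))
print(misread_prompt == u"n\u00daz")
```

So the byte 0xE9 in the source file and the byte 0x82 typed at the console can both stand for é; neither value is wrong on its own, they are just in different encodings.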

Michel Claveau - abstraction méta-galactique non t · Sep 2, 2004

Hi !

On Windows XP / Windows 2000, you can solve the problem as follows:
- change the font of the console to "Lucida Console"
- change your script to:

# -*- coding: cp1252 -*-

import os
ecran=os.popen('MODE CON: CP SELECT=1252').readlines()

s1 = 'néz'
s2 = raw_input('Please type in "néz":')
print repr(s1)
print repr(s2)

Note that you can instead force the console to code page 1252 yourself
(MODE CON: CP SELECT=1252) and delete those two lines of the script.
Personally, I do it via a shortcut icon.

*sorry for my bad english*

@-salutations

Gandalf · Sep 2, 2004

Michel Claveau - abstraction méta-galactique non triviale en fuite perpétuelle. said:
Hi !

On Windows XP / Windows 2000, you can solve the problem as follows:
- change the font of the console to "Lucida Console"
- change your script to:

# -*- coding: cp1252 -*-

import os
ecran=os.popen('MODE CON: CP SELECT=1252').readlines()

s1 = 'néz'
s2 = raw_input('Please type in "néz":')
print repr(s1)
print repr(s2)

Okay, I understand now.
This is a fault of the win32 console - it defaults to a different
encoding than other parts of the Windows system.
This is messy but we cannot do anything about it. :-(

*sorry for my bad english*

Not bad at all.
Thanks for your help.

Laci 2.0

Martin v. Löwis · Sep 2, 2004

Gandalf said:
This is a fault of the win32 console - it defaults to a different
encoding than other parts of the Windows system.
This is messy but we cannot do anything about it. :-(

It's better than you think. Python, starting with 2.3, will do the
right thing for

# -*- coding: cp1252 -*-
print u"néz"

It determines that this is a Windows console, determines its encoding,
and converts the Unicode string to that encoding. Of course, this
requires the string to be a Unicode literal. So you'd expect that

bildschirm = raw_input(u"néz")

works, but unfortunately, it doesn't, as raw_input does not support
Unicode. However, the encoding Python has determined is available
as sys.stdout.encoding, so you can do

import sys
bildschirm = raw_input(u"néz".encode(sys.stdout.encoding))

This works even if the user has run chcp in the window, because Python
queries the window for its encoding during startup.

HTH,
Martin
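
An editorial sketch of the approach described above, not code from the post: encode the Unicode prompt with the encoding Python reports for the console before handing it to raw_input. The helper names `encode_for_console` and `decode_from_console` are hypothetical, as is the 'ascii' fallback for when no encoding is reported (for example when output is redirected). Decoding the typed input with `sys.stdin.encoding` is an assumption added here; the post itself only covers the prompt.

```python
# -*- coding: utf-8 -*-
import sys


def encode_for_console(text, encoding=None):
    """Encode a Unicode prompt into bytes for the console.

    `encoding` defaults to sys.stdout.encoding; 'ascii' is a hypothetical
    fallback for when stdout reports no encoding.
    """
    if encoding is None:
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    # 'replace' avoids an exception for characters the console cannot show.
    return text.encode(encoding, "replace")


def decode_from_console(raw, encoding=None):
    """Decode bytes read from the console into a Unicode string."""
    if encoding is None:
        encoding = getattr(sys.stdin, "encoding", None) or "ascii"
    return raw.decode(encoding, "replace")
```

On the Python 2 versions discussed in this thread, these might be combined as `s2 = decode_from_console(raw_input(encode_for_console(u'Please type in "néz":')))`, after which s2 is a Unicode string that no longer depends on the console's code page.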

Gandalf · Sep 3, 2004

works, but unfortunately, it doesn't, as raw_input does not support
Unicode. However, the encoding Python has determined is available
as sys.stdout.encoding, so you can do

bildschirm = raw_input(u"néz".encode(sys.stdout.encoding))

This works even if the user has done chcp in the window, as Python
queries the window what its encoding is, during Python startup.

'cp852'

This is very strange!
I understand that we need to encode the output, because strings
in the program files need to be converted to the terminal's encoding
before printing.
However, the input (the result of raw_input) will be in the correct
encoding (iso-8859-2 in my case) without any conversion.
I do not understand why that is. The stdin encoding is cp852, not
iso-8859-2.

Martin v. Löwis · Sep 3, 2004

Gandalf said:
However, the input (the result of raw_input) will be in the correct
encoding (iso-8859-2 in my case) without any conversion.
I do not understand why is that? The stdin encoding is cp852, not
iso-8859-2.

Why do you say that the input will be in the correct encoding? In
your original message, you said that you got this:

'n\xe9z'
'n\x82z'

where the first string was repr(s1) (i.e. in the source encoding,
iso-8859-2). The second string (repr(s2)) is the one that you got
from raw_input, so it is *not* in iso-8859-2. Why do you say it
is?

Regards,
Martin
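
An editorial sketch illustrating the point above: the bytes returned by raw_input on Windows are cp852, and they only match the source literal after an explicit conversion. The helper name `transcode` is hypothetical, and cp852 is assumed from the value reported earlier in the thread.

```python
def transcode(raw, source_encoding, target_encoding):
    """Re-encode a byte string from one encoding into another."""
    return raw.decode(source_encoding).encode(target_encoding)


console_bytes = b"n\x82z"  # what raw_input returned on Windows
source_bytes = b"n\xe9z"   # the literal as stored in the iso-8859-2 file

# Without conversion the two byte strings differ; after it they match.
converted = transcode(console_bytes, "cp852", "iso-8859-2")
print(console_bytes == source_bytes)
print(converted == source_bytes)
```

Comparing after decoding both sides to Unicode would work just as well and avoids choosing a target byte encoding at all.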

Michel Claveau - abstraction méta-galactique non t · Sep 3, 2004

Good evening!

The code page for iso-8859-2 should be 912 (according to Google)

iso-8859-2 <==> cp912

@-salutations
