Unicode issue on Windows cmd line

jeffg

I'm having an issue on the Windows command line (python.exe).

This gives a Unicode error.

It works fine in IDLE, PythonWin, and on my MacBook, but I need to run this
from a Windows batch file.

Character should look like this "ð".

Please help!
 
Albert Hopkins

You forgot to paste the error.
 
jeffg

The error looks like this:
  File "<stdin>", line 1, in <module>
  File "C:\python25\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xf0' in position 0: character maps to <undefined>


Running Python 2.5.4 on Windows XP
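
For readers following along, the failure in the traceback can be reproduced without a console. This is a minimal sketch in Python 3 syntax (the thread uses Python 2.5, where printing a unicode string to the console performs this encoding step implicitly). It assumes the character involved is U+00F0, as the error message reports.

```python
# cp437 is the console code page named in the traceback. It has no byte
# for U+00F0 (LATIN SMALL LETTER ETH), so encoding fails.
text = "\xf0"

try:
    text.encode("cp437")
    encodable = True
    reason = None
except UnicodeEncodeError as exc:
    encodable = False
    reason = exc.reason  # the text after the colon in the traceback

print("cp437 can encode U+00F0:", encodable)
print("reason:", reason)
```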
 
Martin v. Löwis

Well, your terminal just cannot display this character by default; you
need to use a different terminal program, or reconfigure your terminal.

For example, do

chcp 1252

and select Lucida Console as the terminal font, then try again.

Of course, this will cause *different* characters to become
non-displayable.
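
The trade-off can be checked from Python by asking each code page whether it has a byte for a given character. This is a rough sketch in Python 3 syntax. `can_encode` is a hypothetical helper, and the box-drawing character is just one example of something cp437 has and cp1252 lacks.

```python
def can_encode(text, codec):
    """Return True if every character in text has a byte in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

samples = {
    "eth": "\xf0",            # the character from the traceback
    "box drawing": "\u2502",  # light vertical line, common in DOS-era output
}

for name, ch in sorted(samples.items()):
    print(name,
          "cp437:", can_encode(ch, "cp437"),
          "cp1252:", can_encode(ch, "cp1252"))
```

Switching the code page moves the problem to other characters rather than removing it.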

Regards,
Martin
 
MRAB

Benjamin Kaplan wrote:
[snip]
Whoops. Didn't mean to hit send there. I was going to say, you can't
have everything when Microsoft is only willing to break the programs
that average people are going to use on a daily basis. I mean, why
would they do something nice for the international community at the
expense of breaking some 20 year old batch scripts? Those were the
only things that still worked when Vista first came out.

I remember when I had to use MS-Access, but it could be either of two versions.

The newer version couldn't open a database from the older version unless
I let it convert it first, after which point I wouldn't be able to open
it in the older version... :-(
 
jeffg

Thanks, I ended up using encode('iso-8859-15', "replace").
Perhaps that is more up to date than cp1252?

It still didn't print correctly, but it did write correctly, which was
my main problem.
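
One thing to keep in mind about the "replace" error handler is that it substitutes a question mark for anything the target encoding lacks, so the call never fails but can lose data silently. A small illustration in Python 3 syntax, where the sample string is made up for the example:

```python
# U+00F0 exists in iso-8859-15 (byte 0xF0); U+2502 does not.
sample = "\xf0 \u2502"

replaced = sample.encode("iso-8859-15", "replace")
print(repr(replaced))

# The default handler ("strict") raises instead of substituting.
try:
    sample.encode("iso-8859-15")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print("strict encoding succeeded:", strict_ok)
```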
 
Martin v. Löwis

If you encode as iso-8859-15, but this is not what your terminal
expects, it certainly won't print correctly. To get correct printing,
the output encoding must be the same as the terminal encoding. If the
terminal encoding is not up to date (as you consider cp1252), then
the output encoding should not be up to date, either.

If you want a modern encoding that supports all of Unicode, and you
don't care whether the output is legible, use UTF-8.
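
A sketch of the "output encoding must match the terminal encoding" rule, in Python 3 syntax. `encode_for_stream` and `FakeConsole` are hypothetical names used only for illustration. The idea is to ask the stream which encoding it reports instead of hard-coding one, and to fall back to ASCII when it reports none.

```python
import sys

def encode_for_stream(text, stream=None, errors="replace"):
    """Hypothetical helper: encode text with the encoding the stream reports."""
    if stream is None:
        stream = sys.stdout
    # Some file-like objects report no encoding; fall back to ASCII then.
    encoding = getattr(stream, "encoding", None) or "ascii"
    return text.encode(encoding, errors)

class FakeConsole(object):
    """Stand-in for a console using code page 437 (illustration only)."""
    encoding = "cp437"

# U+00F0 is not in cp437 and becomes "?"; U+00E9 is, at byte 0x82.
raw = encode_for_stream("\xf0 \xe9", FakeConsole())
print(repr(raw))
```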

Regards,
Martin
 
jeffg

I did try UTF-8, but it produced the upper-case character instead of
the proper lower-case one, so the output was incorrect for the Unicode
supplied.
I think both 8859-15 and cp1252 produced the correct output, but I
figured 8859-15 would have additional character support (though I'm not
sure this is the case; if it is not, please let me know and I'll use
1252). I'm dealing with large data sets and this just happened to be
one small example. I want to have the best ability to write future
Unicode characters properly when running from the Windows command
line (unless there is a better way to do it on Windows).
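
A possible explanation for the odd UTF-8 result is that the UTF-8 bytes themselves are fine but the viewer reads them with a single-byte code page. The thread does not say what was used to view the output, so treat that as an assumption. A short sketch in Python 3 syntax:

```python
text = "\xf0"                      # LATIN SMALL LETTER ETH
utf8_bytes = text.encode("utf-8")  # two bytes: 0xC3 0xB0

# The same two bytes read with single-byte code pages give two characters.
as_cp1252 = utf8_bytes.decode("cp1252")
as_cp437 = utf8_bytes.decode("cp437")

print(repr(utf8_bytes))
print([hex(ord(c)) for c in as_cp1252])
print([hex(ord(c)) for c in as_cp437])

# Decoding with the matching encoding recovers the original character.
round_trip = utf8_bytes.decode("utf-8")
```

Read as cp1252, the first byte shows up as an upper-case A with a tilde, which might be the "upper case character" described here.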
 
Gabriel Genellina

As Martin v. Löwis already said, the encoding Python uses when writing
to the console must match the encoding the console expects. (You should
also use a font capable of displaying such characters.)

windows-1252 and iso-8859-15 are similar, but not identical. This table
shows the differences (fewer than 30 printable characters):
http://en.wikipedia.org/wiki/Western_Latin_character_sets_(computing)
If your script encodes its output using iso-8859-15, the corresponding
console code page should be 28605.
"Western European" (whatever that means exactly) Windows versions use the
windows-1252 encoding as the "ANSI code page" (GUI applications), and
cp850 as the "OEM code page" (console applications); cp437 is used in the US
only.
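
The differences between the two encodings can also be listed from Python instead of read off the table. This is a sketch in Python 3 syntax that relies on the codecs bundled with CPython. `charset` is a hypothetical helper that collects every character a codec assigns to a single byte.

```python
import unicodedata

def charset(codec):
    """Return the set of characters that codec assigns to single bytes."""
    chars = set()
    for byte in range(256):
        try:
            chars.add(bytes(bytearray([byte])).decode(codec))
        except UnicodeDecodeError:
            pass  # this byte is unassigned in the codec
    return chars

cp1252 = charset("cp1252")
latin9 = charset("iso-8859-15")

only_cp1252 = sorted(cp1252 - latin9)
only_latin9 = sorted(latin9 - cp1252)

print("only in cp1252:", len(only_cp1252))
print("only in iso-8859-15 are all control codes:",
      all(unicodedata.category(c) == "Cc" for c in only_latin9))
```

If the second line prints True, every printable iso-8859-15 character is also available in windows-1252, so iso-8859-15 adds no character coverage.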

C:\Documents and Settings\Gabriel>chcp 1252
Tabla de códigos activa: 1252

C:\Documents and Settings\Gabriel>python
Python 2.6 (r26:66721, Oct  2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
py> unichr(0x0153).encode("windows-1252")
'\x9c'
py> print _
œ
py> ^Z

C:\Documents and Settings\Gabriel>chcp 28605
Tabla de códigos activa: 28605

C:\Documents and Settings\Gabriel>python
Python 2.6 (r26:66721, Oct  2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
py> unichr(0x0153).encode("iso-8859-15")
'\xbd'
py> print _
œ
py> unichr(0x0153).encode("latin9")
'\xbd'
 
jeffg

Thanks, switched it to windows-1252.
 
