Unicode issue on Windows cmd line

jeffg · Feb 11, 2009

Having an issue on Windows cmd.

Python.exe

This gives a unicode error.

Works fine in IDLE, PythonWin, and my MacBook, but I need to run this
from a Windows batch file.

Character should look like this "ð".

Please help!

Albert Hopkins · Feb 11, 2009

You forgot to paste the error.

jeffg · Feb 11, 2009

The error looks like this:
  File "<stdin>", line 1, in <module>
  File "C:\python25\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xf0' in position 0: character maps to <undefined>

Running Python 2.5.4 on Windows XP
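The `cp437.py` frame in the traceback shows Python encoding for the console's code page 437, which has no byte for U+00F0 (ð). A minimal Python 3 sketch that reproduces the error by calling `str.encode` directly (an assumption worth stating: Python 3.6 and later write to the Windows console through its Unicode API, so a plain `print` may no longer fail there). The helper `try_encode` is hypothetical, for illustration only:

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the code page cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        # Same exception class as in the Python 2.5 traceback above.
        print(f"{encoding}: {exc}")
        return None


eth = "\u00f0"  # LATIN SMALL LETTER ETH, the character from the original post
for cp in ("cp437", "cp850", "cp1252"):
    print(cp, try_encode(eth, cp))
```

Only cp437 fails here; cp850 (the OEM code page of Western European Windows) and cp1252 both have a byte for ð.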

Martin v. Löwis · Feb 11, 2009

Well, your terminal just cannot display this character by default; you
need to use a different terminal program, or reconfigure your terminal.

For example, do

chcp 1252

and select Lucida Console as the terminal font, then try again.

Of course, this will cause *different* characters to become
non-displayable.

Regards,
Martin
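The trade-off Martin describes can be checked without a console: each code page covers a different set of characters, so switching from 437 to 1252 gains ð but loses others. A small Python 3 sketch with a hypothetical helper `encodable`; the sample characters other than ð are arbitrary choices, and the check covers only the encoding side, not whether the selected font has glyphs for them:

```python
def encodable(ch, encoding):
    """True if the code page has a byte for this character."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "eth": "\u00f0",    # the character from the original post
    "alpha": "\u03b1",  # Greek small alpha
    "box": "\u2500",    # box-drawing horizontal line
}

for name, ch in samples.items():
    print(name, {cp: encodable(ch, cp) for cp in ("cp437", "cp1252")})
```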

MRAB · Feb 11, 2009

Benjamin Kaplan wrote:
[snip]

Whoops. Didn't mean to hit send there. I was going to say, you can't
have everything when Microsoft is only willing to break the programs
that average people are going to use on a daily basis. I mean, why
would they do something nice for the international community at the
expense of breaking some 20-year-old batch scripts? Those were the
only things that still worked when Vista first came out.

I remember when I had to use MS-Access but it could be either of 2 versions.

The newer version couldn't open a database from the older version unless
I let it convert it first, after which point I wouldn't be able to open
it in the older version... :-(

jeffg · Feb 11, 2009

Thanks, I ended up using encode('iso-8859-15', "replace").
Perhaps it's more up to date than cp1252...??

It still didn't print correctly, but it did write correctly, which was
my main problem.
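What `"replace"` does, and how to write a file whose bytes do not depend on the console's code page. A Python 3 sketch; the helper `write_text`, the file name and the sample text are hypothetical:

```python
import os
import tempfile

text = "\u00f0 \u20ac \u2500"  # eth, euro sign, box-drawing line

# "replace" substitutes '?' for anything the target encoding lacks.
print(text.encode("iso-8859-15", "replace"))  # b'\xf0 \xa4 ?'


def write_text(path, text, encoding="cp1252"):
    # An explicit encoding makes the file's bytes independent of the console.
    with open(path, "w", encoding=encoding, errors="replace") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")
    write_text(path, text)
    with open(path, "rb") as f:
        data = f.read()

print(data)  # b'\xf0 \x80 ?'
```

Every unencodable character becomes `?`, so data is silently lost; the default `errors="strict"` raises instead, which may be preferable on large data sets where such losses should be noticed.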

Martin v. Löwis · Feb 11, 2009

If you encode as iso-8859-15, but this is not what your terminal
expects, it certainly won't print correctly. To get correct printing,
the output encoding must be the same as the terminal encoding. If the
terminal encoding is not up to date (as you consider cp1252), then
the output encoding should not be up to date, either.

If you want a modern encoding that supports all of Unicode, and you
don't care whether the output is legible, use UTF-8.

Regards,
Martin
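Martin's rule can be simulated by encoding with one codec and decoding with another, which is roughly what happens when a program and the console disagree. A Python 3 sketch with a hypothetical helper; code page 437 as the console encoding is an assumption taken from the earlier traceback:

```python
def as_terminal_shows(text, output_encoding, terminal_encoding):
    """Simulate a terminal decoding the bytes a program encoded."""
    raw = text.encode(output_encoding, "replace")
    return raw.decode(terminal_encoding, "replace")


eth = "\u00f0"
print(as_terminal_shows(eth, "iso-8859-15", "iso-8859-15"))  # matching: ð
print(as_terminal_shows(eth, "iso-8859-15", "cp437"))        # mismatched: byte 0xF0 is ≡ in cp437
print(as_terminal_shows(eth, "utf-8", "utf-8"))              # matching: ð
```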

jeffg · Feb 12, 2009

I did try UTF-8 but it produced the upper case character instead of
the proper lower case, so the output was incorrect for the unicode
supplied.
I think both 8859-15 and cp1252 produced the correct output, but I
figured 8859-15 would have additional character support (though not
sure this is the case - if it is not, please let me know and I'll use
1252). I'm dealing with large data sets and this just happened to be
one small example. I want to have the best ability to write future
unicode characters properly based on running from the windows command
line (unless there is a better way to do it on windows).
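One possible explanation for the "upper case character" (an assumption, since the actual output isn't shown): UTF-8 encodes ð as the two bytes C3 B0, and a console set to code page 1252 displays each byte as its own character, Ã followed by °. A Python 3 sketch:

```python
import unicodedata

eth = "\u00f0"
utf8_bytes = eth.encode("utf-8")
print(utf8_bytes)  # b'\xc3\xb0'

# A cp1252 console would interpret the two UTF-8 bytes separately.
shown = utf8_bytes.decode("cp1252")
print(shown)  # Ã°
print(unicodedata.name(shown[0]))  # LATIN CAPITAL LETTER A WITH TILDE
```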

Gabriel Genellina · Feb 12, 2009

As Martin v. Löwis already said, the encoding Python uses when writing
to the console must match the encoding the console expects. (You also
need a font capable of displaying such characters.)

windows-1252 and iso-8859-15 are similar, but not identical. This table
shows the differences (fewer than 30 printable characters):
http://en.wikipedia.org/wiki/Western_Latin_character_sets_(computing)
If your script encodes its output using iso-8859-15, the corresponding
console code page should be 28605.
"Western European" (whatever that means exactly) Windows versions use the
windows-1252 encoding as the "Ansi code page" (GUI applications), and
cp850 as the "OEM code page" (console applications) -- cp437 in the US
only.
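To check jeffg's question of whether iso-8859-15 adds characters that windows-1252 lacks, the two repertoires can be compared directly. A Python 3 sketch under a stated assumption: "repertoire" here means the printable characters a codec assigns to bytes 0x80-0xFF, skipping undefined bytes and C1 control characters; the helper name is hypothetical:

```python
import unicodedata


def repertoire(encoding):
    """Printable characters a single-byte code page assigns to bytes 0x80-0xFF."""
    chars = set()
    for value in range(0x80, 0x100):
        try:
            ch = bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            continue  # byte undefined in this code page (e.g. 0x81 in cp1252)
        if unicodedata.category(ch) != "Cc":  # skip C1 control characters
            chars.add(ch)
    return chars


cp1252 = repertoire("cp1252")
latin9 = repertoire("iso-8859-15")
only_1252 = cp1252 - latin9
only_latin9 = latin9 - cp1252

print(len(only_1252), "".join(sorted(only_1252)))
print(len(only_latin9))
```

Under that definition, `only_1252` holds 27 characters (curly quotes, dashes, ™, ‰ and others, plus the eight Latin-1 symbols ¤ ¦ ¨ ´ ¸ ¼ ½ ¾ that iso-8859-15 replaced), while `only_latin9` is empty: every printable iso-8859-15 character is also in windows-1252.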

C:\Documents and Settings\Gabriel>chcp 1252
Active code page: 1252

C:\Documents and Settings\Gabriel>python
Python 2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
py> unichr(0x0153).encode("windows-1252")
'\x9c'
py> print _
œ
py> ^Z

C:\Documents and Settings\Gabriel>chcp 28605
Active code page: 28605

C:\Documents and Settings\Gabriel>python
Python 2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
py> unichr(0x0153).encode("iso-8859-15")
'\xbd'
py> print _
œ
py> unichr(0x0153).encode("latin9")
'\xbd'
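The same experiment in Python 3, with a hypothetical table from `chcp` code page numbers to Python codec names. The codec names are real Python codecs; the table covers only the code pages mentioned in this thread, and both it and the helper are illustrative:

```python
# Hypothetical mapping: console code page number -> Python codec name.
CONSOLE_CODECS = {
    437: "cp437",
    850: "cp850",
    1252: "windows-1252",
    28605: "iso-8859-15",
}


def encode_for_console(text, codepage, errors="replace"):
    """Encode text with the codec matching the given console code page."""
    return text.encode(CONSOLE_CODECS[codepage], errors)


oe = "\u0153"  # LATIN SMALL LIGATURE OE, as in the transcript
print(encode_for_console(oe, 1252))   # b'\x9c'
print(encode_for_console(oe, 28605))  # b'\xbd'
print(oe.encode("latin9"))            # b'\xbd' ('latin9' is an alias of iso-8859-15)
print(encode_for_console(oe, 437))    # b'?' (cp437 has no oe ligature)
```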

jeffg · Feb 12, 2009

Thanks, switched it to windows-1252.