Help needed with filenames

pdenize

I have a program that reads filenames using glob and writes them into an
XML file in UTF-8 using
unicode(file, sys.getfilesystemencoding()).encode("UTF-8")
This all works fine, including all the odd characters such as accents.

However, I also print what the program is doing, and someone pointed out
that many characters are not printing correctly in the Windows command
window.

I have tried to figure this out but simply get lost in the translation
stuff.
If I just use print filename, the output has characters that don't match
the ones in the filename (I sort of expected that).
So I tried print unicode(file, sys.getfilesystemencoding()), expecting
the correct result, but instead I get:
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013'

I did notice that when a Windows command window does a directory
listing of these files, the characters seem to be translated into close
approximations (long dash to minus, special double quotes to simple
double quotes), while many of the accented characters are retained. I
looked at translate to do this but did not know how to determine which
characters to map.
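
For illustration, a translate-based approach could look like the sketch below. The table APPROXIMATIONS and the function approximate are hypothetical names, and the mapping is a small hand-picked assumption, not the table Windows itself uses; it only covers the dashes and quotes mentioned above.

```python
# Hypothetical, hand-picked table: code point -> plain look-alike.
# This is NOT the mapping Windows applies; extend it as needed.
APPROXIMATIONS = {
    0x2013: u"-",    # en dash
    0x2014: u"-",    # em dash
    0x2018: u"'",    # left single quotation mark
    0x2019: u"'",    # right single quotation mark
    0x201C: u'"',    # left double quotation mark
    0x201D: u'"',    # right double quotation mark
}


def approximate(text):
    """Replace typographic punctuation with close ASCII approximations."""
    # translate() looks up each character's code point in the table and
    # leaves characters that are not listed (such as accents) unchanged.
    return text.translate(APPROXIMATIONS)


print(approximate(u"notes \u2013 \u201cdraft\u201d.txt"))
```

Characters outside the table pass through untouched, so the result may still contain characters the console cannot encode.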

Can anyone tell me what I should be doing here?
 
Martin v. Löwis

> Can anyone tell me what I should be doing here?

The console uses the "OEM code page". The Windows conversion
routine from Unicode to the OEM code page provides the lossy
conversion that you observe in the directory listing.

Unfortunately, the OEM code page conversion is not available
from Python. What you can do is to use

u.encode(sys.stdout.encoding, "replace")

This will replace unprintable characters with question marks.
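
A minimal sketch of that suggestion follows. The helper name console_safe, the ASCII fallback, and the use of "cp437" as a stand-in for an OEM code page are assumptions made for illustration; on a real console the value of sys.stdout.encoding would be used instead.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # Assumption: fall back to ASCII when stdout reports no encoding,
    # which can happen when the output is redirected.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Encode with the "replace" error handler, then decode again so the
    # caller still gets text rather than bytes.
    return text.encode(encoding, "replace").decode(encoding)


# "cp437" is only an example code page; U+2013 (en dash) is not in it.
print(console_safe(u"report \u2013 final.txt", "cp437"))  # report ? final.txt
```

Characters that the code page does contain, such as many accented letters, come through unchanged; only the rest turn into question marks.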

Regards,
Martin
 
Yinon Ehrlich

> I did notice that when a windows command window does a directory
> listing of these files the characters seem to be translated into close
> approximations (long dash to minus, special double quotes to simple
> double quotes, but still retains many of the accent chars).  I looked
> at translate to do this but did not know how to determine which
> characters to map.
>
> Can anyone tell me what I should be doing here?

Hi,

It seems your problem is just with the correct representation of
those characters in the Windows command line.
I have seen two solutions for that:
* Right-click on the title bar of the command-line window, go to
"Properties -> Font", and select a font that shows all characters
correctly.
* Even better: use http://sourceforge.net/projects/console - much
better than Windows' default console.

Good luck,
Yinon
 
