system(...) and unicode


andrew

Hi,

I'm seeing the following error:

...
system(cmd)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in
position 57: ordinal not in range(128)

and I think I vaguely understand what's going on - "cmd" is constructed
to include a file name that is UTF-8 encoded (I think - it includes
accents when I "ls" the file - this is on a recent Suse Linux with
Python 2.4.2). So I guess I need to specify the encoding used, right?
But (1) I don't know how to do this; (2) this string came from the
filesystem in the first place, so how come it isn't managed in an
internally consistent way?; and (3) I have no explicit Unicode strings
in my program.
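
For reference, the error can be reproduced without calling system() at all. This is a minimal sketch, assuming that Python 2's system() encodes a unicode argument with the default 'ascii' codec before handing it to the shell; the command string is hypothetical, since the real one is not shown here. It runs on Python 2.6+ and Python 3.

```python
# Hypothetical command string; the real "cmd" is not shown in the post.
cmd = u"ls caf\xe3.txt"

error = None
try:
    # Assumption: this is the conversion Python 2's os.system() applies
    # implicitly when it is given a unicode object.
    cmd.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)
```

The position reported in the message is the index of the first non-ASCII character in the string, which helps locate the offending part of the command.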

Looking at the docs (unicode howto) it seems like maybe I need to do
system(cmd.encode(...))
but how do I know which locale and what if cmd isn't a unicode string
(I didn't make it so!)? I could force an encoding as in the unicode
howto ("filename.decode(encoding)"), but that seems to already be
happening (or is it not - am I wrong in assuming that?).

So can someone help me or point me to some more detailed instructions,
please? At the command line, "locale" says en_GB.UTF-8, but I'd like this
code to work whatever the locale is, if that makes sense.

Sorry for being stupid,
Andrew
 

andrew

Hmmm. After reading
http://kofoto.rosdahl.net/trac/wiki/UnicodeInPython I tried:

system(cmd.encode(getfilesystemencoding()))

which works (nothing else changed). But that seems odd - is this a bug
(the asymmetry - I read files with os.listdir with no explicit unicode
handling, but need to do something explicitly on output - seems wrong),
or am I going to be bitten by other errors later?
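
The workaround above can be wrapped in a small helper so that byte strings pass through untouched and only unicode strings are encoded. This is a sketch only: encode_command is a hypothetical name, and the "utf-8" fallback for the case where getfilesystemencoding() returns None is an assumption, not something established in this thread.

```python
import sys


def encode_command(cmd):
    """Return cmd as a byte string (hypothetical helper, not a stdlib function)."""
    if isinstance(cmd, bytes):
        # Already a byte string: nothing to encode.
        return cmd
    # Assumption: fall back to UTF-8 if no filesystem encoding is reported.
    encoding = sys.getfilesystemencoding() or "utf-8"
    return cmd.encode(encoding)


print(repr(encode_command(u"ls report.txt")))
```

The result would then be passed along as system(encode_command(cmd)). Note that a file name containing characters the filesystem encoding cannot represent would still raise UnicodeEncodeError.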

Thanks,
Andrew
 

Martin v. Löwis

Whether or not listdir returns a Unicode string depends on whether you
pass a Unicode string as the directory name. So if you change the
directory name to be a byte string, the file name should be a byte
string, too.
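
A small sketch of that rule, written for Python 3, where the two types are str and bytes (on Python 2 the same rule applies to unicode and str). The temporary directory and the file name example.txt are made up for the illustration.

```python
import os
import shutil
import tempfile

# Scratch directory containing one file (names are illustrative only).
directory = tempfile.mkdtemp()
open(os.path.join(directory, "example.txt"), "w").close()

# Text directory name in, text file names out.
text_names = os.listdir(directory)

# Byte-string directory name in, byte-string file names out.
byte_names = os.listdir(os.fsencode(directory))

print(type(text_names[0]).__name__, type(byte_names[0]).__name__)

shutil.rmtree(directory)
```

So keeping a single type for every path in the program (all bytes, or all text decoded at the boundary) avoids mixing the two when the command string is built.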

And yes, it would be desirable to enhance system() to support Unicode
strings; contributions in that direction are welcome (although one
should then also support exec*(), spawn*(), popen*(), and the subprocess
module).

Regards,
Martin
 

andrew

The impression I got from the link I gave was that exec et al already
had the appropriate unicode support; system seems to be the exception.

Anyway, thanks for the info - that directory name is coming from a DOM
call, and I'm pretty sure it's returning Unicode, so that makes sense.

Andrew
 
