What encoding is used when initializing sys.argv?

Petr Prikryl

Hi,

When solving the problem of passing a Unicode
directory name on the command line to a script
(MS Windows environment), I have discovered that
I do not understand what encoding should be used
to convert the sys.argv into unicode.

I know about the rejected attempt to implement
sys.argvu. Still, how is sys.argv filled? What
encoding is used when parsing the command line internally?
To what encoding is it converted when non-ASCII
characters appear?

Thanks for your time and experience,
pepr
 
Martin v. Löwis

Petr said:
I know about the rejected attempt to implement
sys.argvu. Still, how is sys.argv filled? What
encoding is used when parsing the command line internally?
To what encoding is it converted when non-ASCII
characters appear?

Python does not perform any conversion whatsoever.
It has a traditional main() function, with the
char *argv[] argument.

So if you think that the arguments are inherently
Unicode on your system, your question should be
"how does my operating system convert the arguments"?

That, of course, depends on your operating system.
"MS Windows environment" is not precise enough, since
it also depends on the specific incarnation of that
environment. On Windows 9x, I believe the command
line arguments are "inherently" *not* in Unicode,
but in a char array. On Windows NT+, they are Unicode,
and Windows (or is it the MS VC runtime?) converts them
to characters using the CP_ACP code page.
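As a rough illustration of why that conversion matters, the sketch below simulates squeezing a Unicode argument into a single-byte code page and reading it back. It is written for Python 3 and only approximates the behaviour: the helper names to_ansi and from_ansi are made up for this example, cp1252 is an assumed ANSI code page, and Python's "replace" error handler stands in for whatever Windows actually does with unrepresentable characters (which may be a best-fit substitution rather than '?').

```python
# Illustrative simulation only; not what Windows or Python runs internally.

def to_ansi(arg, code_page="cp1252"):
    """Encode a text argument into bytes for the given code page.

    Characters missing from the code page are replaced with b'?'.
    """
    return arg.encode(code_page, errors="replace")


def from_ansi(raw, code_page="cp1252"):
    """Decode bytes from the given code page back into text."""
    return raw.decode(code_page)


original = "abc\u00df\u0159"      # U+00DF is in cp1252, U+0159 is not
raw = to_ansi(original)           # assumed Western European code page
restored = from_ansi(raw)

print(raw)
print(restored == original)       # the round trip is lossy here

# With a code page that contains both characters, nothing is lost.
print(from_ansi(to_ansi(original, "cp1250"), "cp1250") == original)
```

The point is only that the result depends on the code page in effect, so the same command line can arrive intact on one machine and damaged on another.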

Kind regards,
Martin
 
Neil Hodgson

Petr Prikryl:
... I have discovered that
I do not understand what encoding should be used
to convert the sys.argv into unicode.

Martin mentioned CP_ACP. In Python on Windows, this can be accessed
as the "mbcs" codec.

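# glurp.py (Python 2: print statement and unicode() built-in)
import sys
print repr(sys.argv[1])                   # raw byte string as received
print repr(unicode(sys.argv[1], "mbcs"))  # decoded with the ANSI code page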

C:\bin>python glurp.py abcß•
'abc\xdf\x95'
u'abc\xdf\u2022'
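For readers without a Windows machine at hand, the same decoding step can be sketched in Python 3. The "mbcs" codec exists only on Windows, so this sketch names a code page explicitly; cp1252 is an assumption that happens to match the output shown above, and decode_ansi_arg is a hypothetical helper, not part of any library. In Python 3, sys.argv already holds text strings, so this manual step applies to the byte-string situation discussed in this thread.

```python
def decode_ansi_arg(raw, code_page="cp1252"):
    """Decode a byte-string argument using an explicitly named code page.

    cp1252 is an assumed stand-in for the machine's ANSI code page.
    """
    return raw.decode(code_page)


raw = b"abc\xdf\x95"          # the bytes printed in the session above
text = decode_ansi_arg(raw)

print(ascii(text))            # escaped form, comparable to repr() in Python 2
```

On a machine with a different ANSI code page, the same bytes would decode to different characters, which is why the code page has to be known rather than guessed.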

Neil
 
Tim Roberts

Neil Hodgson said:
Petr Prikryl:
... I have discovered that
I do not understand what encoding should be used
to convert the sys.argv into unicode.

Martin mentioned CP_ACP. In Python on Windows, this can be accessed
as the "mbcs" codec.

import sys
print repr(sys.argv[1])
print repr(unicode(sys.argv[1], "mbcs"))

C:\bin>python glurp.py abcß•
'abc\xdf\x95'
u'abc\xdf\u2022'

There's another entry in my "keep this post forever" file.
 
