Python & unicode

Guest

Hi!

If Python is OK with Unicode, why does the following script not run?


# -*- coding: utf-8 -*-

def режим(toto):
    return toto * 3

aret = режим(4)









@-salutations
 
John Roth

It doesn't work because Python scripts
must be in ASCII except for the
contents of string literals. Having a function
name in anything but ASCII isn't
supported.

John Roth
 
Leif K-Brooks

John said:
It doesn't work because Python scripts must be in ASCII except for
the contents of string literals. Having a function name in anything
but ASCII isn't supported.

To nit-pick a bit, identifiers can be in Unicode; they're simply
confined to digits and plain Latin letters.
 
Michel Claveau - abstraction méta-galactique non t

Hi !

But not all letters (no: é à ç à ê ö ñ etc.)

Therefore, Python's support for Unicode is... limited.



Good night
 
Radovan Garabik

Michel Claveau - abstraction méta-galactique non triviale en fuite perpétuelle. said:
Hi !


But not all letters (no : é à ç à ê ö ñ etc.)

... and some more letters that are not Latin (j, w, u, z).
OK, I'd better shut up :)


--
-----------------------------------------------------------
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
 
michele.simionato

I forgot to add the following:
The letter è

Python identifiers can be generic strings, including Latin-1 characters;
they cannot be unicode strings, however:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 0: ordinal not in range(128)

So you are right after all, but I thought most people didn't know that
you can have valid identifiers with accented letters, spaces, and
non-printable chars.
setattr(C, " ", "this works")
getattr(C, " ")


Michele Simionato
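
The setattr/getattr trick above can be tried in isolation. This is only a minimal sketch, assuming Python 3 and an empty class C standing in for the one used in the post; the second attribute name, "two words", is made up for illustration.

```python
class C:
    """Empty class used only as a namespace for this sketch."""


# Attribute names are stored as plain dictionary keys, so any string is accepted.
setattr(C, " ", "this works")
setattr(C, "two words", 42)

space_value = getattr(C, " ")
two_words_value = getattr(C, "two words")

# Names like these are reachable only through getattr() or the class __dict__,
# because they cannot be written after a dot in source code.
odd_names = sorted(name for name in vars(C) if not name.isidentifier())
print(space_value, two_words_value, odd_names)
```

Whether such names count as "identifiers" is a matter of definition: the sketch only shows that the attribute dictionary does not check its keys.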
 
P

Michel Claveau - abstraction méta-galactique non triviale en fuite
perpétuelle. said:
Hi!

If Python is OK with Unicode, why does the following script not run?

# -*- coding: utf-8 -*-

def режим(toto):
    return toto * 3

Because the coding is only supported in string literals.
But I'm not sure exactly why. It would be nice to do:

import math
π = math.pi
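
For readers on a later interpreter: the replies in this thread describe Python 2 (the session quoted further down is Python 2.4). Python 3 accepts non-ASCII identifiers (PEP 3131), so both the wish above and the script from the original question should run there. A minimal sketch, assuming Python 3 and a UTF-8 source file; the name circumference is added only for illustration.

```python
import math

# A non-ASCII identifier, accepted by Python 3 (PEP 3131).
π = math.pi


def режим(toto):
    """The function from the original question, with its indentation restored."""
    return toto * 3


aret = режим(4)
circumference = 2 * π * 1.0  # circumference of a circle with radius 1
print(aret, circumference)
```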
 
Kent Johnson

I forgot to add the following:



u'The letter \xe8'


The letter è

But try this:
  File "<stdin>", line 1
    C.è
      ^
SyntaxError: invalid syntax

Python identifiers can be generic strings, including Latin-1
characters;

I don't think so. You have hacked an attribute with Latin-1 characters in it, but you haven't
actually created an identifier.

According to the language reference, identifiers can only contain letters a-z and A-Z, digits 0-9
and underscore.
http://docs.python.org/ref/identifiers.html

Kent
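
The rule Kent cites (letters a-z and A-Z, digits 0-9 and underscore, not starting with a digit) can be written as a regular expression. This is a rough sketch of that rule only, not the interpreter's own tokenizer; the helper name is_ascii_identifier is made up here, and the sketch does not exclude keywords.

```python
import re

# A letter or underscore, followed by any number of letters, digits or underscores.
ASCII_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_ascii_identifier(name):
    """Return True if name fits the ASCII-only identifier rule quoted above."""
    return ASCII_IDENTIFIER.fullmatch(name) is not None


for candidate in ["aret", "_x1", "режим", "è", "4x"]:
    print(repr(candidate), is_ascii_identifier(candidate))
```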
 
Serge.Orlov

Michel Claveau - abstraction méta-galactique non triviale en fuite
perpétuelle. said:
Hi !


But not all letters (no : é à ç à ê ö ñ etc.)



Therefore, the Python's support of Unicode is... limited.

So is the support of Unicode in virtually every computer language,
because they don't support ... digits other than 0..9. Does anyone know a
language that does?

Serge.
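
A partial answer, offered as a sketch: in Python 3 the int() constructor accepts any Unicode decimal digits when converting a string, even though numeric literals in source code remain limited to ASCII digits. The example below uses Arabic-Indic digits written as escapes; the variable names are illustrative.

```python
import unicodedata

# ARABIC-INDIC DIGIT ONE, TWO, THREE, written as escapes to keep the source ASCII.
arabic_indic = "\u0661\u0662\u0663"

# int() converts strings of Unicode decimal digits, not just "0".."9".
value = int(arabic_indic)

# unicodedata.decimal() gives the numeric value of each individual digit.
digit_values = [unicodedata.decimal(ch) for ch in arabic_indic]
print(value, digit_values)
```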
 
michele.simionato

Kent:
I don't think so. You have hacked an attribute with latin-1 characters in it, but you
haven't actually created an identifier.

No, I really created an identifier. For instance
I can create a global name in this way:
globals()["è"]=1
globals()["è"]
1

According to the language reference, identifiers can only contain letters a-z and A-Z,
digits 0-9 and underscore.
http://docs.python.org/ref/identifiers.html

The parser has this restriction, so it gets confused if it finds "è".
But the underlying implementation just works for generic identifiers.

Michele Simionato
 
Kent Johnson

Kent:
I don't think so. You have hacked an attribute with latin-1 characters in it, but you
haven't actually created an identifier.

No, I really created an identifier. For instance
I can create a global name in this way:
globals()["è"]=1
globals()["è"]
1

Maybe I'm splitting hairs, but to me an identifier is a syntactical element that can be used in
specific ways. For example, the syntax defines
    attributeref ::= primary "." identifier
so if identifiers can contain latin-1 characters you should be able to say
    C.è=1

Kent
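
The distinction between what a namespace will store and what the parser will accept can be checked directly. A minimal sketch, assuming Python 3, where C.è = 1 is itself valid syntax; a hypothetical name containing a space is used instead so that the parser still rejects it.

```python
class C:
    pass


name = "two words"  # hypothetical attribute name that is not a syntactic identifier

# The namespace stores the name without complaint.
setattr(C, name, 1)
stored = getattr(C, name)

# The parser refuses the same name in an attribute reference.
try:
    compile("C.two words = 1", "<sketch>", "exec")
    parser_accepts = True
except SyntaxError:
    parser_accepts = False

print(stored, parser_accepts)
```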
 
Scott David Daniels

Because the coding is only supported in string literals.
But I'm not sure exactly why.

The why is the same as why we write in English on this newsgroup.
Not because English is better, but because that leaves a single
language for everyone to use to communicate in. If you allow
non-ASCII characters in symbol names, your source code will be
unviewable (and uneditable) for people with ASCII-only terminals,
never mind how comprehensible it might otherwise be. It is a
least-common-denominator argument, not a "this is better"
argument.

-Scott David Daniels
(e-mail address removed)
 
Leif K-Brooks

So is the support of Unicode in virtually every computer language
because they don't support ... digits except 0..9.

Hex digits aren't 0..9.

Python 2.4 (#2, Dec 3 2004, 17:59:05)
[GCC 3.3.5 (Debian 1:3.3.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
'0x7b'
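
The command that produced '0x7b' did not survive in the session above. One call that yields the same string is hex(123); the sketch below assumes that and adds a round trip for illustration.

```python
number = 123

# hex() renders an integer with the digits 0-9 and a-f.
as_hex = hex(number)

# int() with base 16 reads the string back, including the "0x" prefix.
round_trip = int(as_hex, 16)

# Hexadecimal literals use the same digits directly in source code.
literal = 0x7b
print(as_hex, round_trip, literal)
```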
 
Michel Claveau - abstraction méta-galactique non t

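Hi!

Scott David Daniels said:
It is a least-common-denominator argument, not a "this is better"
argument.

I understand, but it feels like an attempt at hegemony. Is the English
language really the least common denominator for a Russian who writes in
Cyrillic, or for a Chinese speaker who knows no English?

And did you think of the Klingons?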

;-)


Michel Claveau
 
Serge Orlov

Michel Claveau - abstraction méta-galactique non triviale en fuite
perpétuelle. said:
Hi!

I understand, but it feels like an attempt at hegemony. Is the English
language really the least common denominator for a Russian who writes in
Cyrillic, or for a Chinese speaker who knows no English?

I don't know about Chinese, but English *is* the least common
denominator for native Russian software developers. There are a lot of
reasons for that:

- To switch between the Russian and English keyboard layouts you
need to press a switch key, or usually even two keys at the same time.
Since language syntax and library calls are in English, you have to
switch often. Very often you forget which layout is current,
start typing in the wrong one, and have to delete the garbage, hit
the switch key and type it again. If that happens ten times every ten minutes
it will drive you crazy.

- Most native Russian developers graduated from universities or
institutes. They attended hundreds of hours of math and physics
classes. All these classes use Latin notation.

- Any serious local software development job posting lists "Technical
English" as a requirement. It means you're expected to read technical
documents in English.

- At the same time, the majority of native Russian developers do not speak
English very well and feel they need more English practice. Using
English identifiers is a chance to practice while you work.

- The amount of useful information in English is much greater than in
Russian, thanks to the Internet.

Surprised? :)

Serge.
 
Michel Claveau - abstraction méta-galactique non t

Hi !

Sorry, but I think that, for Russians, English is an *add-on*, not a
common denominator.
English is the most widely known language, but it is not common. It is the same
difference as between co-operation and colonization.

Have a good day
 
