unicode as valid naming symbols


Mark H Harris

Greetings, I would like to create a lambda as follows:

√ = lambda n: sqrt(n)



however this works:

The question is which unicode(s) are capable of being proper name
characters, and which ones are off-limits, and why?


marcus
 

wxjmfauth

On Tuesday, 25 March 2014 19:30:34 UTC+1, Mark H. Harris wrote:
Greetings, I would like to create a lambda as follows:

√ = lambda n: sqrt(n)

On my keyboard mapping the "problem" character is alt-v, which produces
the radical symbol. Trying to set the symbol as a name within the
namespace gives a syntax error:

SyntaxError: invalid character in identifier

however this works:

The question is which unicode(s) are capable of being proper name
characters, and which ones are off-limits, and why?

marcus

S.isidentifier() -> bool

Return True if S is a valid identifier according
to the language definition.
cf "unicode.org" doc

jmf
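
To make the `isidentifier` hint concrete, here is a minimal sketch that runs the check over a few candidate names. Only √ comes from the question; the other entries in `candidates` are illustrative choices.

```python
# Ask Python itself which strings it would accept as identifiers.
candidates = ["λ", "√", "sqrt2", "2sqrt", "_root", "x+y"]

# Map each candidate to the result of str.isidentifier().
results = {name: name.isidentifier() for name in candidates}

for name, ok in results.items():
    print(f"{name!r}: {ok}")
```

Note that `isidentifier()` only checks the lexical form; reserved words such as `in` also pass it, so a separate keyword check is needed if that matters.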
 

Mark H Harris

S.isidentifier() -> bool

Return True if S is a valid identifier according
to the language definition.

cf "unicode.org" doc

Excellent, thanks!

marcus
 

MRAB

Greetings, I would like to create a lambda as follows:

√ = lambda n: sqrt(n)




however this works:


The question is which unicode(s) are capable of being proper name
characters, and which ones are off-limits, and why?
It's explained in PEP 3131.

Basically, a name should start with a letter (this has been extended
to include Chinese characters, etc.) or an underscore.

λ is classified as Lowercase_Letter.

√ is classified as Math_Symbol.
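
The two classifications can be looked up with the standard `unicodedata` module. This is a small sketch; `describe` is a hypothetical helper name, and the two-letter codes it returns are the abbreviated general categories (`Ll` for Lowercase_Letter, `Sm` for Math_Symbol).

```python
import unicodedata


def describe(ch):
    """Return (code point, general category, Unicode name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


for ch in "λ√":
    print(ch, describe(ch))
```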
 

Mark H Harris

It's explained in PEP 3131.

Basically, a name should start with a letter (this has been extended
to include Chinese characters, etc.) or an underscore.

λ is classified as Lowercase_Letter.

√ is classified as Math_Symbol.

Thanks much! I'll note that for improvements. Any unicode symbol
(that is not a number) should be allowed as an identifier.

marcus
 

Dave Angel

Mark H Harris said:
Greetings, I would like to create a lambda as follows:

√ = lambda n: sqrt(n)




however this works:


The question is which unicode(s) are capable of being proper name
characters, and which ones are off-limits, and why?

See the official docs


http://docs.python.org/3/reference/lexical_analysis.html#identifiers

There's also a method on str that'll tell you: isidentifier().
To see such methods, use dir("")

As for why, you can get a pretty good idea from the reference
above, as it lists 12 unicode categories that can be used. You
can also look at PEP 3131 and at Potsdam's site. Both links are
on the above page. Letters, marks, connectors, and numbers, but
not punctuation.
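
As a rough sketch of how those categories combine, the check below approximates the rule with `unicodedata`. The function name `looks_like_identifier` is hypothetical, and the sketch deliberately ignores the `Other_ID_Start` and `Other_ID_Continue` properties listed in the reference, so `str.isidentifier()` remains the authoritative answer.

```python
import unicodedata

# Categories allowed for the first character (letters and letter numbers).
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Later characters may also be marks, decimal digits, or connector punctuation.
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def looks_like_identifier(text):
    """Approximate the identifier rule using general categories only."""
    # Python compares identifiers in NFKC form, so normalize first.
    text = unicodedata.normalize("NFKC", text)
    if not text:
        return False
    first, rest = text[0], text[1:]
    if first != "_" and unicodedata.category(first) not in START_CATEGORIES:
        return False
    return all(unicodedata.category(ch) in CONTINUE_CATEGORIES for ch in rest)


for sample in ["λ", "√", "x1", "1x", "_tmp", "a-b"]:
    print(repr(sample), looks_like_identifier(sample), sample.isidentifier())
```

For these samples the approximation and the built-in method agree; the underscore needs its own test at the start because its category, `Pc`, is only a continuation category.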
 

Marko Rauhamaa

Mark H Harris said:
Thanks much! I'll note that for improvements. Any unicode symbol
(that is not a number) should be allowed as an identifier.

I don't know if that's a good idea, but that's how it is in lisp/scheme.

Thus, "*" and "1+" are normal identifiers in lisp and scheme.


Marko
 

Ian Kelly

Thanks much! I'll note that for improvements. Any unicode symbol (that
is not a number) should be allowed as an identifier.

√ cannot be used in identifiers for the same reasons that + and ~
cannot: identifiers are intended to be alphanumeric. √ is not
currently the name of an operator, but who knows what may happen in
the future?

Python generally follows Annex 31 of the Unicode standard in this regard:

http://www.unicode.org/reports/tr31/
 

Skip Montanaro

I don't know if that's a good idea, but that's how it is in lisp/scheme.

Thus, "*" and "1+" are normal identifiers in lisp and scheme.

But parsing Lisp is pretty trivial.

Skip
 

Tim Chase

Thanks much! I'll note that for improvements. Any unicode
symbol (that is not a number) should be allowed as an identifier.

It's not just about number'ness:
True

-tkc
 

Cameron Simpson

I don't know if that's a good idea, but that's how it is in lisp/scheme.

I think it is a terrible idea. Doing that preemptively prevents
allowing them for any other purpose in the future, ever.

Identifiers are easy if you stick to the corresponding Unicode class.

Sucking in every other symbol prevents other uses later. Such as using the
square root symbol as a prefix operator. Etc.

Don't be too grabby with syntax; it leaves no room later for better syntax.

Cheers,
 

Ethan Furman

Thanks much! I'll note that for improvements. Any unicode symbol (that is not a number) should be allowed as an
identifier.

No, it shouldn't. Doing so would mean we could not use √ as the square root operator in the future.

Identifiers are made up of letters, numbers, and the underscore. Considering all the unicode letters and unicode
numbers out there, you shouldn't be lacking for names.
 

Steven D'Aprano

Thanks much! I'll note that for improvements. Any unicode symbol
(that is not a number) should be allowed as an identifier.


To quote a great Spaniard:

“You keep using that word, I do not think it means what you
think it means.”


Do you think that the ability to write this would be an improvement?

import ⌺
⌚ = ⌺.╩░
⑥ = 5*⌺.⋨⋩
⁹ = ⑥ - 1
♅⚕⚛ = [⌺.✱✳**⌺.⇐*⁹{⠪|⌚.∣} for ⠪ in ⌺.⣚]
⌺.˘˜¨´՛՜(♅⚕⚛)


Of course, it's not even necessary to be that exotic. "Any unicode symbol
that is not a number"... that means things like these:

x+y
spam.eggs
cheese["toast"]

would count as identifiers, which could lead to a little bit of parsing
ambiguity... *wink*

There are languages that can allow arbitrary symbols as identifiers, like
Lisp and Forth. You will note that they have a certain reputation for
being, um, different, and although both went through periods of
considerable popularity, both have faded in popularity since. While they
have their strengths, and their defenders, nobody argues that they are
readily accessible to the average programmer.
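
For a rough picture of why those three lines cannot be single identifiers, the standard `tokenize` module shows how Python currently splits them. This is only a sketch; `token_summary` is a hypothetical helper, and it keeps just the NAME and OP tokens.

```python
import io
import tokenize


def token_summary(source):
    """List (token type, text) pairs for the names and operators in source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokens
        if tok.type in (tokenize.NAME, tokenize.OP)
    ]


for line in ["x+y\n", "spam.eggs\n", 'cheese["toast"]\n']:
    print(line.strip(), token_summary(line))
```

Each line comes back as several tokens, with `+`, `.`, `[` and `]` reported as operators. If those characters were legal inside names, the tokenizer could no longer tell one name from an expression.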
 

Rustom Mody

On Tuesday, 25 March 2014 19:30:34 UTC+1, Mark H. Harris wrote:
S.isidentifier() -> bool
Return True if S is a valid identifier according
to the language definition.

Thanks jmf!
You obviously have more unicode knowledge than many (most?) of us here.
And when you contribute that knowledge in short-n-sweet form as above
it is helpful to all.
cf "unicode.org" doc

Ummm...
Less helpful here.
What/where do you expect someone to start reading?
If a python beginner asks some basic question and someone here were to say
"Go read up on http://python.org"
who is helped?
 

Terry Reedy

Greetings, I would like to create a lambda as follows:

A lambda is a function lacking a proper name.
√ = lambda n: sqrt(n)

This is discouraged in PEP 8. If the following worked,

def √(n): return sqrt(n)

would have √ as its __name__ attribute
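
The difference is easy to see with a name that is legal. In this small sketch `root` and `λ` are illustrative names chosen for the example; λ is accepted because it is a letter.

```python
from math import sqrt

# Binding a lambda to a name does not give the function that name.
root = lambda n: sqrt(n)


# A def statement records the name on the function object.
def λ(n):
    return sqrt(n)


print(root.__name__)
print(λ.__name__)
```

The lambda reports `<lambda>` as its `__name__`, which is what shows up in tracebacks and reprs, while the `def` version reports `λ`.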
 

MRAB

No, it shouldn't. Doing so would mean we could not use √ as the square root operator in the future.
Or as a root operator, e.g. 3 √ x (the cube root of x).
 

Antoon Pardon

No, it shouldn't. Doing so would mean we could not use √ as the
square root operator in the future.

And what advantage would that bring over just using it as a function?
 

Antoon Pardon

Or as a root operator, e.g. 3 √ x (the cube root of x).
Personally I would think such an operator is too limited to include in a programming language.
This kind of notation is only used with a constant to indicate what kind of root is taken and
only with positive integers. Something like the equivalent of the following I have never seen.

t = 2.5
x = 8.2
y = t √ x

Of course we don't have to follow mathematical convention with Python. However, allowing any
unicode symbol as an identifier doesn't prohibit using √ as an operator. We do have
"in" and "is" as operators now, even if they would otherwise be acceptable identifiers.
So I wonder, would you consider introducing log as an operator? 2 log x seems an interesting
operation for a programmer.
 

Ian Kelly

Personally I would think such an operator is too limited to include in a programming language.
This kind of notation is only used with a constant to indicate what kind of root is taken and
only with positive integers. Something like the equivalent of the following I have never seen.

t = 2.5
x = 8.2
y = t √ x

An example is taking the geometric mean of an arbitrary number of values:

product = functools.reduce(operator.mul, values, 1)
n = len(values)
geometric_mean = n √ product

I might argue though for the inverted syntax (product √ n) to more
closely parallel division.
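
Since `n √ product` is not valid Python, a runnable sketch of the same calculation has to spell the root as a fractional power. The function name `geometric_mean` and the sample values are assumptions made for illustration.

```python
import functools
import operator


def geometric_mean(values):
    """n-th root of the product of n values, written with ** instead of √."""
    product = functools.reduce(operator.mul, values, 1)
    n = len(values)
    # The hypothetical "n √ product" becomes product ** (1 / n).
    return product ** (1 / n)


print(geometric_mean([2, 8]))
```

An empty list raises ZeroDivisionError in this sketch. For real use, Python 3.8 and later ship `statistics.geometric_mean`.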

Of course we don't have to follow mathematical convention with Python. However, allowing any
unicode symbol as an identifier doesn't prohibit using √ as an operator. We do have
"in" and "is" as operators now, even if they would otherwise be acceptable identifiers.
So I wonder, would you consider introducing log as an operator? 2 log x seems an interesting
operation for a programmer.

If it's going to become an operator, then it has to be a keyword.
Changing a token that is currently allowed to be an identifier into a
keyword is generally avoided as much as possible, because it breaks
backward compatibility. "in" and "is" have both been keywords for a
very long time, perhaps since the initial release of Python.
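
The distinction between a keyword and an ordinary identifier can be checked with the standard `keyword` module. In this sketch `can_be_assigned` is a hypothetical helper that simply tries to compile an assignment to the given name.

```python
import keyword


def can_be_assigned(name):
    """Return True if 'name = 1' compiles, i.e. name is usable as a variable."""
    try:
        compile(f"{name} = 1", "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


for word in ("in", "is", "log", "√"):
    print(word, keyword.iskeyword(word), word.isidentifier(), can_be_assigned(word))
```

`in` and `is` have the shape of identifiers but are reserved, so assigning to them fails. `log` is an ordinary name today, which is why turning it into an operator would break existing code that uses it. √ fails for a different reason: it is not an identifier character at all.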
 