Dave Angel
Anyway, none of the calculations that have been given takes into account
the fact that names can be /less/ than one million characters long.
Not in *my* code they don't!!!
*wink*
The actual number of non-empty strings of length at most 1000000 characters,
that consist only of ascii letters, digits or underscores, and that
don't start with a digit, is
sum(53*63**i for i in range(1000000)) == 53*(63**1000000 - 1)//62
I take my hat off to you, sir, or possibly madam. That is truly an inspired
piece of pedantry.
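The closed form quoted above can be sanity-checked for small lengths. What follows is only a sketch: the helper names (count_by_sum, count_closed_form, count_brute_force) are made up for illustration, and the brute-force check leans on str.isidentifier(), which also accepts keywords such as "if", so it counts strings of the right shape rather than usable names.

```python
from itertools import product
import string

# ASCII-only alphabets, as in the calculation being checked.
FIRST = string.ascii_letters + "_"                  # 53 characters may start a name
REST = string.ascii_letters + string.digits + "_"   # 63 characters may follow

def count_by_sum(max_len):
    """Add up the number of names of each length 1..max_len."""
    return sum(len(FIRST) * len(REST)**i for i in range(max_len))

def count_closed_form(max_len):
    """Geometric-series closed form; the division by 62 is exact."""
    return 53 * (63**max_len - 1) // 62

def count_brute_force(max_len):
    """Enumerate every candidate string and ask str.isidentifier()."""
    total = 0
    for length in range(1, max_len + 1):
        for chars in product(REST, repeat=length):
            if "".join(chars).isidentifier():
                total += 1
    return total

# Brute force is only feasible for tiny lengths (63**3 is about 250000 strings).
for n in (1, 2, 3):
    assert count_by_sum(n) == count_closed_form(n) == count_brute_force(n)
```

For the full one-million-character limit only the closed form is practical; the sum works too, but it builds a million very large integers along the way.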
It's perhaps worth mentioning that some non-ascii characters are allowed
in identifiers in Python 3, though I don't know which ones.
PEP 3131 describes the rules:
http://www.python.org/dev/peps/pep-3131/
For example:
py> import unicodedata as ud
py> for c in 'éæ¥µ¿μЖᚃ‰⇄∞':
... print(c, ud.name(c), c.isidentifier(), ud.category(c))
...
é LATIN SMALL LETTER E WITH ACUTE True Ll
æ LATIN SMALL LETTER AE True Ll
¥ YEN SIGN False Sc
µ MICRO SIGN True Ll
¿ INVERTED QUESTION MARK False Po
μ GREEK SMALL LETTER MU True Ll
Ж CYRILLIC CAPITAL LETTER ZHE True Lu
ᚃ OGHAM LETTER FEARN True Lo
‰ PER MILLE SIGN False Po
⇄ RIGHTWARDS ARROW OVER LEFTWARDS ARROW False So
∞ INFINITY False Sm
The isidentifier() method will let you weed out the characters that
cannot start an identifier. But there are other groups of characters
that can appear after the starting "letter". So a more reasonable
sample might be something like:
py> import unicodedata as ud
py> for c in 'éæ¥µ¿μЖᚃ‰⇄∞':
... xc = "X" + c
... print(c, ud.name(c), xc.isidentifier(), ud.category(c))
...
In particular,
http://docs.python.org/3.3/reference/lexical_analysis.html#identifiers
has a definition for id_continue that includes several interesting
categories. I expected the non-ASCII digits, but there's other stuff
there too, like "nonspacing marks", that is surprising.
I'm pretty much speculating here, so please correct me if I'm way off.
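To make the start-versus-continue distinction concrete, here is a small sketch along the same lines as the samples above. The helper names can_start and can_continue are invented for this example, and the sample characters are my own picks (an ASCII digit, a non-ASCII digit, a nonspacing mark, a connector punctuation character, and a math symbol), not anything taken from the docs.

```python
import unicodedata as ud

def can_start(c):
    """True if the single character c is a valid identifier on its own."""
    return c.isidentifier()

def can_continue(c):
    """True if c is accepted after a known-good starting letter."""
    return ("X" + c).isidentifier()

samples = [
    "7",        # ASCII digit (category Nd)
    "\u0663",   # ARABIC-INDIC DIGIT THREE (Nd)
    "\u0301",   # COMBINING ACUTE ACCENT, a nonspacing mark (Mn)
    "\u203f",   # UNDERTIE, connector punctuation (Pc)
    "\u221e",   # INFINITY (Sm), allowed in neither position
]

for c in samples:
    print(f"U+{ord(c):04X}", ud.name(c), ud.category(c),
          can_start(c), can_continue(c))
```

The digits, the combining accent and the undertie should all be rejected as a first character but accepted after the "X", which matches the id_continue categories listed in the lexical analysis docs.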