Commonly-used names in the Python standard library

Chris Angelico

In working on a proposal that might result in the creation of a new
keyword, I needed to ascertain what names were used extensively in
existing Python code. Out of random curiosity, I asked my script what
names were the most commonly used. The script responded with 21854
names and a total of 297164 references, averaging 13-14 refs per name.

A good number of names are referenced just once - set and never
referenced. They're there for applications to use. That takes out 6217
names. But I'm more interested in the ones that see a lot of use.
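
Given a Counter of name references (however it was gathered), the headline figures above take only a few lines to compute. This is a minimal sketch. summarise is a hypothetical helper, and "seen exactly once" is only an approximation of "set and never referenced", because a name that appears once might equally be read once and never set in that file.

```python
from collections import Counter
from typing import Any, Dict

def summarise(counts: Counter) -> Dict[str, Any]:
    """Headline statistics for a Counter mapping name -> number of references."""
    total = sum(counts.values())
    distinct = len(counts)
    # Names seen exactly once: a rough proxy for "set and never referenced".
    singletons = sum(1 for n in counts.values() if n == 1)
    return {
        "distinct": distinct,
        "references": total,
        "mean_refs": total / distinct if distinct else 0.0,
        "singletons": singletons,
        "top": counts.most_common(3),
    }
```

With the thread's figures, 297164 references over 21854 names gives a mean of about 13.6, which matches the "13-14 refs per name" quoted above.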

By far the most common name is 'self', for obvious reasons; after
that, it's a fairly steady drop-off. The most popular names in the
standard library are... *drumroll*

45298 - self
3750 - os
3744 - name
3166 - i
3140 - s
2685 - value
2648 - a
2451 - len
2348 - c
2331 - sys
2255 - b
2238 - line
2132 - print
2131 - x

Few surprises there. The os and sys modules are used extensively, and
short variable names are reused frequently. To the print-detractors:
That's two thousand uses in the standard library, more than any other
single function bar 'len'! (And by the way, this is ignoring any file
with /test/ in the name.)
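
Reproducing the "/test/" exclusion when gathering files is simple. This is a minimal sketch, not code from Chris's script. It assumes the scan starts from a standard library root directory that you pass in, and it applies the check to the path relative to that root, so a root that itself sits under some test directory isn't excluded wholesale. iter_stdlib_sources is a hypothetical name.

```python
from pathlib import Path
from typing import Iterator

def iter_stdlib_sources(root: str) -> Iterator[Path]:
    """Yield .py files under root, skipping any with /test/ in their path."""
    base = Path(root)
    for path in sorted(base.rglob("*.py")):
        # Check the path relative to the root, with forward slashes and a
        # leading "/", so a top-level "test/foo.py" is caught as well.
        relative = "/" + path.relative_to(base).as_posix()
        if "/test/" in relative:
            continue
        yield path
```

On a real installation, sysconfig.get_paths()["stdlib"] gives the standard library directory to pass in.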

I find the pairing of 'name' and 'value' interesting. There are 40%
more names than values in Python, apparently. And on that bombshell,
as they say on Top Gear, it's time to end!

ChrisA
 
Marko Rauhamaa

Chris Angelico said:
In working on a proposal that might result in the creation of a new
keyword,

I'm looking forward to the day when every application can add its own
keywords as is customary in Lisp.
I needed to ascertain what names were used extensively in existing
Python code

One advantage of Perl is that names and keywords are in separate
namespaces so introducing new keywords is easy.

Should Python have something like:

from py35 import *

That way, you could choose between:

unless x > 7:
    return

and:

py35.unless x > 7:
    return

in case you have already made use of the name "unless" in your program.


Marko
 
Chris Angelico

from py35 import *

That way, you could choose between:

unless x > 7:
    return

and:

py35.unless x > 7:
    return

in case you have already made use of the name "unless" in your program.

What about return? Are you allowed to namespace that? And 'from' and
'import' and '*'?

In languages with keywords, they're there to signal things to the
parser. There are languages that have no keywords at all, like REXX,
and their grammars are usually restricted to non-alphabetic tokens
(for instance, REXX has & and | instead of "and" and "or"). Python
already has most of its important names in either builtins (which can
be shadowed) or actual modules, so it's only actual language keywords
that can't be reused; and there aren't all that many of those. But
more can be created, and it's worth being careful.
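
You can check the difference between a shadowable builtin and a real keyword directly: keyword.iskeyword reports the reserved words, and compile shows which names can be assigned to. This is a minimal sketch, and can_rebind is a hypothetical helper.

```python
import builtins
import keyword

def can_rebind(name: str) -> bool:
    """Return True if 'name = None' compiles, i.e. the name is not reserved."""
    try:
        compile(name + " = None", "<check>", "exec")
    except SyntaxError:
        return False
    return True

for name in ("len", "print", "unless", "with"):
    print(name,
          "builtin" if hasattr(builtins, name) else "not builtin",
          "keyword" if keyword.iskeyword(name) else "not keyword",
          "rebindable" if can_rebind(name) else "reserved")
```

On current Python 3 releases keyword.kwlist has only a few dozen entries, which bears out "there aren't all that many of those".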

In this instance, various proposals included "then", "when", "use",
and "raises". My script reported the following:

1 instances of the name 'use'
12 instances of the name 'when'

and none of either of the others. Granted, the stdlib isn't
everything, and isn't even reliably representative, but that supported
my gut feeling that keywording 'when' would be likely to trip code up.
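
A clash check for specific candidate words might look like the following. This is a sketch under stated assumptions. candidate_clashes is a hypothetical helper, not part of Chris's script, and it counts plain names, function parameters, attribute names and keyword arguments, because each of those would become a syntax error if the word turned into a keyword. The real script may count a different set of node types.

```python
import ast
from collections import Counter
from typing import Iterable

# The spellings floated for the PEP 463 syntax, as listed above.
CANDIDATES = ("then", "when", "use", "raises")

def candidate_clashes(source: str, candidates: Iterable[str] = CANDIDATES) -> Counter:
    """Count places where a candidate keyword is already used as an identifier."""
    wanted = set(candidates)
    hits: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            name = node.id        # plain variable use or assignment
        elif isinstance(node, ast.arg):
            name = node.arg       # function parameter
        elif isinstance(node, ast.Attribute):
            name = node.attr      # obj.when would break too
        elif isinstance(node, ast.keyword):
            name = node.arg       # f(when=...); None for **kwargs
        else:
            continue
        if name in wanted:
            hits[name] += 1
    return hits
```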

If you're curious about the full proposal, it's PEP 463, an expression
form of the 'except' statement. The latest draft PEP can be found
here:

https://raw2.github.com/Rosuav/ExceptExpr/master/pep-0463.txt

and the official repo (currently out of date, but later on will be the
official and maintained version) has it here:

http://www.python.org/dev/peps/pep-0463/

ChrisA
 
Marko Rauhamaa

Chris Angelico said:
What about return? Are you allowed to namespace that? And 'from' and
'import' and '*'?

Old keywords are guaranteed not to clash with programs. Introducing new
keywords runs that risk. Hence, C had to introduce the ugly _Bool
keyword.
If you're curious about the full proposal, it's PEP 463, an expression
form of the 'except' statement. The latest draft PEP can be found
here:

https://raw2.github.com/Rosuav/ExceptExpr/master/pep-0463.txt

A coworker pointed out that the gist of the PEP has already been
implemented by <URL: https://pypi.python.org/pypi/fuckit>.


Marko
 
Chris Angelico

Old keywords are guaranteed not to clash with programs. Introducing new
keywords runs that risk. Hence, C had to introduce the ugly _Bool
keyword.

Okay, so what you're saying is that there are three states:

* Before Python X.Y, the unless keyword simply doesn't exist. (It can't
be coded in as a module, so it can't exist until someone implements
the code.)

* From X.Y, it can be called up by importing it from "pyAB" and used in
its namespace.

* From A.B onward, it always exists.

Python has a facility like this. It doesn't namespace the keywords,
but it does let you choose whether to have them or not. In Python 2.5,
you could type "from __future__ import with_statement" to turn 'with'
into a keyword. From Python 2.6 onward, it's always a keyword.

ChrisA
 
Marko Rauhamaa

Chris Angelico said:
Python has a facility like this. It doesn't namespace the keywords,
but it does let you choose whether to have them or not. In Python 2.5,
you could type "from __future__ import with_statement" to turn 'with'
into a keyword. From Python 2.6 onward, it's always a keyword.

That certainly softens the blow but might still cause unnecessary
suffering when maintaining/resurrecting legacy Python code.

How about blocking the introduction of new keywords for ever except if
you specify:

from __py35__ import syntax

Eventually, every Python module would likely begin with a statement like
that, and it would document the assumption more clearly than __future__.


Marko
 
Chris Angelico

How about blocking the introduction of new keywords for ever except if
you specify:

from __py35__ import syntax

Eventually, every Python module would likely begin with a statement like
that, and it would document the assumption more clearly than __future__.

It's more self-documenting with the __future__ directive, because it
says *what* syntax you're importing from the future. And at some
point, the new keywords must just become standard. There's no point
polluting every Python script forever with these directives, and no
point maintaining two branches of code in the interpreter.

ChrisA
 
Marko Rauhamaa

Chris Angelico said:
It's more self-documenting with the __future__ directive, because it
says *what* syntax you're importing from the future.

As a developer, I will probably want to state the Python dialect that
was used to write the module. Each dialect comes with hundreds of
features. I don't want to list them individually (even if I could).
And at some point, the new keywords must just become standard.

That's an explicit program of destroying backwards-compatibility: a war
on legacy code. That may be the Python way, but it's not a necessary
strategy.
There's no point polluting every Python script forever with these
directives, and no point maintaining two branches of code in the
interpreter.

Two branches? I would imagine there would be dozens of "branches" in the
interpreter if the latest interpreter were to support all past Python
dialects (as it should, IMO).


Marko
 
Chris Angelico

That's an explicit program of destroying backwards-compatibility: a war
on legacy code. That may be the Python way, but it's not a necessary
strategy.


Two branches? I would imagine there would be dozens of "branches" in the
interpreter if the latest interpreter were to support all past Python
dialects (as it should, IMO).

Indeed. If the interpreter were to include every dialect of "old
Python", then it would have a lot more than two branches. They would,
in fact, increase exponentially with every Python version.
Fortunately, there is an alternative. You can specify the version of
Python like this:

#!/usr/local/bin/python3.4

or any of several other ways. You then choose exactly which versions
of Python to have installed, and continue to use them for as long as
you wish. There's no reason for the 3.4 interpreter to be able to run
code "as if it were" the 3.1 interpreter, when you can just have the
3.1 interpreter itself right there.
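
One of the "several other ways" is a runtime check against sys.version_info. This is a minimal sketch built around a hypothetical require_python helper, and it differs from the shebang approach. It enforces a minimum version instead of pinning one interpreter, and it only helps if the older interpreter can still parse the file. For that reason it avoids f-strings and annotations.

```python
import sys

def require_python(minimum):
    """Hypothetical helper: stop early if the interpreter is older than minimum.

    Written without f-strings or annotations so that an older interpreter can
    at least parse the file and reach this check.
    """
    if sys.version_info < minimum:
        wanted = ".".join(str(part) for part in minimum)
        found = ".".join(str(part) for part in sys.version_info[:3])
        raise SystemExit("This script needs Python %s or later (found %s)"
                         % (wanted, found))

require_python((3, 4))
```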

ChrisA
 
Marko Rauhamaa

Chris Angelico said:
If the interpreter were to include every dialect of "old Python", then
it would have a lot more than two branches. They would, in fact,
increase exponentially with every Python version.

It shouldn't be *that bad*; the Linux kernel is grappling with the glut
of system calls, but they are managing it reasonably well. I don't see
why Python, especially at this mature stage, couldn't adopt a similar
stance *going forward*.

In fact, not every syntax change requires special
backwards-compatibility treatment in the compiler. Constructs that used
to be illegal might become legal (say, try-except-finally). They don't
require any attention. Even new keywords have a very small impact on the
parser; it should be a simple matter of enabling dictionary entries.
Fortunately, there is an alternative. You can specify the version of
Python like this:

#!/usr/local/bin/python3.4

Well,

* you won't be finding old Python versions on newer operating system
distributions,

* even <URL: http://www.python.org/downloads/> isn't all that extensive
and

* the program may import modules that were written in different Python
dialects.


Marko
 
Chris Angelico

* you won't be finding old Python versions on newer operating system
distributions,

* even <URL: http://www.python.org/downloads/> isn't all that extensive
and

* the program may import modules that were written in different Python
dialects.

You can always build your own Python, if it really matters... but more
likely, if you care about old versions, you actually care about *one
specific old version* which your program uses. That's why Red Hat
still supports Python 2.4 and, I think, 2.3. You can't randomly pick
up 2.2 or 1.5, but if you want 2.4, you can keep on using that for as
long as this RHEL is supported.

As to importing modules written for other versions... that can be a
major problem. Often the new keywords come with new functionality.
Take string exceptions, for instance. Say you import a module that was
written for a version that still supported them - if it raises a
string, you can't catch it. There is a limit to how far the
compatibility can be taken. Also, what happens if two modules (one of
which might be your script) written for different versions both import
some third module? Should they get different versions, based on what
version tags they use themselves? Compatibility can't be changed that
easily. You either run on the new version, or run on the old. Not
both.
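
A modern interpreter shows the string-exception problem directly. In this sketch, where the function name raise_string is only for illustration, old-style code still compiles, but Python 3 refuses to raise a plain string and raises TypeError instead. A module that still relies on string exceptions therefore can't interoperate with code that expects exception classes.

```python
def raise_string():
    # Old Python 2 style: raising a bare string instead of an exception instance.
    raise "connection failed"

try:
    raise_string()
except TypeError as exc:
    # Python 3 turns the attempt itself into a TypeError.
    print("TypeError:", exc)
```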

ChrisA
 
Marko Rauhamaa

Chris Angelico said:
Also, what happens if two modules (one of which might be your script)
written for different versions both import some third module? Should
they get different versions, based on what version tags they use
themselves? Compatibility can't be changed that easily. You either run
on the new version, or run on the old. Not both.

Shared C libraries face the exact same issue. Java seems pretty good on
this front as well. When there is a will, there is a way.


Marko
 
Chris Angelico

Shared C libraries face the exact same issue. Java seems pretty good on
this front as well. When there is a will, there is a way.

Shared C libraries usually do it by linking against a particular
version. That's why you often need to keep multiple versions around.
Once it's all binary code, there's no more compatibility question - it
all runs on the same CPU. With Python code, the module's written to
run on a particular interpreter, and that can't just switch around -
it's like the weird and wonderful life I enjoyed as 32-bit computing
started coming along, and I wanted to call on code that used the other
word length...

ChrisA
 
Steven D'Aprano

That certainly softens the blow but might still cause unnecessary
suffering when maintaining/resurrecting legacy Python code.

How about blocking the introduction of new keywords for ever except if
you specify:

from __py35__ import syntax

Eventually, every Python module would likely begin with a statement like
that, and it would document the assumption more clearly than __future__.

What *actual* problem is this supposed to solve? Do you often find that
Python has introduced new keywords, breaking your code?
 
Steven D'Aprano

I would imagine there would be dozens of "branches" in the interpreter
if the latest interpreter were to support all past Python dialects (as
it should, IMO).

Well thank goodness you're not in charge of Python's future development.
That way leads to madness: madness for the core developers (if you think
maintaining Python 2 and 3 branches is hard, imagine maintaining *dozens*
of them, *forever*), madness for the programmers using the language, and
madness for anyone trying to learn the language. It's hard enough for
newbies to deal with *two* dialects, 2 and 3. And you want to introduce
dozens. Thanks, but no thanks.
 
Steven D'Aprano

In working on a proposal that might result in the creation of a new
keyword, I needed to ascertain what names were used extensively in
existing Python code.

I would love to steal^W see your script for doing this :)
 
Steven D'Aprano

I'm looking forward to the day when every application can add its own
keywords as is customary in Lisp.

And what a wonderful day that will be! Reading any piece of code you
didn't write yourself -- or wrote a long time ago -- will be an
adventure! Every script will have its own exciting new set of keywords
doing who knows what, which makes every script nearly its own language!
Oh joy, I cannot wait!

That's sarcasm, by the way.

One advantage of Perl is that names and keywords are in separate
namespaces so introducing new keywords is easy.

Then I can write code like:

for for in in:
    while while:
        if if:
            raise raise

which will go a long way to ensuring that my code is as hostile and
unreadable as possible.


(Sometimes, less can be more. That's especially true of programming
languages.)
 
Chris Angelico

I would love to steal^W see your script for doing this :)

No probs! It's part of my ancillary stuff for the PEP 463 research:

https://github.com/Rosuav/ExceptExpr/blob/master/find_except_expr.py

It basically just runs over one file at a time, parses it into an AST,
and walks the tree. Pretty simple.
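
A stripped-down version of that approach might look like the following. It is a sketch and not find_except_expr.py itself. count_names is a hypothetical name, and the function only counts bare ast.Name nodes, so attribute names (the 'sep' in os.sep), function parameters and import targets are left out. The real script may count differently, which would change the totals.

```python
import ast
from collections import Counter

def count_names(source: str, filename: str = "<unknown>") -> Counter:
    """Count every bare name reference (loads, stores and deletes) in source."""
    counts: Counter = Counter()
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            counts[node.id] += 1
    return counts
```

Feeding it each file's text (for example path.read_text(encoding="utf-8"), assuming UTF-8 sources) and adding the resulting Counters together gives whole-library totals. Counter.most_common(14) then produces a list in the style of the one at the top of the thread.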

Actually, some of these sorts of things might make neat examples of
what can be done with the ast module. Until this week, I had no idea
how easy it was to analyze Python code this way.
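
As one more example of this kind of analysis, here is a hypothetical visitor in the spirit of the PEP 463 research. It flags try statements that have a one-statement body and a single one-statement handler, which is the shape an except expression would most obviously replace. The class name and the criteria are assumptions for illustration and are not taken from the actual script.

```python
import ast

class TryCandidateFinder(ast.NodeVisitor):
    """Collect line numbers of try statements shaped like
    'try: <one statement> / except X: <one statement>' with no else/finally.
    (try/except* blocks parse as ast.TryStar on 3.11+ and are not visited.)
    """

    def __init__(self):
        self.lines = []

    def visit_Try(self, node):
        if (len(node.body) == 1 and len(node.handlers) == 1
                and len(node.handlers[0].body) == 1
                and not node.orelse and not node.finalbody):
            self.lines.append(node.lineno)
        self.generic_visit(node)  # keep looking inside nested blocks
```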

ChrisA
 
Chris Angelico

Then I can write code like:

for for in in:
    while while:
        if if:
            raise raise

which will go a long way to ensuring that my code is as hostile and
unreadable as possible.

REXX allows that. Most people wouldn't use classic keywords like 'if',
as that'll only cause confusion (although "if if then then; else else"
is legal), but some of the other keywords are useful in other
contexts. The main advantage is that, for instance, the PARSE command
can freely use keywords:

PARSE VAR x blah blah
PARSE VALUE linein(blah) WITH blah blah

All those words (parse, var, value, with) are keywords - in that
context. But I can happily use "var" and "value" elsewhere, and will
do so. Python, on the other hand, has to be more careful; so you see
things like "cls" instead of "class", or "import_" and so on, with the
trailing underscore.
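
The keyword module can apply the trailing-underscore convention mechanically. PEP 8 recommends this convention for avoiding clashes with keywords. safe_identifier is a hypothetical helper, of the sort a code generator might use to turn arbitrary words into attribute names.

```python
import keyword

def safe_identifier(name: str) -> str:
    """Append a trailing underscore if name is a reserved keyword."""
    return name + "_" if keyword.iskeyword(name) else name
```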

Trade-offs.

ChrisA
 
