R
Raymond Hettinger
Comments are invited on the following proposed PEP.
Raymond Hettinger
-------------------------------------------------------
PEP: 329
Title: Treating Builtins as Constants in the Standard Library
Author: Raymond Hettinger <[email protected]>
Status: Pre-PEP
Abstract
========
This proposal is to add a private function for treating builtin
references as constants and to apply that function throughout
the standard library.
Motivation
==========
The library contains code such as _len=len which is intended to create
fast local references instead of a slower global lookups. Though
necessary for performance, these constructions clutter the code and
are usually incomplete (missing many opportunities).
If the proposal is adopted, those constructions could be eliminated
from the code base and at the same time improve upon their results
in terms of performance.
There are currently over a hundred instances of ``while 1`` in the
library. They were not replaced with the more readable ``while True``
because of performance reasons (the compiler cannot eliminate the
test because True is not known to always be a constant). Conversion
of True to a constant will clarify the code while retaining performance.
Many other basic Python operations run much slower because of global
lookups. In a try/except statement, the matching exception must be
re-looked up on every call (whether or not the exception is raised).
Similarly, simple identity tests such as ``while input is not None``
require the None variable to be re-looked up on every pass.
Builtin lookups are especially egregious because the enclosing
global scope must be checked first. These lookup chains devour cache
space that is best used elsewhere.
In short, if the proposal is adopted, the code will become cleaner
and performance will improve across the board.
Proposal
========
Add an internal module called _bind.py which contains two functions,
bind_builtins() and bind_builtins_and_globals().
For each module in the standard library, add a pair of lines near
the near of the code::
import _bind, sys
_bind.bind_builtins(sys.modules[__name__])
In a few select cases (such as sre_compile) where the module variables
are just a collection of constants, the globals will also be converted
to constants using bind_builtins_and_globals(). This will not be
applied anywhere there would be cause for a user to externally
change the value of one of the module variables.
Question and Answers
====================
Q. Will this make everyone divert their attention to optimization issues?
A. No, it is internal only (not user visible). Also, it reduces the need
to think about optimization because it is done automatically.
Q. What if the module defines a variable shadowing a builtin?
A. This does happen. For instance, True can be redefined at the module
level as `True = (1==1)`. The sample implementation below detects the
shadowing and leaves the global lookup unchanged.
Q. How do you know this works?
A. I implemented it, applied it to every module in library, and ran the
test suite without exception.
Q. Are you the first person to recognize that most global lookups are for
values that never change?
A. No, this has long been known. Skip provides an especially eloquent
explanation in [1]_.
Q. Why not make this a public module so that users can optimize builtins
or builtins and globals in their own code?
A. The ASPN recipe [2]_ is provided for this purpose. It takes a modicum
of skill to use correctly and not inadvertently tranform a non-constant.
Q. What if I have an overpowering urge to import a library module,
write a new shadowing builtin, and insert it into the module's namespace?
A. Your problems are less acute than with a module written in C
because you can always comment out the automatic binding.
Q. More realistically, what if I want to replace the builtins module and
supply my own implementations?
A. Either do this before importing a module, or just reload the module,
or disable _bind.py.
Q. How susceptible is this module to changes in Python's byte coding?
A. It imports opcode.py to protect against renumbering. Also, it uses
LOAD_CONST and LOAD_GLOBAL which are fundamental and have been around
forever. That notwithstanding, the coding scheme could change
dramatically and this implementation would have to change along with
modules like `dis` which also rely on the current coding scheme.
Sample Implementation
=====================
Here is a sample implementation for _bind.py::
from types import ClassType, FunctionType
from opcode import opmap, HAVE_ARGUMENT, EXTENDED_ARG
LOAD_GLOBAL, LOAD_CONST = opmap['LOAD_GLOBAL'], opmap['LOAD_CONST']
__all__ = ['bind_builtins', 'bind_builtins_and_globals']
def bind(f, builtin_only=True):
""" Return a new function with global references converted to constants.
If builtin_only is True (the default), will convert all builtin
references unless they have been overridden in the function's globals.
If builtin_only is False, will also convert other module variable
references into constants.
"""
import __builtin__
env = vars(__builtin__).copy()
if builtin_only:
stoplist = f.func_globals # do not replace overridden builtins
else:
stoplist = {}
env.update(f.func_globals)
co = f.func_code
newcode = map(ord, co.co_code)
newconsts = list(co.co_consts)
codelen = len(newcode)
i = 0
while i < codelen:
opcode = newcode
if opcode == EXTENDED_ARG:
return f # handle only the simple cases
if opcode == LOAD_GLOBAL:
oparg = newcode[i+1] + (newcode[i+2] << 8)
name = co.co_names[oparg]
if name in env and name not in stoplist:
value = env[name]
try:
pos = newconsts.index(value)
except ValueError:
pos = len(newconsts)
newconsts.append(value)
newcode = LOAD_CONST
newcode[i+1] = pos & 0xFF
newcode[i+2] = pos >> 8
i += 1
if opcode >= HAVE_ARGUMENT:
i += 2
codestr = ''.join(map(chr, newcode))
if codestr == co.co_code:
return f # no changes were found
codeobj = type(co)(co.co_argcount, co.co_nlocals, co.co_stacksize,
co.co_flags, codestr, tuple(newconsts), co.co_names,
co.co_varnames, co.co_filename, co.co_name,
co.co_firstlineno, co.co_lnotab, co.co_freevars,
co.co_cellvars)
return type(f)(codeobj, f.func_globals, f.func_name, f.func_defaults,
f.func_closure)
def bind_all(module_or_class, builtin_only=True):
"""Recursively apply bind() to functions in a module or class.
Use as the last line of the module (after everything is defined, but
before test code or lines defining shorthand references to functions.
If builtin_only is True (the default), will convert all builtin
references unless they have been overridden in the function's globals.
If builtin_only is False, will also convert other module variable
references into constants. This should only be used in modules
where the values of the module variables are truly constant.
"""
for k, v in vars(module_or_class).items():
if type(v) is FunctionType:
newv = bind(v)
setattr(module_or_class, k, newv)
elif type(v) in (type, ClassType):
bind_all(v, builtin_only)
import sys
bind_all(sys.modules[__name__], False) # Binder, bind thyself!
# Self-documenting API (eliminates the builtin_only switch)
bind_builtins = bind_all
bind_builtins_and_globals = lambda m: bind_all(m, False)
The final code should have some minor embellishments:
* Automatic detection of a non-CPython environment that does not have
bytecodes. In that situation, the bind functions would simply
return the original function unchanged. This assures that the two
line additions to library modules do not impact other implementations.
* Add a few lines to handle reading and writing of EXTENDED_ARG. While
this almost never arises in practice, it is best to have the code as
adaptable as possible.
References
==========
... [1] Optimizing Global Variable/Attribute Access
http://www.python.org/peps/pep-0266.html
... [2] ASPN Recipe for a non-private implementation
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/277940
Copyright
=========
This document has been placed in the public domain.
...
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
Raymond Hettinger
-------------------------------------------------------
PEP: 329
Title: Treating Builtins as Constants in the Standard Library
Author: Raymond Hettinger <[email protected]>
Status: Pre-PEP
Abstract
========
This proposal is to add a private function for treating builtin
references as constants and to apply that function throughout
the standard library.
Motivation
==========
The library contains code such as _len=len which is intended to create
fast local references instead of a slower global lookups. Though
necessary for performance, these constructions clutter the code and
are usually incomplete (missing many opportunities).
If the proposal is adopted, those constructions could be eliminated
from the code base and at the same time improve upon their results
in terms of performance.
There are currently over a hundred instances of ``while 1`` in the
library. They were not replaced with the more readable ``while True``
because of performance reasons (the compiler cannot eliminate the
test because True is not known to always be a constant). Conversion
of True to a constant will clarify the code while retaining performance.
Many other basic Python operations run much slower because of global
lookups. In a try/except statement, the matching exception must be
re-looked up on every call (whether or not the exception is raised).
Similarly, simple identity tests such as ``while input is not None``
require the None variable to be re-looked up on every pass.
Builtin lookups are especially egregious because the enclosing
global scope must be checked first. These lookup chains devour cache
space that is best used elsewhere.
In short, if the proposal is adopted, the code will become cleaner
and performance will improve across the board.
Proposal
========
Add an internal module called _bind.py which contains two functions,
bind_builtins() and bind_builtins_and_globals().
For each module in the standard library, add a pair of lines near
the near of the code::
import _bind, sys
_bind.bind_builtins(sys.modules[__name__])
In a few select cases (such as sre_compile) where the module variables
are just a collection of constants, the globals will also be converted
to constants using bind_builtins_and_globals(). This will not be
applied anywhere there would be cause for a user to externally
change the value of one of the module variables.
Question and Answers
====================
Q. Will this make everyone divert their attention to optimization issues?
A. No, it is internal only (not user visible). Also, it reduces the need
to think about optimization because it is done automatically.
Q. What if the module defines a variable shadowing a builtin?
A. This does happen. For instance, True can be redefined at the module
level as `True = (1==1)`. The sample implementation below detects the
shadowing and leaves the global lookup unchanged.
Q. How do you know this works?
A. I implemented it, applied it to every module in library, and ran the
test suite without exception.
Q. Are you the first person to recognize that most global lookups are for
values that never change?
A. No, this has long been known. Skip provides an especially eloquent
explanation in [1]_.
Q. Why not make this a public module so that users can optimize builtins
or builtins and globals in their own code?
A. The ASPN recipe [2]_ is provided for this purpose. It takes a modicum
of skill to use correctly and not inadvertently tranform a non-constant.
Q. What if I have an overpowering urge to import a library module,
write a new shadowing builtin, and insert it into the module's namespace?
A. Your problems are less acute than with a module written in C
because you can always comment out the automatic binding.
Q. More realistically, what if I want to replace the builtins module and
supply my own implementations?
A. Either do this before importing a module, or just reload the module,
or disable _bind.py.
Q. How susceptible is this module to changes in Python's byte coding?
A. It imports opcode.py to protect against renumbering. Also, it uses
LOAD_CONST and LOAD_GLOBAL which are fundamental and have been around
forever. That notwithstanding, the coding scheme could change
dramatically and this implementation would have to change along with
modules like `dis` which also rely on the current coding scheme.
Sample Implementation
=====================
Here is a sample implementation for _bind.py::
from types import ClassType, FunctionType
from opcode import opmap, HAVE_ARGUMENT, EXTENDED_ARG
LOAD_GLOBAL, LOAD_CONST = opmap['LOAD_GLOBAL'], opmap['LOAD_CONST']
__all__ = ['bind_builtins', 'bind_builtins_and_globals']
def bind(f, builtin_only=True):
""" Return a new function with global references converted to constants.
If builtin_only is True (the default), will convert all builtin
references unless they have been overridden in the function's globals.
If builtin_only is False, will also convert other module variable
references into constants.
"""
import __builtin__
env = vars(__builtin__).copy()
if builtin_only:
stoplist = f.func_globals # do not replace overridden builtins
else:
stoplist = {}
env.update(f.func_globals)
co = f.func_code
newcode = map(ord, co.co_code)
newconsts = list(co.co_consts)
codelen = len(newcode)
i = 0
while i < codelen:
opcode = newcode
if opcode == EXTENDED_ARG:
return f # handle only the simple cases
if opcode == LOAD_GLOBAL:
oparg = newcode[i+1] + (newcode[i+2] << 8)
name = co.co_names[oparg]
if name in env and name not in stoplist:
value = env[name]
try:
pos = newconsts.index(value)
except ValueError:
pos = len(newconsts)
newconsts.append(value)
newcode = LOAD_CONST
newcode[i+1] = pos & 0xFF
newcode[i+2] = pos >> 8
i += 1
if opcode >= HAVE_ARGUMENT:
i += 2
codestr = ''.join(map(chr, newcode))
if codestr == co.co_code:
return f # no changes were found
codeobj = type(co)(co.co_argcount, co.co_nlocals, co.co_stacksize,
co.co_flags, codestr, tuple(newconsts), co.co_names,
co.co_varnames, co.co_filename, co.co_name,
co.co_firstlineno, co.co_lnotab, co.co_freevars,
co.co_cellvars)
return type(f)(codeobj, f.func_globals, f.func_name, f.func_defaults,
f.func_closure)
def bind_all(module_or_class, builtin_only=True):
"""Recursively apply bind() to functions in a module or class.
Use as the last line of the module (after everything is defined, but
before test code or lines defining shorthand references to functions.
If builtin_only is True (the default), will convert all builtin
references unless they have been overridden in the function's globals.
If builtin_only is False, will also convert other module variable
references into constants. This should only be used in modules
where the values of the module variables are truly constant.
"""
for k, v in vars(module_or_class).items():
if type(v) is FunctionType:
newv = bind(v)
setattr(module_or_class, k, newv)
elif type(v) in (type, ClassType):
bind_all(v, builtin_only)
import sys
bind_all(sys.modules[__name__], False) # Binder, bind thyself!
# Self-documenting API (eliminates the builtin_only switch)
bind_builtins = bind_all
bind_builtins_and_globals = lambda m: bind_all(m, False)
The final code should have some minor embellishments:
* Automatic detection of a non-CPython environment that does not have
bytecodes. In that situation, the bind functions would simply
return the original function unchanged. This assures that the two
line additions to library modules do not impact other implementations.
* Add a few lines to handle reading and writing of EXTENDED_ARG. While
this almost never arises in practice, it is best to have the code as
adaptable as possible.
References
==========
... [1] Optimizing Global Variable/Attribute Access
http://www.python.org/peps/pep-0266.html
... [2] ASPN Recipe for a non-private implementation
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/277940
Copyright
=========
This document has been placed in the public domain.
...
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: