> I have this innocent and simple code:
> from collections import deque
> exhaust_iter = deque(maxlen=0).extend
At this point, exhaust_iter is another name for the bound instance method
"extend" of one specific deque instance.
Other implementations may do otherwise[1], but CPython optimizes built-in
methods and functions. E.g. they have no __dict__ so you can't add
attributes to them. When you look up exhaust_iter.__doc__, you are
actually looking up (type(exhaust_iter)).__doc__, which is a descriptor:
py> type(exhaust_iter).__doc__
<attribute '__doc__' of 'builtin_function_or_method' objects>
py> type(type(exhaust_iter).__doc__)
<class 'getset_descriptor'>
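The "no __dict__" claim is easy to check for yourself. This is a minimal sketch assuming CPython; the attribute name "note" is made up purely for the demonstration.

```python
from collections import deque

exhaust_iter = deque(maxlen=0).extend

# Built-in methods carry no per-object __dict__ (CPython behaviour).
has_dict = hasattr(exhaust_iter, "__dict__")

# "note" is a hypothetical attribute name, used only for illustration.
try:
    exhaust_iter.note = "anything"
except AttributeError as err:
    set_error = err
else:
    set_error = None

print(has_dict)         # False on CPython
print(repr(set_error))  # an AttributeError on CPython
```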
Confused yet? Don't worry, you will be...
So, looking up exhaust_iter.__doc__:
1) looks up '__doc__' on the class "builtin_function_or_method", not the
instance;
2) which looks up '__doc__' on the class __dict__:
py> type(exhaust_iter).__dict__['__doc__']
<attribute '__doc__' of 'builtin_function_or_method' objects>
3) This is a descriptor with __get__ and __set__ methods. Because the
actual method is written in C, you can't access its internals except via
the API: even the class __dict__ is not really a dict, it's a wrapper
around a dict:
py> type(type(exhaust_iter).__dict__)
<class 'mappingproxy'>
Anyway, we have a descriptor that returns the doc string:
py> descriptor = type(exhaust_iter).__doc__
py> descriptor.__get__(exhaust_iter)
'Extend the right side of the deque with elements from the iterable'
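The three lookup steps can be replayed by hand. This is only a sketch, assuming CPython: it pulls the raw descriptor out of the class __dict__ and invokes it manually, then compares the result with the ordinary attribute access.

```python
from collections import deque

exhaust_iter = deque(maxlen=0).extend
cls = type(exhaust_iter)  # builtin_function_or_method on CPython

# Step 2: fetch the raw descriptor from the class __dict__ (a mappingproxy).
descriptor = cls.__dict__["__doc__"]

# Step 3: invoke the descriptor protocol by hand.
manual = descriptor.__get__(exhaust_iter, cls)

# The ordinary attribute lookup ends up with the same string.
print(manual == exhaust_iter.__doc__)
print(hasattr(descriptor, "__get__"), hasattr(descriptor, "__set__"))
```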
My guess is that it is fetching this from some private C member, which
you can't get to from Python except via the descriptor. And you can't set
it:
py> descriptor.__set__(exhaust_iter, '')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: attribute '__doc__' of 'builtin_function_or_method'
objects is not writable
which is probably because, if you could write to it, it would change the
docstring of extend for *every* deque. And that would be bad.
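The same failure shows up with a plain assignment, which is what the original attempt did. A small sketch, again assuming CPython; TypeError is caught as well only in case another implementation reports the failure differently.

```python
from collections import deque

exhaust_iter = deque(maxlen=0).extend
original = exhaust_iter.__doc__

try:
    exhaust_iter.__doc__ = "Exhaust an iterator efficiently"
except (AttributeError, TypeError) as err:
    # CPython raises AttributeError: the attribute is not writable.
    failure = err
else:
    failure = None

print(repr(failure))
print(exhaust_iter.__doc__ == original)  # the docstring is unchanged
```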
If this were a pure-Python method, you could probably bypass the
descriptor, but it's a C-level built-in. I think you're out of luck.
I think the right solution here is the trivial:
def exhaust(it):
    """Doc string here."""
    deque(maxlen=0).extend(it)
which will be fast enough for all but the tightest inner loops. But if
you really care about optimizing this:
def factory():
    eatit = deque(maxlen=0).extend
    def exhaust_iter(it):
        """Doc string goes here"""
        eatit(it)
    return exhaust_iter
exhaust_iter = factory()
del factory
which will be about as efficient as you can get while still having a
custom docstring.
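For what it's worth, here is a self-contained sketch of the closure approach in use. The name make_exhaust and the docstring text are placeholders of mine; the generator with a side effect is only there to make the consumption visible.

```python
from collections import deque


def make_exhaust():
    # One shared zero-length deque: extend() consumes the iterable
    # and stores nothing.
    eatit = deque(maxlen=0).extend

    def exhaust_iter(it):
        """Exhaust an iterator efficiently."""
        eatit(it)

    return exhaust_iter


exhaust_iter = make_exhaust()

seen = []
gen = (seen.append(n) for n in range(3))  # side effect shows each step ran
exhaust_iter(gen)

print(seen)                  # [0, 1, 2]
print(next(gen, "done"))     # done
print(exhaust_iter.__doc__)  # Exhaust an iterator efficiently.
```

Because exhaust_iter is now an ordinary Python function, its __doc__ is a normal writable attribute and help() picks it up.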
But really, I'm having trouble understanding what sort of application
would have "run an iterator to exhaustion without doing anything with the
values" as the performance bottleneck.
> exhaust_iter.__doc__ = "Exhaust an iterator efficiently [...]"
> Obviously it does not work.
Even if it did work, it would not do what you hope. Because __doc__ is a
dunder attribute (double leading and trailing underscores), help()
currently looks it up on the class, not the instance:
class Spam:
    "Spam spam spam"
x = Spam()
help(x)
=> displays "Spam spam spam"
x.__doc__ = "Yummy spam"
help(x)
=> still displays "Spam spam spam"
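The sketch below shows why a class-level lookup never sees the new string. It does not call help() itself; it only shows the two places where a docstring can live.

```python
class Spam:
    "Spam spam spam"


x = Spam()
x.__doc__ = "Yummy spam"

# The assignment only creates an entry in the instance __dict__ ...
print(x.__dict__)       # {'__doc__': 'Yummy spam'}
# ... while the class docstring, which a class-level lookup sees, is untouched.
print(type(x).__doc__)  # Spam spam spam
```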
> Is there a way to get it to work simply and
> without creating a new scope (which would be a rather inefficient way
> to set documentation, and would hamper introspection)?
How about dropping the "simply" requirement?
I don't believe so.
[1] IronPython and Jython both currently do the same thing as CPython, so
even if this is not explicitly language-defined behaviour, it looks like
it may be de facto standard behaviour.