Can someone explain this weakref behavior?

Michael Kent

The Python 2.3.4 docs about weakref say:
Not all objects can be weakly referenced; those objects which can
include class instances, functions written in Python (but not in C),
and methods (both bound and unbound).

I've been unable to get a bound method to work as the key in a
WeakKeyDictionary. A class instance object works fine as a key, but
a method of that same instance object does not.
Here's some code, in a file named test_weakref.py:

#! /usr/bin/env python

import unittest
import weakref

class someClass(object):
    def aMethod(self):
        print "Hi!"

class TestCase_01_weakref(unittest.TestCase):

    def test_01_simple(self):

        obj1 = someClass()
        obj2 = someClass()
        wkd = weakref.WeakKeyDictionary()

        wkd[obj1] = 1
        self.assertEqual(len(wkd), 1)

        wkd[obj1.aMethod] = 1
        self.assertEqual(len(wkd), 2)

        wkd[obj2.aMethod] = 1
        self.assertEqual(len(wkd), 3)


if __name__ == "__main__":
    unittest.main()

And here's the output:

./test_weakref.py
F
======================================================================
FAIL: test_01_simple (__main__.TestCase_01_weakref)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_weakref.py", line 22, in test_01_simple
    self.assertEqual(len(wkd), 2)
  File "/usr/local/lib/python2.3/unittest.py", line 302, in failUnlessEqual
    raise self.failureException, \
AssertionError: 1 != 2

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (failures=1)

It is acting as though a bound method is silently not allowed as the
key in a WeakKeyDictionary. Can someone set me straight?
 
Peter Otten

Michael said:
It is acting as though a bound method is silently not allowed as the
key in a WeakKeyDictionary. Can someone set me straight?

Each access to obj1.aMethod creates a new bound method object. You need a
(strong) reference to the bound methods in order to keep them from being
garbage-collected (and therefore removed from the WeakKeyDictionary). Not
keeping the referenced keys alive is pretty much the WeakKeyDictionary's
raison d'être. A modified

    def test_01_simple(self):

        obj1 = someClass()
        obj2 = someClass()
        wkd = weakref.WeakKeyDictionary()
        m1 = obj1.aMethod
        m2 = obj2.aMethod

        wkd[obj1] = 1
        self.assertEqual(len(wkd), 1)
        wkd[m1] = 1
        self.assertEqual(len(wkd), 2)
        wkd[m2] = 1
        self.assertEqual(len(wkd), 3)

should complete without failure.

Peter
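
Peter's point can be reproduced outside unittest. The following is a minimal sketch, assuming Python 3 on CPython (the thread itself uses Python 2.3); the names SomeClass, a_method, len_temporary and len_held are illustrative and not from the original posts. weakref.WeakMethod, available since Python 3.4, postdates this thread and is shown only as one possible alternative.

```python
import weakref


class SomeClass:
    def a_method(self):
        return "Hi!"


obj = SomeClass()
wkd = weakref.WeakKeyDictionary()

# Every attribute access creates a new bound-method object.
distinct = obj.a_method is not obj.a_method

# The temporary bound method has no other strong reference, so on CPython
# it is freed as soon as this statement finishes and its entry vanishes.
wkd[obj.a_method] = 1
len_temporary = len(wkd)

# Holding a strong reference keeps the key (and its entry) alive.
m = obj.a_method
wkd[m] = 1
len_held = len(wkd)

# WeakMethod (Python 3.4+) refers weakly to the instance and the function
# and rebuilds the bound method on demand while the instance is alive.
wm = weakref.WeakMethod(obj.a_method)
still_callable = wm() is not None
```

On other implementations the temporary entry may linger until the collector runs, so len_temporary is only predictable under reference counting.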
 
David MacQuigg

Tim Peters said:
That will pass under CPython today, but there's no general guarantee about
exactly when a weak dict will notice that keys (or values) have become
unreachable by strong references.

OUCH!! We just built a module that uses weak references to keep a
"robust count" of instances in various classes. The claim is that
this is more robust than simply incrementing and decrementing class
variables using __init__ and __del__. The module seems to be working
OK, immediately deleting the weak reference as soon as all references
to the corresponding instance are deleted.

If I understand you correctly, there is some chance that a future
implementation of Python may have the weak references "out-of-sync"
with the actual count of live instances. Is that a remote
possibility, or something quite likely to occur? I have to decide now
whether to rip out some risky code.

Is there a good way to track the count of instances? If not, would it
make sense to request a guarantee on the current behavior of weak
references? Maybe it could be an option, assuming there is some
performance penalty, an option to be used when accuracy is more
important than speed.

-- Dave
 
David MacQuigg

[David MacQuigg, earlier]
OUCH!! We just built a module that uses weak references to keep a
"robust count" of instances in various classes. The claim is that this
is more robust than simply incrementing and decrementing class variables
using __init__ and __del__. The module seems to be working OK,
immediately deleting the weak reference as soon as all references to the
corresponding instance are deleted.

[Tim Peters]
Then it must be the case that you're running CPython, and that these
instances aren't involved in cycles. Because CPython primarily uses
reference-counting to recycle garbage, its behavior is predictable in the
absence of cycles.

[David MacQuigg]
I'm not worried about cyclic references, but it is something to keep
in mind.

[Tim Peters]
CPython's use of reference counting is an implementation detail, and that
internal weakref lists are traversed "immediately" upon an object's refcount
reaching 0 is also an implementation detail. Nothing in the language
definition guarantees these behaviors.


Well, it's been the case for a long time in JPython. I judge the odds of it
changing in CPython as slim. I personally wouldn't worry about it ever
changing in CPython. If a PyPy- or Parrot-based implementation of Python
takes off, behavior will depend on its implementation details.


If you want it enough, you can build Python in a mode that tracks this
automatically (see the discussion of COUNT_ALLOCS in Misc/SpecialBuilds.txt
-- for each distinct type object, the total # of allocations, # of
deallocations, and highwater mark (max(#alloc - #dealloc) over time) are
maintained in a COUNT_ALLOCS build).


You can request anything you can dream up <wink>. If it's something your
business needs, the only way to guarantee it is to get involved in Python
development deeply enough so that, if worse comes to worse, you can maintain
your own Python implementation. That's unreasonably paranoid in my
estimation, but it's a judgment call.

[David MacQuigg]
Seems like we could do this more easily with a function that lists
instances, like __subclasses__() does with subclasses. This doesn't
have to be efficient, just reliable. So when I call
cls.__instances__(), I get a current list of all instances in the
class.

Maybe we could implement this function using weak references. If I
understand the problem with weak references, we could have a
WeakValueDictionary with references to objects that actually have a
refcount of zero. There may be too many entries in the dictionary,
but never too few. In that case, maybe I could just loop over every
item in my WeakValueDictionary, and ignore any with a refcount of
zero.

def _getInstances(cls):
    d1 = cls.__dict__.get('_instances', {})
    d2 = {}
    for key in d1:
        if sys.getrefcount(d1[key]) > 0:
            d2[key] = d1[key]
    return d2
_getInstances = staticmethod(_getInstances)

I'm making some assumptions here that may not be valid, like
sys.getrefcount() for a particular object really will be zero
immediately after all normal references to it are gone. i.e. we don't
have any temporary "out-of-sync" problems like with the weak
references themselves.

Does this seem like a safe strategy?

-- Dave
 
David MacQuigg

[David MacQuigg, trying to keep track of how many instances of a class
currently exist]

...
Seems like we could do this more easily with a function that lists
instances, like __subclasses__() does with subclasses. This doesn't have
to be efficient, just reliable. So when I call cls.__instances__(), I
get a current list of all instances in the class.

Maybe we could implement this function using weak references. If I
understand the problem with weak references, we could have a
WeakValueDictionary with references to objects that actually have a
refcount of zero.

[Tim Peters]
Not in CPython today (and in the presence of cycles, the refcount on an
object isn't related to whether it's garbage).

[David MacQuigg, earlier]
There may be too many entries in the dictionary, but never too few.

[Tim Peters]
Right!

[David MacQuigg, earlier]
In that case, maybe I could just loop over every item in
my WeakValueDictionary, and ignore any with a refcount of zero.

def _getInstances(cls):
    d1 = cls.__dict__.get('_instances', {})
    d2 = {}
    for key in d1:
        if sys.getrefcount(d1[key]) > 0:
            d2[key] = d1[key]
    return d2
_getInstances = staticmethod(_getInstances)

I'm making some assumptions here that may not be valid, like
sys.getrefcount() for a particular object really will be zero immediately
after all normal references to it are gone. i.e. we don't have any
temporary "out-of-sync" problems like with the weak references
themselves.

Does this seem like a safe strategy?

[Tim Peters]
An implementation of Python that doesn't base its garbage collection
strategy on reference counting won't *have* a getrefcount() function, so if
you're trying to guard against Python switching gc strategies, this is a
non-starter (it solves the problem for, and only for, implementations of
Python that don't have the problem to begin with <wink>).

Note that CPython's getrefcount() can't return 0 (see the docs). Maybe
comparing against 1 would capture your intent.
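
A small sketch of why the `> 0` test never filters anything, assuming CPython (other implementations may not provide sys.getrefcount() at all). The names probe, naive_filter, data and kept are illustrative only.

```python
import sys

probe = object()

# To be passed to getrefcount() at all, the object must still be referenced,
# so the reported count is never 0 for a live object.
count = sys.getrefcount(probe)


def naive_filter(mapping):
    # Mirrors the `> 0` check from the post: every entry passes.
    return {k: v for k, v in mapping.items() if sys.getrefcount(v) > 0}


data = {"a": object(), "b": object()}
kept = naive_filter(data)
```

The filtered dictionary always has the same keys as the input, which is why the check cannot tell live entries from stale ones.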

Note this part of the weakref docs:

NOTE: Caution: Because a WeakValueDictionary is built on top of a Python
dictionary, it must not change size when iterating over it. This can be
difficult to ensure for a WeakValueDictionary because actions performed by
the program during iteration may cause items in the dictionary to vanish
"by magic" (as a side effect of garbage collection).

If you have threads too, it can be worse than just that.

Bottom line: if you want semantics that depend on the implementation using
refcounts, you can't worm around that. Refcounts are the only way to know
"right away" when an object has become trash, and even that doesn't work in
the presence of cycles. Short of that, you can settle for an upper bound on
the # of objects "really still alive" across implementations by using weak
dicts, and you can increase the likely precision of that upper bound by
forcing a run of garbage collection immediately before asking for the
number. In the absence of cycles, none of that is necessary in CPython
today (or likely ever).
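
The "upper bound plus forced collection" approach described above might look like the following. This is a minimal sketch assuming Python 3 on CPython; the class Counted, its _live registry and the live_count method are hypothetical names, and weakref.WeakSet is used here in place of the weak dictionaries mentioned in the thread.

```python
import gc
import weakref


class Counted:
    # Weak references only: the registry does not keep instances alive.
    _live = weakref.WeakSet()

    def __init__(self):
        Counted._live.add(self)

    @classmethod
    def live_count(cls, collect=True):
        # Running the collector first clears unreachable cycles, which
        # tightens the upper bound. Without reference counting the result
        # is still only an upper bound.
        if collect:
            gc.collect()
        return len(cls._live)


a = Counted()
b = Counted()
count_two = Counted.live_count()

del a  # no cycle: CPython reclaims the instance immediately
count_one = Counted.live_count(collect=False)

c = Counted()
c.me = c  # self-referencing cycle
del c
count_after_gc = Counted.live_count()  # the collector removes the cycle
```

Calling live_count(collect=False) right after deleting an instance caught in a cycle may still include it, which is the overcount Tim describes.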

Using a "decrement count in a __del__" approach isn't better: only a
reference-counting based implementation can guarantee to trigger __del__
methods as soon as an object (not involved in a cycle) becomes unreachable.
Under any other implementation, you'll still just get an upper bound.

Note that all garbage collection methods are approximations to true
lifetimes anyway. Even refcounting in the absence of cycles: just because
the refcount on an object is 10 doesn't mean that any of the 10 ways to
reach the object *will* get used again. An object may in reality be dead as
a doorknob no matter how high its refcount. Refcounting is a conservative
approximation too (it can call things "live" that will in fact never be used
again, but won't call things "dead" that will in fact be used again).

[David MacQuigg]
Thank you for this very thorough answer to my questions. I have a
much better understanding of the limitations of weakrefs now. I also
see my suggestion of using sys.getrefcount() suffers from the same
limitations. I've decided to leave my code as is, but put some
prominent warnings in the documentation:
'''
[Note 1] As always, there are limitations. Nothing is ever absolute
when it comes to reliability. In this case we are depending on the
Python interpreter to immediately delete a weak reference when the
normal reference count goes to zero. This depends on the
implementation details of the interpreter, and is *not* guaranteed by
the language. Currently, it works in CPython, but not in JPython.
For further discussion, see the post under {"... weakref behavior" by
Tim Peters in comp.lang.python, 6/11/04}.
-- One other limitation - if there is any possibility of an instance
you are tracking with a weakref being included in a cycle (a group of
objects that reference each other, but have no references from
anything outside the group), then this scheme won't work. Cyclic
garbage remains in memory until a special garbage collector gets
around to sniffing it out.
'''

You might want to put something similar in the Library Reference.

-- Dave
 
