Timo
I'm trying to make a thread-safe object cache without locking.
The objects are cached by the id of the data dict given in __new__.
Objects are removed from the cache as soon as they are no longer
referenced. The type of the data must be a Python dict (comes from an
external source).
Here's what I've got so far:
import time
from weakref import ref

cache = {}  # {id(data): ref_to_Foo(data), ...}

class Foo(object):
    __slots__ = ('__weakref__', '_data')

    def __new__(cls, data):
        new_self = object.__new__(cls)
        def remove(wr, _cache=cache, _data=data):
            try:
                del _cache[id(_data)]
            except KeyError:
                pass  # should never happen
        while 1:
            self = cache.setdefault(id(data), ref(new_self, remove))()
            try:
                self._data = data
                return self  # this is new_self or a cached instance
            except AttributeError:
                # happens rarely: got a 'dead' cached instance
                time.sleep(0)  # small delay before retrying
# Usage example:
data = {'a':1, 'b':2}
a = Foo(data)
b = Foo(data)
assert a is b
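
For context, the kind of concurrent usage I'm aiming at looks roughly
like this (the thread count and the worker function are arbitrary,
purely for illustration):

import threading

def worker(shared, results, index):
    # every thread should end up with the very same Foo instance
    results[index] = Foo(shared)

shared = {'a': 1, 'b': 2}
results = [None] * 4
threads = [threading.Thread(target=worker, args=(shared, results, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
first = results[0]
for obj in results:
    assert obj is first  # all threads got the same cached object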
I've got a couple of assumptions that I think the correctness of the
code depends on:
a) Python's dict is thread safe, i.e. setdefault can't replace an
existing value until the key has been removed - no matter how many
threads are calling it simultaneously. Additionally, only one thread
succeeds in setting the new value.
b) Unreferenced ref(new_self, remove) objects are garbage collected
without the callbacks being called.
c) The remove() keeps a reference to the data dict. That should prevent
Python from reusing the object id too soon.
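
To spell out the setdefault property I'm relying on in a), here is a
minimal single-threaded illustration (the key and values are made up):

d = {}
# key missing: setdefault stores and returns the new value
first = d.setdefault('key', 'original')
# key present: setdefault ignores the second value and returns the stored one
second = d.setdefault('key', 'replacement')
assert first == 'original'
assert second == 'original'
assert d['key'] == 'original'
# My assumption is that CPython does this lookup-or-insert as one atomic
# dict operation under the GIL, so two concurrent callers can't both insert.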
Are these assumptions correct? Do you have ideas for improvements
(especially regarding speed)? I'm restricted to Py2.3+ compatibility.
Thanks,
Timo