Doing set operation on non-hashable objects

5

5lvqbwl02

Hi,

I'm writing an application which is structured roughly as follows:

"db" is a dict, where the values are also dicts.
A function searches through db and returns a list of values, each of
which is a dict as described above.
I need to perform set operations on these lists (intersection and
union)
However the objects themselves are not hashable, and therefore can't
be in a set, because they are dicts.
I'm not sure how large each set will be, but the point is I'm trying
to avoid anything looking like an O(n^2) algorithm, so I can't just
use naive double-looping to check for intersection/union on a pair of
lists.

The only way I can think of to do this right is to hash the dicts by
freezing them, turning them all into tuples, which can then be
hashed. But this is a problem because more dicts might be nested
inside. At any rate, I'd need to thaw them out when I'm done and turn
them back into dicts, which seems complicated and annoying, and all
this leads me to believe there is a better way.

What I really need is a set of pointers, so at the end of the
operation I have the actual objects pointed to. Can I somehow use the
object IDs as set elements, then recreate a list with the actual
objects they point to?

How do you get the object back from an ID in python?

thanks
Michael
 
G

Gabriel Genellina

I'm writing an application which is structured roughly as follows:

"db" is a dict, where the values are also dicts.
A function searches through db and returns a list of values, each of
which is a dict as described above.
I need to perform set operations on these lists (intersection and
union)
However the objects themselves are not hashable, and therefore can't
be in a set, because they are dicts.

See this thread from last week:
http://groups.google.com/group/comp.lang.python/t/d6818ff713bd4421
What I really need is a set of pointers, so at the end of the
operation I have the actual objects pointed to. Can I somehow use the
object IDs as set elements, then recreate a list with the actual
objects they point to?

If you *only* care about object identity, you might use a dictionary that
only compares by identity to anyone else:

class nc_dict(dict):
"A hashable dictionary that isn't equal to anyone else"

def __eq__(self, other):
return cmp(id(self),id(other))==0

def __hash__(self):
return hash(id(self))

d1 = nc_dict(a=1, b=2, c=3)
d2 = nc_dict(b=2, c=0, d=4)
d3 = nc_dict(a=1, c=3, e=5)
dd1 = nc_dict(x=d1, y=d2)
dd2 = nc_dict(x=d1, y=d3)
dd3 = nc_dict(y=d2, z=d3, w=d1)
l1 = [dd1, dd2]
l2 = [dd2, dd3]
s1 = set(l1)
s2 = set(l2)
print s1-s2
print s2-s1
print s1&s2

# instances of nc_dict with the same contents are NOT equal
d4 = nc_dict(a=1, b=2, c=3)
print d1, d4, d1==d4 # output: False

# but we can use this function to compare contents
def cmp_contents(d1, d2):
try: return cmp(sorted(d1.items()), sorted(d2.items()))
except Exception: return 1 # assume they're unequal

print cmp_contents(d1,d4)==0 # output: True
How do you get the object back from an ID in python?

You don't :)
 
C

Carl Banks

Hi,

I'm writing an application which is structured roughly as follows:

"db" is a dict, where the values are also dicts.
A function searches through db and returns a list of values, each of
which is a dict as described above.
I need to perform set operations on these lists (intersection and
union)
However the objects themselves are not hashable, and therefore can't
be in a set, because they are dicts.
I'm not sure how large each set will be, but the point is I'm trying
to avoid anything looking like an O(n^2) algorithm, so I can't just
use naive double-looping to check for intersection/union on a pair of
lists.

The only way I can think of to do this right is to hash the dicts by
freezing them, turning them all into tuples, which can then be
hashed.  But this is a problem because more dicts might be nested
inside.  At any rate, I'd need to thaw them out when I'm done and turn
them back into dicts, which seems complicated and annoying, and all
this leads me to believe there is a better way.

What I really need is a set of pointers, so at the end of the
operation I have the actual objects pointed to.  Can I somehow use the
object IDs as set elements, then recreate a list with the actual
objects they point to?

How do you get the object back from an ID in python?

Quick and dirty way is to reference the dicts with a lightweight
hashable object that uses the dict's identity. For instance:


class Identity(object):
def __init__(self,obj):
self.obj = obj
def __hash__(self):
return hash(id(self.obj))
def __eq__(self,other):
return self.obj is other.obj


Then, instead of "s.add(d)" use "s.add(Identity(d))".

Instead of "d = s.pop()" use "d = s.pop().obj".

Instead of "d in s" use "Identity(d) in s".

And so on.


To do it a bit better, you could create an IdentitySet class that
subclasses set and wraps the methods so that they automatically apply
wrap and unwrap the arguments on their way in and out (I'd bet there's
already a cookbook recipe to do that).


Carl Banks
 
5

5lvqbwl02

Quick and dirty way is to reference the dicts with a lightweight
hashable object that uses the dict's identity.  For instance:

class Identity(object):
    def __init__(self,obj):
        self.obj = obj
    def __hash__(self):
        return hash(id(self.obj))
    def __eq__(self,other):
        return self.obj is other.obj

Then, instead of "s.add(d)" use "s.add(Identity(d))".

Instead of "d = s.pop()" use "d = s.pop().obj".

Instead of "d in s" use "Identity(d) in s".

And so on.

To do it a bit better, you could create an IdentitySet class that
subclasses set and wraps the methods so that they automatically apply
wrap and unwrap the arguments on their way in and out (I'd bet there's
already a cookbook recipe to do that).

Carl Banks

Thank you, I like this idea a lot. It's close to what I've been
thinking but a bit cleaner.

Michael
 
5

5lvqbwl02

... "db" is a dict, where the values are also dicts.
Well, if the lists are ordered, you can do intersection and union
in O(n) time.  If they are orderable, you can sort each first, giving
O(n log n).  Note you can do lst.sort(key=id) which will sort on ids.

The lists are not ordered, since the elements of the list could be
arbitrarily complex things consisting of resistors, capacitors, sub-
circuits, etc. I'm trying do a little EDA program, taking the best
parts from the different EDA/cad tools I've used. I finally came up
with an idea using dictionaries in a certain way, so I'd like to be
able to do set operators on arbitrary things that may themselves not
be hashable.
You don't; you remember the mapping in a dictionary and look it up.

Exactly. A mapping between the ID and the thing itself.
If there were a way to go from id to object, the whole idea of garbage
collection and reference counts would fly out the window, leading to
nasty crashes (or you might get to an object that is the re-used id of
an older object).

Yup, this makes perfect sense. It was rattling around in my head for
a bit afer I wrote the original post, then this makes sense.
 
5

5lvqbwl02

I'm writing an application which is structured roughly as follows:
"db" is a dict, where the values are also dicts.
A function searches through db and returns a list of values, each of
which is a dict as described above.
I need to perform set operations on these lists (intersection and
union)
However the objects themselves are not hashable, and therefore can't
be in a set, because they are dicts.


If you *only* care about object identity, you might use a dictionary that  
only compares by identity to anyone else:

class nc_dict(dict):
   "A hashable dictionary that isn't equal to anyone else"

   def __eq__(self, other):
     return cmp(id(self),id(other))==0

   def __hash__(self):
     return hash(id(self))

d1 = nc_dict(a=1, b=2, c=3)
d2 = nc_dict(b=2, c=0, d=4)
d3 = nc_dict(a=1, c=3, e=5)
dd1 = nc_dict(x=d1, y=d2)
dd2 = nc_dict(x=d1, y=d3)
dd3 = nc_dict(y=d2, z=d3, w=d1)
l1 = [dd1, dd2]
l2 = [dd2, dd3]
s1 = set(l1)
s2 = set(l2)
print s1-s2
print s2-s1
print s1&s2

# instances of nc_dict with the same contents are NOT equal
d4 = nc_dict(a=1, b=2, c=3)
print d1, d4, d1==d4  # output: False

# but we can use this function to compare contents
def cmp_contents(d1, d2):
     try: return cmp(sorted(d1.items()), sorted(d2.items()))
     except Exception: return 1 # assume they're unequal

print cmp_contents(d1,d4)==0 # output: True

Good idea. I'll try this out. I don't think it's likely that I'll
have a case where the dicts have identical values, but different
identities. And if they do I probably don't care too much.

Thanks
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,226
Members
46,815
Latest member
treekmostly22

Latest Threads

Top