custom classes in sets

vegetax

How can I make my custom class an element of a set?

class Cfile:
    def __init__(s,path): s.path = path

    def __eq__(s,other):
        print 'inside equals'
        return not os.popen('cmp %s %s' % (s.path,other.path)).read()

    def __hashcode__(s): return s.path.__hashcode__()

The idea is that it accepts file paths and constructs a set of unique
files (the command "cmp" compares files byte by byte). The files can
have different paths but the same content.

But the method __eq__ is never called.
 
Steven Bethard

Seems to be called fine for me:

py> class Cfile:
...     def __eq__(self, other):
...         print 'inside equals'
...         return False
...     def __hash__(self):
...         return 0
...
py> {Cfile():1, Cfile():2}
inside equals
{<__main__.Cfile instance at 0x01166490>: 1, <__main__.Cfile instance at 0x01166760>: 2}

Note that __eq__ won't be called if the hashes are different:

py> class Cfile:
...     hash = 0
...     def __eq__(self, other):
...         print 'inside equals'
...         return False
...     def __hash__(self):
...         Cfile.hash += 1
...         return Cfile.hash
...
py> {Cfile():1, Cfile():2}
{<__main__.Cfile instance at 0x01166918>: 1, <__main__.Cfile instance at 0x011668A0>: 2}

Steve
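
For anyone reading this later: the transcripts above use dicts, but a set applies the same rule. Here is a minimal Python 3 sketch of it, not the original poster's class. The `content` argument and the `eq_calls` counter are made up for illustration. The hook has to be spelled `__hash__`; a method named `__hashcode__` is just an ordinary method that sets and dicts never look at. In Python 3, a class that defines `__eq__` without `__hash__` is unhashable, so the mistake shows up as a TypeError instead of a silently skipped `__eq__`.

```python
class Cfile:
    """Toy stand-in for the class in the thread; 'content' is a hypothetical field."""

    eq_calls = 0  # counts how often __eq__ runs (illustration only)

    def __init__(self, path, content):
        self.path = path
        self.content = content

    def __eq__(self, other):
        if not isinstance(other, Cfile):
            return NotImplemented
        Cfile.eq_calls += 1
        return self.content == other.content

    def __hash__(self):
        # Equal objects must return equal hashes, so hash the same
        # attribute that __eq__ compares.
        return hash(self.content)


unique = {
    Cfile("a.txt", "same"),
    Cfile("b.txt", "same"),
    Cfile("c.txt", "other"),
}
print(len(unique), Cfile.eq_calls)
```

The two objects with equal content land in the same hash bucket, so `__eq__` runs and one of them is dropped. The third normally has a different hash and is stored without any comparison.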
 
vegetax

Steven said:
Note that __eq__ won't be called if the hashes are different:

I just tried, and it won't be called =(. So how can I generate a hash code for
the Cfile class? Note that the comparisons (__eq__) are based on the
contents of a file, using the command 'cmp'. I guess that's not possible, but
thanks.
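
One possible answer to the question above, as a hedged Python 3 sketch: any hash works as long as files with equal contents get equal hashes, and the file size has that property. The sketch uses the standard-library `filecmp.cmp` in place of the external `cmp` command (my substitution, not something from the thread). The `write_file` helper and the file names are hypothetical. It also assumes the files do not change while they sit in the set.

```python
import filecmp
import os
import tempfile


class Cfile:
    def __init__(self, path):
        self.path = path

    def __eq__(self, other):
        if not isinstance(other, Cfile):
            return NotImplemented
        # shallow=False makes filecmp.cmp compare the actual bytes.
        return filecmp.cmp(self.path, other.path, shallow=False)

    def __hash__(self):
        # Identical contents imply identical sizes, so this hash is
        # consistent with __eq__ (different sizes never compare equal).
        return hash(os.path.getsize(self.path))


def write_file(directory, name, data):
    """Hypothetical helper: create a small test file and return its path."""
    path = os.path.join(directory, name)
    with open(path, "wb") as handle:
        handle.write(data)
    return path


with tempfile.TemporaryDirectory() as tmp:
    paths = [
        write_file(tmp, "a.txt", b"hello\n"),
        write_file(tmp, "b.txt", b"hello\n"),
        write_file(tmp, "c.txt", b"world!\n"),
    ]
    unique = {Cfile(p) for p in paths}
    unique_count = len(unique)

print(unique_count)
```

The byte-by-byte comparison only runs for files of the same size. With many same-sized files this is still slow.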
 
Carl Banks

vegetax said:
[snip]

I just tried, and it won't be called =(. So how can I generate a hash code for
the Cfile class? Note that the comparisons (__eq__) are based on the
contents of a file, using the command 'cmp'. I guess that's not possible, but
thanks.


Let me suggest that, if your idea is to get a set of files all with
unique file contents, comparing a file byte-by-byte with each file
already in the set is going to be absurdly inefficient.

Instead, I recommend comparing md5 (or sha) digests. The idea is, you
read in each file once, calculate an md5 digest, and compare the
digests instead of the file contents.

import md5

class Cfile:
    def __init__(self, path):
        self.path = path
        # Pass the data to md5.new() directly: update() returns None,
        # so it cannot be chained. Binary mode reads the bytes unchanged.
        self.md5 = md5.new(open(path, 'rb').read()).digest()
    def __eq__(self, other):
        return self.md5 == other.md5
    def __hash__(self):
        return hash(self.md5)

This is kind of hackish (not to mention untested). You would probably
do better to mmap the file (see the mmap module) rather than read it.

And, in case you're wondering: yes it is theoretically possible for
different files to have the same md5. However, the chances are
microscopic. (Incidentally, the SCons build system uses MD5 to decide
if a file has been modified.)
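
A hedged Python 3 sketch of the same digest idea, for readers on a current interpreter. It uses `hashlib` with SHA-256 rather than the old `md5` module (my substitution), and reads each file in chunks so a large file is never held in memory at once. The `file_digest` helper, the chunk size, and the demo file names are assumptions for illustration.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=64 * 1024):
    """Return the SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()


class Cfile:
    def __init__(self, path):
        self.path = path
        self.digest = file_digest(path)  # computed once, at construction

    def __eq__(self, other):
        if not isinstance(other, Cfile):
            return NotImplemented
        return self.digest == other.digest

    def __hash__(self):
        return hash(self.digest)


with tempfile.TemporaryDirectory() as tmp:
    names = ("a.txt", "b.txt", "c.txt")
    contents = (b"hello\n", b"hello\n", b"world\n")
    for name, data in zip(names, contents):
        with open(os.path.join(tmp, name), "wb") as handle:
            handle.write(data)
    files = {Cfile(os.path.join(tmp, name)) for name in names}
    unique_count = len(files)

print(unique_count)
```

Each file is read exactly once, and set membership then costs only a digest comparison.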
 
John Machin

vegetax said:
How can I make my custom class an element of a set?

The idea is that it accepts file paths and constructs a set of unique
files (the command "cmp" compares files byte by byte). The files can
have different paths but the same content.

Q: How do I transport ten sumo wrestlers on a unicycle?

A: With extreme difficulty. You may well need a different vehicle.

Think about your requirements, then implement the most appropriate data
structure. If, as is likely, you need to know which and how many files
are identical, then a set won't do the job by itself. You may need a
union-find gadget.

Then before you rush and implement something, google around and look in
the Tools and Scripts directories in the Python distribution; I'm quite
sure I've seen something like a "duplicate file detector" written in
Python somewhere.

HTH,
John
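
If the goal is to know which files are identical and how many there are, a plain dict keyed by content digest may already be enough. This is a simpler stand-in for the union-find approach mentioned above, not an implementation of it. The Python 3 sketch below assumes small files (it reads each one whole); `group_by_content` and the demo file names are hypothetical.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def group_by_content(paths):
    """Map each content digest to the list of paths that share it."""
    groups = defaultdict(list)
    for path in paths:
        with open(path, "rb") as handle:
            digest = hashlib.sha256(handle.read()).hexdigest()
        groups[digest].append(path)
    return groups


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, data in [("a.txt", b"x"), ("b.txt", b"x"), ("c.txt", b"y")]:
        path = os.path.join(tmp, name)
        with open(path, "wb") as handle:
            handle.write(data)
        paths.append(path)

    groups = group_by_content(paths)
    # Keep only the groups with more than one member: these are duplicates.
    duplicate_sets = sorted(
        sorted(os.path.basename(p) for p in members)
        for members in groups.values()
        if len(members) > 1
    )

print(duplicate_sets)
```

Unlike a set, the dict keeps every path, so nothing is lost when two files turn out to be identical.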
 
