filecmp.cmp() doesn't seem to do what it says in the documentation

tinnews

I'm using filecmp.cmp() to compare some files (surprise!).

The documentation says:-
Unless shallow is given and is false, files with identical
os.stat() signatures are taken to be equal.

I'm not setting shallow explicitly, so it's True, and thus the function
should be comparing the os.stat() results. However, this doesn't seem
to be the case: even if I touch one of the files to change its
access/modification date, filecmp.cmp() still returns True.

Here is an example:-

chris$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
chris$ ls -l /media/disk/DCIM/103_FUJI/DSCF3084.JPG /home/chris/pictures/2010/4/24dscf3084.jpg
-rwxr-xr-x 1 chris chris 1783277 2010-09-06 17:36 /home/chris/pictures/2010/4/24dscf3084.jpg
-rwxr-xr-x 1 chris root 1783277 2010-09-06 17:53 /media/disk/DCIM/103_FUJI/DSCF3084.JPG
chris$

The file modification times are different, so surely filecmp.cmp()
should be returning False. I actually think the way it's working makes
more sense, as I don't care whether the modification time has changed
if the files are still exactly the same length.
 
Carl Banks

Reword and read carefully: if shallow == True and signatures are
identical, then files are taken to be equal.
Here is the corresponding code from Lib/filecmp.py:
     if shallow and s1 == s2:
         return True
The documentation does not say what the result is for non-identical
signatures ;-). In that case the function goes on to actually compare
the files, and here they are equal.
...
     result = _cache.get((f1, f2))
     if result and (s1, s2) == result[:2]:
         return result[2]
     outcome = _do_cmp(f1, f2)
     _cache[f1, f2] = s1, s2, outcome
     return outcome
Most of the stdlib files in Python are quite readable. I recommend it
when you have questions.
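
To see that fall-through in action, here is a minimal sketch that mimics the touch experiment from the original post. It assumes Python 3 (for tempfile.TemporaryDirectory); the helper name compare_after_touch and the file names are made up for illustration and are not part of filecmp.

```python
import filecmp
import os
import tempfile


def compare_after_touch(payload=b"same bytes in both files\n"):
    """Compare two byte-identical files whose modification times differ.

    Hypothetical helper, for illustration only. Returns a tuple
    (mtimes_differ, shallow_result, deep_result).
    """
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.bin")
        second = os.path.join(tmp, "second.bin")
        for path in (first, second):
            with open(path, "wb") as handle:
                handle.write(payload)

        # Force different modification times, much as `touch` would.
        os.utime(first, (1000000000, 1000000000))
        os.utime(second, (1000000600, 1000000600))

        mtimes_differ = os.stat(first).st_mtime != os.stat(second).st_mtime
        # The signatures differ, so the shallow shortcut does not fire and
        # the contents are read; they match, so the result is still True.
        shallow_result = filecmp.cmp(first, second)
        deep_result = filecmp.cmp(first, second, shallow=False)
        return mtimes_differ, shallow_result, deep_result


if __name__ == "__main__":
    print(compare_after_touch())
```

Both calls should report the files as equal even though the modification times differ, which matches what the original poster observed.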

Well, I still don't think it's what the documentation says. It would be
much better if it told you that 'if the os.stat() signatures are not
identical then the file contents are actually compared'.  The
implication to me when I read the documentation was that if shallow
was True and the os.stat() signatures were not identical then False
would be returned.  Where does it say otherwise?

To me, "comparing files" means to compare the contents and nothing
else, so when documentation says "Compare the files named f1 and f2" I
think it has that covered. I understand the os.stat comparison to be
a (non-foolproof) optimization.
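
A small sketch of why the optimization is not foolproof, assuming Python 3 and a filesystem where os.utime() can give both files the same modification time. The helper name shallow_false_positive is hypothetical. Two files with the same size and mtime but different bytes are reported equal by the default shallow comparison, while shallow=False reads the contents.

```python
import filecmp
import os
import tempfile


def shallow_false_positive():
    """Return (shallow_result, deep_result) for two same-size files
    that have different contents but identical modification times.

    Hypothetical helper, for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        left = os.path.join(tmp, "left.bin")
        right = os.path.join(tmp, "right.bin")
        with open(left, "wb") as handle:
            handle.write(b"AAAA")
        with open(right, "wb") as handle:
            handle.write(b"BBBB")

        # Give both files the same access and modification time so that
        # their (type, size, mtime) signatures are identical.
        for path in (left, right):
            os.utime(path, (1000000000, 1000000000))

        shallow_result = filecmp.cmp(left, right)               # signature only
        deep_result = filecmp.cmp(left, right, shallow=False)   # reads the bytes
        return shallow_result, deep_result


if __name__ == "__main__":
    print(shallow_false_positive())
```

If the contents matter more than speed, passing shallow=False is the safer choice.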


Anyway, if you just want to compare the os.stat parameters you should
just use os.stat.

os.stat(filename1) == os.stat(filename2)


Then if you want, you can write a function to compare only the stats
you are interested in.

import os

def mystatcmp(filename1, filename2):
    # Compare only the size and the modification time.
    s1 = os.stat(filename1)
    s2 = os.stat(filename2)
    return s1.st_size == s2.st_size and s1.st_mtime == s2.st_mtime
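
One caveat worth illustrating: comparing whole os.stat() results also compares fields such as the inode number, so two distinct files will normally compare unequal even when size and mtime match. The sketch below assumes Python 3 and a typical filesystem where distinct files get distinct inode numbers; the helper name stat_comparisons is made up for this example.

```python
import os
import tempfile


def stat_comparisons():
    """Return (whole_equal, selected_equal) for two identical copies.

    Hypothetical helper, for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.bin")
        copy = os.path.join(tmp, "copy.bin")
        for path in (original, copy):
            with open(path, "wb") as handle:
                handle.write(b"identical contents\n")
            # Same access and modification time for both files.
            os.utime(path, (1000000000, 1000000000))

        s1 = os.stat(original)
        s2 = os.stat(copy)
        # Whole-result comparison includes st_ino, which differs between
        # two separate files (assumption: a typical filesystem).
        whole_equal = s1 == s2
        # Comparing only the fields of interest ignores the inode.
        selected_equal = (s1.st_size, s1.st_mtime) == (s2.st_size, s2.st_mtime)
        return whole_equal, selected_equal


if __name__ == "__main__":
    print(stat_comparisons())
```

So a function that picks out only the fields you care about is usually more useful than comparing the full stat results.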


Carl Banks
 
