filecmp.cmp() cache

  • Thread starter Mattias Brändström

Mattias Brändström

Hello!

I have a question about filecmp.cmp(). The short code snippet below
does not behave as I would expect:

import filecmp

f0 = "foo.dat"
f1 = "bar.dat"

f = open(f0, "w")
f.write("1:2")
f.close()

f = open(f1, "w")
f.write("1:2")
f.close()

print "cmp 1: " + str(filecmp.cmp(f0, f1, False))

f = open(f1, "w")
f.write("2:3")
f.close()

print "cmp 2: " + str(filecmp.cmp(f0, f1, False))

I would expect the second comparison to return False instead of True.
Looking at the docs for filecmp.cmp() I found the following: "This
function uses a cache for past comparisons and the results, with a
cache invalidation mechanism relying on stale signatures.". I guess
that this is the reason for my test case failing.

Is there someone here that can tell me how I should invalidate this
cache? If that is not possible, what workaround could I use? I guess
that I can write my own file comparison function, but I would not like
to have to do that since we have filecmp.

Any ideas?

Regards,
Mattias
 
Peter Otten

You can clear the cache with

filecmp._cache = {}

as a glance into the filecmp module would have shown.
If you don't want to use the cache at all (untested):

class NoCache:
    def __setitem__(self, key, value):
        pass
    def get(self, key):
        return None
filecmp._cache = NoCache()


Alternatively an update to Python 2.5 might work as the type of
os.stat(filename).st_mtime was changed from int to float and now offers
subsecond resolution.

Peter
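
A rough illustration of why the second comparison can reuse the first result: filecmp identifies each file by a signature built from os.stat() (file type, size, modification time). The sketch below is not the module's code; signature(), write_text() and demo() are hypothetical helpers written for Python 3, and the whole_seconds flag only imitates the integer st_mtime of Python versions before 2.5. The timestamps are set explicitly with os.utime() so that both writes land in the same second.

```python
import os
import stat
import tempfile


def signature(path, whole_seconds=False):
    """Hypothetical helper: (file type, size, mtime), the idea behind the cache key."""
    st = os.stat(path)
    # Truncating to whole seconds imitates the old integer st_mtime.
    mtime = int(st.st_mtime) if whole_seconds else st.st_mtime
    return (stat.S_IFMT(st.st_mode), st.st_size, mtime)


def write_text(path, text):
    with open(path, "w") as f:
        f.write(text)


def demo():
    """Rewrite a file with same-length content inside one second."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "bar.dat")

        write_text(path, "1:2")
        os.utime(path, (1000000000.0, 1000000000.0))
        before = signature(path, whole_seconds=True)

        write_text(path, "2:3")  # different content, same size
        os.utime(path, (1000000000.0, 1000000000.5))
        after = signature(path, whole_seconds=True)

        return before, after
```

With whole-second timestamps the two signatures come out equal even though the content changed, so a cache keyed on them has no way to notice the rewrite.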
 
Mattias Brändström

You are right, a quick glance would have enlightened me. Next time I
will RTFS first. :)

Just one small thought/question. How likely am I to run into trouble
because of this? I mean, by setting _cache to another value I'm
mucking about in filecmp's implementation details. Is this generally
considered OK when dealing with Python's standard library?

:.:: mattias
 
Peter Otten

I think it's a feature that Python lends itself to monkey-patching, but
still there are a few things to consider:

- Every hack increases the likelihood that your app will break in the next
version of Python.
- You take some responsibility for the "patched" code. It's no longer the
tried and tested module as provided by the core developers.
- The module may be used elsewhere in the standard library or third-party
packages, and failures (or in the above example: performance degradation)
may ensue.

For a script and a relatively obscure module like 'filecmp' monkey-patching
is probably OK, but for a larger app or a module like 'os' that is heavily
used throughout the standard lib I would play it safe and reimplement.

Peter
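
For the "play it safe and reimplement" route, a minimal sketch of an uncached comparison might look like the following. The name files_equal and the chunk_size default are assumptions made for illustration, not anything from filecmp, and the code is written for Python 3.

```python
import os


def files_equal(path_a, path_b, chunk_size=8192):
    """Hypothetical uncached comparison: True if both files hold the same bytes."""
    # Different sizes can never match, so skip reading in that case.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files are exhausted
                return True
```

Because nothing is remembered between calls, every call re-reads both files. That is slower for repeated comparisons, but the result never depends on timestamps.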
 
Mattias Brändström

Thanks for the insight! Right now I need this for a unit test, so in
this case I'm quite happy to use the NoCache solution you suggested.

:.:: brasse
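
For the unit-test case, one way to limit the reach of the monkey-patch is to apply it only for the duration of each test. The sketch below is written for Python 3 and rests on two assumptions: that filecmp keeps its cache in the private module attribute _cache (a CPython implementation detail that may change), and that a dict subclass which drops writes is an acceptable stand-in for the NoCache class from this thread. CompareTest and its write() helper are hypothetical names.

```python
import filecmp
import os
import tempfile
import unittest
from unittest import mock


class NoCache(dict):
    """A dict that silently drops every write, so lookups always miss."""

    def __setitem__(self, key, value):
        pass


class CompareTest(unittest.TestCase):
    def setUp(self):
        # Replace the private cache for this test only; mock restores
        # the original object when the cleanup runs.
        patcher = mock.patch.object(filecmp, "_cache", NoCache())
        patcher.start()
        self.addCleanup(patcher.stop)

        self.tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmp.cleanup)

    def write(self, name, text):
        path = os.path.join(self.tmp.name, name)
        with open(path, "w") as f:
            f.write(text)
        return path

    def test_rewrite_is_detected(self):
        f0 = self.write("foo.dat", "1:2")
        f1 = self.write("bar.dat", "1:2")
        self.assertTrue(filecmp.cmp(f0, f1, shallow=False))

        # Same length, different content: must not be served from a cache.
        f1 = self.write("bar.dat", "2:3")
        self.assertFalse(filecmp.cmp(f0, f1, shallow=False))
```

Subclassing dict keeps the other dictionary operations available in case the module uses them, and the patch never outlives the test that installed it.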
 
Steve Holden

It would probably be a good idea to add a clear_cache() function to the
module API for 2.6 to avoid such issues.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Blog of Note: http://holdenweb.blogspot.com
See you at PyCon? http://us.pycon.org/TX2007
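
A function of this kind did eventually appear: filecmp.clear_cache() is available from Python 3.4 onwards. A minimal sketch of using it, assuming Python 3.4 or later, is shown below; compare_fresh() and demo() are hypothetical names used only for illustration.

```python
import filecmp
import os
import tempfile


def compare_fresh(path_a, path_b):
    """Hypothetical wrapper: drop cached results, then compare file contents."""
    filecmp.clear_cache()
    return filecmp.cmp(path_a, path_b, shallow=False)


def demo():
    """Replay the scenario from the original post with the cache cleared."""
    with tempfile.TemporaryDirectory() as d:
        f0 = os.path.join(d, "foo.dat")
        f1 = os.path.join(d, "bar.dat")
        for path in (f0, f1):
            with open(path, "w") as f:
                f.write("1:2")
        first = compare_fresh(f0, f1)

        with open(f1, "w") as f:
            f.write("2:3")
        second = compare_fresh(f0, f1)
        return first, second
```

This stays within the public API, so it avoids the concerns about patching private module state raised earlier in the thread.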
 
