Slight discrepancy with filecmp.cmp

Ivan Van Laningham

Hi All--
I noticed recently that a few of the jpgs from my digital cameras have
developed bitrot. Not a real problem, because the cameras are CD
Mavicas, and I can simply copy the original from the cd. Except for the
fact that I've got nearly 25,000 images to check. So I wrote a set of
programs to both index the disk versions with the cd versions, and to
compare, using filecmp.cmp(), the cd and disk version. Works fine.
Turned up several dozen files that had been inadvertently rotated or
saved with the wrong quality, various fat-fingered mistakes like that.

However, it didn't flag the files that I know have bitrot. I seem to
remember that diff uses a checksum algorithm on binary files, not a
byte-by-byte comparison. Am I wrong? If I am, what then is the source
of the problem in my jpg images where it looks like a bit or two has
been shifted or added; suddenly, there's a line going through the
picture above which it's normal, and below it either the color has
changed (usually to pinkish) or the remaining raster lines are all
shifted either right or left?

Any ideas?

Metta,
Ivan
----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/workshops/1998-11/proceedings.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours
 
John Machin

On Sun, 17 Apr 2005 22:06:04 -0600, Ivan Van Laningham wrote:
So I wrote a set of
programs to both index the disk versions with the cd versions, and to
compare, using filecmp.cmp(), the cd and disk version. Works fine.
Turned up several dozen files that had been inadvertently rotated or
saved with the wrong quality, various fat-fingered mistakes like that.

However, it didn't flag the files that I know have bitrot. I seem to
remember that diff uses a checksum algorithm on binary files, not a
byte-by-byte comparison. Am I wrong?

According to the docs:

"""
cmp( f1, f2[, shallow[, use_statcache]])

Compare the files named f1 and f2, returning True if they seem equal,
False otherwise.
Unless shallow is given and is false, files with identical os.stat()
signatures are taken to be equal
"""

and what is an os.stat() signature, you ask? So did I.

According to the code itself:

def _sig(st):
    return (stat.S_IFMT(st.st_mode),
            st.st_size,
            st.st_mtime)

Looks like it assumes two files are the same if they are of the same
type, same size, and same time-last-modified. Normally I guess that's
good enough, but maybe the phantom bit-toggler is bypassing the file
system somehow. What OS are you running?

You might like to do two things: (1) run your comparison again with
shallow=False (2) submit a patch to the docs.
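
To illustrate the point about signatures, here is a minimal sketch in modern Python 3 (not the 2005-era interpreter used in this thread). The file names and the helper write_pair are made up for the demonstration. It builds two files of the same size and modification time that differ in one byte. The default shallow comparison should then report them as equal, while shallow=False should not.

```python
import filecmp
import os
import tempfile


def write_pair(folder):
    """Create two same-sized files that differ in a single byte."""
    good = os.path.join(folder, "good.jpg")  # hypothetical file names
    bad = os.path.join(folder, "bad.jpg")
    with open(good, "wb") as f:
        f.write(b"\x00" * 4096)
    with open(bad, "wb") as f:
        f.write(b"\x00" * 4095 + b"\x01")  # one flipped bit at the end
    # Give both files the same modification time, so that their
    # (type, size, mtime) signatures are identical.
    os.utime(good, (1_000_000_000, 1_000_000_000))
    os.utime(bad, (1_000_000_000, 1_000_000_000))
    return good, bad


with tempfile.TemporaryDirectory() as folder:
    good, bad = write_pair(folder)
    shallow_equal = filecmp.cmp(good, bad)              # shallow defaults to True
    deep_equal = filecmp.cmp(good, bad, shallow=False)  # compares the contents

print("shallow:", shallow_equal)
print("deep:   ", deep_equal)
```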

(-:
You have of course attempted to eliminate other variables by checking
that the bit-rot effect is apparent using different display software,
a different computer, an observer who's not on the same medication as
you, ... haven't you?
:)


HTH,
John
 
Ivan Van Laningham

Hi All--

John said:
On Sun, 17 Apr 2005 22:06:04 -0600, Ivan Van Laningham wrote:
So I wrote a set of
programs to both index the disk versions with the cd versions, and to
compare, using filecmp.cmp(), the cd and disk version. Works fine.
Turned up several dozen files that had been inadvertently rotated or
saved with the wrong quality, various fat-fingered mistakes like that.

However, it didn't flag the files that I know have bitrot. I seem to
remember that diff uses a checksum algorithm on binary files, not a
byte-by-byte comparison. Am I wrong?

According to the docs:

"""
cmp( f1, f2[, shallow[, use_statcache]])

Compare the files named f1 and f2, returning True if they seem equal,
False otherwise.
Unless shallow is given and is false, files with identical os.stat()
signatures are taken to be equal
"""

and what is an os.stat() signature, you ask? So did I.

According to the code itself:

def _sig(st):
    return (stat.S_IFMT(st.st_mode),
            st.st_size,
            st.st_mtime)

Looks like it assumes two files are the same if they are of the same
type, same size, and same time-last-modified. Normally I guess that's
good enough, but maybe the phantom bit-toggler is bypassing the file
system somehow. What OS are you running?

WinXP, SP2
You might like to do two things: (1) run your comparison again with
shallow=False (2) submit a patch to the docs.

You know, I read that doc, tried it, and it made absolutely no
difference. Then I read your message, read the docs again, and finally
realized I had flipped the sense of shallow in my head. Sheesh. So
then I tried it with shallow=False, not True, and it runs about ten
times slower, but it works. Beautifully.
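
For a whole directory at once, something along these lines might do. It is only a sketch, assuming the disk copy and the CD copy use the same file names in matching directories. find_mismatches is a hypothetical helper, not the program discussed in this thread. It relies on filecmp.cmpfiles, which takes the same shallow flag as filecmp.cmp.

```python
import filecmp
import os


def find_mismatches(disk_dir, cd_dir):
    """Return (mismatched, errors) for the files in disk_dir versus cd_dir."""
    names = sorted(
        name for name in os.listdir(disk_dir)
        if os.path.isfile(os.path.join(disk_dir, name))
    )
    # shallow=False forces a content comparison for every pair of files,
    # instead of trusting matching (type, size, mtime) signatures.
    _same, mismatched, errors = filecmp.cmpfiles(
        disk_dir, cd_dir, names, shallow=False
    )
    return mismatched, errors
```

The second list holds names that could not be compared at all, for example files missing from the CD directory, so both lists are worth checking.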

Now I have to go back and redo the first five thousand, but it's worth
it. Thanks. Shows how much you need another set of eyeballs to debug
your brain;-)
(-:
You have of course attempted to eliminate other variables by checking
that the bit-rot effect is apparent using different display software,
a different computer, an observer who's not on the same medication as
you, ... haven't you?
:)

;-) Absolutely. Several different viewers and several different OSs.
And my wife never sees anything the way I do;-)

Metta,
Ivan
 
Dan Sommers

... Shows how much you need another set of eyeballs to debug your
brain;-)

+1 QOTW
... And my wife never sees anything the way I do;-)

There's probably a rude joke in there somewhere about your wife's eyes
debugging your brain, but since I would like to remain married, I will
not make it. :-/

Regards,
Dan
 
