Best way to compare the contents of two directories

Robin Siebler

I have two directory trees that I want to compare and I'm trying to
figure out what the best way of doing this would be. I am using walk
to get a list of all of the files in each directory.
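The question doesn't show the walk code itself. For reference, here is a minimal sketch of gathering one tree. It assumes Python 2.3 or later for `os.walk` (the poster may be using the older `os.path.walk` callback API) and 2.4+ for the builtin `set`. The name `collect_tree` is hypothetical. Recording every path relative to its tree's root means entries from two different roots can later be compared directly.

```python
import os

def collect_tree(root):
    """Hypothetical helper: walk root and return (dirs, files) as sets of
    paths relative to root. Assumes os.walk (Python 2.3+)."""
    dirs, files = set(), set()
    for dirpath, dirnames, filenames in os.walk(root):
        # location of the current directory relative to root ('.' for root)
        rel = os.path.relpath(dirpath, root)
        for name in dirnames:
            # normpath turns './foldera' into 'foldera'
            dirs.add(os.path.normpath(os.path.join(rel, name)))
        for name in filenames:
            files.add(os.path.normpath(os.path.join(rel, name)))
    return dirs, files
```

Calling it once per tree gives two pairs of sets that can be compared with ordinary set operations.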

I am using this code to compare the file lists:

def compare_files(first_list, second_list, first_dir, second_dir):
    # in_first_only() is a helper not shown in the post; presumably it
    # returns the entries of first_list that do not appear in second_list
    missing = in_first_only(first_list, second_list)
    for item in missing:
        index = first_list.index(item)
        print(first_list[index] + ' does not exist in ' +
              second_dir[index])
        first_list.pop(index)
        first_dir.pop(index)
    return first_list, second_list, first_dir, second_dir
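The post doesn't show `in_first_only`. The following is a hypothetical sketch of such a helper. It assumes the helper should return the items of the first list that are absent from the second, in their original order. Building a set from the second list makes each membership test constant-time on average, instead of scanning the whole list once per item.

```python
def in_first_only(first_list, second_list):
    """Hypothetical helper: items of first_list not present in second_list,
    kept in their original order."""
    second = set(second_list)  # average O(1) membership tests
    return [item for item in first_list if item not in second]
```

For example, `in_first_only(['a.txt', 'b.txt', 'c.txt'], ['b.txt'])` returns `['a.txt', 'c.txt']`.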

However, before I actually compare the files, I want to compare the
directories and if a directory is missing in either set, I want to
report it:

dir_list_a = ['d:\\results\\foldera\\','d:\\results\\folderb\\','d:\\results\\folderc\\']
dir_list_b = ['c:\\results\\foldera\\','c:\\results\\folderb\\']

output:
'folderc' exists in d:\results but not in c:\results


I am using splitall (from the Python Cookbook) to split the paths into
their parts and appending these to a list, but I can't figure out the
best way to compare the contents of the resulting two lists, and I think
I am starting to make things *too* complicated:

import os

def splitall(path):
    """
    Source: Python Cookbook
    Credit: Trent Mick

    Split a path into all of its parts.
    """
    allparts = []
    while 1:
        parts = os.path.split(path)
        if parts[0] == path:    # sentinel for absolute paths
            allparts.insert(0, parts[0])
            break
        elif parts[1] == path:  # sentinel for relative paths
            allparts.insert(0, parts[1])
            break
        else:
            path = parts[0]
            allparts.insert(0, parts[1])
    return allparts

After using this, I end up with this:

dir_list_a = [['d:\\', 'results', 'foldera', 'd:\\', 'results', 'folderb',
               'd:\\', 'results', 'folderc']]
dir_list_b = [['d:\\', 'results', 'foldera', 'd:\\', 'results', 'folderb']]
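If you want to keep working with split paths, here is one option. It is a sketch, not the poster's code. Split each path into its own list and drop the components that belong to the tree's root. Then turn the remainder into a tuple, because lists cannot go into a set but tuples can. The output above appears to have flattened every path into a single list, which makes per-path comparison hard. So this sketch assumes one list per path, and that every path really starts with the given root components. `relative_parts` and the pre-split sample data are hypothetical.

```python
def relative_parts(parts, root_parts):
    """Hypothetical helper: drop the root's leading components and return a
    hashable tuple. Empty strings (which a trailing separator can leave
    behind after splitting) are dropped too. Does not verify the prefix."""
    return tuple(p for p in parts[len(root_parts):] if p)

# One list per path, e.g. as produced by calling splitall on each path
split_a = [['d:\\', 'results', 'foldera'], ['d:\\', 'results', 'folderb'],
           ['d:\\', 'results', 'folderc']]
split_b = [['c:\\', 'results', 'foldera'], ['c:\\', 'results', 'folderb']]

rel_a = set(relative_parts(p, ['d:\\', 'results']) for p in split_a)
rel_b = set(relative_parts(p, ['c:\\', 'results']) for p in split_b)

only_in_a = rel_a - rel_b  # directories under d:\results only
only_in_b = rel_b - rel_a  # directories under c:\results only
```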
 
Dan Dang Griffith

> I have two directory trees that I want to compare and I'm trying to
> figure out what the best way of doing this would be. I am using walk
> to get a list of all of the files in each directory.

Once you have the two lists, look into difflib. E.g.,

import difflib
for i in difflib.ndiff(list1, list2):
    print(i)
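`ndiff` prefixes lines found only in the first sequence with `'- '` and lines found only in the second with `'+ '`. It can also emit `'? '` hint lines for near-matches. Below is a sketch that keeps just the first two groups. Because `ndiff` compares sequences in order rather than as sets, the sketch sorts both lists first. `only_in_each` is a hypothetical name.

```python
import difflib

def only_in_each(list1, list2):
    """Hypothetical helper: return (only_in_list1, only_in_list2)
    using difflib.ndiff."""
    # ndiff is order-sensitive, so sort both inputs first
    diff = list(difflib.ndiff(sorted(list1), sorted(list2)))
    # skip '  ' (common) and '? ' (intraline hint) lines
    only1 = [line[2:] for line in diff if line.startswith('- ')]
    only2 = [line[2:] for line in diff if line.startswith('+ ')]
    return only1, only2
```

For directory names, the set difference suggested later in the thread does the same job more directly. `ndiff` earns its keep when you also want to see *where* in an ordered listing the differences fall.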

Dan Gass has recently contributed some code that can produce
a side-by-side difference in HTML format. I believe it is in the 2.4
release, but you can also get it from
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=914575&group_id=5470

import difflib
tbl = difflib.HtmlDiff().make_file(list1, list2)
f = open("diffs.html", "w")
f.write(tbl)
f.close()

--dang
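Here is a slightly fuller sketch of the `HtmlDiff` suggestion. It assumes Python 2.4 or later, where `difflib.HtmlDiff` is part of the standard library. `make_file` also accepts `fromdesc`/`todesc` column headings. Sorting the listings first keeps identically named entries on the same row. The function name `listing_diff_html` is hypothetical.

```python
import difflib

def listing_diff_html(list1, list2, desc1, desc2):
    """Hypothetical helper: return an HTML page showing two listings
    side by side, with desc1/desc2 as the column headings."""
    differ = difflib.HtmlDiff()
    # sorting keeps identically named entries aligned on the same row
    return differ.make_file(sorted(list1), sorted(list2),
                            fromdesc=desc1, todesc=desc2)

html = listing_diff_html(['foldera', 'folderb', 'folderc'],
                         ['foldera', 'folderb'],
                         'd:\\results', 'c:\\results')
```

The returned string can then be written to `diffs.html` exactly as in the snippet above.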
 
Raymond Hettinger

[Robin Siebler]
> However, before I actually compare the files, I want to compare the
> directories and if a directory is missing in either set, I want to
> report it:

The operative word is "set".

Try using sets.py:
one_only = Set(dirlistone) - Set(dirlisttwo)
two_only = Set(dirlisttwo) - Set(dirlistone)


Raymond Hettinger
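For Python 2.4 or later, where `set` is a builtin, here is a sketch of the same idea applied to the example lists from the question. The two trees live on different drives, so no directory would ever match on its full path. To handle this, each path is first made relative to its own root. The sketch uses `ntpath`, the Windows flavour of `os.path`, so it behaves the same on any OS. `missing_dirs` and the root arguments are assumptions for illustration.

```python
import ntpath  # Windows path rules, regardless of the host OS

def missing_dirs(dirs_a, root_a, dirs_b, root_b):
    """Hypothetical helper: report directories present under one root
    but not under the other."""
    # relpath strips the root and normalizes away trailing backslashes
    rel_a = set(ntpath.relpath(d, root_a) for d in dirs_a)
    rel_b = set(ntpath.relpath(d, root_b) for d in dirs_b)
    report = []
    for name in sorted(rel_a - rel_b):
        report.append("'%s' exists in %s but not in %s" % (name, root_a, root_b))
    for name in sorted(rel_b - rel_a):
        report.append("'%s' exists in %s but not in %s" % (name, root_b, root_a))
    return report

dir_list_a = ['d:\\results\\foldera\\', 'd:\\results\\folderb\\',
              'd:\\results\\folderc\\']
dir_list_b = ['c:\\results\\foldera\\', 'c:\\results\\folderb\\']

for line in missing_dirs(dir_list_a, 'd:\\results', dir_list_b, 'c:\\results'):
    print(line)
# prints: 'folderc' exists in d:\results but not in c:\results
```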
 
Robin Siebler

Raymond Hettinger said:
> [Robin Siebler]
>> However, before I actually compare the files, I want to compare the
>> directories and if a directory is missing in either set, I want to
>> report it:
>
> The operative word is "set".
>
> Try using sets.py:
> one_only = Set(dirlistone) - Set(dirlisttwo)
> two_only = Set(dirlisttwo) - Set(dirlistone)
>
> Raymond Hettinger

I get the following error:

NameError: name 'Set' is not defined

I'm using ActivePython 2.2.3. Is this something that has been added
to a later version of Python?
 
Cliff Wells

> Raymond Hettinger said:
>> [Robin Siebler]
>>> However, before I actually compare the files, I want to compare the
>>> directories and if a directory is missing in either set, I want to
>>> report it:
>>
>> The operative word is "set".
>>
>> Try using sets.py:
>> one_only = Set(dirlistone) - Set(dirlisttwo)
>> two_only = Set(dirlisttwo) - Set(dirlistone)
>>
>> Raymond Hettinger
>
> I get the following error:
>
> NameError: name 'Set' is not defined
>
> I'm using ActivePython 2.2.3. Is this something that has been added
> to a later version of Python?

I think sets were added in 2.3, but either way you must still 'from sets
import Set' before using them.

Regards,
Cliff
 
Steven Bethard

Cliff Wells said:
> I think sets were added in 2.3, but either way you must still 'from sets
> import Set' before using them.

It also might be good to get in the habit of writing this as:

from sets import Set as set

so that when you move to Python 2.4, where set() is a builtin, all you have to
do is remove the import.

Python 2.4a3 (#56, Sep 2 2004, 20:50:21) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
set([0, 2, 4, 6, 8, 10, 12])

Steve
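A related idiom is sketched here as an alternative; it does not come from the thread. It lets the same code run unchanged on 2.3 (via the `sets` module) and on 2.4+ (builtin `set`). The `Set` import only happens when the builtin name is missing. The directory names are sample data.

```python
try:
    set  # builtin in Python 2.4 and later
except NameError:
    # Python 2.3: fall back to the sets module
    from sets import Set as set

one_only = set(['foldera', 'folderb', 'folderc']) - set(['foldera', 'folderb'])
print(one_only)
```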
 
Raymond Hettinger

>> I'm using ActivePython 2.2.3. Is this something that has been added
> I think sets were added in 2.3, but either way you must still 'from sets
> import Set' before using them.

Right.

Also, sets.py is now Py2.2 compatible, so you can take the current module off
of CVS and use it directly.


Raymond Hettinger
 
