Comparing directory contents

dave davidson

Hi all,

I work in the SCM dept of a Windows software shop. A typical software build
involves us getting the code from an engineer, compiling the binaries, gathering
any support files, and then wrapping it in an installer (Installshield). We run
the installer to make sure everything looks ok. As a quick-and-dirty sanity check
to make sure we got everything, we go into the install folder, do a 'dir /s',
and pipe the output to a text file. If the file list in the current build
matches the file list of the previous, we give it the ok. These lists are saved
on disk, printed and filed with the build paperwork so we can refer to them
again if necessary.

This method works surprisingly well for catching files that were mistakenly
excluded, but as you can imagine it gets very tedious and error-prone since we
have to hand-check the output. Additionally, many times we are asked by the
engineer to include additional support files, or remove existing ones. I'm
thinking there must be a better way, or better yet, a Ruby Way :)
I am relatively new to the language, so I don't really know which angle to
attack it from. The basic gist would be to read in the previous file list
output, strip any junk (extra spaces, line breaks, etc), and do the same for the
current, so what's left is two lists of just pure filenames (don't care about
timestamps or attributes right now). The script would process the lists and the
result would be something like "Identical" or "Extra files: [filenames]" or
"Removed files: [filenames]".

I'm wondering if something like this already exists. A search of rubyforge and
RAA, however, did not turn up anything this specific, although I really wasn't
sure what I should be looking for. If I could be pointed to a base library that
would get me going, that would be great. Any insights on implementation would
also be greatly appreciated. Thanks!
 
Brian Schröder

Does this help?

bschroed@black:~/svn/projekte/ruby-things$ ls -1 > before.list
bschroed@black:~/svn/projekte/ruby-things$ touch another.one
bschroed@black:~/svn/projekte/ruby-things$ ls -1 > after.list
bschroed@black:~/svn/projekte/ruby-things$ irb
irb(main):001:0> before = File.readlines('before.list')
=> ["before.list\n", ...]
irb(main):002:0> after = File.readlines('after.list')
=> ["before.list\n", "after.list\n", "another.one\n", ...]
irb(main):003:0> before - after
=> []
irb(main):004:0> after - before
=> ["after.list\n", "another.one\n"]
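The same array difference can be wrapped in a small script that prints the kind of report the original post asked for. This is only a minimal sketch, not a finished tool: `read_list` and `compare_lists` are hypothetical names, and it assumes each saved list holds one file name per line (for example from `ls -1`, or from `dir /s /b` on Windows).

```ruby
# Read a saved listing into an array of names: one entry per line,
# with surrounding whitespace and blank lines dropped.
def read_list(path)
  File.readlines(path, chomp: true).map(&:strip).reject(&:empty?)
end

# Compare two arrays of names and return a short text report.
def compare_lists(previous, current)
  extra   = current - previous   # present only in the current list
  removed = previous - current   # present only in the previous list
  return 'Identical' if extra.empty? && removed.empty?

  report = []
  report << "Extra files: #{extra.join(', ')}" unless extra.empty?
  report << "Removed files: #{removed.join(', ')}" unless removed.empty?
  report.join("\n")
end

# Example with the two lists from the session above:
# puts compare_lists(read_list('before.list'), read_list('after.list'))
```

Stripping each line before comparing means trailing newlines and stray spaces do not show up as false differences.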

regards,

Brian

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/
 
Lothar Scholz

Hello Jacob,

JF> Though Brian Schröder gave an interesting irb implementation, what you
JF> really need is diff[1]. And don't despair, there is diff for
JF> Windows[2] (via the command line).

JF> The GNU developers have put a *lot*
JF> of work and refinement into this heavily used tool -- don't reinvent
JF> the wheel.

<flame>
And they still have nothing that even comes close to "AraxisMerge" on
Windows, neither in the GUI nor in the quality of the diff algorithm.
</flame>

But back to the question from the original poster: I think diff is
completely the wrong idea, as he said he only needs the file names, and the
content does not matter for an installer as he puts the complete file into the
setup.exe.

I don't see a particularly Ruby way to solve it, as processing strings is not
a very complicated task. Build two hashes over the file lists
and compare them item by item. Just parsing the previous file list would be a
little bit complicated if the Installshield file format must be parsed and not a
plain string list, but it should still be possible to write the script in
100 lines. Or maybe I did not understand Dave's real problem.
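
The two-hash comparison described above might look like the following. It is only a sketch: `index` and `diff_lists` are made-up names, and the inputs are assumed to be plain arrays of file names that have already been parsed out of whatever format the lists are stored in.

```ruby
# Build a hash keyed by file name so each membership test is a single lookup.
def index(names)
  names.each_with_object({}) { |name, seen| seen[name] = true }
end

# Walk each list once and collect the names missing from the other side.
def diff_lists(previous, current)
  old_index = index(previous)
  new_index = index(current)
  {
    removed: previous.reject { |name| new_index.key?(name) },
    extra:   current.reject  { |name| old_index.key?(name) }
  }
end
```

Called as `diff_lists(%w[a.dll b.dll c.txt], %w[a.dll c.txt d.ico])`, it returns a hash whose `:removed` entry is `["b.dll"]` and whose `:extra` entry is `["d.ico"]`.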


--
Best regards, emailto: scholz at scriptolutions dot com
Lothar Scholz http://www.ruby-ide.com
CTO Scriptolutions Ruby, PHP, Python IDE 's
 
Jacob Fugal

Ok, to qualify my statement: Don't reinvent this particular wheel for
a once-off solution. I won't say that someone else can make a better
wheel when that's their primary goal. I don't think Dave Davidson's
goal is to develop a new diff utility. Regarding AraxisMerge, I've
never heard of it. It may be better than GNU DiffUtils. I can't make
any judgement there.

LS> But back to the question from the original poster, i think diff is a
LS> complete wrong idea as he said he only needs the file names and the content
LS> does not matter for an installer as he puts the complete file into the
LS> setup.exe.

diff -qr master_dir/ working_dir/ | grep '^Only'

Know the tool before dismissing it.

Jacob Fugal
 
Ara.T.Howard

the way i read the OP's post the original contents should be stored and
alterable. the diff approach would require both directories to exist and be
stored and i think the OP wanted to store only the __inventory__ of the dir -
not the actual dir. so not only would the storage/database requirements
skyrocket, but you'd be using a sledgehammer to pound in a mini-tack. this
problem is quite easily solved in only a few lines of ruby - including
database code, command line parsing, etc:


here's the code:


harp:~ > cat ./dirlist

#! /usr/bin/env ruby
require 'pstore'
require 'yaml'
require 'getoptlong'

# persistent store mapping an absolute directory path to its saved file list
class DirDb < ::PStore
  def [] dir
    transaction{ super(exp(dir)) rescue nil }
  end
  def []= dir, filelist
    transaction{ super(exp(dir), filelist) }
  end
  def exp dir
    File::expand_path dir
  end
end

# array of absolute paths found (recursively) under a directory
class FileList < ::Array
  def initialize dir
    @dir = File::expand_path dir
    @glob = File::join @dir, '**', '*'
    replace Dir[@glob].map{|f| File::expand_path f}
  end
  # paths relative to the scanned directory
  def basenames
    map{|f| f.gsub(%r|^#{ Regexp::escape @dir }/*|,'')}
  end
  def add filename
    self << File::expand_path(File::join(@dir, filename))
  end
  def delete filename
    super(File::expand_path(File::join(@dir, filename)))
  end
  def to_yaml
    to_a.to_yaml
  end
end

class Main
  def self::main(*a, &b)
    new(*a, &b).run
  end
  def initialize
    gl = GetoptLong::new(['--db', '-d', GetoptLong::REQUIRED_ARGUMENT])
    gl.each do |opt, arg|
      case opt
      when /db/
        @db_path = arg
      end
    end
    @db_path ||= File::expand_path(File::join('~', '.dirdb'))
    @mode, @mode_args = ARGV.shift, ARGV
    @mode ||= 'help'
    @db = DirDb::new @db_path
  end
  def run
    send(@mode, *@mode_args)
  end
  def scan dir
    @db[dir] = FileList::new dir
    show dir
  end
  def show dir
    # print as yaml (Kernel#y is not available outside irb on recent rubies)
    puts @db[dir].to_yaml
  end
  def report old_dir, new_dir
    previous = @db[old_dir]
    current = FileList::new new_dir
    report = {}
    # keys are inserted in the order they appear in the sample output
    report['removed'] = previous.basenames - current.basenames
    report['extra'] = current.basenames - previous.basenames
    report['identical'] = previous.basenames & current.basenames
    puts report.to_yaml
  end
  def add dir, filename
    filelist = @db[dir]
    filelist.add filename
    @db[dir] = filelist
  end
  def delete dir, filename
    filelist = @db[dir]
    filelist.delete filename
    @db[dir] = filelist
  end
  def help
    puts "#{ $0 } scan dir | show dir | report old_dir new_dir | add dir filename | delete dir filename"
  end
end

$0 == __FILE__ and Main::main



and here's how you use it:


harp:~ > mkdir version-1.0.0 && touch version-1.0.0/a version-1.0.0/b version-1.0.0/c


harp:~ > ./dirlist
./dirlist scan dir | show dir | report old_dir new_dir | add dir filename | delete dir filename


harp:~ > ./dirlist scan version-1.0.0/
---
- /home/ahoward/version-1.0.0/a
- /home/ahoward/version-1.0.0/b
- /home/ahoward/version-1.0.0/c


harp:~ > rm -rf version-1.0.0/


harp:~ > mkdir version-2.0.0 && touch version-2.0.0/a version-2.0.0/b


harp:~ > ./dirlist report version-1.0.0 version-2.0.0
---
removed:
- c
extra: []
identical:
- a
- b


harp:~ > touch version-2.0.0/c


harp:~ > ./dirlist report version-1.0.0 version-2.0.0
---
removed: []
extra: []
identical:
- a
- b
- c


harp:~ > touch version-2.0.0/d


harp:~ > ./dirlist report version-1.0.0 version-2.0.0
---
removed: []
extra:
- d
identical:
- a
- b
- c


harp:~ > ./dirlist add version-1.0.0 d


harp:~ > ./dirlist report version-1.0.0 version-2.0.0
---
removed: []
extra: []
identical:
- a
- b
- c
- d


harp:~ > rm version-2.0.0/a


harp:~ > ./dirlist report version-1.0.0 version-2.0.0
---
removed:
- a
extra: []
identical:
- b
- c
- d


harp:~ > ./dirlist delete version-1.0.0 a


harp:~ > ./dirlist report version-1.0.0 version-2.0.0
---
removed: []
extra: []
identical:
- b
- c
- d


in any case, i'm all for using built-in tools to accomplish tasks - but this
task is so basic it seems silly not to just write it in pure ruby...

kind regards.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================
 
dave davidson

All,

Thanks so much for the hints and pointers regarding this issue... I've not had a
chance to try all the suggestions (too busy counting files by hand :) but I just
wanted to let you know I appreciate the help!

Dave
 
Jacob Fugal

AH> the way i read the OP's post the original contents should be stored and
AH> alterable. the diff approach would require both directories to exist and be
AH> stored and i think the OP wanted to store only the __inventory__ of the dir -
AH> not the actual dir. so not only would the storage/database requirements
AH> skyrocket, but you'd be using a sledgehammer to pound in a mini-tack.

Ok, I forgot about that constraint. I still think diff would be the
exact tool I would use when on a *nix system:

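# Done once to build the list compared against
# (cd first so the paths are relative to each tree, and sort for a stable order)
$ (cd master_dir && find . | sort) > master.list

# Done each time to verify all files are there in the working copy
$ (cd working_dir && find . | sort) | diff master.list -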

I'll admit that once you start getting into pipes and such this
solution probably won't work, or at least not as easily, on Windows.
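
For Windows, roughly the same two steps can be done from Ruby without pipes. The sketch below is an illustration under stated assumptions rather than a drop-in replacement for find and diff: `list_tree`, `save_master`, and `check_against` are hypothetical helper names, paths are recorded relative to each directory so the two trees can have different names, and entries whose names start with a dot are skipped by the default glob.

```ruby
# List every file and directory under dir, relative to dir, in sorted order.
# (The base: keyword needs Ruby 2.5 or later.)
def list_tree(dir)
  Dir.glob('**/*', base: dir).sort
end

# Done once: save the master list, one relative path per line.
def save_master(dir, list_file)
  File.write(list_file, list_tree(dir).map { |path| "#{path}\n" }.join)
end

# Done each time: compare a working directory against the saved list.
def check_against(list_file, dir)
  master  = File.readlines(list_file, chomp: true)
  working = list_tree(dir)
  { missing: master - working, unexpected: working - master }
end

# Usage, with the directory names from the commands above:
# save_master('master_dir', 'master.list')
# p check_against('master.list', 'working_dir')
```

Only the saved list has to be kept between builds, so the master directory itself can be deleted once the list is written.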

Jacob Fugal
 
