Searching Directories

Jabari Zakiya

I'm trying to figure out the best way to accomplish this task.

Being new to Ruby, I'm trying to be efficient (if not elegant),
but first just want something that works.

I'm using Ruby 1.8.2 rc4 on Windows 2000.

Here's the task.

--------------
I have text files of lastnames, one name per line.

I want to read each name from each file, and determine
if that name is part of any file which has the extensions
*.txt or *.doc or *.rtf which occur in any directory within
the tree structure for a given top level directory.

The human interface is something simple like this:

The text filename is keyboard entered into "NameListFile".
The top-level-dir is keyboard entered into "DirToSearch".
The output-file is keyboard entered into "SearchResults".

Doing something like: DirItems = Dir.entries(DirToSearch)
I can get an array of items in the top-level directory.

Given this array of files and directories, I just need to
check to see if an item is a "file" or "directory".

If it's a "file" I check to see that its extension is
*.txt or *.doc or *.rtf. If the file has any of those
extensions, I then check to see whether the filename
contains, in any part, a name from "NameListFile".

If YES, I print the name in an output file in the format:
Name, filedate, filename(fullpath relative to "DirToSearch")

If an item in DirItems is a "directory" then I do the checking
process for the entries in that directory, etc, etc, until
every file in every directory is searched.

It seems this is a natural case to use recursion to check all
the directories in the tree, but I'm not sure which methods
to use to do this.

This seems like a pretty easy/standard thing to do, but I just
don't know enough Ruby (yet) to do it easily.

Your help and guidance is appreciated in advance.

Jabari
 
Markus

Pseudo-code is your friend. What you say you want to do is something
like:

define find_stuff_in a_directory
  for each thing in a_directory
    if it is a directory
      find stuff in it (unless it's . or ..)
    else (when it's a file)
      print information about it if it's interesting

All of this is from your e-mail, and could apply to any language.
Now for the ruby part. Look up the class Dir, which will give you the
contents of a directory as an array, Array (and Enumerable) which let
you walk through instances, and Regexp which let you test strings to see
if they match patterns.

Note that the recursion sort of takes care of itself in the fourth line
of the pseudo-code.

Have fun.

-- MarkusQ
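
The pseudo-code above might translate to Ruby roughly as follows. This is only a sketch: the method names `find_stuff_in` and `interesting?` are illustrative, matching is limited to the extension test, the `.downcase` (case-insensitive extensions) is an assumption, and symlinked directories are skipped to avoid loops.

```ruby
# Hypothetical helper: decides whether a path has one of the wanted extensions.
# The downcase is an assumption; the original post does not mention case.
def interesting?(path)
  %w[.txt .doc .rtf].include?(File.extname(path).downcase)
end

# Walks a_directory recursively and returns the interesting file paths.
def find_stuff_in(a_directory, found = [])
  Dir.entries(a_directory).each do |thing|
    next if thing == '.' || thing == '..'     # skip self and parent
    path = File.join(a_directory, thing)
    if File.directory?(path) && !File.symlink?(path)
      find_stuff_in(path, found)              # the recursion step
    elsif File.file?(path) && interesting?(path)
      found << path
    end
  end
  found
end
```

Calling `find_stuff_in(top_dir)` would return an array of full paths, which can then be checked against the name list.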
 
Osuka Adartse

require 'find'
dirs = Array.new
files = Array.new
Find.find('/directory/path') { |entry|
  if FileTest.directory?(entry)
    dirs.push(entry)
  else
    files.push(entry)
  end
}

# By now dirs and files are arrays of directories and files. Find is
# recursive, so you can use it instead of Dir[].
require 'pp'
pp dirs
#=> ["E:/usr\\local/rdoc/sqlite-1.3.0",
#    "E:/usr\\local/rdoc/sqlite-1.3.0/files",
#    "E:/usr\\local/rdoc/sqlite-1.3.0/classes",
#    "E:/usr\\local/rdoc/sqlite-1.3.0/classes/SQLite"]

# process files...
files.each { |f|
  # File.extname returns the extension including the leading dot
  case File.extname(f)
  when '.html' then puts f + ' : is a HTML'
  when '.rid'  then puts f + ' : is mmmh something...'
  end
}
# results in
#=>
# E:/usr\local/rdoc/sqlite-1.3.0/index.html : is a HTML
# E:/usr\local/rdoc/sqlite-1.3.0/fr_method_index.html : is a HTML
# E:/usr\local/rdoc/sqlite-1.3.0/fr_file_index.html : is a HTML
# E:/usr\local/rdoc/sqlite-1.3.0/fr_class_index.html : is a HTML
# E:/usr\local/rdoc/sqlite-1.3.0/files/sqlite_rb.html : is a HTML
# E:/usr\local/rdoc/sqlite-1.3.0/files/sqlite_c.html : is a HTML
# E:/usr\local/rdoc/sqlite-1.3.0/created.rid : is mmmh something...

hope this may help you...
Adartse
 
Jabari Zakiya

Thanks for the suggestions.

Here's my approach.
---------------------------------------------
require 'find'

def dirsearch(namefile, topdir, outfile)

  out = File.new(outfile, "w")
  # Total number of files found
  total = 0

  # Array with file extensions to check for
  exts = %w{.doc .rtf .txt}

  # Check entries in each subdirectory of TopDir
  Find.find(topdir){|entry|
    # If entry a Directory search inside it
    if FileTest.directory?(entry)
      next
    # Else entry was a file
    else
      # If current file has a desired extension
      if exts.include? File.extname(entry)
        # Check for each name in namefile
        File.open(namefile).each{|name| name = name.chomp
          # If name is part of basename of file
          if File.basename(entry) =~ /#{name}/
            # Write line to output file and screen if there is a match
            out.puts(name+ ", " + entry + ", " + File.open(entry).mtime.to_s)
            puts name + ", " + entry + ", " + File.open(entry).mtime.to_s
            total += 1
          end
        }
      end
    end
  }
  print "Total files = ", total , "\n"
  out.close
end
 
nobu.nokada

Hi,

At Thu, 2 Sep 2004 04:50:22 +0900,
Jabari Zakiya wrote in [ruby-talk:111197]:
> Here's my approach.
> ---------------------------------------------
> require 'find'
>
> def dirsearch(namefile, topdir, outfile)
>
>   out = File.new(outfile, "w")
>   # Total number of files found
>   total = 0
>
>   # Array with file extensions to check for
>   exts = %w{.doc .rtf .txt}
>
>   # Check entries in each subdirectory of TopDir
>   Find.find(topdir){|entry|
>     # If entry a Directory search inside it
>     if FileTest.directory?(entry)
>       next
>     # Else entry was a file
>     else

Don't you want to check whether it is a file? Many file systems
have entry types other than file and directory, so you should use
File.file?(entry) or:

  stat = File.stat(entry)
  if stat.file?

>       # If current file has a desired extension
>       if exts.include? File.extname(entry)
>         # Check for each name in namefile
>         File.open(namefile).each{|name| name = name.chomp
>           # If name is part of basename of file
>           if File.basename(entry) =~ /#{name}/

Compiling the regexp each time would be too expensive.

>             # Write line to output file and screen if there is a match
>             out.puts(name+ ", " + entry + ", " + File.open(entry).mtime.to_s)
>             puts name + ", " + entry + ", " + File.open(entry).mtime.to_s

Leaving files open is very bad manners. You can use
File.mtime(entry) instead, or, with the File.stat above:

  out.puts(name+ ", " + entry + ", " + stat.mtime.to_s)
  puts name + ", " + entry + ", " + stat.mtime.to_s
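
One possible reading of the regexp remark, as a rough sketch: read the name list once and build a single pattern before walking the tree, so nothing is recompiled per file. The helper names `load_names`, `name_pattern` and `matching_name` are illustrative and do not appear in the thread. `Regexp.union` escapes plain strings, so names are matched literally (an assumption about what is wanted here).

```ruby
# Read the names once. File.readlines opens, reads, and closes the file.
def load_names(namefile)
  File.readlines(namefile).map { |line| line.chomp }.reject { |n| n.empty? }
end

# Build one Regexp matching any of the names. Strings passed to
# Regexp.union are escaped, so "St.John" only matches a literal dot.
def name_pattern(names)
  Regexp.union(names)
end

# Returns the name found in the file's basename, or nil if none matches.
def matching_name(pattern, path)
  File.basename(path)[pattern]
end
```

With these, the inner loop over the name file becomes a single `matching_name(pattern, entry)` call per entry.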
 
Osuka Adartse

Jabari said:
> Thanks for the suggestions.
>
> Here's my approach.
> ---------------------------------------------
> require 'find'
>
> def dirsearch(namefile, topdir, outfile)
>
>   out = File.new(outfile, "w")
>   # Total number of files found
>   total = 0
>
>   # Array with file extensions to check for
>   exts = %w{.doc .rtf .txt}
>
>   # Check entries in each subdirectory of TopDir
>   Find.find(topdir){|entry|
>     # If entry a Directory search inside it
>     if FileTest.directory?(entry)
>       next

Unless I misunderstood you, there's no real point to these 2
lines... so I changed them to the line below, which also deletes the next 4:

  if !FileTest.directory?(entry) && exts.include?(File.extname(entry))

>     # Else entry was a file
>     else
>       # If current file has a desired extension
>       if exts.include? File.extname(entry)

I prefer to add the ()'s for readability; I wondered for a sec' what the
above line meant ;-)

>         # Check for each name in namefile
>         File.open(namefile).each{|name| name = name.chomp
>           # If name is part of basename of file
>           if File.basename(entry) =~ /#{name}/
>             # Write line to output file and screen if there is a match
>             out.puts(name+ ", " + entry + ", " + File.open(entry).mtime.to_s)

BTW, there's no need to open the file to use several of the File methods,
or for the to_s.

>             puts name + ", " + entry + ", " + File.open(entry).mtime.to_s

A matter of preference, but I avoid puts variable + ", " + ... etc. and
instead use puts "#{variable}, #{var2}", or better, printf to get more
control over the output. It's easier on the eyes, at least for me, e.g.
printf("%-24s, %-48s , %s\n",name,entry,File.mtime(entry)). I used strftime
for similar reasons.

>             total += 1
>           end
>         }
>       end
>     end
>   }
>   print "Total files = ", total , "\n"
>   out.close
> end
goodie!! :)

my lil' changes FWIW

def dirsearch(namefile, topdir, outfile)
  require 'find'
  out = File.new(outfile, "w")
  # Total number of files found
  total = 0

  # Array with file extensions to check for
  exts = %w{.doc .rtf .txt}

  # Check entries in each subdirectory of TopDir
  Find.find(topdir){|entry|
    # we're looking for files, not dirs, and files that match exts so...
    if !FileTest.directory?(entry) && exts.include?(File.extname(entry))
      # Check for each name in namefile
      File.open(namefile).each{|name| name = name.chomp
        # If name is part of basename of file
        if File.basename(entry) =~ /#{name}/
          # Write line to output file and screen if there is a match
          out.printf("%-16s: %-48s: %s\n",name,entry,File.mtime(entry).strftime("%d-%b-%Y %H:%M"))
          # either this line with mtime's output as it is or formatted with strftime
          # strftime makes things more my way omitting info I don't need/want...
          # option: printf("%-24s: %-48s: %s\n",name,entry,File.mtime(entry))
          printf("%-16s: %-48s: %s\n",name,entry,File.mtime(entry).strftime("%d-%b-%Y %H:%M"))
          total += 1
        end
      }
    end
  }
  print "Total files = #{total} \n"
  out.close
end


cheers
Adartse
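
The formatting idea in this post can be isolated into a small helper so the same line goes to both the screen and the output file. This is only a sketch; `format_match` is an illustrative name, and the column widths are the ones used in the printf calls above.

```ruby
# Builds one report line: name and path in fixed-width, left-justified
# columns, followed by the modification time as day-month-year hour:minute.
def format_match(name, path, mtime)
  format("%-16s: %-48s: %s", name, path, mtime.strftime("%d-%b-%Y %H:%M"))
end
```

A caller could then write `line = format_match(name, entry, File.mtime(entry))` and pass `line` to both `puts` and `out.puts`.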
 
Jabari Zakiya

---------------------------------------------------------
Taking everyone's suggestions into account, here is my new version.

This version should be more amenable to different systems.
I didn't bother to do elaborate output formatting because
the person who needs the output is satisfied with the
current format. I will use the formatting suggestions if
I need to in the future. There also is a default output file.

One thing I DO NEED help on is EXCEPTION HANDLING!
I encountered CORRUPTED directories on a disk I searched
through, which caused the program to bomb with an error
message. I would like suggestions on handling
DIR/FILE EXCEPTIONS. Ideally, I would like to be able to
continue searching when EXCEPTIONS are met and record
what the EXCEPTION (corrupted DIR/FILE) was. Is this possible?

Nobu, I didn't understand your comment about the Regexp
searching. Could you explain in more detail the issue, and
an alternative approach.

Also, I could have first put all the names in 'namefile'
in an array, and then iterate over that, for purposes of
speed, but in this case it doesn't matter. I might try it
though, just to see if it speeds things up appreciably.

Again, thanks in advance for your help and suggestions! ;-)

---------------------------------------------------------
require 'find'

def dirsearch(namefile, topdir, outputfile='searchresults.txt')

  outfile = File.new(outputfile, "w")
  # Total number of files found
  total = 0

  # Array with file extensions to check for
  exts = %w{.doc .rtf .txt}

  # Check entries in each subdirectory of topdir
  Find.find(topdir){|entry|
    # If entry is a file with a desired extension
    if File.file?(entry) && exts.include?(File.extname(entry))
      # Check for each name in namefile
      File.open(namefile).each{|name| name = name.chomp
        # If name is part of basename of file
        if File.basename(entry) =~ /#{name}/
          # Write line to output file and screen if there is a match
          outfile.puts("#{name}:\n #{entry}, #{File.stat(entry).mtime.to_s}")
          puts "#{name}:\n #{entry}, #{File.stat(entry).mtime.to_s}"
          total += 1
        end
      }
    # Else get next entry
    else
      next
    end
  }
  print "Total files = ", total , "\n"
  outfile.close
end
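
On the exception-handling question, one possible approach is to rescue errors per entry inside the Find.find block, record them, and carry on. The sketch below assumes the names have already been read into an array; `safe_dirsearch` and the `errors` list are illustrative names, not part of the code above. Errors raised by Find's own directory reads happen outside the block, so this per-entry rescue would not catch those.

```ruby
require 'find'

# Returns [matches, errors]. Each match is [name, path, mtime]; each error
# is [path, exception class name, message] for an entry that could not be
# examined.
def safe_dirsearch(names, topdir, exts = %w[.doc .rtf .txt])
  matches = []
  errors  = []
  pattern = Regexp.union(names)   # built once, names matched literally
  Find.find(topdir) do |entry|
    begin
      next unless File.file?(entry) && exts.include?(File.extname(entry))
      name = File.basename(entry)[pattern]
      matches << [name, entry, File.mtime(entry)] if name
    rescue SystemCallError, IOError => e
      # Errno::* errors (missing file, access denied, ...) are all
      # subclasses of SystemCallError.
      errors << [entry, e.class.name, e.message]
    end
  end
  [matches, errors]
end
```

After the walk, the `errors` array can be written to the output file alongside the matches, so the problem entries are on record.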
 
