Kaldrenon
Hi all (this is going to comp.lang.ruby and comp.lang.perl.misc),
The other day I wrote a basic program in Perl, and the following day I
rewrote it in Ruby. I'm curious about the differences in runtime of
the two versions, though.
Let me start by describing the program (I'll append full code for both
to the end): it reads in a list of alphanumeric codes from a file
(format is [\w\d\S]+_\d{3}, but they're separated by a comma in the
file), then creates a hash with those codes as keys and empty arrays
as values. After the hash is built, the program traverses through a
given directory and its subdirectories (using File::Find in Perl and
Find.find in Ruby) and checks each file against the hash of codes
(with a few regexps and conditions to prevent lots of unnecessary
looping), adding it to the array for a code if the code is found in
the filename. Finally, it writes the contents of the hash to a .csv
file in the format CODE,PATH for each match.
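To make that description concrete, here is a minimal Ruby sketch of the same flow. It is not the program I timed: it works on an in-memory list of paths instead of walking a directory, it uses a plain substring test where the real code uses a regexp match, and the helper names build_code_table and match_files are made up for illustration.

```ruby
# Hypothetical helper: turn lines like "ABC,123\n" into keys like "ABC_123",
# each mapped to its own empty array.
def build_code_table(lines)
  table = {}
  lines.each do |line|
    code = line.chomp.sub(',', '_')
    table[code] = []
  end
  table
end

# Hypothetical helper: record every path whose basename contains a code.
# Assumption: a substring test stands in for the regexp match in the full code.
def match_files(table, paths)
  paths.each do |path|
    name = File.basename(path)
    # Cheap filter so that unrelated files never reach the inner loop.
    next unless name =~ /^\S+_\d{3}/
    table.each_key do |code|
      next unless name.include?(code)
      table[code] << path unless table[code].include?(path)
    end
  end
  table
end

table = build_code_table(["ABC,123\n", "XYZ,456\n"])
match_files(table, ["/data/a/ABC_123_front.jpg",
                    "/data/b/readme.txt",
                    "/data/b/ABC_123_back.jpg"])

# Emit CODE,PATH for each match, as in the .csv output.
table.each do |code, paths|
  paths.each { |path| puts "#{code},#{path}" }
end
```

The point to notice is the nested loop: every file that passes the filter is compared against every code, so the work per file grows with the size of the code list.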
Now, if it were the case that Ruby or Perl were simply -slower- than
the other, I wouldn't be bothering you folks. But here's where it gets
a little unusual: the number of elements in the code list has a
noticeable impact on the run time of the Ruby version, but far less on
the Perl version. I ran each one a few times with code lists of
various sizes, and they both print start/stop timestamps at the end,
so I collected the data:
Entries | Ruby (s) | Perl (s)
      4 |      153 |      291
     64 |      133 |      258
    256 |      222 |      253
    512 |      327 |      248
   1024 |      562 |      353
   1500 |      683 |      363
Ruby runs faster for low numbers of entries, as you can see, but by
1500 entries Ruby's time has more than quadrupled (153 s to 683 s)
while Perl's has gone up by only about a quarter (291 s to 363 s).
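For anyone who wants to check the arithmetic, this short Ruby snippet computes the growth from the timings in the table above. The numbers are copied straight from my measurements; the helper names growth and per_entry are just for this example.

```ruby
# Timings in seconds, keyed by number of entries, copied from the table above.
ruby_times = { 4 => 153, 64 => 133, 256 => 222, 512 => 327, 1024 => 562, 1500 => 683 }
perl_times = { 4 => 291, 64 => 258, 256 => 253, 512 => 248, 1024 => 353, 1500 => 363 }

# Ratio of the 1500-entry run to the 4-entry run.
def growth(times)
  (times[1500].to_f / times[4]).round(2)
end

# Average extra seconds per additional entry between 4 and 1500 entries.
def per_entry(times)
  ((times[1500] - times[4]).to_f / (1500 - 4)).round(2)
end

puts "Ruby: #{growth(ruby_times)}x overall, #{per_entry(ruby_times)} s per added entry"
puts "Perl: #{growth(perl_times)}x overall, #{per_entry(perl_times)} s per added entry"
# Prints:
# Ruby: 4.46x overall, 0.35 s per added entry
# Perl: 1.25x overall, 0.05 s per added entry
```

So going by the endpoints alone, each extra code costs the Ruby version roughly seven times what it costs the Perl version.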
I've looked over the code for both versions several times, and I don't
see any significant differences. The only important feature the Ruby
version lacks is the sort() before writing the file.
I'd really appreciate any insight into why Ruby's runtime grows so
readily and Perl's does not.
Code of both versions follows.
Thanks,
Andrew Fallows
Perl:
use File::Find;
use strict;
use warnings;
my $code;
my $type;
my %filecodes = ();
my $start_time = "Started: " . localtime();
$| = 1; #Enables flush on print.
$\ = "\n"; #Automatic newlines on print
open(ITEM_LIST, "(path)") or die "Error";
# This loop builds a hash whose keys are the codes/types from file
# and whose values are references to empty arrays
while(my $item = <ITEM_LIST>)
{
    $item =~ s/,/_/;
    $item =~ s/\n//g;
    print $item;
    my @files = ();
    $filecodes{$item} = \@files;
}
print "Hash built";
# Uses File::Find to iterate over the entire subdirectory
find(\&file_seek, "(path)");
# The searching portion: gets each location from File::Find, then compares it
# to all the targets. If there is a match, prints a message and adds that file
# to the related array.
sub file_seek
{
    my $file = $_;
    # Kicks out if the file in question is not of the necessary format
    if(!(-f $file) || !($file =~ /^[\d\w\S]+_\d{3}/)){ return; }
    foreach my $target (keys(%filecodes))
    {
        # If the file name contains the code sought
        if($file =~ /$target/)
        {
            print "found $file in $File::Find::dir";
            # Jumps out if the list for this code already contains this file.
            for (0..@{$filecodes{$target}})
            {
                if(defined(${$filecodes{$target}}[$_])
                   && $File::Find::name eq ${$filecodes{$target}}[$_]) {return; }
            }
            push(@{$filecodes{$target}}, $File::Find::name);
        }
    }
}
# After the whole directory has been searched, prints each key and all
# values found for it.
open(RESULTS, "> (path)") or die "Error 2";
foreach my $target ( sort(keys( %filecodes )))
{
    my @results = @{$filecodes{$target}};
    if(@results == 0) { push(@results, "NO FILES FOUND") }
    print $target;
    foreach (@results)
    {
        print RESULTS "$target,$_";
        print "\t$_";
    }
}
close RESULTS;
print $start_time;
print "Ended: " . localtime();
Ruby:
class FileSearcher
  $\ = "\n"
  in_file = File.open( "(path)","r")
  start_time = Time.now
  filecodes = Hash.new
  # This loop reads all the item codes in from file and then
  # adds them to a hash, each linked to its own empty array
  while item = in_file.gets
    item = item.gsub(',','_')
    item = item.gsub("\n","")
    files = Array.new
    files.push("empty");
    filecodes[item]= files
  end
  in_file.close
  # The searching portion: looks at each file/location, then compares it
  # to all the targets. If there is a match, prints a message and adds
  # that file to the related array.
  require "Find"
  require 'ftools'
  Find.find("(path)") do |file|
    if !(FileTest.file?(file)) || !(File.basename(file) =~ /^[\d\w\S]+_\d{3}/)
      next
    else
      filecodes.each_key do |target|
        if(file =~ /#{target}/)
          puts "found " + target + " at " + file
          $stdout.flush
          fail = 0
          for i in 0..filecodes[target].size-1 do
            if(filecodes[target][i] != "empty" &&
               File.basename(file) == File.basename(filecodes[target][i]))
              fail = 1
              break
            end
          end
          if fail == 0
            if filecodes[target][0] == "empty"
              filecodes[target][0] = file
            else
              filecodes[target].push(file)
            end
          end
        end
      end
    end
  end
  # After the whole directory has been searched, prints each key and all
  # values found for it to a file called Ruby_results.csv.
  target_file = File.open("(path)","w")
  filecodes.each_key do |target|
    results = filecodes[target]
    if results[0] == "empty"
      results[0] = "NO FILES FOUND"
    end
    puts target
    for i in 0..(results.size-1)
      target_file.puts target + "," + results[i]
    end
  end
  target_file.close
  end_time = Time.now
  puts "Started: " + start_time.to_s
  puts "Ended: " + end_time.to_s
end