Runtime disparity - Same program in Perl and Ruby

Kaldrenon

Hi all (this is going to comp.lang.ruby and comp.lang.perl.misc),

The other day I wrote a basic program in Perl, and the following day I
rewrote it in Ruby. I'm curious about the differences in runtime of
the two versions, though.

Let me start by describing the program (I'll append full code for both
to the end): it reads in a list of alphanumeric codes from file
(format is [\w\d\S]+_\d{3}, but they're separated by a comma in the
file), then creates a hash with those codes as keys and empty arrays
as values. After the hash is built, the program traverses through a
given directory and its subdirectories (using File::Find in Perl and
Find.find in Ruby) and checks each file against the hash of codes
(with a few regexps and conditions to prevent lots of unnecessary
looping), adding it to the array for a code if the code is found in
the filename. Finally, it writes the contents of the hash to a .csv
file in the format CODE,PATH for each match.

Now, if it were the case that Ruby or Perl were simply -slower- than
the other, I wouldn't be bothering you folks. But here's where it gets
a little unusual: the number of elements in the code list has a
noticeable impact on the run time of the Ruby version, but far less on
the Perl version. I ran each one a few times with code lists of
various sizes, and they both print start/stop timestamps at the end,
so I collected the data:

Entries | Ruby (s) | Perl (s)
      4 |      153 |      291
     64 |      133 |      258
    256 |      222 |      253
    512 |      327 |      248
   1024 |      562 |      353
   1500 |      683 |      363

Ruby runs faster for low numbers of entries, as you can see, but once
you get up to 1500, Ruby's time has more than tripled while Perl's
time has gone up about a fifth.

I've looked over the code for both versions several times, and I don't
see any significant differences. The only important feature the Ruby
version lacks is the sort() before writing the file.

I'd really appreciate any insight into why Ruby's runtime grows so
readily and Perl's does not.

Code of both versions follows.

Thanks,
Andrew Fallows

Perl:

use File::Find;
use strict;
use warnings;
my $code;
my $type;
my %filecodes = ();
my $start_time = "Started: " . localtime();
$| = 1; #Enables flush on print.
$\ = "\n"; #Automatic newlines on print
open(ITEM_LIST, "(path)") or die "Error";

# This loop builds a hash whose keys are the codes/types from file
# and whose values are references to empty arrays
while(my $item = <ITEM_LIST>)
{
    $item =~ s/,/_/;
    $item =~ s/\n//g;
    print $item;
    my @files = ();
    $filecodes{$item} = \@files;
}
print "Hash built";

# Uses File::Find to iterate over the entire subdirectory
find(\&file_seek, "(path)");

# The searching portion: gets each location from File::Find, then compares it
# to all the targets. If there is a match, prints a message and adds that file
# to the related array.
sub file_seek
{
    my $file = $_;
    # Kicks out if the file in question is not of the necessary format
    if(!(-f $file) || !($file =~ /^[\d\w\S]+_\d{3}/)){ return; }

    foreach my $target (keys(%filecodes))
    {
        # If the file name contains the code sought
        if($file =~ /$target/)
        {
            print "found $file in $File::Find::dir";

            # Jumps out if the list for this code already contains this file.
            for (0..@{$filecodes{$target}})
            {
                if(defined(${$filecodes{$target}}[$_])
                    && $File::Find::name eq ${$filecodes{$target}}[$_]) {return; }
            }
            push(@{$filecodes{$target}}, $File::Find::name);
        }
    }
}

# After the whole directory has been searched, prints each key and all
# values found for it.
open(RESULTS, "> (path)") or die "Error 2";
foreach my $target ( sort(keys( %filecodes )))
{
    my @results = @{$filecodes{$target}};
    if(@results == 0) { push(@results, "NO FILES FOUND") }
    print $target;
    foreach (@results)
    {
        print RESULTS "$target,$_";
        print "\t$_";
    }
}
close RESULTS;
print $start_time;
print "Ended: " . localtime();

Ruby:

class FileSearcher
  $\ = "\n"
  in_file = File.open( "(path)","r")
  start_time = Time.now
  filecodes = Hash.new
  # This loop reads all the item codes in from file and then
  # adds them to a hash, each linked to its own empty array
  while item = in_file.gets
    item = item.gsub(',','_')
    item = item.gsub("\n","")
    files = Array.new
    files.push("empty");
    filecodes[item]= files
  end
  in_file.close

  # The searching portion: looks at each file/location, then compares it
  # to all the targets. If there is a match, prints a message and adds
  # that file to the related array.
  require "Find"
  require 'ftools'
  Find.find("(path)") do |file|
    if !(FileTest.file?(file)) || !(File.basename(file) =~ /^[\d\w\S]+_\d{3}/)
      next
    else
      filecodes.each_key do |target|
        if(file =~ /#{target}/)
          puts "found " + target + " at " + file
          $stdout.flush
          fail = 0
          for i in 0..filecodes[target].size-1 do
            if(filecodes[target][i] != "empty" &&
               File.basename(file) == File.basename(filecodes[target][i]))
              fail = 1
              break
            end
          end
          if fail == 0
            if filecodes[target][0] == "empty"
              filecodes[target][0] = file
            else
              filecodes[target].push(file)
            end
          end
        end
      end
    end
  end

  # After the whole directory has been searched, prints each key and all
  # values found for it to a file called Ruby_results.csv.
  target_file = File.open("(path)","w")
  filecodes.each_key do |target|
    results = filecodes[target]
    if results[0] == "empty"
      results[0] = "NO FILES FOUND"
    end
    puts target
    for i in 0..(results.size-1)
      target_file.puts target + "," + results[i]
    end
  end
  target_file.close
  end_time = Time.now
  puts "Started: " + start_time.to_s
  puts "Ended: " + end_time.to_s
end
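
One way to check whether the per-code matching loop, rather than the directory walk, accounts for the growth in the timings above is to time the matching on its own. The following is only a sketch under stated assumptions: the codes and file names are synthetic, the three method names are made up for illustration, and it does not establish what caused the numbers reported in the post. It compares the interpolated regex used in the posted Ruby with a regex compiled once per code and with a plain substring search.

```ruby
require 'benchmark'

# Synthetic stand-ins for the real inputs; every name below is made up.
CODES = (1..1500).map { |n| format('ITEM%04d_%03d', n, n % 1000) }
FILES = (1..100).map { |n| format('ITEM%04d_%03d_scan.tif', n * 7, (n * 7) % 1000) }

# Builds a new Regexp from the code on every comparison, as /#{target}/ does.
def count_interpolated(files, codes)
  files.sum { |file| codes.count { |code| file =~ /#{code}/ } }
end

# Compiles one Regexp per code up front and reuses it for every file.
def count_precompiled(files, codes)
  patterns = codes.map { |code| Regexp.new(Regexp.escape(code)) }
  files.sum { |file| patterns.count { |pattern| pattern.match?(file) } }
end

# Plain substring search, no regex at all.
def count_substring(files, codes)
  files.sum { |file| codes.count { |code| file.include?(code) } }
end

# Time each strategy on the same data and report the number of hits found.
%i[count_interpolated count_precompiled count_substring].each do |name|
  hits = nil
  seconds = Benchmark.realtime { hits = send(name, FILES, CODES) }
  puts format('%-20s %4d hits  %.3f s', name, hits, seconds)
end
```

All three strategies should report the same hit count on this data; only the elapsed times are of interest, and they will vary by machine and Ruby version.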
 
John W. Krahn

Kaldrenon said:
> Hi all (this is going to comp.lang.ruby and comp.lang.perl.misc),
>
> The other day I wrote a basic program in Perl,

Did you write it in basic or in Perl? :)

> and the following day I
> rewrote it in Ruby. I'm curious about the differences in runtime of
> the two versions, though.
>
> Let me start by describing the program (I'll append full code for both
> to the end): it reads in a list of alphanumeric codes from file
> (format is [\w\d\S]+_\d{3},

The character class \d is a subset of \w and they are both a subset of \S so
your expression could be simplified to:

    \S+_\d{3}

> but they're separated by a comma in the
> file), then creates a hash with those codes as keys and empty arrays
> as values. After the hash is built, the program traverses through a
> given directory and its subdirectories (using File::Find in Perl and
> Find.find in Ruby) and checks each file against the hash of codes
> (with a few regexps and conditions to prevent lots of unnecessary
> looping), adding it to the array for a code if the code is found in
> the filename. Finally, it writes the contents of the hash to a .csv
> file in the format CODE,PATH for each match.
>
> Now, if it were the case that Ruby or Perl were simply -slower- than
> the other, I wouldn't be bothering you folks. But here's where it gets
> a little unusual: the number of elements in the code list has a
> noticeable impact on the run time of the Ruby version, but far less on
> the Perl version. I ran each one a few times with code lists of
> various sizes, and they both print start/stop timestamps at the end,
> so I collected the data:
>
> Entries | Ruby (s) | Perl (s)
>       4 |      153 |      291
>      64 |      133 |      258
>     256 |      222 |      253
>     512 |      327 |      248
>    1024 |      562 |      353
>    1500 |      683 |      363
>
> Ruby runs faster for low numbers of entries, as you can see, but once
> you get up to 1500, Ruby's time has more than tripled while Perl's
> time has gone up about a fifth.
>
> I've looked over the code for both versions several times, and I don't
> see any significant differences. The only important feature the Ruby
> version lacks is the sort() before writing the file.
>
> I'd really appreciate any insight into why Ruby's runtime grows so
> readily and Perl's does not.

Did you compare the output of the Perl and Ruby versions to see if there were
any differences?

> Code of both versions follows.
>
> Thanks,
> Andrew Fallows
>
> use File::Find;
> use strict;
> use warnings;
> my $code;
> my $type;
> my %filecodes = ();
> my $start_time = "Started: " . localtime();
> $| = 1; #Enables flush on print.
> $\ = "\n"; #Automatic newlines on print
> open(ITEM_LIST, "(path)") or die "Error";

You should include the $! (or $^E) variable in the error message so you know
why it failed.

> # This loop builds a hash whose keys are the codes/types from file
> # and whose values are references to empty arrays
> while(my $item = <ITEM_LIST>)
> {
>     $item =~ s/,/_/;
>     $item =~ s/\n//g;

That is usually done with chomp:

    chomp $item;

>     print $item;
>     my @files = ();
>     $filecodes{$item} = \@files;

You don't need to create an array, just assign an anonymous array:

    $filecodes{$item} = [];

> }
> print "Hash built";
>
> # Uses File::Find to iterate over the entire subdirectory
> find(\&file_seek, "(path)");

> # The searching portion: gets each location from File::Find, then compares it
> # to all the targets. If there is a match, prints a message and adds that file
> # to the related array.
> sub file_seek
> {
>     my $file = $_;
>     # Kicks out if the file in question is not of the necessary format
>     if(!(-f $file) || !($file =~ /^[\d\w\S]+_\d{3}/)){ return; }

Using $_ instead of the copy in $file:

    return if !-f || !/^\S+_\d{3}/;

>     foreach my $target (keys(%filecodes))
>     {
>         # If the file name contains the code sought
>         if($file =~ /$target/)

Because $target may contain some regular expression meta-characters you should
quotemeta it:

    if ( $file =~ /\Q$target/ )

Or use the index function:

    if ( 0 <= index $file, $target )

>         {
>             print "found $file in $File::Find::dir";
>
>             # Jumps out if the list for this code already contains this file.
>             for (0..@{$filecodes{$target}})

You have an off-by-one error:

    for (0..$#{$filecodes{$target}})

>             {
>                 if(defined(${$filecodes{$target}}[$_])
>                     && $File::Find::name eq ${$filecodes{$target}}[$_]) {return; }

${$filecodes{$target}}[$_] can be written more simply as $filecodes{$target}[$_].

But you don't really need to use an array index:

    for ( @{$filecodes{$target}} )
    {
        return if defined() && $File::Find::name eq $_;

(Or you could use a Hash of Hashes.)

>             }
>             push(@{$filecodes{$target}}, $File::Find::name);
>         }
>     }
> }

> # After the whole directory has been searched, prints each key and all
> # values found for it.
> open(RESULTS, "> (path)") or die "Error 2";

You should include the $! (or $^E) variable in the error message so you know
why it failed.

> foreach my $target ( sort(keys( %filecodes )))
> {
>     my @results = @{$filecodes{$target}};

Do you really need to make a copy of the array?

>     if(@results == 0) { push(@results, "NO FILES FOUND") }

If the array is empty you can just assign to it:

    @results = 'NO FILES FOUND' unless @results;

>     print $target;
>     foreach (@results)
>     {
>         print RESULTS "$target,$_";
>         print "\t$_";
>     }
> }
> close RESULTS;
> print $start_time;
> print "Ended: " . localtime();



John
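
The quotemeta and index advice above carries over to the Ruby version, which also interpolates each code into a regex. As a rough illustration only: Regexp.escape and String#include? are the standard Ruby counterparts, while the code, the two file names, and the three helper methods below are hypothetical and chosen so that the code contains regex metacharacters.

```ruby
# Hypothetical code containing "." and "+", both regex metacharacters.
TARGET    = 'A.B+_001'
REAL_FILE = 'A.B+_001_front.jpg'   # really contains the code
LOOKALIKE = 'AxBBB_001_front.jpg'  # does not contain the code

# Raw interpolation, as in the posted Ruby: "." and "+" act as regex syntax.
def raw_match?(name, code)
  name.match?(/#{code}/)
end

# Regexp.escape is Ruby's counterpart to \Q / quotemeta.
def escaped_match?(name, code)
  name.match?(/#{Regexp.escape(code)}/)
end

# String#include? is the counterpart to the index test.
def substring_match?(name, code)
  name.include?(code)
end

# Print how each strategy classifies the two file names.
[REAL_FILE, LOOKALIKE].each do |name|
  puts "#{name}: raw=#{raw_match?(name, TARGET)} " \
       "escaped=#{escaped_match?(name, TARGET)} " \
       "substring=#{substring_match?(name, TARGET)}"
end
```

With these made-up inputs the raw regex misses the file that really contains the code and accepts the lookalike, while the escaped regex and the substring test both treat the code literally.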
 
Kaldrenon

Thanks for the reply, John. There are a number of good tips in your
reply for making my code more "Perl"-y. I don't think many (if any)
will actually change the way the program runs, though, will they? A
lot of the things I did work, but are styled more like Java, the
language I use most. For example, I know I can just use $_ in sub
file_seek, but I prefer to give my vars names that make more sense at
a glance. But I'll keep all of your advice in mind.

Thanks again,
Andrew
 
Emmanuel Oga

Kaldrenon said:
> Hi all (this is going to comp.lang.ruby and comp.lang.perl.misc),
>
> The other day I wrote a basic program in Perl, and the following day I
> rewrote it in Ruby. I'm curious about the differences in runtime of
> the two versions, though.

It would help to have a view into the input file and to know what your
program should do. I'm sure there has to be another way to code your
problem, and I'm pretty sure that constructs such as the following can and
should be avoided:

> Find.find("(path)") do |file|
>   if !(FileTest.file?(file)) || !(File.basename(file) =~ /^[\d\w\S]+_\d{3}/)
>     next
>   else
>     filecodes.each_key do |target|
>       if(file =~ /#{target}/)
>         puts "found " + target + " at " + file
>         $stdout.flush
>         fail = 0
>         for i in 0..filecodes[target].size-1 do
>           if(filecodes[target][i] != "empty" &&
>              File.basename(file) == File.basename(filecodes[target][i]))
>             fail = 1
>             break
>           end
>         end
>         if fail == 0
>           if filecodes[target][0] == "empty"
>             filecodes[target][0] = file
>           else
>             filecodes[target].push(file)
>           end
>         end
>       end
>     end
>   end
> end

This is both difficult to read and error prone.
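
For what it's worth, here is one possible flatter shape for that block. It is a sketch and not the original poster's method: the helper names record_matches and csv_lines are hypothetical, the codes and paths are made up, and it assumes the code is matched as a literal substring of the base name and that duplicates are detected on the full path (the posted Ruby compares base names instead). A Set per code stands in for the "empty" placeholder and the manual duplicate scan.

```ruby
require 'set'

# Hypothetical helper: files a path under every code found in its base name.
# `matches` maps code => Set of paths. A Set silently ignores repeats, which
# stands in for the manual duplicate scan and the "empty" placeholder.
def record_matches(matches, path)
  name = File.basename(path)
  return unless name.match?(/\A\S+_\d{3}/)

  matches.each do |code, paths|
    paths << path if name.include?(code)
  end
end

# Hypothetical helper: one "CODE,PATH" line per match, sorted by code.
def csv_lines(matches)
  matches.sort_by { |code, _paths| code }.flat_map do |code, paths|
    found = paths.empty? ? ['NO FILES FOUND'] : paths.to_a
    found.map { |path| "#{code},#{path}" }
  end
end

# Made-up codes and paths, used in place of the input file and Find.find.
matches = %w[XYZ_042 ABC_001 DEF_007].each_with_object({}) do |code, table|
  table[code] = Set.new
end

['/photos/ABC_001_front.jpg',
 '/photos/ABC_001_front.jpg', # seen twice, stored once
 '/photos/sub/XYZ_042.png',
 '/photos/notes.txt'].each { |path| record_matches(matches, path) }

puts csv_lines(matches)
```

In the real program the block passed to Find.find would call record_matches once per regular file, and the lines from csv_lines would be written to the results file instead of printed.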
 
Michele Dondi

Kaldrenon said:
> reply for making my code more "Perl"-y. I don't think many (if any)
> will actually change the way the program runs, though, will they? A

Just try.

> lot of the things I did work, but are styled more like Java, the
> language I use most. For example, I know I can just use $_ in sub
> file_seek, but I prefer to give my vars names that make more sense at
> a glance. But I'll keep all of your advice in mind.

$_ is a pronoun and it makes sense in short enough phrases. If you
have a C<for> loop with a two- or three-line block (or even a C<for>
modifier) then use it. If it's 100 lines long (probably not a good
idea on its own) then use an explicit name.


Michele
 
