Help me understand why the Ruby block is slower than without

ts

J> Something is fishy there, for it works just fine on my own Mac:

Try it with

J> Neo:~/Desktop$ cat wordlist
J> one
J> two
J> three
J> 0123456789

01234567890123456789

J> five
J> 0123456789
J> Neo:~/Desktop$ cat tens.rb


Guy Decoux
 
Mike Stok

James:
This code doesn't work on my Mac. I do have a version that uses the
file block and each/foreach above, but I'm suspecting that when the
string becomes an array after the split something's breaking down as I
get words of all sizes out???

Doesn't James Gray's code print out words which contain exactly 11
different characters? E.g.

abbreviations - 13 characters + \n, but because it wasn't checked for
size before splitting this boils down to 10 different letters (11
different characters once the \n is counted).

irb(main):001:0> s = 'abbreviations'
=> "abbreviations"
irb(main):002:0> s.split('').uniq
=> ["a", "b", "r", "e", "v", "i", "t", "o", "n", "s"]
irb(main):003:0> s.split('').uniq.size
=> 10


Interesting. I crudely benchmarked this (using time on my mac):

#!/usr/bin/env ruby

File.foreach("K6wordlist.txt") do |word|
  # puts word if word.size==11 && word.split(//).uniq.size == 11
  puts word if word.length == 11 and word.chomp.split(//).uniq.size == 10
  # puts word if word.length == 11 and not word =~ /(.).*\1/
end

and then ran each of the three, sending output to /dev/null (after
checking that they all worked the same on my test file). In order:

real 0m0.347s
user 0m0.294s
sys 0m0.017s

real 0m0.334s
user 0m0.288s
sys 0m0.018s

real 0m0.177s
user 0m0.137s
sys 0m0.015s

There may be interesting behaviour if the last line in the file
doesn't have a trailing \n, so I would probably go for something more like

File.foreach("K6wordlist.txt") do |word|
  word.chomp!
  puts word if word.length == 10 and not word =~ /(.).*\1/
end

(timing intentionally omitted :)
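
To illustrate the trailing-newline concern, here is a minimal sketch. The method names `raw_check` and `chomped_check` are made up for this example, and the sample word comes from elsewhere in the thread. It shows that a check which counts the newline rejects a valid last word when the file does not end in \n, while the chomped check does not care.

```ruby
# Counts the newline as part of the length, as in the length == 11 variants.
def raw_check(line)
  line.length == 11 && line !~ /(.).*\1/
end

# Strips the newline first, as in the chomp! version above.
def chomped_check(line)
  word = line.chomp
  word.length == 10 && word !~ /(.).*\1/
end

# The same word as it would appear mid-file and as an unterminated last line.
["profligate\n", "profligate"].each do |line|
  puts "#{line.inspect}: raw=#{raw_check(line)} chomped=#{chomped_check(line)}"
end
```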

Mike

--

Mike Stok
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.
 
James Edward Gray II

J> Something is fishy there, for it works just fine on my own Mac:

Try it with

J> Neo:~/Desktop$ cat wordlist
J> one
J> two
J> three
J> 0123456789

01234567890123456789

Ah, yes, duh. Thanks Guy.

Obviously, you do need a check for word size in there, as others have
used.

My point was actually to show that foreach() combines the open, the
read loop, and the close, though.
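
As a rough sketch of that point (the helper names `read_with_open` and `read_with_foreach` and the three-line sample file are invented for illustration), the two methods below should return the same lines. The first spells out the open, the read loop, and the implicit close; the second lets File.foreach do all three.

```ruby
require "tempfile"

# Block form of File.open: the file is closed when the block exits.
def read_with_open(path)
  lines = []
  File.open(path) do |f|
    while (line = f.gets)
      lines << line
    end
  end
  lines
end

# File.foreach opens the file, yields each line, and closes it afterwards.
def read_with_foreach(path)
  lines = []
  File.foreach(path) { |line| lines << line }
  lines
end

# Hypothetical three-line word list, written to a temporary file.
Tempfile.create("wordlist") do |tmp|
  tmp.write("one\ntwo\nthree\n")
  tmp.flush
  p read_with_open(tmp.path) == read_with_foreach(tmp.path)
end
```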

James Edward Gray II
 
Stephen Waits

Well, I'm pretty darn sure you are in the minority on that one: ;)

Well, in this case, being in the majority doesn't necessarily make
you right. Like many things, I think we've got several shades of
gray here.. err... Gray? :) I'm all for not prematurely
optimizing. But in this case, Alan is attempting to better
understand Ruby's inner workings, which is a perfectly fine example of
playing with performance.

Additionally, the "no premature optimization ideal" is often taken a
little too far. I intentionally call it an "ideal". I work on video
games. A good portion of our job is optimization. If we didn't do
*some* premature optimization, we'd be in bad shape.

--Steve
 
Gary Wright

Additionally, the "no premature optimization ideal" is often taken
a little too far. I intentionally call it an "ideal". I work on
video games. A good portion of our job is optimization. If we
didn't do *some* premature optimization, we'd be in bad shape.

It isn't really premature optimization if you are dealing with a
known problem domain and you already have a reasonable
sense of the performance issues that you will face. It is
nonsensical to throw away the knowledge you've gained from past
experience in the matter.

But when crafting new software, where you don't have any
particular knowledge of the performance issues, it
makes more sense to get something working correctly and in
a timely manner than to make premature assumptions about the
bottlenecks.
 
Alan Burch

James said:
Well, I'm pretty darn sure you are in the minority on that one: ;)

http://www.google.com/search?q="premature+optimization"

James Edward Gray II
James:
I'm going to make a leap of faith here and guess that we're in agreement
on this one--it's just a difference in where we are in understanding the
language. I'm just learning it and need to run the profiler, debugger,
take timing measurements, and read lots of examples to fully understand
it still. I wouldn't peddle my (lack of) Ruby skills to any client at
this time, but it's by taking these steps that I will become a good Ruby
software developer. Others may be able to make the transition from a
developer who can make the code work to one who is actually good at it
(accurate, maintainable, resource appropriate code done quickly) without
taking these steps, but I cannot.

From your google search I have:
http://www.cookcomputing.com/blog/archives/000084.html

Premature Optimization....suggests the famous quote originating from
Tony Hoare and restated by Donald Knuth: "Premature optimization is the
root of all evil". I've always thought this quote has all too often led
software designers into serious mistakes because it has been applied to
a different problem domain to what was intended.

The full version of the quote is "We should forget about small
efficiencies, say about 97% of the time: premature optimization is the
root of all evil." and I agree with this. It's usually not worth spending
a lot of time micro-optimizing code before it's obvious where the
performance bottlenecks are. But, conversely, when designing software at
a system level, performance issues should always be considered from the
beginning. A good software developer will do this automatically, having
developed a feel for where performance issues will cause problems. An
inexperienced developer will not bother, misguidedly believing that a
bit of fine tuning at a later stage will fix any problems.

Knowing the language well enough will let me, as an experienced
software developer, automatically build the best code, while the
inexperienced developer will continue to write code that gives people
like me a well above average income :)

Take care,
Alan
 
Eric Young

Benjohn said:
!? :) How on earth does that work? Every time I think I've sort of got
the hang of regexp, they spring something new on me.

I was also going to ask why everyone was doing "split( // )" instead of
"split( '' )"?

- oooh, coffee's ready...

Cheers,
Benjohn


I was going to ask why everyone is not doing
w.unpack("C*").uniq! == nil
which seems faster than split(//) but slower than /(.).*\1/

3.2s puts l if l !~ /(.).*\1/
3.6s puts l unless l.unpack("C*").uniq!
7.3s puts l if l.split(//).uniq.size == 11

Perhaps I deal with C code too much, but unpack seems a better String ->
array mechanism.
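
A small sketch for comparing the three checks side by side. The method names and the four sample words are assumptions for illustration, the words are taken without their trailing newline, and `unpack("C*")` works on bytes, so this assumes single-byte (ASCII) words. Timings will vary by machine and Ruby version, so none are claimed here.

```ruby
require "benchmark"

# Backreference: true when no character occurs twice.
def no_repeat_regex?(word)
  word !~ /(.).*\1/
end

# unpack gives an array of byte values; uniq! returns nil when nothing was removed.
def no_repeat_unpack?(word)
  word.unpack("C*").uniq!.nil?
end

# split(//) gives an array of one-character strings.
def no_repeat_split?(word)
  word.split(//).uniq.size == word.size
end

# Hypothetical sample; the thread used a dictionary of roughly 100K words.
words = %w[profligate abbreviations background bookkeeper]

Benchmark.bm(8) do |bm|
  bm.report("regex")  { 10_000.times { words.each { |w| no_repeat_regex?(w) } } }
  bm.report("unpack") { 10_000.times { words.each { |w| no_repeat_unpack?(w) } } }
  bm.report("split")  { 10_000.times { words.each { |w| no_repeat_split?(w) } } }
end
```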

eric (rather new to ruby)
 
William James

Alan said:
I just wrote my first Ruby script. I'm an experienced C and perl
programmer, so please, if it looks too much like these languages and not
Ruby, let me know. I've got a 100K word list (Linux dictionary) on my
Mac and am opening it then looking for any words that are exactly 10
letters long with no letters repeating ('profligate\n') == 11 is a
match. After I wrote my first version I did some playing. I first saw
that the array class mixed in enumerable and that I could use the to_a
call from there, but a quick check using -r profile showed that my
original call to split was a much quicker way to convert from a string
to an array. I then tried putting the File.open in a block and found
that this was much slower, even if I subtract out the time for the open,
which I assume is an error in how the profile is counting total time.

Here's the faster version:

f = File.open("./words")
begin
  while f.gets
    if $_.length == 11
      ar = $_.split(//)
      if ar.uniq! == nil
        print "#{ar.to_s}"
      end
    end
  end
rescue EOFError
  f.close
end

And here's the slower block version:

File.open("./words") { |f|
  while f.gets
    if $_.length == 11
      ar = $_.split(//)
      if ar.uniq! == nil
        print "#{ar.to_s}"
      end
    end
  end
}

IO.foreach('words'){|s|puts s if s=~/(?!.*(.).*\1)^.{10}$/}
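
For readers puzzling over that one-liner, here is the same pattern spread out in extended (/x) form with comments. The method name `ten_unique?` is made up for this sketch; the pattern itself is unchanged.

```ruby
def ten_unique?(line)
  pattern = /
    (?!.*(.).*\1)   # negative lookahead: reject if any character occurs twice
    ^.{10}$         # then require exactly ten characters on the line
  /x
  pattern.match?(line)
end

# Sample lines as File.foreach would yield them, newline included.
["profligate\n", "bookkeeper\n", "three\n"].each do |line|
  puts "#{line.chomp}: #{ten_unique?(line)}"
end
```

Because `.` does not match a newline and `$` matches just before one, the pattern can be applied to unchomped lines.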
 
dblack

Hi --

Well, I'm pretty darn sure you are in the minority on that one: ;)

http://www.google.com/search?q="premature+optimization"

But one doesn't want to suppress one's knowledge. When I write a
program, if I happen to know that, say:

puts a

will run faster than:

eval "puts #{97.chr}"

then I can't really be blamed for using that knowledge, just because
the knowledge pertains to speed.

In other words, I don't think that avoiding premature optimization
means that one should never knowingly take speed into account when
choosing what to put in one's code. In fact, I would find it really
difficult to do that, because I wouldn't know how to choose among the
various alternatives available in a way that paid no attention to
execution speed.

This isn't an argument in favor of premature optimization; rather, I'm
suggesting that having and using some rough-cut knowledge of execution
speed (as between, say, split and unpack, or something like that)
isn't premature :) Nor is it optimization; it's really melioration.


David

--
David A. Black
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" chapters now available
from Manning Early Access Program! http://www.manning.com/books/black
 
Robert Klemme

Alan Burch said:
Others:
Any input as to why it runs slower inside the file block? Have I
overlooked something simple?

I'm not sure whether this question has been answered yet. It's probably
slower because you do not close the file handle in your first version (more
precisely, you close it only if there is an error, which doesn't happen the
way you did it).

f = File.open("./words")
begin
...
rescue EOFError
f.close
end

If you wanted it to be equivalent with the block version "f.close" would
have to go to an "ensure" section.
Generally the block form is preferred because it closes the file handle.
Your first version didn't do that.
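
A minimal sketch of the difference, using made-up helper names (`read_with_rescue`, `read_with_ensure`) and a throwaway temp file. Each helper returns the file object so its state can be inspected afterwards.

```ruby
require "tempfile"

# Mirrors the first version in the thread: close only inside `rescue EOFError`.
# IO#gets returns nil at end of file instead of raising, so the rescue never runs.
def read_with_rescue(path)
  f = File.open(path)
  begin
    while f.gets; end
  rescue EOFError
    f.close
  end
  f
end

# Matches what the block form does: `ensure` runs whether or not an error occurred.
def read_with_ensure(path)
  f = File.open(path)
  begin
    while f.gets; end
  ensure
    f.close
  end
  f
end

Tempfile.create("words") do |tmp|
  tmp.write("one\ntwo\n")
  tmp.flush
  leaked = read_with_rescue(tmp.path)
  puts leaked.closed?                       # false: the handle was never closed
  leaked.close
  puts read_with_ensure(tmp.path).closed?   # true
end
```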

This is probably what I'd do

IO.foreach("wordlist") do |line|
  line.scan /\b\w{11}\b/ do |word|
    puts word unless /(.).*\1/ =~ word
  end
end

The first regexp finds all words with length 11 and the second excludes all
words that contain repeating characters. HTH

Kind regards

robert
 
Alan Burch

This isn't an argument in favor of premature optimization; rather, I'm
suggesting that having and using some rough-cut knowledge of execution
speed (as between, say, split and unpack, or something like that)
isn't premature :) Nor is it optimization; it's really melioration.


David
Thanks David and others who stated what I wanted to say better than I
did. Again, I don't think James and I disagree. I concede that
premature optimization is not a good thing, but that's not what I was
trying to do here. I'm trying to understand Ruby to the level that I
understand C. To me that means I know exactly why I use every call,
every construct. When I'm able to do this, I'll know Ruby the way I
want to and I'll be able to use Ruby to accomplish non-trivial tasks.
Knowing to use a faster or more easily understood construct at the time
of coding is what one would expect any experienced programmer to do.
Trying to optimize beyond that from the beginning is silly and
wrong--just as James pointed out.

Yes, Robert, I assumed that gets threw an EOFError when it found
EOF; I just haven't read and understood all I need to yet. I really
appreciate all the input on this thread. It's proved to me that the
Ruby community is everything good that is being said about it on the
web.

That said, using the block form is trivially slower, according to the
time calls that I'm making on my Mac, no matter which solution. I'm not
really concerned about that, and agree with Robert's statement above
about the block being preferred. To not use the block form would be a
prime example of premature optimization.

Thanks again for all the input,
Alan
 
Randy Kramer

This isn't an argument in favor of premature optimization; rather, I'm
suggesting that having and using some rough-cut knowledge of execution
speed (as between, say, split and unpack, or something like that)
isn't premature :) Nor is it optimization; it's really melioration.

Thank you!

Randy Kramer
 
semmons99

File.open( "./words" ).readlines.each do |line|

print line if line.length == 11 and line.split.uniq

end


Testing length 11 first makes it so that we don't split the string
unless needed.
 
James Edward Gray II

File.open( "./words" ).readlines.each do |line|

print line if line.length == 11 and line.split.uniq

end

That can be shortened to:

File.readlines("words").each do |line|
# ...
end

I think that's more Ruby-like.

Also, wouldn't line.split.uniq always return a true value?
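
A quick sketch of why: split with no argument splits on whitespace, so a one-word line becomes a one-element array, and uniq always returns an array, which is truthy even when empty. The method `ten_unique_letters?` below is a hypothetical fix for illustration, not code from the thread.

```ruby
# Ten letters plus newline, but with repeated letters.
line = "bookkeeper\n"
p line.split        # one-element array holding the whole word
p line.split.uniq   # still an array, so the condition is always truthy

# Hypothetical corrected check: split into characters and compare counts.
def ten_unique_letters?(line)
  chars = line.chomp.split(//)
  chars.size == 10 && chars.uniq.size == 10
end

p ten_unique_letters?("bookkeeper\n")
p ten_unique_letters?("profligate\n")
```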

James Edward Gray II
 
