autochomp?

Dave Thomas

What's the fastest way for someone to get the book?

Register for RubyConf by September 1st... Oops. :)

The second fastest will be to order from us: Amazon will get the books
at best a couple of weeks after we're shipping, bookstores a week or so
later (I'm guessing--bookstores are a bit of a crapshoot, as they tend
to order at fixed intervals). Direct sales from O'Reilly are somewhere
between us and Amazon in terms of time.

What's the best deal for you? Is it better if people order directly
from your site?

The best deal for us is if you order directly from us: the middle tiers
in the book business extract a seriously large percentage on the way
through. Thanks for asking: we appreciate it.


I'll announce the pre-order information just as soon as we can get it
set up (there are lots of rules about accepting pre-order money by
credit card).


Cheers

Dave
 
Kristof Bastiaensen

Hi,

Kristof Bastiaensen said:
Hi,

[quoted text muted]

Not AFAIK. You can do

content = IO.readlines(file).each {|l| l.chomp!}

or (more efficient)

content = File.open(file) {|io| io.inject([]) {|ar,line| line.chomp!;
ar << line} }

I think in this case the first is more efficient than the second. Since
chomp! modifies the object in place, no object gets created.

??? Please read again. The first is less efficient because it iterates
through the lines of the file twice, and both use in-place modification
(#chomp!).

Kind regards

robert

It would seem logical, but benchmarking shows otherwise:

#---- begin filetest.rb ----
require 'benchmark'

file = "testfile.txt"
Benchmark.bm(7) do |x|
  x.report("readlines:") do
    IO.readlines(file).each {|l| l.chomp!}
  end

  x.report("inject:") do
    File.open(file) {|io| io.inject([]) {
      |ar,line| line.chomp!; ar << line}}
  end

  x.report("foreach:") do
    IO.foreach(file){|line| (lines||=[]) << line.chomp!}
  end
end
#---- end filetest.rb -----

$ seq 10000 > testfile.txt
$ ruby filetest.rb
readlines: 0.050000 0.010000 0.060000 ( 0.085483)
inject: 0.130000 0.010000 0.140000 ( 0.172144)
foreach: 0.080000 0.000000 0.080000 ( 0.112690)

Regards,
KB
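
For reference, here is a minimal sketch that checks the two variants quoted above against each other on a small temporary file. The method names and the sample text are made up for illustration. The third method assumes Ruby 2.4 or later, where readlines accepts a `chomp: true` keyword; that option did not exist when this thread was written, so treat it as a later convenience rather than part of the original discussion.

```ruby
require 'tempfile'

# Variant 1 from the thread: read every line, then chomp each one in place.
def chomped_via_readlines(path)
  IO.readlines(path).each { |l| l.chomp! }
end

# Variant 2 from the thread: chomp each line while accumulating the array.
def chomped_via_inject(path)
  File.open(path) { |io| io.inject([]) { |ar, line| line.chomp!; ar << line } }
end

# Assumption: Ruby 2.4 or later, where readlines accepts `chomp: true`.
def chomped_via_keyword(path)
  File.readlines(path, chomp: true)
end

# Hypothetical sample data, written to a throwaway file.
Tempfile.create('chomp_demo') do |f|
  f.write("one\ntwo\nthree\n")
  f.flush
  results = [chomped_via_readlines(f.path),
             chomped_via_inject(f.path),
             chomped_via_keyword(f.path)]
  p results.first
  puts results.uniq.size == 1 ? 'all variants agree' : 'variants differ'
end
```

This only confirms that the variants return the same lines; it says nothing about which is faster.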
 
Robert Klemme

Dave Thomas said:
Why add just a single method when we can add an ENTIRE NEW FEATURE SET
and complicate the language at the same time?

Ah well, yes of course! Ruby is just too simple.
;-)

robert
 
Robert Klemme

That's really interesting. Did you also test with a real world file (a file
with longer lines presumably)? Interestingly enough, changing the block of
#inject does make it nearly as fast as readlines:

require 'benchmark'

file = "testfile.txt"
Benchmark.bm(7) do |x|
  x.report("readlines:") do
    IO.readlines(file).each {|l| l.chomp!}
  end

  x.report("inject:") do
    File.open(file) {|io| io.inject([]) {
      |ar,line| line.chomp!; ar << line}}
  end

  x.report("inject2:") do
    File.open(file) {|io| io.inject([]) {
      |ar,line| ar << line.chomp!}}
  end

  x.report("foreach:") do
    IO.foreach(file){|line| (lines||=[]) << line.chomp!}
  end

  x.report("foreach2:") do
    lines = []
    IO.foreach(file){|line| lines << line.chomp!}
  end
end



$ seq 10000 > testfile.txt
$ ruby filetest.rb
user system total real
readlines: 0.078000 0.000000 0.078000 ( 0.073000)
inject: 0.094000 0.015000 0.109000 ( 0.097000)
inject2: 0.047000 0.016000 0.063000 ( 0.072000)
foreach: 0.109000 0.000000 0.109000 ( 0.100000)
foreach2: 0.047000 0.000000 0.047000 ( 0.048000)

And with more lines:

$ seq 1000000 >| testfile.txt
$ ruby filetest.rb
user system total real
readlines: 5.625000 0.250000 5.875000 ( 5.907000)
inject: 13.985000 0.719000 14.704000 ( 20.221000)
inject2: 13.797000 0.046000 13.843000 ( 14.308000)
foreach: 13.000000 0.032000 13.032000 ( 13.408000)
foreach2: 7.281000 0.031000 7.312000 ( 7.455000)


Kind regards

robert
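
A side note on the "foreach" and "foreach2" variants above: in "foreach" the variable `lines` is first assigned inside the block, so it is local to each block invocation and a fresh array is created for every line. In "foreach2" the array is declared outside the block and shared. The sketch below illustrates that scoping difference with made-up method names; whether it accounts for the timing gap is only a guess, not something measured here.

```ruby
# `acc` is first assigned inside the block, so it is local to each invocation
# and starts out as nil every time the block runs.
def sizes_with_block_local(items)
  sizes = []
  items.each do |item|
    (acc ||= []) << item
    sizes << acc.size
  end
  sizes
end

# Declaring the array before the block lets every iteration share it.
def sizes_with_outer_variable(items)
  sizes = []
  acc = []
  items.each do |item|
    acc << item
    sizes << acc.size
  end
  sizes
end

p sizes_with_block_local(%w[a b c])     # the array never grows past one element
p sizes_with_outer_variable(%w[a b c])  # the array grows with each item
```

In other words, the "foreach" variant never actually collects the lines, so it is not doing the same work as the others.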
 
gabriele renzi

Robert Klemme wrote:
That's really interesting. Did you also test with a real world file (a file
with longer lines presumably)? Interestingly enough, changing the block of
#inject does make it nearly as fast as readlines:

Did you (both) also consider that the underlying OS's caching strategy
may have some effect on these results?
 
Robert Klemme

gabriele renzi said:

Did you (both) also consider that the underlying OS's caching strategy
may have some effect on these results?

Good point. Now:

require 'benchmark'
require 'stringio'

st = StringIO.new

100000.times { st << "foo " * rand(10) << "\n" }

Benchmark.bm(7) do |x|
  st.seek 0

  x.report("readlines:") do
    st.readlines.each {|l| l.chomp!}
  end

  st.seek 0

  x.report("inject:") do
    st.inject([]) {|ar,line| line.chomp!; ar << line}
  end

  st.seek 0

  x.report("inject2:") do
    st.inject([]) {|ar,line| ar << line.chomp!}
  end

  st.seek 0

  x.report("foreach:") do
    st.each{|line| (lines||=[]) << line.chomp!}
  end

  st.seek 0

  x.report("foreach2:") do
    lines = []
    st.each{|line| lines << line.chomp!}
  end
end

Yields on my machine:

$ ruby filetest2.rb
user system total real
readlines: 0.031000 0.000000 0.031000 ( 0.032000)
inject: 0.078000 0.000000 0.078000 ( 0.077000)
inject2: 0.063000 0.000000 0.063000 ( 0.051000)
foreach: 0.062000 0.000000 0.062000 ( 0.073000)
foreach2: 0.094000 0.000000 0.094000 ( 0.088000)

$ ruby filetest2.rb
user system total real
readlines: 0.031000 0.000000 0.031000 ( 0.032000)
inject: 0.078000 0.000000 0.078000 ( 0.079000)
inject2: 0.047000 0.000000 0.047000 ( 0.051000)
foreach: 0.078000 0.000000 0.078000 ( 0.074000)
foreach2: 0.078000 0.000000 0.078000 ( 0.084000)

$ ruby filetest2.rb
user system total real
readlines: 0.281000 0.016000 0.297000 ( 0.303000)
inject: 0.844000 0.015000 0.859000 ( 0.855000)
inject2: 1.094000 0.000000 1.094000 ( 1.174000)
foreach: 0.906000 0.000000 0.906000 ( 0.905000)
foreach2: 0.375000 0.000000 0.375000 ( 0.393000)

$ ruby filetest2.rb
user system total real
readlines: 0.344000 0.032000 0.376000 ( 0.417000)
inject: 0.875000 0.000000 0.875000 ( 0.974000)
inject2: 0.891000 0.031000 0.922000 ( 0.943000)
foreach: 0.906000 0.016000 0.922000 ( 0.915000)
foreach2: 0.391000 0.000000 0.391000 ( 0.387000)

$ ruby filetest2.rb
user system total real
readlines: 0.312000 0.000000 0.312000 ( 0.304000)
inject: 0.860000 0.000000 0.860000 ( 0.858000)
inject2: 0.921000 0.015000 0.936000 ( 0.943000)
foreach: 0.922000 0.000000 0.922000 ( 0.916000)
foreach2: 0.391000 0.000000 0.391000 ( 0.384000)

Hm...

robert
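
Moving the data into a StringIO, as above, takes the file system out of the picture. Another thing that might be worth trying is `Benchmark.bmbm`, which runs each report once as a rehearsal and then again for the real timings, calling `GC.start` before each real run. The sketch below is only an illustration under assumptions: the helper names, the line count, and the deterministic line lengths are made up, and only two of the variants are included.

```ruby
require 'benchmark'
require 'stringio'

# Build an in-memory input with lines of varying length (hypothetical data).
def build_input(line_count)
  st = StringIO.new
  line_count.times { |i| st << "foo " * (i % 10) << "\n" }
  st
end

# Run two of the variants with a rehearsal pass before the measured pass.
# Each report rewinds the StringIO itself so it always reads from the start.
def run_chomp_benchmark(st)
  Benchmark.bmbm(10) do |x|
    x.report('readlines:') do
      st.rewind
      st.readlines.each { |l| l.chomp! }
    end

    x.report('each:') do
      st.rewind
      lines = []
      st.each { |l| l.chomp!; lines << l }
    end
  end
end

results = run_chomp_benchmark(build_input(10_000))
puts "measured #{results.size} variants"
```

The rehearsal pass does not control the OS cache; it only helps to stabilise the Ruby runtime before the measured pass, so results can still vary between machines and runs.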
 
George Ogata

Robert Klemme said:

That's really interesting. Did you also test with a real world file (a file
with longer lines presumably)? Interestingly enough, changing the block of
#inject does make it nearly as fast as readlines:

What's the interesting bit?

"readlines" is probably quicker than "inject" since there are fewer
method calls. In "inject" and "foreach" (and "inject2" and "foreach2"
in your later post), there are two method calls in the block: chomp!
and <<. In "readlines" there's only 1. All that pushing and popping
of ruby stack frames, method name lookup... it all adds up, more
so, evidently, than walking the array an extra time.

(Also note that "foreach" won't work right if chomp! returns nil, which
it will on the last line if there's no newline @ EOF.)
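
To make the nil case concrete, here is a small sketch (the method names and sample text are made up for illustration). `chomp!` returns nil when it has nothing to remove, so appending its return value puts a nil into the array for a final line that has no trailing newline. Chomping first and then appending the line itself avoids that.

```ruby
# Appends the return value of chomp!, which is nil when nothing was removed.
def collect_chomped_unsafe(text)
  lines = []
  text.each_line { |line| lines << line.chomp! }
  lines
end

# Chomps in place first, then appends the line itself.
def collect_chomped_safe(text)
  lines = []
  text.each_line { |line| line.chomp!; lines << line }
  lines
end

sample = "first\nlast"  # no newline at the end of the text
p collect_chomped_unsafe(sample)  # the last entry is nil
p collect_chomped_safe(sample)    # every entry is a string
```

The non-destructive `chomp` would also work here, at the cost of creating a new string per line.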
 
Dave Thomas

I don't think it's going to be released electronically, at
least at first.

We'll be selling the PDF version. We're currently planning on releasing
it when the print version becomes available.

Cheers

Dave
 
Robert Klemme

George Ogata said:
What's the interesting bit?

The impact of the two statements vs. one statement in the block, plus that,
once again, intuition is proven wrong. :)

George Ogata said:
"readlines" is probably quicker than "inject" since there are fewer
method calls. In "inject" and "foreach" (and "inject2" and "foreach2"
in your later post), there are two method calls in the block: chomp!
and <<. In "readlines" there's only 1. All that pushing and popping
of ruby stack frames, method name lookup... it all adds up, more
so, evidently, than walking the array an extra time.

(Also note that "foreach" won't work right if chomp! returns nil, which
it will on the last line if there's no newline @ EOF.)

Yes, that was exactly the reason why I first used "line.chomp!; ar << line".
But I wanted to make it comparable so I added the other variant.

Kind regards

robert
 
