Notmy Realname
I'm wondering how efficient Ruby's IO#read is, and specifically whether
there are libraries I could use to speed up disk reads.
To see what I mean, here's some test code that reads through an
11-megabyte file. It calls IO#read repeatedly with a fixed byte count
(set on the command line) until it reaches the end of the file, and
times the whole pass.
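#!/usr/bin/env ruby
# readspeed.rb: time one pass over a file using IO#read(buf_size)
buf_size = ARGV[0].to_i
# IO#read(0) returns "" rather than nil, so a size of 0 would loop forever
abort "usage: ruby readspeed.rb BYTES   (BYTES must be > 0)" if buf_size <= 0
File.open("some.txt", "rb") do |fd|
  start = Time.now
  # IO#read returns nil at end of file, which ends the loop
  while fd.read(buf_size)
  end
  stop = Time.now
  puts "#{stop - start} seconds"
end
#--- EOF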
Running this on my system yields:
$ ruby readspeed.rb 4096
0.014 seconds
$ ruby readspeed.rb 1
7.547 seconds
That's a big difference: the 1-byte run is roughly 540 times slower.
This is a simplified version of the test I was actually running, which
tried to account for the extra call overhead of reading 1 byte at a
time. Even with that correction there's still an order-of-magnitude
difference between the two: reading one byte at a time is *slow*, slow
enough to bog down an entire program.
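One way to make the per-call overhead explicit is to count calls and report the time per call. The minimal sketch below does that for IO#read(1), for IO#getbyte (which returns an Integer instead of allocating a one-byte String), and for a loop that reads 64 KB chunks and then walks each chunk with String#each_byte. The 256 KB temporary file, the 64 KB chunk size and the timed_pass helper are hypothetical choices for illustration. Timings will vary by machine and Ruby version, so no particular numbers are implied.

```ruby
require "benchmark"
require "tempfile"

# Hypothetical stand-in for "some.txt": 256 KB of random bytes.
SIZE = 256 * 1024
RESULTS = {}

# Hypothetical helper: time one full pass over the file. The block does
# the reading and returns how many read calls it made.
def timed_pass(label, path)
  calls = 0
  secs = Benchmark.realtime do
    File.open(path, "rb") { |f| calls = yield(f) }
  end
  RESULTS[label] = [calls, secs]
  printf("%-10s %8d calls %9.4f s %8.3f us/call\n",
         label, calls, secs, secs / calls * 1e6)
end

Tempfile.create("readspeed") do |tmp|
  tmp.binmode
  tmp.write(Random.new(42).bytes(SIZE))
  tmp.flush

  # One method call and one new String per byte.
  timed_pass("read(1)", tmp.path) do |f|
    n = 0
    n += 1 while f.read(1)
    n
  end

  # One method call per byte, but it returns an Integer, not a String.
  timed_pass("getbyte", tmp.path) do |f|
    n = 0
    n += 1 while f.getbyte
    n
  end

  # One read per 64 KB chunk; every byte is then visited inside the chunk.
  timed_pass("chunked", tmp.path) do |f|
    n = 0
    checksum = 0
    while (chunk = f.read(64 * 1024))
      n += 1
      chunk.each_byte { |b| checksum += b }
    end
    n
  end
end
```

If the time per call for read(1) is in the same range as for getbyte, that points to per-call cost (method dispatch plus object allocation) rather than disk access as the main expense. That reading is a hypothesis to check against the numbers, not a conclusion.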
I know this is expected with unbuffered input, such as the POSIX read()
system call, but isn't IO#read supposed to be buffered? What's causing
this slowdown? I'm writing a class that will hopefully speed up small
reads from binary files by explicitly caching data in memory, but I'm
wondering if there are any pre-built (i.e., tested) solutions that Ruby
programmers might be using.
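The caching class described above could look roughly like the following minimal sketch. ChunkedReader, its 64 KB default chunk size and the fill helper are hypothetical names, not an existing library. The class pulls large chunks from the underlying IO and serves read(n) and getbyte from an in-memory binary String. In CRuby, IO#read is itself implemented in C with its own internal buffer, so a pure-Ruby wrapper like this is not guaranteed to be faster per call. It should be measured against the plain loop before anyone relies on it.

```ruby
# Hypothetical ChunkedReader: reads chunk_size bytes at a time from an IO
# and serves small reads from an in-memory binary buffer.
class ChunkedReader
  def initialize(io, chunk_size = 64 * 1024)
    @io = io
    @chunk_size = chunk_size
    @buf = "".b   # buffered bytes (ASCII-8BIT)
    @pos = 0      # offset of the next unread byte in @buf
  end

  # Returns up to n bytes as a binary String, or nil at end of file.
  # Mirrors IO#read(n) for n > 0 only.
  def read(n)
    fill(n)
    return nil if @pos >= @buf.bytesize
    chunk = @buf.byteslice(@pos, n)
    @pos += chunk.bytesize
    chunk
  end

  # Returns the next byte as an Integer, or nil at end of file.
  def getbyte
    fill(1)
    return nil if @pos >= @buf.bytesize
    byte = @buf.getbyte(@pos)
    @pos += 1
    byte
  end

  private

  # Buffer at least n unread bytes, unless the IO reaches end of file first.
  def fill(n)
    while @buf.bytesize - @pos < n
      data = @io.read(@chunk_size)
      break if data.nil?
      # Drop the bytes already consumed, then append the new chunk.
      @buf = @buf.byteslice(@pos..-1) << data
      @pos = 0
    end
  end
end
```

Usage would mirror the original loop: wrap the opened file as reader = ChunkedReader.new(fd) and loop while reader.read(1) (or reader.getbyte) returns non-nil.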