Performance improvement possible?


Philip Rhoades

Ellie,


Eleanor said:
A few things:

- you left a line in the loop:

File.open( output_filename, 'w' ) do |fout|

which should be deleted

Paste in haste, repent at leisure ;) I've corrected it to read the
way it appeared in my head when I was looking at it:
http://pastie.org/222765
- I originally used:

stats = []
lines = File.readlines(input_filename)

but found that reading the whole file (8871 lines) and then
processing the array was inefficient so I got rid of the array

- using:

stats << stats06

If you buffer it as a single read and then work through the file in
memory it guarantees that you minimise the IO costs of reading. I am
of course assuming that even at 8871 lines your file is much smaller
than your available RAM :)
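
For what it's worth, a minimal sketch of that single-read approach might look like this (input_filename is the same variable as in the earlier snippets, and the 'k=' filtering is just a placeholder for the real parsing):

# Read the whole file once, then walk the in-memory string line by line.
# input_filename and the per-line handling are placeholders from the
# earlier snippets, not the actual script.
data = File.read(input_filename)
data.each_line do |line|
  next unless line.include?('k=')   # only the lines we care about
  # ... extract the statistic from the line here ...
end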


When I did the profile, the array processing was the biggest hit - when
I got rid of the array, I almost halved the time! Ruby arrays are
pretty cool but I think you pay for the convenience . .

Doing the file write this way offloads the job of making it efficient
to the Ruby runtime. The file.fsync call will cost you in terms of
runtime performance, but it ensures that the data is flushed to disk
before moving on to the next file, which for a large data processing
job is often desirable.
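
In other words, roughly this pattern (a sketch of what is being described, reusing the variable names from the snippets above):

# Write all the collected values in one call and force them to disk
# before moving on to the next file; output_filename and stats are the
# variables from the earlier snippets.
File.open(output_filename, 'w') do |fout|
  fout.puts *stats    # hand the whole array to the IO layer in one call
  fout.fsync          # make sure the data has hit the disk first
end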


See my other note but it didn't make much difference.

Personally I wouldn't store the results in separate
files but combine them into a single file (possibly even a database),
however I don't know how that would fit with your use case.


There is more post processing using R and for casual inspection it is
convenient to be able to find data according to its file name. It
might still be possible to have fewer, larger files - I might ask
another question about that (basically I paste the single-column
output of this stuff into 32-column arrays). I have tried DBs for
storing output from the main simulation program when it was all in C/C++
and it was quite slow, so I went back to text files . .

As to the file.puts *stats, there's no guarantee this approach will
be efficient but compared to doing something like:

File.open(output_filename, "a") do |file|
  stats.each { |stat| file.puts stat }
end

it feels more natural to the problem domain.


Yes, it was good to find out about this alternative.

Another alternative would be:

File.open(output_filename, "a") do |file|
  file.puts stats.join("\n")
end

but that's likely to use more memory as first an in-memory string
will be created, then this will be passed to Ruby's IO code. For the
size of file you're working with that's not likely to be a problem.

I've a suspicion that your overall algorithm can also be greatly
improved.


I'm sure you are right about that!

In particular the fact that you're forming a cubic array and then
manipulating it raises warning bells and suggests you'll have data
sparsity issues which could be handled in a different way, but that
would require a deeper understanding of your data.


The cubic array was just a direct translation of the C pointer setup I
had - basically it is a rectangular grid of sub-populations each with an
array of allele lengths.
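
In Ruby that layout might be sketched like this (the 50x32x20 dimensions are just assumed numbers for illustration, not the real grid size):

# A direct Ruby translation of the C layout: a rectangular grid of
# sub-populations, each holding an array of allele-length counts.
# ROWS, COLS and ALLELES are assumed sizes, purely for illustration.
ROWS, COLS, ALLELES = 50, 32, 20
grid = Array.new(ROWS) { Array.new(COLS) { Array.new(ALLELES, 0) } }
grid[3][7][12] += 1   # e.g. one more organism with allele length 12 in cell (3,7)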

Thanks again,

Regards,

Phil.
--
Philip Rhoades

Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411
Sydney NSW 2001
Australia
E-mail: (e-mail address removed)
 

Marc Heiler

Is it only me or is that profiling tiresome and arduous?
:)

When I did the profile, the array processing was the biggest hit - when
I got rid of the array, I almost halved the time! Ruby arrays are
pretty cool but I think you pay for the convenience . .

But interesting to read that arrays were one performance bottleneck.
 

Eleanor McHugh

I'm surprised at the behaviour you're seeing so I ran a quick
benchmark on what I believe to be an equivalent array handling problem
with my laptop (OS X, Ruby 1.8.6-p111, Core 2 Duo 2GHz, 2GB RAM):

require 'benchmark'
include Benchmark
bm(6) do |y|
  y.report("appending")       { x = []; (1..20000000).each { |i| x << i } }
  y.report("nested creation") { x = Array.new(1000) { Array.new(1000) { Array.new(20, 0) } } }
  y.report("unrolling")       { x.flatten.flatten.length }
end

                      user     system      total        real
appending         7.350000   0.100000   7.450000 (  7.685458)
nested creation   0.850000   0.000000   0.850000 (  0.851341)
unrolling        15.240000   0.280000  15.520000 ( 20.927189)

which in each case is manipulating an array of 20,000,000 elements.
This would be equivalent to processing 32000 files where each
contained 625 occurrences of the 'k=' tag.
See my other note but it didn't make much difference.

Regardless of what the profiler says, any program that's opening 32000
files and writing to them is a prime candidate for IO optimisation...

Assuming I want to write 625 single-digit numbers to 32000 files the
difference with file.puts *stats compared to individual writes is
noticeable on my setup and would more than outweigh the cost of
appending these numbers to an array. A slower processor might behave
differently, as might a system with a higher-performance drive than my
laptop's 5400RPM SATA HDD.

x = Array.new(675, 1)
bm(6) do |y|
  y.report("direct")       { 32000.times { File.open("test.dat", "a") { |file| file.puts *x } } }
  y.report("enumerated")   { 32000.times { File.open("test.dat", "a") { |file| x.each { |i| file.puts i } } } }
  y.report("direct+fsync") { 32000.times { File.open("test.dat", "a") { |file| file.sync = false; file.puts *x; file.fsync } } }
  y.report("enum+fsync")   { 32000.times { File.open("test.dat", "a") { |file| file.sync = false; x.each { |i| file.puts i }; file.fsync } } }
end

                      user     system      total        real
direct           25.160000   2.100000  27.260000 ( 38.775792)
enumerated       34.200000   2.170000  36.370000 ( 55.614969)
direct+fsync     25.300000   2.250000  27.550000 ( 60.225069)
enum+fsync       34.520000   2.420000  36.940000 ( 98.627757)

I'd be interested in equivalent benchmarking figures for your
configuration, and also an estimate of how close this 32000x625
assumption is to your actual use case, as it would help me make sense
of the doubled execution time you're seeing with my array
modification.


One option is to use a Hash as a Sparse Matrix by using Arrays as keys
for indexing:

stats = Hash.new(0)
lines.each do |line|
  ...
  v = stats06
  stats[[x, y]] = v if v != 0
  ...
end

File.open(output_filename, "a") do |file|
  file.puts *stats.values
end

which should pay off if the number of interesting results is much less
than the cubic search space. Obviously if you need the values to keep
a specific ordering you'd need additional logic...
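
For instance, if the output order matters, one simple way (a sketch reusing the stats hash above) is to sort the [x, y] keys before writing:

# Keep a deterministic ordering by sorting the [x, y] keys before output;
# this reuses the stats hash from the sketch above.
File.open(output_filename, "a") do |file|
  stats.keys.sort.each do |key|
    file.puts stats[key]
  end
end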


Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net
 

M. Edward (Ed) Borasky

Eleanor said:
Regardless of what the profiler says, any program that's opening 32000
files and writing to them is a prime candidate for IO optimisation...

Not to mention directory fragmentation in many filesystems. :) I'd be
looking at an ORM and SQLite rather than attempting something involving
32,000 files.
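
For example, with the sqlite3 gem all of those values could land in one table instead of one file per run (a rough sketch; the x/y/value schema and the stats hash are illustrative, not the poster's actual layout):

require 'rubygems'
require 'sqlite3'

# One database instead of 32,000 small files; the x/y/value columns are
# just an illustrative schema.
db = SQLite3::Database.new("stats.db")
db.execute("CREATE TABLE IF NOT EXISTS stats (x INTEGER, y INTEGER, value INTEGER)")
db.transaction do
  stats.each do |(x, y), value|
    db.execute("INSERT INTO stats (x, y, value) VALUES (?, ?, ?)", x, y, value)
  end
end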
 

Philip Rhoades

Ellie,


Eleanor said:
I'm surprised at the behaviour you're seeing so I ran a quick benchmark
on what I believe to be an equivalent array handling problem with my
laptop (OS X, Ruby 1.8.6-p111, Core 2 Duo 2GHz, 2GB RAM):

require 'benchmark'
include Benchmark
bm(6) do |y|
  y.report("appending")       { x = []; (1..20000000).each { |i| x << i } }
  y.report("nested creation") { x = Array.new(1000) { Array.new(1000) { Array.new(20, 0) } } }
  y.report("unrolling")       { x.flatten.flatten.length }
end

                      user     system      total        real
appending         7.350000   0.100000   7.450000 (  7.685458)
nested creation   0.850000   0.000000   0.850000 (  0.851341)
unrolling        15.240000   0.280000  15.520000 ( 20.927189)

which in each case is manipulating an array of 20,000,000 elements. This
would be equivalent to processing 32000 files where each contained 625
occurrences of the 'k=' tag.


I haven't used BM yet but I can't see how doing a benchmark on an array
helps? When I had an array it was slow, when I ditched the array it was
fast?

Regardless of what the profiler says, any program that's opening 32000
files and writing to them is a prime candidate for IO optimisation...


Yes, you have started me thinking about this and the following
processing scripts now . .

Assuming I want to write 625 single-digit numbers to 32000 files the
difference with file.puts *stats compared to individual writes is
noticeable on my setup and would more than outweigh the cost of
appending these numbers to an array. A slower processor might behave
differently, as might a system with a higher-performance drive than my
laptop's 5400RPM SATA HDD.

x = Array.new(675, 1)
bm(6) do |y|
  y.report("direct")       { 32000.times { File.open("test.dat", "a") { |file| file.puts *x } } }
  y.report("enumerated")   { 32000.times { File.open("test.dat", "a") { |file| x.each { |i| file.puts i } } } }
  y.report("direct+fsync") { 32000.times { File.open("test.dat", "a") { |file| file.sync = false; file.puts *x; file.fsync } } }
  y.report("enum+fsync")   { 32000.times { File.open("test.dat", "a") { |file| file.sync = false; x.each { |i| file.puts i }; file.fsync } } }
end

                      user     system      total        real
direct           25.160000   2.100000  27.260000 ( 38.775792)
enumerated       34.200000   2.170000  36.370000 ( 55.614969)
direct+fsync     25.300000   2.250000  27.550000 ( 60.225069)
enum+fsync       34.520000   2.420000  36.940000 ( 98.627757)

I'd be interested in equivalent benchmarking figures for your
configuration, and also an estimate of how close this 32000x625
assumption is to your actual use case, as it would help me make sense
of the doubled execution time you're seeing with my array
modification.


I think I am giving up on writing lots of small files to disk - see
following post.

One option is to use a Hash as a Sparse Matrix by using Arrays as keys
for indexing:

stats = Hash.new(0)
lines.each do |line|
  ...
  v = stats06
  stats[[x, y]] = v if v != 0
  ...
end

File.open(output_filename, "a") do |file|
  file.puts *stats.values
end

which should pay off if the number of interesting results is much less
than the cubic search space. Obviously if you need the values to keep a
specific ordering you'd need additional logic...


The problem with using a hash is that I lose the mental picture of the
situation - a two-dimensional array corresponds to the physical
situation, and another array in each of these cells corresponds to the
numbers of organisms. If I have time I might look at this suggestion
though.

Thanks yet again!

Regards,

Phil.
--
Philip Rhoades

Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411
Sydney NSW 2001
Australia
E-mail: (e-mail address removed)
 

Philip Rhoades

Ed, Ellie et al,


M. Edward (Ed) Borasky said:
Not to mention directory fragmentation in many filesystems. :) I'd be
looking at an ORM and SQLite rather than attempting something involving
32,000 files.


The dir organisation is not so bad - 20 files in each dir in a tree of
50x32 dirs - however, this and previous comments from Ellie and others
have caused me to think that more of the processing chain can be
improved. Although I am incurring a 90% time increase on this current
step, the following steps have lots of awks, seds, pastes etc. and are
VERY slow. I think if I make use of DataMapper with SQLite (ie keep the
DB in memory) and then massage the data with some more Ruby, I can
improve this stage still further and dramatically improve the next stage
as well (and also keep it all in one Ruby program). This is like a drug . .
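
A minimal DataMapper setup along those lines could look roughly like this (a sketch from memory of the 2008-era API, with a made-up Stat model; the details would need checking against the DataMapper docs):

require 'rubygems'
require 'dm-core'

# Keep the database entirely in memory while massaging the data.
DataMapper.setup(:default, 'sqlite3::memory:')

# Stat and its x/y/value properties are purely illustrative.
class Stat
  include DataMapper::Resource
  property :id,    Serial
  property :x,     Integer
  property :y,     Integer
  property :value, Integer
end

DataMapper.auto_migrate!                      # create the table

Stat.create(:x => 3, :y => 7, :value => 42)
Stat.all(:x => 3).each { |s| puts s.value }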

Converting the first link in the chain, the main simulation C/C++
program, would be a VERY big job though and I can't really justify the
time for that at the moment, although I am thinking about it . .

Thanks to all for their help!

Regards,

Phil.
--
Philip Rhoades

Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411
Sydney NSW 2001
Australia
E-mail: (e-mail address removed)
 

Eleanor McHugh

Eleanor said:
I'm surprised at the behaviour you're seeing so I ran a quick
benchmark on what I believe to be an equivalent array handling
problem with my laptop (OS X, Ruby 1.8.6-p111, Core 2 Duo 2GHz, 2GB
RAM):
require 'benchmark'
include Benchmark
bm(6) do |y|
  y.report("appending")       { x = []; (1..20000000).each { |i| x << i } }
  y.report("nested creation") { x = Array.new(1000) { Array.new(1000) { Array.new(20, 0) } } }
  y.report("unrolling")       { x.flatten.flatten.length }
end

                      user     system      total        real
appending         7.350000   0.100000   7.450000 (  7.685458)
nested creation   0.850000   0.000000   0.850000 (  0.851341)
unrolling        15.240000   0.280000  15.520000 ( 20.927189)
which in each case is manipulating an array of 20,000,000 elements.
This would be equivalent to processing 32000 files where each
contained 625 occurrences of the 'k=' tag.

I haven't used BM yet but I can't see how doing a benchmark on an
array helps? When I had an array it was slow, when I ditched the
array it was fast?

Your experience runs counter to all of my experience, both in terms of
several years of Ruby hacking and more generally in working on
performance-optimisation for large data manipulation problems.
Therefore being of a curious disposition I'd love to figure out why :)


Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net
 

Robert Klemme

2008/6/27 Eleanor McHugh said:
I'm surprised at the behaviour you're seeing so I ran a quick benchmark on
what I believe to be an equivalent array handling problem with my laptop (OS
X, Ruby 1.8.6-p111, Core 2 Duo 2GHz, 2GB RAM):

require 'benchmark'
include Benchmark
bm(6) do |y|
  y.report("appending")       { x = []; (1..20000000).each { |i| x << i } }
  y.report("nested creation") { x = Array.new(1000) { Array.new(1000) { Array.new(20, 0) } } }
  y.report("unrolling")       { x.flatten.flatten.length }
end

This can't be the exact code you were benchmarking since "x" is
undefined in the "unrolling" test. Also, why do you flatten twice?
You can't beat an Array flatter than flatten. :)

irb(main):009:0> Array.new(1000) { Array.new(1000) { Array.new(20, 0) } }.flatten.select { |x| Array === x }.empty?
=> true

Regardless of what the profiler says, any program that's opening 32000 files
and writing to them is a prime candidate for IO optimisation...

... because IO in that case is also likely the slowest bit. Absolutely.

Kind regards

robert
 

Eleanor McHugh

2008/6/27 Eleanor McHugh said:
I'm surprised at the behaviour you're seeing so I ran a quick
benchmark on what I believe to be an equivalent array handling
problem with my laptop (OS X, Ruby 1.8.6-p111, Core 2 Duo 2GHz, 2GB RAM):

require 'benchmark'
include Benchmark
bm(6) do |y|
  y.report("appending")       { x = []; (1..20000000).each { |i| x << i } }
  y.report("nested creation") { x = Array.new(1000) { Array.new(1000) { Array.new(20, 0) } } }
  y.report("unrolling")       { x.flatten.flatten.length }
end

This can't be the exact code you were benchmarking since "x" is
undefined in the "unrolling" test.

I was using IRB and x was already defined earlier in my session, so
yes and no: for my session it was defined, but for this snippet it
won't be.
Also, why do you flatten twice?
You can't beat an Array flatter than flatten. :)

I was posting at 2am >;o


Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net
 

Philip Rhoades

Ellie,


Eleanor said:
Eleanor said:
I'm surprised at the behaviour you're seeing so I ran a quick
benchmark on what I believe to be an equivalent array handling
problem with my laptop (OS X, Ruby 1.8.6-p111, Core 2 Duo 2GHz, 2GB
RAM):
require 'benchmark'
include Benchmark
bm(6) do |y|
  y.report("appending")       { x = []; (1..20000000).each { |i| x << i } }
  y.report("nested creation") { x = Array.new(1000) { Array.new(1000) { Array.new(20, 0) } } }
  y.report("unrolling")       { x.flatten.flatten.length }
end

                      user     system      total        real
appending         7.350000   0.100000   7.450000 (  7.685458)
nested creation   0.850000   0.000000   0.850000 (  0.851341)
unrolling        15.240000   0.280000  15.520000 ( 20.927189)
which in each case is manipulating an array of 20,000,000 elements.
This would be equivalent to processing 32000 files where each
contained 625 occurrences of the 'k=' tag.

I haven't used BM yet but I can't see how doing a benchmark on an
array helps? When I had an array it was slow, when I ditched the
array it was fast?

Your experience runs counter to all of my experience, both in terms of
several years of Ruby hacking and more generally in working on
performance-optimisation for large data manipulation problems. Therefore
being of a curious disposition I'd love to figure out why :)


I don't think there is any great mystery here - in the first version I
thought it would be efficient to read the entire text file into an array
and then process the array, line by line. In the second version I
eliminated the array and processed each line directly after reading it
(doing the same processing as in the first case), relying on the Linux
system buffers/cache to be efficient about reading the data from disk.
I think the processing time saved (~50%) is just a function of not
putting the data into an unnecessary structure and not manipulating that
structure.
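
In other words the change was roughly from the first form below to the second (a sketch; process() stands in for the real per-line work and input_filename for the real path):

# First version: slurp every line into an Array, then walk the Array.
lines = File.readlines(input_filename)
lines.each { |line| process(line) }

# Second version: handle each line as it is read, with no intermediate
# Array; the OS buffer cache keeps the underlying disk reads cheap.
File.open(input_filename) do |fin|
  fin.each_line { |line| process(line) }
end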

Regards,

Phil.
--
Philip Rhoades

Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411
Sydney NSW 2001
Australia
E-mail: (e-mail address removed)
 
