A Real World example for Ruby to "compiled" version discussion

Philip Rhoades · Oct 6, 2010

People,

The "how can we make a ruby compiler" thread has been very interesting -
I really like hearing from serious and competent programmers about the
theoretical problems involved with this issue.

William Rutiser asked for an expansion on the details of my C/C++
population genetics simulation program as a specific example of how one
might proceed depending on a particular situation. I am happy to
elaborate - not least because it would be good to get input from
experienced Ruby programmers before I just try to replicate the same
program in Ruby - I'm sure there will be more sensible/efficient ways of
doing things than what I would attempt first off and so comparisons
between my C/C++ version and a dodgy Ruby program might be even more
unfair . .

I will summarise the C/C++ program as it exists now (it has gone through
a number of versions and has a lot of code that does not need
replicating for the present comparison) with a general overview (I can
add more detail later if people are interested):

- A population is represented by a number of sub-populations which
occupy cells of a two dimensional array

- each cell has pointers to three ordered lists - representing the
parental population, the offspring population and a temporary population
and each element of a list represents an individual

- at each generation (of potentially hundreds), the whole array is
iterated through, new offspring are produced, migrants move to adjacent
cells, parents die off etc

As well as this main simulation program, I have already replaced all the
original shell scripts and some of the statistical processing with Ruby
scripts but the main simulation program is where I couldn't afford an
order or two increase in running times by rewriting in Ruby.

The main problem I see with some of the Ruby conversions that I have
looked at (eg RubyInline) is that the performance problem comes in in
repeating the WHOLE simulation with different starting parameters which
is done thousands of times - so it is not like you have a single
recursive algorithm which is a bottleneck that can be optimised or
rewritten in C or something. There are lots of little steps that happen
millions of times . .

I hope that is sort of clear?

Regards,

Phil.
--
Philip Rhoades

GPO Box 3411
Sydney NSW 2001
Australia
E-mail: (e-mail address removed)

Ryan Davis · Oct 7, 2010

The main problem I see with some of the Ruby conversions that I have =

looked at (eg RubyInline) is that the performance problem comes in in =
repeating the WHOLE simulation with different starting parameters which =
is done thousands of times - so it is not like you have a single =
recursive algorithm which is a bottleneck that can be optimised or =
rewritten in C or something. There are lots of little steps that happen =
millions of times . .

Could you expand on this??

Robert Klemme · Oct 7, 2010

ked at (eg RubyInline) is that the performance problem comes in in repeatin=
g the WHOLE simulation with different starting parameters which is done tho=
usands of times - so it is not like you have a single recursive algorithm w=
hich is a bottleneck that can be optimised or rewritten in C or something. =
=A0There are lots of little steps that happen millions of times . .

Could you expand on this??

Specifically

- What properties (conceptually) does an individual / a population have?

- What calculations are done for each individual / population?

- What kind of iteration is applied?

Kind regards

robert

--=20
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

William Rutiser · Oct 7, 2010

Philip said:
People,

The "how can we make a ruby compiler" thread has been very interesting
- I really like hearing from serious and competent programmers about
the theoretical problems involved with this issue.

William Rutiser asked for an expansion on the details of my C/C++
population genetics simulation program as a specific example of how
one might proceed depending on a particular situation. I am happy to
elaborate - not least because it would be good to get input from
experienced Ruby programmers before I just try to replicate the same
program in Ruby - I'm sure there will be more sensible/efficient ways
of doing things than what I would attempt first off and so comparisons
between my C/C++ version and a dodgy Ruby program might be even more
unfair . .

I will summarise the C/C++ program as it exists now (it has gone
through a number of versions and has a lot of code that does not need
replicating for the present comparison) with a general overview (I can
add more detail later if people are interested):

- A population is represented by a number of sub-populations which
occupy cells of a two dimensional array

- each cell has pointers to three ordered lists - representing the
parental population, the offspring population and a temporary
population and each element of a list represents an individual

- at each generation (of potentially hundreds), the whole array is
iterated through, new offspring are produced, migrants move to
adjacent cells, parents die off etc

As well as this main simulation program, I have already replaced all
the original shell scripts and some of the statistical processing with
Ruby scripts but the main simulation program is where I couldn't
afford an order or two increase in running times by rewriting in Ruby.

The main problem I see with some of the Ruby conversions that I have
looked at (eg RubyInline) is that the performance problem comes in in
repeating the WHOLE simulation with different starting parameters
which is done thousands of times - so it is not like you have a single
recursive algorithm which is a bottleneck that can be optimised or
rewritten in C or something. There are lots of little steps that
happen millions of times . .

I hope that is sort of clear?

Regards,

Phil.

Some questions with my guesses at possible answers:

How much data is carried for each individual? a few alleles? a lot? the
whole human genome?

What is done in the innermost loop? I imagine you select a pair of
individuals, cross them, etc to produce more individuals. Or perhaps an
individual dies or migrated to another population.

What do you do to run a new experiment? generate the initial population,
specify the rules for combination, depth, migration, etc? gather data
and process it at the end?

How much code do you need to write or rewrite for an experiment? How
many parameters are involved?

Can you give some examples of aspects of the C++ program that you hope
to improve by moving the sim to Ruby?

Can you say something about the context of your modeling? How is your
simulator different from others used in the field?

-- Bill

Philip Rhoades · Oct 8, 2010

People,

I have accumulated the questions from different people into this response:

Specifically

- What properties (conceptually) does an individual / a population
have?

This is an Individual Based Model (IBM) as opposed to populations that
can be modelled as a whole with equations etc - so the simulation
reflects this. In the C/C++ version an individual organism is an object
with a number of attributes: ID, parent ID(s), (microsatellite) allele
repeat length, randomizer int etc

- What calculations are done for each individual / population?

The genetics of the population changes as the individuals change because
of mutations which affect repeat length of microsatellites ie they are
not selected against/for (neutral mutations).

- What kind of iteration is applied?

For this particular discussion:

for k in 1..K # generations
for x in 1..X # array width
for y in 1..Y # array height
destroy parent ordered list
destroy tmp ordered list
offspring list becomes parent list
for i in 1..I # organism in parent ordered list
asexually reproduce offspring with possible mutation
possibly migrate offspring to neighbouring array cell
end
end
end
end

data is written to disk after each generation and genetics statistics
are done later.

How much data is carried for each individual? a few alleles? a lot?
the whole human genome?

For this exercise, one loci and allele (asexual reproduction) - multiple
loci can be modelled by multiple runs.

What is done in the innermost loop? I imagine you select a pair of
individuals, cross them, etc to produce more individuals. Or perhaps
an individual dies or migrated to another population.

Asexual reproduction in this case and see above.

What do you do to run a new experiment? generate the initial
population, specify the rules for combination, depth, migration, etc?
gather data and process it at the end?

A script calls the simulation program hundreds of times with different
starting populations. These starting populations are the output of the
same simulation program which has previously been run, starting with
different random number seeds, until they reach "quasi-equilibrium"
(typically 500 generations). Also see above.

How much code do you need to write or rewrite for an experiment? How
many parameters are involved?

The code doesn't change - a configuration file and the starting
(initial) generation can be changed.

Can you give some examples of aspects of the C++ program that you
hope to improve by moving the sim to Ruby?

Not really - it was a while ago now that I did the actual coding (I
don't need to touch the code much these days) and I just remember it was
a major pain trying to learn about the eccentricities of OOP with C++ -
I'm sure it would have been a much more pleasant experience doing it in
Ruby . .

I actually don't need to change to Ruby now - I just need to write a
thesis . . What would be nice would be to further develop the program,
make it more generalised, more friendly for other users, be able to
respond to requests for enhancements/changes more quickly etc.

Can you say something about the context of your modeling? How is your
simulator different from others used in the field?

There are a couple of particular issues we wanted to investigate and my
supervisor thought that the existing stuff that was around then (about
nine years ago now) didn't conveniently allow us to do what we wanted -
maybe it is has changed now but I haven't had much exposure to other
programs. Also, a lot of them then were written in some sort of
graphical environment for Windows or Macs and I wanted to have a CLI
interface on Linux and have more direct control over what was happening . .

Thanks,

Phil.
--
Philip Rhoades

GPO Box 3411
Sydney NSW 2001
Australia
E-mail: (e-mail address removed)

Charles Oliver Nutter · Oct 22, 2010

This sounds like something JRuby would be good at it, since it can JIT
the loops, and the Java VM reaches nearly-native speeds these days.

So, yes, improved technology could make this faster these days.

Re JRuby: Eventually, yes...the current compiler is largely unchanged
from JRuby 1.1, however, and so we're not taking as good advantage of
the JVM as you might hope. Lots of opportunity to improve it, and
obviously we'd like these loops to be as fast as Java loops if
possible...

Soonish, perhaps! Help us fix bugs so I can spend more time on perf

- Charlie

Ruby substitute for Miro?	1	Jul 14, 2009
Compiled version of Ruby	11	Nov 2, 2005
Methods for understanding complex, real world, C++ code?	50	Apr 10, 2012
Ruby in the Real World™	9	Aug 23, 2005
Real World Scalability and Ruby - Top 20	18	Sep 10, 2006
Any population geneticists out there? interested in a Ruby SIG?	5	May 14, 2008
Problem on an example from ruby-lang.org	1	Sep 18, 2010
[QUIZ] Hello, world? (#158)	40	Mar 1, 2008

A Real World example for Ruby to "compiled" version discussion

Philip Rhoades

Ryan Davis

Robert Klemme

William Rutiser

Philip Rhoades

Charles Oliver Nutter

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads