Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > A Real World example for Ruby to "compiled" version discussion

Reply
Thread Tools

A Real World example for Ruby to "compiled" version discussion

 
 
Philip Rhoades
Guest
Posts: n/a
 
      10-06-2010
People,

The "how can we make a ruby compiler" thread has been very interesting -
I really like hearing from serious and competent programmers about the
theoretical problems involved with this issue.

William Rutiser asked for an expansion on the details of my C/C++
population genetics simulation program as a specific example of how one
might proceed depending on a particular situation. I am happy to
elaborate - not least because it would be good to get input from
experienced Ruby programmers before I just try to replicate the same
program in Ruby - I'm sure there will be more sensible/efficient ways of
doing things than what I would attempt first off and so comparisons
between my C/C++ version and a dodgy Ruby program might be even more
unfair . .

I will summarise the C/C++ program as it exists now (it has gone through
a number of versions and has a lot of code that does not need
replicating for the present comparison) with a general overview (I can
add more detail later if people are interested):

- A population is represented by a number of sub-populations which
occupy cells of a two dimensional array

- each cell has pointers to three ordered lists - representing the
parental population, the offspring population and a temporary population
and each element of a list represents an individual

- at each generation (of potentially hundreds), the whole array is
iterated through, new offspring are produced, migrants move to adjacent
cells, parents die off etc

As well as this main simulation program, I have already replaced all the
original shell scripts and some of the statistical processing with Ruby
scripts but the main simulation program is where I couldn't afford an
order or two increase in running times by rewriting in Ruby.

The main problem I see with some of the Ruby conversions that I have
looked at (eg RubyInline) is that the performance problem comes in in
repeating the WHOLE simulation with different starting parameters which
is done thousands of times - so it is not like you have a single
recursive algorithm which is a bottleneck that can be optimised or
rewritten in C or something. There are lots of little steps that happen
millions of times . .

I hope that is sort of clear?

Regards,

Phil.
--
Philip Rhoades

GPO Box 3411
Sydney NSW 2001
Australia
E-mail: http://www.velocityreviews.com/forums/(E-Mail Removed)

 
Reply With Quote
 
 
 
 
Ryan Davis
Guest
Posts: n/a
 
      10-07-2010

On Oct 6, 2010, at 09:55 , Philip Rhoades wrote:

> The main problem I see with some of the Ruby conversions that I have =

looked at (eg RubyInline) is that the performance problem comes in in =
repeating the WHOLE simulation with different starting parameters which =
is done thousands of times - so it is not like you have a single =
recursive algorithm which is a bottleneck that can be optimised or =
rewritten in C or something. There are lots of little steps that happen =
millions of times . .

Could you expand on this??


 
Reply With Quote
 
 
 
 
Robert Klemme
Guest
Posts: n/a
 
      10-07-2010
On Thu, Oct 7, 2010 at 2:57 AM, Ryan Davis <(E-Mail Removed)> wrote=
:
>
> On Oct 6, 2010, at 09:55 , Philip Rhoades wrote:
>
>> The main problem I see with some of the Ruby conversions that I have loo=

ked at (eg RubyInline) is that the performance problem comes in in repeatin=
g the WHOLE simulation with different starting parameters which is done tho=
usands of times - so it is not like you have a single recursive algorithm w=
hich is a bottleneck that can be optimised or rewritten in C or something. =
=A0There are lots of little steps that happen millions of times . .
>
> Could you expand on this??


Specifically

- What properties (conceptually) does an individual / a population have?

- What calculations are done for each individual / population?

- What kind of iteration is applied?

Kind regards

robert

--=20
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

 
Reply With Quote
 
William Rutiser
Guest
Posts: n/a
 
      10-07-2010
Philip Rhoades wrote:
> People,
>
> The "how can we make a ruby compiler" thread has been very interesting
> - I really like hearing from serious and competent programmers about
> the theoretical problems involved with this issue.
>
> William Rutiser asked for an expansion on the details of my C/C++
> population genetics simulation program as a specific example of how
> one might proceed depending on a particular situation. I am happy to
> elaborate - not least because it would be good to get input from
> experienced Ruby programmers before I just try to replicate the same
> program in Ruby - I'm sure there will be more sensible/efficient ways
> of doing things than what I would attempt first off and so comparisons
> between my C/C++ version and a dodgy Ruby program might be even more
> unfair . .
>
> I will summarise the C/C++ program as it exists now (it has gone
> through a number of versions and has a lot of code that does not need
> replicating for the present comparison) with a general overview (I can
> add more detail later if people are interested):
>
> - A population is represented by a number of sub-populations which
> occupy cells of a two dimensional array
>
> - each cell has pointers to three ordered lists - representing the
> parental population, the offspring population and a temporary
> population and each element of a list represents an individual
>
> - at each generation (of potentially hundreds), the whole array is
> iterated through, new offspring are produced, migrants move to
> adjacent cells, parents die off etc
>
> As well as this main simulation program, I have already replaced all
> the original shell scripts and some of the statistical processing with
> Ruby scripts but the main simulation program is where I couldn't
> afford an order or two increase in running times by rewriting in Ruby.
>
> The main problem I see with some of the Ruby conversions that I have
> looked at (eg RubyInline) is that the performance problem comes in in
> repeating the WHOLE simulation with different starting parameters
> which is done thousands of times - so it is not like you have a single
> recursive algorithm which is a bottleneck that can be optimised or
> rewritten in C or something. There are lots of little steps that
> happen millions of times . .
>
> I hope that is sort of clear?
>
> Regards,
>
> Phil.

Some questions with my guesses at possible answers:

How much data is carried for each individual? a few alleles? a lot? the
whole human genome?

What is done in the innermost loop? I imagine you select a pair of
individuals, cross them, etc to produce more individuals. Or perhaps an
individual dies or migrated to another population.

What do you do to run a new experiment? generate the initial population,
specify the rules for combination, depth, migration, etc? gather data
and process it at the end?

How much code do you need to write or rewrite for an experiment? How
many parameters are involved?

Can you give some examples of aspects of the C++ program that you hope
to improve by moving the sim to Ruby?

Can you say something about the context of your modeling? How is your
simulator different from others used in the field?

-- Bill



 
Reply With Quote
 
Philip Rhoades
Guest
Posts: n/a
 
      10-08-2010
People,

I have accumulated the questions from different people into this response:


On 2010-10-07 23:12, Robert Klemme wrote:
> On Thu, Oct 7, 2010 at 2:57 AM, Ryan Davis<(E-Mail Removed)>
> wrote:
>>
>> On Oct 6, 2010, at 09:55 , Philip Rhoades wrote:
>>
>>> The main problem I see with some of the Ruby conversions that I
>>> have looked at (eg RubyInline) is that the performance problem
>>> comes in in repeating the WHOLE simulation with different
>>> starting parameters which is done thousands of times - so it is
>>> not like you have a single recursive algorithm which is a
>>> bottleneck that can be optimised or rewritten in C or something.
>>> There are lots of little steps that happen millions of times . .

>>
>> Could you expand on this??

>
> Specifically
>
> - What properties (conceptually) does an individual / a population
> have?



This is an Individual Based Model (IBM) as opposed to populations that
can be modelled as a whole with equations etc - so the simulation
reflects this. In the C/C++ version an individual organism is an object
with a number of attributes: ID, parent ID(s), (microsatellite) allele
repeat length, randomizer int etc


> - What calculations are done for each individual / population?



The genetics of the population changes as the individuals change because
of mutations which affect repeat length of microsatellites ie they are
not selected against/for (neutral mutations).


> - What kind of iteration is applied?



For this particular discussion:

for k in 1..K # generations
for x in 1..X # array width
for y in 1..Y # array height
destroy parent ordered list
destroy tmp ordered list
offspring list becomes parent list
for i in 1..I # organism in parent ordered list
asexually reproduce offspring with possible mutation
possibly migrate offspring to neighbouring array cell
end
end
end
end

data is written to disk after each generation and genetics statistics
are done later.


> How much data is carried for each individual? a few alleles? a lot?
> the whole human genome?



For this exercise, one loci and allele (asexual reproduction) - multiple
loci can be modelled by multiple runs.


> What is done in the innermost loop? I imagine you select a pair of
> individuals, cross them, etc to produce more individuals. Or perhaps
> an individual dies or migrated to another population.



Asexual reproduction in this case and see above.


> What do you do to run a new experiment? generate the initial
> population, specify the rules for combination, depth, migration, etc?
> gather data and process it at the end?



A script calls the simulation program hundreds of times with different
starting populations. These starting populations are the output of the
same simulation program which has previously been run, starting with
different random number seeds, until they reach "quasi-equilibrium"
(typically 500 generations). Also see above.


> How much code do you need to write or rewrite for an experiment? How
> many parameters are involved?



The code doesn't change - a configuration file and the starting
(initial) generation can be changed.


> Can you give some examples of aspects of the C++ program that you
> hope to improve by moving the sim to Ruby?



Not really - it was a while ago now that I did the actual coding (I
don't need to touch the code much these days) and I just remember it was
a major pain trying to learn about the eccentricities of OOP with C++ -
I'm sure it would have been a much more pleasant experience doing it in
Ruby . .

I actually don't need to change to Ruby now - I just need to write a
thesis . . What would be nice would be to further develop the program,
make it more generalised, more friendly for other users, be able to
respond to requests for enhancements/changes more quickly etc.


> Can you say something about the context of your modeling? How is your
> simulator different from others used in the field?



There are a couple of particular issues we wanted to investigate and my
supervisor thought that the existing stuff that was around then (about
nine years ago now) didn't conveniently allow us to do what we wanted -
maybe it is has changed now but I haven't had much exposure to other
programs. Also, a lot of them then were written in some sort of
graphical environment for Windows or Macs and I wanted to have a CLI
interface on Linux and have more direct control over what was happening . .

Thanks,

Phil.
--
Philip Rhoades

GPO Box 3411
Sydney NSW 2001
Australia
E-mail: (E-Mail Removed)

 
Reply With Quote
 
Charles Oliver Nutter
Guest
Posts: n/a
 
      10-22-2010
On Fri, Oct 8, 2010 at 10:57 AM, Phillip Gawlowski
<(E-Mail Removed)> wrote:
> This sounds like something JRuby would be good at it, since it can JIT
> the loops, and the Java VM reaches nearly-native speeds these days.
>
> So, yes, improved technology could make this faster these days.


Re JRuby: Eventually, yes...the current compiler is largely unchanged
from JRuby 1.1, however, and so we're not taking as good advantage of
the JVM as you might hope. Lots of opportunity to improve it, and
obviously we'd like these loops to be as fast as Java loops if
possible...

Soonish, perhaps! Help us fix bugs so I can spend more time on perf

- Charlie

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: Where to get stand alone Dot Net Framework version 1.1, version2.0, version 3.0, version 3.5, version 2.0 SP1, version 3.0 SP1 ? MowGreen [MVP] ASP .Net 5 02-09-2008 01:55 AM
Re: Where to get stand alone Dot Net Framework version 1.1, version 2.0, version 3.0, version 3.5, version 2.0 SP1, version 3.0 SP1 ? PA Bear [MS MVP] ASP .Net 0 02-05-2008 03:28 AM
Re: Where to get stand alone Dot Net Framework version 1.1, version 2.0, version 3.0, version 3.5, version 2.0 SP1, version 3.0 SP1 ? V Green ASP .Net 0 02-05-2008 02:45 AM



Advertisments