Ara.T.Howard
what do you do when ruby just won't quite cut the mustard?
here's a scenario from last week:
i'm doing some trivial satellite image processing: find some pixels in one huge
image which are essentially pointers (indexes) into another image. for each
pixel found to be a certain value set those indexes in a bunch of other
images. here's the catch: the images are all HUGE - 1-3gb - and i've got to
be processing sets of 6-8 of them at a time. most amazingly, a combination of
guy's mmap and strscan does a good job with this task: the mmap ensures good
memory management w/o blowing the top off system limits and strscan is nice
and fast. using this combination i'm able to process an image set in 2-4
minutes. if you think about it this is a really amazing thing to be doing with a
scripting language. however, we are going to be using this code on 10s of
thousands of image sets - so every second counts. i spent a day writing
equivalent code in c which runs in < 1 minute - nice. the problem is this:
it took a DAY! the ruby code took about 35 minutes to write. we move at an
insane pace around here and i hardly ever have a DAY to do anything. i spent
monday scanning the web to check out the latest developments in languages.
ocaml grabbed my eye. then i spent 4 hours writing my first program in it
that used Bigarray and its mmap facility to behave as my c program did. it
takes about 1.5 minutes to run and, i must admit, i wasn't really enjoying the
functional paradigm - suppose that might change though...
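
for anyone who wants to picture the scan described above, here is a minimal sketch of the idea, not the original code. it assumes one byte per pixel, reads "pointer" as "byte offset of the flagged pixel", and uses small in-memory binary strings where the real job would use memory-mapped files. apply_mask, flag and fill are hypothetical names made up for illustration; only strscan from the standard library is used.

```ruby
require 'strscan'

# hypothetical helper, not the original code. assumes one byte per pixel,
# that every image is a binary String at least as long as the mask, and
# that +flag+ is a Regexp matching exactly one byte.
def apply_mask(mask, images, flag, fill)
  scanner = StringScanner.new(mask)
  hits = 0
  while scanner.scan_until(flag)
    index = scanner.pos - 1   # byte offset of the pixel just matched
    images.each { |img| img.setbyte(index, fill) }
    hits += 1
  end
  hits
end

# tiny in-memory stand-ins for the multi-gigabyte files
mask   = [0, 255, 0, 0, 255].pack('C*')
band_a = [1, 2, 3, 4, 5].pack('C*')
band_b = [9, 8, 7, 6, 5].pack('C*')

count = apply_mask(mask, [band_a, band_b], /\xFF/n, 0)
```

the loop body is the part a c version would speed up: each match costs a regexp search plus one setbyte call per image, all going through the interpreter.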
so here's my dilemma:
when you want to write something faster than ruby, but want basic (IMHO)
tools at your disposal like hashes, good string handling, exceptions, etc.
what is the way to go? as i see it there are a few options:
* do it in another lang. i really don't like this because of the learning
curve and the extra dependencies (ocamlc, for example).
* c++ is simply out. ;-)
* do it in pure c. have you used getopt_long lately - sheesh.
* do it using a nice library for c. glib is good - lots of bells.
extra dependency though...
these are the options i've been mulling over. lately, however, i'm starting
to favour this option:
* just code it in c using ruby's builtin libs. gives you hashes, eval'ing
code, lists, GC, etc. i'm not adding additional dependencies and i'm
guaranteed (if my c stays pretty posix) that my code will run where ruby
will. don't need autotools, no stl, etc. etc.
are there any other options i'm missing?
what do you do when it needs to be __really__ fast BUT you also have to
develop it __really__ fast and would __prefer__ not to add dependencies?
regards.
-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it; and a weed grows, even though we do
| not love it. --Dogen
===============================================================================