ruby and mustard

Ara.T.Howard

what do you do when ruby just won't quite cut the mustard?

here's a scenario from last week:

i'm doing some trivial satellite image processing: find some pixels in one huge
image which are essentially pointers (indexes) into another image. for each
pixel found to be a certain value set those indexes in a bunch of other
images. here's the catch: the images are all HUGE - 1-3gb - and i've got to
be processing sets of 6-8 of them at a time. most amazingly, a combination of
guy's mmap and strscan does a good job with this task: the mmap ensures good
memory management w/o blowing the top off system limits and strscan is nice
and fast. using this combination i'm able to process an image set in 2-4
minutes. if you think about it this is a really amazing thing to be doing with a
scripting language. however, we are going to be using this code on 10s of
thousands of image sets - so every second counts. i spent a day writing
equivalent code in c which runs in < 1 minute - nice. the problem is this:
it took a DAY! the ruby code took about 35 minutes to write. we move at an
insane pace around here and i hardly ever have a DAY to do anything. i spent
monday scanning the web to check out the latest developments in languages.
ocaml grabbed my eye. then i spent 4 hours writing my first program in it
that used Bigarray and its mmap facility to behave as my c program did. it
takes about 1.5 minutes to run and, i must admit, i wasn't really enjoying the
functional paradigm - suppose that might change though...
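
For readers who have not used strscan, here is a minimal sketch of the scanning half of that combination. It is not the original script: the helper name `each_match_position` and the sample data are hypothetical, and a plain binary String stands in for the mmap'ed image (whether an Mmap object can be handed to StringScanner directly is an assumption not checked here).

```ruby
require 'strscan'

# Yield the byte offset of every match of `pattern` in `data`,
# one at a time, without collecting the offsets into an array.
# Assumes `pattern` never matches the empty string.
def each_match_position(data, pattern)
  return enum_for(:each_match_position, data, pattern) unless block_given?

  scanner = StringScanner.new(data)
  # scan_until advances just past the next match (or returns nil);
  # the match starts matched_size bytes before the new position.
  while scanner.scan_until(pattern)
    yield scanner.pos - scanner.matched_size
  end
end

# Hypothetical stand-in for one image: six one-byte pixels.
pixels = [0, 7, 3, 7, 7, 1].pack('C*')

each_match_position(pixels, /\x07/n) { |pos| puts pos }
```

Because the positions are yielded rather than collected, they can be streamed to another program as they are found.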

so here's my dilemma:

when you want to write something faster than ruby, but want basic (IMHO)
tools at your disposal like hashes, good string handling, exceptions, etc.
what is the way to go? as i see it there are a few options:

* do it in another lang. i really don't like this because of the learning
curve and adding extra dependencies (ocamlc for eg.).

* c++ is simply out. ;-)

* do it in pure c. have you used getoptlong lately - sheesh.

* do it using a nice library for c. glib is good - lots of bells.
extra dependency though...

these are the options i've been mulling over. lately, however, i'm starting
to favour this option:

* just code it in c using ruby's builtin libs. gives you hashes, eval'ing
code, lists, GC, etc. i'm not adding additional dependencies and i'm
guaranteed (if my c stays pretty posix) that my code will run where ruby
will. don't need autotools, no stl, etc. etc.

are there any other options i'm missing?

what do you do when it needs to be __really__ fast BUT you also have to
develop it __really__ fast and would __prefer__ not to add dependencies?

regards.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it; and a weed grows, even though we do
| not love it. --Dogen
===============================================================================
 
Jim Freeze

Ara.T.Howard said:
what do you do when it needs to be __really__ fast BUT you also have to
develop it __really__ fast and would __prefer__ not to add dependencies?

It depends. For what you are doing, my preference is to use Ruby
to do all the admin work like parse command lines and manage
the non critical sections of code, and then use either C or Ruby C
to write the speed critical sections.

I too looked at OCaml just last week. The problem I had was that
I could not easily wrap its calls into Ruby for OS X. A method
exists for Linux, but I need BSD support and didn't want to
take the time to do this myself.

Now the question is to use Ruby C or just plain ole C. For
things that I can really abstract, I prefer C (sort of like
the OO style used in the examples in the Pickaxe book) and
then use Swig to wrap my libraries. That way, I can use the
library independent of Ruby if I (or anyone else) ever need to.
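
A rough sketch of that split, under stated assumptions: the option names are hypothetical, and `count_matches` is only a pure-Ruby placeholder for the speed-critical section that would later be rewritten in C.

```ruby
require 'optparse'

# Admin work stays in Ruby: parse the command line into a plain hash.
# The option names here are hypothetical.
def parse_options(argv)
  options = { value: 0, verbose: false }
  parser = OptionParser.new do |opts|
    opts.banner = 'usage: scan [options] IMAGE...'
    opts.on('-n', '--value N', Integer, 'pixel value to look for') { |n| options[:value] = n }
    opts.on('--verbose', 'report progress') { options[:verbose] = true }
  end
  files = parser.parse(argv) # returns the non-option arguments
  [options, files]
end

# Placeholder for the speed-critical section. In the split described
# above, this is the only method that would be rewritten in C.
def count_matches(data, value)
  data.each_byte.count(value)
end

options, files = parse_options(%w[--value 7 a.img b.img])
puts "would scan #{files.join(', ')} for value #{options[:value]}"
puts count_matches([7, 0, 7].pack('C*'), options[:value])
```

Keeping the hot path behind one small method means the Ruby side does not change when the implementation underneath it does.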
 
Kaspar Schiess

Hello,

| when you want to write something faster than ruby, but want basic (IMHO)
| tools at your disposal like hashes, good string handling, exceptions, etc.
| what is the way to go? as i see it there are a few options:

Do you know about psyco.sf.net ? You might try writing it in Python and
get Psyco's speedup.

Or wait around for another few months until I come up with Psyco for
Ruby. (flame me now).


greetings,
kaspar

semantics & semiotics
code manufacture

www.tua.ch/ruby
 
Gavin Sinclair

Ara.T.Howard said:
* do it using a nice library for c. glib is good - lots of bells.
extra dependency though...

If you've got a serious problem to solve, the dependency added by
glib surely pales as a concern. Heck, I don't know what it is, but if
*you* think it will help, use it!

Gavin
 
Aredridel

Gavin said:
If you've got a serious problem to solve, the dependency added by
glib surely pales as a concern. Heck, I don't know what it is, but if
*you* think it will help, use it!

Glib is the base library for GTK+ -- it's an object and signal toolkit
in C. Very nice API, for C anyway. It has a lot of ruby-ish features.
It's nice to use.

However, I'd use Ruby for the same: Same functionality, plus the
language to code (or even just prototype) additional parts in.

Ari
 
Conan

Can you just use a faster computer? If you're working for a company it's a
good excuse to bug your boss for better hardware, and if you're working at
a university you might be able to convince them to let you use their
high-performance computer. (every university has at least 1 supercomputer
right? :) )

The nice thing about code is that your algorithm's complexity will be the
same no matter what language it's written in, so scaling hardware is
often a nice easy solution. (Unless you're already running it on a
top-of-the-line system.)



 
Ben Giddings

Conan said:
Can you just use a faster computer? If you're working for a company it's a
good excuse to bug your boss for better hardware, and if you're working at
a university you might be able to convince them to let you use their
high-performance computer. (every university has at least 1 supercomputer
right? :) )

Heh, look at his signature. "@noaa.gov". That may mean he has access
to supercomputers, but it also means that if he doesn't, getting
something faster may be incredibly painful.

Btw, Ara, I don't have any answers to your question, but I think the
fact you're asking it is great. It's really amazing to show some real
world Ruby applications, esp. when it involves serious number crunching.

Ben
 
gabriele renzi

On Tue, 8 Jun 2004 08:47:52 -0600, "Ara.T.Howard" wrote:

have you considered using stuff like narray? (I have this strange
vision of images=~matrices, I may be completely wrong, anyway, but
there is a NImage IIRC)
 
Ara.T.Howard

Conan said:
Can you just use a faster computer? If you're working for a company it's a
good excuse to bug your boss for better hardware, and if you're working at
a university you might be able to convince them to let you use their
high-performance computer. (every university has at least 1 supercomputer
right? :) )

The nice thing about code is that your algorithm's complexity will be the
same no matter what language it's written in, so scaling hardware is
often a nice easy solution. (Unless you're already running it on a
top-of-the-line system.)

~ > cat /proc/cpuinfo | grep GHz
model name : Intel(R) Xeon(TM) CPU 2.80GHz
model name : Intel(R) Xeon(TM) CPU 2.80GHz
model name : Intel(R) Xeon(TM) CPU 2.80GHz
model name : Intel(R) Xeon(TM) CPU 2.80GHz


yes - there ARE four - it's plenty __fast__. we simply need it to be faster.

;-)


-a
 
Ara.T.Howard

Ben said:
That may mean he has access to supercomputers,

yes.

Ben said:
but it also means that if he doesn't, getting something faster may be
incredibly painful.

yes. emphasis on the 'incredibly'.

;-)

Ben said:
Btw, Ara, I don't have any answers to your question, but I think the fact
you're asking it is great. It's really amazing to show some real
world Ruby applications, esp. when it involves serious number crunching.

definitely serious number crunching. my latest project was ruby(mine)/idl(not
mine) fire detection - which is being used by the indian government, check out

http://dmsp.ngdc.noaa.gov/images/poster_world.jpg

red is ruby/idl found fires!

primary data source is

http://dmsp.ngdc.noaa.gov/html/sensors/doc_ols.html

-a
 
Jeff Mitchell

Ara.T.Howard said:
* do it in another lang. i really don't like this because of the learning
curve and adding extra dependencies (ocamlc for eg.).

* c++ is simply out. ;-)

* do it in pure c. have you used getoptlong lately - sheesh.

* do it using a nice library for c. glib is good - lots of bells.
extra dependency though...

these are the options i've been mulling over. lately, however, i'm starting
to favour this option:

* just code it in c using ruby's builtin libs. gives you hashes, eval'ing
code, lists, GC, etc. i'm not adding additional dependencies and i'm
guaranteed (if my c stays pretty posix) that my code will run where ruby
will. don't need autotools, no stl, etc. etc.

are there any other options i'm missing?

what do you do when it needs to be __really__ fast BUT you also have to
develop it __really__ fast and would __prefer__ not to add dependencies?

Is it possible to somehow isolate the inner-loop functionality? As long
as you can stay away from raw iterations of every data point -- calling only
row or column operations, for example -- you have a pretty good chance of
being fast enough.

Don't forget you can call any function in any shared library with ruby/dl,
including memcpy and so forth. Make some trivial extension classes which
simply hold raw chunks of data in a C array, or just pack strings. You
can then pass this data to C lib functions or ruby extensions.
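
To make the "just pack strings" idea concrete, here is a small sketch that holds pixel data in one binary String and touches it a row at a time rather than a pixel at a time. The width, the sample values and the helper names `row` and `set_row` are all hypothetical, and no ruby/dl call is shown.

```ruby
# Pixel data packed into one binary String, one byte per pixel.
# WIDTH and the sample values are hypothetical.
WIDTH = 4
image = [1, 2, 3, 4,
         5, 6, 7, 8].pack('C*')

# Read a whole row with a single slice instead of WIDTH byte reads.
def row(image, width, r)
  image.byteslice(r * width, width)
end

# Overwrite a whole row with a single slice assignment.
# For a binary (ASCII-8BIT) string, character offsets are byte offsets.
def set_row(image, width, r, bytes)
  image[r * width, width] = bytes
end

set_row(image, WIDTH, 1, [9, 9, 9, 9].pack('C*'))
p row(image, WIDTH, 1).unpack('C*')
```

Each slice is a single call into the interpreter's C code, so the per-pixel loop never runs in Ruby itself.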





 
Ara.T.Howard

Aredridel said:
Glib is the base library for GTK+ -- it's an object and signal toolkit
in C. Very nice API, for C anyway. It has a lot of ruby-ish features.
It's nice to use.

However, I'd use Ruby for the same: Same functionality, plus the
language to code (or even just prototype) additional parts in.

Ari

this is about where i'm at... glib is really nice, but the ruby api is
sufficient for most purposes...

-a
 
Ara.T.Howard

Jim Freeze said:
It depends. For what you are doing, my preference is to use Ruby
to do all the admin work like parse command lines and manage
the non critical sections of code, and then use either C or Ruby C
to write the speed critical sections.

I too looked at OCaml just last week. The problem I had was that
I could not easily wrap its calls into Ruby for OS X. A method
exists for Linux, but I need BSD support and didn't want to
take the time to do this myself.

Now the question is to use Ruby C or just plain ole C. For
things that I can really abstract, I prefer C (sort of like
the OO style used in the examples in the Pickaxe book) and
then use Swig to wrap my libraries. That way, I can use the
library independent of Ruby if I (or anyone else) ever need to.

good points - esp. bits about abstraction. i have done that approach before
(swig wrapped generic c code) - perhaps i'll return to it...

-a
 
Ara.T.Howard

Jeff Mitchell said:
Is it possible to somehow isolate the inner-loop functionality? As long
as you can stay away from raw iterations of every data point -- calling only
row or column operations, for example -- you have a pretty good chance of
being fast enough.

not really. i was trying out a combo of mmap and narray:

mmap = Mmap.new 'huge', 'rw', Mmap::MAP_SHARED
na = NArray.to_na mmap.to_s, NArray::BYTE
positions = (na.eq val).where

(how cool is it that this works!)

but this blows the top right off memory/swap. eg. if i could do this

(na.eq val).where do |pos|
...
end

eg. iff #where took a block i could do it this way. any sort of collection
will blow up since i'm looking for potentially 2 ** 30 positions and streaming
them on stdout (to another program) and each of those positions can occupy
(guessing) 30 bytes or so (how big is '1234567' in ruby?)... in other words i
really do have to handle them one at a time.
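
For illustration only, this is roughly what a block-taking #where could look like in plain Ruby. It is not NArray's API: `each_position` and `CHUNK_SIZE` are hypothetical names, and the sketch reads fixed-size chunks with ordinary IO#read instead of mmap, so it sidesteps the mmap question rather than answering it.

```ruby
require 'stringio'

CHUNK_SIZE = 1 << 20 # hypothetical buffer size, in bytes

# Yield the absolute byte offset of every byte equal to `value`,
# one at a time, holding at most one chunk in memory.
def each_position(io, value, chunk_size = CHUNK_SIZE)
  return enum_for(:each_position, io, value, chunk_size) unless block_given?

  needle = [value].pack('C')
  base = 0 # offset of the current chunk within the whole stream
  while (chunk = io.read(chunk_size))
    pos = chunk.index(needle)
    while pos
      yield base + pos
      pos = chunk.index(needle, pos + 1)
    end
    base += chunk.bytesize
  end
end

# Hypothetical stand-in for a huge image file: six one-byte pixels,
# read four bytes at a time to exercise the chunk boundary.
io = StringIO.new([5, 0, 5, 5, 9, 5].pack('C*'))
each_position(io, 5, 4) { |pos| puts pos }
```

Since each position is handed to the block as soon as it is found, nothing like the 2 ** 30 element collection is ever built.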

i can hear you thinking... why doesn't he mmap a piece at a time and use some
sort of buffering?

see

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=124624

for the reason why...

Jeff Mitchell said:
Don't forget you can call any function in any shared library with ruby/dl,
including memcpy and so forth. Make some trivial extension classes which
simply hold raw chunks of data in a C array, or just pack strings. You
can then pass this data to C lib functions or ruby extensions.

ah, ruby/dl - i've never used this. know of any good examples? this __is__
interesting.

-a
 
James Britt

Ben said:
Btw, Ara, I don't have any answers to your question, but I think the
fact you're asking it is great. It's really amazing to show some real
world Ruby applications, esp. when it involves serious number crunching.


Timely for me, too, as I've been pondering writing a (potentially)
commercial app in Ruby, and speed may be an issue. My argument to
skeptics is that any slow stuff can be replaced with C code; that when
you code in Ruby you're essentially scripting a C app (full source
available, too!), and extending that C code is relatively easy.


James
 
Charles Comstock

Ara.T.Howard said:
these are the options i've been mulling over. lately, however, i'm starting
to favour this option:

* just code it in c using ruby's builtin libs. gives you hashes, eval'ing
code, lists, GC, etc. i'm not adding additional dependencies and i'm
guaranteed (if my c stays pretty posix) that my code will run where ruby
will. don't need autotools, no stl, etc. etc.

are there any other options i'm missing?

what do you do when it needs to be __really__ fast BUT you also have to
develop it __really__ fast and would __prefer__ not to add dependencies?

Have you taken a look at RubyInline? It lets you embed compiled C
code inline in the middle of a Ruby script. That way, if you need to
run a fast loop you can switch to C, but the cheap code that takes a
while to write in C is still doable in Ruby. Can't remember the URL for it,
but you should find it if you google. I want to say it's zenspider's work.

Charlie
 
Carl Youngblood

Conan said:
Can you just use a faster computer? If you're working for a company it's a
good excuse to bug your boss for better hardware, and if you're working at
a university you might be able to convince them to let you use their
high-performance computer. (every university has at least 1 supercomputer
right? :) )

The nice thing about code is that your algorithm's complexity will be the
same no matter what language it's written in, so scaling hardware is
often a nice easy solution. (Unless you're already running it on a
top-of-the-line system.)

I think the problem here is that if they could get it to process 100
pictures a second, that would be better. They want to squeeze as much
performance per CPU as possible, since any wait time is more than
ideal. So if you get heavier iron, then they are going to want to use
it to process that many more pictures, and you're left with the same
problem you started with.
 
Carl Youngblood

Charles said:
Have you taken a look at RubyInline? It lets you embed compiled C
code inline in the middle of a Ruby script. That way, if you need to
run a fast loop you can switch to C, but the cheap code that takes a
while to write in C is still doable in Ruby. Can't remember the URL for it,
but you should find it if you google. I want to say it's zenspider's work.

Charlie

All of this discussion about C also reminds me that C code can itself be
optimized considerably. If you really want to squeeze the last drop out
of performance you can sometimes get even a twenty-fold improvement by
doing architecture-specific optimizations. There are also plenty of
platform-agnostic optimizations that can be done. A good book on this is
/Computer Systems: A Programmer's Perspective/, Randal E. Bryant and
David O'Hallaron.

Carl
 
Ryan Paul

Ara.T.Howard said:
what do you do when ruby just won't quite cut the mustard? ...
when you want to write something faster than ruby, but want basic (IMHO)
tools at your disposal like hashes, good string handling, exceptions, etc.
what is the way to go?

It may not be a practical solution now, because it will take you some time
to learn, but you might want to look into using OCaml in the future. OCaml
is a powerful functional language that can produce native binary
executables on a number of platforms. OCaml execution speed is extremely
impressive. In just about every scenario, the speed of an OCaml program
exceeds that of a comparable C++ program, and there are some situations where
OCaml execution speed may even exceed that of a comparable C program.

OCaml is also a good choice, because it is object oriented, and it
provides a lot of really useful/powerful extras that vastly simplify code.
Hashes are available via the Hashtbl module, and type-homogeneous lists are
a native part of the language. OCaml has a map function, and handles lists
with incredible grace.

-- SegPhault
 
Jean-Hugues ROBERT

Ryan Paul said:
It may not be a practical solution now, because it will take you some time
to learn, but you might want to look into using OCaml in the future. OCaml
is a powerful functional language that can produce native binary
executables on a number of platforms. OCaml execution speed is extremely
impressive. In just about every scenario, the speed of an OCaml program
exceeds that of a comparable C++ program, and there are some situations where
OCaml execution speed may even exceed that of a comparable C program.

OCaml is also a good choice, because it is object oriented, and it
provides a lot of really useful/powerful extras that vastly simplify code.
Hashes are available via the Hashtbl module, and type-homogeneous lists are
a native part of the language. OCaml has a map function, and handles lists
with incredible grace.

-- SegPhault

The last "Recent News" item is from September 2003. I wonder if that means
the project is not very active anymore?

Yours,

JeanHuguesRobert
 
