Ruby, Analysis, and Tons of RAM

ben

Does anyone have experience with using Ruby for analysis (*lots* of
maths), on a machine with a ridiculous amount of RAM? For example, a
hip 64-bit Linux kernel on a machine with 32 or 64 GB of physical RAM.

Are there any "gotchas" I should be aware of? Would all the RAM be
addressable by a given Ruby process? Or would I still have to be
forking a number of processes, each allocated a bit of the address
space (blech)?

Thanks, oh Ruby masters.
 
MonkeeSage

> Does anyone have experience with using Ruby for analysis (*lots* of
> maths), on a machine with a ridiculous amount of RAM? For example, a
> hip 64-bit Linux kernel on a machine with 32 or 64 GB of physical RAM.

No...but, I'm always willing to help out! Just send me one such
workstation and I'll send you the results post haste! ;)

Regards,
Jordan
 
M. Edward (Ed) Borasky

MonkeeSage said:
> No...but, I'm always willing to help out! Just send me one such
> workstation and I'll send you the results post haste! ;)

I have lots of experience doing that sort of thing, but none of it is in
Ruby. You can contact me off list for some ideas.
 
Logan Capaldo

> Does anyone have experience with using Ruby for analysis (*lots* of
> maths), on a machine with a ridiculous amount of RAM? For example, a
> hip 64-bit Linux kernel on a machine with 32 or 64 GB of physical RAM.
>
> Are there any "gotchas" I should be aware of? Would all the RAM be
> addressable by a given Ruby process? Or would I still have to be
> forking a number of processes, each allocated a bit of the address
> space (blech)?

Not having done this myself, you should take everything I say with a
grain of salt, but since ruby allocations (eventually) use malloc, how
much of this massive address space it gets and all that jazz strikes me
as being something that is entirely up to the operating system.
(Excepting things in C extensions which may use mmap or whatever.)
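[A minimal sketch of Logan's point, assuming MRI: Ruby object memory ultimately comes from malloc, so how much of a 64-bit address space the process can use is decided by the OS (ulimits, overcommit policy), not the interpreter.]

```ruby
# A tiny check: allocate one contiguous malloc'd buffer and confirm
# its size.  On a 64-bit build the same call scales to multi-GB
# strings, limited only by physical RAM and kernel policy.
chunk = "\0" * (1 << 26)      # 64 MB in a single String buffer
puts chunk.bytesize           # 67108864
```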
 
ara.t.howard

> Does anyone have experience with using Ruby for analysis (*lots* of
> maths), on a machine with a ridiculous amount of RAM? For example, a
> hip 64-bit Linux kernel on a machine with 32 or 64 GB of physical RAM.

i've had issues using mmap with files larger than 32gb - i'm not sure if the
latest release has fixed this or not... in general you can run into issues
with extensions since ruby fixnums keep a bit to mark them as objects...

> Are there any "gotchas" I should be aware of? Would all the RAM be
> addressable by a given Ruby process? Or would I still have to be forking a
> number of processes, each allocated a bit of the address space (blech)?

assuming you have two or four cpus this might not be a bad idea - ipc is so
dang easy with ruby it's trivial to coordinate processes. i have a slave
class i've used for this before:

http://codeforpeople.com/lib/ruby/slave/
http://codeforpeople.com/lib/ruby/slave/slave-0.0.1/README

regards.


-a
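[ara's point about the tag bit is easy to see from the interpreter; a sketch, assuming MRI's immediate-value tagging on a 64-bit build:]

```ruby
# One bit of the machine word tags an immediate Fixnum, so the largest
# Fixnum is 2**62 - 1 rather than 2**63 - 1.  Crossing that boundary
# silently promotes to a heap-allocated bignum (a separate Bignum class
# in old MRI; unified into Integer since Ruby 2.4).
max_fixnum = 2**62 - 1
puts max_fixnum                        # 4611686018427387903
puts (max_fixnum + 1) - max_fixnum     # arithmetic still works across it
```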
 
M. Edward (Ed) Borasky

> Does anyone have experience with using Ruby for analysis (*lots* of
> maths), on a machine with a ridiculous amount of RAM? For example, a
> hip 64-bit Linux kernel on a machine with 32 or 64 GB of physical RAM.
>
> Are there any "gotchas" I should be aware of? Would all the RAM be
> addressable by a given Ruby process? Or would I still have to be
> forking a number of processes, each allocated a bit of the address
> space (blech)?
>
> Thanks, oh Ruby masters.

1. Again, you can contact me off list for some ideas ... without knowing
your goal, it's difficult for me to know what steps you should take to
reach it.

2. Assume a properly working state-of-the-art 64-bit dual-core AMD or
Intel hardware platform with 64 GB of RAM and an appropriate SAN for
storage from a major vendor like IBM. That severely limits your OS
choices; last time I looked, you needed to be running either RHEL or SUSE
Enterprise Linux. I don't know about the other vendors, but IBM has a
marvelous document on performance tuning humongous servers at

http://www.redbooks.ibm.com/redbooks/pdfs/sg245287.pdf

3. OK, now you've purchased a high-end server and a *supported*
enterprise-grade Linux, and you want to do some serious number crunching
on it, and you want to do it in Ruby, possibly augmented by libraries in
C, Fortran or assembler for speed. You will need to recompile
*everything* -- Ruby, the math libraries, and the compiler itself -- to
use 64-bit addressing. There are some hacks and workarounds, but pretty
much this is required. If you end up with an Intel server, you might
want to have a look at the Intel compilers instead of GCC. Intel also
has some highly-tuned math libraries, as does AMD.
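[One quick way to verify that the 64-bit rebuild in step 3 actually took, hedged since `Integer#size` is an MRI-specific detail:]

```ruby
# On a correctly built 64-bit Ruby both of these print 8; a 32-bit
# build prints 4, and large allocations will be capped near 3 GB
# regardless of how much physical RAM the box has.
puts 1.size                   # bytes in a machine-word Integer
puts ["x"].pack("p").size     # bytes in a C pointer
```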

My point here is that you are "exploring" realms in Ruby that are
"usually" addressed using "more traditional" techniques, so you're going
to need to do a fair amount of testing. That kind of server costs a lot
of money, and for that kind of money, you'll get lots of support from
the vendor, coupled with strong incentives to do your job in ways that
are tested and proven to work and supported by said vendor. That may or
may not include Ruby, and if it does include Ruby, it may or may not
involve a small number of monolithic Ruby scripts directly addressing a
large address space.

There is a lot of help available on the Internet from people like me who
love challenges like this. :)
 
M. Edward (Ed) Borasky

> i've had issues using mmap with files larger than 32gb - i'm not sure if the
> latest release has fixed this or not... in general you can run into issues
> with extensions since ruby fixnums keep a bit to mark them as objects...
>
> assuming you have two or four cpus this might not be a bad idea - ipc is so
> dang easy with ruby it's trivial to coordinate processes. i have a slave
> class i've used for this before:
>
> http://codeforpeople.com/lib/ruby/slave/
> http://codeforpeople.com/lib/ruby/slave/slave-0.0.1/README
>
> regards.
>
> -a

Ah, someone *has* done some of this! What compiler did you use to
recompile Ruby for 64-bit addressing? Did it work out of the box?

What's the bottleneck in Ruby's built-in IPC? Network traffic to
"localhost" and to the other hosts? System V IPC? Something else?

I haven't really looked at the whole "lots of coordinated tiny
processes" thing in Ruby, since Erlang seems to have nailed that
approach and made it the core of the Erlang way to do things. I'm not a
big fan of re-inventing wheels; I'd much rather just get my numbers
crunched.
 
Joel VanderWerf


Ara, why does Slave.new keep a copy of the object (the one you want to
be served up) on both sides of the fork?

Wouldn't it make more sense for it to work like this (modifying the
example in slave.rb):

class Server
  def add_two n
    n + 2
  end
end

slave = Slave.new { Server.new }   # <-- note addition of {...}
server = slave.object

p server.add_two(40)   #=> 42

Slave.new would call the block _only_ in the child, and the parent would
never have an instance of Server, only the drb handle to it.

This might matter if Server.new consumes resources, sets up a data
structure in memory, opens files, etc.
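[A self-contained sketch of the pattern Joel describes -- this is not the actual Slave API; a fork plus a socket pair stands in for the DRb handle -- showing the Server being constructed only in the child:]

```ruby
require "socket"

class Server
  def add_two(n)
    n + 2
  end
end

parent_sock, child_sock = UNIXSocket.pair

pid = fork do
  parent_sock.close
  server = Server.new                      # constructed ONLY in the child
  args = Marshal.load(child_sock.recv(4096))
  child_sock.send(Marshal.dump(server.add_two(*args)), 0)
  child_sock.close
end

# The parent never holds a Server instance, only the channel to it.
child_sock.close
parent_sock.send(Marshal.dump([40]), 0)
result = Marshal.load(parent_sock.recv(4096))
parent_sock.close
Process.wait(pid)

p result   # => 42
```

If Server.new opens files or builds a large in-memory structure, this keeps all of that on the child's side of the fork.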
 
Ara.T.Howard

> Ara, why does Slave.new keep a copy of the object (the one you want to be
> served up) on both sides of the fork?
>
> Wouldn't it make more sense for it to work like this (modifying the example
> in slave.rb):
>
>   class Server
>     def add_two n
>       n + 2
>     end
>   end
>
>   slave = Slave.new { Server.new }   # <-- note addition of {...}
>   server = slave.object
>
>   p server.add_two(40)   #=> 42
>
> Slave.new would call the block _only_ in the child, and the parent would
> never have an instance of Server, only the drb handle to it.
>
> This might matter if Server.new consumes resources, sets up a data structure
> in memory, opens files, etc.

that's a great point joel. i'll add that capability and release on monday or
so. right now, if a block is given it's called with the object - but i can
detect the case based on whether or not an obj is also passed.

thanks a bunch.

-a
 
ben

Thank you all for the excellent suggestions and links.

More about the problem domain: Linguistic modeling with lots of
posterior (Bayesian) inference maths. The probability matrices involved
can easily grow into the multiple GB's range, and obviously I'm
completely hosed if I keep things disk-based. (I've tried. Even with
hip and sexy paging.) The process is only somewhat parallelizable, but
I've gotten a nasty hit in the past from the IPC's. (This IPC hit was
probably my fault, damn "3.14159".to_f.)
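[On the `"3.14159".to_f` hit: shipping floats between processes as text means a parse on every value, where a binary pack round-trips native doubles wholesale. A small, hypothetical illustration of the difference:]

```ruby
require "benchmark"

floats = Array.new(100_000) { rand }

# Text round-trip: what happens when IPC serializes floats as strings.
text_s   = Benchmark.realtime { floats.map(&:to_s).map(&:to_f) }
# Binary round-trip: "E*" packs native little-endian doubles in bulk.
binary_s = Benchmark.realtime { floats.pack("E*").unpack("E*") }

puts format("text:   %.4fs", text_s)
puts format("binary: %.4fs", binary_s)

# The binary trip is also bit-exact:
p floats.pack("E*").unpack("E*") == floats
```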

Obviously, I'm doing the real maths with C routines called from Ruby.

I'm quite happy to hack around and compile everything from scratch. On
the other hand, I'm not happy with expensive support agreements or
using "traditional techniques" (FORTRAN, shudder). So maybe it's
reasonable to consider me 1) short on cash, 2) short on processing
time, 3) long on Linux admin skillZ, 4) long-ish on coding time.

Thanks again, folks.
 
M. Edward (Ed) Borasky

> Thank you all for the excellent suggestions and links.
>
> More about the problem domain: Linguistic modeling with lots of
> posterior (Bayesian) inference maths. The probability matrices involved
> can easily grow into the multiple GB's range, and obviously I'm
> completely hosed if I keep things disk-based. (I've tried. Even with
> hip and sexy paging.) The process is only somewhat parallelizable, but
> I've gotten a nasty hit in the past from the IPC's. (This IPC hit was
> probably my fault, damn "3.14159".to_f.)

Hmmm ... large matrices and "only somewhat parallelizable" ... that's
counterintuitive to me. Dense or sparse?

> Obviously, I'm doing the real maths with C routines called from Ruby.

Who does the memory management? Ruby? C? Linux?

> I'm quite happy to hack around and compile everything from scratch. On
> the other hand, I'm not happy with expensive support agreements or
> using "traditional techniques" (FORTRAN, shudder). So maybe it's
> reasonable to consider me 1) short on cash, 2) short on processing
> time, 3) long on Linux admin skillZ, 4) long-ish on coding time.

This sounds to me more like a computational linear algebra problem than
a Linux system administration problem -- at least, once you've got a
64-bit Linux distro and toolchain up and running. :) Given that you've
gone to C, I can't imagine there not being an efficient open-source C
library that will handle your problem at near-optimal speeds, at least
on dense matrices.

Although -- in my application area, performance modelling, most of the
well-known existing packages are academic licensed rather than true open
source. You can get them free if you're an academic researcher, but if
you want to use them commercially, you have to pay for them. Which is
why I'm writing Rameau. But I don't have a large-memory SMP machine, and
my matrices are either sparse, small and dense, or easily converted
into, say, a Kronecker product of small dense matrices.
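[Ruby's stdlib Matrix class has no built-in Kronecker product, at least as of this writing, but for the small dense factors Ed describes it is only a few lines; a sketch:]

```ruby
require "matrix"

# Kronecker product: entry (i, j) of A (x) B is the product of one
# entry of A with one entry of B, laid out block by block.
def kron(a, b)
  Matrix.build(a.row_count * b.row_count,
               a.column_count * b.column_count) do |i, j|
    a[i / b.row_count, j / b.column_count] *
      b[i % b.row_count, j % b.column_count]
  end
end

a = Matrix[[1, 2], [3, 4]]
b = Matrix[[0, 1], [1, 0]]
p kron(a, b)
# => Matrix[[0, 1, 0, 2], [1, 0, 2, 0], [0, 3, 0, 4], [3, 0, 4, 0]]
```

The payoff is that a large structured matrix can be stored and applied as its small factors instead of ever being materialized.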

If nobody has invited you yet, check out

http://sciruby.codeforpeople.com/sr.cgi/FrontPage
 
ben

M. Edward (Ed) Borasky said:
> Hmmm ... large matrices and "only somewhat parallelizable" ... that's
> counterintuitive to me. Dense or sparse?

Then maybe my terminology is weak. So the matrices are very large: 3
dimensions, about 30000x1000x3 elements, each element a float but the
3rd dimension could be an int. Very dense, typically about 30% zeros.
And by only somewhat parallelizable, I just mean that the algorithm
that builds the matrix bounces around like crazy -- it does not work on
a particularly "local area" of the matrix. (Read 2123x501x1, mutate
1x991x3, read 29820x11x2, and so on.)
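[Back-of-the-envelope, a single matrix of those dimensions is under a gigabyte -- though a handful of them plus scratch copies reaches the multiple-GB range fast. A quick check:]

```ruby
elements = 30_000 * 1_000 * 3    # 90 million entries
doubles  = elements * 8          # all stored as 8-byte C doubles
puts doubles                     # 720000000 bytes, ~0.72 GB

# Storing the third dimension as 4-byte ints instead shaves 120 MB:
mixed = 30_000 * 1_000 * 2 * 8 + 30_000 * 1_000 * 4
puts mixed                       # 600000000 bytes
```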

> Who does the memory management? Ruby? C? Linux?

A mix between Ruby -- previously allocated arrays -- and C malloc()'ing
temporary arrays for scratch space.

> This sounds to me more like a computational linear algebra problem than
> a Linux system administration problem -- at least, once you've got a
> 64-bit Linux distro and toolchain up and running. :) Given that you've
> gone to C, I can't imagine there not being an efficient open-source C
> library that will handle your problem at near-optimal speeds, at least
> on dense matrices.

The comfortably-licensed libraries I might shoe-horn into working --
for linguistic inference, that is -- are either groddy
proofs-of-concept, or optimized for much smaller dimensions. :( I'm in
relatively new territory here.
> Although -- in my application area, performance modelling, most of the
> well-known existing packages are academic licensed rather than true open
> source. You can get them free if you're an academic researcher, but if
> you want to use them commercially, you have to pay for them. Which is
> why I'm writing Rameau. But I don't have a large-memory SMP machine, and
> my matrices are either sparse, small and dense, or easily converted
> into, say, a Kronecker product of small dense matrices.
>
> If nobody has invited you yet, check out
>
> http://sciruby.codeforpeople.com/sr.cgi/FrontPage

Thanks for the link. Gotta love the Web -- one of my projects
("integral") is already indexed as an InterestingProject.
 
John Carter

> Does anyone have experience with using Ruby for analysis (*lots* of
> maths), on a machine with a ridiculous amount of RAM? For example, a
> hip 64-bit Linux kernel on a machine with 32 or 64 GB of physical RAM.

Believe it or not, make or Rake or something like that is your friend.
(I tend to roll a few lines of Ruby to do the heart of it)

Break your computation into a pipeline of processes and store
intermediate results on disk.

Since Life's a Bitch, your program will have bugs / crashes / wrong
data / ....

So when you fix the appropriate input, the Makefile (or whatever) knows
the dependency net and recomputes only the steps needed. (You did say
Big, didn't you? That, in my experience, means lots and lots of wall
clock time for each run. This way hugely shortens your run, debug, fix
cycle time.)

Since you have multiple processes, the problems you mention fade away.

If you have multiple CPU's or machines, distributing the load becomes
easy.
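[A minimal sketch of that pipeline using Rake's file tasks -- the file names and the arithmetic in each stage are made up for illustration. Each stage persists its result to disk, and Rake rebuilds only stages whose inputs are newer than their outputs:]

```ruby
require "rake"
include Rake::DSL

# Fake input data standing in for the real measurements.
File.write("input.dat", "1 2 3") unless File.exist?("input.dat")

# Stage 1: sum the raw numbers.
file "stage1.out" => "input.dat" do
  nums = File.read("input.dat").split.map(&:to_f)
  File.write("stage1.out", nums.sum.to_s)
end

# Stage 2: double the stage-1 result.  After a crash or a data fix,
# invoking the final target reruns only the stale stages.
file "stage2.out" => "stage1.out" do
  File.write("stage2.out", (File.read("stage1.out").to_f * 2).to_s)
end

Rake::Task["stage2.out"].invoke
puts File.read("stage2.out")   # "12.0"
```

Because every intermediate result lives on disk, each stage can also be a separate process on a separate CPU or machine.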




John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : (e-mail address removed)
New Zealand

"We have more to fear from
The Bungling of the Incompetent
Than from the Machinations of the Wicked." (source unknown)
 
