The other option is to attach Ruby workers to an Erlang backbone.
That would miss out on one of Erlang's biggest wins, one I'm not sure is
compatible with Ruby as a language -- the VM and its concept of
shared-nothing processes with immutable storage.
Disclaimer: The following is based on assumptions I haven't bothered to check.
However, if the Erlang VM doesn't behave this way, I'm pretty sure it _could_.
See, in Erlang, you only need to worry about your messages being too big when
they're going to go over a network. Short of that, you can pass around data
structures as big as you want, without much slowdown.
The reason is that in Erlang, all data structures are immutable. Erlang
carries this to a perverse level, by making variables assign-once, which
really isn't necessary. But the point is this:
In Ruby, if I pass a hash to another thread, I now have two threads which can
see the same hash. Since the hash is mutable, my two threads now have to
coordinate on who gets to change it when. The only ways around this are to
pass a duplicated hash to that thread, wasting time and RAM -- after all, the
original hash might be about to be GC'd -- or to freeze the hash, so that now,
if either thread wants to make changes, it has to duplicate the hash first,
meaning possibly _two_ extra copies if both do. Not pretty.
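
Here's a rough sketch of those two options. The hash contents and variable
names are made up for illustration, and note that dup is only a shallow copy,
so nested objects would still be shared.

```ruby
# Illustrative data only; the names here are hypothetical.
original = { "retries" => 3, "host" => "example.test" }

# Option 1: duplicate before handing off. The worker can mutate its copy
# freely, at the cost of an extra (shallow) copy up front.
worker_copy = Thread.new(original.dup) do |h|
  h["retries"] += 1
  h
end.value

# Option 2: freeze and share. Nothing is copied up front, but any thread
# that wants a change has to build its own modified copy.
original.freeze
worker_view = Thread.new(original) do |h|
  begin
    h["retries"] += 1            # raises, because the hash is frozen
  rescue RuntimeError            # FrozenError is a RuntimeError on Ruby 2.5+
    h.merge("retries" => h["retries"] + 1)   # merge returns a new hash
  end
end.value
```

Either way the original stays untouched, but somebody pays for a copy.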
In fact, the most Erlang-like way to do this is separate worker processes, as
you describe. Great, now messages have to be serialized as strings, sent over
a pipe, and then parsed -- even more of a performance hit.
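
To make that overhead concrete, here's a minimal sketch of one round trip to
a forked worker, assuming a platform where fork is available. The job hash
is hypothetical, and Marshal is just one serialization choice; JSON or YAML
would have the same shape.

```ruby
# Hypothetical job payload, for illustration only.
job = { "id" => 1, "numbers" => [1, 2, 3] }

to_child_r, to_child_w = IO.pipe
to_parent_r, to_parent_w = IO.pipe

pid = fork do
  # Child: close the pipe ends it doesn't use.
  to_child_w.close
  to_parent_r.close

  received = Marshal.load(to_child_r)     # parse bytes back into objects
  result = { "id" => received["id"], "sum" => received["numbers"].sum }
  Marshal.dump(result, to_parent_w)       # serialize the reply
  to_parent_w.close
  exit!(0)                                # skip at_exit handlers in the child
end

# Parent: close the pipe ends it doesn't use.
to_child_r.close
to_parent_w.close

Marshal.dump(job, to_child_w)             # serialize the whole hash to bytes
to_child_w.close
reply = Marshal.load(to_parent_r)         # and parse the answer again
Process.wait(pid)
```

Every message gets walked, encoded, copied through the kernel, and decoded,
in both directions.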
In Erlang, since data structures are immutable, they can simply be passed by
reference -- the other thread can't change them, so why not let them be
shared? What's more, if I need to create a slightly modified data structure,
the most natural way to do that results in rope-like structures that share
most of their contents with the original. So even within a single process,
it's probably more efficient -- but it also means message-passing is almost
free.
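
The sharing idea can be imitated in Ruby with frozen values. This is only an
analogy, not how Erlang implements anything, and the cons helper is a
hypothetical name of my own.

```ruby
# A tiny immutable linked list: each node is a frozen [head, tail] pair.
# `cons` is a made-up helper for this sketch.
cons = ->(head, tail) { [head, tail].freeze }

base = cons.call(2, cons.call(3, nil))

# The "slightly modified" version allocates one new node and reuses the rest.
modified = cons.call(1, base)

# Same object, not a copy. Nobody can change `base`, so sharing it is safe.
shared = modified[1].equal?(base)
```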
That's the big draw of Erlang for me -- I get to program with hundreds, even
thousands of loosely-coupled processes that are at least as safe as separate
Unix processes, yet the performance penalty is far less than even tens of Ruby
processes trying to do the same thing. Ruby threads would be less safe _and_
less concurrent, whereas Erlang will _automatically_ scale to multiple cores.
It might be a bit weird to hear me arguing for the performance win, given that
I'm not ashamed to suggest throwing more hardware at a problem, or to repeat
"premature optimization..." when people criticize Ruby for being slow.
But this is a bit like Git. One of the main reasons I use Git is the
performance improvement -- up to a certain point, the extra performance buys
you nothing. Past that point, you realize that branches, merges, and commits
are essentially free, and it liberates you to work and collaborate in ways
that are technically possible with other version control systems (even with
SVN), but that you're much more likely to actually use in Git, where
'git checkout -b' is instantaneous and 'git merge' takes seconds at most.
Erlang processes are like that. Up to a point, it's just a nice performance
improvement, and you're still manually twiddling the balance between threads,
processes, and event models, possibly using some monstrous combination of all
three. Each choice might give you more or less concurrency, more or less
performance, and more or less weird edge cases -- and in the case of
traditional threads, no matter how efficient they get, you're going to be wary
about adding more of them, and having to lock more things, and by the time you
lock everything as you should, much of the performance is gone.
By removing that barrier of performance, and by making processes easy to spawn
and manage, you can suddenly stop worrying about it. You can easily spawn one
process per connection -- to anything. You can spawn processes whose entire
job is to keep track of a single counter. You can spawn processes like you
don't care, like it's going out of style -- much the same way you'd spawn
objects in Ruby, only more so.
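
For a feel of what a one-counter process looks like, here's a sketch in Ruby
using a thread and a queue as a stand-in. It is not an Erlang process (no
isolation, nowhere near as cheap), and spawn_counter and the message names
are hypothetical.

```ruby
require "thread"  # Queue is core on modern Rubies; harmless to require

# Hypothetical helper: starts a thread that owns a counter and only talks
# to the outside world through its inbox queue.
def spawn_counter
  inbox = Queue.new
  Thread.new do
    count = 0
    loop do
      message, reply_to = inbox.pop       # block until a message arrives
      case message
      when :increment then count += 1
      when :get       then reply_to << count
      when :stop      then break
      end
    end
  end
  inbox
end

counter = spawn_counter
5.times { counter << [:increment, nil] }

reply = Queue.new
counter << [:get, reply]
total = reply.pop                         # messages are handled in order
counter << [:stop, nil]
```

The counter itself never needs a lock, because only its own thread touches
it.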
It's the kind of performance improvement that's not just squeezing a few more
percent out of the hardware you've got, or shaving a few milliseconds off a
task that was already fast enough. It's the kind of performance improvement
that fundamentally changes the way you work.
It's the difference between 'git merge' taking a few seconds and 'svn merge'
taking half an hour. (And no, I'm not making that up. It _routinely_ did so,
when I was using it at work. People switched to git-svn for that reason
alone.)
If I'm just going to have a bunch of Ruby workers anyway, I'd actually save
some RAM by getting myself a COW-friendly Ruby and using fork directly to
create workers, instead of running them from Erlang. In fact, if that's what I
was going to do, I'm not entirely sure why I'd want the Erlang backbone
anyway. But Ezra is a smart guy, so I figure there must be some reason he
wrote Nanite that way.
Anyway, I should probably go hang out on the Reia list, huh?