how about ruby's threads?

Ruby Newbee

Hello,

Is ruby's threads safe enough?
Once I used perl for programming a lot, but perl's threads is not so
nice, and modperl's threads is not safe.

Thanks.
 
andrew mcelroy

Hello,

Is ruby's threads safe enough?
Once I used perl for programming a lot, but perl's threads is not so
nice, and modperl's threads is not safe.

I just have to say, I read that in my mind in Borat's voice (from the movie).

Seriously though, it depends on your application.
Threads are in much better shape in Ruby 1.9 than in 1.8.
Ruby 1.8 uses green threads, while in 1.9 you get native threads and
fibers.

Also, keep in mind which interpreter you are using.
The statements above are only true for the reference Ruby interpreter
from ruby-lang.org.

JRuby is a totally different story.

Andrew McElroy
 
David Masover

Is ruby's threads safe enough?

What do you mean by "safe"?
Once I used perl for programming a lot, but perl's threads is not so
nice,

What was wrong with them?
and modperl's threads is not safe.

What wasn't safe about them?

I'm really not sure how to answer this question.

On the one hand, I can say that Ruby threads are probably about as "safe" as
Perl threads, or Python threads, or any threads, in ANY language. It depends
whether a specific library or application is threadsafe -- for example, you
mentioned mod_perl. That is really more mod_perl's fault, not so much Perl's
fault.

On the other hand, no threads are safe, really -- in ANY language. Look into
other concurrency models, like actors, processes, or events.
 
Tony Arcieri

[Note: parts of this message were removed to make it a legal post.]

Is ruby's threads safe enough?

Threads as a concurrency primitive are about as "safe" as C's pointers were
at managing memory.
 
David Masover

First, it's just a preference, but I think most on the list agree with me --
please don't top-post. Start your post after the quote, preferably after the
relevant section.

Well, I asked this because Perl thread documentation warns that
multithreading should not be used in production systems.

I don't know of any similar limitation in Ruby, but I will say that you
probably shouldn't use Ruby threads in production systems. That doesn't have
anything to do with Ruby, it has to do with the concept of threads in general.
And Perl's threads has many limitations, as this article said:
http://www.perlmonks.org/index.pl?displaytype=print;node_id=288022

This lists one limitation and one weird design feature.

The weird design feature is that apparently Perl threads don't share variables
-- this is like fork(), and you may as well use fork anyway. The one thing
they give you that fork doesn't is "shared variables" -- so you can explicitly
share variables between threads.

Ruby doesn't do this. In Ruby, all variables are essentially shared between
threads, and I'm pretty sure there aren't massive data structures copied
between threads, so they're much lighter weight than Perl threads. But this
means that any time you change a variable that _might_ be visible elsewhere in
your program, you have to make sure you synchronize access (with locks and
such). So Ruby threads will probably be faster, but much more dangerous than
Perl threads unless you know what you're doing.
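
To make the synchronization point concrete, here is a minimal sketch of a counter shared between threads and guarded by a Mutex. The SafeCounter class is a hypothetical name made up for this illustration; only Thread and Mutex come from Ruby itself.

```ruby
# Hypothetical example class: a counter that several threads may update.
class SafeCounter
  attr_reader :value

  def initialize
    @value = 0
    @lock  = Mutex.new
  end

  # "@value += 1" is a read-modify-write sequence, so it runs inside the
  # lock to keep two threads from interleaving in the middle of it.
  def increment
    @lock.synchronize { @value += 1 }
  end
end

counter = SafeCounter.new
threads = 4.times.map do
  Thread.new { 1_000.times { counter.increment } }
end
threads.each(&:join)

puts counter.value   # 4 threads x 1_000 increments each
```

Every piece of state that more than one thread can reach needs the same treatment, which is where the "you know what you're doing" caveat comes from.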
Python's is better.

I would guess Ruby threads are similar to Python.


All that said, you are probably asking the wrong question:

http://dobbscodetalk.com/index.php?option=com_myblog&show=Threads-considered-harmful.html&Itemid=29

The question you are asking is, "Are Ruby threads at least as good as Python
threads?" The answer to that is probably yes, and better if you use JRuby.

The question you should be asking is, "What's the best way to handle
concurrency in Ruby?" The answer is, it depends what you're doing, but it's
probably not threads.
 
Brian Candler

David said:
The question you should be asking is, "What's the best way to handle
concurrency in Ruby?" The answer is, it depends what you're doing, but
it's
probably not threads.

I agree that it depends on what you're doing - but Ruby threads *are*
often useful, especially when used in a coarse-grained way.

For example, suppose you have a bunch of objects and each is opening a
HTTP connection to some remote server and pulling down content, and you
want this to happen concurrently. Each object is essentially
self-contained. Having these doing concurrent downloads within threads
is straightforward to program and pretty robust.
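
A rough sketch of that coarse-grained pattern, assuming one thread per download. The fake_fetch helper and the example.com URLs are stand-ins invented here so the snippet runs without a network; a real program would call an HTTP client at that point.

```ruby
# Hypothetical stand-in for a real HTTP request: it sleeps briefly to
# imitate network latency and returns a fake body.
def fake_fetch(url)
  sleep 0.05
  "content of #{url}"
end

# Start one thread per URL and collect the results in input order.
# Each thread works only with its own block-local url, so no locking
# is needed.
def fetch_all(urls)
  threads = urls.map { |url| Thread.new { fake_fetch(url) } }
  threads.map(&:value)   # Thread#value joins the thread and returns the block's result
end

urls = %w[http://example.com/a http://example.com/b http://example.com/c]
fetch_all(urls).each { |body| puts body }
```

Because the threads share nothing mutable, the only coordination point is Thread#value at the end.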

The alternatives aren't pretty: rewrite your application in an
event-driven way (so you have to find a HTTP client library which works
this way too), or fork off separate processes (which then have to
communicate back to the central one with the results, which might mean
select'ing across the children, or using the filesystem as a temporary
data store)
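
For comparison, the fork-based alternative might look roughly like this. It is a sketch only: run_in_child is a hypothetical helper, it assumes a Unix-like platform where Kernel#fork exists, and it handles a single child rather than select'ing across several.

```ruby
# Run the given block in a child process and hand its return value back
# to the parent through a pipe. Assumes Kernel#fork is available.
def run_in_child
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode

  pid = fork do
    reader.close                        # the child only writes
    writer.write(Marshal.dump(yield))   # serialize the block's result
    writer.close
  end

  writer.close                          # the parent only reads
  result = Marshal.load(reader.read)    # read until the child closes its end
  reader.close
  Process.wait(pid)                     # reap the child
  result
end

squares = run_in_child { (1..5).map { |n| n * n } }
p squares
```

The result has to survive a round trip through Marshal, which is the serialization cost that threads avoid.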

Not every application has to be mega-scalable and bombproof; a Sinatra
web app with a handful of concurrent clients might usefully use threads
too.

Of course, the assumption here is that you're programming in Ruby. If
you want to avoid threads (which I agree is a good thing to do) and
still have concurrency, then it might be better to switch to Erlang
rather than jump through hoops in Ruby.
 
David Masover

Ruby threads *are*
often useful, especially when used in a coarse-grained way.

I agree. However, I don't think threads are the best primitive to use for
coarse-grained multithreading. I much prefer processes and message-passing.
For example, suppose you have a bunch of objects and each is opening a
HTTP connection to some remote server and pulling down content, and you
want this to happen concurrently. Each object is essentially
self-contained. Having these doing concurrent downloads within threads
is straightforward to program and pretty robust.

I agree, and this is how I do that -- I should clarify. I like threads
technologically. I think they can be much cleaner than Unix processes. A
fork() is nice to prevent one crash from bringing down your entire app -- but
your app shouldn't be crashing that badly in the first place.

You mentioned Erlang. It will do some N:M threading -- that is, there really
will be some OS threads involved. In theory, one crash could bring down your
entire app. Also in theory, the Erlang runtime is robust enough that this will
Never Happen -- and to ensure that, the preferred way to write C extensions is
as separate processes which talk to Erlang via RPC. More efficient than fork
on Unix, but much more reliable than "threads" in just about any language.

That is: I see threads as both as harmful and as useful as Goto. All CPUs
essentially implement Goto, but no one in their right mind codes in terms of
Goto. We abstract it away, and use structured code.
The alternatives aren't pretty: rewrite your application in an
event-driven way (so you have to find a HTTP client library which works
this way too),

A quick Google turns up Rev::HttpClient, so this probably wasn't the best
example.
or fork off separate processes (which then have to
communicate back to the central one with the results, which might mean
select'ing across the children, or using the filesystem as a temporary
data store)

Or abstracting this away until it's more manageable. You can do that with
threads, too, but in Ruby, more processes means more concurrency, unless
you're doing JRuby -- and it definitely means something safer.
Of course, the assumption here is that you're programming in Ruby. If
you want to avoid threads (which I agree is a good thing to do) and
still have concurrency, then it might be better to switch to Erlang
rather than jump through hoops in Ruby.

That's probably true, if you can manage it -- but even in Ruby, there are
things that will abstract away threads for you.

The biggest problem I have with Erlang is that the syntax is hideous,
especially after Ruby. The second biggest problem I have is that while it
handles concurrency and binary data very well, Ruby handles just about
everything else better -- Unicode, string processing, metaprogramming and
reflection, DSLs...

This is why I have such high hopes for Reia, and why I'm tempted to dabble in
io -- I want something that's at least as beautiful as Ruby (though I do like
prototypal inheritance), but at least as good at concurrency as Erlang.

But in the meantime, I'm going to say that processes are likely to have way
fewer surprises for the average newbie, while hypocritically building wrappers
around threads for fun.
 
Martin DeMello

The biggest problem I have with Erlang is that the syntax is hideous,
especially after Ruby. The second biggest problem I have is that while it
handles concurrency and binary data very well, Ruby handles just about
everything else better -- Unicode, string processing, metaprogramming and
reflection, DSLs...

This is why I have such high hopes for Reia, and why I'm tempted to dabble in
io -- I want something that's at least as beautiful as Ruby (though I do like
prototypal inheritance), but at least as good at concurrency as Erlang.

The other option is to attach ruby workers to an erlang backbone

m.
 
David Masover

The other option is to attach ruby workers to an erlang backbone

This would miss out on one of the biggest wins for Erlang, which I'm not sure
is compatible with Ruby as a language -- the VM and the concept of shared-
nothing processes with immutable storage.

Disclaimer: The following is based on assumptions I haven't bothered to check.
However, if the Erlang VM doesn't behave this way, I'm pretty sure it _could_.

See, in Erlang, you only need to worry about your messages being too big when
they're going to go over a network. Short of that, you can pass around data
structures as big as you want, without much slowdown.

The reason is that in Erlang, all data structures are immutable. Erlang
carries this to a perverse level, by making variables assign-once, which
really isn't necessary. But the point is this:

In Ruby, if I pass a hash to another thread, I now have two threads which can
see the same hash. Since the hash is mutable, my two threads now have to
coordinate on who gets to change it when. The only way to fix this would be to
pass a duplicated hash to that thread, wasting time and RAM -- after all, the
original hash might be about to be GC'd -- or to freeze the hash, so that now,
if either thread wants to make changes, they each have to duplicate it,
meaning possibly _two_ copies of the hash. Not pretty.
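
The freeze-then-duplicate trade-off can be sketched as follows. The modify_in_worker helper and the "retries" key are made up for illustration, and note that freeze is shallow: objects nested inside the hash would still be mutable.

```ruby
# Hand a frozen hash to a worker thread through a Queue and show what
# the worker must do if it wants to change anything.
def modify_in_worker(shared)
  queue = Queue.new
  queue << shared                 # passed by reference, no copy made

  Thread.new do
    received = queue.pop
    blocked =
      begin
        received["retries"] = 5   # attempt an in-place change
        false
      rescue RuntimeError         # FrozenError (Ruby 2.5+) is a RuntimeError subclass
        true
      end
    copy = received.dup           # dup returns an unfrozen shallow copy
    copy["retries"] = 5
    [blocked, copy]
  end.value
end

shared = { "retries" => 3 }.freeze
blocked, copy = modify_in_worker(shared)
puts blocked           # whether the in-place write was refused
puts copy["retries"]   # the worker's private copy
puts shared["retries"] # the original, untouched
```

Sharing the frozen hash is free, but every writer pays for its own copy.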

In fact, the most Erlang-like way to do this is separate worker processes, as
you describe. Great, now messages have to be serialized as strings, sent over
a pipe, and then parsed -- even more of a performance hit.

In Erlang, since data structures are immutable, they can simply be passed by
reference -- the other thread can't change them, so why not let them be
shared? What's more, if I need to create a slightly modified data structure,
the most natural way to do that results in Rope-like structures -- so even
within a single process, it's probably more efficient -- but it also means
message-passing is almost free.

That's the big draw of Erlang for me -- I get to program with hundreds, even
thousands of loosely-coupled processes that are at least as safe as separate
Unix processes, yet the performance penalty is far less than even tens of Ruby
processes trying to do the same thing. Ruby threads would be less safe _and_
less concurrent, whereas Erlang will _automatically_ scale to multiple cores.



It might be a bit weird to hear me arguing for the performance win, given that
I'm not ashamed to suggest throwing more hardware at a problem, or repeat
"premature optimization..." when people criticize Ruby for being slow.

But this is a bit like Git. One of the main reasons I use Git is the
performance improvement -- up to a certain point, the extra performance buys
you nothing. Past that point, you realize that branches, merges, and commits
are essentially free, and it liberates you to work and collaborate in ways
that, while it's technically possible to do with other DVCSes (even with SVN),
you're much more likely to do it in Git, where 'git checkout -b' is
instantaneous and 'git merge' is seconds at most.

Erlang processes are like that. Up to a point, it's just a nice performance
improvement, and you're still manually twiddling the balance between threads,
processes, and event models, possibly using some monstrous combination of all
three. Each choice might give you more or less concurrency, more or less
performance, and more or less weird edge cases -- and in the case of
traditional threads, no matter how efficient they get, you're going to be wary
about adding more of them, and having to lock more things, and by the time you
lock everything as you should, much of the performance is gone.

By removing that barrier of performance, and by making processes easy to spawn
and manage, you can suddenly stop worrying about it. You can easily spawn one
process per connection -- to anything. You can spawn processes whose entire
job is to keep track of a single counter. You can spawn processes like you
don't care, like it's going out of style -- much the same way you'd spawn
objects in Ruby, only more so.

It's the kind of performance improvement that's not just squeezing a few more
percent out of the hardware you've got, or shaving a few milliseconds off a
task that was already fast enough. It's the kind of performance improvement
that fundamentally changes the way you work.

It's the difference between 'git merge' taking a few seconds and 'svn merge'
taking half an hour. (And no, I'm not making that up. It _routinely_ did so,
when I was using it at work. People switched to git-svn for that reason
alone.)



If I'm just going to have a bunch of Ruby workers anyway, I'd actually save
some RAM by getting myself a COW-friendly Ruby and using fork directly to
create workers, instead of running them from Erlang. In fact, if that's what I
was going to do, I'm not entirely sure why I'd want the Erlang backbone
anyway. But Ezra is a smart guy, so I figure there must be some reason he
wrote Nanite that way.



Anyway, I should probably go hang out on the Reia list, huh?
 
Tony Arcieri

Anyway, I should probably go hang out on the Reia list, huh?

Probably :)

For what it's worth Reia is still being worked on. The new branch is
focusing on getting a minimal feature set (what already exists in Erlang
itself) up to production quality without adding new, additional features
until that's done.

You could have a more Ruby-like language than Reia which still afforded many
of these same benefits. For example there's no reason process-local state
can't be mutable. However, when a message is sent to another process, it
should make a copy of the original state (hopefully using a mechanism like
COW to keep things sane). The receiver gets a "snapshot" of a given chunk
of state at the time it's sent.
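
In plain Ruby, the snapshot idea could be approximated as below. This is only an approximation under stated assumptions: send_snapshot is a hypothetical helper, the mailbox is an ordinary Queue, and the copy is made eagerly with Marshal rather than with the COW mechanism mentioned above.

```ruby
# Push a deep copy of the sender's state into the receiver's mailbox,
# so that later changes by the sender are invisible to the receiver.
def send_snapshot(mailbox, state)
  mailbox << Marshal.load(Marshal.dump(state))   # eager deep copy
end

mailbox = Queue.new
state   = { "users" => ["alice"] }

send_snapshot(mailbox, state)
state["users"] << "bob"            # the sender keeps mutating its own state

received = mailbox.pop
p received["users"]                # the receiver still sees the snapshot
p state["users"]
```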

As for what virtual machine such a hypothetical language could run on...
dunno.
 
Bill Kelly

Brian said:
The alternatives aren't pretty: rewrite your application in an
event-driven way (so you have to find a HTTP client library which works
this way too), or fork off separate processes (which then have to
communicate back to the central one with the results, which might mean
select'ing across the children, or using the filesystem as a temporary
data store)

An additional alternative that is pretty neat is combining an
event-driven library with Fibers in ruby 1.9.x.

Assuming of course, the existence of an HTTP library in that
idiom.

But I have a homegrown RPC library implemented using EventMachine,
which I recently adapted to use Fibers. So far, I am really
pleased with the result. It's like the best of both worlds.
My app is still single threaded, so I don't need any mutex /
synchronization / locking. But, I still get linear-style method
call semantics, in separate "parallel" execution Fibers. (In
essence the same sort of thing people have previously done with
continuation-based event libraries, only without the overhead of
continuations.)

So in any given fiber I can write pretty normal looking code,
like:

paths = @catalog.search("caption" => cap, "filename" => fname)
unless paths.empty?
  title = "Search: #{str}"
  doc_id = @window_svc.new_document_with_search_results(title, paths)
end

...and even though @catalog.search and @window_svc.new_document
may be making RPC calls to a remote host, only the current Fiber
will block waiting for the result.

I haven't used this technique long enough to have discovered if
there may be any pitfalls. But--so far--it's like the
convenience of threaded programming without the synchronization
issues.
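
The mechanism can be illustrated with a toy scheduler. This is not the EventMachine-based library described above: remote_call and run_tasks are hypothetical names, and the "reply" is just the upcased request standing in for data arriving from the network.

```ruby
# Looks like a blocking call, but only suspends the current fiber.
# The request is handed to the loop; whatever the loop later passes to
# resume becomes this method's return value.
def remote_call(request)
  Fiber.yield(request)
end

# Toy event loop: runs one fiber per task name and answers pending
# requests in round-robin order, all on a single thread.
def run_tasks(names)
  log = []
  waiting = names.map do |name|
    fiber = Fiber.new do
      first  = remote_call("#{name}:search")
      second = remote_call("#{name}:open(#{first})")
      log << "#{name} finished with #{second}"
      :done                       # sentinel returned by the final resume
    end
    [fiber, fiber.resume]         # run until the first remote_call
  end

  until waiting.empty?
    fiber, request = waiting.shift
    reply = request.upcase        # stand-in for a reply from the remote host
    outcome = fiber.resume(reply)
    waiting << [fiber, outcome] unless outcome == :done
  end
  log
end

puts run_tasks(%w[a b])
```

The code inside each fiber reads top to bottom like ordinary blocking code, yet the two tasks interleave at every remote_call and no locks are involved.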


Regards,

Bill
 
Tony Arcieri

An additional alternative that is pretty neat is combining an
event-driven library with Fibers in ruby 1.9.x.

Assuming of course, the existence of an HTTP library in that
idiom.

Revactor provides exactly this with its HTTP client. Revactor models actors
as fibers and wraps an underlying evented client (the aforementioned
Rev::HttpClient) with them:

http://github.com/tarcieri/revactor/blob/master/lib/revactor/http_client.rb

Here's a concurrent HTTP fetcher written around this HTTP client:

http://github.com/tarcieri/revactor/blob/master/lib/revactor/http_fetcher.rb
 
Brian Candler

David said:
You mentioned Erlang. It will do some N:M threading -- that is, there
really
will be some OS threads involved. In theory, one crash could bring down
your
entire app. Also in theory, the Erlang runtime is robust enough that
this will
Never Happen -- and to ensure that, the preferred way to write C
extensions is
as separate processes which talk to Erlang via RPC. More efficient than
fork
on Unix, but much more reliable than "threads" in just about any
language.

That is: I see threads as both as harmful and as useful as Goto.

Absolutely all true.

Of course, threads are going to be necessary at some level (just as goto
is necessary at a low level), because that's how SMP hardware actually
works.

The benefit of Erlang is that it uses threads on your behalf but
provides you a much better message-passing abstraction in its place.

(It's possible to have message-passing at the hardware level. The
Transputer is an example of that. Writing in Occam for the Transputer is
like writing in C for a regular CPU - one step above machine code)
The biggest problem I have with Erlang is that the syntax is hideous,
especially after Ruby.

When I get a few spare cycles I'm trying to hack together a
ruby-flavoured erlang: just a front end which emits either the erlang
abstract syntax form, or regular erlang source.
This is why I have such high hopes for Reia, and why I'm tempted to
dabble in
io -- I want something that's at least as beautiful as Ruby (though I do
like
prototypal inheritance), but at least as good at concurrency as Erlang.

After erlang, when I look at ocaml again it starts to make a lot more
sense (for example, all the pattern-matching stuff). And ocaml can
compile directly to machine code too.
 
Tony Arcieri

When I get a few spare cycles I'm trying to hack together a
ruby-flavoured erlang: just a front end which emits either the erlang
abstract syntax form, or regular erlang source.

For what it's worth, that's what Reia is already.
 
Brian Candler

Tony said:
For what it's worth, that's what Reia is already.

No, I don't think so. Reia will run on the Erlang VM but will be a
substantially different language to Erlang: it will have destructive
assignment, and be a lot more dynamic. Being able to compile AOT to
beam files is not necessarily going to be provided. Furthermore it will
have different function call semantics, which you'll have to take
account of if calling reia from erlang or vice versa.
http://groups.google.com/group/reia/browse_thread/thread/668e6b302bba98b6

What I'm toying with is just a different syntax for standard Erlang,
which you'd compile to .beam and would be indistinguishable from Erlang
at that level. Not sure I'm going to have the time, but I've just
started hacking erlang's existing yecc grammar which looks like the path
of least resistance.
 
Tony Arcieri

So, psst, I'm the author of Reia, so you can take my responses as
authoritative :)

No, I don't think so. Reia will run on the Erlang VM but will be a
substantially different language to Erlang: it will have destructive
assignment, and be a lot more dynamic.


Yes, it has destructive assignment and late(r) binding. However, these are
the only major departures from Erlang, at least for the initial version.

Being able to compile AOT to .beam files is not necessarily going to be
provided.


I've had so many requests for this feature I'm certainly going to support
it.

Furthermore it will have different function call semantics, which you'll
have to take
account of if calling reia from erlang or vice versa.
http://groups.google.com/group/reia/browse_thread/thread/668e6b302bba98b6

This is true, however those call semantics are what enable blocks. I guess
you don't plan on having blocks.

What I'm toying with is just a different syntax for standard Erlang,
which you'd compile to .beam and would be indistinguishable from Erlang
at that level. Not sure I'm going to have the time, but I've just
started hacking erlang's existing yecc grammar which looks like the path
of least resistance.

Well, good luck. I'm not sure how wild people are going to be about a
language with a Ruby-like grammar and single assignment.
 
David Masover

Reia will run on the Erlang VM but will be a
substantially different language to Erlang: it will have destructive
assignment, and be a lot more dynamic. Being able to compile AOT to
.beam files is not necessarily going to be provided. Furthermore it will
have different function call semantics,

For what it's worth, just about all of these sound like improvements to me.

The "destructive assignment" seems to come in two forms -- private variables
can be destructively assigned, but that's a purely syntactical treatment, as
those will (I assume) be compiled into singly-assigned Erlang variables.

Instance variables can also be destructively assigned, and that's the biggest
difference. But they also cannot be shared between processes (that I know of),
so they don't break the advantages of Erlang.
What I'm toying with is just a different syntax for standard Erlang,
which you'd compile to .beam and would be indistinguishable from Erlang
at that level.

I should clarify, then -- while I complain about Erlang's syntax, that's not
the only problem I have with it. I _like_ object-oriented semantics, and I
think they'd map well onto actors, which is exactly the approach Reia is
taking.

Think about the features that make Ruby shine. A few of those are purely
syntactical. A few of them are based on the core idea that objects don't have
to inherently be any particular type -- they're just entities you send
messages to, and you don't know what response you'll get until you actually
send the message.

This is both the core of duck typing and a fair description of Erlang
processes. Essentially, an Erlang function that expects a process doesn't care
what kind of process you give it, so long as that process can handle the
messages it wants to send.

So while I can see the usefulness of pattern-matching for handling incoming
requests, that's a bit like method_missing -- or maybe the opposite, putting
something in front of the method calls. Still, most messages make sense
mapping directly to method calls, and I think doing so would work well as a
convention-over-configuration approach -- for the same reason that it makes
sense to have URLs correspond to methods on a controller, by convention.

I guess what I'm saying is, I want to have my cake and eat it. Ruby is already
built in such a way that it would map very naturally onto the actor model.
Unfortunately, since that's never been tried, any attempt to do so is probably
doomed -- there are entirely too many assumptions that code is executed
linearly.

I'm actually trying anyway, for fun. I've been (on and off for about a year)
writing an actor library for Ruby that actually does exactly what I'm
describing. But it will be slow (it uses Threads and Queues) and only safe if
you know what you're doing.
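
As an illustration of the general idea (not the library mentioned above, which isn't shown in this thread), an actor built from a Thread and a Queue, with messages mapped straight onto method calls, might be sketched like this. CounterActor, tell and ask are hypothetical names.

```ruby
# Hypothetical actor: all state lives inside one thread, and the only
# way to touch it is to put a message in the mailbox.
class CounterActor
  def initialize
    @count   = 0
    @mailbox = Queue.new
    @thread  = Thread.new { run }
  end

  # Fire-and-forget message.
  def tell(method_name, *args)
    @mailbox << [method_name, args, nil]
    self
  end

  # Send a message and block until the actor replies.
  def ask(method_name, *args)
    reply = Queue.new
    @mailbox << [method_name, args, reply]
    reply.pop
  end

  def stop
    @mailbox << :stop
    @thread.join
  end

  private

  # Message loop: each message maps directly to a (private) method call.
  # A real library would restrict which method names are accepted.
  def run
    loop do
      message = @mailbox.pop
      break if message == :stop
      method_name, args, reply = message
      result = send(method_name, *args)
      reply << result if reply
    end
  end

  def increment(by = 1)
    @count += by
  end

  def current
    @count
  end
end

counter = CounterActor.new
10.times { counter.tell(:increment) }
puts counter.ask(:current)   # mailbox is FIFO, so all increments run first
counter.stop
```

Only the actor's own thread ever reads or writes @count, so no Mutex is needed, but anything passed in a message is still shared by reference, which is the "only safe if you know what you're doing" part.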
 
Brian Candler

Tony said:
So, psst, I'm the author of Reia, so you can take my responses as
authoritative :)

D'oh! I can't believe I missed that. My sincere apologies.
This is true, however those call semantics are what enable blocks. I
guess
you don't plan on having blocks.

I was going to do them in a rather trivial way, just by passing the
block as the first argument:

f(a,b) { |c| d(c) } --> f(fun(c) -> d(c) end, a, b)

That's on the observation that lists:map, lists:foldl and lists:filter
all seem to take a function as the first argument.
I'm not sure how wild people are going to be about a
language with a Ruby-like grammar and single assignment.

You're probably right. It's just a toy.

The first things I plan to do are:

- atoms in ruby syntax: :foo
- hence :foo() is a regular function call
- ditto a = :foo; a()
- barewords which are not seen in LHS of match expressions are also
  implicitly function calls:
    foo ---> foo()
    foo "bar" ---> foo("bar")
- io.write --> io:write()
- def...rescue...ensure...end for function definitions (wrapping in a
  'try' automatically if necessary)
- || -> orelse, && -> andalso

then see what it looks like.

I'd also like to use the "..." syntax for binaries rather than lists,
but that complicates matters when you have to write
  a = "hello"
  b = " world"
  c = <<a/binary, b/binary>>
 
