[ANN] threadify-0.0.1


ara howard

this one's for you charlie ;-)






NAME
threadify.rb


SYNOPSIS
enumerable = %w( a b c d )
enumerable.threadify(2){ 'process this block using two worker threads' }

DESCRIPTION
threadify.rb makes it stupid easy to process a bunch of data using 'n' worker threads.
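The idea in DESCRIPTION can be sketched with nothing but Ruby's core Thread, Queue and Mutex: push every item onto a queue and let `n` threads drain it. This is a minimal sketch under those assumptions, not threadify's source; the `ThreadifySketch.map` name is hypothetical, and `Queue#close` needs Ruby 2.3 or later.

```ruby
# Hypothetical module; not part of the threadify gem.
module ThreadifySketch
  # Runs the block over every item using `n` worker threads and returns the
  # results in input order (like Enumerable#map).
  def self.map(enumerable, n = 2, &block)
    jobs = Queue.new
    enumerable.each_with_index { |item, index| jobs << [item, index] }
    jobs.close # once drained, Queue#pop on a closed queue returns nil

    results = Array.new(jobs.size)
    lock = Mutex.new

    workers = Array.new(n) do
      Thread.new do
        while (job = jobs.pop)
          item, index = job
          value = block.call(item)
          lock.synchronize { results[index] = value } # guard the shared array
        end
      end
    end

    workers.each(&:join) # re-raises any exception raised inside a worker
    results
  end
end

p ThreadifySketch.map(1..5, 2) { |x| x * x } # => [1, 4, 9, 16, 25]
```

Each result is stored by its original index, so the output order does not depend on which thread finishes first.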

INSTALL
gem install threadify

URI
http://rubyforge.org/projects/codeforpeople

SAMPLES

<========< sample/a.rb >========>

~ > cat sample/a.rb

  require 'open-uri'
  require 'yaml'

  require 'rubygems'
  require 'threadify'


  uris =
    %w(
      http://google.com
      http://yahoo.com
      http://rubyforge.org
      http://ruby-lang.org
      http://kcrw.org
      http://drawohara.com
      http://codeforpeople.com
    )


  time 'without threadify' do
    uris.each do |uri|
      body = open(uri){|pipe| pipe.read}
    end
  end


  time 'with threadify' do
    uris.threadify do |uri|
      body = open(uri){|pipe| pipe.read}
    end
  end


  BEGIN {
    def time label
      a = Time.now.to_f
      yield
    ensure
      b = Time.now.to_f
      y label => (b - a)
    end
  }

~ > ruby sample/a.rb

---
without threadify: 7.41900205612183
---
with threadify: 3.69886112213135




a @ http://codeforpeople.com/
 

fedzor

Thank you! This gem pretty much makes my life simpler, and will
continue to make it simpler!

(stdlib please?)

~ ari
 

Charles Oliver Nutter

ara said:
this one's for you charlie ;-)

Appears to work just dandy under JRuby:

➔ time jruby --server -rthreadify -e "nums = *(1..35); def fib(n); if n < 2; return n; else; return fib(n - 1) + fib(n - 2); end; end; nums.each {|i| p fib(i)}"
...
real 0m11.889s
user 0m11.733s
sys 0m0.188s
~/NetBeansProjects/jruby ➔ time jruby --server -rthreadify -e "nums = *(1..35); def fib(n); if n < 2; return n; else; return fib(n - 1) + fib(n - 2); end; end; nums.threadify {|i| p fib(i)}"
...
real 0m8.213s
user 0m12.722s
sys 0m0.178s

(One thread on my system consumes roughly 65-70% CPU, which explains why
full CPU on both cores doesn't double performance here)
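Working through the `real` times posted above puts a number on "doesn't double performance". The inputs are copied from the two runs; the core count of 2 is taken from "both cores".

```ruby
# Wall-clock ("real") seconds from the two JRuby runs above.
serial_real   = 11.889 # nums.each
threaded_real = 8.213  # nums.threadify
cores         = 2      # "both cores"

speedup    = serial_real / threaded_real # how many times faster the threaded run was
efficiency = speedup / cores             # fraction of ideal linear scaling

puts format("speedup:    %.2fx", speedup)           # prints "speedup:    1.45x"
puts format("efficiency: %.0f%%", efficiency * 100) # prints "efficiency: 72%"
```

So two threads bought roughly a 1.45x speedup rather than 2x, while `user` time went up from 11.733 s to 12.722 s.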

I also found some weird bug where Thread#kill/exit from within the
thread interacts weirdly with join happening outside, and never
terminates. Fixing that now.

- Charlie
 

Martin DeMello

mirror delay. check codeforpeople svn, it's only one file.

thanks, got it. will also install the gem when it propagates, just to
keep my system informed :)

m.
 

ara.t.howard

Appears to work just dandy under JRuby:

➔ time jruby --server -rthreadify -e "nums = *(1..35); def fib(n); if n < 2; return n; else; return fib(n - 1) + fib(n - 2); end; end; nums.each {|i| p fib(i)}"

wow that's cool - now that's a seriously easy way to parallelize ;-)

I also found some weird bug where Thread#kill/exit from within the
thread interacts weirdly with join happening outside, and never
terminates. Fixing that now.


glad to have helped ;-)

i just pushed out 0.0.2 and it just lets the thread die rather than
self-destructing. see how that works...

cheers.

a @ http://codeforpeople.com/
 

Charles Oliver Nutter

ara.t.howard said:
i just pushed out 0.0.2 and it just lets the thread die rather than
self-destructing. see how that works...

I fixed it in JRuby just now (Thread#kill does an implicit join in JRuby to
make sure the thread dies...but if target == caller it was still trying
to join itself in a weird way) but basically breaking out of the loop
instead of Thread#exit solved it. Your 0.0.2 change is probably equivalent.
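The fix described here, ending a worker by leaving its loop rather than calling Thread#exit on itself, can be illustrated with a plain worker pool. This sketch assumes a Queue of jobs terminated by one `:done` sentinel per worker; the sentinel and the `run_workers` helper are hypothetical, not threadify's or JRuby's code.

```ruby
DONE = :done # hypothetical sentinel: one per worker means "no more work"

def run_workers(items, n)
  queue = Queue.new
  items.each { |item| queue << item }
  n.times { queue << DONE }

  processed = Queue.new # Queue is thread-safe, so workers can push freely
  workers = Array.new(n) do
    Thread.new do
      loop do
        item = queue.pop
        break if item == DONE # leave the loop; the thread then ends normally
        processed << item * 2 # stand-in for real work
      end
    end
  end

  workers.each(&:join) # returns once every worker has left its loop
  Array.new(processed.size) { processed.pop }
end

p run_workers([1, 2, 3, 4], 2).sort # => [2, 4, 6, 8]
```

Because no thread kills itself, the outer `join` only ever waits on threads that finish by returning from their block.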

- Charlie
 

Michael Guterl

Appears to work just dandy under JRuby:

I was doing some comparison between threadify and peach with JRuby,
when I noticed some interesting behavior with using
Enumerator#to_enum.

Sample code and three different results are posted here:
http://pastie.org/230287. Each result randomly occurs and sometimes
the code produces no error whatsoever. MRI does not seem to exhibit
the same behavior.

I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.

threadify-0.0.2
jruby 1.1.3-dev (ruby 1.8.6 patchlevel 114) (2008-07-08 rev 7130) [i386-java]
java version "1.5.0_13"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing)

OS X 10.5.4

Thanks,
Michael Guterl
 

ara.t.howard

Sample code and three different results are posted here:
http://pastie.org/230287. Each result randomly occurs and sometimes
the code produces no error whatsoever. MRI does not seem to exhibit
the same behavior.

bunch of 'java.lang' stuff in there - i'm out! ;-)

a @ http://codeforpeople.com/
 

Charles Oliver Nutter

Michael said:
I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.

Thanks for filing the bug. I'm looking into it now.

In general we have inserted synchronization code only where it really
appears to be necessary to maintain the integrity of data structures.
That means that in some cases, you need to be mindful of code actually
running in parallel against e.g. arrays, hashes, strings, and so on. But
we do want to reduce the possibility of a Java exception, so I'll
investigate a bit.
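As an illustration of being mindful of parallel access: the sketch below has several threads updating one shared Hash. `counts[word] += 1` is a read followed by a write, so it is not atomic; wrapping it in a Mutex keeps the count correct on any Ruby implementation. The word list and the four-way split are arbitrary choices for the example.

```ruby
words  = %w(a b a c b a) * 1000 # 6000 words
counts = Hash.new(0)
lock   = Mutex.new

# Four threads, each counting its own slice into the shared Hash.
threads = words.each_slice(words.size / 4).map do |slice|
  Thread.new do
    slice.each do |word|
      # `counts[word] += 1` reads then writes; the lock keeps that pair atomic.
      lock.synchronize { counts[word] += 1 }
    end
  end
end
threads.each(&:join)

p counts.sort # => [["a", 3000], ["b", 2000], ["c", 1000]]
```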

- Charlie
 

Charles Oliver Nutter

Charles said:
In general we have inserted synchronization code only where it really
appears to be necessary to maintain the integrity of data structures.
That means that in some cases, you need to be mindful of code actually
running in parallel against e.g. arrays, hashes, strings, and so on. But
we do want to reduce the possibility of a Java exception, so I'll
investigate a bit.

I did find a few threading bugs in JRuby, and I'm working on them now.
Most of them seem specific to Enumerator...

- Charlie
 

Daniel Berger

this one's for you charlie ;-)


Pretty cool. I tried it with file-find. Here was the code:

require 'file/find'
require 'threadify'

rule = File::Find.new(
  :pattern => "*.rb",
  :path => "C:\\ruby"
)

start = Time.now

rule.find.threadify(10){ |f|
  p f
}

p start
p Time.now

Without threadify, it took 1:40 on my laptop. With threadify(10) it
dropped to 44 seconds.

I think I'll add a "threads" option directly, and borrow some of your
code. :)
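A "threads" option for a finder could look roughly like the sketch below. It uses Dir.glob from core Ruby rather than file-find, and `find_files` with its `threads:` keyword is a hypothetical name, not file-find's API. The block runs on worker threads, so anything it writes to shared state needs the same care as any other threaded code.

```ruby
# Hypothetical helper: glob the matching paths first, then let `threads`
# worker threads hand each path to the block.
def find_files(pattern, root, threads: 4, &block)
  queue = Queue.new
  Dir.glob(File.join(root, "**", pattern)).each { |path| queue << path }
  queue.close # workers stop when pop returns nil on the drained queue

  workers = Array.new(threads) do
    Thread.new do
      while (path = queue.pop)
        block.call(path) # runs concurrently; the block must be thread-safe
      end
    end
  end
  workers.each(&:join)
  nil
end
```

The loop above would then read `find_files("*.rb", "C:/ruby", threads: 10) { |f| p f }`; Dir.glob expects forward slashes even on Windows.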

Thanks,

Dan
 

ara.t.howard

Pretty cool. I tried it with file-find.

Without threadify, it took 1:40 on my laptop. With threadify(10) it
dropped to 44 seconds.

sweet. i wouldn't launch rockets with it - but it's a cheap speedup for
a bunch of ruby code. btw - check out my find method

http://codeforpeople.com/lib/ruby/alib/alib-0.5.1/lib/alib-0.5.1/find2.rb

very stolen and hacked


a @ http://codeforpeople.com/
 

Charles Oliver Nutter

Michael said:
I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.

Ok, there's good news and bad news. First the good news.

I've found several egregious threading bugs in JRuby's Enumerable
implementation that probably caused the bulk of errors you saw.
Basically, the runtime information for the main Ruby thread in JRuby was
getting reused by the blocks passed into threadify, causing all sorts of
wacky errors (multiple threads all sharing runtime thread data...fun!).
Fixing that seems to have resolved most of the errors.

Now the bad news...

What you're doing is a bit suspect. In this case, it works out
reasonably well, since you're just doing a map and gathering results.
There are some remaining bugs in JRuby wrt the temporary data structure
used to gather map results (it needs to be made thread-safe) but it can
work. However in general I don't think this use of threadify is going to
apply well to Enumera(ble|tor) since so many of the operations depend on
the result of the previous iteration.
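The point about operations that depend on the result of the previous iteration is easiest to see with `inject`. A reduction can still be threaded when the operation is associative: each thread reduces its own slice into a private value, and only the few partial results are combined afterwards, so no shared structure is written concurrently. This is a generic sketch with a hypothetical `parallel_sum`, not something threadify provides.

```ruby
# Hypothetical helper: split the array into about `n` slices, sum each slice
# on its own thread, then combine the partial sums serially.
def parallel_sum(numbers, n)
  slice_size = [(numbers.size / n.to_f).ceil, 1].max # each_slice needs >= 1
  threads = numbers.each_slice(slice_size).map do |slice|
    Thread.new { slice.sum } # each thread touches only its own slice
  end
  threads.map(&:value).sum # Thread#value joins and returns the block's result
end

p parallel_sum((1..100).to_a, 4) # => 5050
```

Addition works here because regrouping does not change the result; a fold where each step needs the previous accumulator in strict order cannot be split this way.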

I'll have the remaining issues wrapped up shortly, but I'd love to see
someone come up with a safe set of Enumerable-like operations that can
run in parallel. For example, a detect that uses a cross-thread trigger
to stop all iterations (rather than the naive threadification of detect
which would not propagate a successful detection out of the thread).
Things like that could be very useful.
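A rough sketch of such a detect, assuming a shared `done` flag as the cross-thread trigger: the first worker to find a match records it and raises the flag, and every worker checks the flag before taking more work. `parallel_detect` is a hypothetical name. Because threads race, the match returned is not necessarily the one a serial `detect` would return.

```ruby
# Hypothetical helper; assumes `items` never contains nil, since nil from
# Queue#pop is used to mean "queue closed and drained".
def parallel_detect(items, n, &block)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close

  lock  = Mutex.new
  found = nil
  done  = false # the cross-thread trigger

  workers = Array.new(n) do
    Thread.new do
      loop do
        break if lock.synchronize { done } # another worker already matched
        item = queue.pop
        break if item.nil?                 # no work left
        next unless block.call(item)
        lock.synchronize do
          unless done # only the first match recorded wins
            found = item
            done  = true
          end
        end
      end
    end
  end
  workers.each(&:join)
  found
end

p parallel_detect(1..1000, 4) { |x| x == 777 } # => 777
```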

I'd also love to see someone come up with a nice installable gem of
truly thread-safe wrappers around the core collections, since in general
I don't believe the core array and friends should suffer the perf
penalty that comes from always synchronizing.
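One shape such a wrapper could take is an opt-in proxy that forwards every call to a plain Array while holding a Mutex, so ordinary arrays stay unsynchronized. `SyncArray` is a hypothetical class, not an existing gem; a real one would also need to handle methods that return the inner array (such as `<<`), which leak an unsynchronized reference in this sketch.

```ruby
# Hypothetical opt-in wrapper; plain Arrays elsewhere stay unsynchronized.
class SyncArray
  def initialize(array = [])
    @array = array
    @lock  = Mutex.new
  end

  # Forward any Array method to the wrapped array while holding the lock.
  # Methods Object already defines (inspect, ==, to_s...) are not forwarded,
  # and blocks run inside the lock, so they must not call back into this
  # wrapper (Ruby's Mutex is not reentrant).
  def method_missing(name, *args, &block)
    if @array.respond_to?(name)
      @lock.synchronize { @array.public_send(name, *args, &block) }
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @array.respond_to?(name, include_private) || super
  end
end

shared  = SyncArray.new
threads = 4.times.map { |t| Thread.new { 1000.times { |i| shared << [t, i] } } }
threads.each(&:join)
p shared.size # => 4000
```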

- Charlie
 

ara.t.howard

I'll have the remaining issues wrapped up shortly, but I'd love to
see someone come up with a safe set of Enumerable-like operations
that can run in parallel. For example, a detect that uses a cross-
thread trigger to stop all iterations (rather than the naive
threadification of detect which would not propagate a successful
detection out of the thread). Things like that could be very useful.


check out 0.0.3, it allows this, but the sync overhead is prohibitive
for in-memory stuff - for network scraping it'd be great though.
anyhow, 0.0.3 allows one to 'break' from parallel processing, and the
value broken with will be the same as if the jobs were run serially.
damn tricky that.
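One way to get that property, a parallel search whose answer matches a serial `detect`, is to hand out indexes in order and keep the lowest matching index seen so far; work past that index can never win, so it is skipped. This is a hypothetical sketch, not threadify 0.0.3's implementation. Every hand-out goes through a Mutex, which is the kind of sync overhead that can outweigh a cheap in-memory block but matters little next to a network fetch.

```ruby
# Hypothetical helper: returns the same element a serial `detect` would.
def ordered_parallel_detect(items, n, &block)
  list       = items.to_a
  next_index = 0   # next index to hand out
  best       = nil # lowest matching index found so far
  lock       = Mutex.new

  workers = Array.new(n) do
    Thread.new do
      loop do
        # Hand out the next index only while it could still beat `best`.
        i = lock.synchronize do
          if next_index < list.size && (best.nil? || next_index < best)
            next_index += 1
            next_index - 1
          end
        end
        break if i.nil?
        next unless block.call(list[i])
        lock.synchronize { best = i if best.nil? || i < best }
      end
    end
  end
  workers.each(&:join)
  best && list[best]
end

p ordered_parallel_detect(1..1000, 4) { |x| (x % 7).zero? } # => 7
```

Indexes are handed out in increasing order and every handed-out index is checked, so when the workers finish, every index below `best` has been tested and failed; that is why the answer equals the serial one.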

cheers.

a @ http://codeforpeople.com/
 

Michael Guterl

Ok, there's good news and bad news. First the good news.

I've found several egregious threading bugs in JRuby's Enumerable
implementation that probably caused the bulk of errors you saw. Basically,
the runtime information for the main Ruby thread in JRuby was getting reused
by the blocks passed into threadify, causing all sorts of wacky errors
(multiple threads all sharing runtime thread data...fun!). Fixing that seems
to have resolved most of the errors.


Thanks Charlie, I just verified that my script no longer crashes with
my latest pull of JRuby.

jruby 1.1.3-dev (ruby 1.8.6 patchlevel 114) (2008-07-12 rev 7146) [i386-java]

Regards,
Michael Guterl
 
