How to wait for multiple threads to finish

G

Gary J

I have a java program that spawns a lot of threads to process a very large
tree data model. How do I force a "parent" thread to wait until all of its
"child" threads are finished.

Here's a totally contrived example. TreeCounter simply counts the elements
in a tree of Elements:

public class TreeCounter
{
private int count;

void countChildren(Element element)
{
count++;

List children = element.getChildren();
for (Iterator iter = children.iterator(); iter.hasNext();)
{
Element child = iter.next();

new Thread()
{

public void run()
{
countChildren(child)
}

}.start();

}

// wait for all threads created by addChild() to finish before returning

// print out results
System.out.println(count + "elements processed");
}

}

This code was written just for this post so there may be some errors but you
get the idea.

In the above code, each incarnation of countChildren() can create multiple
threads.

How can I ensure that I don't get to the println() statement before all the
threads have finished? I know I can make threads run one at a time by
making a method "synchronized" but here I really want to make sure they all
run at the same time;

Thanks
 
F

Frank

Gary said:
I have a java program that spawns a lot of threads to process a very large
tree data model. How do I force a "parent" thread to wait until all of its
"child" threads are finished.

How can I ensure that I don't get to the println() statement before all the
threads have finished? I know I can make threads run one at a time by
making a method "synchronized" but here I really want to make sure they all
run at the same time;

Thanks

Didn't read your example, but here we go;

One way to solve this would be to keep a count of the running threads.
Increase the counter for each thread created and have the threads
decrease it again just before leaving the run() method. Once the counter
hits zero again, the main thread can continue.

A few implementation hints:
1. You need to synchronize access to the counter.
2. Make the counter volatile, it's going to be accessed by multiple threads.
3. To prevent unneccessary polling from the main thread, have it wait()
on the lock object and make the worker threads notify() before leaving
the synchronized method.

-Frank
 
P

Paul Lutus

Gary said:
I have a java program that spawns a lot of threads to process a very large
tree data model. How do I force a "parent" thread to wait until all of
its "child" threads are finished.


1. Create and start the threads. Each thread increments a synchronized
global thread counter.

2. Have each thread signal that is is finished by decrementing the global
counter.

3. Put the main process on a timer that periodically tests the counter, and
idles the rest of the time. When the count gets to zero, take the next
action.
 
M

Michael Borgwardt

Gary said:
I have a java program that spawns a lot of threads to process a very large
tree data model.

Is it running on a computer with a lot of processors? otherwise, using many
threads is just a waste of CPU time.
 
C

Chris Uppal

Gary said:
I have a java program that spawns a lot of threads to process a very large
tree data model. How do I force a "parent" thread to wait until all of
its "child" threads are finished.

Thread.join() is designed for exactly this. Simply put each thread you create
into some sort of Collection (an ArrayList for example) and then use a loop to
call join() on each in turn. If the "joinee" has finished then join() will
return immediately, if not then it will wait until it has finished. By the end
of the loop you can be sure that all of them have finished.

-- chris
 
G

Gary J

I thought of using a counter but it seemed too simple a solution. I was a
little unsure what to make volatile. I made counter volatile but its inside
a synchronized method so does it matter?

For those interested Here's what I implemented:

class ThreadLock
{
volatile int counter;

synchronized void add()
{
System.out.println(counter + " threads.");
counter++;
}

synchronized void del()
{
counter--;
System.out.println(counter + " threads.");
notify();
}

synchronized int getCount()
{
return counter;
}
}

in the Main Method I declare:

private ThreadLock threads = new ThreadLock();

and the threads are created like this:

threads.add();
new Thread()
{
public void run()
{
doStuff()
threads.del();
}
}.start();


The main method checks for all the threads to be completed like this:

while (threads.getCount() > 0)
try
{
threads.wait();
} catch(Exception e)
{}
System.out.println("Worker threads complete.");


Is it possible to put this last code inside the ThreadLock class? In other
words:
class ThreadLock
{
....

void resume()
{
while (getCount() > 0)
try
{
wait();
} catch(Exception e)
{}
}
}

then the main nethod would just call:
counter.resume();

thanks




Frank said:
I have a java program that spawns a lot of threads to process a very large
tree data model. How do I force a "parent" thread to wait until all of its
"child" threads are finished.

How can I ensure that I don't get to the println() statement before all the
threads have finished? I know I can make threads run one at a time by
making a method "synchronized" but here I really want to make sure they all
run at the same time;

Thanks

Didn't read your example, but here we go;

One way to solve this would be to keep a count of the running threads.
Increase the counter for each thread created and have the threads
decrease it again just before leaving the run() method. Once the counter
hits zero again, the main thread can continue.

A few implementation hints:
1. You need to synchronize access to the counter.
2. Make the counter volatile, it's going to be accessed by multiple threads.
3. To prevent unneccessary polling from the main thread, have it wait()
on the lock object and make the worker threads notify() before leaving
the synchronized method.

-Frank
 
G

Gary J

In this case it is indeed running on a multi-processor computer. Running
without multiple threads takes about 120% longer than running with them.
The more threads, the faster it runs. There's probably a point of
diminishing returns of course.


Gary said:
I have a java program that spawns a lot of threads to process a very large
tree data model.

Is it running on a computer with a lot of processors? otherwise, using many
threads is just a waste of CPU time.
 
G

Gary J

You're absolutely correct.

When I initially saw the join() method I thought my problems were over but I
discovered a possible problem with that approach.

The problem is that in traversing a tree threads are created and finished
the time which means the contents of the Collection would constantly change.
The main method would have to constantly recheck and join to all the threads
in the list on every pass until there were no more threads to join to. At
that point I'm really doing the same thing as checking a counter but with
more work :)

Of course I could be mistaken and haven't tried this one out yet. There's
probably a flaw in my logic.


Gary said:
I have a java program that spawns a lot of threads to process a very large
tree data model. How do I force a "parent" thread to wait until all of
its "child" threads are finished.

Thread.join() is designed for exactly this. Simply put each thread you
create
into some sort of Collection (an ArrayList for example) and then use a loop
to
call join() on each in turn. If the "joinee" has finished then join() will
return immediately, if not then it will wait until it has finished. By the
end
of the loop you can be sure that all of them have finished.

-- chris
 
A

anon

Gary said:
In this case it is indeed running on a multi-processor computer. Running
without multiple threads takes about 120% longer than running with them.
The more threads, the faster it runs. There's probably a point of
diminishing returns of course.
Absolutely. The rule of thumb I've always heard is no more
active/running threads _per_process_ than 2x the number of available CPUs.

YMMV, but I've been more-or-less obeying that limit for years, and it
does seem to hold true for *most* situations.
 
B

Babu Kalakrishnan

Michael said:
Is it running on a computer with a lot of processors? otherwise, using many
threads is just a waste of CPU time.

Not necessarily. Consider a case when each thread is waiting for some inputs
that would arrive at some indeterminate time. Processing in parallel instead of
serially would allow the inputs that arrived to be processed while other threads
(probably started first) are still waiting.

BK
 
M

Michael Borgwardt

Not necessarily. Consider a case when each thread is waiting for some inputs
that would arrive at some indeterminate time. Processing in parallel instead of
serially would allow the inputs that arrived to be processed while other threads
(probably started first) are still waiting.

True, but it seems pretty clear that this is not such a case.
 
M

Michael Borgwardt

Gary said:
In this case it is indeed running on a multi-processor computer. Running
without multiple threads takes about 120% longer than running with them.
The more threads, the faster it runs. There's probably a point of
diminishing returns of course.

If the problem is purely CPU-bound, then this point should be reached exactly
when the number of threads equals the number of processors, if not earlier
(if the threads have to synchronize a lot, see Amdahl's law).
 
C

Chris Uppal

Michael said:
Is it running on a computer with a lot of processors? otherwise, using
many threads is just a waste of CPU time.

Just for interest: this is not necessarily true even for compute-bound
algorithms.

Consider the case where there are two techniques for solving a given problem,
one is slow but is guaranteed to find the correct answer, the other is 10 times
faster but only gets the right answer about half the time (and the other cases
are easily detectable /after/ they've failed but are hard to recognise in
advance). In that case, attempting to solve the problem using two threads (one
for each technique) will reduce the average time to produce an answer to about
65%, plus the overhead of the extra thread and thread switching. Obviously
this can be generalised; for instance the analysis is essentially the same if
you have different techniques that all always work, but are fast in different
cases.

That's just for academic interest -- I don't think I've ever encountered such a
situation in a real application.

-- chris
 
C

Chris Uppal

Gary said:
The problem is that in traversing a tree threads are created and finished
the time which means the contents of the Collection would constantly
change.

Hmm, then that is actually a different problem -- you are not trying to wait
for threads to finish, you are trying to detect when <something else> has
/stopped/ creating new threads.

There are loads of ways of doing that. If I had that kind of system, then my
first idea would be to handle it by introducing the simple convention that any
thread that started another thread had to join() with it before dying itself.

If that doesn't work for you, then the kinds of counting solutions that have
been discussed earlier are an option. I've rather lost track of where the rest
of the thread has got to, are you happy that you now know how to do that ?

BTW, do take the warnings about not expecting magic from threads seriously --
it sounds to me as if you may be overdoing the threading thing (unless this is
some sort of academic exercise).

-- chris
 
J

John C. Bollinger

Gary said:
You're absolutely correct.

When I initially saw the join() method I thought my problems were over but I
discovered a possible problem with that approach.

The problem is that in traversing a tree threads are created and finished
the time which means the contents of the Collection would constantly change.
The main method would have to constantly recheck and join to all the threads
in the list on every pass until there were no more threads to join to. At
that point I'm really doing the same thing as checking a counter but with
more work :)

Of course I could be mistaken and haven't tried this one out yet. There's
probably a flaw in my logic.

Apply the join() technique in each thread that starts threads. Each
thread will have its own collection of the Threads it started, which
will not change once that thread stops creating new Threads and starts
join()ing the ones it created.


John Bollinger
(e-mail address removed)
 
T

thoff

Concurrency is way of structuring programs. It's not really related
to the number of CPUs. Some languages like java make this costly.
Other languages like erlang do not.
 
X

xarax

thoff said:
Concurrency is way of structuring programs. It's not really related
to the number of CPUs. Some languages like java make this costly.
Other languages like erlang do not.

Even on uni-processors, if the threads are using
I/O, then concurrency can help tremendously. Those
threads are "working" will get service while the
other threads that are "waiting" on I/O won't impact
the application through-put.

Note that I/O also includes virtual paging on
most systems. However, there is little that a
Java application can do to isolate its virtual
storage usage (thread local storage comes to
mind, but most folks rarely use that).
 
C

Chris Uppal

thoff said:
Concurrency is way of structuring programs. It's not really related
to the number of CPUs. Some languages like java make this costly.
Other languages like erlang do not.

It's not really a language issue, as such. It'd be perfectly possible to
create a Java implementation (or JVM) where threads were as lightweight as in
Erlang.

The implementors have a choice in this matter. At one extreme they can
implement Java threads as real OS-level threads, which on most OSes is going to
mean that threads are heavyweight things. That has the considerable advantage
that Java programs are then using the same implementation of concurrency as any
other code (e.g. system calls only block the one thread, the OS can maps
threads to CPUs, threaded external libraries work as expected, ...). At the
other extreme the implementors can go for very lightweight threads; that would
lose the advantages of "real" OS threads, but enable (for instance) an
Erlang-style of programming with many small threads (and be a /good deal/
easier to implement, too).

To some extent the decision will reflect what programmers are actually /doing/
with threads, and that in turn will depend on programmers' expectations of how
threads are implemented. The Java programming community seems to have settled
on an expectation that threads are heavyweight, and so it makes sense for JVM
designers to optimise for the resulting patterns of usage.

(In any case, I'm not convinced that the Erlang way of programming with
micro-threads would work so well without Erlang's "share nothing" semantics, or
something similar.)

(Of course, all this assumes that the OS's threads are not themselves
lightweight, and it also assumes that a hybrid model with both lightweight and
heavyweight flavours of threads would be infeasible.)

-- chris
 
T

thoff

Chris said:
(In any case, I'm not convinced that the Erlang way of programming with
micro-threads would work so well without Erlang's "share nothing" semantics, or
something similar.)

That's why it's a program language issue. With a share nothing
model and a global heap for efficient message sends, mere mortals can
make concurrent programs work. Not something you can say about java.
I could make an actor model and i could say everyone don't touch stuff
in other actors, but it would never work as people will touch
stuff, even assuming that threads could be made efficient enough to
handle high levels of concurrency.
 
C

Chris Uppal

thoff said:
That's why it's a program language issue. With a share nothing
model and a global heap for efficient message sends, mere mortals can
make concurrent programs work. Not something you can say about java.

Fair point, I think.

-- chris
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top