Process or Thread?

Pito Salas

I have a parent application (which I think of as a test harness) that
wants to invoke a fairly intensive image processing application against
a directory full of image files. Each image is processed independently.

So, to get performance, I wanted to get the work happening on each of
those images in parallel. So I could divide the files in the directory
into two sets, and submit one set for processing in one process/thread
and the other set in another process/thread. Note that the
sub-process/threads are almost totally separate from the parent app, so
relatively little information needs to go back and forth.
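A minimal sketch of that split, assuming a Unix-like system where fork is available. The method name process_in_parallel and the block that stands in for the real image processing are hypothetical, not part of the actual application.

```ruby
# Split the file list into `workers` roughly equal sets and handle each
# set in its own child process. The caller's block is the per-file work.
def process_in_parallel(files, workers = 2)
  return [] if files.empty?

  slice_size = (files.size / workers.to_f).ceil
  pids = files.each_slice(slice_size).map do |set|
    fork do
      set.each { |path| yield(path) }
      exit!(0) # leave the child without running the parent's at_exit hooks
    end
  end

  # Wait for every child and collect its Process::Status.
  pids.map { |pid| Process.wait2(pid).last }
end
```

It could be called as process_in_parallel(Dir.glob("images/*")) { |path| ... }, with the block doing the per-image work. Nothing flows back to the parent here except each child's exit status.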

Here is what I've learned so far from reading two books and lots of
googling:

One point is that there's no process support on Windows, which isn't a
deal killer for me.

Another point is the operation on multi-core CPUs: processes will, and
threads will not use the multiple cores. This too is fairly "don't care"
for me.

I am interested in ease of implementation and debugging. And I am also
very interested in getting the CPU and disk active at the same time as
there is a fairly large amount of data to be read from the disk.

What are your recommendations?
 
7stud --

Pito said:
I have a parent application (which I think of as a test harness) that
wants to invoke a fairly intensive image processing application against
a directory full of image files. Each image is processed independently.

It doesn't sound like your situation will result in improved performance
with threads. Things don't actually get done at the same time with
threads--that's an illusion. What happens is that there is very fast
switching between different tasks. However, if your tasks do not have
dead time during the processing, then using threads won't improve
performance. For instance, suppose you have two tasks that each take 3
minutes to complete. The processing might happen in this order with
threads:

task1: 1 minute
task2: 1 minute
task1: 1 minute
task2: 1 minute
task1: 1 minute
task2: 1 minute
--------------
total = 6 minutes

But if you just ran each task sequentially without using threads, the
total time would also be 6 minutes. Using threads will only speed up
processing time if your tasks have idle time when they are doing
nothing. During that down time, if you switch to another task in
another thread, then total processing time will be lower.
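The idle-time point can be tried out with a small sketch. Here sleep stands in for time spent blocked (for example, waiting on a disk read), and the method name compare_timings is made up for the illustration.

```ruby
require "benchmark"

# Simulated task that spends `wait` seconds idle, as if blocked on IO.
def idle_task(wait)
  sleep(wait)
end

# Time two idle tasks run back to back, then the same two run in threads.
# Returns [sequential_seconds, threaded_seconds].
def compare_timings(wait = 0.2)
  sequential = Benchmark.realtime { 2.times { idle_task(wait) } }
  threaded = Benchmark.realtime do
    Array.new(2) { Thread.new { idle_task(wait) } }.each(&:join)
  end
  [sequential, threaded]
end

sequential, threaded = compare_timings
puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)
```

Because both tasks are idle at the same time, the threaded run should finish in roughly the time of one task. Replacing sleep with pure computation would remove that gain on an interpreter that runs only one thread at a time.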
 
Mario Camou

7stud wrote:
It doesn't sound like your situation will result in improved performance
with threads. Things don't actually get done at the same time with
threads--that's an illusion. What happens is that there is very fast
switching between different tasks. However, if your tasks do not have
dead time during the processing, then using threads won't improve
performance. For instance, suppose you have two tasks that each take 3
minutes to complete. The processing might happen in this order with
threads:

task1: 1 minute
task2: 1 minute
task1: 1 minute
task2: 1 minute
task1: 1 minute
task2: 1 minute

That's true if you're running MRI, since it uses "green" threads (i.e., you
really have a single OS-level thread that gets task-switched by Ruby
itself). However, if you run on JRuby, the Ruby Thread support gets mapped
onto the Java Thread support, which *does* map to OS-level threads and
therefore will take advantage of multiple cores if you have them. In that
case you *would* get faster processing.

Hope this helps,
-Mario.
 
Gary Wright

That's true if you're running MRI, since it uses "green" threads (i.e., you
really have a single OS-level thread that gets task-switched by Ruby
itself). However, if you run on JRuby, the Ruby Thread support gets mapped
onto the Java Thread support, which *does* map to OS-level threads and
therefore will take advantage of multiple cores if you have them. In that
case you *would* get faster processing.

For a CPU-intensive task (image processing), I doubt that two OS
threads running on two cores are going to be any more efficient
than two processes running on two cores. Multi-threading introduces
complications that are neatly avoided by using multiple processes.
I'd much rather deal with a multi-process architecture than a
multi-threaded architecture.

Gary Wright
 
Pito Salas

Gary said:
For a CPU-intensive task (image processing), I doubt that two OS
threads running on two cores are going to be any more efficient
than two processes running on two cores. Multi-threading introduces
complications that are neatly avoided by using multiple processes.
I'd much rather deal with a multi-process architecture than a
multi-threaded architecture.

Gary Wright

Thanks all for your responses.

A note: the files being processed are quite large and numerous. So
there's also plenty of file IO that has to happen. In the vanilla 'green
thread' case, would you expect performance improvements, because while
one thread was blocked for IO the other one could run?

Thanks again,

Pito
 
Kent Friis

On Sat, 22 Aug 2009 09:30:36 -0500, Pito Salas wrote:
I have a parent application (which I think of as a test harness) that
wants to invoke a fairly intensive image processing application against
a directory full of image files. Each image is processed independently.

So, to get performance, I wanted to get the work happening on each of
those images in parallel. So I could divide the files in the directory
into two sets, and submit one set for processing in one process/thread
and the other set in another process/thread. Note that the
sub-process/threads are almost totally separate from the parent app, so
relatively little information needs to go back and forth.

Here is what I've learned so far from reading two books and lots of
googling:

One point is that there's no process support on Windows, which isn't a
deal killer for me.

Not quite. Look in Task Manager: there is a list of running processes.
What Windows lacks is fork(), the Unix way of creating processes. It
does however have CreateProcess (I think that's what it's called),
which behaves roughly like fork+exec.

If you split the "controller" process and the "worker" process into
two different programs, it won't be a problem. If you insist on
having them as one program, you'll need to do a bit more work
(add a command-line argument telling the new process that it's a
worker process).
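A rough sketch of the single-program variant, assuming the script re-launches itself through Process.spawn (which does not depend on fork). The flag name "--worker" and the method names are invented for the illustration, and run_worker is only a placeholder for the real image processing.

```ruby
require "rbconfig"

# Build the command that re-runs this same script in worker mode.
# "--worker" is a hypothetical flag name chosen for this sketch.
def worker_command(image_path, script = $PROGRAM_NAME)
  [RbConfig.ruby, script, "--worker", image_path]
end

# Controller side: start one worker per file, then wait for all of them.
def run_controller(image_paths)
  pids = image_paths.map { |path| Process.spawn(*worker_command(path)) }
  pids.each { |pid| Process.wait(pid) }
end

# Worker side: placeholder for the real image processing.
def run_worker(image_path)
  puts "processing #{image_path} in pid #{Process.pid}"
end

# The same file acts as controller or worker depending on its arguments.
def main(argv)
  if argv.first == "--worker"
    run_worker(argv[1])
  else
    run_controller(argv)
  end
end

main(ARGV) if __FILE__ == $PROGRAM_NAME
```

Run with a list of image paths, the script acts as the controller. Each spawned copy sees "--worker" as its first argument and takes the worker branch instead.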

Another point is the operation on multi-core CPUs: processes will, and
threads will not use the multiple cores. This too is fairly "don't care"
for me.

Native threads will; Ruby green threads won't.

I am interested in ease of implementation and debugging.

Debugging is lots easier with processes, as one process cannot
accidentally overwrite data of another (shared memory is possible,
but needs to be allocated explicitly).

That may not be as big a problem with Ruby green threads, as the
runtime knows what each thread is up to.

And I am also
very interested in getting the CPU and disk active at the same time as
there is a fairly large amount of data to be read from the disk.

What are your recommendations?

I would go for processes. But that's coming from C, where there is no
runtime keeping track of what each thread is doing. With processes,
the OS will prevent one process from overwriting the data of another.

/Kent
 
Gary Wright

A note: the files being processed are quite large and numerous. So
there's also plenty of file IO that has to happen. In the vanilla 'green
thread' case, would you expect performance improvements, because while
one thread was blocked for IO the other one could run?

Whether you use threads or processes your CPU-bound tasks will run while
your IO-bound tasks are waiting for the disk.
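One way to picture this with threads is a reader thread that pulls file contents off the disk while the main thread does the computation. This is only a sketch: process_files is a hypothetical name, and the block passed in stands for the real processing step.

```ruby
# Reader thread loads file contents; the calling thread does the CPU work.
# While the reader is blocked on disk IO, the processing can keep running.
def process_files(paths)
  queue = Queue.new

  reader = Thread.new do
    begin
      paths.each { |path| queue << [path, File.binread(path)] }
    ensure
      queue << :done # sentinel: nothing more to read
    end
  end

  results = {}
  while (item = queue.pop) != :done
    path, data = item
    results[path] = yield(data) # caller supplies the processing step
  end

  reader.join # re-raises here if a read failed
  results
end
```

The queue is unbounded here, so the reader may run far ahead of the processing. With large images a SizedQueue would cap how many files are held in memory at once.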

Gary Wright
 
Robert Klemme

Debugging is lots easier with processes, as one process cannot
accidentally overwrite data of another (shared memory is possible,
but needs to be allocated explicitly).

IMHO a multitude of processes does not necessarily ease debugging. If
you need to find out which process is running berserk or exhibiting a
bug that may be more difficult than debugging of a single interpreter
process. Also, if there are communication issues between two processes
that may be difficult to debug as well.

Having said that, both approaches are pretty easy to implement, given
that DRb is a full-fledged remote object call feature (similar to RMI
and CORBA).
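For a feel of what DRb looks like, here is a small sketch. The ImageWorker class and the drb_round_trip method are invented for the example. To stay self-contained it runs server and client in a single process; in a real setup the worker would live in its own process and the controller would be given its URI.

```ruby
require "drb/drb"

# Hypothetical worker object; `process` stands in for the real image work.
class ImageWorker
  def process(path)
    "processed #{path}"
  end
end

# Publish a worker over DRb, call it through a DRbObject proxy, and
# return the answer.
def drb_round_trip(path)
  # Port 0 asks the OS for any free port; DRb.uri reports the bound address.
  DRb.start_service("druby://localhost:0", ImageWorker.new)
  remote = DRbObject.new_with_uri(DRb.uri)
  remote.process(path)
ensure
  DRb.stop_service
end

puts drb_round_trip("a.png")
```

The call on remote reads like an ordinary method call, which is the main attraction. Arguments and return values are marshalled when the two sides are in different processes.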

Kind regards

robert
 
Charles Oliver Nutter

For a CPU-intensive task (image processing), I doubt that two OS
threads running on two cores are going to be any more efficient
than two processes running on two cores. Multi-threading introduces
complications that are neatly avoided by using multiple processes.
I'd much rather deal with a multi-process architecture than a
multi-threaded architecture.

You're correct, if the processes don't talk to each other. But if you
have to pass information across processes, things suddenly get a lot
more tangled and IPC-bound than with threads. It's a tradeoff, as
always.
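To make the IPC cost concrete, here is a sketch of the plumbing needed just to get one result back from a child process: a pipe plus serialization. It assumes a Unix-like system with fork, and run_in_child is a made-up helper name.

```ruby
# Run the block in a child process and hand its return value back to the
# parent. The value is serialized with Marshal and sent over a pipe.
def run_in_child
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    writer.write(Marshal.dump(yield))
    writer.close
    exit!(0) # leave the child without running the parent's at_exit hooks
  end

  writer.close                  # parent only reads
  payload = reader.read         # read until the child closes its end
  reader.close
  Process.wait(pid)
  Marshal.load(payload)
end

result = run_in_child { { "a.png" => 42 } }
puts result.inspect
```

With threads the same hand-off would be a plain assignment to a shared variable, which is the convenience (and the risk) being traded against process isolation.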

- Charlie
 
