Process or Thread?

Pito Salas · Aug 22, 2009

I have a parent application (which I think of as a test harness) that
wants to invoke a fairly intensive image processing application against
a directory full of image files. Each image is processed independently.

So, to get performance, I wanted to get the work happening on each of
those images in parallel. So I could divide the files in the directory
into two sets, and submit one set for processing in one process/thread
and the other set in another process/thread. Note that the
sub-process/threads are almost totally separate from the parent app, so
relatively little information needs to go back and forth.

Here is what I've learned so far from reading two books and lots of
googling:

One point is that there's no process support on Windows, which isn't a
deal killer for me.

Another point is the operation on multi-core CPUs: processes will, and
threads will not use the multiple cores. This too is fairly "don't care"
for me.

I am interested in ease of implementation and debugging. And I am also
very interested in getting the cpu and disk active at the same time as
there is a fairly large amount of data to be read from the disk.

What are your recommendations?

7stud -- · Aug 22, 2009

Pito said:
I have a parent application (which I think of as a test harness) that
wants to invoke a fairly intensive image processing application against
a directory full of image files. Each image is processed independently.

It doesn't sound like your situation will result in improved performance
with threads. Things don't actually get done at the same time with
threads--that's an illusion. What happens is that there is very fast
switching between different tasks. However, if your tasks do not have
dead time during the processing, then using threads won't improve
performance. For instance, suppose you have two tasks that each take 3
minutes to complete. The processing might happen in this order with
threads:

task1: 1 minute
task2: 1 minute
task1: 1 minute
task2: 1 minute
task1: 1 minute
task2: 1 minute
--------------
total = 6 minutes

But if you just ran each task sequentially without using threads, the
total time would also be 6 minutes. Using threads will only speed up
processing time if your tasks have idle time when they are doing
nothing. During that down time, if you switch to another task in
another thread, then total processing time will be lower.
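The idle-time point can be tried with a minimal sketch. It assumes `sleep` is a fair stand-in for a task that is waiting (for the disk, say); `waiting_task`, `sequential_time` and `threaded_time` are hypothetical names used only for this illustration.

```ruby
require 'benchmark'

# Hypothetical stand-in for a task that spends its time waiting, not computing.
def waiting_task(seconds = 0.2)
  sleep seconds
end

# Run the tasks one after another and return the elapsed wall-clock seconds.
def sequential_time(count)
  Benchmark.realtime { count.times { waiting_task } }
end

# Run each task in its own thread and return the elapsed wall-clock seconds.
def threaded_time(count)
  Benchmark.realtime do
    Array.new(count) { Thread.new { waiting_task } }.each(&:join)
  end
end
```

Calling `sequential_time(2)` and `threaded_time(2)` should show the threaded run finishing sooner, because the waits overlap. Swap the `sleep` for a pure computation and, on an interpreter that cannot run Ruby threads on several cores at once, the two numbers would be expected to come out about the same.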

Mario Camou · Aug 22, 2009

7stud wrote:

It doesn't sound like your situation will result in improved performance
with threads. Things don't actually get done at the same time with
threads--that's an illusion. What happens is that there is very fast
switching between different tasks. However, if your tasks do not have
dead time during the processing, then using threads won't improve
performance. For instance, suppose you have two tasks that each take 3
minutes to complete. The processing might happen in this order with
threads:

task1: 1 minute
task2: 1 minute
task1: 1 minute
task2: 1 minute
task1: 1 minute
task2: 1 minute

That's true if you're running MRI, since it uses "green" threads (i.e., you
really have a single OS-level thread that gets task-switched by Ruby
itself). However, if you run on JRuby, the Ruby Thread support gets mapped
onto the Java Thread support, which *does* map to OS-level threads and
therefore will take advantage of multiple cores if you have them. In that
case you *would* get faster processing.

Hope this helps,
-Mario.

Gary Wright · Aug 22, 2009

That's true if you're running MRI, since it uses "green" threads
(i.e., you really have a single OS-level thread that gets task-switched
by Ruby itself). However, if you run on JRuby, the Ruby Thread support
gets mapped onto the Java Thread support, which *does* map to OS-level
threads and therefore will take advantage of multiple cores if you have
them. In that case you *would* get faster processing.

For a CPU-intensive task (image processing), I doubt that two OS
threads running on two cores are going to be any more efficient
than two processes running on two cores. Multi-threading introduces
complications that are neatly avoided by using multiple processes.
I'd much rather deal with a multi-process architecture than a
multi-threaded architecture.
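A rough sketch of the multi-process version of "divide the files into two sets", assuming a Unix-like system where `fork` is available. `process_image` is a hypothetical placeholder for the real image-processing step, and `process_in_children` is a name made up for this example.

```ruby
# Hypothetical stand-in for the real image-processing step.
def process_image(path)
  File.size(path)
end

# Split the files into `workers` roughly equal sets and fork one child per set.
# Returns true only if every child exited successfully.
def process_in_children(files, workers = 2)
  return true if files.empty?

  slice_size = (files.size / workers.to_f).ceil
  pids = files.each_slice(slice_size).map do |set|
    # Each child works through its own set; an uncaught error makes it exit non-zero.
    fork { set.each { |path| process_image(path) } }
  end

  # Wait for every child and collect its exit status.
  pids.map { |pid| Process.wait2(pid).last.success? }.all?
end
```

Something like `process_in_children(Dir.glob('images/*'))` would then run two children side by side (the directory name is only an example). Nothing is shared between the children, so the only thing the parent learns here is whether each one succeeded.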

Gary Wright

Pito Salas · Aug 22, 2009

Gary said:
For a CPU-intensive task (image processing), I doubt that two OS
threads running on two cores are going to be any more efficient
than two processes running on two cores. Multi-threading introduces
complications that are neatly avoided by using multiple processes.
I'd much rather deal with a multi-process architecture than a
multi-threaded architecture.

Gary Wright

Thanks all for your responses.

A note: the files being processed are quite large and numerous. So
there's also plenty of file IO that has to happen. In the vanilla 'green
thread' case, would you expect performance improvements, because while
one thread was blocked for IO the other one could run?

Thanks again,

Pito

Kent Friis · Aug 23, 2009

On Sat, 22 Aug 2009 09:30:36 -0500, Pito Salas wrote:

I have a parent application (which I think of as a test harness) that
wants to invoke a fairly intensive image processing application against
a directory full of image files. Each image is processed independently.

So, to get performance, I wanted to get the work happening on each of
those images in parallel. So I could divide the files in the directory
into two sets, and submit one set for processing in one process/thread
and the other set in another process/thread. Note that the
sub-process/threads are almost totally separate from the parent app, so
relatively little information needs to go back and forth.

Here is what I've learned so far from reading two books and lots of
googling:

One point is that there's no process support on Windows, which isn't a
deal killer for me.

Not quite. Look in Task Manager, there is a list of processes running.
What Windows possibly lacks is fork(), the unix way of creating
processes. It does however have CreateProcess (I think that's what
it is called), which behaves like fork+exec.

If you split the "controller" process and the "worker" process into
two different programs, it won't be a problem. If you insist on
having them as one program, you'll need to do a bit more work
(add a command line argument telling the new process that it's a
worker process).
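A minimal sketch of the one-program variant, assuming Ruby 1.9 or later, where `Process.spawn` starts a new process without needing `fork`. The `--worker` flag and all method names are hypothetical, and the worker body is only a placeholder.

```ruby
require 'rbconfig'

# Hypothetical flag telling a freshly started copy of this script to act as a worker.
WORKER_FLAG = '--worker'

# Command line that re-runs the given script as a worker on the given files.
def worker_command(script, files)
  [RbConfig.ruby, script, WORKER_FLAG, *files]
end

def worker_invocation?(argv)
  argv.first == WORKER_FLAG
end

# Start one worker process per file set and report whether all of them succeeded.
def run_controller(script, file_sets)
  pids = file_sets.map { |set| Process.spawn(*worker_command(script, set)) }
  pids.map { |pid| Process.wait2(pid).last.success? }.all?
end

def run_worker(files)
  files.each { |path| puts "processing #{path}" } # placeholder for the real work
end

if $PROGRAM_NAME == __FILE__
  if worker_invocation?(ARGV)
    run_worker(ARGV.drop(1))
  else
    # Controller: split the file arguments into (at most) two sets.
    sets = ARGV.each_slice([(ARGV.size / 2.0).ceil, 1].max).to_a
    run_controller(__FILE__, sets)
  end
end
```

Run as `ruby harness.rb a.png b.png c.png` (file names are examples), the script acts as the controller and starts two copies of itself with `--worker` in front of each set.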

Another point is the operation on multi-core CPUs: processes will, and
threads will not use the multiple cores. This too is fairly "don't care"
for me.

Native threads will, Ruby green threads won't.

I am interested in ease of implementation and debugging.

Debugging is lots easier with processes, as one process cannot
accidentally overwrite data of another (shared memory is possible,
but needs to be allocated explicitly).

That may not be as big a problem with Ruby green threads, as the
runtime knows what each thread is up to.

And I am also
very interested in getting the cpu and disk active at the same time as
there is a fairly large amount of data to be read from the disk.

What are your recommendations?

I would go for processes. But that's coming from C, where there is no
runtime keeping track of what each thread is doing. With processes,
the OS will prevent one process from overwriting the data of another.

/Kent

Gary Wright · Aug 23, 2009

A note: the files being processed are quite large and numerous. So
there's also plenty of file IO that has to happen. In the vanilla 'green
thread' case, would you expect performance improvements, because while
one thread was blocked for IO the other one could run?

Whether you use threads or processes your CPU-bound tasks will run while
your IO-bound tasks are waiting for the disk.

Gary Wright

Robert Klemme · Aug 23, 2009

Debugging is lots easier with processes, as one process cannot
accidentally overwrite data of another (shared memory is possible,
but needs to be allocated explicitly).

IMHO a multitude of processes does not necessarily ease debugging. If
you need to find out which process is running berserk or exhibiting a
bug that may be more difficult than debugging of a single interpreter
process. Also, if there are communication issues between two processes
that may be difficult to debug as well.

Having said that, both approaches are pretty easy to implement, given
that DRb is a full fledged remote object call feature (similar to RMI
and CORBA).

Kind regards

robert

Charles Oliver Nutter · Sep 3, 2009

For a CPU-intensive task (image processing), I doubt that two OS
threads running on two cores are going to be any more efficient
than two processes running on two cores. Multi-threading introduces
complications that are neatly avoided by using multiple processes.
I'd much rather deal with a multi-process architecture than a
multi-threaded architecture.

You're correct, if the processes don't talk to each other. But if you
have to pass information across processes, things suddenly get a lot
more tangled and IPC-bound than with threads. It's a tradeoff, as
always.
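To give a feel for the extra plumbing, here is a minimal sketch of getting one result back from a child process. It assumes a Unix-like system with `fork` and a result that `Marshal` can serialize; `result_from_child` is a hypothetical helper, not part of any library.

```ruby
# Run the block in a forked child and hand its return value back to the parent
# over a pipe. If the child dies before writing, Marshal.load will raise.
def result_from_child
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode

  pid = fork do
    reader.close
    writer.write(Marshal.dump(yield)) # serialize the block's result
    writer.close
  end

  writer.close            # the parent only reads
  payload = reader.read   # returns once the child has closed its end
  reader.close
  Process.wait(pid)
  Marshal.load(payload)
end
```

For example, `result_from_child { { 'a.png' => 123 } }` would return the hash to the parent. With threads the same hand-off is just a shared variable or a `Queue`, which is the tradeoff in question.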

- Charlie
