Tassilo v. Parseval
Also sprach (e-mail address removed):
I agree that if you are going to do something incredibly silly, like
parallelizing something that has no legitimate need to be parallelized,
then using ForkManager is probably not the best choice. But then again,
doing silly things is generally not the best choice in the first place,
unless your goal is to be silly (which is a noble goal in itself,
sometimes).
Being silly sometimes helps in making a point.
Hardly equivalent. The equivalent thread implementation to the above fork
code would be to spawn one thread per item.
No, it wouldn't. The point is that spawning a new process per data
item is necessary with Parallel::ForkManager but not with threads, since
threads can be reused thanks to their ability to share data. It would be
deliberate stupidity to spawn off a new thread per item instead of
reusing the existing ones.
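For contrast, the spawn-one-thread-per-item version being dismissed here would look something like the following deliberately naive sketch (an illustration, not code from the thread):

use threads;

# one thread per work item: exactly the per-item spawning
# overhead criticized above
my @threads = map { threads->new(sub { print "$_[0]\n" }, $_) } 1 .. shift;
$_->join for @threads;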
The equivalent fork implementation to the below threaded code would be
to use forks::shared (which, BTW, is even worse than the ForkManager
way).
That only appears to work for a small set of shared data. With 10,000
items the forks version merely sits there and doesn't appear to do
anything.
use threads;
use threads::shared;

use constant NUM_THREADS => 30;

my @queue : shared = 1 .. shift;    # work items, shared across all threads

my @threads;
push @threads, threads->new("run") for 1 .. NUM_THREADS;
$_->join for @threads;

sub run {
    # each worker pulls items off the shared queue until it is empty
    while (defined(my $element = shift @queue)) {
        print "$element\n";
    }
}
On my machine I get: ...
If you increase the number further to, say, 10000, it already looks like this:
[processes]
real 0m45.605s
user 0m24.320s
sys 0m21.130s
[threads]
real 0m8.671s
user 0m1.090s
sys 0m7.580s
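For reference, the forks::shared variant mentioned above is essentially the same script with the pragmas swapped; this sketch relies on the forks module's documented drop-in compatibility with the threads API:

use forks;          # must be loaded before anything threads-related
use forks::shared;

use constant NUM_THREADS => 30;

my @queue : shared = 1 .. shift;

my @threads = map { threads->new("run") } 1 .. NUM_THREADS;
$_->join for @threads;

sub run {
    while (defined(my $element = shift @queue)) {
        print "$element\n";
    }
}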
Parallelization is inherently an optimization step. As such, there is
no general solution, and the appropriate way to parallelize is highly
dependent on the details of what is to be parallelized. If I wanted to
parallelize something like your test case, consisting of a large number of
very, very fast operations, I would use the unpublished "Parallel_Proc"
module, which divides the data into chunks up front.
I am sure this is all very nice but also, as you say, unpublished.
use Parallel_Proc;

my $pm   = Parallel_Proc->new();
my @data = 1 .. shift;

# spawn(30, N) apparently forks the children and returns each
# one's slice boundaries ($l, $r) into @data
my ($l, $r) = $pm->spawn(30, scalar @data);

foreach (@data[$l .. $r]) {
    print "$_\n";
}

$pm->harvest(sub { print $_[0] });   # pass each child's output to the parent's STDOUT
$pm->Done();
This is already going in the direction of MPI, where the global pool of
data is (ideally) distributed evenly among the processors once, and each
then does its work. After each one has computed its sub-result, the
results are gathered again.
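Schematically, that scatter/gather pattern can be written with nothing but fork and pipes; the following is a hypothetical pure-Perl illustration (not the Parallel_Proc module's internals), summing numbers in evenly sized chunks:

use strict;
use warnings;

my @data    = 1 .. 100;
my $workers = 4;
my $chunk   = int(@data / $workers) + 1;

my @pipes;
for my $w (0 .. $workers - 1) {
    pipe(my $reader, my $writer) or die "pipe: $!";
    defined(my $pid = fork()) or die "fork: $!";
    if ($pid == 0) {                 # child: scatter phase
        close $reader;
        my @mine = grep { defined }
                   @data[$w * $chunk .. ($w + 1) * $chunk - 1];
        my $sum = 0;
        $sum += $_ for @mine;        # the actual "work": a partial sum
        print {$writer} "$sum\n";
        exit 0;
    }
    close $writer;                   # parent keeps only the read end
    push @pipes, $reader;
}

my $total = 0;
$total += <$_> for @pipes;           # gather phase: collect the sub-results
wait for 1 .. $workers;
print "total: $total\n";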
But this is a different domain altogether. Threads and fork() are useful
for those problems that are not CPU-bound but network- or event-bound. As
such, their job is to make a program more responsive by doing slow things
in parallel, where the slowness comes from a resource that can be shared
without making each individual request slower.
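A sketch of the kind of network-bound case meant here, with a handful of worker threads sharing a queue of hypothetical URLs, assuming LWP::Simple is available:

use threads;
use threads::shared;
use LWP::Simple qw(get);

my @urls : shared = map { "http://example.com/page$_" } 1 .. 20;

my @workers = map {
    threads->new(sub {
        while (defined(my $url = shift @urls)) {
            my $page = get($url);    # slow: waits on the network
            printf "%s: %d bytes\n", $url, defined $page ? length $page : 0;
        }
    });
} 1 .. 5;
$_->join for @workers;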
I am not even sure that the problem at hand (namely parsing files) is
one of those. If the files are on separate disks, it could benefit from
using as many threads/processes as there are disks. More parallel units
are likely to make it slower than a single-threaded application because
of the additional overhead incurred by the disk's read head jumping
around.
While this is true, it is not particularly relevant to the poster's
problem. There are cases where threading wins hands down. The original
problem is not one of them.
It was undoubtedly a contrived example I chose. The work per work unit
was so minuscule that the particular overhead of each of the two
solutions became the dominating factor.
I somewhat agree. Parallel::ForkManager was (apparently) designed so that
you can usually take code originally written to be serial and make it
parallel by simply adding 2 carefully placed lines (plus 3 housekeeping
lines). The threaded code is pretty much written from the ground up to be
threaded. The threaded code's structure tends to be dominated by the
threading, while the ForkManager code tends to be dominated by whatever you
are fundamentally trying to do, with just a few lines making a nod to the
parallelization. This makes it easier to thoughtlessly add code that breaks
parallelization under ForkManager. So when I substantially refactor code
that uses ForkManager, I simply remove the parallelization, refactor the
code as serial code, then add ForkManager back in at the end.
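The canonical pattern is roughly the following sketch of the documented Parallel::ForkManager interface, where @items and do_something() are placeholders for whatever the original serial code was doing:

use Parallel::ForkManager;

my @items = 1 .. 100;                      # placeholder work list
my $pm = Parallel::ForkManager->new(30);   # at most 30 children at a time

for my $item (@items) {
    $pm->start and next;    # carefully placed line one: fork, parent moves on
    do_something($item);    # the untouched serial body runs in the child
    $pm->finish;            # carefully placed line two: the child exits
}
$pm->wait_all_children;     # housekeeping: reap the children

sub do_something { print "$_[0]\n" }       # stand-in for the real work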
Parallelizing is a huge intellectual problem for every programmer, and
many parallel programs are inherently hard to understand. I haven't yet
found a paradigm that is truly intuitive. The best I've come across so
far is Ada's task-oriented approach, but I've seen no other programming
language using this model.
Second best is threads, but an already existing serial solution needs to
be rewritten to fit into it.
Then there are processes, which are good when no communication between
them is required. Once pieces of data have to be exchanged, it gets
ugly: the code is inflated with boring boilerplate that keeps the
processes synchronized, reads from pipes, and makes the programmer wonder
why there is a deadlock, etc.
The fourth approach, explicit message-passing through means like MPI, I
don't really count, as it is clearly targeted at scientific computation
and requires spiffy multi-processor mainframes to be beneficial.
Oh, one more thing I discovered. Threaded code with a shared queue is
tricky if the queue holds references or objects.
If you change the queue to:
my @queue : shared = map { [$_] } 1 .. shift;
Then it dies with "Invalid value for shared scalar". Since the forking
code doesn't use shared values, it doesn't have this particular problem.
You can circumvent this with the rather ugly:
my @queue : shared = map { my $x = [$_]; share $x; $x } 1 .. shift;
With blessed references, even this doesn't work.
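For the unblessed case, the &share([]) idiom from the threads::shared documentation is a slightly cleaner spelling, since it creates each referent as shared to begin with (a sketch):

use threads;
use threads::shared;

# each element is created shared up front instead of shared after the fact
my @queue : shared = map { my $x = &share([]); @$x = ($_); $x } 1 .. shift;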
Yes, objects don't play nice with threads as of now. The manpage of
threads::shared says this is going to be fixed some day. We'll see.
Tassilo