multiplexed IO suggestion?


Walter Roberson

I am hoping someone can point me to an existing module or give
a good algorithm hint for this.

I process large-ish text files, performing a bit of work on each line.
The processing of each line can often be done mostly independently of
the other lines, but it would be better if all the output for a given
line were to precede the output for the next line. Some manner of
internal buffering for that purpose would be fine.

The per-line CPU time is short, by the way, but many lines call for DNS
lookups, and I'd like to do some of those lookups in parallel while
preferably maintaining the output order -as if- the lines were processed
sequentially (assuming the processing of each is independent).

I started writing this up in terms of ordered queues of processing
requests, and callbacks, but before I got very far on that approach, I
realized that in theory I could instead do something akin to having a
set of filehandles, barely change the existing code, and do some
behind-the-scenes work so that what was printed to any one filehandle
did not get sent to stdout until all the previous filehandles had
terminated.


Generalizing, I can see that this same mechanism could come up in other
contexts -- that one might want to start threads that execute
independently, with a thread-localized filehandle that could be written
to by each thread, with the outputs automatically being demultiplexed
into a single filehandle so that all output from threads started
earlier appeared before the threads started later. Would someone have
some ideas on good ways to accomplish this?


While writing the previous paragraph, I realized that in the program
I'm working on most immediately, that output is always the last thing
the thread would do, after the "work" is done, so what I could do is,
just before output, queue on a semaphore that will not be made
available until the previous thread finishes. But I could imagine
other programs with intermediate outputs.
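
(If threads were usable, that gate could be little more than a shared
counter -- an untested sketch:)

    use threads;
    use threads::shared;

    my $turn :shared = 0;    # sequence number whose output may go next

    sub emit_in_order {
        my ($seq, $text) = @_;   # $seq is this line's 0-based input number
        lock $turn;
        cond_wait($turn) until $turn == $seq;   # wait for earlier lines to finish
        print $text;
        $turn++;
        cond_broadcast($turn);                  # wake whoever has the next turn
    }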


I will not be able to literally use perl threads; threaded
perl fails one of its build tests on my IRIX system
("known problem", "put here to stop you from putting it into
production"). I guess there's always forking.
 

Sara

I am hoping someone can point me to an existing module or give
a good algorithm hint for this.

I process large-ish text files, performing a bit of work on each line. [...]

The per-line CPU time is short, by the way, but many lines call for DNS
lookups, and I'd like to do some of those lookups in parallel while
preferably maintaining the output order -as if- the lines were processed
sequentially (assuming the processing of each is independent).

Hello Anon:

Do you really mean "multiplexed" as in TDM or FDM or ?? Seems like an
odd term to use - perhaps you mean multi-threaded or
parallel-processed? "Multiplexing" is splitting up one task in either
time or frequency domains. Sounds like you want to multitask?

At any rate, you want to process lines in a file independently.
Optimizing this type of requirement can be difficult as it depends on
your hardware, swap space and the overhead of a swap, input size, etc.
You might try some test cases, such as:

split the file into quarters, then kick off 4 background tasks and
compare the clock time with your baseline (running the lines
sequentially in one thread).

If the clock time gets WORSE (which it could, given swapping and
multiple-task overhead) then it's possible you have nothing to
gain by trying to "multiplex" this process. If it improves, then you'll
need to start searching for an optimal number of threads, which would
also depend on the size of your input file.

Don't forget you'll create a lot of task-management overhead: some
process will have to split up and monitor (and error-check/report) all
of the threads, and another will have to reassemble the results from
the individual tasks.

Also, if you already have a multi-processor system with a contemporary
O/S then these types of tasks may already be optimized for you at
runtime.

G
 

Walter Roberson

:Do you really mean "multiplexed" as in TDM or FDM or ?? Seems like an
:odd term to use - perhaps you mean multi-threaded or
:parallel-processed? "Multiplexing" is splitting up one task in either
:time or frequency domains. Sounds like you want to multitask?

TDM and FDM are two particular -kinds- of multiplexing. OED
says:

5. Telecomm. Designating or relating to a process or apparatus
for simultaneously transmitting two or more signals or programmes
(to be later separated and recovered) over a single wire or
channel; designating a composite signal so formed.

My task involves several "signals" that have to be
"later separated and recovered", so "multiplex" was the proper term
for the component I asked about. The several signals are generated
through multitasking, but I have a sufficient handle on multitasking
thanks.


:Also, if you already have a multi-processor system with a contemporary
:O/S then these types of tasks may already be optimized for you at
:runtime.

I have -several- multi-processor systems with contemporary O/S.
On the fastest of them, the CPU time involved in my processing
averages about 1 second per megabyte. CPU time is not the
issue here: it's waiting for the results from remote DNS
servers that is bogging the program down (e.g., over 8 hours
clock time to process a 112 MB file.) A single DNS lookup could take
30 seconds clock time, so I will run several lookups in parallel
(say, 100). As it is unpredictable which lookup will return first,
it is very likely that the lookup results will come back
out-of-order, so now I'm looking at mechanisms to re-order
the output -- i.e., mechanisms to de-multiplex it. As long
as the multiprocess management overhead is less than about 250
times as expensive as the per-line CPU processing, I come out ahead.
No point in having the system sitting idle waiting for
remote systems to return DNS results.


There are several options available to me. For example, I could
just precede every output line with the input line number it
corresponds to (plus a relative output-line sequence number), and
then sort on those sequence numbers in a post-processing pass, but
I think there are better options available.
 

ctcgag

I am hoping someone can point me to an existing module or give
a good algorithm hint for this.

I process large-ish text files, performing a bit of work on each line.
The processing of each line can often be done mostly independently of
the other lines, but it would be better if all the output for a given
line were to precede the output for the next line. Some manner of
internal buffering for that purpose would be fine.

The per-line CPU time is short, by the way, but many lines call for DNS
lookups, and I'd like to do some of those lookups in parallel while
preferably maintaining the output order -as if- the lines were processed
sequentially (assuming the processing of each is independent).

Use the UNIX "split" util or a simple perl script to divide the input
file into (say) four files. Run your existing script on the four files
independently in parallel, then concatenate the output files (in the same
order they were parcelled out).
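
For instance (untested sketch; "./process.pl" and the chunk.* naming are
stand-ins for the real script and however the input was split):

    use strict;
    use warnings;

    # Assumes the input was already divided with something like:
    #   split -l 250000 input chunk.
    my @chunks = sort glob 'chunk.*';

    my @pids;
    for my $chunk (@chunks) {
        defined(my $pid = fork) or die "fork: $!";
        if ($pid == 0) {
            open STDOUT, '>', "$chunk.out" or die "redirect: $!";
            exec './process.pl', $chunk or die "exec: $!";   # the existing per-line script
        }
        push @pids, $pid;
    }
    waitpid $_, 0 for @pids;

    # Reassemble in the same order the chunks were parcelled out.
    system('cat', map "$_.out", @chunks) == 0 or die "cat failed";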

Or, have the master thread assign a serial number to each line, have the
slave threads output this serial number in addition to the normal output,
then sort on this number.

I started writing this up in terms of ordered queues of processing
requests, and callbacks, but before I got very far on that approach, I
realized that in theory I could instead do something akin to having a
set of filehandles, barely change the existing code, and do some
behind-the-scenes work so that what was printed to any one filehandle
did not get sent to stdout until all the previous filehandles had
terminated.

Generalizing, I can see that this same mechanism could come up in other
contexts -- that one might want to start threads that execute
independently, with a thread-localized filehandle that could be written
to by each thread, with the outputs automatically being demultiplexed
into a single filehandle so that all output from threads started
earlier appeared before the threads started later. Would someone have
some ideas on good ways to accomplish this?

I was going to suggest a heap with serial numbers might be the way to go,
but I decided a simple hash (if made thread-safe) would work:

# %hash is the (thread-safe) shared results hash, keyed by sequence number;
# $next_seq is the next sequence number to print, $last_seq the highest assigned.
while ($INPUT_STILL_GOING or $next_seq <= $last_seq) {
    sleep 1 while !exists $hash{$next_seq};
    print delete $hash{$next_seq++}, "\n";
}

Xho
 

gnari

Walter Roberson said:
I have -several- multi-processor systems with contemporary O/S.
On the fastest of them, the CPU time involved in my processing
averages about 1 second per megabyte. CPU time is not the
issue here: it's waiting for the results from remote DNS
servers that is bogging the program down (e.g., over 8 hours
clock time to process a 112 MB file.) A single DNS lookup could take
30 seconds clock time, so I will run several lookups in parallel
(say, 100). As it is unpredictable which lookup will return first,
it is very likely that the lookup results will come back
out-of-order, so now I'm looking at mechanisms to re-order
the output -- i.e., mechanisms to de-multiplex it. As long
as the multiprocess management overhead is less than about 250
times as expensive as the per-line CPU processing, I come out ahead.
No point in having the system sitting idle waiting for
remote systems to return DNS results.


There are several options available to me. For example, I could
just precede every output line with the input line number it
corresponds to (plus a relative output-line sequence number), and
then sort on those sequence numbers in a post-processing pass, but
I think there are better options available.

maybe it would be advantageous to separate the DNS lookups totally from
your main process:
a) perform a separate pass over the input, retrieving addresses to look up
b) sort, uniq and split the list of addresses into, say, 100 lists
c) feed those to 100 identical DNS-lookup processes, creating a local
lookup database.
d) now run your original process, but using the local database
instead of DNS lookups

I am not saying this would be more effective, but it should simplify
your coding and avoid unnecessarily complicating your main input
processing to keep the output in order.

Also, the database could be kept between runs if there is a chance of
repeated addresses; then you would filter out known addresses in b)
before the split.
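
An untested sketch of that a)-d) approach, with the lookups done
sequentially where a real run would split them across child processes;
extract_names() and the cache file name are placeholders:

    use strict;
    use warnings;
    use Socket qw(inet_ntoa);
    use Storable qw(nstore retrieve);

    my $cache_file = 'dns-cache.sto';                 # arbitrary name

    # a) collect the unique names that will need resolving
    my %names;
    open my $in, '<', $ARGV[0] or die "open: $!";
    while (my $line = <$in>) {
        $names{$_} = 1 for extract_names($line);      # placeholder parser
    }
    close $in;

    # b)/c) resolve anything not already in the local cache
    # (sequential here; this is the part to split across child processes)
    my %cache = -e $cache_file ? %{ retrieve($cache_file) } : ();
    for my $name (keys %names) {
        next if exists $cache{$name};
        my $packed = gethostbyname($name);
        $cache{$name} = defined $packed ? inet_ntoa($packed) : 'unresolved';
    }
    nstore(\%cache, $cache_file);

    # d) the original per-line processing now consults %cache (or the
    #    saved file on a later run) instead of calling the resolver.

    sub extract_names {
        # Placeholder: pull out whatever tokens your lines need resolved.
        return $_[0] =~ /([\w-]+(?:\.[\w-]+)+)/g;
    }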

gnari
 

Ben Morrow

While writing the previous paragraph, I realized that in the program
I'm working on most immediately, that output is always the last thing
the thread would do, after the "work" is done, so what I could do is,
just before output, queue on a semaphore that will not be made
available until the previous thread finishes. But I could imagine
other programs with intermediate outputs.

What about (untested):

use IO::Select;

my @chunks = split_input_into_chunks();
my (@FHs, %bufs);

for (@chunks) {
    # one child per chunk; the parent reads each child's output over a pipe
    unless (open my $FH, "-|") {
        process_chunk($_);
        exit();
    }
    push @FHs, $FH;
}
%bufs = map { $_ => '' } @FHs;

my $sel = IO::Select->new(@FHs);

while (my @to_read = $sel->can_read) {
    for (@to_read) {
        # append to this handle's buffer; on EOF drop it from the select set
        sysread($_, $bufs{$_}, 2048, length $bufs{$_})
            or $sel->remove($_);

        # only the oldest handle may print, so output stays in chunk order
        if ($_ == $FHs[0]) {
            print STDOUT $bufs{$_};
            $bufs{$_} = "";
            $sel->exists($_) or shift @FHs;
        }
    }
}

print STDOUT $bufs{$_} for @FHs;

? Obviously it needs error checking added.

Ben
 

Walter Roberson

:use IO::Select;
[...]

Thanks, Ben, I'll go through that in more detail when I'm a bit
more awake.

One possibility I was thinking of was to introduce a new object
inheriting from IO, in which a "write" to any given handle appended
( e.g., .= ) to the end of a class variable, to be released to the
master file-handle when all the previous objects had closed.

Hmmm, guess that wouldn't really be a class variable in a pure
sense: it should really be something parameterized by the master
file-handle. Dual-level objects perhaps: create a multiplexer
master and then create ordered slaves off of that master.
(Cue the memories of SVR3's "clone" device mechanism!)
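
In outline, a single-process (no threads) tie-based version of that idea
might look something like this -- an untested sketch in which the
ordering state is class-level rather than per-master-handle:

    package OrderedHandle;
    # Each tied handle buffers its writes; on CLOSE, buffers are flushed
    # to the real output strictly in the order the handles were created.

    my @order;          # handles in creation order (class-level state)
    my $next = 0;       # index of the next handle allowed to flush

    sub TIEHANDLE {
        my ($class, $out) = @_;
        my $self = bless { out => $out, buf => '', closed => 0 }, $class;
        push @order, $self;
        return $self;
    }
    sub PRINT  { my $self = shift; $self->{buf} .= join('', @_); 1 }
    sub PRINTF { my $self = shift; $self->PRINT(sprintf shift, @_) }
    sub CLOSE  {
        my ($self) = @_;
        $self->{closed} = 1;
        # Flush every leading handle that has already closed.
        while ($next < @order and $order[$next]{closed}) {
            print { $order[$next]{out} } $order[$next]{buf};
            $next++;
        }
        1;
    }

    package main;
    tie *H1, 'OrderedHandle', \*STDOUT;
    tie *H2, 'OrderedHandle', \*STDOUT;
    print H2 "second handle, written first\n";
    print H1 "first handle\n";
    close H2;              # buffered: H1 hasn't closed yet
    close H1;              # flushes H1 then H2, in creation order

A fuller version would key @order and $next by the destination handle,
as noted above.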
 

Uri Guttman

WR> My task involves several "signals" that have to be
WR> "later separated and recovered", so "multiplex" was the proper term
WR> for the component I asked about. The several signals are generated
WR> through multitasking, but I have a sufficient handle on multitasking
WR> thanks.

WR> I have -several- multi-processor systems with contemporary O/S.
WR> On the fastest of them, the CPU time involved in my processing
WR> averages about 1 second per megabyte. CPU time is not the
WR> issue here: it's waiting for the results from remote DNS
WR> servers that is bogging the program down (e.g., over 8 hours
WR> clock time to process a 112 MB file.) A single DNS lookup could take
WR> 30 seconds clock time, so I will run several lookups in parallel
WR> (say, 100). As it is unpredictable which lookup will return first,
WR> it is very likely that the lookup results will come back
WR> out-of-order, so now I'm looking at mechanisms to re-order
WR> the output -- i.e., mechanisms to de-multiplex it. As long
WR> as the multiprocess management overhead is less than about 250
WR> times as expensive as the per-line CPU processing, I come out ahead.
WR> No point in having the system sitting idle waiting for
WR> remote systems to return DNS results.

i will have to recommend you check out stem. it can do all you want and
with much less effort than you are doing now. stem is a network
application toolkit and framework and it supports message passing and
events. it is easy to create connected processes (forked or otherwise)
and some can do the blocking DNS lookups and others the main crunching
of the file. creating the modules you need under stem for your
application would be very simple. a single method that takes in a
message and does a DNS lookup (gethostbyname?) would be all you need for
that service. creating your complete application is also simple as it
won't require any network programming, just configuring the desired
components and running them under stem.

you can check out stem at stemsystems.com. feel free to ask me any
questions or if you need development or technical support, i am
available.

thanx,

uri
 

Rocco Caputo

I am hoping someone can point me to an existing module or give
a good algorithm hint for this.

I process large-ish text files, performing a bit of work on each line.
The processing of each line can often be done mostly independently of
the other lines, but it would be better if all the output for a given
line were to precede the output for the next line. Some manner of
internal buffering for that purpose would be fine.

The per-line CPU time is short, by the way, but many lines call for DNS
lookups, and I'd like to do some of those lookups in parallel while
preferably maintaining the output order -as if- the lines were processed
sequentially (assuming the processing of each is independent).

You might want to look at POE, on the CPAN and at http://poe.perl.org/.
Specifically, the cookbook at http://poe.perl.org/?POE_Cookbook has
examples of asynchronous DNS lookups using POE::Component::Client::DNS.

Matching the input order can be done simply by tagging each line of
output with the input line number

printf "%9d %s\n", $input_line_number, $output_text;

then sorting the results and cutting off the first ten characters of
each line.

./foo | sort -n | cut -c11- > output.text

... which isn't Perl, but it beats writing code. :)

I started writing this up in terms of ordered queues of processing
requests, and callbacks, but before I got very far on that approach, I
realized that in theory I could instead do something akin to having a
set of filehandles, barely change the existing code, and do some
behind-the-scenes work so that what was printed to any one filehandle
did not get sent to stdout until all the previous filehandles had
terminated.

That seems like an awful lot of work. Consider this alternative:

Maintain an output window, and its offset. Perl starts counting input
and output records at 1, so we do too.

my $output_offset = 1;
my @window;

Here's a callback to process a record asynchronously. We use the return
value to determine when your input source is exhausted.

sub process_one_more {
    my $next_record = <INPUT>;
    return unless defined $next_record;

    chomp $next_record;
    process_record($., $next_record);
    return 1;
}

Theoretically, you call process_one_more() until you have enough records
being processed in parallel. At that point you stop calling it until
one of the parallel records is done.

Note that we're passing the input record number along with the record.
It's used in output() to reconstitute the file's original order.

When a record is done being processed, the results are sent to output,
and we try to process a new record to replace the finished one.

output($input_record_number, $output_record);
process_one_more();

The output function adds the output record to @window, then it decides
whether some records can be flushed to disk.

sub output {
    my ($input_record_number, $output_record) = @_;

    $window[$input_record_number - $output_offset] = $output_record;

    while (@window and defined $window[0]) {
        print OUTPUT shift @window;
        $output_offset++;
    }
}

Assume there are four records, which become ready in the order 4, 2, 1,
and 3.

Record 4 will be entered into position 3 of the window, because
$output_offset (1) is subtracted from it. The window looks like

$output_offset == 1

0: undef
1: undef
2: undef
3: record 4

Likewise, records 2 and 1 are entered into positions 1 and 0,
respectively.

$output_offset == 1

0: record 1
1: record 2
2: undef
3: record 4

Position 0 is now defined, so the while() loop shifts that record off
the @window, writes it to OUTPUT, and adjusts the output offset.

$output_offset == 2

0: record 2
1: undef
2: record 4

Position 0 is still defined, so record 2 is shifted off and flushed.

$output_offset == 3

0: undef
1: record 4

Record 3 will be entered into position 0 of the window, because
$output_offset is subtracted from its record number.

0: record 3
1: record 4

The while() loop iterates twice, writing out records 3 and 4, and
leaving $output_offset at 5.

etc.
Generalizing, I can see that this same mechanism could come up in other
contexts -- that one might want to start threads that execute
independently, with a thread-localized filehandle that could be written
to by each thread, with the outputs automatically being demultiplexed
into a single filehandle so that all output from threads started
earlier appeared before the threads started later. Would someone have
some ideas on good ways to accomplish this?

While writing the previous paragraph, I realized that in the program
I'm working on most immediately, that output is always the last thing
the thread would do, after the "work" is done, so what I could do is,
just before output, queue on a semaphore that will not be made
available until the previous thread finishes. But I could imagine
other programs with intermediate outputs.

I think you're thinking too hard. :)
I will not be able to literally use perl threads; threaded
perl fails one of its build tests on my IRIX system
("known problem", "put here to stop you from putting it into
production"). I guess there's always forking.

POE doesn't require threads or forking to handle asynchronous DNS
requests. If you run into problems installing it on IRIX, please e-mail
them (and any helpful output) to the output of

perl -wle '$_=q([email protected]);tr[a-z][n-za-m];print'

Thanks!
 

Bruce Horrocks

Walter Roberson said:
The per-line CPU time is short, by the way, but many lines call for DNS
lookups, and I'd like to do some of those lookups in parallel while
preferably maintaining the output order -as if- the lines were processed
sequentially (assuming the processing of each is independent).

I started writing this up in terms of ordered queues of processing
requests, and callbacks, but before I got very far on that approach, I
realized that in theory I could instead do something akin to having a
set of filehandles, barely change the existing code, and do some
behind-the-scenes work so that what was printed to any one filehandle
did not get sent to stdout until all the previous filehandles had
terminated.

Are the text output lines of constant length or similar length? You
could record the current write point in the output file (call it WP),
write a line of n spaces as a placeholder and spawn the child process,
giving it WP (and the filehandle) as a parameter.

You then write another line of n spaces and spawn the next child and so
on.

Each child process does its stuff and seeks to WP before overwriting the
n spaces with its n characters of real data.

Even if the lines aren't exactly the same length, you could do the same
thing but set n to allow for the maximum and then do a single
post-process scan at the end to trim trailing blanks. (Or just leave the
trailing blanks.)
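
An untested sketch of that placeholder-and-seek idea, using fork() for
the children; the record width, the file name and slow_process() are
stand-ins:

    use strict;
    use warnings;
    use IO::Handle;

    my $WIDTH = 128;                        # assumed maximum record length
    open my $out, '+>', 'output.txt' or die "open: $!";
    $out->autoflush(1);                     # placeholders must hit the file
                                            # before any child writes

    my @kids;
    while (my $line = <STDIN>) {
        my $wp = tell $out;                 # WP: where this line's slot starts
        print {$out} ' ' x $WIDTH, "\n";    # reserve the slot in input order

        defined(my $pid = fork) or die "fork: $!";
        if ($pid == 0) {
            # Child: reopen the file so our seek doesn't move the parent's offset.
            open my $slot, '+<', 'output.txt' or die "reopen: $!";
            my $text = substr(slow_process($line), 0, $WIDTH);
            seek $slot, $wp, 0 or die "seek: $!";
            print {$slot} $text;
            close $slot;
            exit 0;
        }
        push @kids, $pid;                   # a real program would cap the child count
    }
    waitpid $_, 0 for @kids;

    sub slow_process {                      # placeholder for the DNS-heavy work
        my ($line) = @_;
        chomp $line;
        return "looked up: $line";
    }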

Regards,
 

Anno Siegel

[DNS lookup]
maybe it would be advantageous to separate the DNS lookups totally from
your main process:
a) perform a separate pass over the input, retrieving addresses to look up
b) sort, uniq and split the list of addresses into, say, 100 lists
c) feed those to 100 identical DNS-lookup processes, creating a local
lookup database.
d) now run your original process, but using the local database
instead of DNS lookups

The strategy seems sound, but I'm a little worried by the number of
parallel lookups suggested, here and elsewhere in the thread. I'm
no DNS expert, but there must be a limit to how many lookups the local
system can deal with in a given time. Once you reach that limit,
additional parallelization won't gain anything. The point of anti-social
behavior may be reached before that. Just from a gut feeling, 5-10
parallel processes seem more realistic than 100.

Anno
 

Michele Dondi

The per-line CPU time is short, by the way, but many lines call for DNS
lookups, and I'd like to do some of those lookups in parallel while
preferably maintaining the output order -as if- the lines were processed
sequentially (assuming the processing of each is independent).

I haven't read carefully all the other posts in this thread, but it
seems to me that no one has mentioned "open my $fh, '-|', command"
yet: I am well aware that mine may be a naive suggestion to you, but
it may well be worth giving it a try...

More precisely, to preserve the order of the input lines, you could
keep an array used as a FIFO/buffer: push the processed line onto it
when there is no lengthy work to be performed, and push a reference to
an open()ed filehandle (running the lengthy process) otherwise.
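
A rough, untested sketch of that FIFO-of-filehandles idea; the external
"host" command and the two helper subs are stand-ins for the real
per-line work:

    use strict;
    use warnings;

    my $MAX_PENDING = 20;      # arbitrary cap on simultaneous lookup children
    my @fifo;                  # finished strings or pending filehandles, in input order

    while (my $line = <STDIN>) {
        chomp $line;
        if (my $host = wants_lookup($line)) {
            # Start the slow lookup in a child; it runs while we keep reading input.
            open my $fh, '-|', 'host', $host or die "cannot fork: $!";
            push @fifo, $fh;
        } else {
            push @fifo, "$line\n";         # fast path: pass the line straight through
        }
        flush_head(scalar(grep { ref } @fifo) >= $MAX_PENDING);
    }
    flush_head(1);                         # drain everything still pending

    # Emit from the head of the FIFO so output stays in input order.  If $block
    # is true we wait on pending filehandles; otherwise we stop at the first one.
    sub flush_head {
        my ($block) = @_;
        while (@fifo) {
            if (ref $fifo[0]) {
                last unless $block;
                my $fh = shift @fifo;
                print <$fh>;               # blocks until that child has finished
                close $fh;
            } else {
                print shift @fifo;
            }
        }
    }

    sub wants_lookup {                     # placeholder: return a hostname if the
        my ($line) = @_;                   # line needs a lookup, else undef
        return $line =~ /\b((?:[\w-]+\.)+[a-z]{2,})\b/i ? $1 : undef;
    }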


Michele
 

Walter Roberson

:
:> I have -several- multi-processor systems with contemporary O/S.
:> On the fastest of them, the CPU time involved in my processing
:> averages about 1 second per megabyte. CPU time is not the
:> issue here: it's waiting for the results from remote DNS
:> servers that is bogging the program down (e.g., over 8 hours
:> clock time to process a 112 MB file.)

:maybe it would be advantageous to separate the DNS lookups totally from
:your main process:
:a) perform a separate pass over the input, retrieving addresses to look up

Thanks. What I've done in the meantime is use Storable to save
and restore the lookup caches, so that later passes over the same
(or similar) data will execute more quickly. It isn't perfect
because it can still take a long time the first time around, and
because I'm not recording any lookup expiry information. I would
perhaps be better off using a nameserver that used a persistent
cache, but my investigation of bind9 shows that bind9's dump and
restore features are intended only for bind9 developers.

I am working on moving the lookups into threads, and am making
some progress but there is still much to do on it.
 

Walter Roberson

:The strategy seems sound, but I'm a little worried by the number of
:parallel lookups suggested, here and elsewhere in the thread. I'm
:no DNS expert, but there must be a limit to how many lookups the local
:system can deal with in a given time. Once you reach that limit,
:additional parallelization won't gain anything. The point of anti-social
:behavior may be reached before that.

True.

:Just from a gut feeling, 5-10
:parallel processes seem more realistic than 100.

In this particular case, the kernels are engineered so that
(say) 100 parallel lookups shouldn't be a significant problem.
They were designed from the ground up as high-performance
multiprocessing machines, complete with distributed signal handling
and distributed system call handling. But you do raise a good
point, so I will keep an eye on it during my testing.
 

Walter Roberson

:The strategy seems sound, but I'm a little worried by the number of
:parallel lookups suggested, here and elsewhere in the thread. I'm
:no DNS expert, but there must be a limit to how many lookups the local
:system can deal with in a given time. Once you reach that limit,
:additional parallelization won't gain anything. The point of anti-social
:behavior may be reached before that. Just from a gut feeling, 5-10
:parallel processes seem more realistic than 100.

I hit the limit rather sooner than that -- DNS lookups in threads
were noticeably slower than sequential lookups, even with only
one lookup thread active at a time.

I found something in perlthrtut that I believe accounts for this.
In the discussion of private vs shared variables, the documentation
notes that at the time the thread is created, private copies are
taken of all private variables, so that the thread can potentially
modify them in isolation. If so, then the slowness I saw would
be due to the time spent cloning my address space. My small
threads test program is not particularly slow, but the data in
my real program is non-trivial.

I am revising the program to be more aware of when it needs to spin
off threads and when it doesn't (particularly important when the
dns data is in cache already.) At some point, though, I'll probably
have to revise further to spin off some worker threads very early
on (before I load my data) and then distribute work to them as needed.

I guess I was expecting "copy on write" or other light-weight semantics.
Or expecting a way to be able to spin off a thread with a very limited
data context... sort of closure-like.
 

Ben Morrow

I guess I was expecting "copy on write" or other light-weight semantics.
Or expecting a way to be able to spin off a thread with a very limited
data context... sort of closure-like.

perldoc threads
| TODO
| The current implementation of threads has been an attempt to get a cor-
| rect threading system working that could be built on, and optimized, in
| newer versions of perl.
|
| Currently the overhead of creating a thread is rather large, also the
| cost of returning values can be large. These are areas were there most
| likely will be work done to optimize what data that needs to be cloned.

Ben
 

Walter Roberson

|I am revising the program to be more aware of when it needs to spin
|off threads and when it doesn't (particularly important when the
|dns data is in cache already.) At some point, though, I'll probably
|have to revise further to spin off some worker threads very early
|on (before I load my data) and then distribute work to them as needed.

I have worked on this parallel DNS lookups [for log file analysis] for
quite a few hours now, and I have learned some lessons:

1) perl 5.8.2 thread start-up time depends heavily upon the size of
your unshared variables. This is because ithreads copy all unshared
variables that are of potential use by the new thread -- there is,
at present, no copy-on-write semantics.

2) It seems it also can take a fair time to copy shared variables -- just
not *as* much time.

3) Because of the copying behaviour, you should start up your threads
after all the variables they will need have been set, but before
large variables they don't need have been created. For example, one
of my large hashes was not needed by the threads, so I postponed
initializing the hash.

4) You can only use the :shared syntax for 'my' variables. This makes
it harder to selectively create shared variables depending on whether
threads are present or not (but you can use the share() function.)

5) Using the share function on arrays or hashes wipes out the previous
values stored. Using the share function on a scalar preserves the existing
value.

6) You can't lock a hash element. This is documented, but documented
in terms of $hashref->{key} so it wasn't clear that locking $hash{key} would
not work.

7) Avoid creating new threads. This may mean introducing new functions
that do the work if it can be done locally and return a special status
if you really do need another thread. Depends how much you want your
utility functions to know about your thread management strategy.

8) If you need to have different thread results come out in a
particular order, you can use a semaphore and at the end of each
step 'up' by the [sequential] identifier of the next in line. It can
be a bit tricky to get this working properly... and it turns out to
be kind of slow.

8a) Instead, another way that works much, much faster is to drop the
result into a shared hash keyed by the sequence identifier and have a
mainline harvesting routine that pulls the results out in sequence
(see the sketch at the end of this post).

9) Avoid creating new threads, da.*it! Try a pool of threads and use
Thread::Queue to enqueue work for them -- that is *much* faster than
creating a new thread if you have large data structures sitting around
[see point 1.] I held off trying this approach for quite some time
thinking it was going to be hard to implement, but it turned out to be
relatively easy.

10) Thread::Queue won't handle refs or anything more than simple variables.
And it only allows a single item to be dequeued at a time -- not much
fun if you want to pass a bunch of parameters. Thread::Queue::Any
from CPAN handles complicated structures and is much more sensible
about groups of parameters, but it uses Storable to move the data around,
and sometimes it gets confused when it tries to thaw the data. If it
does get confused, then you will probably get an error message about
being unable to start up a thread -- I suspect it is referring to an
internal thread to handle an 'eval'.

11) perl -d gets confused and very messed up if you have multiple
threads running. But maybe I just haven't learned how to debug threads
yet.

12) lock so that you don't have shared hashes being written to
by multiple threads. Even if the different threads are writing to
different elements, you will end up with a corrupt hash. When I
accidentally turned my locking off, I ended up with strange results.

13) Detached threads take their own sweet time exiting, and there
is nothing you can do about it except sleep() until they are gone.

14) If you have a lot of detached threads, your program will probably
just die for no apparent reason. If you don't detach your threads, then
they are going to sit around using resources until you harvest them.
Consider having a routine that periodically join()'s to older threads
so as to release the resources. This is tricky because there is no
provided mechanism to tell whether a thread is active or waiting to
be shut down. Consider having the thread notify its completion by
mechanisms such as in 8a).

15) Cache the results of your threads if there is a realistic chance
you will need the result again. (See point 7.) The speedup can be
substantial.

16) Unless you put a lot of work in to optimizing thread usage, your
threaded program might be noticeably slower than proceeding non-threaded.

In the case of my log-file scanner, I am now at the point where
the non-threaded version is -usually- faster, but sometimes the
threaded version is faster, depending on system load, how much DNS
has been cached by the system, and so on. The variations in runtimes
of the non-threaded versions can exceed the runtime of the
threaded version.

Next step would probably be to try it on some really big input files;
I'm no longer using Thread::Queue::Any, so I'm expecting the program
now will not die after 400000 input lines or so.
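
For what it's worth, here is a bare-bones, untested sketch of the
pattern in points 8a and 9: a pool of workers fed by Thread::Queue, with
results harvested in input order from a shared hash. The
one-hostname-per-line input and the gethostbyname() call are stand-ins
for the real per-line work:

    use strict;
    use warnings;
    use threads;
    use threads::shared;
    use Thread::Queue;

    my $work = Thread::Queue->new;
    my %done :shared;              # results keyed by input sequence number

    # Start the pool before loading any large data (point 3).
    my $NWORKERS = 10;             # arbitrary pool size
    my @pool = map {
        threads->create(sub {
            while (1) {
                my $job = $work->dequeue;
                last if $job eq 'QUIT';                   # sentinel: no more work
                my ($seq, $host) = split /\t/, $job, 2;   # plain strings only (point 10)
                my $packed = gethostbyname($host);
                my $answer = defined $packed
                           ? join('.', unpack 'C4', $packed)
                           : 'unresolved';
                lock %done;                               # point 12: lock before writing
                $done{$seq} = "$host $answer";
            }
        });
    } 1 .. $NWORKERS;

    # Feed the queue from the main thread (assumes one hostname per line).
    my $seq = 0;
    while (my $line = <STDIN>) {
        chomp $line;
        $work->enqueue(join "\t", $seq++, $line);
    }
    $work->enqueue('QUIT') for @pool;      # one terminator per worker

    # Harvest in input order (the 8a pattern).
    my $next = 0;
    while ($next < $seq) {
        my $result;
        {
            lock %done;
            $result = delete $done{$next};
        }
        if (defined $result) { print "$result\n"; $next++ }
        else                 { sleep 1 }
    }
    $_->join for @pool;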
 

Walter Roberson

[lessons learned about perl threads]

17) If you try to use Storable to save a shared variable, you will
just get a string that makes reference to Tied, and the actual contents
will not be saved. Workaround: copy the data into a temporary variable
and save that temporary variable.
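
(A sketch of that workaround, assuming a shared hash %results:)

    use Storable qw(nstore);
    use threads::shared;

    my %results :shared;             # the shared (tied) hash
    # ...
    my %plain = %results;            # shallow copy into an ordinary hash
    nstore(\%plain, 'results.sto');  # Storable now sees untied data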
 
