Threaded Perl Processes Going to Sleep Simultaneously??? Why?

Roycedot

Hi,

I've written a Perl program that creates 10 threads (ithreads) and
each thread operates on a different part of a large array. The large
array is just a file read in before the threads are started. Each
thread processes a chunk of the array and then prints the processed
output to a shared text file.
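In case it matters: writes from several ithreads to one shared output file generally need explicit serialization, or lines can interleave. A minimal sketch using a shared scalar as a mutex (the file name and messages here are made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use IO::Handle;

# A shared scalar used only as a mutex guarding the output handle.
my $out_lock : shared;

open my $out, '>', 'combined.txt' or die "open: $!";
$out->autoflush(1);

sub write_result {
    my ($line) = @_;
    lock($out_lock);           # one thread prints at a time; the lock
    print {$out} "$line\n";    # is released when this scope exits
}

# Each thread pretends to process its own chunk and writes one line.
my @workers = map {
    my $id = $_;
    threads->create(sub { write_result("chunk $id processed") });
} 1 .. 10;

$_->join for @workers;
close $out or die "close: $!";
```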

I run different instances of this Perl program so I can process
multiple files, but oddly all instances of this program go to sleep at
the same time and then come back at the same time. At least this is
what I see with the 'top' command in Linux.

Now, this could be a process management problem on my Linux, but I was
wondering if anyone would know any other reason? I'm 99% sure it's
not deadlock when the threads for each program write to output files
because each program writes to a different output file and yet all the
programs go to sleep at the same time.

I thought it could be a CPU limit issue on my box, but 'limit' says
CPU time is unlimited, and I am also now running as root with each
program reniced to -20.

Any ideas would be appreciated.
 
X

xhoster

> Hi,
>
> I've written a Perl program that creates 10 threads (ithreads) and
> each thread operates on a different part of a large array. The large
> array is just a file read in before the threads are started.

Either this large array is marked shared, and thus you are sharing a
large array, or it is not marked shared and thus you are replicating a
large array 10 times. I don't know which of these prospects is less
enticing.
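The distinction shows up in a few lines; a sketch of the two behaviors (array contents are arbitrary):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

# Not shared: every thread spawned after this gets its own full copy.
my @private = (1 .. 5);

# Shared: one copy, visible to (and lockable by) all threads.
my @shared : shared = (1 .. 5);

my $t = threads->create(sub {
    $private[0] = 99;   # changes only this thread's copy
    $shared[0]  = 99;   # changes the one shared copy
});
$t->join;

print "private: $private[0]\n";   # parent's copy is untouched: 1
print "shared:  $shared[0]\n";    # reflects the thread's write: 99
```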

> Each thread processes a chunk of the array and then prints the
> processed output to a shared text file.
>
> I run different instances of this Perl program so I can process
> multiple files,

So you have multiple processes, each of which has 10 threads?

> but oddly all instances of this program go to sleep at the same time
> and then come back at the same time. At least this is what I see with
> the 'top' command in Linux.

I don't see why that is surprising. Say there is some resource that all
of your jobs are trying to use: the hard drive, the network, whatever.
If that resource freezes up temporarily (say, because 4 processes with
10 threads, for a total of 40 things, are beating it over the head), all
your jobs will block waiting for it to come back.

Xho
 
Roycedot

> Either this large array is marked shared, and thus you are sharing a
> large array, or it is not marked shared and thus you are replicating a
> large array 10 times. I don't know which of these prospects is less
> enticing.

First off, thank you for the reply. The array is duplicated for each
thread (not using shared).

> So you have multiple processes, each of which has 10 threads?

Yes, I have two processes - and each uses 10 threads.

> I don't see why that is surprising. Say there is some resource that
> all of your jobs are trying to use: the hard drive, the network,
> whatever. If that resource freezes up temporarily (say, because 4
> processes with 10 threads, for a total of 40 things, are beating it
> over the head), all your jobs will block waiting for it to come back.

So I have two processes, each with 10 threads - it seems unreasonable
that all 20 threads would try to access the hard drive at the same
time and thus all 20 are blocking. Once the threads are started, each
thread processes a block of the array, prints to a file, and then comes
back for the next unprocessed block. Since the time it takes to process
a block is not going to be the same each time, and the time to process
a block is at least as long as printing the output, there should be no
reason all threads are trying to print to the file at the same time. I
would think that if at least one thread was not trying to write to the
hard drive, the process that owns that thread would show as running
when I 'top'.
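A loop like that - grab a block, process it, print, come back for more - is usually built on a work queue; a minimal sketch with Thread::Queue (the block names and the uc() "processing" are placeholders):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Fill the queue with work items up front.
my $queue = Thread::Queue->new(map { "block-$_" } 1 .. 20);

# One undef sentinel per worker tells that worker to stop.
$queue->enqueue(undef) for 1 .. 10;

my @workers = map {
    threads->create(sub {
        my $count = 0;
        while (defined(my $block = $queue->dequeue)) {
            my $result = uc $block;   # stand-in for real processing
            $count++;
        }
        return $count;                # blocks handled by this thread
    });
} 1 .. 10;

my $total = 0;
$total += $_->join for @workers;
print "processed $total blocks\n";
```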

This could be a machine-specific problem. I ran the same two
processes on another box and they cranked along nicely once I set up
their nice levels appropriately.

I suspect it could be because I'm running a 32-bit SMP Linux on a
64-bit dual-core machine... I'm going to install the 64-bit version
this weekend, as I was planning on doing that anyway.

I appreciate the response. But since the processes work fine on
another machine, continuing this thread on a Perl forum may be
unnecessary.

Thanks again.
 
xhoster

Roycedot said:

> So I have two processes, each with 10 threads - it seems unreasonable
> that all 20 threads would try to access the hard drive at the same
> time and thus all 20 are blocking. Once the threads are started, the
> threads process a block of the array, print to a file, and then come
> back for the next unprocessed block.

What kind of processing is being done to each block? If it were CPU
bound, then it seems unlikely you would want to run this many in
parallel in the first place (unless you are one of the very rare, very
lucky people posting here who have 16+ CPU boxes sitting around). And
if it isn't CPU bound, then there must be something else that is
limiting, and whatever it is could then make all your jobs stop at the
same time.
> Since the time it takes to process a block is not going to be the same
> each time and the time to process the block is at least as long as
> printing the output, there should be no reason all threads are trying
> to print to the file at the same time.

This seems unlikely for a simple local partition. If it is something
like NFS, though, it could just freeze up for a few seconds (or even
minutes, as I've seen before) and all the threads would finish their
processing and go to print, piling up there. So if they all need to
print once every few seconds, it could easily look like they are all
stopping at about the same time.
> I would think if at least 1 thread was not trying to write to the hard
> drive, the process that owns that thread would be running when I 'top'

Another thing to consider is swapping. Does one of the swap processes
jump up in top when the perl processes freeze?
> This could be a machine-specific problem. I ran the same two processes
> on another box and they cranked along nicely once I set up their nice
> levels appropriately.
>
> I suspect it could be because I'm running a 32-bit SMP Linux on a
> 64-bit dual-core machine... I'm going to install the 64-bit version
> this weekend, as I was planning on doing that anyway.
>
> I appreciate the response. But since the processes work fine on
> another machine, continuing this thread on a Perl forum may be
> unnecessary.

If it could still be something specific to the Perl implementation, I
would still consider it on-topic. My next step would probably be to
strace a job and see if the freezes are occurring in a particular
system call, then figure out what part of perl is issuing that system
call.
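For reference, tracing an already-running job and all of its threads might look like this (the PID, log name, and script name are placeholders):

```shell
# Attach to a running perl process, follow all of its threads (-f),
# timestamp each call (-tt), and log to a file for later inspection.
strace -f -tt -o perl-trace.log -p 12345

# Or run the job under strace from the start and get a per-syscall
# time summary, which shows which calls the process spends its waits in.
strace -c -f perl threaded_job.pl
```

A long stretch of the log where every thread sits in the same blocking call (e.g. `write`, `futex`, or a network `read`) points straight at the contended resource.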

Xho
 
Roycedot

> What kind of processing is being done to each block? If it were CPU
> bound, then it seems unlikely you would want to run this many in
> parallel in the first place (unless you are one of the very rare, very
> lucky people posting here who have 16+ CPU boxes sitting around). And
> if it isn't CPU bound, then there must be something else that is
> limiting, and whatever it is could then make all your jobs stop at the
> same time.

Each thread takes a string, uses the string to make a request to one
local server, transforms the string into more strings, then makes a
request to a non-local server, then applies some logic to create more
text, and finally writes the output to disk. CPU is the main resource,
but each process is still taking less than 50% of each of my CPUs. I
have a Xeon 3.0 GHz dual-core.

One could argue that there is blocking happening because of the
requests to the non-local server, but, again, the processes don't sleep
when I run on another box with the same number of processes and
threads.
> This seems unlikely for a simple local partition. If it is something
> like NFS, though, it could just freeze up for a few seconds (or even
> minutes, as I've seen before) and all the threads would finish their
> processing and go to print, piling up there. So if they all need to
> print once every few seconds, it could easily look like they are all
> stopping at about the same time.

These are writes of ~8,000 characters to a local partition. I have a
10k SATA drive.
> Another thing to consider is swapping. Does one of the swap processes
> jump up in top when the perl processes freeze?

No swap - the machine goes to 0 load.

> If it could still be something specific to the Perl implementation, I
> would still consider it on-topic. My next step would probably be to
> strace a job and see if the freezes are occurring in a particular
> system call, then figure out what part of perl is issuing that system
> call.

That's a good recommendation - I'll try strace and see what's going
on. I just didn't want to be impolite and ask about a problem that
could very well be unrelated to Perl.

I'll try strace and then install a 64-bit kernel and perl.

Thanks for the help.
 
