Reaping dead children


Mike Dowling

I'm having problems reaping my dead children.

I have written a perl script that starts a monster program that
determines optimal schedules for my company's power plants. It can
happen that the optimisation software takes hours, months, or years.
Obviously, we would like to interrupt the program when that happens, but
my perl script is blocked until the monster finishes, or until it's
interrupted. So I start a second process with "fork" that writes info
from the log file that the monster is continually writing. I don't want
that second process to become a zombie, so I reap dead children using:

use POSIX ":sys_wait_h";

$SIG{CHLD} = \&REAPER;

sub REAPER { # reap my dead children to avoid zombies
    my $stiff;
    while (($stiff = waitpid(-1, WNOHANG)) > 0) { }
    $SIG{CHLD} = \&REAPER;
}

Now my child can rest in peace.

But, and here's the problem, if monster terminates with an error
message, I want to catch that, and analyse why. With any kind of
reaping, including

$SIG{CHLD} = 'IGNORE'

(I am really not interested in why my child dies), the status code
returned by monster is changed. I don't care what happens to dead
children started by "fork", but it is vital to check why a child dies
that is started with "system". What can be done?

Cheers,
Mike Dowling
 

Brian McCauley

I seem to be dogged with Usenet posting problems. Let's try again...

Mike said:
I'm having problems reaping my dead children.

I have written a perl script that starts a monster program that
determines optimal schedules for my company's power plants. It can
happen that the optimisation software takes hours, months, or years.
Obviously, we would like to interrupt the program when that happens, but
my perl script is blocked until the monster finishes, or until it's
interrupted. So I start a second process with "fork" that writes info
from the log file that the monster is continually writing. I don't want
that second process to become a zombie,

Why not just let the zombies persist until the monster has finished then
explicitly reap using waitpid($pid,0)? No need for handlers.
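A minimal sketch of that suggestion, under some assumptions: the sub name run_with_helper is hypothetical, the helper callback stands in for the log-following child, and a perl one-liner stands in for the monster. With no SIGCHLD handler installed, system() reports the monster's real wait status, and the helper is reaped explicitly afterwards.

```perl
use strict;
use warnings;

# Run a command with system() while a forked helper runs alongside it.
# No SIGCHLD handler is installed, so $? after system() is the
# command's own wait status.
sub run_with_helper {
    my ($helper, @cmd) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        $helper->();       # child: e.g. follow the log file
        exit 0;
    }

    system(@cmd);
    my $status = $?;       # save it before waitpid() overwrites $?

    kill 'TERM', $pid;     # the helper is no longer needed
    waitpid($pid, 0);      # reap it explicitly; no zombie is left behind

    return $status;
}

# Hypothetical stand-in for the monster: a one-liner that exits with 3.
my $status = run_with_helper(sub { sleep 60 }, $^X, '-e', 'exit 3');
printf "exit code %d, signal %d\n", $status >> 8, $status & 127;
```

If the helper exits early it stays a zombie only until the waitpid() call, which is harmless for a single child.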
 

Ben Morrow


Create the 'monster' child yourself with fork/exec. Then you don't need
the other child: the main program can deal with reading the logfile and
killing it when necessary.
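A rough sketch of that layout, with several assumptions: the sub name supervise, the on_tick callback (where the log file would be read) and the timeout used as the "kill it when necessary" rule are all hypothetical, and a perl one-liner stands in for the monster. The main program polls with waitpid and WNOHANG, so it never blocks and needs no signal handler.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

# Start @cmd with fork/exec and supervise it from the main program.
# $on_tick is called on every poll (e.g. to read the log file).
# Returns ($status, $timed_out).
sub supervise {
    my ($timeout, $on_tick, @cmd) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: become the monster; leave quietly if exec fails.
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }

    my $deadline = time + $timeout;
    my $killed   = 0;
    while (1) {
        my $reaped = waitpid($pid, WNOHANG);    # non-blocking check
        return ($?, $killed) if $reaped == $pid;
        die "waitpid failed: $!" if $reaped == -1;

        if (!$killed && time >= $deadline) {
            kill 'TERM', $pid;                  # assumed kill policy
            $killed = 1;
        }
        $on_tick->($pid);
        select(undef, undef, undef, 0.2);       # short pause between polls
    }
}

# Hypothetical stand-in for the monster: exits at once with status 3.
my ($status, $timed_out) = supervise(10, sub { }, $^X, '-e', 'exit 3');
printf "exit code %d, signal %d, timed out: %s\n",
    $status >> 8, $status & 127, $timed_out ? "yes" : "no";
```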

To answer your actual question: $? is only valid *inside* the
sighandler, and if you have one, the return value of system is invalid.
So find out the pid of your 'monster' child (you will need to do the
system by hand with fork/exec to get it) and store it in a global. In
the sighandler, check whether $stiff == $monster, and if it does, stick
$? into another global that you can look at back in the main program.
You *will* get nasty effects if a waitpid($monster) in the main
program and a waitpid(-1) in the sighandler run at the same time;
I'd probably use eval { POSIX::pause } in the main program and then die
in the sighandler once we've got the monster child.

Yes, all this business with globals is pretty nasty; but that's signals
for you, I'm afraid... :). You may be able to make the sighandler a
closure: something like

sub run_monster {
    my $monster_pid = ...;
    my $monster_status;

    $SIG{CHLD} = sub {
        ...;
        if ($stiff == $monster_pid) {
            $monster_status = $?;
            die;
        }
    };
}

but I haven't tested that.
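For comparison, here is a variant of the closure idea that can be run as written. It is a sketch under assumptions, not the code above filled in: the handler records the status of every reaped child in a hash keyed by pid (so it does not matter whether the monster dies before its pid has been stored), and the main program waits with short sleeps instead of pause and die, which sidesteps the case where the signal arrives just before the pause. A perl one-liner stands in for the monster.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

sub run_monster {
    my (@cmd) = @_;
    my %status;    # pid => wait status, filled in by the handler

    # The handler is a closure over %status. It reaps every dead
    # child, so helper children never linger as zombies either.
    local $SIG{CHLD} = sub {
        local ($!, $?);
        while ((my $stiff = waitpid(-1, WNOHANG)) > 0) {
            $status{$stiff} = $?;
        }
    };

    my $monster_pid = fork();
    die "fork failed: $!" unless defined $monster_pid;
    if ($monster_pid == 0) {
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }

    # sleep() returns early when a signal arrives; the one-second cap
    # covers a signal that lands between the check and the sleep.
    sleep 1 until exists $status{$monster_pid};
    return $status{$monster_pid};
}

# Hypothetical stand-in for the monster: a one-liner that exits with 3.
my $status = run_monster($^X, '-e', 'exit 3');
printf "monster exit code %d, signal %d\n", $status >> 8, $status & 127;
```

The main program never calls waitpid itself, so the handler's waitpid(-1) has nothing to compete with.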

Ben
 
