sleep/fork/shell/SIGCHLD interaction problem

Justin Fletcher

Hiya,

I'm having a problem trying to get a simple program to respond the way
that I expect. The basic premise is thus :

1. Fork a child.
2. Sleep for a while.
3. Do other stuff.

This seems pretty simple, and I have a SIGCHLD handler which will catch my
forked process if it exits. I thought everything was fine. Then I found
that if I press ctrl-Z to suspend the parent whilst I'm running the
program and then background it, it hangs. I've reduced the problem to the
simplest I can, as follows :

----
#!/bin/perl

$SIG{'CHLD'} = sub {
    print "SIGCHLD\n";
    $pid = wait;
    print "leave SIGCHLD for pid $pid\n";
};

print "Forking to do some long running task\n";
unless ($pid = fork) {
    $SIG{'CHLD'} = 'DEFAULT';
    exec "tail -f /dev/null";
    die "failed\n";
};

print "Sleeping\n";
sleep 50;
print "Waking\n";
----

The problem is that if I press ctrl-Z whilst the program is sleeping, and
then resume it in the background with 'bg', a SIGCHLD is triggered. The
handler then does a 'wait' to get the PID and hangs because there isn't a
child that's exited. We never leave the SIGCHLD handler (unless the long
running task completes). The use of 'tail -f /dev/null' is purely to
simulate a task which just keeps running.

In the shell, the following sequence is seen:

----
justin@buttercup:~/Root/perltest$ perl testsleep.pl
Forking to do some long running task
Sleeping

[1]+ Stopped perl testsleep.pl
justin@buttercup:~/Root/perltest$ bg
[1]+ perl testsleep.pl &
SIGCHLD
justin@buttercup:~/Root/perltest$
----

I'm running bash 3.1.17, linux kernel 2.6.18, from debian stable, with
perl 5.8.8.

I believe this sort of construct to be normal and even recommended from
the perlipc pages; so... am I doing something wrong ? is bash ? is the
kernel ? is perl ?

I'm hoping I'm just misunderstanding how process control should be done.
 
Martijn Lievaart

The problem is that if I press ctrl-Z whilst the program is sleeping,
and then resume it in the background with 'bg', a SIGCHLD is triggered.
The handler then does a 'wait' to get the PID and hangs because there
isn't a child that's exited. We never leave the SIGCHLD handler (unless
the long running task completes). The use of 'tail -f /dev/null' is
purely to simulate a task which just keeps running.

I believe this sort of construct to be normal and even recommended from
the perlipc pages; so... am I doing something wrong ? is bash ? is the
kernel ? is perl ?

I'm hoping I'm just misunderstanding how process control should be done.

It seems you are getting signals for the stop and start of the child, see
man sigaction and look at the possible CHLD signals.

This is worrying, your code is quite a normal construct and there must be
a lot of production code out there that has this same problem.

Additionally I could not find out how to get at the si_code for the
signal.

The solution seems to me to use (thanks to perldoc perlipc):

#!/usr/bin/perl

use strict;
use warnings;
use POSIX ":sys_wait_h";

sub REAPER {
    print "entering reaper\n";
    my $child;
    # If a second child dies while in the signal handler caused by the
    # first death, we won't get another signal. So must loop here else
    # we will leave the unreaped child as a zombie. And the next time
    # two children die we get another zombie. And so on.

    # Also, we can get signals on stopping and continuation of children
    # so there is no process to wait on

    while (($child = waitpid(-1,WNOHANG)) > 0) {
        print "Reaped $child: $?\n";
    }
    $SIG{CHLD} = \&REAPER; # still loathe sysV
    print "Leaving reaper\n";
}
$SIG{CHLD} = \&REAPER;

my $pid;
print "Forking to do some long running task\n";
unless ($pid = fork) {
    $SIG{'CHLD'} = 'DEFAULT';
    my $i=0;
    while (1) {
        print $i++, "\n";
        sleep 1;
    }
}

print "pid=$pid\n";
print "Sleeping\n";
sleep 20;
print "Waking\n";
kill 'INT', $pid;
sleep 2;
 
Ben Morrow

Quoth Justin Fletcher said:
The problem is that if I press ctrl-Z whilst the program is sleeping, and
then resume it in the background with 'bg', a SIGCHLD is triggered.

This is expected behaviour if your signal handler is installed with
sigaction without specifying the SA_NOCLDSTOP flag, which is what perl
does. See your system's sigaction(2).

The handler then does a 'wait' to get the PID and hangs because there
isn't a child that's exited.

You shouldn't simply call wait in a SIGCHLD handler, anyway. You don't
know how many children have exited before you could handle the signal.
The usual idiom is something like

use POSIX qw/:sys_wait_h/;

$SIG{CHLD} = sub { 1 while 0 < waitpid -1, WNOHANG };

which will wait for everything that needs waiting for. See perlipc for
examples which let you get the child pid and exit status, and waitpid(2)
for how to check for children that have stopped/continued.
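
For readers following along, here is a minimal sketch of the waitpid(2) check mentioned above. It assumes Perl 5.10 or later for ${^CHILD_ERROR_NATIVE} (an older perl would have to pass $? instead), and the names describe_status and @events are made up for illustration. As far as I know Perl's POSIX module exports WUNTRACED but not WCONTINUED, so this only tells stopped children apart from dead ones.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

# Turn a raw wait status into a short description.
# (describe_status is a hypothetical helper, not part of POSIX.)
sub describe_status {
    my ($status) = @_;
    return 'stopped by signal ' . WSTOPSIG($status)     if WIFSTOPPED($status);
    return 'killed by signal ' . WTERMSIG($status)      if WIFSIGNALED($status);
    return 'exited with status ' . WEXITSTATUS($status) if WIFEXITED($status);
    return 'unknown';
}

our @events;    # filled in by the handler, inspected by the main program

$SIG{CHLD} = sub {
    local ($!, $?);    # don't clobber the main program's $! and $?
    # WNOHANG: never block. WUNTRACED: also report children that stopped.
    while ((my $kid = waitpid(-1, WNOHANG | WUNTRACED)) > 0) {
        push @events, "$kid: " . describe_status(${^CHILD_ERROR_NATIVE});
    }
};
```

Because of WNOHANG the loop simply ends when there is nothing to report, so a SIGCHLD caused by a resumed child cannot hang the handler the way a bare wait does.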

In the shell, the following sequence is seen:

----
justin@buttercup:~/Root/perltest$ perl testsleep.pl
Forking to do some long running task
Sleeping

[1]+ Stopped perl testsleep.pl

How do you think the shell knew its child had stopped? It relies on
SIGCHLD being sent when the process's status changes.

Ben
 
xhoster

Justin Fletcher said:
Hiya,

I'm having a problem trying to get a simple program to respond the way
that I expect. The basic premise is thus :

1. Fork a child.
2. Sleep for a while.
3. Do other stuff.

This seems pretty simple, and I have a SIGCHLD handler which will catch
my forked process if it exits. I thought everything was fine. Then I
found that if I press ctrl-Z to suspend the parent whilst I'm running the
program and then background it, it hangs.

I find that this only occurs if I hit ctrl-Z from the keyboard. If I
send the process the TSTP signal via some other means, it doesn't happen.
I know that shells often respond to ctrl-Z, ctrl-C, etc, by sending signals
to entire process groups, rather than just the main process. I don't
know exactly how this leads to the observed phenomena, though.

Also, by using "strace", I see that the process actually is getting a
SIGCHLD (as opposed to some bug in Perl causing it to think that it did
when really it didn't).

<snip. Thank you for providing the sample code. But I don't think I need
to quote it.>

I believe this sort of construct to be normal and even recommended from
the perlipc pages; so... am I doing something wrong ? is bash ? is the
kernel ? is perl ?

I see the same or similar behavior under tcsh. So I'm thinking it is the
kernel. I often find that programs which spawn other programs do not behave
well when put into the background after the fact, but yours is the only
simple demonstration of this that I've seen. When using programs that fork
or spawn others, I've learned to try to start such programs in the
background with &, and if I forget then I just kill them and restart them
in the background rather than using ctrl-Z.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
Ben Morrow

Quoth (e-mail address removed):
I find that this only occurs if I hit ctrl-Z from the keyboard. If I
send the process the TSTP signal via some other means, it doesn't happen.
I know that shells often respond to ctrl-Z, ctrl-C, etc, by sending signals
to entire process groups, rather than just the main process. I don't
know exactly how this leads to the observed phenomena, though.

SIGCHLD is sent to the parent whenever a child changes status. So when
you press ctrl-Z, the whole process group is signalled, the child is
stopped, and the parent gets a SIGCHLD. When the process group is
resumed (bg or fg) the parent gets another SIGCHLD: since it hasn't
responded to the first yet (because it was stopped), this is not
usually apparent.

If the OP really doesn't want SIGCHLDs when a child stops, he can
install the signal handler explicitly with sigaction and SA_NOCLDSTOP
(under systems which support that). Since one must assume that any
number of children may have exited when handling SIGCHLD anyway,
including 0 in 'any number' is generally easier.
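
A rough sketch of what installing the handler through sigaction with SA_NOCLDSTOP might look like from Perl, using the POSIX module's SigAction and SigSet classes. The names reaper and @reaped are invented for this example, and it assumes a system that supports SA_NOCLDSTOP.

```perl
use strict;
use warnings;
use POSIX qw(:signal_h :sys_wait_h);

our @reaped;    # [pid, wait status] for every child the handler collected

sub reaper {
    local ($!, $?);    # keep the main program's $! and $? intact
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        push @reaped, [$kid, $?];
    }
}

# Install the handler via sigaction(2) so that SA_NOCLDSTOP can be passed:
# SIGCHLD should then be sent only when a child terminates, not when it
# is stopped or resumed.
my $action = POSIX::SigAction->new(\&reaper, POSIX::SigSet->new, SA_NOCLDSTOP);
$action->safe(1);    # dispatch through perl's deferred ("safe") signals
sigaction(SIGCHLD, $action) or die "sigaction: $!";
```

The handler still loops with WNOHANG, so it copes with zero, one or several exited children even on a system where the flag has no effect.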

Ben
 
xhoster

Ben Morrow said:
Quoth (e-mail address removed):

SIGCHLD is sent to the parent whenever a child changes status. So when
you press ctrl-Z, the whole process group is signalled, the child is
stopped, and the parent gets a SIGCHLD. When the process group is
resumed (bg or fg) the parent gets another SIGCHLD: since it hasn't
responded to the first yet (because it was stopped), this is not
usually apparent.

Thanks for the explanation. I did notice sometimes the parent went into
the $SIG{CHLD} code when ctrl-Z was hit. Presumably the child received its
TSTP first, and the parent for some reason got the CHLD from that before it
got the initial TSTP.

If the OP really doesn't want SIGCHLDs when a child stops, he can
install the signal handler explicitly with sigaction and SA_NOCLDSTOP
(under systems which support that).

Oy. That forces me to know more about the system thing than I wish I had
to know, at least for such a conceptually simple thing. Not that that is
surprising--there are limits to how much Perl can do to insulate me. malloc
and free it does a good job of, but signals I guess are harder.

Since one must assume that any
number of children may have exited when handling SIGCHLD anyway,

This is only true if one knows there is more than one child to exit, or one
is writing code that is only a small part of a larger unknown system. If
one knows that there is only one child to exit, because only one has been
started, then one doesn't need to assume that any number greater than one
may have exited. And if it *is* part of a larger system, then all the
other parts need to agree on how to go about doing it. If one part does a
waitpid -1, WNOHANG and comes up with some other part's child, that could
cause problems. Maybe there should be a way to unwait on a child, which
would store the pid and exit status away somewhere, then if the localized
$SIG{CHLD} becomes unlocalized it would fire a fake SIGCHLD and waitpid
could return the stored away value when it is next called.
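
Short of an "unwait", one way to keep the parts of a larger program from stealing each other's children is to never call waitpid with -1 and to poll only the PIDs one started oneself. This is an illustrative sketch only; spawn, reap_known, %kids and %done are hypothetical names, not anything from perlipc.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

our %kids;    # pid => 1 for children started through spawn() and still running
our %done;    # pid => wait status for children already collected

# Fork a child that runs $code, and remember its pid.
sub spawn {
    my ($code) = @_;
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        $SIG{CHLD} = 'DEFAULT';
        $code->();
        POSIX::_exit(0);    # skip END blocks inherited from the parent
    }
    $kids{$pid} = 1;
    return $pid;
}

# Poll only our own children; never waitpid(-1), so children started by
# other code are left for that code to collect.
sub reap_known {
    local ($!, $?);
    for my $pid (keys %kids) {
        next unless waitpid($pid, WNOHANG) == $pid;
        $done{$pid} = $?;
        delete $kids{$pid};
    }
}

$SIG{CHLD} = \&reap_known;
```

One caveat: a child that exits before spawn has recorded its pid is missed by the handler, so reap_known should also be called from the main program now and then.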

including 0 in 'any number' is generally easier.

I find it easier to design/work around the need to ever set $SIG{CHLD} (to
anything other than the default or IGNORE) in the first place. :)

I'm perhaps fortunate in that I've usually been able to do so. Obviously,
not all people will be lucky enough to be able to get away with that.

Xho

 
Eric Schwartz

Oy. That forces me to know more about the system thing than I wish I had
to know, at least for such a conceptually simple thing. Not that that is
surprising--there are limits to how much Perl can do to insulate me. malloc
and free it does a good job of, but signals I guess are harder.

I am reminded of some commercial Unix kernel hackers who were
responsible for the signal handling code. They had a pole 8 or 10
feet high with a sign on the top saying, "You must be THIS TALL to use
signals." As much as possible, they included themselves in this rule.

-=Eric
 
comp.llang.perl.moderated

I think you could lose the SIGCHLD handler,
as it's not necessary at all here. You're
not spawning multiple processes, and SIGTSTP
is problematic, as you've seen. A simple
waitpid on the child should eliminate the
problems, e.g.,

my $pid = fork;
die "fork: $!" unless defined $pid;

unless ($pid) { # child
    exec "tail -f /dev/null"
        or die "exec failed: $!\n";

} else { # parent
    sleep 50;
    waitpid $pid, 0;
}
 
Justin Fletcher

Quoth (e-mail address removed):

SIGCHLD is sent to the parent whenever a child changes status. So when
you press ctrl-Z, the whole process group is signalled, the child is
stopped, and the parent gets a SIGCHLD. When the process group is
resumed (bg or fg) the parent gets another SIGCHLD: since it hasn't
responded to the first yet (because it was stopped), this is not
usually apparent.

If the OP really doesn't want SIGCHLDs when a child stops, he can
install the signal handler explicitly with sigaction and SA_NOCLDSTOP
(under systems which support that). Since one must assume that any
number of children may have exited when handling SIGCHLD anyway,
including 0 in 'any number' is generally easier.

Thanks for your (everyone on this group) help! I hadn't appreciated that
SIGCHLD was delivered for all the information signals, or that there might
be multiple children present.

The explanations given have helped me resolve the odd hangs I've been
getting. Yay :)
 
