having problems with open4 and stuck forked processes

Tim Uckun

I am running a batch process which uses the wkhtmltoimage-i386 binary
to make screenshots of urls. Unfortunately this is in beta and it
frequently hangs up and takes up 100% of one of the CPUs on the
machine.

I have the following code to try and detect the hung process and kill
it but it doesn't always work and I was wondering if anybody has a
better idea of how to do this. When I run it by testing simple
commands like sleep it works perfectly. In production with this binary
it doesn't seem to always work.

def Util.shell_with_timeout(cmd, seconds = 3600)
  # the default timeout is an hour. That's probably way too long

  Timeout::timeout(seconds) {
    @pid, @stdin, @stdout, @stderr = Open4.popen4(cmd)
    ignored, @status = Process::waitpid2 @pid
    if @status.exitstatus != 0
      raise "Exit Status not zero"
    end
  }

  @stdout ? @stdout.read.strip : ''
rescue Timeout::Error
  Process.detach @pid
  Process.kill 'SIGKILL', @pid
  raise "Process Timed out"
rescue => e
  msg = @stderr ? @stderr.read.strip : ''
  msg += e.to_s
  raise "Error during execution of command #{cmd}\n #{msg}"
end
 
Robert Klemme

I am running a batch process which uses the wkhtmltoimage-i386 binary
to make screenshots of urls. Unfortunately this is in beta and it
frequently hangs up and takes up 100% of one of the CPUs on the
machine.

I have the following code to try and detect the hung process and kill
it but it doesn't always work and I was wondering if anybody has a
better idea of how to do this. When I run it by testing simple
commands like sleep it works perfectly. In production with this binary
it doesn't seem to always work.

What do you mean by that? Does the timeout go undetected? Can't you
kill the process? Are there any unexpected error messages /
exceptions?
def Util.shell_with_timeout(cmd, seconds = 3600)
  # the default timeout is an hour. That's probably way too long

  Timeout::timeout(seconds) {
    @pid, @stdin, @stdout, @stderr = Open4.popen4(cmd)
    ignored, @status = Process::waitpid2 @pid
    if @status.exitstatus != 0
      raise "Exit Status not zero"
    end
  }

  @stdout ? @stdout.read.strip : ''
rescue Timeout::Error
  Process.detach @pid
  Process.kill 'SIGKILL', @pid
  raise "Process Timed out"
rescue => e
  msg = @stderr ? @stderr.read.strip : ''
  msg += e.to_s
  raise "Error during execution of command #{cmd}\n #{msg}"
end

A frequent problem with #popen methods is to not read file descriptors
which can make the client hang (i.e. if it writes more than fits into
a pipe). That could be something to check since you are not reading
any of the streams.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
Tim Uckun

What do you mean by that? Does the timeout go undetected? Can't you
kill the process? Are there any unexpected error messages /
exceptions?

Obviously the timeout is not detected. I am not sure about the
exceptions as it happens when I am not looking but I will ramp up the
logging and see if I can trap anything.
A frequent problem with #popen methods is to not read file descriptors
which can make the client hang (i.e. if it writes more than fits into
a pipe). That could be something to check since you are not reading
any of the streams.

If you have any pointers to documentation about this I would really
appreciate it. I know so little about unix processes and pipes and
such.
 
Robert Klemme

Obviously the timeout is not detected.

I don't find that obvious at all from your initial description.
I am not sure about the
exceptions as it happens when I am not looking but I will ramp up the
logging and see if I can trap anything.

You could start by doing

Thread.abort_on_exception = true

at the beginning of your script.
If you have any pointers to documentation about this I would really
appreciate it. I know so little about unix processes and pipes and
such.

I don't have anything handy but I guess Google will help.

A pipe is basically what it sounds like: a piece of pipe which you
write to at one end and read from at the other end. At the read end
there is a valve. If nobody reads, the valve stays closed, and once the
pipe is full you can't fill in more at the write end. If you use
blocking IO, the writing process blocks in the system call and won't
become active again until something reads from the other end. (This is
a bit simplistic because it leaves threads and interpreter
implementation out of the picture, but this is basically what happens.)

http://en.wikipedia.org/wiki/Pipe_(Unix)
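
To make the "valve" picture concrete, here is a minimal sketch that fills a pipe nobody is reading and counts how many bytes fit. The helper name fill_pipe is made up for illustration, and the capacity it reports depends on the operating system. It uses write_nonblock so that the script itself does not hang at the point where a blocking write would.

```ruby
# Fill a pipe that nobody reads and count how many bytes fit.
reader, writer = IO.pipe

# Hypothetical helper: write until the pipe refuses more data.
def fill_pipe(writer)
  written = 0
  chunk = 'x' * 1024
  loop do
    # write_nonblock raises IO::WaitWritable instead of blocking when full.
    written += writer.write_nonblock(chunk)
  end
rescue IO::WaitWritable
  written
end

capacity = fill_pipe(writer)
puts "pipe accepted #{capacity} bytes before a blocking write would stall"

# Reading from the other end "opens the valve" and makes room again.
reader.read_nonblock(capacity)
drained_ok = !IO.select(nil, [writer], nil, 1).nil?
puts "writable again after reading: #{drained_ok}"
```

A child process that writes more than this to stdout or stderr while the parent only sits in waitpid would block in exactly this way.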

Kind regards

robert
 
elise huard

Thread.abort_on_exception = true
at the beginning of your script.

Euhm (asking this because I honestly don't know) will this work for
Processes ? (he's not using Thread)

Elise
 
Brian Candler

elise said:
Euhm (asking this because I honestly don't know) will this work for
Processes ? (he's not using Thread)

Timeout::timeout uses a thread internally - and it raises an exception
asynchronously in the main thread, which makes it unsafe in just about
any application you can think of for it.

It would be safer to use select() on the data coming from the child to
wait for the process to terminate (when you read end-of-file)
 
Robert Klemme

Euhm (asking this because I honestly don't know) will this work for
Processes ? (he's not using Thread)

But he uses Timeout which AFAIK uses threads internally for monitoring.

Cheers

robert
 
Tim Uckun

It would be safer to use select() on the data coming from the child to
wait for the process to terminate (when you read end-of-file)


Do you know of any examples on how to do that? I am willing to rewrite
my code obviously.
 
Tim Uckun

It would be safer to use select() on the data coming from the child to
wait for the process to terminate (when you read end-of-file)

But what if the process hangs?

Wouldn't I need to use timeout to check for that anyway?
 
Robert Klemme

But what if the process hangs?

Wouldn't I need to use timeout to check for that anyway?

Select can be called with a timeout which guarantees that the call
returns in time regardless whether there is any data available.
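
As a small illustration of that behaviour (a sketch using a plain IO.pipe rather than a real child process): IO.select returns nil when the timeout expires with nothing to read, and returns the ready IO objects otherwise.

```ruby
reader, writer = IO.pipe

# Nothing has been written yet, so select gives up after about 0.2 seconds.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
timed_out = IO.select([reader], nil, nil, 0.2)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts "select returned #{timed_out.inspect} after #{elapsed.round(2)}s"

# Once data is available, select returns immediately with the ready IOs.
writer.write('hello')
ready = IO.select([reader], nil, nil, 0.2)
puts "readable: #{ready[0].include?(reader)}"

# After the write end is closed, read returns the data and then hits EOF.
writer.close
data = reader.read
puts "read #{data.inspect}"
```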

Kind regards

robert
 
Tim Uckun

Select can be called with a timeout which guarantees that the call returns
in time regardless whether there is any data available.


Hey guys, I want to revisit this issue because I can't seem to find any
documentation on how to do this.

What I want to do seems simple enough. I want to shell out to a process
which sometimes gets stuck. It won't return at all. It just sits there
taking up 100% of the CPU (one of the cores anyway). I just want to
make sure the process gets killed if it does not end in a reasonable
amount of time.

So far I have tried wrapping it in a timeout block but that doesn't
always trigger for some reason. I have plenty of error handling and
have an ensure block which says to kill the process if it exists but
nothing I do seems to work. Sooner or later I get a stuck process that
hangs around forever till I kill it by hand.

Surely there is a simple way to do this.

Here is the code I have so far.

http://gist.github.com/609119
 
Caleb Clausen

Hey guys, I want to revisit this issue because I can't seem to find any
documentation on how to do this.

What I want to do seems simple enough. I want to shell out to a process
which sometimes gets stuck. It won't return at all. It just sits there
taking up 100% of the CPU (one of the cores anyway). I just want to
make sure the process gets killed if it does not end in a reasonable
amount of time.

So far I have tried wrapping it in a timeout block but that doesn't
always trigger for some reason. I have plenty of error handling and
have an ensure block which says to kill the process if it exists but
nothing I do seems to work. Sooner or later I get a stuck process that
hangs around forever till I kill it by hand.

Timeout::timeout is kind of a hack. It's probably better to avoid it.

Surely there is a simple way to do this.

Here is the code I have so far.

http://gist.github.com/609119

Your problem may be that you're sending signal 0; you should pass
"TERM" or (if that won't work) "KILL" as the first parameter to
Process.kill. Signal 0 just queries whether the process exists and can
receive signals; it does not terminate anything.
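
A minimal sketch of the difference, using a throwaway `sleep 30` child as a stand-in for the stuck binary (an assumption for illustration only): signal 0 leaves the child running, while KILL ends it.

```ruby
# Stand-in for the hung process.
pid = Process.spawn('sleep', '30')

# Signal 0 only checks that the process exists and may be signalled.
Process.kill(0, pid)
# With WNOHANG, waitpid returns nil while the child is still running.
still_running = Process.waitpid(pid, Process::WNOHANG).nil?
puts "alive after signal 0: #{still_running}"

# KILL cannot be caught or ignored by the child.
Process.kill('KILL', pid)
_, status = Process.wait2(pid) # reap the child so no zombie is left behind
puts "terminated by signal #{status.termsig}"
```

Sending "TERM" first and escalating to "KILL" only if the process is still there a moment later gives a well-behaved child the chance to clean up.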

If you want to use select instead of timeout, then instead of this:

Timeout::timeout(seconds) {
  @pid, @stdin, @stdout, @stderr = Open4.popen4(cmd)
  ignored, @status = Process::waitpid2 @pid

  if @status.exitstatus != 0
    raise "Exit Status not zero"
  end
}

You should use something like this: (UNTESTED)

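@pid, @stdin, @stdout, @stderr = Open4.popen4(cmd)

if IO::select([@stdout], nil, nil, seconds)
  # readable before the timeout: either output or end-of-file (child exited)
  fail 'unexpected data on stdout' unless @stdout.eof?
else
  # select returned nil: the timeout expired, so kill the hung process
  Util.kill_process_if_exists? @pid
end

ignored, @status = Process::waitpid2 @pid

if @status.exitstatus != 0
  raise "Exit Status not zero"
end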

Except, if the external process actually prints something to stdout,
then you need to call select in a loop until select returns nil, with
decreasing timeouts depending on how much time has passed.
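
A rough sketch of such a loop follows. It swaps Open4 for the standard library's Process.spawn and IO.pipe so that it is self-contained; run_with_deadline is a made-up helper name, and stderr is simply discarded here. It also assumes the child keeps stdout open for as long as it runs, so a process that closes stdout and then hangs would not be caught by this version.

```ruby
# Hypothetical helper: run cmd, collect its stdout, and kill it at the deadline.
def run_with_deadline(cmd, seconds)
  reader, writer = IO.pipe
  pid = Process.spawn(cmd, out: writer, err: File::NULL)
  writer.close # keep only the read end, so EOF arrives when the child exits

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  output = String.new
  loop do
    # Shrink the timeout on every pass so the total wait never exceeds seconds.
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    ready = remaining > 0 && IO.select([reader], nil, nil, remaining)
    unless ready
      Process.kill('KILL', pid)
      Process.wait(pid) # reap the killed child
      raise "Timed out after #{seconds}s: #{cmd}"
    end
    chunk = reader.read_nonblock(4096, exception: false)
    break if chunk.nil? # nil means end-of-file
    output << chunk unless chunk == :wait_readable
  end

  _, status = Process.wait2(pid)
  raise "Exit status not zero: #{cmd}" unless status.success?
  output.strip
ensure
  reader.close if reader && !reader.closed?
end

puts run_with_deadline('echo hello', 5)
```

Because the parent keeps reading while it waits, the child cannot stall on a full stdout pipe. If cmd needs a shell (pipes, redirects), the KILL reaches only the shell, so any grandchildren would need separate handling.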

Unfortunately, 'ri Kernel#select' seems to be broken... it just
refers you back to Kernel#select. I hope somebody fixes that. Check
what it says in the pickaxe instead. (There's a free version available
online if you don't own a copy yourself.)
 
Tim Uckun

Except, if the external process actually prints something to stdout,
then you need to call select in a loop until select returns nil, with
decreasing timeouts depending on how much time has passed.


Well, I tried to go a different route and ran into a strange issue.

I found a shell script on the net and modified it a bit; see this:

http://gist.github.com/626072

This shell script works perfectly when I use it from bash, but it
behaves strangely when I call it with backticks in ruby.

Basically, the backticks don't return until the timeout has expired,
no matter what happens.

It's the weirdest thing.

Does anybody have an explanation for that?
 
