Svante Frey
We are running a fairly busy multithreaded in-house application in Java
1.4.2 on a multi-processor Solaris 8 system. The application is pure
batch with no graphics: it starts lots of processes, does a lot of file
I/O and also uses some socket communication. Very occasionally
(typically after a month of problem-free execution), the entire process
stops executing and stops responding to all signals except SIGKILL. The
"prstat" command (the enhanced version of the "top" command for
Solaris) shows the JVM process in status "stop", like this:
7219 amtrix 77M 36M stop 29 10 0:26.44 0.0% java/47
The problem has been reproduced under both the "client" and "server"
JVM, as well as under different update releases of Java 1.4.2
(currently we're using 1.4.2_05).
A deadlock or race condition in the code would be the natural
explanation, of course, but that would not cause the JVM process to
stop like this. We are quite certain that no user has been sending
SIGSTOP to the process either. I've searched forums, FAQs and
websites but haven't come across any similar problem description so
far.
Since the process is stopped, it does not react to signals, so it is
not possible to take a thread dump. So far, I have not been able to try
sending a SIGCONT to see if it will start running again.
Does anyone have experience, hints or ideas about what it means when
the JVM process is in status "stop"?