JVM process in status "stop"?

Svante Frey

We are running a fairly busy multithreaded in-house application in Java
1.4.2 on a multi-processor Solaris 8 system. It is a pure batch
application with no graphics: it starts many external processes, does a
lot of file I/O and uses some socket communication. Very occasionally
(typically after about a month of problem-free execution), the entire
process stops executing and stops responding to all signals except
SIGKILL. The "prstat" command (the enhanced version of the "top"
command for Solaris) shows the JVM process in status "stop":

7219 amtrix 77M 36M stop 29 10 0:26.44 0.0% java/47

The problem has been reproduced under both the "client" and "server"
JVM, as well as under different update releases of Java 1.4.2
(currently we're using 1.4.2_05).

A deadlock or race condition in the code would be the natural
explanation, of course, but that would not put the JVM process into
status "stop" like this. We are quite certain that no user has been
sending SIGSTOP to the process either. I've searched forums, FAQs and
websites but haven't found any similar problem description so far.

Since the process is stopped, it does not react to signals, so it is
not possible to take a thread dump. So far, I have not had a chance to
try sending a SIGCONT to see if it will start running again.

Does anyone have experience, hints or ideas about what it means when
the JVM process is in status "stop"?
 
Svante Frey

To follow up on my own post: I have now found that the JVM process
started an identical copy of itself at the same instant it stopped. The
original process looked like this:

amtrix 7219 7216 0 Dec 08 ? 2082:28 java -cp
bin/scheduler.jar:bin/hpcharset.jar:bin/xercesImpl.jar:bin/xmlParserAP

The new child process looked like this:

amtrix 8114 7219 0 20:02:39 ? 0:00 java -cp
bin/scheduler.jar:bin/hpcharset.jar:bin/xercesImpl.jar:bin/xmlParserAP

Both processes were in status "stop", as if they had received a SIGSTOP
signal:

8114 amtrix 77M 36M stop 28 10 0:00.00 0.0% java/1
7219 amtrix 77M 36M stop 29 10 0:26.44 0.0% java/47

There is nothing in the program code that can cause it to start a clone
of itself -- and even if there were, that would not explain why both
the new and the old processes became stopped.

Has anyone seen such "spontaneous JVM cloning" behaviour before, or has
any idea what it means?
 
Stefan Schulz

> Has anyone seen such "spontaneous JVM cloning" behaviour before, or has
> any idea what it means?

It might just be that the JVM for your architecture implements threads by
forking itself, and you use multiple threads somewhere in your application.
 
Svante Frey

> It might just be that the JVM for your architecture implements threads by
> forking itself, and you use multiple threads somewhere in your application.

Sure -- we use lots of threads, but they are contained within the
application (the java/47 field in the prstat output means that the
process uses 47 threads or LWPs). However, the JVM process also starts
lots of external processes with Runtime.exec(), and for every such
process start the Java process needs to fork itself.
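For readers following along, here is a minimal sketch of what such a process start commonly looks like in Java. It is not taken from the application discussed here: the class name ExecSketch, the helper class Drain and the "sh -c" command are all illustrative assumptions. It shows Runtime.exec() being called and both output streams of the child being read on separate threads, so that the child cannot block on a full pipe.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical example class; not part of the application in this thread.
class ExecSketch {

    /** Reads one stream to its end on its own thread. */
    static final class Drain extends Thread {
        private final InputStream in;
        private final StringBuffer out = new StringBuffer();

        Drain(InputStream in) {
            this.in = in;
        }

        public void run() {
            try {
                BufferedReader r = new BufferedReader(new InputStreamReader(in));
                String line;
                while ((line = r.readLine()) != null) {
                    out.append(line).append('\n');
                }
            } catch (IOException e) {
                out.append("[read error: ").append(e.getMessage()).append("]\n");
            }
        }

        String text() {
            return out.toString();
        }
    }

    /** Starts the command, drains stdout and stderr, and returns the exit code. */
    static int run(String[] command, StringBuffer stdout, StringBuffer stderr)
            throws IOException, InterruptedException {
        // On Unix the JVM creates a child process here, which then runs the command.
        Process p = Runtime.getRuntime().exec(command);
        p.getOutputStream().close(); // the child gets no input from us

        Drain out = new Drain(p.getInputStream());
        Drain err = new Drain(p.getErrorStream());
        out.start();
        err.start();

        int exit = p.waitFor(); // wait for the child to terminate
        out.join();             // then wait until both streams are fully read
        err.join();

        stdout.append(out.text());
        stderr.append(err.text());
        return exit;
    }

    public static void main(String[] args) throws Exception {
        StringBuffer out = new StringBuffer();
        StringBuffer err = new StringBuffer();
        // "sh -c ..." is only a stand-in for whatever external program is started.
        int exit = run(new String[] {"sh", "-c", "echo hello"}, out, err);
        System.out.println("exit=" + exit + " stdout=" + out);
    }
}
```

Between the fork and the point where the child starts running the target program, the child is still a copy of the JVM, which would be consistent with a second process briefly showing the same "java -cp ..." command line. That is only a plausible reading of the ps output, not a confirmed diagnosis.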

Still, none of this explains why the parent process (and the child
process) stop. That is what causes problems, since it also stops the
data flow into our production systems. I have built a workaround that
checks every 5 minutes that the JVM is active and restarts it if it
isn't, but the situation is still a bit frustrating. I will try to take
a stack trace and core dump of the newly started child process, to see
if that reveals anything. But the problem cannot be reproduced on
demand and only occurs about once a month, so there are not many
opportunities for testing...
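One way a periodic check of this kind could detect the stopped state is by looking at the STATE column of a prstat line, such as the ones quoted in this thread. The following is only a sketch under that assumption, not the workaround actually in use: the class name StopCheckSketch and its methods are hypothetical, and how the prstat line is obtained (for example from a scheduled script) is deliberately left out. It assumes the default prstat column layout: PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP.

```java
// Hypothetical helper; assumes the default ten-column prstat process line.
class StopCheckSketch {

    /**
     * Returns the STATE column (fifth field) of one prstat process line,
     * or null if the line has fewer than ten whitespace-separated fields.
     */
    static String stateOf(String prstatLine) {
        String[] cols = prstatLine.trim().split("\\s+");
        return cols.length >= 10 ? cols[4] : null;
    }

    /** True if the line reports the process in state "stop". */
    static boolean isStopped(String prstatLine) {
        return "stop".equals(stateOf(prstatLine));
    }

    public static void main(String[] args) {
        // Line copied from the prstat output quoted in this thread.
        String line = "7219 amtrix 77M 36M stop 29 10 0:26.44 0.0% java/47";
        System.out.println(stateOf(line) + " " + isStopped(line)); // prints "stop true"
    }
}
```

A check based on the process state distinguishes a stopped JVM from one that is merely idle, which a check based only on CPU usage or log activity might not.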
 
