Tomcat can't connect to MySQL

georgesbilodeau

Here's my problem:

I run a secure (SSL) Java-based web app, using Tomcat
5.5.17/Apache/mod_jk. Those reside on one server while MySQL 5.0 sits
on another. All servers are running CentOS 4.3 and have plenty of RAM
(2-3 GB) and fast CPUs. The site runs perfectly fine for a while, then
all of a sudden it stops responding. Trying to hit any page on the site
that requires a DB connection just stalls and stalls and the page is
never served.

I use a custom-made database connection pool that doesn't have any
serious problems. If a connection in the pool leaks, which happens very
rarely, the app is able to recover, and if all connections have leaked,
the app will open up new connections to the DB (the app has never
actually leaked all connections in production, just in testing).

The app will run fine for a day or two, without any hitch in
response time, then all of a sudden the site will hang inexplicably.
There are no errors to speak of in the catalina log when it hangs like
this. MySQL is still running because other apps can hit the same
database without a problem. Tomcat & Apache are both still running
because I can access pages on the site that don't require a DB
connection.

I use the export LD_ASSUME_KERNEL=2.4.1 setting in my Tomcat startup to
avoid stability problems, since the CentOS distribution I use is Red
Hat 9.0-based (see
http://tomcat.apache.org/tomcat-5.5-doc/RELEASE-NOTES.txt).

I've racked my brain for days on end trying to figure this one out but
I'm completely stumped. Any ideas would be greatly appreciated.

Georges
 
Mark Jeffcoat

> I use a custom-made database connection pool that doesn't have any
> serious problems. If a connection in the pool leaks, which happens very
> rarely, the app is able to recover, and if all connections have leaked,
> the app will open up new connections to the DB (the app has never
> actually leaked all connections in production, just in testing).


I believe without evidence that the reason you mentioned this
is that some part of your mind that you're not quite yet conscious
of believes that the problem is somewhere in here.

> The app will run fine for a day or two, without any hitch in
> response time, then all of a sudden the site will hang inexplicably.
> There are no errors to speak of in the catalina log when it hangs like
> this. MySQL is still running because other apps can hit the same
> database without a problem. Tomcat & Apache are both still running
> because I can access pages on the site that don't require a DB
> connection.

I'd want to focus first on reproducing the symptoms in a
controlled environment. The simplest way that might possibly
work is to use a tool like ab ("Apache bench", probably lurking
somewhere in your Apache distribution) to hit a page that requires
a db access many many times (pointing at a test machine, of course),
and see if it locks up after a few days' worth of hits.

That would be excellent. Do that, and you've got something
to test hypotheses with.

(This sort of thing happened to me once -- I ended up figuring
out that running two instances of ab with slightly different
timings would get it to freeze up almost instantly, which just
screams "Race Condition". Once I knew what I was looking for,
forehead slapping quickly followed.)

(Without being able to reproduce the bug, I could have found
and fixed the race condition, but not had any idea whether or
not I'd fixed the problem that was actually causing the observed
freeze. It was worth the effort.)
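
(If ab isn't handy, the same idea can be roughed out in a few lines of
Java. This is only a sketch under my own assumptions: the class name,
the URL, the request counts and the delay values are all made up for
illustration, and nothing here is taken from your app. It starts two
workers that hit the same page with different random pauses, and it
sets connect and read timeouts so that a hung page is counted as a
failure instead of hanging the test itself.)

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

class TwoTimingLoad {

    /** Issues `requests` calls, pausing 0..maxDelayMs ms between them; returns the failure count. */
    static int runWorker(Callable<Boolean> request, int requests, int maxDelayMs)
            throws InterruptedException {
        int failures = 0;
        for (int i = 0; i < requests; i++) {
            try {
                if (!request.call()) {
                    failures++;
                }
            } catch (Exception e) {
                failures++; // timeouts and I/O errors count as failures
            }
            if (maxDelayMs > 0) {
                Thread.sleep(ThreadLocalRandom.current().nextInt(maxDelayMs + 1));
            }
        }
        return failures;
    }

    /** Runs two workers with different delay ranges against the same request. */
    static int runTwoWorkers(Callable<Boolean> request, int requestsEach,
                             int maxDelayA, int maxDelayB) throws InterruptedException {
        AtomicInteger failures = new AtomicInteger();
        Thread a = startWorker(request, requestsEach, maxDelayA, failures);
        Thread b = startWorker(request, requestsEach, maxDelayB, failures);
        a.join();
        b.join();
        return failures.get();
    }

    private static Thread startWorker(Callable<Boolean> request, int requests,
                                      int maxDelayMs, AtomicInteger failures) {
        Thread t = new Thread(() -> {
            try {
                failures.addAndGet(runWorker(request, requests, maxDelayMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    /** A GET that succeeds only on HTTP 200 within the given timeout. */
    static Callable<Boolean> httpGet(String url, int timeoutMs) {
        return () -> {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs); // a stalled page becomes an exception
            try {
                return conn.getResponseCode() == 200;
            } finally {
                conn.disconnect();
            }
        };
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical test-machine URL; pass the real one as the first argument.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/db-page";
        int failures = runTwoWorkers(httpGet(url, 5000), 1000, 200, 300);
        System.out.println("failed or timed-out requests: " + failures);
    }
}
```

(A sudden jump in the failure count while MySQL is still answering
other clients would be the symptom you're after. Point it at a test
machine, not production.)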
 
georgesbilodeau

Thanks for the reply. I've tried sending some load at a test machine
using 2 instances of Siege (http://www.joedog.org/JoeDog/Siege).
They're hitting a page that accesses the DB. My last test ran each
instance for 2 hours, and threw WAY more load at the site than it ever
gets in production, with lots of threads constantly checking DB
connections in and out of the pool. The server handled the load
masterfully, and the DB connections held up the whole time. I would
think that, at some point, the same problem experienced in production
would have happened during the test. GRRR.... to no avail.

I'm currently running a 10 hour test that will go overnight, again with
two instances, with one instance using a delay of 0-2 seconds between
requests and the other 0-3. Hopefully this one will be a little (ok a
LOT) more fruitful.

As a side note, I've had a production server's DB connections hang up
in a matter of less than an hour before, so it doesn't make sense to me
that using a longer test will be more successful (although I would love
to be proven wrong). When they did hang up in less than an hour, it was on
a particularly busy day for the site, but still not nearly as busy as a
Siege test makes it.

Anyway, thanks again.

Georges
 
