Strange Socket problem

K

Knute Johnson

I'm having a problem in some production code that I can't figure out.
I'll post the complete actual code below. This code is running in three
places and has the same problem in two of them at the same time. The
other I'm not sure, it may be that the personnel operating it are
restarting the program and so don't complain. This piece of code is a
simple client that connects via a Socket to a server. The server
supplies some data and the client reads that data and files it away. It
is supposed to restart itself if there is a connection failure or fault
for whatever reason. The problem is that at some random point in time
the Socket disconnects, the code logs the disconnect but never restarts.
It does print the "SportsWinClient Disconnected" message but never
executes the "fireconnectionEvent()" method after creating a new Socket.
It doesn't print any Exception message. I'm not sure how it gets out
of the try block without printing the "End of Stream" message or an
exception message.

The crazy part is that all night long when there is no activity from the
server it times out and restarts with no problems.

I'm hoping that somebody will see a fault in my code that could cause
the failure. It is not a compile problem so I left the formatting as it is.

Thanks for looking.

package com.knutejohnson.xyzcasinos.translux;

import java.io.*;
import java.net.*;
import java.util.*;

import com.knutejohnson.classes.*;

import static com.knutejohnson.xyzcasinos.translux.Constants.*;

public class SportsWinClient implements Runnable {
private final Thread thread;

private volatile boolean isConnected;
private volatile boolean runFlag = true;

private volatile Socket socket;

public SportsWinClient() {
thread = new Thread(this,"SportsWinClient");
}

public void start() {
thread.start();
}

public void run() {
// boolean serverFlag = true;

System.out.println("SportsWinClient: Started");
while (runFlag) {
// String serverAddress = serverFlag ? SPORTS_WIN_IP_PRIMARY :
// SPORTS_WIN_IP_SECONDARY;
try {
// socket = new Socket(serverAddress,SPORTS_WIN_PORT,
socket = new Socket(SPORTS_WIN_IP,SPORTS_WIN_PORT,
InetAddress.getByName(REMOTE_IP),0);
socket.setKeepAlive(true);
isConnected = true;

********* I know that the line below is not being executed **********

fireConnectionEvent(ConnectionEvent.CONNECTED);
socket.setSoTimeout(3600000); // one hour timeout
System.out.println("SportsWinClient: Connected");
InputStream is = socket.getInputStream();
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);

String str;
while ((str = br.readLine()) != null) {
if (!str.matches("\\d+.*")) // not a sports record
continue;
SportsBet sb = new SportsBet(str);
SPORTS_BET_MAP.put(sb.betNumber,sb);
}

System.out.println("SportsWinClient: End of Stream");
} catch (IOException ioe) {
System.out.println("SportsWinClient: " + ioe.toString());
} finally {
isConnected = false;
if (socket != null)
try {
socket.close();
} catch (IOException ioe) {
ioe.printStackTrace();
}
fireConnectionEvent(ConnectionEvent.DISCONNECTED);
// serverFlag = !serverFlag;

*********** I know that the line below is being executed *************

System.out.println("SportsWinClient: Disconnected");
}
// stop interrupts this thread so this will be bypassed on
a stop
try {
Thread.sleep(10000);
} catch (InterruptedException ie) { }
}
System.out.println("SportsWinClient: Stopping");
}

public void disconnect() {
if (isConnected())
if (socket != null)
try {
socket.close();
} catch (IOException ioe) {
ioe.printStackTrace();
}
}

public void stop() {
runFlag = false;
thread.interrupt();
if (socket != null)
try {
socket.close();
} catch (IOException ioe) {
ioe.printStackTrace();
}
}

public boolean isConnected() {
return isConnected;
}

private final java.util.List<ConnectionListener> connectionListeners =
new ArrayList<ConnectionListener>();

public synchronized void addConnectionListener(ConnectionListener cl) {
connectionListeners.add(cl);
}

public synchronized void
removeConnectionListener(ConnectionListener cl) {
connectionListeners.remove(cl);
}

private synchronized void fireConnectionEvent(int id) {
ConnectionEvent ce = new ConnectionEvent(this,id);

for (ConnectionListener listener : connectionListeners)
listener.connState(ce);
}
}
 
S

Steven Simpson

It does print the "SportsWinClient Disconnected" message but never
executes the "fireconnectionEvent()" method after creating a new
Socket. It doesn't print any Exception message. I'm not sure how it
gets out of the try block without printing the "End of Stream" message
or an exception message.

Can you provide an exact trace of the messages that are visible in the
code you've shown (around the problematic event, of course), just so
there's no ambiguity about what you're seeing?
System.out.println("SportsWinClient: End of Stream");
} catch (IOException ioe) {
System.out.println("SportsWinClient: " + ioe.toString());

Catch Throwable here too, print it out, and rethrow, so we can be sure
no other exception is slipping through.
 
K

Knute Johnson

Can you provide an exact trace of the messages that are visible in the
code you've shown (around the problematic event, of course), just so
there's no ambiguity about what you're seeing?


Catch Throwable here too, print it out, and rethrow, so we can be sure
no other exception is slipping through.

When the problem arrives, it prints the SportsWinClient: Disconnected
message and nothing else. It doesn't reconnect to the server.

I'll try putting in a catch for Throwable but it could take some time
before I see the problem again.

Thanks,
 
M

markspace

I'm having a problem in some production code that I can't figure out.


This code below strikes me as possibly being in the wrong order. If you
interrupt a thread, it should bail. But if you close the socket, it
should do the same thing, and close the socket too (I'm pretty sure
there's a handshake for TCP for "close me".)

If you interrupt the thread, then close the socket, the "close" might
never happen. I'd *just* close the socket, if that passes your testing.
Try to let the thread actually execute the close, and then unwind
naturally.

(I didn't look to see if your thread/runnable has other places it waits
besides the socket. If it does, those should be eliminated, replaced
with other sockets which are also closed, etc.)
 
M

Martin Gregorie

I'm having a problem in some production code that I can't figure out.
I'll post the complete actual code below. This code is running in three
places and has the same problem in two of them at the same time. The
other I'm not sure, it may be that the personnel operating it are
restarting the program and so don't complain. This piece of code is a
simple client that connects via a Socket to a server. The server
supplies some data and the client reads that data and files it away. It
is supposed to restart itself if there is a connection failure or fault
for whatever reason. The problem is that at some random point in time
the Socket disconnects, the code logs the disconnect but never restarts.
It does print the "SportsWinClient Disconnected" message but never
executes the "fireconnectionEvent()" method after creating a new Socket.
It doesn't print any Exception message. I'm not sure how it gets out
of the try block without printing the "End of Stream" message or an
exception message.

The crazy part is that all night long when there is no activity from the
server it times out and restarts with no problems.
I notice that InetAddress.getByName() can throw UnknownHostException (and
this exception isn't explicitly handled. Is it possible that this call
can time out when the network is busy?

However, as UnknownHostException is a subclass of IOException I don't see
how it avoids being caught, unless a timeout causes a null to be
returned: the description of getByName() isn't at all clear about when a
null is returned instead of throwing an exception. Pulling this out of
the new Socket() statement so it can be tested for a null return would at
least show whether this is happening.

Also, I don't see why you're using InetAddress to pass what is apparently
a remote address to the Socket constructor as localAddress, or why you're
using port zero as the local port.
 
S

Steven Simpson

When the problem arrives, it prints the SportsWinClient: Disconnected
message and nothing else. It doesn't reconnect to the server.

What about previous messages (Started, Connected, End of Stream),
especially for the cycle before the problem? Do you get this?:

Started
Connected
End of Stream
Disconnected
...
Connected
End of Stream
Disconnected
Connected
Disconnected

Or this?:

Started
Connected
End of Stream
Disconnected
...
Connected
End of Stream
Disconnected
Disconnected

Or is there no previous (successful) cycle?

I'm thinking that maybe a listener is throwing an unchecked exception.
 
K

Knute Johnson

What about previous messages (Started, Connected, End of Stream),
especially for the cycle before the problem? Do you get this?:

Started
Connected
End of Stream
Disconnected
...
Connected
End of Stream
Disconnected
Connected
Disconnected

Or this?:

Started
Connected
End of Stream
Disconnected
...
Connected
End of Stream
Disconnected
Disconnected

Or is there no previous (successful) cycle?

I'm thinking that maybe a listener is throwing an unchecked exception.

It cycles fine until it hangs. All three sites stop within a few
seconds of each other so it is something that happens on the server end
I think. But I have no idea what it could be.
 
K

Knute Johnson

I notice that InetAddress.getByName() can throw UnknownHostException (and
this exception isn't explicitly handled. Is it possible that this call
can time out when the network is busy?

That's possible I guess but I would think it would still throw an
UnknownHostException. Also, since I pass it a string with the IP
address specified with digits, it doesn't have to do any DNS lookup.
However, as UnknownHostException is a subclass of IOException I don't see
how it avoids being caught, unless a timeout causes a null to be
returned: the description of getByName() isn't at all clear about when a
null is returned instead of throwing an exception. Pulling this out of
the new Socket() statement so it can be tested for a null return would at
least show whether this is happening.

Also, I don't see why you're using InetAddress to pass what is apparently
a remote address to the Socket constructor as localAddress, or why you're
using port zero as the local port.

The computer has two NICs. The LOCAL_IP is the address of the NIC that
I want to use to connect to the server. Having to use InetAddress is a
limitation of the Socket constructor. Port 0 is the ephemeral port. It
allows the socket implementation to select any available port for the
client end. I'm not sure how InetAddress.getByName() can return a null.
 
K

Knute Johnson

This code below strikes me as possibly being in the wrong order. If you
interrupt a thread, it should bail. But if you close the socket, it
should do the same thing, and close the socket too (I'm pretty sure
there's a handshake for TCP for "close me".)

Interrupting a thread just sets a flag. It doesn't do anything unless
it runs across a method call that will throw an InterruptedException
(ie. Thread.sleep()).
If you interrupt the thread, then close the socket, the "close" might
never happen. I'd *just* close the socket, if that passes your testing.
Try to let the thread actually execute the close, and then unwind
naturally.

(I didn't look to see if your thread/runnable has other places it waits
besides the socket. If it does, those should be eliminated, replaced
with other sockets which are also closed, etc.)

Not sure what you mean here. The thread waits on the BufferedReader but
closing the socket closes the stream which should cause an IOException.
The server end closing the stream should cause the read to return null
and the code should print the End of Stream message which it doesn't so
I don't think that is where it is getting hung up.
 
M

markspace

Interrupting a thread just sets a flag. It doesn't do anything unless it
runs across a method call that will throw an InterruptedException (ie.
Thread.sleep()).

I seem to recall that interrupt() would also cause many types of IO
operations to abort and exit. I don't know for sure and I didn't double
check it.

If it happens at the same time for all clients, have you tried running a
client locally on your own system so you can observe it directly? Can
you run a network trace at the same time? Can you tell what was being
sent over the wire immediately before the clients hang? (Put
data/commands into a circular buffer then dump the buffer manually.)

What OS does the server run on? What info/statistics have you tried
looking at after a hang on the server?
 
X

x

Knute Johnson pisze:
I'm having a problem in some production code that I can't figure out.

Can you please describe the usage of your class ?

I'm rather puzzled about having a Thread object inside a Runnable. Your
class makes it is extremely easy to have TWO threads accessing private
fields, this is rarely a good idea. (yes, I have recreated it :)

What exactly is the point of the public start() method? I must say it
looks suspicious to me.

Best regards,
Marcin Jaskolski
 
L

Lew

Knute Johnson pisze:

Can you please describe the usage of your class ?

I'm rather puzzled about having a Thread object inside a Runnable. Your class

The issue would be what's used inside the 'run()' method, not just what fields
the class has.

I agree that the 'thread' variable's scope is wrong - it should be local to
'start()', not an instance member. But at the moment that's not causing any
outright harm.
makes it is extremely easy to have TWO threads accessing private fields, this
is rarely a good idea. (yes, I have recreated it :)

His class spawns one thread per instance. The main thread doesn't use the
mutable fields. The only common access of note is the 'runFlag' mechanism,
which is bog-standard in his code. The 'thread' member is not referenced
within 'run()'. I'm not sure why all his other instance variables are
'volatile', but at first blush I don't see a problem with his synchronization
safety.

You need to be specific about the trouble you claim you found. Show us where,
because I don't see danger coming from the areas you cited.
What exactly is the point of the public start() method? I must say it looks
suspicious to me.

It's also bog standard, and bog simple. Its purpose is to start the socket thread.

public class Client
{
public static void main(String[] args)
{
new SportsWinClient().start();
}
}
 
K

Knute Johnson

I seem to recall that interrupt() would also cause many types of IO
operations to abort and exit. I don't know for sure and I didn't double
check it.

Which would be fine but it is not detecting any kind of Exception.
If it happens at the same time for all clients, have you tried running a
client locally on your own system so you can observe it directly? Can
you run a network trace at the same time? Can you tell what was being
sent over the wire immediately before the clients hang? (Put
data/commands into a circular buffer then dump the buffer manually.)

I can't run it here because I can't get to the inside of the network
that these live on. I have no way to know what was being sent but all
of the relevant code is here. Bad data shouldn't cause any problems but
that does give me an idea of where to place another trap (thanks).
What OS does the server run on? What info/statistics have you tried
looking at after a hang on the server?

I have no idea what OS is running on the server and I can't get any data
from that end. I do know that the subprocess that I connect to is there
to keep me away from the actual server machine.
 
K

Knute Johnson

Knute Johnson pisze:

Can you please describe the usage of your class ?

I'm rather puzzled about having a Thread object inside a Runnable. Your
class makes it is extremely easy to have TWO threads accessing private
fields, this is rarely a good idea. (yes, I have recreated it :)

What exactly is the point of the public start() method? I must say it
looks suspicious to me.

Best regards,
Marcin Jaskolski

Another class instantiates this class and starts it. It is never
started again because it should not ever stop unless some runtime
exception is thrown.

It's not a Thread object inside a Runnable, it is a reference to the
Thread object. The Runnable is just that. If two of these got started,
it still probably wouldn't hurt anything.
 
K

Knute Johnson

I agree that the 'thread' variable's scope is wrong - it should be local
to 'start()', not an instance member. But at the moment that's not
causing any outright harm.

It's so I can start and interrupt the thread from a method call.

The volatiles exist because the methods that access them can be called
from other threads. I could have synchronized the start() stop()
methods but not easily the socket variable in the run() method. I
thought it was cleaner to just use volatile.
 
L

Lew

Knute said:
The volatiles exist because the methods that access them can be called from
other threads. I could have synchronized the start() stop() methods but not
easily the socket variable in the run() method. I thought it was cleaner to
just use volatile.

I see a problem right there.
public void disconnect() {
if (isConnected())
if (socket != null)
try {
socket.close();
} catch (IOException ioe) {
ioe.printStackTrace();
}
}

Since these are controlled by separate synchronization (different 'volatile'
variables) there's a race condition trying to work with both at once.

Also, 'socket' can become 'null' between the check for not 'null' and the
'close()' call.

You need to synchronize with 'synchronized' or other strong mechanism.
 
K

Knute Johnson

I see a problem right there.


Since these are controlled by separate synchronization (different
'volatile' variables) there's a race condition trying to work with both
at once.

I don't think so.
Also, 'socket' can become 'null' between the check for not 'null' and
the 'close()' call.

Only if I set it to null and I don't. That is there so that if the
Socket doesn't make the first connection when disconnect() is called
that I won't get a NPE.
You need to synchronize with 'synchronized' or other strong mechanism.

I don't think so and here's why; isConnected only gets modified by one
thread and read by another, socket is only modified by one thread and
read by another. In the disconnect() method, as soon as isConnected()
is called, isConnected the volatile variable is read and that would make
socket current even if it weren't volatile which it is but I didn't want
to rely on side effects in case I changed code somewhere.

Anyway, disconnect() isn't getting called in this situation so it's not
causing my problem.

I really appreciate everybody looking at this. I've got a couple of
ideas of where to code some traps and I'll have to put those in one
night and see what happens.
 
M

Martin Gregorie

That's possible I guess but I would think it would still throw an
UnknownHostException. Also, since I pass it a string with the IP
address specified with digits, it doesn't have to do any DNS lookup.
Agreed.

All I wanted to do was to flag up that its odd that the Javadocs say that
sometimes it returns null without (apparently) throwing an exception but
don't say explicitly what governs each behaviour, and that it would be
useful to call it in such a way that that you force it to make that
distinction explicit.
The computer has two NICs. The LOCAL_IP is the address of the NIC that
I want to use to connect to the server. Having to use InetAddress is a
limitation of the Socket constructor. Port 0 is the ephemeral port. It
allows the socket implementation to select any available port for the
client end. I'm not sure how InetAddress.getByName() can return a null.
OK.
 
M

Martin Gregorie

All three systems run fine until they hang. There are normal timeouts
early in the morning when there is no data available and they all
reconnect without a hitch.

I'm pretty sure the fault is in the Socket constructor somewhere. It
happens at all three sites at the same time so I think it is triggered
by something the server does.

Anybody think it could be caused by the keep alive?
I've never used it, so won't speculate.
 
M

Martin Gregorie

I don't think so.


Only if I set it to null and I don't. That is there so that if the
Socket doesn't make the first connection when disconnect() is called
that I won't get a NPE.


I don't think so and here's why; isConnected only gets modified by one
thread and read by another, socket is only modified by one thread and
read by another. In the disconnect() method, as soon as isConnected()
is called, isConnected the volatile variable is read and that would make
socket current even if it weren't volatile which it is but I didn't want
to rely on side effects in case I changed code somewhere.

Anyway, disconnect() isn't getting called in this situation so it's not
causing my problem.

I really appreciate everybody looking at this. I've got a couple of
ideas of where to code some traps and I'll have to put those in one
night and see what happens.

new Socket() can return null (it says, but not why) but I don't think
that is happening or you'd have seen an NPE.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,969
Messages
2,570,161
Members
46,705
Latest member
Stefkari24

Latest Threads

Top