Grant
Hi Everyone,
A little background to begin with. In my company, I've been given the
task of resolving a disconnection issue between a TCP probe scripted
in Perl, and a W2K app. I've barely touched Perl in the past, so this
is "right into the fire," so to speak.
The Perl script resides on a Solaris 8 box, using Perl 5.8.0. Since
I've done a bit of C coding in the past, I've been able to determine
what the code is doing. It's a multithreaded script that opens a
socket to a number of W2K servers and reads data that is spit out from
an app on the W2K servers (data is dumped to port 3000 on its own
machine, a socket connection is made to this port, and the data is
picked up). The data then is dumped into a Queue from all operating
threads, then piped into another probe for further processing.
The problem occurs when the app on the W2K box initiates a disconnect.
I installed Windump on the W2K server, and saw that the app does send
a FIN and ACK, and does receive an ACK back from the Perl probe, but
the FIN and ACK from the probe are never sent to the W2K server. Thus,
the W2K machine is stuck in a FIN_WAIT_2 state, and the Solaris
machine in a CLOSE_WAIT state. The states remain until the Perl probe
reaches a TCP inactivity timeout of 10 minutes (this condition is
explicitly checked within the code). At this point, an explicit call
to close the socket occurs. The connection is torn down, and the probe
reinitiates a connection back to the W2K server.
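For reference, CLOSE_WAIT on the Solaris side usually means the local process has received the peer's FIN but has not yet closed its own end of the socket. Below is a minimal sketch of a read loop that treats a zero-byte read as end-of-file and closes right away, so the probe's own FIN goes out. It is not the vendor's code: the helper name `read_until_eof` and the callback are hypothetical, and a local `socketpair` stands in for the W2K app on port 3000.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: read from $sock until the peer closes its end.
# On EOF (sysread returns 0) the socket is closed immediately, which is
# what lets the local side leave CLOSE_WAIT and send its own FIN.
sub read_until_eof {
    my ($sock, $on_data) = @_;
    while (1) {
        my $n = sysread($sock, my $buf, 4096);
        if (!defined $n) {
            next if $!{EINTR};      # interrupted by a signal, retry
            my $err = "$!";         # save the error before close() can change it
            close($sock);
            return "error: $err";
        }
        if ($n == 0) {              # peer sent FIN: nothing more will arrive
            close($sock);
            return 'eof';
        }
        $on_data->($buf);           # hand the data to the caller (e.g. enqueue it)
    }
}

# Demo with a local socket pair standing in for the real connection.
socketpair(my $probe, my $app, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite($app, "sample data\n");
close($app);                        # the "app" initiates the disconnect

my $received = '';
my $status   = read_until_eof($probe, sub { $received .= $_[0] });
print "status=$status received=$received";
```

In a threaded script there is one caveat I am less sure of: if the socket handle was opened before the threads were created, each thread may hold its own copy of it, and `close` in one thread may not release the underlying descriptor. Calling `shutdown($sock, 2)` before `close` would be one way to force the FIN out in that case.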
Adjusting the timeout value lower is not an option. It is possible
that the connection could be inactive (i.e. no data sent) for up to 7
minutes, which is within normal operating parameters. The bottom line
is that if I lower it below that, I will have unnecessary inactivity
alarms come up throughout the day. But I need to have the probe
reconnect almost immediately, as the data that is sent is time
sensitive. In case you are wondering, the W2K app initiates a
disconnect each day in order to restart itself and compact a database
it uses.
I'm fairly certain the problem is not on the W2K side of the
connection, as there currently is a C probe running on another Solaris
machine that essentially does the same thing, and the TCP teardown is
done properly.
After reading that there was a setsockopt function, I thought there
would be a way to query the state of the socket, but there isn't one
that I can find. I then thought I could make a system call, run netstat,
grep the IP that is used in the thread in question, search for
"CLOSE_WAIT" and if it exists, close the socket. But I kept saying to
myself that there has to be a more elegant way of dealing with this.
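One possible alternative to parsing netstat output, sketched under assumptions: ask whether the socket is readable with a zero timeout, then peek at one byte without consuming it. A readable socket that yields zero bytes has an EOF pending, meaning the peer has closed. The helper name `peer_has_closed` is hypothetical, and a local `socketpair` again stands in for the real connection.

```perl
use strict;
use warnings;
use Socket;
use IO::Select;

# Hypothetical helper: returns 1 if the peer has closed its end and no
# unread data remains, 0 otherwise. Nothing is consumed from the socket.
sub peer_has_closed {
    my ($sock) = @_;

    # Zero timeout: just poll, never block.
    my @ready = IO::Select->new($sock)->can_read(0);
    return 0 unless @ready;         # nothing pending, connection looks open

    # Peek at one byte; MSG_PEEK leaves any data in the receive buffer.
    my $buf = '';
    my $r = recv($sock, $buf, 1, MSG_PEEK);
    return 0 unless defined $r;     # a real error; handle it separately
    return length($buf) == 0 ? 1 : 0;   # readable but empty means EOF
}

# Demo with a local socket pair.
socketpair(my $probe, my $app, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
my $before = peer_has_closed($probe);
close($app);                        # the "app" disconnects
my $after = peer_has_closed($probe);
print "before close: $before, after close: $after\n";
```

If unread data is still queued when the peer closes, this check reports 0 until that data has been read, so it would fit best next to the existing inactivity check rather than replacing the normal read path.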
As mentioned earlier, the script is using Perl 5.8.0, along with the
threads module (and Thread::Queue, which uses threads). I'm not really
sure how to determine if I'm running the proper threads module or not.
When I do a perl -V, I get this as part of the output:
usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define
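The same build settings can be read from inside a script through the core `Config` module, which may be easier than scanning the `perl -V` output. A small sketch (the variable name `$model` is just for illustration):

```perl
use strict;
use warnings;
use Config;

# %Config holds the values shown by perl -V; a setting that was not
# enabled at build time is undef (or missing) here.
my $model = $Config{useithreads}    ? 'ithreads'
          : $Config{use5005threads} ? '5005threads'
          :                           'none';
print "thread model: $model\n";
```

As far as I understand it, a build with useithreads=define and use5005threads=undef is the interpreter-threads model that the `threads` module expects, while the older `Thread` module belonged to the 5005threads model.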
Any and all help would be appreciated... I'm somewhat leery of posting
the code, as it wasn't my company that developed it (vendor support
isn't an option, which brings me here), but if it will shed more light
on this problem, I will.
Thanks a bunch in advance,
Grant