I would like to add one additional problem to your explanation, though:
timeout exceptions can prevent you from knowing whether an
operation actually succeeded. We have this problem with CORBA timeouts
in particular: a remote call is made, and the process on the other side
is running slowly. It does, however, complete the operation, just as
the timeout exception fires. So do I treat the operation as a success
(since I know the request reached the other side; I didn't get a
communications-failure exception), or as a failure (since I did get an
exception and failed to get a return value from the call)?
I think that's a broader problem, and not specific to Ruby timeouts.
In the simplest case it's a pure race: with a 30-second timeout, what
happens if you get a response after 29.99 seconds, or after 30.01? In my
opinion, the borderline case doesn't really matter; you can assert that if a
Timeout::Error is raised, then you did not get a response in time. If the
server subsequently responds after 30.01 or 40 or 50 seconds, then tough; it
was too late, by definition.
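To make the race concrete, here's a minimal Ruby sketch (remote_call is a
hypothetical stand-in for the CORBA invocation). If Timeout::Error is
raised, all you know is that no response arrived within the deadline; the
server may or may not have completed the work:

    require 'timeout'

    begin
      result = Timeout.timeout(30) do
        remote_call              # hypothetical remote invocation
      end
      # A reply arrived within 30 seconds: treat it as a success.
    rescue Timeout::Error
      # No reply within 30 seconds. The command may still have completed
      # (or may yet complete) on the server; we simply cannot tell.
    end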
However, I guess what you're really worried about is something more
fundamental: did the command actually complete on the CORBA server? Did the
server change state? Should I resubmit the command later?
You can see that this cannot be handled by timeouts alone. For example:
(1) the command might have completed after 27 seconds, but due to network
congestion, the reply did not get back until after 32 seconds. (=> the
command completed in time, but you were unable to detect this)
(2) the command might continue to execute after your timeout exception
fires, and complete after say 31 seconds. Even if the timeout exception then
goes on to drop the CORBA connection or try to chase the command with an
"abort" message of some sort, it's still a race which might be lost.
In order to tell with certainty whether your command was accepted AND acted
upon, I believe you really need a sequence-number style of mechanism, where
both ends keep track of which messages they have sent and which have been
acknowledged by the other side.
---> submit command N
     ... timeout
<--- response N "1234"    (ignored by the client; it was too late)
---> resubmit command N
<--- response N "1234"    (resent from the server's cache)
---> submit command N+1
<--- response N+1 "9876"  etc.
Each end needs to keep track of the last command or response sent, so that
it can be resent if necessary. At the server side, if command N is received
a second time, the previous (remembered) response is resent; that's because
the command already executed the first time and changed the system state, so
attempting to perform the command again could fail. The client sending
command N+1 is an implicit acknowledgement that the response to command N
has been received and no longer needs to be remembered.
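Here's a rough Ruby sketch of both halves. It keeps everything in memory
and assumes a hypothetical conn.send_command transport, so it's only
illustrative; the persistence and reset issues below still apply:

    require 'timeout'

    # Server side: execute each command number at most once per client;
    # a duplicate gets the remembered response instead of re-execution.
    class DedupingServer
      def initialize(&handler)
        @handler = handler
        @last = {}                          # client_id => [seq, cached_response]
      end

      def handle(client_id, seq, command)
        prev_seq, cached = @last[client_id]
        return cached if prev_seq == seq    # resubmission: resend from cache
        response = @handler.call(command)   # executed exactly once
        @last[client_id] = [seq, response]  # overwritten when seq+1 arrives
        response
      end
    end

    # Client side: resubmit with the SAME sequence number until a
    # response arrives, then move on to seq + 1.
    def submit(conn, seq, command, attempts = 5)
      attempts.times do
        begin
          return Timeout.timeout(30) { conn.send_command(seq, command) }
        rescue Timeout::Error
          # safe to retry: the server will not execute seq twice
        end
      end
      raise "no response to command #{seq} after #{attempts} attempts"
    end

The key property is that a resubmission of command N is answered from the
cache rather than re-executed, which is exactly the exchange sketched above.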
Unfortunately, building a protocol like this *properly* is difficult, and if
done right you will end up with something which looks very much like TCP or
the X.25 link layer. You need procedures to initialise the sequence numbers
and to reset them after gross errors, such as one end or the other
forgetting its sequence number. Ideally the sequence numbers and message
buffers should be persistent across application restarts (i.e. stored in a
database). Each "logical connection" between two endpoints needs to be
distinct, with its own set of sequence numbers. And you will need to choose
appropriate retransmission parameters.
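For the persistence part at least, Ruby's standard library gets you started.
A sketch using PStore, where connection_id is whatever key you use to
identify the logical connection:

    require 'pstore'

    STORE = PStore.new('sequence_state.pstore')

    # Returns the next sequence number for this logical connection;
    # the value is written to disk, so it survives application restarts.
    def next_seq(connection_id)
      STORE.transaction do
        STORE[connection_id] = (STORE[connection_id] || 0) + 1
      end
    end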
This is a common problem though, and I'd certainly like to see a generic
encapsulation protocol which handles it properly. I think if done right, it
would work over multiple transport layers (e.g. HTTP POST, or even exchanges
of E-mail messages). If anyone knows of such a thing, I'd love to see it.
Regards,
Brian.