NIO and accepts()

  • Thread starter Cyrille "cns" Szymanski

Cyrille "cns" Szymanski

Hello,

I'm benchmarking several server I/O strategies and for that purpose I've
built two simplistic Java ECHO servers.

One of the server implementations takes advantage of the java.nio API.
However, it (my implementation) is slower than the classic
one-thread-per-client server. I've managed to find out (thanks to the
profiler) that the accept() call is what slows the process down. The
strange thing is that I call accept() only when SelectionKey.isAcceptable()
is true, so this operation should be fast, right? Any ideas?

To test this behaviour I used a program that sequentially creates N
connections to the server. The first server I wrote uses an infinite loop
that accepts sockets from a ServerSocket. The second server uses NIO: it
selects on OP_ACCEPT and then calls ServerSocketChannel.accept(). The
second one takes about 10 times longer.
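The post does not include the test client. A rough sketch of such a sequential-connect benchmark could look like this; the class and method names are hypothetical, it only opens and closes connections without sending data, and the demo uses a throwaway blocking server on an ephemeral port instead of port 1234. Note that connect() can return as soon as the kernel has completed the TCP handshake, possibly before the server has called accept(), so timing only the client side measures the server's accept path indirectly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class SequentialConnectBench {
    // Opens n connections one after another (never concurrently), closing
    // each one right away, and returns the elapsed time in nanoseconds.
    static long connectSequentially(InetSocketAddress server, int n) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            try (Socket s = new Socket()) {
                s.connect(server, 5_000); // 5 s connect timeout (arbitrary)
            }
        }
        return System.nanoTime() - start;
    }

    // Demo only: a throwaway blocking accept loop on an ephemeral port stands
    // in for the server under test. Returns how many connections it accepted.
    static int demo(int n) {
        try (ServerSocket ss = new ServerSocket(0)) {
            CountDownLatch accepted = new CountDownLatch(n);
            Thread acceptor = new Thread(() -> {
                try {
                    for (int i = 0; i < n; i++) {
                        ss.accept().close();
                        accepted.countDown();
                    }
                } catch (IOException ignored) {
                    // server socket closed early
                }
            });
            acceptor.start();
            InetSocketAddress addr = new InetSocketAddress("127.0.0.1", ss.getLocalPort());
            long nanos = connectSequentially(addr, n);
            System.out.printf("%d connections in %.1f ms%n", n, nanos / 1e6);
            accepted.await(10, TimeUnit.SECONDS);
            return (int) (n - accepted.getCount());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        demo(100);
    }
}
```

To benchmark one of the servers below, call connectSequentially() with its address instead of running demo().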

I'd also like to take advantage of multiprocessor architectures and spawn
as many "worker threads" (IOCP vocabulary) as there are CPUs installed.
Has anybody done this already?

Is it good practice to have multiple threads waiting on select() on the
same Selector?

How can I register a Channel with a selector while one thread is in
Selector.select(), and have that thread process the incoming events? What
I've done so far is loop on select() with a timeout, but this surely
isn't good practice.

I'm saddened to see that, as I wrote it, the one-thread-per-client server
outperforms the NIO one...

Here is the code:


import java.io.*;
import java.lang.*;
import java.net.*;
import java.nio.*;
import java.nio.channels.*;
import java.util.*;

public class javaenh
{
    public static void main(String args[]) throws Exception
    {
        // incoming connection channel
        ServerSocketChannel channel = ServerSocketChannel.open();
        channel.configureBlocking(false);
        channel.socket().bind( new InetSocketAddress( 1234 ) );

        // Register interest in when connection
        Selector selector = Selector.open();
        channel.register( selector, SelectionKey.OP_ACCEPT );

        System.out.println( "Ready" );
        // Wait for something of interest to happen
        while( selector.select()>0 )
        {
            // Get set of ready objects
            Iterator readyItor = selector.selectedKeys().iterator();

            // Walk through set
            while( readyItor.hasNext() )
            {
                // Get key from set
                SelectionKey key = (SelectionKey)readyItor.next();
                readyItor.remove();

                if( key.isReadable() )
                {
                    // Get channel and context
                    SocketChannel keyChannel = (SocketChannel)key.channel();
                    ByteBuffer buffer = (ByteBuffer)key.attachment();
                    buffer.clear();

                    // Get the data
                    if( keyChannel.read( buffer )==-1 ) {
                        keyChannel.socket().close();
                        buffer = null;
                    } else {
                        // Send the data
                        buffer.flip();
                        keyChannel.write( buffer );

                        // wait for data to be sent
                        keyChannel.register( selector,
                                SelectionKey.OP_WRITE, buffer );
                    }
                }
                else if( key.isWritable() )
                {
                    // Get channel and context
                    SocketChannel keyChannel = (SocketChannel)key.channel();
                    ByteBuffer buffer = (ByteBuffer)key.attachment();

                    // data sent, read again
                    keyChannel.register( selector, SelectionKey.OP_READ,
                            buffer );
                }
                else if( key.isAcceptable() )
                {
                    // Get channel
                    ServerSocketChannel keyChannel =
                            (ServerSocketChannel)key.channel();

                    // accept incoming connection
                    SocketChannel clientChannel = keyChannel.accept();

                    // create a client context
                    ByteBuffer buffer = ByteBuffer.allocateDirect( 1024 );

                    // register it in the selector
                    clientChannel.configureBlocking(false);
                    clientChannel.register( selector,
                            SelectionKey.OP_READ, buffer );
                }
                else
                {
                    System.err.println("Ooops");
                }
            }
        }
    }
}

Douwe

> One of the server implementations takes advantage of the java.nio API.
> However, it (my implementation) is slower than the classic
> one-thread-per-client server. I've managed to find out (thanks to the
> profiler) that the accept() call is what slows the process down. The
> strange thing is that I call accept() only when SelectionKey.isAcceptable()
> is true, so this operation should be fast, right? Any ideas?

I'm not sure that NIO was written to outperform classic IO (specifically
Socket). The idea behind NIO is that you do not have to start a new Thread
for every client, since the underlying operating system more or less
already created a thread for that client (this, I think, depends on the
platform Java is running on).

Not creating a separate thread for each client has one very big
advantage: it simplifies all data handling. For example, suppose you want
to write the data received from a client to a data file. In a
multithreaded program you have to make sure that you are the only one
writing to that file; in a single-threaded program you just write your
data, since you already know you are the only thread writing to the disk
at that moment. Unfortunately a single-threaded program has some
disadvantages as well: if one client sends erroneous data and causes the
thread to go into a locked state, then all other client handling is
blocked as well.

You say that the accept method is slow, and you've probably expected that
NIO would solve this. Unfortunately you still have to call the accept
method: although you are sure (by using the selector) that it will not
block, it still has to initialize the socket structure (which I think
takes some time), and since the program is single-threaded all your
clients have to wait.

> To test this behaviour I used a program that sequentially creates N
> connections to the server. The first server I wrote uses an infinite loop
> that accepts sockets from a ServerSocket. The second server uses NIO: it
> selects on OP_ACCEPT and then calls ServerSocketChannel.accept(). The
> second one takes about 10 times longer.

> I'd also like to take advantage of multiprocessor architectures and spawn
> as many "worker threads" (IOCP vocabulary) as there are CPUs installed.
> Has anybody done this already?

Don't know; at least I haven't :)

> Is it good practice to have multiple threads waiting on select() on the
> same Selector?

No.....
I don't understand why you want to combine a Selector with multiple
Threads. In a multiprocessor environment a multithreaded program will
almost always outperform a single-threaded program (depending on the
design of the programs and on their algorithms). If you have already
created multiple threads for different connections and want to use one
selector for them, then you more or less block all threads until the
selector wakes up again and notifies the threads needed. To do this you
have to create an extra thread to handle the selector, plus some
synchronized methods so that the client threads can control this thread.
You've then created a complex system that uses a Selector.

I think the best practice is to handle each client in a separate thread.
To avoid the overhead of creating the threads, you could create a system
where a thread can be reused over and over again. Depending on the number
of clients and the number of processors (if these are more or less
static), you could use a selector in each thread, with each thread
handling multiple clients. You should only do this if you have a very
large number of clients connecting and a small number of CPUs.

> How can I register a Channel with a selector while one thread is in
> Selector.select(), and have that thread process the incoming events? What
> I've done so far is loop on select() with a timeout, but this surely
> isn't good practice.

I don't think you want to access a Selector from different threads (this
is IMO absolutely BAD practice). You could create an extra class with a
thread handling the selector, and have the other threads communicate with
that thread via methods (as described above), but try to avoid multiple
threads accessing the Selector object itself. Maybe you can think of a
Selector as an object that coordinates your data handling, not one that
handles the data itself.

> I'm saddened to see that, as I wrote it, the one-thread-per-client server
> outperforms the NIO one...

I don't think this has to do with NIO... it has to do with the use of the
Selector (which is just one part of NIO).

> Here is the code:

And I removed it :)

John C. Bollinger

Cyrille said:
> I'm benchmarking several server I/O strategies and for that purpose I've
> built two simplistic Java ECHO servers.

Good move. Test, don't assume.

> One of the server implementations takes advantage of the java.nio API.
> However, it (my implementation) is slower than the classic
> one-thread-per-client server. I've managed to find out (thanks to the
> profiler) that the accept() call is what slows the process down. The
> strange thing is that I call accept() only when SelectionKey.isAcceptable()
> is true, so this operation should be fast, right? Any ideas?

The actual profiler output might be useful here. It may be the case
that your implementation is buggy; I am not an NIO expert, but my
analysis of your code shows at least one or two possible problems (see
below). The problems may or may not have anything to do with your slow
accepts.

More importantly, however, you should consider whether your test
scenario is a good model for the application you plan. Slow accepts are
a problem only if accepting new connections is expected to be a
significant part of your service's work, which might not be the case.

> To test this behaviour I used a program that sequentially creates N
> connections to the server. The first server I wrote uses an infinite loop
> that accepts sockets from a ServerSocket. The second server uses NIO: it
> selects on OP_ACCEPT and then calls ServerSocketChannel.accept(). The
> second one takes about 10 times longer.

> I'd also like to take advantage of multiprocessor architectures and spawn
> as many "worker threads" (IOCP vocabulary) as there are CPUs installed.
> Has anybody done this already?

> Is it good practice to have multiple threads waiting on select() on the
> same Selector?

Per the API docs, Selectors are thread-safe but their various key sets
are not. I'm not sure what you would expect the behavior to be with
multiple threads selecting on the same selector concurrently, in any
case. In particular, the selector's key sets are _not_ thread safe, so
you can't have multiple threads processing those concurrently, at least
if any of the threads attempt to modify the sets.

> How can I register a Channel with a selector while one thread is in
> Selector.select(), and have that thread process the incoming events? What
> I've done so far is loop on select() with a timeout, but this surely
> isn't good practice.

If you are doing it all in one thread then you can only register a
channel when that thread is not doing something else (e.g. blocking on
selection). You must therefore ensure that the selection loop will
cycle periodically, which would be done exactly as you describe if you
generally have little else to do in that thread, or by using selectNow()
instead of select() if that thread generally has enough other work to do
to only check the selector periodically.

If you have a separate thread in which you intend to perform the
registration then you should be able to do that without fear, but it is
not clear to me whether the registration would block, or whether the new
channel would be eligible for selection during the current invocation of
select(). (My guesses would be yes, it would block, and no, it wouldn't
be immediately eligible.)
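One common way to sidestep the question of whether register() blocks is to never call it from another thread at all: other threads put the channel on a thread-safe queue and call Selector.wakeup(), and the selector thread registers queued channels between select() calls. This also replaces the select-with-timeout loop described above. A minimal sketch, assuming a single thread owns the Selector; the class and method names are hypothetical and key handling is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class SelectorLoop implements Runnable {
    private final Selector selector;
    // Channels handed over by other threads, waiting to be registered
    // by the selector thread itself.
    private final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<>();
    private volatile boolean running = true;
    volatile int registered = 0; // demo bookkeeping, written only by the loop thread

    SelectorLoop() throws IOException {
        selector = Selector.open();
    }

    // Callable from any thread: enqueue the channel, then kick the selector
    // out of select() so the loop picks it up promptly.
    void submit(SocketChannel ch) {
        pending.add(ch);
        selector.wakeup();
    }

    void stop() {
        running = false;
        selector.wakeup();
    }

    @Override
    public void run() {
        try {
            while (running) {
                // Register everything queued since the last pass. This runs on
                // the selector thread, so register() never waits on a select().
                SocketChannel ch;
                while ((ch = pending.poll()) != null) {
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                    registered++;
                }
                selector.select(); // returns early (possibly with 0) after wakeup()
                selector.selectedKeys().clear(); // real code would handle the keys here
            }
            selector.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo: accept one connection on the calling thread and hand it over.
    static int demo() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            SelectorLoop loop = new SelectorLoop();
            Thread t = new Thread(loop);
            t.start();
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                loop.submit(accepted);
                for (int i = 0; i < 200 && loop.registered == 0; i++) {
                    Thread.sleep(10);
                }
            }
            loop.stop();
            t.join();
            return loop.registered;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```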

> I'm saddened to see that, as I wrote it, the one-thread-per-client server
> outperforms the NIO one...

The thread per client approach is tried and true. I wouldn't give up on
the selection approach just yet, however. As long as you are looking
into this sort of thing, it's worthwhile to try to tune your code a bit
to get the best performance out of each technique. The selector
variation is harder to get right (in other languages too).

> [...]
> public class javaenh
> {
>     public static void main(String args[]) throws Exception
>     {
>         // incoming connection channel
>         ServerSocketChannel channel = ServerSocketChannel.open();
>         channel.configureBlocking(false);
>         channel.socket().bind( new InetSocketAddress( 1234 ) );
>
>         // Register interest in when connection
>         Selector selector = Selector.open();
>         channel.register( selector, SelectionKey.OP_ACCEPT );

Looks good so far....

>         System.out.println( "Ready" );
>         // Wait for something of interest to happen
>         while( selector.select()>0 )
>         {

This while condition is fine for testing, but is probably not what you
would want to use in a real app. The select() method will return zero
if the Selector's wakeup() method is invoked or if the thread in which
select() is blocking is interrupted (from another thread in either case)
without any selectable channels being ready.
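A minimal sketch of a loop that treats a zero return as "nothing ready" rather than as a reason to exit, and stops only when another thread sets an explicit flag and then calls wakeup(). The class and method names are hypothetical; the key dispatch itself is elided.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

class ShutdownAwareLoop {
    private volatile boolean running = true;

    void loop(Selector selector) {
        try {
            while (running) {
                // 0 means "woken up or interrupted with nothing ready",
                // not "time to stop": just go around again.
                if (selector.select() == 0) {
                    continue;
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    // ... dispatch on key.isAcceptable(), key.isReadable(), ...
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Callable from another thread: flip the flag, then unblock select()
    // so the loop notices it.
    void shutdown(Selector selector) {
        running = false;
        selector.wakeup();
    }

    // wakeup() before select() makes the next select() return at once;
    // with nothing registered it reports 0 ready keys.
    static int selectAfterWakeup() {
        try (Selector selector = Selector.open()) {
            selector.wakeup();
            return selector.select();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean stopsOnShutdown() {
        try (Selector selector = Selector.open()) {
            ShutdownAwareLoop server = new ShutdownAwareLoop();
            Thread t = new Thread(() -> server.loop(selector));
            t.start();
            Thread.sleep(50); // give the loop time to block in select()
            server.shutdown(selector);
            t.join(5_000);
            return !t.isAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```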

>             // Get set of ready objects
>             Iterator readyItor = selector.selectedKeys().iterator();
>
>             // Walk through set
>             while( readyItor.hasNext() )
>             {
>                 // Get key from set
>                 SelectionKey key = (SelectionKey)readyItor.next();
>                 readyItor.remove();

This is fine here, but would be buggy if the Selector were concurrently
accessed by multiple threads as you proposed doing. It does appear that
this is necessary to indicate that you have handled the operation that
was selected for.

>                 if( key.isReadable() )
>                 {
>                     // Get channel and context
>                     SocketChannel keyChannel = (SocketChannel)key.channel();
>                     ByteBuffer buffer = (ByteBuffer)key.attachment();
>                     buffer.clear();
>
>                     // Get the data
>                     if( keyChannel.read( buffer )==-1 ) {
>                         keyChannel.socket().close();
>                         buffer = null;

Setting the local buffer variable to null is pointless. The Buffer will
remain reachable (and thus not be deallocated or GC'd) at least until
the SelectionKey with which it is associated becomes unreachable. If
you wanted to reuse the buffer (via a buffer pool, for instance) then
you would want to disassociate it from the key and return it to the pool
here, but probably you can just forget about it.

>                     } else {
>                         // Send the data
>                         buffer.flip();
>                         keyChannel.write( buffer );

This is buggy. The channel is in non-blocking mode, so you are not
assured that all the available data (or even any of it) will be written
during this invocation of write().

>                         // wait for data to be sent
>                         keyChannel.register( selector,
>                                 SelectionKey.OP_WRITE, buffer );

This is suboptimal. Rather than register the channel again, you should
be changing the key's interest set. The same buffer will even remain
associated. Moreover, if you have successfully written all the buffer
contents then you don't need to select for writing at all, just again
for reading.
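The two remarks above (partial writes, and changing the interest set instead of registering again) can be combined into a read handler along these lines. A minimal sketch, assuming the same one-buffer-per-key attachment as the posted code; the class and method names are hypothetical, and echoOnce() exists only to exercise the handler over a loopback connection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

class ReadHandler {
    // Called when key.isReadable(); the key's attachment is the client's buffer.
    static void onReadable(SelectionKey key) throws IOException {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        buf.clear();
        if (ch.read(buf) == -1) {
            ch.close(); // closing the channel also cancels the key
            return;
        }
        buf.flip();
        ch.write(buf); // non-blocking: may write all, some, or none of buf
        if (buf.hasRemaining()) {
            // Same key, same attachment: only the interest set changes.
            key.interestOps(SelectionKey.OP_WRITE);
        }
        // Otherwise everything went out and the key stays on OP_READ.
    }

    // Demo: echo one message through onReadable over a loopback connection.
    static String echoOnce(String msg) {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel conn = server.accept()) {
                conn.configureBlocking(false);
                conn.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
                client.write(ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8)));
                selector.select();
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isReadable()) {
                        onReadable(key);
                    }
                }
                selector.selectedKeys().clear();
                ByteBuffer in = ByteBuffer.allocate(1024);
                client.read(in); // the client channel is in blocking mode
                in.flip();
                return StandardCharsets.UTF_8.decode(in).toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```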

>                     }
>                 }
>                 else if( key.isWritable() )
>                 {
>                     // Get channel and context
>                     SocketChannel keyChannel = (SocketChannel)key.channel();
>                     ByteBuffer buffer = (ByteBuffer)key.attachment();
>
>                     // data sent, read again
>                     keyChannel.register( selector, SelectionKey.OP_READ,
>                             buffer );

As above, this is suboptimal -- just change the interest set. Before
doing so, however, attempt to write the remaining bytes from the buffer;
only switch back to selecting for reading once you have written all the
data available.
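The write side of that advice, sketched under the same assumptions (one ByteBuffer attached to each key, hypothetical names): keep writing from the attached buffer on each writable event and switch the interest set back to OP_READ only once nothing is left.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

class WriteHandler {
    // Called when key.isWritable(): send what is left in the buffer and go
    // back to OP_READ only once all of it has been written.
    static void onWritable(SelectionKey key) throws IOException {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        ch.write(buf);
        if (!buf.hasRemaining()) {
            key.interestOps(SelectionKey.OP_READ);
        }
        // Otherwise stay on OP_WRITE and continue on the next select().
    }

    // Demo: start with a key already waiting on OP_WRITE and holding unsent
    // data. Returns what the client received, or null if the key did not
    // switch back to OP_READ.
    static String drainPending(String msg) {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel conn = server.accept()) {
                conn.configureBlocking(false);
                ByteBuffer unsent = ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8));
                SelectionKey key = conn.register(selector, SelectionKey.OP_WRITE, unsent);
                selector.select(); // a fresh connection is writable straight away
                for (SelectionKey k : selector.selectedKeys()) {
                    if (k.isWritable()) {
                        onWritable(k);
                    }
                }
                selector.selectedKeys().clear();
                if (key.interestOps() != SelectionKey.OP_READ) {
                    return null;
                }
                ByteBuffer in = ByteBuffer.allocate(1024);
                client.read(in);
                in.flip();
                return StandardCharsets.UTF_8.decode(in).toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```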

>                 }
>                 else if( key.isAcceptable() )
>                 {
>                     // Get channel
>                     ServerSocketChannel keyChannel =
>                             (ServerSocketChannel)key.channel();
>
>                     // accept incoming connection
>                     SocketChannel clientChannel = keyChannel.accept();
>
>                     // create a client context
>                     ByteBuffer buffer = ByteBuffer.allocateDirect( 1024 );

Have you read the API docs' recommendations about direct vs. non-direct
buffers? In particular their warning that allocating a direct buffer
takes longer, and their recommendation that direct buffers only be used
for large, long-lived buffers and that they only be used when they yield
a measurable performance gain?
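For a 1 KiB per-connection echo buffer, following that advice would simply mean using ByteBuffer.allocate instead of allocateDirect. A tiny illustration; the class name and the 64 KiB size of the long-lived buffer are arbitrary choices for the example.

```java
import java.nio.ByteBuffer;

class BufferChoice {
    // Small, short-lived, one per client: an ordinary heap buffer.
    static ByteBuffer newClientBuffer() {
        return ByteBuffer.allocate(1024);
    }

    // Large and long-lived: allocated once up front. Whether a direct buffer
    // actually pays off here is something to measure, not assume.
    static final ByteBuffer LARGE_DIRECT = ByteBuffer.allocateDirect(64 * 1024);

    public static void main(String[] args) {
        System.out.println(newClientBuffer().isDirect()); // false
        System.out.println(LARGE_DIRECT.isDirect());      // true
    }
}
```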

>                     // register it in the selector
>                     clientChannel.configureBlocking(false);
>                     clientChannel.register( selector,
>                             SelectionKey.OP_READ, buffer );

Unlike some of the above, this is a new channel registration, so it's okay.

>                 }
>                 else
>                 {
>                     System.err.println("Ooops");
>                 }
>             }
>         }
>     }
> }


John Bollinger
(e-mail address removed)

Cyrille "cns" Szymanski

> I'm not sure that NIO was written to outperform classic IO
> (specifically Socket).

The classic blocking socket scheme does not scale well and this is why
writing a powerful server in Java wasn't reasonable. I thought that NIO
had been written to solve this problem.

> In a multithreaded program you have to make sure that you are the
> only one writing to that file; in a single-threaded program you just
> write your data, since you already know you are the only thread writing
> to the disk at that moment.

I've written servers in which only one thread at a time handles a client.

I have the program spawn N "worker threads" (typically N=2*CPU) which
enter a sleeping state. Handles (sockets, files, memory...) are
registered with a queue and when something happens on one of the handles
(the queue for that handle isn't empty), the operating system awakens one
of the worker threads which handles the event.

If a resource has to be shared among several threads (for instance you
wish to count bytes sent/recv) then the thread posts its job to the queue
associated with the resource and asynchronously waits for it to complete.

> Unfortunately a single-threaded program has some disadvantages as well:
> if one client sends erroneous data and causes the thread to go into a
> locked state, then all other client handling is blocked as well.

Right. But dead threads are just as much a vulnerability in an MT program,
and for this reason I happen to think that single-threaded models are
better, since you can't get away with that sort of problem in either case.


> You say that the accept method is slow, and you've probably expected that
> NIO would solve this. Unfortunately you still have to call the accept
> method: although you are sure (by using the selector) that it will not
> block, it still has to initialize the socket structure (which I think
> takes some time), and since the program is single-threaded all your
> clients have to wait.

Then Java lacks an asynchronous accept() method.

> No.....
> I don't understand why you want to combine a Selector with multiple
> Threads. In a multiprocessor environment a multithreaded program will
> almost always outperform a single-threaded program (depending on the
> design of the programs and on their algorithms).

On multiprocessor architectures the 1 thread per client model doesn't
scale well either. Even though the maximum number of clients is higher,
it is still too small.

On a 4-CPU machine, I'd typically want to have 8 threads processing IO
requests. If I use a single-threaded program, the thread only runs on one
CPU at a time, which does not take advantage of the 3 other CPUs.

> you could use a selector in each thread, with each thread handling
> multiple clients. You should only do this if you have a very
> large number of clients connecting and a small number of CPUs.

You mean that if I have N threads and M clients, I'd give M/N clients to
each thread to handle? That doesn't solve the accept issue (which can only
be done by one thread), and I'd rather have N threads jointly handling all
M clients.


Thanks for your helpful thoughts.

Cyrille "cns" Szymanski

>> I'm benchmarking several server I/O strategies and for that purpose
> Good move. Test, don't assume.

My goal is to write the best ECHO server I can for various platforms
(Java, win32, .NET...) while keeping the code simple, and to assume that
fine-tuning it (which I will not do) would improve performance by, say,
10% on each platform. This should be a good starting point for
comparisons.

> More importantly, however, you should consider whether your test
> scenario is a good model for the application you plan. Slow accepts
> are a problem only if accepting new connections is expected to be a
> significant part of your service's work, which might not be the case.

Since I am planning an HTTP proxy server, I think it is reasonable to
assume that connections will not last long, especially with lossy web
clients.

> Per the API docs, Selectors are thread-safe but their various key sets
> are not. I'm not sure what you would expect the behavior to be with
> multiple threads selecting on the same selector concurrently, in any
> case.

In fact I think I've confused NIO with Microsoft's I/O Completion Ports
(IOCP). The selector is nothing more than the Java counterpart of the
Berkeley sockets select().

If you are not aware of what IOCP is, here is a brief explanation:

The idea is to spawn N threads (typically N=2*CPU) that will process IO
requests. The programmer then registers the handles he wishes to use with
the iocp.

The worker threads wait for the IOCP to wake them up when an io operation
completes on one of those handles so it can process the received data,
then issue another asynchronous io request and re-enter sleeping state.


This is how things typically happen with an echo server:

The listening socket is registered with the IOCP and an (asynchronous)
call to accept is made, then the thread sleeps. When a connection is
established and the accept finishes, the thread wakes up (it may have
handled other IO requests in the meantime), finds out that an accept
has finished (context information is associated with the asynchronous
call) and typically issues an (asynchronous) read request.

When the read request completes, the thread wakes up, finds out that a
read has finished and issues a send request on the received buffer.

When the send completes, either all data has been sent, in which case a
new read is issued, or there is still data to send, in which case a new
send is issued.
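The Java NIO discussed in this thread has no equivalent, but Java 7 later added asynchronous channels ("NIO.2") that follow this completion model, and on Windows the JDK implements them on top of I/O completion ports. A minimal sketch of the accept/read/send cycle described above using those classes, assuming Java 7 or later; the class and method names are hypothetical, and the pool size of 2 * CPUs only mirrors the rule of thumb from this post, not an API requirement.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousChannelGroup;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class CompletionEcho {
    // Accept completed: issue the next accept at once, then start reading.
    static void startAccept(AsynchronousServerSocketChannel server) {
        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override public void completed(AsynchronousSocketChannel ch, Void unused) {
                startAccept(server);
                read(ch, ByteBuffer.allocate(1024));
            }
            @Override public void failed(Throwable exc, Void unused) {
                // typically: the server channel was closed
            }
        });
    }

    // Read completed: echo what arrived, or close on end-of-stream.
    static void read(AsynchronousSocketChannel ch, ByteBuffer buf) {
        buf.clear();
        ch.read(buf, buf, new CompletionHandler<Integer, ByteBuffer>() {
            @Override public void completed(Integer n, ByteBuffer b) {
                if (n < 0) { closeQuietly(ch); return; }
                b.flip();
                write(ch, b);
            }
            @Override public void failed(Throwable exc, ByteBuffer b) { closeQuietly(ch); }
        });
    }

    // Send completed: send the rest if it was partial, otherwise read again.
    static void write(AsynchronousSocketChannel ch, ByteBuffer buf) {
        ch.write(buf, buf, new CompletionHandler<Integer, ByteBuffer>() {
            @Override public void completed(Integer n, ByteBuffer b) {
                if (b.hasRemaining()) write(ch, b); else read(ch, b);
            }
            @Override public void failed(Throwable exc, ByteBuffer b) { closeQuietly(ch); }
        });
    }

    static void closeQuietly(AsynchronousSocketChannel ch) {
        try { ch.close(); } catch (IOException ignored) { }
    }

    // Demo: a pool of 2 * CPUs threads, one message echoed over loopback.
    static String echoOnce(String msg) {
        int nThreads = 2 * Runtime.getRuntime().availableProcessors();
        try {
            AsynchronousChannelGroup group = AsynchronousChannelGroup
                    .withFixedThreadPool(nThreads, Executors.defaultThreadFactory());
            try (AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel
                         .open(group).bind(new InetSocketAddress("127.0.0.1", 0));
                 AsynchronousSocketChannel client = AsynchronousSocketChannel.open(group)) {
                startAccept(server);
                client.connect(server.getLocalAddress()).get(5, TimeUnit.SECONDS);
                client.write(ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8)))
                      .get(5, TimeUnit.SECONDS);
                ByteBuffer in = ByteBuffer.allocate(1024);
                client.read(in).get(5, TimeUnit.SECONDS);
                in.flip();
                return StandardCharsets.UTF_8.decode(in).toString();
            } finally {
                group.shutdownNow();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

As in the IOCP description, no thread is dedicated to a client: whichever pool thread is free runs the next completion handler.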


The good thing about IOCP is that every lengthy operation (accept,
connect, read, write...) is overlapped. I believe that socket acceptance
is time-consuming because a new socket descriptor has to be allocated (I
bet most of the time is spent in thread synchronization calls that ensure
the socket implementation is thread-safe) and SYN/ACK packets have to be
sent. Thus it is time-consuming but not CPU-consuming, which makes it a
good candidate for an overlapped operation.


My requirements are simple: I do not want one thread per client, as this
does not scale well (which rules out classic IO), and I need several
threads to handle IO requests to take advantage of multiprocessor machines.

I wonder if those requirements are compatible with NIO... since they are
not compatible with select()...

> If you have a separate thread in which you intend to perform the
> registration then you should be able to do that without fear, but it
> is not clear to me whether the registration would block, or whether
> the new channel would be eligible for selection during the current
> invocation of select(). (My guesses would be yes, it would block, and
> no, it wouldn't be immediately eligible.)

The threads that perform Channel registrations also call select(). But as
long as the others do not cycle there will only be one thread able to
process the newly registered channels.

Besides, your guesses seem to be correct.

> The thread per client approach is tried and true. I wouldn't give up
> on the selection approach just yet, however. As long as you are
> looking into this sort of thing, it's worthwhile to try to tune your
> code a bit to get the best performance out of each technique. The
> selector variation is harder to get right (in other languages too).

I'm a strong believer in the Selector approach. However, I'd rather have
had "completion" selects (as in IOCP), because they make MT programs
easier to write.


The approach this thread made me think of is having one thread loop on
select() and dispatch work to idle worker threads of a thread pool. I
thought that the JVM would do the dispatching for me if I had several
threads waiting on select(), but that doesn't seem to be the case.
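A minimal sketch of that dispatcher design, assuming one selector thread and a fixed worker pool; all names are hypothetical. The detail the JVM will not handle for you: before handing a readable key to a worker, the selector thread clears its interest set with interestOps(0) so the same channel is not handed to a second worker, and the worker returns the key through a queue plus wakeup() instead of changing the interest set itself, since whether interestOps() blocks during a concurrent select() is implementation-dependent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.*;

class DispatchingEchoServer implements Runnable {
    private final Selector selector = Selector.open();
    private final ServerSocketChannel server = ServerSocketChannel.open();
    private final ExecutorService workers;
    // Keys whose worker has finished; only the selector thread re-arms them.
    private final Queue<SelectionKey> rearm = new ConcurrentLinkedQueue<>();
    private volatile boolean running = true;

    DispatchingEchoServer(int nWorkers) throws IOException {
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        workers = Executors.newFixedThreadPool(nWorkers);
    }

    @Override public void run() {
        try {
            while (running) {
                SelectionKey done;
                while ((done = rearm.poll()) != null) {
                    if (done.isValid()) done.interestOps(SelectionKey.OP_READ);
                }
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel ch = server.accept();
                        if (ch == null) continue;
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
                    } else if (key.isReadable()) {
                        key.interestOps(0); // park: no other worker gets this channel
                        workers.execute(() -> echo(key));
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            workers.shutdown();
            try { workers.awaitTermination(5, TimeUnit.SECONDS); selector.close(); server.close(); }
            catch (IOException | InterruptedException ignored) { }
        }
    }

    // Runs on a pool thread, which owns the channel until it re-arms the key.
    private void echo(SelectionKey key) {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        try {
            buf.clear();
            if (ch.read(buf) == -1) { ch.close(); return; }
            buf.flip();
            while (buf.hasRemaining()) ch.write(buf); // brevity only: spins if the send buffer is full
            rearm.add(key);
            selector.wakeup(); // the selector thread restores OP_READ
        } catch (IOException e) {
            try { ch.close(); } catch (IOException ignored) { }
        }
    }

    void shutdown() { running = false; selector.wakeup(); }

    // Demo: start the server with 2 * CPUs workers and echo one message.
    static String echoOnce(String msg) {
        try {
            DispatchingEchoServer s = new DispatchingEchoServer(2 * Runtime.getRuntime().availableProcessors());
            InetSocketAddress addr = (InetSocketAddress) s.server.getLocalAddress();
            Thread t = new Thread(s);
            t.start();
            try (SocketChannel client = SocketChannel.open(addr)) {
                client.write(ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8)));
                ByteBuffer in = ByteBuffer.allocate(1024);
                client.read(in);
                in.flip();
                return StandardCharsets.UTF_8.decode(in).toString();
            } finally {
                s.shutdown();
                t.join();
            }
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```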

>> public class javaenh
>> {
>>     public static void main(String args[]) throws Exception
>>     {
>>         // incoming connection channel
>>         ServerSocketChannel channel = ServerSocketChannel.open();
>>         channel.configureBlocking(false);
>>         channel.socket().bind( new InetSocketAddress( 1234 ) );
>>
>>         // Register interest in when connection
>>         Selector selector = Selector.open();
>>         channel.register( selector, SelectionKey.OP_ACCEPT );
>
> Looks good so far....
>
>>         System.out.println( "Ready" );
>>         // Wait for something of interest to happen
>>         while( selector.select()>0 )
>>         {
>
> This while condition is fine for testing, but is probably not what you
> would want to use in a real app. The select() method will return zero
> if the Selector's wakeup() method is invoked or if the thread in which
> select() is blocking is interrupted (from another thread in either
> case) without any selectable channels being ready.

Great, so there is a way to wake up the selector without an IO operation
being triggered.

> This is fine here, but would be buggy if the Selector were
> concurrently accessed by multiple threads as you proposed doing. It
> does appear that this is necessary to indicate that you have handled
> the operation that was selected for.


> Setting the local buffer variable to null is pointless. The Buffer
> will remain reachable (and thus not be deallocated or GC'd) at least
> until the SelectionKey with which it is associated becomes
> unreachable. If you wanted to reuse the buffer (via a buffer pool,
> for instance) then you would want to disassociate it from the key and
> return it to the pool here, but probably you can just forget about it.

Ok. I wanted the buffer to be marked for GC, but indeed it is still
referenced by the SelectionKey.

> This is buggy. The channel is in non-blocking mode, so you are not
> assured that all the available data (or even any of it) will be
> written during this invocation of write().

I want this write operation to be overlapped. What I wish is to be
notified when the write operation completes, and of how much data has
been sent.

> This is suboptimal. Rather than register the channel again, you
> should be changing the key's interest set. The same buffer will even
> remain associated. Moreover, if you have successfully written all the
> buffer contents then you don't need to select for writing at all, just
> again for reading.

If I get it right, I'd rather write
key.interestOps( SelectionKey.OP_WRITE );
since I need to be notified when the previous write operation completes.

> As above, this is suboptimal -- just change the interest set. Before
> doing so, however, attempt to write the remaining bytes from the
> buffer; only switch back to selecting for reading once you have
> written all the data available.

if( buffer.hasRemaining() ) {
    keyChannel.write( buffer );
} else {
    key.interestOps( SelectionKey.OP_READ );
}

> Have you read the API docs' recommendations about direct vs.
> non-direct buffers? In particular their warning that allocating a
> direct buffer takes longer, and their recommendation that direct
> buffers only be used for large, long-lived buffers and that they only
> be used when they yield a measurable performance gain?

Ok. I was not aware of that issue.

> Unlike some of the above, this is a new channel registration, so it's okay.




John, thanks for your helpful advice.

Douwe

Cyrille "cns" Szymanski said:
> The classic blocking socket scheme does not scale well and this is why
> writing a powerful server in Java wasn't reasonable. I thought that NIO
> had been written to solve this problem.

Don't know exactly what you mean by scaling, but as far as I know Swing
is largely based on AWT, and therefore you could do the same things with
AWT as you can with Swing.

>> only one writing to that file in a

> I've written servers in which only one thread at a time handles a client.
>
> I have the program spawn N "worker threads" (typically N=2*CPU) which
> enter a sleeping state. Handles (sockets, files, memory...) are
> registered with a queue and when something happens on one of the handles
> (the queue for that handle isn't empty), the operating system awakens one
> of the worker threads which handles the event.
>
> If a resource has to be shared among several threads (for instance you
> wish to count bytes sent/recv) then the thread posts its job to the queue
> associated with the resource and asynchronously waits for it to complete.

The question is why you then created multiple threads... if there is only
one queue that dispatches the enlisted information one by one to the
different threads, you would do better to implement a single thread (IMO
this is just overkill).

>> if one client sends erroneous data and causes

> Right. But dead threads are just as much a vulnerability in an MT program,
> and for this reason I happen to think that single-threaded models are
> better, since you can't get away with that sort of problem in either case.

Depends on what you mean by dead... a dead thread could be a thread that
just waits for data which will NEVER arrive; a really dead thread is a
thread that cannot be reached at all anymore. A thread waiting on data
can be interrupted (if Thread.interrupt() does not work, closing the
socket will), and therefore a cleaner could remove that kind of 'dead'
thread. In a single thread you cannot do so.

>> You say that the accept method is slow, and you've probably expected that
>> NIO would solve this. Unfortunately

> Then Java lacks an asynchronous accept() method.

Would an asynchronous accept help to speed up the initialisation
process?? If you can answer this with no, then an asynchronous accept
doesn't bring much.

> On multiprocessor architectures the 1 thread per client model doesn't
> scale well either. Even though the maximum number of clients is higher,
> it is still too small.
>
> On a 4-CPU machine, I'd typically want to have 8 threads processing IO
> requests. If I use a single-threaded program, the thread only runs on one
> CPU at a time, which does not take advantage of the 3 other CPUs.

>> clients. You should only do this if you have a very

> You mean that if I have N threads and M clients, I'd give M/N clients to
> each thread to handle? That doesn't solve the accept issue (which can only
> be done by one thread), and I'd rather have N threads jointly handling all
> M clients.

That indeed doesn't solve the accept issue... as far as I can see you
don't need to solve the slow accept() initialization; all you need to
solve is that the slow accept does not interfere with the other clients
that are being handled. But using a single thread you cannot solve this
problem. And if you use one Selector handling all connections, then you
should handle acceptance of connections in another Thread.
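That suggestion can be sketched as a dedicated acceptor thread that makes blocking accept() calls and hands each new channel to the selector-owning threads through a queue, so a slow accept never stalls a select loop. A minimal sketch; the class names are hypothetical and the consumer side (taking channels off the queue and registering them with a Selector) is left out.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class AcceptorThread implements Runnable {
    private final ServerSocketChannel server; // left in blocking mode
    private final BlockingQueue<SocketChannel> handoff;

    AcceptorThread(ServerSocketChannel server, BlockingQueue<SocketChannel> handoff) {
        this.server = server;
        this.handoff = handoff;
    }

    @Override
    public void run() {
        try {
            while (true) {
                SocketChannel ch = server.accept(); // a slow accept blocks only this thread
                handoff.put(ch); // consumers register it with their own selectors
            }
        } catch (ClosedChannelException e) {
            // server channel closed (AsynchronousCloseException is a subclass): stop
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    // Demo: returns how many accepted channels reached the queue.
    static int demo(int nClients) {
        BlockingQueue<SocketChannel> queue = new LinkedBlockingQueue<>();
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            Thread t = new Thread(new AcceptorThread(server, queue));
            t.start();
            for (int i = 0; i < nClients; i++) {
                SocketChannel.open(server.getLocalAddress()).close();
            }
            int received = 0;
            while (received < nClients) {
                SocketChannel ch = queue.poll(5, TimeUnit.SECONDS);
                if (ch == null) break;
                ch.close();
                received++;
            }
            server.close(); // unblocks accept() in the acceptor thread
            t.join(5_000);
            return received;
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```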

Cyrille "cns" Szymanski

>> The classic blocking socket scheme does not scale well and this is
> Don't know exactly what you mean by scaling

Quoting webopedia : "A popular buzzword that refers to how well a
hardware or software system can adapt to increased demands."

This has something to do with the asymptotic behaviour of functions as
well (response time = f(number of clients)). Typically a system whose
response time grows as O(n^2), where n is the number of clients, isn't
scalable, while one whose response time grows as O(n) is.

In a nutshell the idea is that an increasing number of clients will slow
down the server but not overwhelm it.

For instance, with fewer than 50 clients a one-thread-per-client server
and an IOCP server give almost the same results; with about 2000 clients
the one-thread-per-client server is overwhelmed (it does not respond
anymore) whereas the IOCP server still works.

> The question is why you then created multiple threads... if there is only
> one queue that dispatches the enlisted information one by one to the
> different threads, you would do better to implement a single thread (IMO
> this is just overkill).

The fact is that worker threads take more time to complete their work
than the dispatcher thread takes to cycle because, for example, they have
to parse an HTTP request when one is sent. And under Windows the
dispatching is done automatically by the operating system (IOCP server
model).

This method has been tried and tested and in multiprocessor environments
it has been proven to yield a significant performance gain.


It is my goal to compare different io strategies and if you're right, the
benchmarks should show it.


> Would an asynchronous accept help to speed up the initialisation
> process?? If you can answer this with no, then an asynchronous accept
> doesn't bring much.

I bet it will: since the accept() operation is time-consuming but not
CPU-consuming, it will allow the system to do something in the meantime.

As I explained in another post, upon acceptance the operating system has
to allocate a new socket descriptor (which involves thread synchronization
with the socket subsystem) and perhaps send SYN/ACK packets, which leaves
the CPU with many cycles to spare.

Again, it is my goal to see whether or not this would lead to a gain in
performance.

That indeed doesn't solve the accept issue ... as far as I can see you
don't need to solve the slow accept() initializing ... all you need to
solve is that the slow accept does not interfere with the other
clients that are being handled.

.... and with other clients being accepted.
But using a single thread you can not solve this problem.

It is not my goal to use only one thread but an arbitrary number of
threads that I can change at will to take advantage of multiprocessor
architectures.
 

Douwe

The classic blocking socket scheme does not scale well and this is
Quoting webopedia : "A popular buzzword that refers to how well a
hardware or software system can adapt to increased demands."

This has something to do with the asymptotic behaviour of functions as
well (response time = f(nb clients)). Typically a system whose response
time grows as O(n^2), where n is the number of clients, isn't scalable,
while one that grows as O(n) is scalable.

In a nutshell the idea is that an increasing number of clients will slow
down the server but not overwhelm it.

For instance, with fewer than 50 clients a 1-thread-per-client
server and an IOCP server give almost the same results; with about 2000
clients the 1-thread-per-client server is overwhelmed (it no longer
responds) whereas the IOCP server still works.

Thanks for your fine definition ... :)
The fact is that worker threads take more time to complete than the
dispatcher thread takes to cycle because, for example, they have to parse
an HTTP request when it's sent. Under Windows this dispatching is done
automatically by the operating system (IOCP server model).

This method has been tried and tested and in multiprocessor environments
it has been proven to yield a significant performance gain.


It is my goal to compare different io strategies and if you're right, the
benchmarks should show it.

Could be that I've misunderstood you ... I thought you had built a
queue thread that dispatches its actions to different worker threads
one by one, waiting for each separate worker thread (and so generating
a sequential program with multiple threads) ... I thought so because
you wrote
client.



I bet it will: since the accept() operation is time-consuming but not
CPU-consuming, it will allow the system to do something in the meantime.

As I explained in another post, upon acceptance the operating system
has to allocate a new socket descriptor (which involves thread
synchronization with the socket subsystem) and perhaps send SYN/ACK
packets, which leaves the CPU with many cycles to spare.

Again, it is my goal to see whether or not this would lead to a gain in
performance.

Assuming the accept is slow for the reasons you described above: if the
new socket has to be synchronized with the subsystem, the Thread
handling this will make its calls to the socket subsystem ... go into a
WAITING state ... [subsystem sends SYN and waits for ACK and maybe does
other stuff] ... WAKE up again (being signaled by the subsystem) ...
and return from the accept method. As far as I can see two threads are
involved here, where the "outer" thread is a Java Thread and the inner
Thread is a system thread (owned by the JVM). The outer Thread goes
into a WAITING state and will not use any CPU cycles. The inner Thread
will (in most cases) go into a WAITING state as well as soon as it has
sent the SYN to the IO device/network card, and it will wake up as soon
as the IO device has new data. The second (system) thread therefore
doesn't consume much time either while asleep. This is the situation if
the blocking (synchronous) accept() is used.

In the asynchronous way there is not much difference ... the Selector
could be seen as the first thread ... the second thread stays more or
less the same ... but now the first thread can be notified by multiple
events from different Threads. I could even imagine that after being
notified by one Thread the Selector waits for a few milliseconds in
the hope that more threads will notify it (but to check that I would
have to look into the implementation).

In both situations you have the system thread that does the actual
work; this cannot be changed/improved. The first situation uses an
older implementation for IO than the second one, but it does not
waste CPU cycles. I can't tell you which of the two implementations
will be faster; that is just trial and error.
... and with other clients being accepted.

Not sure what you mean, but it could be that if two clients are being
accepted at the same moment the Socket implementation will handle the
accepts sequentially ... but this is IMO OS dependent and has nothing
to do with the Java Socket API, nor with the accept being asynchronous
or not.
 

John C. Bollinger

Cyrille said:
Since I am planning an HTTP proxy server I think it is reasonable to
assume that connections will not last long, especially with lossy web
clients.

Well, for some clients connections might not last long, but they will in
general last longer than for a locally-generated echo request /
response, even for ill-behaved clients. Much longer in many cases.
In fact I think I've mistaken NIO with Microsoft's IO Completion Ports
(IOCP). The selector is nothing more than the Java implementation of
Berkeley's socket select().
Yes.

If you are not aware of what IOCP is, here is a brief explanation :

The idea is to spawn N threads (typically N=2*CPU) that will process IO
requests. The programmer then registers the handles he wishes to use with
the iocp.

The worker threads wait for the IOCP to wake them up when an io operation
completes on one of those handles so it can process the received data,
then issue another asynchronous io request and re-enter sleeping state.

I was not aware. One could certainly build an equivalent in Java,
presumably on top of NIO, based on one thread to deal directly with the
Selector and an associated thread pool to handle the actual operations.
(Along the general lines you mentioned yourself.)

[...]
The good thing about IOCP is that every lengthy operation (accept,
connect, read, write...) is overlapped. I believe that socket acceptance
is time consuming because a new socket descriptor has to be allocated (I
bet most of the time is spent in thread synchronization calls to ensure
the socket implementation is thread safe) and SYN ACK packets have to be
sent. Thus it is time consuming but not CPU consuming, which makes it a
good candidate for overlapped operation.

Having done a little socket programming in C, but not claiming to be
expert, I don't see how you could overlap two accepts on the same
listening socket. Don't you have to accept connections serially, even
at a low level? I guess it's a function of the TCP stack; do some
stacks allow concurrent accepts on the same socket?
My requirements are simple : I do not want 1 thread per client as this
does not scale well (exit classical io) and I need several threads to
handle io requests to take advantage of multiprocessor machines.

I wonder if those requirements are compatible with NIO... since they are
not compatible with select()...

I think so. Here's the scheme, based on your idea about dispatching
work to a thread pool:
() One thread manages the Selector, much as you already have.
() When it detects one or more ready IO operations, it iterates through
the selected keys and assigns the appropriate IO operation on the
associated Channel to a thread from a thread pool, after first clearing
the key's interest ops.
() After processing the whole list, the selector thread invokes a new
select().

() The threads from your pool, upon being awakened and assigned a new
SelectionKey, retrieve the channel, perform as much of the required
operation as they can without blocking, set the appropriate interest
operations on the key, and then wakeup() the Selector before going back
to sleep. (The wakeup is essential to make the Selector notice the
change in the key's interest operations.)
() After a read, as much of the data read as possible should be
written; if that's all of it then the new interest set is OP_READ;
otherwise it is OP_WRITE.
() Remember to close() the _channel_ (which will also implicitly cancel
all associated selection keys) when closure of the remote side is
detected. It is not clear to me from the API docs whether closing the
underlying Socket causes the channel to be closed (or the reverse).

() The selector thread must be prepared for the possibility that no
selection keys are ready when select() returns, but that shouldn't be hard.
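A minimal Java sketch of the scheme above, written as an echo server to match the benchmark in this thread. Assumptions: Java 8 or later, a hypothetical class name `PooledEchoServer`, a worker pool sized to the CPU count, and accepts handled on the selector thread itself so that `register()` never has to compete with a blocked `select()`. It is a sketch, not a tuned server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical name; one selector thread plus a fixed worker pool.
public class PooledEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    // Pool size = number of CPUs (an assumption; tune as needed).
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    public PooledEchoServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", port));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() { return server.socket().getLocalPort(); }

    @Override
    public void run() {
        try {
            while (true) {
                selector.select();                    // may return 0 keys after a wakeup()
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        // Accept and register here, on the selector thread.
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(4096));
                        }
                    } else {
                        key.interestOps(0);           // park the key while a worker owns it
                        pool.execute(() -> serve(key));
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // selector or server closed: stop the loop
        }
    }

    // Runs on a pool thread: non-blocking read, echo what the socket accepts, pick new interest ops.
    private void serve(SelectionKey key) {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment(); // kept in "fill" mode between calls
        try {
            if (ch.read(buf) < 0) {                   // remote side closed
                ch.close();                           // closing the channel cancels its key
                return;
            }
            buf.flip();
            ch.write(buf);                            // may write only part of the data
            boolean pending = buf.hasRemaining();
            buf.compact();                            // keep unsent bytes, back to fill mode
            key.interestOps(pending ? SelectionKey.OP_WRITE : SelectionKey.OP_READ);
            selector.wakeup();                        // make select() notice the new interest set
        } catch (IOException | CancelledKeyException e) {
            try { ch.close(); } catch (IOException ignored) { }
        }
    }

    public void close() throws IOException {
        selector.close();
        server.close();
        pool.shutdownNow();
    }
}
```

It could be started with `new Thread(new PooledEchoServer(1234)).start()`. Clearing the interest set before dispatching is what stops the selector thread from handing the same key to two workers at once.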
The threads that perform Channel registrations also call select(). But as
long as the others do not cycle there will only be one thread able to
process the newly registered channels.

Besides that, your guesses seem to be correct.

I don't think it necessary for multiple threads to call select(), as
long as you wakeup() the Selector at appropriate points. You might need
to apply a bit of synchronization (for instance, so that the Selector
doesn't go back into select() too soon) but I think it could be worked
out. Rather than synchronizing on the Selector itself you might want to
create a simple mutex.
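A sketch of the mutex idea, using a plain `Object` as the guard rather than the Selector itself. The class name `GuardedSelectorLoop` and the read-and-count handling are hypothetical; the point is the ordering. A registering thread holds the guard while it calls `wakeup()` and `register()`, and the selector thread passes through the guard before every `select()`, so it cannot re-enter `select()` until the registration has finished.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical name; counts bytes read just to have something observable.
public class GuardedSelectorLoop implements Runnable {
    private final Selector selector;
    private final Object guard = new Object();       // the "simple mutex"
    public final AtomicInteger bytesRead = new AtomicInteger();

    public GuardedSelectorLoop() throws IOException {
        selector = Selector.open();
    }

    /** May be called from any thread, even while the selector thread sits in select(). */
    public SelectionKey register(SelectableChannel ch, int ops) throws IOException {
        synchronized (guard) {
            selector.wakeup();                       // make a pending (or the next) select() return
            ch.configureBlocking(false);
            return ch.register(selector, ops);
        }
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try {
            while (true) {
                synchronized (guard) { }             // wait here while a registration is in progress
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isValid() && key.isReadable()) {
                        buf.clear();
                        int n = ((ReadableByteChannel) key.channel()).read(buf);
                        if (n < 0) key.channel().close(); // EOF: closing also cancels the key
                        else bytesRead.addAndGet(n);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // selector closed: leave the loop
        }
    }

    public void close() throws IOException {
        selector.close();                            // also wakes a thread blocked in select()
    }
}
```

Whether `register()` actually blocks during a `select()` is implementation-dependent, so the guard keeps this pattern safe either way.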
I'm a strong believer in the Selector approach. However I'd rather have
implemented "completion" selects (as it is done in IOCP) because it makes
MT programs easier to write.

In other words, IOCP already provides a packaged equivalent to the
approach I describe above? Or is there something I missed that it does
and the above doesn't?
Great. There is a way to wake up the selector without io operation being
triggered.

Many people would consider that a good thing. For instance, it makes it
easier to cleanly shut down. It also makes it possible to make the
Selector take notice of changes to its keys' interest op sets, a
facility that my suggested approach makes use of.
I want this write operation to be overlapped. What I wish is to be
notified when the write operation completes and how much data has been
sent.

The operation will not block on I/O. When it returns you can tell
whether or not more remains to write by checking buffer.remaining().
If I get it right, I'd rather write
keyChannel.keyFor().interestOps( SelectionKey.OP_WRITE );
I need to be notified when the previous write operation completes.

For the single-threaded approach you want, right after the
keyChannel.write() above,

if (buffer.remaining() > 0) {
    key.interestOps(SelectionKey.OP_WRITE);
}
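A tiny helper capturing the rule from this exchange: after a non-blocking write, ask for OP_WRITE only while bytes remain in the buffer, otherwise go back to OP_READ. The class and method names are hypothetical; it only wraps `write()` and `hasRemaining()`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper, not part of java.nio.
public final class WriteInterest {
    private WriteInterest() { }

    /**
     * Writes as much of buf as the channel accepts right now and returns the
     * interest set the key should get next: OP_WRITE while bytes remain, else OP_READ.
     */
    public static int writeAndChooseInterest(WritableByteChannel ch, ByteBuffer buf)
            throws IOException {
        ch.write(buf);                    // on a non-blocking channel this may be a partial write
        return buf.hasRemaining() ? SelectionKey.OP_WRITE : SelectionKey.OP_READ;
    }
}
```

In a selector loop it would be used as `key.interestOps(WriteInterest.writeAndChooseInterest(keyChannel, buffer));`, which also avoids looking the key up again with `keyFor()`.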

[...]
if( buffer.length()>0 ) {
    keyChannel.write();
} else {
    keyChannel.keyFor().interestOps( SelectionKey.OP_READ );
}

Make that

if (buffer.remaining() > 0) {
    keyChannel.write(buffer);
} else {
    // You already have the key; no need to look it up
    key.interestOps(SelectionKey.OP_READ);
}

As described above, you could attempt to perform this in a separate
thread. In fact, I think you safely could do so as long as you clear
the key's interest set before submitting the accept to another thread.
I don't think you can overlap multiple accepts, but I'm prepared to be
shown wrong.

But if the registration would block on completion of the Selector's
current select() then you need to wakeup() the Selector first, and make
sure it doesn't go back into select() until the registration is done.


John Bollinger
(e-mail address removed)
 

Douwe

Don't know if it helps, but for ANSI C there is a really good simple
HTTP server that also uses a Selector. Though it is the C version of the
sockets implementation and you need to be able to read it, I think it
acts pretty much the same as the implementation in Java.

http://www.acme.com/software/thttpd/
 

Esmond Pitt

John said:
Having done a little socket programming in C, but not claiming to be
expert, I don't see how you could overlap two accepts on the same
listening socket. Don't you have to accept connections serially, even
at a low level? I guess it's a function of the TCP stack; do some
stacks allow concurrent accepts on the same socket?

Java serializes accepts via synchronization, so yes you can do this.
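A sketch of what Esmond describes: several threads blocking in `accept()` on the same blocking-mode `ServerSocketChannel`, with each pending connection returned to exactly one caller. `MultiAcceptor` is a hypothetical name, and closing each accepted socket immediately stands in for handing it to real work.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical name; N acceptor threads sharing one listening channel.
public class MultiAcceptor {
    public final AtomicInteger accepted = new AtomicInteger();
    private final ServerSocketChannel server;

    public MultiAcceptor(int threads) throws IOException {
        server = ServerSocketChannel.open();          // blocking mode by default
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(this::acceptLoop, "acceptor-" + i);
            t.setDaemon(true);
            t.start();
        }
    }

    public int port() { return server.socket().getLocalPort(); }

    private void acceptLoop() {
        while (server.isOpen()) {
            try {
                SocketChannel client = server.accept(); // blocks; one connection per returning call
                accepted.incrementAndGet();
                client.close();                         // a real server would hand it off instead
            } catch (ClosedChannelException e) {
                return;                                 // server closed (AsynchronousCloseException is a subclass)
            } catch (IOException e) {
                // transient accept failure: keep accepting
            }
        }
    }

    public void close() throws IOException { server.close(); }
}
```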
 

John C. Bollinger

Esmond said:
Java serializes accepts via synchronization, so yes you can do this.

If Java serializes the accepts then that is specifically contrary to the
behavior I was asking about (although consistent with the way I thought
things needed to work). You can have multiple threads blocking on
accept() on the same socket, but you cannot have multiple threads
concurrently executing accept on the same socket. I was wondering
whether there were any environments wherein one could successfully and
usefully have _concurrent_ accepts on one listening socket. If accept
is not thread-safe at the level of the TCP stack then the answer is
effectively no.


John Bollinger
(e-mail address removed)
 

Esmond Pitt

John said:
You can have multiple threads blocking on
accept() on the same socket, but you cannot have multiple threads
concurrently executing accept on the same socket. I was wondering
whether there were any environments wherein one could successfully and
usefully have _concurrent_ accepts on one listening socket. If accept
is not thread-safe at the level of the TCP stack then the answer is
effectively no.

Isn't this a distinction without a difference? I'm not at all clear
about the effective difference between these two conditions. Connections
(accepted sockets) will be fed to the callers of accept() one at a time
in either case: while there aren't any connections to accept you would
expect either to block in accept() or to get a null return from it in
non-blocking mode, and both of these occur in Java as expected. In other
words I don't see what 'usefully' means above.
 

Cyrille \cns\ Szymanski

Don't know if it helps, but for ANSI C there is a really good simple
HTTP server that also uses a Selector. Though it is the C version of the
sockets implementation and you need to be able to read it, I think it
acts pretty much the same as the implementation in Java.

http://www.acme.com/software/thttpd/

Thank you for that precious link. The code is sometimes hard to follow
because of its cross platform nature but I managed to understand the key
parts. AFAIK this server could easily be implemented in Java.

However the IO model has a tiny nitpick : it doesn't take advantage of
multithreaded environments since it isn't multi threaded. You seem to
believe that this isn't very important since the OS (or VM) spawns separate
threads to handle io transfers.

Is the Java Selector class limited to a certain number of registered
channels like the native select() often is? How does the JVM address such
a problem? FYI thttpd limits the number of concurrent clients.
 

Cyrille \cns\ Szymanski

Then Java lacks an asynchronous accept() method.

[...]
Assuming the accept is slow for the reasons you described above: if the
new socket has to be synchronized with the subsystem, the Thread
handling this will make its calls to the socket subsystem ... go into a
WAITING state ... [subsystem sends SYN and waits for ACK and maybe does
other stuff] ... WAKE up again (being signaled by the subsystem) ...
and return from the accept method. As far as I can see two threads are
involved here, where the "outer" thread is a Java Thread and the inner
Thread is a system thread (owned by the JVM). The outer Thread goes
into a WAITING state and will not use any CPU cycles. The inner Thread
will (in most cases) go into a WAITING state as well as soon as it has
sent the SYN to the IO device/network card, and it will wake up as soon
as the IO device has new data. The second (system) thread therefore
doesn't consume much time either while asleep. This is the situation if
the blocking (synchronous) accept() is used.

In the asynchronous way there is not much difference ... the Selector
could be seen as the first thread ... the second thread stays more or
less the same ... but now the first thread can be notified by multiple
events from different Threads. I could even imagine that after being
notified by one Thread the Selector waits for a few milliseconds in
the hope that more threads will notify it (but to check that I would
have to look into the implementation).

In both situations you have the system thread that does the actual
work; this cannot be changed/improved. The first situation uses an
older implementation for IO than the second one, but it does not
waste CPU cycles. I can't tell you which of the two implementations
will be faster; that is just trial and error.

I guess the problem can be summarized as follows :

The accept() operation takes a long time to complete, so it must be
either handled in a separate thread, or there should exist an
asynchronous version of the function.

The server must be capable of handling many clients. 1 thread per client
is not reasonable so there must be a limited number of worker threads.
Since some operations involve synchronization it is best to have at least
two worker threads (so one thread can do number crunching while another
is in sleeping mode).
 

Cyrille \cns\ Szymanski

I think so. Here's the scheme, based on your idea about dispatching
work to a thread pool:
() One thread manages the Selector, much as you already have.
() When it detects one or more ready IO operations, it iterates through
the selected keys and assigns the appropriate IO operation on the
associated Channel to a thread from a thread pool, after first
clearing the key's interest ops.
() After processing the whole list, the selector thread invokes a new
select().

Seems good to me so far.

() The threads from your pool, upon being awakened and assigned a new
SelectionKey, retrieve the channel, perform as much of the required
operation as they can without blocking, set the appropriate interest
operations on the key, and then wakeup() the Selector before going
back to sleep. (The wakeup is essential to make the Selector notice
the change in the key's interest operations.)

Won't the worker thread block on attempt to set the interest ops ? I
guess so.

() After a read, as much of the data read as possible should be
written; if that's all of it then the new interest set is OP_READ;
otherwise it is OP_WRITE.
() Remember to close() the _channel_ (which will also implicitly cancel
all associated selection keys) when closure of the remote side is
detected. It is not clear to me from the API docs whether closing the
underlying Socket causes the channel to be closed (or the reverse).

() The selector thread must be prepared for the possibility that no
selection keys are ready when select() returns, but that shouldn't be hard.
Fine



In other words, IOCP already provides a packaged equivalent to the
approach I describe above? Or is there something I missed that it
does and the above doesn't?

IOCP is almost equivalent to the approach described above. The main
difference is that the selector equivalent of IOCP notifies you when an IO
operation completes. Therefore you call IO functions (which return
immediately) before the selector gives its notification.

With this model, accepts can be treated in the same manner as reads and
writes :
* create a "client" SOCKET
* call accept("client") which returns immediately
* wait for the selector to notify completion (thread wakes up)
* when it does, "client" is connected to the remote endpoint
* then say, send a "hello message", which returns immediately
* wait for the selector to notify completion (thread wakes up)
* when it does, some bytes have been sent, or there has been an IO error.
* etc.
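For readers on a later JDK: Java 7 added `AsynchronousServerSocketChannel` and `CompletionHandler` (NIO.2), which follow the completion model in this list, including an asynchronous `accept()`; the Windows implementation is built on I/O completion ports. The sketch below is an echo server under that assumption. The class name is hypothetical, and the 2*CPU thread count simply mirrors the IOCP rule of thumb mentioned earlier in the thread.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousChannelGroup;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.util.concurrent.Executors;

// Hypothetical name; completion-style echo server (Java 7+ NIO.2).
public class CompletionEchoServer {
    private final AsynchronousChannelGroup group;
    private final AsynchronousServerSocketChannel server;

    public CompletionEchoServer() throws IOException {
        int workers = 2 * Runtime.getRuntime().availableProcessors(); // N = 2*CPU worker threads
        group = AsynchronousChannelGroup.withFixedThreadPool(workers, Executors.defaultThreadFactory());
        server = AsynchronousServerSocketChannel.open(group).bind(new InetSocketAddress("127.0.0.1", 0));
        acceptNext();
    }

    public int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    private void acceptNext() {
        // Returns immediately; the handler runs on a pool thread once a connection is accepted.
        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            public void completed(AsynchronousSocketChannel client, Void att) {
                acceptNext();                          // post the next accept before serving this client
                read(client, ByteBuffer.allocate(4096));
            }
            public void failed(Throwable exc, Void att) { /* server closed or accept error */ }
        });
    }

    private void read(AsynchronousSocketChannel client, ByteBuffer buf) {
        buf.clear();
        client.read(buf, buf, new CompletionHandler<Integer, ByteBuffer>() {
            public void completed(Integer n, ByteBuffer b) {
                if (n < 0) { closeQuietly(client); return; } // remote side closed
                b.flip();
                write(client, b);
            }
            public void failed(Throwable exc, ByteBuffer b) { closeQuietly(client); }
        });
    }

    private void write(AsynchronousSocketChannel client, ByteBuffer buf) {
        client.write(buf, buf, new CompletionHandler<Integer, ByteBuffer>() {
            public void completed(Integer n, ByteBuffer b) {
                if (b.hasRemaining()) write(client, b); // partial write completed: send the rest
                else read(client, b);
            }
            public void failed(Throwable exc, ByteBuffer b) { closeQuietly(client); }
        });
    }

    private static void closeQuietly(AsynchronousSocketChannel ch) {
        try { ch.close(); } catch (IOException ignored) { }
    }

    public void close() throws IOException { group.shutdownNow(); }
}
```

Each completion handler is the Java counterpart of "wait for the selector to notify completion (thread wakes up)" in the list: the thread that runs it is one of the group's pool threads, not a thread per client.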


Another approach I'll want to try is using IOCP with JNI.

But if the registration would block on completion of the Selector's
current select() then you need to wakeup() the Selector first, and
make sure it doesn't go back into select() until the registration is
done.

Got it.


Thanks a lot.
 

Douwe

Thank you for that precious link. The code is sometimes hard to follow
because of its cross platform nature but I managed to understand the key
parts. AFAIK this server could easily be implemented in Java.

However the IO model has a tiny nitpick : it doesn't take advantage of
multithreaded environments since it isn't multi threaded. You seem to
believe that this isn't very important since the OS (or VM) spawns separate
threads to handle io transfers.

Absolutely true .. and I wouldn´t recommend you to use the same
design/model in your program (single threaded). It just is a good
example of how the Selector works and also a good example how the life
of a programmer gets easier by decreasing the number of threads
(preferably to one)
Is the Java Selector class limited to a certain number of registered
channels like the native select() often is? How does the JVM address such
a problem? FYI thttpd limits the number of concurrent clients.


Think the Java version is just a class wrapped around the original
Selector stuff (in case of Linux). These are details of which I
(unfortunately) have no knowledge.
 

Cyrille \cns\ Szymanski

However the IO model has a tiny nitpick : it doesn't take advantage
Absolutely true .. and I wouldn´t recommend you to use the same
design/model in your program (single threaded). It just is a good
example of how the Selector works and also a good example how the life
of a programmer gets easier by decreasing the number of threads
(preferably to one)

I've been reading the book "Java NIO" from O'Reilly and they implement such
a multithreaded server. Chapter 4 from the book deals with this matter and
happens to be available online (source code of the examples as well).

http://www.oreilly.com/catalog/javanio/index.html

The very last section is really interesting, however it is a pity that the
example code given fails to prove what the author describes.
 
