Simulating a smaller MTU? i.e. sending small packets.


Ed W

Hi, for various reasons I'm writing a little stress test app which tries
to simulate the effects of varying sized TCP packets on the overall
transfer speed.

So I have written a little app which acts as a server, waits for a
connection and then spews data in fixed-sized chunks of your choice. I
also turn off Nagle, turn on autoflush, and as far as I can tell ask for
the data to go out immediately.
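
(For reference, a minimal sketch of that kind of sender, assuming
IO::Socket::INET and a Socket.pm that exports TCP_NODELAY on this
platform; the port number and chunk size are just illustrative:)

    use IO::Socket::INET;
    use Socket qw(IPPROTO_TCP TCP_NODELAY);

    my $listen = IO::Socket::INET->new(
        LocalPort => 9000,      # illustrative port
        Listen    => 1,
        Reuse     => 1,
    ) or die "listen: $!";

    my $conn = $listen->accept or die "accept: $!";
    $conn->setsockopt(IPPROTO_TCP, TCP_NODELAY, 1);   # turn off Nagle
    $conn->autoflush(1);                              # no userspace buffering

    my $chunk = 'x' x 1000;                           # fixed 1000-byte chunks
    while (1) {
        $conn->syswrite($chunk) or last;              # syswrite bypasses stdio
    }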

What I observe (using an ethernet dump) is that once the receiver is not
keeping up with the speed at which the sender is spewing packets, the *sender*
(which in this case is Linux 2.6.12) starts to coalesce the packets.

So for example if I ask it to send 1000-byte packets, I can see from the
network trace that it starts to send lots of MTU-sized packets instead
(i.e. larger ones).

This is not what I was expecting at all; in fact I had no idea that
there was some clever process in Linux to coalesce small network
packets? Am I tripping over some Perl buffering instead? Any thoughts
on where to look?

Note that it's not a mis-measurement problem on the receiving side. A
network trace is showing me that the packets are going out at MTU size
(in general, but with a smattering of packets of the size I requested).

If I slow down the sending rate, or speed up the receiver, then the
packets go through at the correct size...

Grateful for any help trying to work around this

Ed W
 

Ed W

The only work-around I can see is to make sure that the readers operate
faster than the writers.

I have nearly found a workaround. If I use the following:

setsockopt($sock, &Socket::IPPROTO_TCP, &Socket::TCP_MAXSEG, 500);

Then I can change the size of the MSS for the connection (which is the
effect I'm basically after).

The problem is that this only seems to work if I use it on the
listening socket before I accept any connections. It doesn't seem to
work if I call it on the accepted connection.
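
(A minimal sketch of that working case, assuming a Socket.pm that knows
TCP_MAXSEG; the port and the 500-byte value are only examples:)

    use IO::Socket::INET;
    use Socket;

    my $listen = IO::Socket::INET->new(
        LocalPort => 9000,      # illustrative port
        Listen    => 1,
        Reuse     => 1,
    ) or die "listen: $!";

    # Set the MSS on the *listening* socket, before accept(), so that the
    # accepted connection inherits it.
    setsockopt($listen, &Socket::IPPROTO_TCP, &Socket::TCP_MAXSEG, 500)
        or die "setsockopt TCP_MAXSEG: $!";

    my $conn = $listen->accept or die "accept: $!";
    # Data written to $conn now goes out in segments of at most ~500 bytes.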

Looking at the C docs, however, suggests that this *ought* to work at any
time... Likewise, attempts to turn TCP_CORK on and off (for fun) aren't
working on the open connection, and this is definitely supposed to be
possible.

Any ideas what I am missing?

Thanks

Ed W
 

Tassilo v. Parseval

Also sprach Ed W:
Hi, for various reasons I'm writing a little stress test app which tries
to simulate the effects of varying sized TCP packets on the overall
transfer speed.

This is probably a moot venture. The smaller the packets are, the lower
the overall throughput is going to be. This is due to the fact that TCP
packets have to be acknowledged. If no piggy-backing is used (which is
the case for a pure receiver side), then an additional 40 bytes (IP +
TCP minimum header size) need to be sent out on each acknowledgement.

The number of ACKs sent depends on the window size. Normally a receiver
tries to minimize ACKs and window-size update messages (Clark).
So I have written a little app which acts as a server, waits for a
connection and then spews data in fixed-sized chunks of your choice. I
also turn off Nagle, turn on autoflush, and as far as I can tell ask for
the data to go out immediately.

What I observe (using an ethernet dump) is that once the receiver is not
keeping up with the speed at which the sender is spewing packets, the *sender*
(which in this case is Linux 2.6.12) starts to coalesce the packets.

So for example if I ask it to send 1000-byte packets, I can see from the
network trace that it starts to send lots of MTU-sized packets instead
(i.e. larger ones).

But it's probably going to send these larger packets at a lower rate.
Did you also check the ACK packets from the receiver? The Nagle
algorithm tells the sender never to send small packets. Turning it off
means sending them immediately as long as the receiver's side can keep
up. Now, if the receiver is congested, I would assume that the sender
still buffers small packets, and once an ACK packet arrives it sends out
as much data as there is space in the receiving window.
This is not what I was expecting at all; in fact I had no idea that
there was some clever process in Linux to coalesce small network
packets? Am I tripping over some Perl buffering instead? Any thoughts
on where to look?

No, you're tripping over a sane implementation of the TCP stack. TCP by
nature is slow and has some overhead which is reduced by various means,
most notably the Nagle (sender) and Clark (receiver) algorithms.
Furthermore, in order to avoid clogging the subnet between sender and
receiver, congestion control is carried out (see TCP slow start
algorithm).
Note that it's not a mis-measurement problem on the receiving side. A
network trace is showing me that the packets are going out at MTU size
(in general, but with a smattering of packets of the size I requested).

If I slow down the sending rate, or speed up the receiver, then the
packets go through at the correct size...

In order to do your measurements, you should probably adjust parameters
on the receiving side. If you want smaller packets, try to set the
window size (TCP_WINDOW_CLAMP, I think). TCP_MAX_SEG also needs to be
set there as the MSS is announced by the receiver during the
three-way-handshake when the connection is established.
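
(A rough sketch of that receiver-side idea, assuming a Socket.pm that
knows TCP_MAXSEG; the host, port and 500-byte value are placeholders.
The option has to be set before connect() so the smaller MSS goes out in
the SYN of the handshake:)

    use IO::Socket::INET;
    use Socket;

    my $sock = IO::Socket::INET->new(Proto => 'tcp')
        or die "socket: $!";

    # Announce a smaller MSS in our SYN by setting it before connecting.
    setsockopt($sock, &Socket::IPPROTO_TCP, &Socket::TCP_MAXSEG, 500)
        or warn "setsockopt TCP_MAXSEG: $!";

    $sock->connect(pack_sockaddr_in(9000, inet_aton('192.0.2.1')))
        or die "connect: $!";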

Tassilo
 

Ed W

This is probably a moot venture. The smaller the packets are, the lower
the overall throughput is going to be. This is due to the fact that TCP
packets have to be acknowledged.

Be careful with your generalisation. The point of my experiment is to
test an unreliable (and very slow) satellite network to determine
whether a faster speed would be achieved using a smaller MTU due to fewer
retransmissions. 1500 bytes represents up to 7 seconds of transmission
time...
In order to do your measurements, you should probably adjust parameters
on the receiving side. If you want smaller packets, try to set the
window size (TCP_WINDOW_CLAMP, I think). TCP_MAX_SEG also needs to be
set there as the MSS is announced by the receiver during the
three-way-handshake when the connection is established.

I'm not sure I can see how window size affects things, but it's
interesting to see that I can influence it on a per connection basis?

I'm trying to change TCP_MAX_SEG and the docs imply it can be changed
once the connection is established, but at least using perl this doesn't
appear to work.

If I change it on a listening socket then I observe that the subsequent
TCP handshake uses the original max values, but that TCP then uses the
smaller values for sending data (i.e. it does what I expect). It would
just be useful to be able to change the MSS while the connection is
operating.

It might for example be useful to change the MSS if we observe more
corrupted TCP packets arriving, or as part of some similar algorithm.


Also, is it possible to observe how full the network buffers are?
getsockopt(xxx)? Again, it might be useful to observe this value in the
situation above and slow down sending when the buffers are filling up
(for example with these huge latencies I might want to have more control
over the amount of outstanding data)

Any thoughts?

Thanks

Ed W
 

Tassilo v. Parseval

Also sprach Ed W:
Be careful with your generalisation. The point of my experiment is to
test an unreliable (and very slow) satellite network to determine
whether a faster speed would be achieved using a smaller MTU due to fewer
retransmissions. 1500 bytes represents up to 7 seconds of transmission
time...

Ah, there we are! TCP design was heavily based on the assumption of
using wires on the physical layer (which means: the medium is fairly
reliable). I made the same assumption. :)

TCP over a wireless link is a completely different story. The current
design still works but it can be horrendously inefficient, especially
congestion control.
I'm not sure I can see how window size affects things, but it's
interesting to see that I can influence it on a per connection basis?

Yes, per connection. The window size is a parameter sent with each ACK
message and it refers to the currently remaining size of the receiving
buffer. This is normally handled by the TCP stack. It is not a global
parameter you set once and then it remains there. But from the
description in tcp(7) I assume that TCP_WINDOW_CLAMP is an upper bound
and no sent packet may ever exceed it.
I'm trying to change TCP_MAX_SEG and the docs imply it can be changed
once the connection is established, but at least using perl this doesn't
appear to work.

TCP_MAXSEG actually. To what value did you set it? If it exceeds the MTU
of your interface, the value is ignored. Also, the minimum size is, I
think, 556 which is a TCP requirement.
If I change it on a listening socket then I observe that the subsequent
TCP handshake uses the original max values, but that TCP then uses the
smaller values for sending data (i.e. it does what I expect). It would
just be useful to be able to change the MSS while the connection is
operating.

As far as I know this is not in the specifications of TCP. The segment
size is agreed on during connection establishment (actually, both sides
may use different values for the MSS).
It might for example be useful to change the MSS if we observe more
corrupted TCP packets arriving, or as part of some similar algorithm.

The only way to do that is to shut down the connection and establish a
new one with the updated values for the MSS.
Also, is it possible to observe how full the network buffers are?
getsockopt(xxx)? Again, it might be useful to observe this value in the
situation above and slow down sending when the buffers are filling up
(for example with these huge latencies I might want to have more control
over the amount of outstanding data)

With a packet-sniffer, you certainly can. You find it in the window size
field of the TCP header that is sent back as acknowledgement from the
receiver. But as I said earlier, this is a dynamic value so I don't
think you can find that value on a per-socket basis. There is TCP_INFO
that returns some values. But the structure returned seems to have no
information on the last state of the receiver's window.
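
(For what it's worth, a minimal sketch of querying TCP_INFO from Perl on
Linux. If Socket.pm doesn't export TCP_INFO, the value 11 from
<netinet/tcp.h> is assumed below; the full struct layout is in
/usr/include/linux/tcp.h, and only the leading state byte is decoded
here:)

    use Socket;
    # $sock is an already-connected TCP socket.
    my $TCP_INFO = eval { &Socket::TCP_INFO } || 11;   # 11 assumed from the Linux headers
    my $info = getsockopt($sock, &Socket::IPPROTO_TCP, $TCP_INFO)
        or die "getsockopt TCP_INFO: $!";
    my ($state) = unpack('C', $info);   # tcpi_state, first byte of struct tcp_info
    print "tcp_info state code: $state\n";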

Since you are trying to optimize a TCP connection over a wireless link,
are you sure at all that reducing the packet size is a good idea? The
problem with wireless links is their lack of reliability (and latency).
When data get through corrupted (or not at all), one approach is to send
them again immediately in the hope that they get through this time. Also,
making packets smaller does not necessarily make the transmission more
reliable. Wireless is really just send-and-pray.

However, if you have problems with the buffer of the receiver filling up
too quickly, wouldn't that mean that the data got through beautifully?
If the receiver constantly has full buffers, it means the link is in
fact quite reliable and thus making it look similar to a wired link. In
this case you can just rely on the default behaviour of your TCP stacks
as they work well for reliable links.

Tassilo
 

Alan J. Flavell

Be careful with your generalisation. The point of my experiment is
to test an unreliable (and very slow) satellite network to determine
whether a faster speed would be achieved using a smaller MTU due to
fewer retransmissions. 1500 bytes represents up to 7 seconds of
transmission time...

Yes, but the acknowledgement doesn't have to be for every individual
packet. Check the "window" parameter. You may however need some very
large buffers if you hope to improve performance. One sees a similar
effect without the satellite, when trying to get good bulk data
throughput on a transatlantic cable link: despite having at least
1 Gbit/sec paths at all points between the hosts at each end, and the
hosts themselves being adequate to the purpose, the throughput looks
quite miserable unless some serious tuning of the TCP parameters is
done.

However, this would be better explored on a networking group, I think,
than right here on c.l.p.misc.

And google for * tcp tuning throughput * and similar combinations of
terms. Anything recent-ish which comes back with LBL.gov and/or
internet2 in the URL is likely to be worth a look.
I'm not sure I can see how window size affects things, but it's
interesting to see that I can influence it on a per connection
basis?

The acknowledgements are serial numbered as to which packets they
relate to, so you can be acknowledging a packet which was quite some
time back while you have all the intervening packets "up the spout" or
in transit, at least for the major part of the transfer (at the ends
of course it sorts itself out).

hope this helps.
 

Ed W

You may however need some very
large buffers if you hope to improve performance. One sees a similar
effect without the satellite, when trying to get good bulk data
throughput on a transatlantic cable link: despite having at least
1 Gbit/sec paths at all points between the hosts at each end, and the
hosts themselves being adequate to the purpose, the throughput looks
quite miserable unless some serious tuning of the TCP parameters is
done.

I think you overestimate the problem. I am using Iridium...

On a clear day with no clouds, the satellite overhead and a following
wind, it can do 2400 baud...

Retransmissions are what I need to reduce.

Ed W
 

Ed W

TCP_MAXSEG actually. To what value did you set it? If it exceeds the MTU
of your interface, the value is ignored. Also, the minimum size is, I
think, 556 which is a TCP requirement.

The docs say that you can use it to reduce the MSS below what was
negotiated at link establishment. This seems to be the case: I can see
the MTU being established at 1500 using a packet sniffer, but setting
TCP_MAXSEG then means packets go out at (say) 500 bytes.

I haven't found an issue under Linux setting values from 300 bytes to
1400 bytes, so I don't know if there is any limit?

As far as I know this is not in the specifications of TCP. The segment
size is agreed on during connection establishment (actually, both sides
may use different values for the MSS).

It seems obvious though that there is no technical reason why we can't
agree on 1400 bytes being the largest we are allowed to send to the
remote end and then send 700-byte packets instead.

My understanding is that this param can be changed at runtime. Also
there is TCP_CORK, which is supposed to optimally pack data into packets
- again, I can't seem to toggle this using the Perl code.
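
(A sketch of the sort of toggling I mean, assuming an already-accepted
connection $conn as in the server sketch earlier. TCP_CORK is
Linux-specific; if Socket.pm doesn't export it, the value 3 from
<netinet/tcp.h> is assumed:)

    use Socket;
    my $TCP_CORK = eval { &Socket::TCP_CORK } || 3;   # 3 assumed from the Linux headers

    # Cork the socket, queue several small writes, then uncork so the
    # kernel flushes them packed into full-sized segments.
    setsockopt($conn, &Socket::IPPROTO_TCP, $TCP_CORK, 1) or warn "cork on: $!";
    syswrite($conn, 'x' x 100) for 1 .. 10;
    setsockopt($conn, &Socket::IPPROTO_TCP, $TCP_CORK, 0) or warn "cork off: $!";
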
The only way to do that is to shut down the connection and establish a
new one with the updated values for the MSS.

The docs imply not?

Since you are trying to optimize a TCP connection over a wireless link,
are you sure at all that reducing the packet size is a good idea? The
problem with wireless links is their lack of reliability (and latency).
When data get through corrupted (or not at all), one approach is to send
them again immediately in the hope that they get through this time. Also,
making packets smaller does not necessarily make the transmission more
reliable. Wireless is really just send-and-pray.

I observe several to a few dozen random errors per minute. If you do a
little quick maths I think you can easily see that packets taking 7 secs
each to send (and retransmit in case of error) are much less efficient
than, say, 1-second packets.

I have just done some testing and this is very much borne out in
practice: the smaller packets (I tested at 300, 500 and 600 bytes) suffer
only very small amounts of slowdown and are much more robust in the case
of a retransmit. I think a quick model in Excel would also show that this
is the case?
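
(For the sake of argument, here is a back-of-envelope version of that
model in Perl; the 2400 bps figure is from above, the bit-error rate is
a purely illustrative assumption, and ACKs and TCP/IP overhead are
ignored:)

    my $bps = 2400;     # link speed, from the Iridium figure above
    my $ber = 1e-4;     # assumed bit-error rate, illustrative only

    for my $size (1500, 600, 500, 300) {
        my $bits    = $size * 8;
        my $p_ok    = (1 - $ber) ** $bits;   # chance one transmission survives
        my $tries   = 1 / $p_ok;             # expected transmissions per packet
        my $seconds = ($bits / $bps) * $tries;
        printf "%4d-byte packets: ~%.0f bytes/sec effective\n",
            $size, $size / $seconds;
    }

With those (made-up) numbers the small packets come out well ahead of
the 1500-byte ones, which is consistent with what I measured.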

However, if you have problems with the buffer of the receiver filling up
too quickly, wouldn't that mean that the data got through beautifully?
If the receiver constantly has full buffers, it means the link is in
fact quite reliable and thus making it look similar to a wired link.

Wrong buffer. I'm worried about the sending buffer at my end. I can't
see how the remote buffer would ever be anything but empty, since that
would imply the application was sleeping on the job and not consuming
the transmitted input.

In my case I want to keep the untransmitted data as small as possible.
Over my satellite system, if I transmit 65KB and then later some urgent
data comes along, it ends up on the tail of the other data. This means
that some minutes will pass before I can even start to get the urgent
data out.
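
(On that front, one Linux-specific thing I could try, sketched here as
an assumption rather than something I've tested: the SIOCOUTQ ioctl
reports how many bytes are still sitting unsent in a socket's send
buffer, so the sender could throttle itself when that number grows. The
0x5411 value is taken from the Linux headers:)

    # $sock is the connected sending socket.
    my $SIOCOUTQ = 0x5411;     # assumed value, from <linux/sockios.h>
    my $buf = pack('i', 0);
    if (ioctl($sock, $SIOCOUTQ, $buf)) {
        my $unsent = unpack('i', $buf);
        print "unsent bytes still queued: $unsent\n";
    }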

This is also compounded at the remote ISP end, which might have several
megabytes of buffering and be queuing up tons of data to squeeze down
this tiny pipe. Obviously I can't control any QoS settings at the remote
end because it's not under my control.

Anyway, we are off track. What I really need to do now is figure out
how to control the sending packet size effectively. A variable MSS would
be a big help if it can be done.

Ed W
 
