On Tue, Sep 09, 2008 at 12:48:59PM -0400, Aaron Allen wrote:
> It's possible I could be missing something. I've attached a sample
> from the start of an upload to S3. You can see that the packets
> from Amazon are being scaled, but the packets from my server aren't.
First of all, both flows of this TCP stream are using TCP window scaling;
see the TCP options in the SYN and the SYN/ACK packets. If you look
at frame 4, it looks like the window size is fixed at 65536, but
it is actually 256 (= window scale factor) x 0x0100 (= 256). In
frame 8 it is 256 x 0x00fe = 65024. So the framework for
enlarging the window size above 64k is in place. However...
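
To make the arithmetic above concrete, here is a minimal Python sketch.
The function name is my own, and it assumes the scale factor of 256 was
negotiated as a shift count of 8, which is how the window scale option
is encoded on the wire:

```python
def effective_window(raw_window, shift_count):
    """Scale the 16-bit window field from the TCP header by 2**shift_count."""
    return raw_window << shift_count

# A window scale factor of 256 corresponds to a shift count of 8.
print(effective_window(0x0100, 8))  # 65536, the value shown for frame 4
print(effective_window(0x00FE, 8))  # 65024, the value shown for frame 8
```
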
It looks like you are using TCP segmentation offloading. Your server
sends the data in chunks of 1k or 2k, and the latter is of course
larger than the advertised MSS of 1340 in frame 2. This means your
network card will do the TCP segmentation, breaking up every 2k packet
into 2 smaller packets. The result on the wire is one or more
full-size packets (containing 1340 bytes of TCP payload each),
followed by a packet which is smaller.
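
As a rough illustration of that split, the following Python sketch
computes the payload sizes you would expect on the wire. It assumes
"1k" and "2k" mean exactly 1024 and 2048 bytes, which I have not
checked against the trace, and the function name is made up:

```python
def segment_sizes(chunk_len, mss):
    """Split one offloaded chunk into MSS-sized payloads plus a remainder."""
    full, rest = divmod(chunk_len, mss)
    return [mss] * full + ([rest] if rest else [])

# Assumed chunk sizes of 2048 and 1024 bytes, MSS of 1340 from frame 2.
print(segment_sizes(2048, 1340))  # [1340, 708]
print(segment_sizes(1024, 1340))  # [1024]
```
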
From the pattern of your ACKs, it looks like your server has the
Nagle algorithm enabled and the Amazon server has delayed ACKs
enabled with a wait time of 100ms. What I think happens is the
following, but that should be checked with a trace on a span port,
as a capture on your server is taken *before* the packet is
segmented by the NIC.
My theory is that the last packet is held back by the NIC because of
the Nagle algorithm, which means it waits either for more data
to send, so it can fill up a whole segment, or for an ACK
of the data that has already been sent.
On the Amazon side, the TCP stack receives data and, because of
delayed ACKs, it only ACKs one in every two full-sized
segments. If it receives only one segment after the last ACKed
segment, it will wait until the delayed ACK timer expires
(~100ms) and then ACK the last received segment.
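
The receiver behaviour described above can be modelled in a few lines
of Python. This is a deliberately simplified sketch, not how any real
stack is implemented: it assumes an ACK goes out immediately for every
second segment and otherwise only when a 100ms timer expires. The
function name and the fixed timer value are my assumptions:

```python
def ack_delays(segment_count, delayed_ack_s=0.1):
    """Return the wait before each ACK when every second segment is ACKed."""
    delays = []
    pending = 0
    for _ in range(segment_count):
        pending += 1
        if pending == 2:
            # Second segment since the last ACK: acknowledge right away.
            delays.append(0.0)
            pending = 0
    if pending:
        # A lone segment is only acknowledged when the timer expires.
        delays.append(delayed_ack_s)
    return delays

print(ack_delays(2))  # [0.0]
print(ack_delays(1))  # [0.1]
print(ack_delays(3))  # [0.0, 0.1]
```
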
Now your server's Nagle algorithm is happy again and starts sending
more data to the Amazon server until this little game between Nagle
and delayed ACKs is played again.
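
To get a feeling for what such a stall costs, here is a back-of-the-envelope
Python sketch. It assumes the worst case in which every burst ends
in a full delayed ACK wait. The 2048-byte burst and the 20ms round trip
are hypothetical numbers, not values taken from your trace:

```python
def stalled_throughput(bytes_per_burst, rtt_s, delayed_ack_s=0.1):
    """Rough bytes/second ceiling when each burst waits for a delayed ACK."""
    return bytes_per_burst / (rtt_s + delayed_ack_s)

# Hypothetical: one 2048-byte burst per cycle over a 20 ms round trip.
print(round(stalled_throughput(2048, 0.020)))  # 17067 bytes/second
```

If the real trace shows several chunks going out per cycle, the ceiling
scales up accordingly, so treat this only as an order-of-magnitude check.
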
Of course the application could send more data to the Amazon
server, as the window of the Amazon server is not fully
utilized, but I suspect that the buffer on your NIC
might be the bottleneck. I guess turning off TCP segmentation
offloading might give you much more throughput in this
case. But that is just a hunch. You could also try to see
whether the newest driver for the card improves the way it
handles this.
Hope this helps,
Cheers,
Sake