Hello,
I am running traffic between a Windows 2008 server and a switch (via a router). Traffic runs perfectly for days, but we occasionally see short delays on the network that bring down our software. Typically, we see a couple of blockages of approximately 5 seconds each. We have only started seeing these problems since we moved to Windows 2008; on Windows 2003, everything worked perfectly. I am not sure whether this is a problem with the OS itself or some kind of incompatibility between the OS and our hardware.
Here is a quick snippet of where things start to go wrong in our logs:
16223 0.000456 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [PSH, ACK] Seq=535013 Ack=4164690 Win=253 Len=646
16224 0.099948 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [ACK] Seq=4164690 Ack=535659 Win=16738 Len=0
16225 1.179883 172.18.100.15 172.18.100.18 ICMP Echo (ping) request
16226 0.000709 172.18.100.18 172.18.100.15 ICMP Echo (ping) reply
16227 3.052179 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [PSH, ACK] Seq=4164690 Ack=535659 Win=16738 Len=492
16228 0.019502 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [PSH, ACK] Seq=535659 Ack=4165182 Win=251 Len=32
16229 0.047537 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [ACK] Seq=4165182 Ack=535691 Win=16706 Len=0
16230 0.000332 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [PSH, ACK] Seq=535691 Ack=4165182 Win=251 Len=64
16231 0.099530 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [ACK] Seq=4165182 Ack=535755 Win=16642 Len=0
16232 1.393205 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [PSH, ACK] Seq=535755 Ack=4165182 Win=251 Len=34
16233 0.006954 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [ACK] Seq=4165182 Ack=535789 Win=16608 Len=0
Note: Our switch will ping the server every 6 seconds.
In general, we would not expect to see any communication delays between the switch and the server. The maximum response time is approximately 300 ms, and it is generally much lower. But at frame 16227, we see that it takes about 4.2 seconds (1.18 + 3.05) for the switch to send out the next packet. I think this is interesting because, in between, the switch was able to ping the server without any delay, which suggests to me that the network is still healthy. At frame 16232, we see that the server takes 1.4 seconds to respond to the previous packet (i.e. the ACK).
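For anyone who wants to check the arithmetic, here is a minimal Python sketch that adds up the delta column between two frames. It assumes the second column is the time in seconds since the previously displayed frame, which is how I have been reading it; the function name `elapsed` is just something made up for this illustration.

```python
# Frames 16224-16227 copied from the trace above.
# Assumption: column 2 is the delta (seconds) from the previous displayed frame.
TRACE = """\
16224 0.099948 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [ACK] Seq=4164690 Ack=535659 Win=16738 Len=0
16225 1.179883 172.18.100.15 172.18.100.18 ICMP Echo (ping) request
16226 0.000709 172.18.100.18 172.18.100.15 ICMP Echo (ping) reply
16227 3.052179 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [PSH, ACK] Seq=4164690 Ack=535659 Win=16738 Len=492
"""


def elapsed(trace, first, last):
    """Return the seconds between frame `first` and frame `last`.

    This is the sum of the deltas of every frame after `first`,
    up to and including `last`.
    """
    total = 0.0
    for line in trace.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        frame, delta = int(fields[0]), float(fields[1])
        if first < frame <= last:
            total += delta
    return total


# Time from the switch's ACK (16224) to its next data segment (16227).
print(f"{elapsed(TRACE, 16224, 16227):.3f}")  # prints 4.233
```

The same function can be pointed at any pair of frame numbers in a longer export, as long as the frames in between are all present.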
A little later in the logs, we see even more delays, only this time they are all originating on the switch side:
16293 0.099612 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [ACK] Seq=4208432 Ack=535915 Win=17491 Len=0
16294 0.379674 172.18.100.15 172.18.100.18 ICMP Echo (ping) request
16295 0.000428 172.18.100.18 172.18.100.15 ICMP Echo (ping) reply
16296 3.733783 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [PSH, ACK] Seq=4208432 Ack=535915 Win=17520 Len=495
16297 0.200446 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [ACK] Seq=535915 Ack=4208927 Win=254 Len=0
16298 0.205381 172.18.100.18 224.0.0.252 IGMP V2 Membership Report / Join group 224.0.0.252
16299 0.473254 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [PSH, ACK] Seq=535915 Ack=4208927 Win=254 Len=21
16300 0.006626 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [ACK] Seq=4208927 Ack=535936 Win=17520 Len=0
16301 1.380183 172.18.100.15 172.18.100.18 ICMP Echo (ping) request
16302 0.000726 172.18.100.18 172.18.100.15 ICMP Echo (ping) reply
16303 1.855978 172.18.100.18 172.18.100.255 NBNS Name query NB 127.0.0.1,4001<00>
16304 0.206693 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [PSH, ACK] Seq=4208927 Ack=535936 Win=17520 Len=510
16305 0.200422 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [ACK] Seq=535936 Ack=4209437 Win=252 Len=0
16306 0.342543 172.18.100.18 172.18.100.255 NBNS Name query NB 127.0.0.1,4001<00>
16307 0.750224 172.18.100.18 172.18.100.255 NBNS Name query NB 127.0.0.1,4001<00>
16308 0.283679 172.18.100.18 224.9.9.2 IGMP V2 Membership Report / Join group 224.9.9.2
16309 0.467106 172.18.100.18 172.18.100.255 NBNS Name query NB 127.0.0.1,4001<00>
16310 0.749850 172.18.100.18 172.18.100.255 NBNS Name query NB 127.0.0.1,4001<00>
16311 0.750133 172.18.100.18 172.18.100.255 NBNS Name query NB 127.0.0.1,4001<00>
16312 0.392744 172.18.100.15 172.18.100.18 ICMP Echo (ping) request
16313 0.000770 172.18.100.18 172.18.100.15 ICMP Echo (ping) reply
16314 2.094337 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [PSH, ACK] Seq=4209437 Ack=535936 Win=17520 Len=510
16315 0.200015 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [ACK] Seq=535936 Ack=4209947 Win=256 Len=0
16316 3.704503 172.18.100.15 172.18.100.18 ICMP Echo (ping) request
16317 0.000520 172.18.100.18 172.18.100.15 ICMP Echo (ping) reply
16318 3.777418 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [PSH, ACK] Seq=4209947 Ack=535936 Win=17520 Len=510
16319 0.209945 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [ACK] Seq=535936 Ack=4210457 Win=254 Len=0
16320 2.012220 172.18.100.15 172.18.100.18 ICMP Echo (ping) request
16321 0.000599 172.18.100.18 172.18.100.15 ICMP Echo (ping) reply
16322 5.740877 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [PSH, ACK] Seq=4210457 Ack=535936 Win=17520 Len=510
16323 0.203816 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [ACK] Seq=535936 Ack=4210967 Win=252 Len=0
16324 0.054286 172.18.100.15 172.18.100.18 ICMP Echo (ping) request
16325 0.000921 172.18.100.18 172.18.100.15 ICMP Echo (ping) reply
16326 5.611319 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [PSH, ACK] Seq=535936 Ack=4210967 Win=252 Len=21
16327 0.007736 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [ACK] Seq=4210967 Ack=535957 Win=17520 Len=0
When we look at the first delay at frame 16296, we see a 4-second lag since the previous TCP frame (i.e. 16293). The subsequent blockages have even larger delays: frames 16314 (5.7 sec), 16318 (7.4 sec), 16322 (7.7 sec) and 16326 (5.6 sec). All of these delays originate on the switch side. I don’t see any real delays on the server side in responding to the data packets or the ping requests. The above traces also show that there was other activity on the network. I can see a lot of NetBIOS and router activity at this time, but I don’t know whether it has any impact.
Looking at the rest of the sniffer logs, I can see a large number of malformed packets. Here is a typical example:
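To list these gaps without adding them up by hand, something along these lines might help. It is only a sketch: it assumes the same column layout as the export above, it treats any TCP frame with `Len` greater than zero as a data frame, and the function name `tcp_data_gaps` and the 1-second threshold are arbitrary choices of mine.

```python
import re

# Frames 16319-16327 copied from the trace above.
# Assumption: column 2 is the delta (seconds) from the previous displayed frame.
TAIL = """\
16319 0.209945 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [ACK] Seq=535936 Ack=4210457 Win=254 Len=0
16320 2.012220 172.18.100.15 172.18.100.18 ICMP Echo (ping) request
16321 0.000599 172.18.100.18 172.18.100.15 ICMP Echo (ping) reply
16322 5.740877 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [PSH, ACK] Seq=4210457 Ack=535936 Win=17520 Len=510
16323 0.203816 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [ACK] Seq=535936 Ack=4210967 Win=252 Len=0
16324 0.054286 172.18.100.15 172.18.100.18 ICMP Echo (ping) request
16325 0.000921 172.18.100.18 172.18.100.15 ICMP Echo (ping) reply
16326 5.611319 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [PSH, ACK] Seq=535936 Ack=4210967 Win=252 Len=21
16327 0.007736 172.18.100.15 172.18.100.18 TCP ddi-tcp-1 > 49235 [ACK] Seq=4210967 Ack=535957 Win=17520 Len=0
"""


def tcp_data_gaps(trace, threshold=1.0):
    """Return (frame, source, gap) for each TCP data segment that follows
    more than `threshold` seconds without any TCP frame in either direction."""
    results = []
    clock = 0.0       # running time, rebuilt from the per-frame deltas
    last_tcp = None   # clock value at the most recent TCP frame
    for line in trace.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        frame, delta = int(fields[0]), float(fields[1])
        src, proto = fields[2], fields[4]
        clock += delta
        if proto != "TCP":
            continue  # pings, NBNS, IGMP advance the clock but are not reported
        match = re.search(r"Len=(\d+)", line)
        has_data = match is not None and int(match.group(1)) > 0
        if has_data and last_tcp is not None and clock - last_tcp > threshold:
            results.append((frame, src, round(clock - last_tcp, 3)))
        last_tcp = clock
    return results


for frame, src, gap in tcp_data_gaps(TAIL):
    print(frame, src, gap)
```

Printing the source address next to each gap makes it easier to see which side had been quiet before each data frame.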
4774 0.000113 172.18.100.15 172.18.100.18 UCP Roaming reset (Result) [checksum invalid][Malformed Packet]
In all cases, the malformed packets originated from the switch. Again, I am not sure how significant this is.
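A quick way to confirm where the malformed packets come from is to tally them per source address. The sketch below assumes the text export keeps the `[Malformed Packet]` marker in the info column, as in the example above; the sample contains only two lines taken from this post, and the function name `malformed_by_source` is hypothetical.

```python
from collections import Counter

# Two lines copied from this post: one normal frame and the malformed example.
SAMPLE = """\
16223 0.000456 172.18.100.18 172.18.100.15 TCP 49235 > ddi-tcp-1 [PSH, ACK] Seq=535013 Ack=4164690 Win=253 Len=646
4774 0.000113 172.18.100.15 172.18.100.18 UCP Roaming reset (Result) [checksum invalid][Malformed Packet]
"""


def malformed_by_source(trace):
    """Count frames flagged as malformed, keyed by source address (column 3)."""
    counts = Counter()
    for line in trace.splitlines():
        fields = line.split()
        if len(fields) >= 4 and "[Malformed Packet]" in line:
            counts[fields[2]] += 1
    return counts


print(malformed_by_source(SAMPLE))
```

Run against the full export, this would show at a glance whether every malformed frame really has the switch as its source.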
I would be grateful if anyone could provide any insight into the above.
Thanks
Eddie