On Fri, Dec 29, 2006 at 04:28:36PM -0500, Small, James wrote:
> I am using Wireshark to look at mail traffic (SMTP/POP3). When I look
> at the trace I see lots of the following:
> Previous Segment Lost
> Retransmission (suspected)
> Duplicate ACKs
These all point to packet-loss...
> I'm suspecting that this is exacerbated by not having enough Internet
> bandwidth.
That could be the case...
> My question is, how do I interpret this? Does this show that I don't
> have enough bandwidth? Does it mean there needs to be tuning?
It could mean that you don't have enough bandwidth; it could also mean
that there is packet loss somewhere else. Is it only SMTP/POP3 traffic
that has this problem? To all destinations (or from all sources) or
only to a specific host? Do you have any bandwidth-managing devices
in your network? Or is traffic simply routed to your Internet link?
> I realize this is not an easy question and would be very happy even with
> a "go read book ABC" answer - just as long as once I read book ABC I
> would know how to interpret the data.
To my knowledge, there is no real "book ABC"; this skill comes
with experience in analysing these kinds of issues.
First of all, I would suggest using a graphing tool like MRTG to
start monitoring the most important interfaces in your network.
That way, you have historic data that you can match your traces
and problem descriptions against. Look for flat-topped patterns in
the graphs; they are a sign of link saturation. See: http://oss.oetiker.ch/mrtg/.
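
To illustrate what a flat-topped pattern looks like in numbers, here is a
minimal Python sketch. It is not part of MRTG; the function name, the 95%
threshold and the minimum run length are assumptions chosen for the example,
and the samples are made up.

```python
def saturated_runs(samples_bps, link_bps, threshold=0.95, min_len=3):
    """Find stretches where throughput sits at the top of the link.

    samples_bps: throughput samples (bits/s), one per polling interval.
    link_bps:    nominal link capacity (bits/s).
    threshold and min_len are illustrative assumptions, not MRTG settings.
    Returns a list of (first_index, last_index) pairs.
    """
    runs = []
    start = None
    for i, bps in enumerate(samples_bps):
        if bps >= threshold * link_bps:
            if start is None:
                start = i  # a possible plateau begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a plateau that lasts until the final sample.
    if start is not None and len(samples_bps) - start >= min_len:
        runs.append((start, len(samples_bps) - 1))
    return runs


# Made-up samples on a hypothetical 1000 bit/s link.
example = [200, 960, 980, 990, 400, 970, 300]
plateaus = saturated_runs(example, link_bps=1000)
```

A single high sample is ignored on purpose: one peak is just a burst, while
several consecutive samples pinned near capacity suggest the link is full.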
Second of all, I would collect some traces of normal traffic over
time. That way you can compare the traffic from a faulty
situation with the traffic from a healthy one.
Third of all, looking at the traces carefully and trying to
imagine what could cause the packets that you see does help a lot.
If you have some more experienced people around you, that
might help. Feel free to send me a trace personally so I can
tell you what I see in it. I often find myself following TCP
sequence numbers to understand where traffic gets lost, so a book
on TCP is not a luxury. Reading the TCP RFC (RFC 793) might also
help of course. I have heard that "TCP Illustrated" is a good
series, although I have never read it myself.
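
As a rough idea of what "following sequence numbers" means, here is a small
Python sketch. It is a deliberately simplified assumption of mine, not
Wireshark's actual analysis: it looks at one direction of one connection,
takes hypothetical (sequence number, payload length) pairs in capture order,
and ignores SYN/FIN flags and sequence number wrap-around.

```python
def classify_segments(segments):
    """Label each segment by comparing it with the next expected sequence number.

    segments: list of (seq, payload_len) tuples for ONE direction of ONE
    TCP connection, in the order they were captured.
    Returns a list of (seq, payload_len, label) tuples.
    """
    results = []
    next_expected = None  # highest "seq + len" seen so far
    for seq, length in segments:
        if next_expected is None or seq == next_expected:
            label = "in order"
        elif seq > next_expected:
            # Data between next_expected and seq was never seen.
            label = "gap before this segment"
        else:
            # Covers data that was (at least partly) seen or skipped before;
            # could be a retransmission or simply a reordered packet.
            label = "possible retransmission"
        results.append((seq, length, label))
        end = seq + length
        if next_expected is None or end > next_expected:
            next_expected = end
    return results


# Made-up example: the segment starting at 201 shows up late.
trace = [(1, 100), (101, 100), (301, 100), (201, 100), (401, 100)]
labels = [label for _, _, label in classify_segments(trace)]
```

Where the gap shows up relative to your capture point tells you on which
side of it the packet went missing, which is why it pays to capture close
to the hosts involved.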
> Any and all advice greatly appreciated.
Based on the info in the beginning of your message, I suspect
there is packet loss between the host that collected the data
and the destination-host mentioned in the duplicate ack messages.
I suspect your uplink is on that side of the network, so it
could very well be the source of your problems.
Possible solutions:
- Adding more bandwidth
- Spreading the load more by sending non-time-critical
data at times when not much bandwidth is being used
- Traffic prioritisation on the router. Of course, if there
is simply not enough bandwidth, this does not prevent
traffic from being dropped; you just decide which traffic
gets dropped
- Bandwidth optimisation. There are quite a few devices that can
reduce the bandwidth actually used by compressing and caching
traffic. This mostly works only between sites you administer
yourself
Just my $0,02 ... OK, maybe $0,03 ;)
Cheers,
Sake