Wireshark-bugs: [Wireshark-bugs] [Bug 1181] Delays in real-time packet capture
Date: Mon, 20 Nov 2006 02:48:06 +0000 (GMT)
http://bugs.wireshark.org/bugzilla/show_bug.cgi?id=1181





------- Comment #4 from jyoung@xxxxxxx  2006-11-20 02:48 GMT -------
Hello Peter,

FYI: My current workaround to flush out any packets that might have arrived but
have not yet been displayed in Wireshark is to inject a new packet into the
network several seconds after the expected arrival of the "missing" packet(s) 
(I know this is ugly and labor intensive but it generally works).

The delay problem you see with the ping reply can be easily replicated with
just about any bursty type of traffic on a "quiet" network.  The problem shows
up when a set of packets arrive in less than one second and then there are no
subsequent packets received (for some amount of time).   To be more precise,
the set of packets must arrive (within the same second) with enough
inter-packet delay to be detected as separate packet events.  If the burst of
packets arrives quickly enough, then all of the packets can be consumed and
processed as a single event, thereby obscuring the packet delay problem.  The
delta time between a typical single ping request/reply usually has more than
enough lag to expose the delay problem.  In fact if I do a "ping -c 1
MYPINGTARGET" I'll typically see an ARP request but no subsequent ARP reply,
ping request nor ping reply even though the ping utility indicates that it had
a successful ping!  So in this case Wireshark has delayed displaying 3 packets.

I think the real solution to this delay problem was suggested in Richard's
initial description of this problem.  It appears that some timer mechanism
needs to be implemented to wake up from a blocking capture_loop_dispatch() to
allow any deferred/batched packets to be flushed IF more than a second has
elapsed since the last flush.
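
As a rough illustration of the timer idea, here is a minimal sketch of a
bounded wait built on select().  The function name wait_readable() and its
millisecond parameter are hypothetical and are not taken from the dumpcap
source; the sketch also assumes a platform where the capture handle exposes a
selectable file descriptor (libpcap offers pcap_get_selectable_fd() for this
on many Unix-like systems, but not everywhere).

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of dumpcap.
 * Wait until fd becomes readable or timeout_ms milliseconds elapse.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error
 * (including interruption by a signal). */
int wait_readable(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int rc;

    assert(fd >= 0 && fd < FD_SETSIZE && timeout_ms >= 0);

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

In a capture loop, a return of 1 would be followed by the usual dispatch call,
while a return of 0 would be the cue to flush any batched packets.  The timeout
is set up again on every call, which is the re-arming cost to keep in mind on
busy links.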

Some background:

I can easily replicate the ping capture behavior you observe using just the
"dumpcap" utility on a SUSE Linux system.  The dumpcap utility is the actual
live capture engine now used by Wireshark.  Previously this live capture code
was integrated directly into Ethereal but it was split out into the separate
dumpcap utility for several reasons (security, performance and some code
reusability with tshark). 

Dumpcap's main loop is implemented in capture_loop.c's capture_loop_start(). 
The actual capture loop starts with the line 'while (ld.go)'.  Immediately upon
entering the loop is a call to capture_loop_dispatch().  On my SUSE system the
call to capture_loop_dispatch() blocks until a packet arrives.  When a packet
arrives control is returned back to capture_loop_start().  Within
capture_loop_start() a series of tests are done before capture_loop_dispatch()
is called again where it again blocks until more packets arrive.   

When control is returned to capture_loop_start() one of the tests attempts to
determine if a packet has been "processed" within the last second (or .5 second
on Windows).  If a packet has been "processed" within the last second then
capture_loop_start() will defer processing of the newly arrived packet until a
later time and return control back to the (blocking) capture_loop_dispatch(). 
This is the "batch" mechanism that Richard mentioned in the initial problem
description.
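
The effect of this test can be modelled with a small simulation.  This is only
a sketch under simplifying assumptions: the function name, the array of
arrival times and the one-packet-per-wakeup behavior are hypothetical, and the
code is not the dumpcap source.  It uses the one-second granularity and the
"cur_time - upd_time > 0" style of comparison described in this comment.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical model of the batching test, not the dumpcap source.
 * arrival[i] is the wall-clock second at which the i-th packet wakes the
 * blocking dispatch call; start is the second at which the capture began.
 * Returns how many packets are still unreported after the last wakeup. */
size_t unreported_after_burst(const time_t *arrival, size_t n, time_t start)
{
    time_t upd_time = start;  /* second of the last report */
    size_t pending = 0;       /* packets captured but not yet reported */
    size_t i;

    for (i = 0; i < n; i++) {
        time_t cur_time;

        assert(i == 0 || arrival[i] >= arrival[i - 1]);
        cur_time = arrival[i];          /* dispatch returned with one packet */
        pending++;
        if (cur_time - upd_time > 0) {  /* at least one second since report */
            pending = 0;                /* report everything seen so far */
            upd_time = cur_time;
        }
        /* otherwise: defer, and go back to the blocking dispatch call */
    }
    return pending;
}
```

With four packets arriving in the same second (ARP request, ARP reply, ping
request, ping reply) the model reports the first and leaves three pending.
Appending one more arrival a few seconds later brings the pending count back
to zero, which is exactly the "inject a new packet" workaround.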

This deferred processing of the packets (batching) is done to minimize the
number of "update" messages that are sent by dumpcap to Wireshark.  When
Wireshark starts a live capture, dumpcap is invoked with the "-Z" flag.  The
"-Z" flag basically tells dumpcap to convey information to Wireshark via
"messages" sent to stdout.  The first "message" tells Wireshark the name of the
tracefile that dumpcap is writing to.  Subsequent "messages" indicate the
number of packets that Wireshark should be able to retrieve from the capture
file.  These "packet count messages" indicate that some arbitrary total number
of packets have arrived.  To keep Wireshark from being flooded with new "packet
count messages" upon the arrival of each and every new packet, dumpcap paces
itself (with a deferral/batching mechanism) so as to send no more than one
"packet count message" per second. 

Now the delay in being able to see the ping reply occurs because the ping reply
packet itself was seen less than one second after the initial ping request.  I
have done some similar testing using a multicast test utility.   If I flood the
network with 1000 small multicast packets, dumpcap/Wireshark will only
acknowledge seeing perhaps the first 100 to 300 packets.  When I stop the
capture (when running dumpcap directly, use "kill -SIGUSR1 <DUMPCAP-PID>"), then
the rest of the "missing" packets (1000 multicast packets in total) will
appear.  (FYI: It takes my multicast test utility less than 1/3 of a second to
send the 1000 multicast packets on a 10BASE-T (half-duplex) segment!)

Since capture_loop_start() does in fact get control back upon the arrival of a
packet I tried to workaround the delay problem by making the flush/notification
code inside the "if( cur_time - upd_time > 0)" test unconditional. 
Unfortunately this exposed the Wireshark performance problem that the
deferral/batch logic was trying to avoid!  With the deferral/batch logic
effectively disabled, dumpcap retrieved and wrote the 1000 frames to the
capture file virtually in real-time.  But Wireshark itself then had to respond
to the inbound flood of "packet count messages" generated by dumpcap.  With
the deferral/batch logic effectively disabled it took Wireshark almost 4.5
minutes (270 seconds!) to catch up to the 1000 "packet count messages"
generated by dumpcap's "report_packet_count()" in less than 1/3 of a second.
That works out to approximately 0.27 seconds for Wireshark to respond to each
of the 1000 dumpcap generated "packet count messages".

Obviously this behavior can be improved upon!

Assuming that the delay problem can be solved with some type of "wake-up and
flush" mechanism leads to two questions.   How "expensive" and how portable is
this proposed "wake-up and flush" mechanism?  

Under light network loads, I would expect that this "wake-up and flush"
mechanism would have virtually no detrimental performance impact at all on
capture performance.  But under high network loads (think GigE or even 10GigE
speeds) the cost of re-arming the proposed wake-up timer could be quite
high and might result in dumpcap dropping more packets.

One of the primary reasons for the introduction of the dumpcap utility was to
minimize the number of dropped packets.  Some had observed packet drops
occurring in busy traces due to the overhead associated with Ethereal's
attempt to maintain packet "state".  Interestingly, tshark (previously known
as tethereal) maintains packet state and has its own implementation of the
capture dispatch loop (implemented within tshark.c's capture()).  Although
tshark doesn't
exhibit the delay problem I suspect that under extremely heavy network loads a
tshark based capture file will be found to have more dropped packets than a
"pure" dumpcap based capture file.  (By "pure" dumpcap, I mean one that was NOT
started from within Wireshark.  I have some empirical experience (but no hard
evidence) that under the right conditions (or wrong conditions, depending on
your point of view) the CPU cycles associated with managing Wireshark's GUI
could interfere with dumpcap's ability to capture packets on an extremely busy
network segment.)

While I'll probably keep hacking away at this problem, hopefully someone who is
much more select() savvy than myself can cook up a real solution to this
particular packet delay problem.

(One possibility I've thought that might be worth exploring would be to see if
Wireshark could be modified to use logic similar to the way the "tail -f"
utility works to check for any newly arrived data in an active capture file. 
Using this logic, dumpcap would no longer have to defer processing packets
simply to lessen the flow of "packet count messages" for Wireshark's sake.  In
fact I would think that with "tail -f" type logic implemented within Wireshark,
dumpcap would not be required to send any "packet count messages" at all.)
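
To make the "tail -f" idea a little more concrete, here is a minimal sketch of
the polling side.  It assumes, purely for illustration, that the file consists
of fixed-size records; real capture files have a file header and
variable-length packet records, so an actual reader would have to parse record
headers instead.  The function name new_records() and its parameters are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count the complete fixed-size records appended to
 * the file at path since byte position *offset, and advance *offset past
 * them.  A partially written trailing record is left for the next call.
 * Returns the number of new records, or -1 on error or if the file shrank. */
long new_records(const char *path, long *offset, long record_size)
{
    FILE *fp;
    long size, n;

    assert(path != NULL && offset != NULL);
    assert(*offset >= 0 && record_size > 0);

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    size = ftell(fp);          /* current length of the file in bytes */
    fclose(fp);

    if (size < *offset)        /* ftell failed or the file was truncated */
        return -1;

    n = (size - *offset) / record_size;
    *offset += n * record_size;
    return n;
}
```

The reader would call this from its own timer, at whatever rate suits the GUI,
and the writer would never need to announce packet counts.  Reopening the file
on every poll keeps the sketch short; a real reader would more likely keep the
file open between polls.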

Comments/ideas?

