Wireshark-dev: Re: [Wireshark-dev] multiple parsing of the same packets
From: Evan Huus <eapache@xxxxxxxxx>
Date: Wed, 30 Oct 2013 15:07:05 -0400
On Wed, Oct 30, 2013 at 2:20 PM, Matthieu Patou <mat@xxxxxxxxx> wrote:
> On 10/30/2013 07:31 AM, Evan Huus wrote:
>>
>> On Wed, Oct 30, 2013 at 4:14 AM, Matthieu Patou <mat@xxxxxxxxx> wrote:
>>>
>>> Hello,
>>>
>>> I noticed a long time ago that Wireshark is parsing the same packet at
>>> least three times.
>>>
>>> To make it worse if I go back and forth to the same packet it will be
>>> dissected one more time.
>>> With complex protocols like DRS (directory replication for Active
>>> Directory) it's really a problem, as the UI freezes for a while.
>>
>> Is the protocol really so complex that dissecting a single packet of
>> it takes a user-visible amount of time? That seems suspect to me.
>
> So what I did is that I'm dissecting the deferred RPC pointers only if tree
> != NULL. The dissection of pointers takes a while because there are ~1700
> top-level pointers and each of them has a lot of inner pointers. DRS is a
> very complicated protocol.

Fair enough, that's quite a bit of data to process. The packets must
be enormous.

Putting null-tree checks in can lead to huge improvements. Just be
careful that things like column data and expert info are added even if
tree==NULL.
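
As a rough illustration of that pattern (not real dissector code): the
sketch below uses hypothetical stand-in types and names (mock_tree,
mock_pinfo, mock_dissect, and the "DRS request" column text are all
made up for this example, none of them are Wireshark APIs). The cheap
per-packet output is always produced, and only the expensive tree
building is skipped when tree is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; NOT Wireshark types. */
typedef struct { int items; } mock_tree;                        /* details tree   */
typedef struct { char info[32]; int expert_count; } mock_pinfo; /* columns/expert */

/* Expensive part: only worth doing when a details tree is being built. */
static void add_pointer_items(mock_tree *tree, int n_pointers)
{
    assert(tree != NULL);
    for (int i = 0; i < n_pointers; i++)
        tree->items++;               /* stands in for adding one tree item */
}

void mock_dissect(mock_pinfo *pinfo, mock_tree *tree, int n_pointers, int malformed)
{
    /* Column text and expert info are produced on every pass,
     * whether or not a tree was requested. */
    snprintf(pinfo->info, sizeof pinfo->info, "%s", "DRS request");
    if (malformed)
        pinfo->expert_count++;

    /* The null-tree check guards only the costly tree construction. */
    if (tree != NULL)
        add_pointer_items(tree, n_pointers);
}
```

Calling mock_dissect with tree == NULL still fills in pinfo, which is
the property to preserve when adding such checks to a real dissector.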

>>> First thing: why 3 dissections initially? Is there a way to reduce this
>>> to 2? I more or less understand why 2 passes are needed, but 3 ...
>>
>> It is in theory possible, the third pass is usually either to fill in
>> the column or tree information. We could in theory pull that straight
>> from the second pass, but we would have to calculate in advance which
>> packets are visible, which may or may not be easy.
>
> Pardon my Wireshark ignorance, but it really looks like the 2nd and the 3rd
> pass are recreating the thing from scratch.

Every time we do a dissection it is more-or-less "from scratch". The
only data that reliably persists is minimal metadata about
conversations, request/response matching and that sort of thing.
Again, this was a decision made to trade off time for memory.

When loading a file, each packet is dissected once in order to set up
this metadata. Then any packet that is visible in the summary pane is
dissected again in order to calculate the column text to display. Then
the selected packet is dissected again to calculate the details tree
to show.
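
To put rough numbers on that, here is a small counting sketch. It is
only a model of the three passes described above, not Wireshark code;
the names (pass_counts, count_load_passes, total_dissections) are
hypothetical, and it assumes at most one selected packet.

```c
#include <assert.h>

/* Hypothetical bookkeeping for the three passes; not a Wireshark type. */
typedef struct {
    long first_pass;   /* every packet, to set up conversation metadata */
    long column_pass;  /* packets visible in the summary pane           */
    long tree_pass;    /* the selected packet, for the details tree     */
} pass_counts;

pass_counts count_load_passes(long total, long visible, int has_selection)
{
    assert(total >= 0 && visible >= 0 && visible <= total);
    pass_counts c;
    c.first_pass  = total;
    c.column_pass = visible;
    c.tree_pass   = has_selection ? 1 : 0;
    return c;
}

long total_dissections(pass_counts c)
{
    return c.first_pass + c.column_pass + c.tree_pass;
}
```

For example, with 10000 packets, 40 visible rows and one selected
packet, this model gives 10041 dissections: the extra 41 are small
next to the first pass, but the selected packet itself has been
dissected three times.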

Usually the number of packets visible and/or selected is small (well
under 50) and so this extra dissection takes virtually no time at all.

>>> Also, is it possible to remember the dissection of a packet so that we
>>> don't do it again and again?
>>
>> It is quite possible, it just takes an enormous amount of memory. I
>> actually hacked together a patch for this a few weeks ago while doing
>> some performance tests [1].
>>
>> [1] http://www.mail-archive.com/wireshark-dev@xxxxxxxxxxxxx/msg29107.html
>
>
> Well, memory is not limitless either ...

In the vast majority of cases dissecting a single packet (of any
protocol) is effectively instantaneous, so Wireshark saves as little
state as it possibly can. It has to redissect individual packets a lot
(pretty much any GUI action leads to at least one packet being
redissected) but this permits us to open substantially larger captures
(tens of thousands of packets) than we would be able to open
otherwise.

Given the number of tree items a DRS packet apparently produces,
storing the dissection data for every packet would require megabytes
of data per packet. On a machine with 4 GB of RAM you probably wouldn't
be able to load more than a few thousand packets without being forced out
into swap. A saturated network can produce that many packets in
seconds (though maybe not that many DRS packets?), so Wireshark would
be pretty useless in that case.
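
A back-of-the-envelope version of that estimate, as a sketch: the
function name max_cached_packets is hypothetical, and the 2 MiB per
packet used in the note below is an assumed figure for illustration,
not a measurement of DRS dissection data.

```c
#include <assert.h>
#include <stdint.h>

/* How many packets' worth of stored dissection data fit in RAM,
 * ignoring everything else the process needs (an optimistic bound). */
uint64_t max_cached_packets(uint64_t ram_bytes, uint64_t bytes_per_packet)
{
    assert(bytes_per_packet > 0);
    return ram_bytes / bytes_per_packet;
}
```

With 4 GiB of RAM and an assumed 2 MiB of stored data per packet,
max_cached_packets(4ULL << 30, 2ULL << 20) gives 2048, which is in
line with the "few thousand packets" figure.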

Evan