Wireshark-dev: Re: [Wireshark-dev] Copying TVBs for Reassembly [Was: Filebacked-tvbuffs : GSoC'
From: Evan Huus <eapache@xxxxxxxxx>
Date: Thu, 18 Apr 2013 17:13:49 -0400
On Thu, Apr 18, 2013 at 4:53 PM, Jeff Morriss <jeff.morriss.ws@xxxxxxxxx> wrote:
> On 04/18/13 16:40, Evan Huus wrote:
>>
>> On Thu, Apr 18, 2013 at 3:56 PM, Jeff Morriss <jeff.morriss.ws@xxxxxxxxx>
>> wrote:
>>>
>>> On 04/18/13 15:14, Evan Huus wrote:
>>>>
>>>>
>>>> This is a tangential issue that has always confused me.
>>>>
>>>> Why do we malloc+memcpy data for reassembly when we already have
>>>> 'virtual' composite TVBs?
>>>>
>>>> Wouldn't it be more efficient (in time and memory) to create a
>>>> composite TVB for each reassembly and then build the reassembled
>>>> packet in it? You would never have to copy or allocate any actual
>>>> packet data...
>>>
>>>
>>>
>>> There are a couple of problems with doing that (that I recall):
>>>
>>> 1) Composite TVBs don't actually work (or didn't work until very
>>> recently?).
>>>
>>> 2) The data behind a TVB goes away as soon as we're done dissecting (and
>>> displaying) the packet.  That is, the TVB data is overwritten (IIRC) when
>>> the next packet is read.
>>>
>>> I suppose there was never any real reason to try to make reassembly work
>>> with composite TVBs: if they're just more malloc()'d memory then why mess
>>> with it rather than allocate our own copy of the data?  (Well, OK, it
>>> would
>>> save a data copy, but...)
>>
>>
>> OK, so then the optimal case would be a tvb implementation that stored
>> only frame_data pointers, offsets and lengths... similar but not
>> identical to the current composite implementation.
>>
>> The reassembly code could then add meta-data to this when
>> reassembling, and the tvb could lazily refetch the underlying tvbs
>> using the existing wiretap interface? If we add some sort of caching
>> mechanism so that repeated accesses didn't keep forcing reads of the
>> original file then I expect this would be very fast:
>>
>> - adding fragments to reassembly would be near-instantaneous (just a
>> few pointer updates)
>> - reassembled tvbs would take minimal memory except when accessed
>> (using tvb_get_* or proto_tree_add_*)
>> - accessing a reassembled tvb would just be an offset calculation and
>> then a wtap read to bring into memory the underlying real packet(s)
>> containing the data being requested (assuming they aren't already
>> cached)
>>
>> Thoughts?
>
>
> Yep, that sounds about what I was thinking of for file-backed (+composite)
> TVBs. :-)

Oh, OK, I misunderstood that then.

> Basically reassembly would only have to deal with composite TVBs (for the
> reassembled data).  The TVB layer, using file backing for the real data,
> would deal with keeping (and not keeping) the data in memory when it is
> needed (caching).

Yes.

> Hmmm, the way you mentioned wiretap makes me think/realize that the TVBs
> should not actually store the real offset (since we'd want that stuff
> abstracted away by wiretap so the tvbuff code wouldn't have to deal with
> ugliness like compressed files and so forth).  That might make it a bit
> harder...  Unless we backed the TVBs with temporary files (seems wasteful
> and a pain to clean up after).

Not sure I follow this, as I'm not sure what you mean by 'real
offset'. TVBs don't currently store any offset into the file itself.
And the virtual ones I described wouldn't have to, that information is
already in frame_data->file_off.