Erlend Hamberg wrote:
On Saturday 10. October 2009 03.48.29 Guy Harris wrote:
The data from the frames in the capture file are not kept in
Wireshark's address space - they are read in as necessary, into a
small number of buffers (one for the main window, and one for each
packet window opened). *HOWEVER*, if data from a frame is reassembled
into a higher-level multiple-frame packet, the result of the
reassembly is, as noted, kept in Wireshark's address space.
So, when Wireshark reads the capture file, if it finds a single-frame packet,
it will only create a frame_data structure in memory and possibly data from
the dissector for that type of packet. But if the packet is made up of several
frames, the packet is reassembled and kept in memory? If so, do you think this
could be changed? Would it be worth it?
One thought: per-dissector data usually has to be real memory since the
dissectors access it as, well, memory.
The results of reassembly, however, are (I think always) put into a TVB
which you're only allowed[1] to access via the tvb_ APIs. Couldn't a
TVB be backed by something other than memory? For example, a
(non-memory-mapped) file?
To make it not be horrendously slow, the TVB layer might have to
implement some kind of in-memory caching of the stuff going to/from the
file (so that each tvb_get_guint8() wouldn't result in a seek plus a
1-byte read). Or maybe the OS would do that well enough?
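To make the caching idea concrete, here is a minimal sketch in plain C. None of this is Wireshark code: fb_buf, fb_init, fb_get_uint8 and FB_CACHE_SIZE are hypothetical names invented for the illustration, and a real TVB backend would need much more (bounds relative to the TVB, multi-byte accessors, error reporting). It only shows how a single cached block avoids a seek plus a 1-byte read on every access.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define FB_CACHE_SIZE 4096  /* bytes loaded per refill; arbitrary for this sketch */

/* Hypothetical file-backed buffer; not a Wireshark type. */
typedef struct {
    FILE         *fp;           /* backing file, opened by the caller */
    long          cache_start;  /* file offset of cache[0], -1 if nothing cached */
    size_t        cache_len;    /* number of valid bytes in cache */
    unsigned long refills;      /* how many times we had to go to the file */
    uint8_t       cache[FB_CACHE_SIZE];
} fb_buf;

static void fb_init(fb_buf *b, FILE *fp)
{
    b->fp = fp;
    b->cache_start = -1;
    b->cache_len = 0;
    b->refills = 0;
}

/* Fetch the byte at `offset`. Returns 0 on success, -1 on error or EOF. */
static int fb_get_uint8(fb_buf *b, long offset, uint8_t *out)
{
    assert(offset >= 0);

    if (b->cache_start < 0 ||
        offset < b->cache_start ||
        offset >= b->cache_start + (long)b->cache_len) {
        /* Cache miss: load the aligned block that contains `offset`. */
        long start = offset - (offset % FB_CACHE_SIZE);

        if (fseek(b->fp, start, SEEK_SET) != 0)
            return -1;
        b->cache_len = fread(b->cache, 1, FB_CACHE_SIZE, b->fp);
        b->cache_start = start;
        b->refills++;

        if (offset >= start + (long)b->cache_len)
            return -1;  /* offset lies beyond the end of the file */
    }

    *out = b->cache[offset - b->cache_start];
    return 0;
}
```

With this layout, reading a buffer sequentially one byte at a time costs one seek and one read per 4096 bytes rather than per byte. Whether that beats simply relying on the stdio and OS buffering is something that would have to be measured.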
[1] tvb_get_ptr() notwithstanding. OK, that is a tvb_ API but it allows
you direct access to the TVB data. Using this API with a file-backed
TVB would require allocating memory and copying it in from disk to
return to the user. BTW, given the big comment about this function in
tvbuff.h, I was surprised to find almost 1300 uses in epan/dissectors/ ...
People complain about it enough that, while in *most* cases it might
not be a problem, we frequently get mail from people who have to split
up capture files to read them - I'd call it enough of a problem that
we should work on it (ideally, by reducing the amount of address space
required by the aforementioned data items).
Yes, absolutely.
It would still be nice if it were possible for people to analyse more data
than will fit in virtual memory (in the case of Linux/Solaris, etc. where the
swap space is fixed). I see that there is an "abstraction" of memory
allocation in epan/emem.c (se_alloc* and friends), but g_malloc and plain
malloc are used as well, it seems.
If the functions in emem.c were used for all memory allocation/freeing, that
would mean that this could be done by intercepting requests for memory in
those functions.
You mean by sending them to memory-mapped files? Unless, as Guy pointed
out, there's some way to tell the OS to swap out that memory before
normal memory, I think that once you start swapping the UI is (still)
going to become unusable.
What is the status on the use of these functions? I got the impression from
README.malloc that these are recommended, but I mostly see allocations done
using g_malloc. Or are those just allocations that should outlive a capture
session?
Yes, those functions "should" normally be used. But there are good
reasons not to: for example if we know we're allocating a bunch of
memory and we'll free it after the current frame is dissected (so we
can't use ep_ memory) but before the file is closed (so using se_ memory
would mean the allocation sticks around longer than it needs to). The
reassembly code uses g_malloc() (presumably) for this reason.
Another reason, of course, is that the ep_ and se_ allocators are
(relatively) new.
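
To illustrate the lifetime argument above (per-frame, per-file, and "somewhere in between"), here is a rough sketch of a scoped pool in plain C. This is NOT how emem.c is implemented; pool, pool_alloc and pool_free_all are hypothetical names made up for the example. The point is only that a pool releases everything at one fixed moment, so memory whose lifetime matches neither moment ends up being managed by hand with malloc()/free() (or g_malloc()/g_free()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation. The max_align_t member
 * (C11) keeps the user data that follows it suitably aligned. */
typedef union pool_chunk {
    union pool_chunk *next;
    max_align_t       align;
} pool_chunk;

/* Hypothetical scoped pool; not the emem.c implementation. */
typedef struct {
    pool_chunk *head;  /* singly linked list of live allocations */
    size_t      live;  /* number of allocations not yet released */
} pool;

/* Allocate `size` bytes that stay valid until pool_free_all(p). */
static void *pool_alloc(pool *p, size_t size)
{
    pool_chunk *c;

    assert(p != NULL);
    c = malloc(sizeof *c + size);
    if (c == NULL)
        return NULL;
    c->next = p->head;
    p->head = c;
    p->live++;
    return c + 1;  /* user data starts right after the header */
}

/* Release every allocation made from the pool in one go. */
static void pool_free_all(pool *p)
{
    while (p->head != NULL) {
        pool_chunk *next = p->head->next;
        free(p->head);
        p->head = next;
    }
    p->live = 0;
}
```

In this picture a "packet" pool would be emptied after each frame is dissected and a "session" pool when the file is closed. A reassembly buffer that must survive the current frame but could be dropped well before the file is closed fits neither pool, which is the case described above.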