On Oct 5, 2009, at 8:01 AM, Håvar Aambø Fosstveit wrote:
We are a student group from the university of Science and Engineering in
Norway, and are having a project on handling large data sets and
specifically Wireshark's issues with it. I have included part of our
prestudy into the problem as an attachment, and we are wondering if
anybody has some immediate thoughts regarding our plans for a solution.
The paper says
Since exhausting the available primary memory is the problem ...
What does "primary memory" refer to here?
It later says
An alternative for getting more memory than the machine's RAM is to use
memory-mapped files.
so presumably "primary memory" is referring to main memory, not to the
sum of main memory and available backing store ("swap space"/paging
files/swap files/whatever the OS calls it, plus the files that are
mapped into the address space).
Presumably by "more memory than the machine's RAM" you mean "more
memory than the machine's RAM plus the machine's swap space" - all the
OSes on which Wireshark runs do demand paging, so Wireshark can use
more memory than the machine has (unless the OS requires every page in
RAM to have a swap-space page assigned to it, in which case it can use
max(available main memory, available swap space)).
In effect, using memory-mapped files allows the application to extend
the available backing store beyond what's pre-allocated (note that OS
X and Windows NT - "NT" as generic for all NT-based versions of
Windows - both use files, rather than a fixed set of separate
partitions, as backing store, and I think both will grow existing swap
files or add new swap files as necessary; I know OS X does that),
making more virtual memory available.
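To make the point about file-backed mappings concrete, here is a minimal POSIX sketch, not anything taken from Wireshark or from the paper. The helper names (map_file_backed, unmap_file_backed) and the /tmp location are made up for illustration. Pages of the returned region are paged out to the temporary file rather than to swap space, so the region does not count against the pre-allocated backing store.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `size` bytes backed by a fresh temporary file
 * instead of by swap space. Returns NULL on failure. */
static void *map_file_backed(size_t size)
{
    char path[] = "/tmp/bigcap-XXXXXX";  /* illustrative location only */
    void *p;
    int fd;

    if (size == 0)
        return NULL;
    fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);                        /* file goes away once unmapped */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    /* MAP_SHARED makes the file, not swap, the backing store. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                           /* the mapping keeps the file alive */
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: release a region obtained from map_file_backed(). */
static void unmap_file_backed(void *p, size_t size)
{
    int rc = munmap(p, size);
    assert(rc == 0);
    (void)rc;
}
```

On Windows the equivalent would presumably be built on CreateFileMapping() and MapViewOfFile() rather than mmap(); that variant is not shown here.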
The right long-term fix for a lot of this problem is to figure out how
to make Wireshark use less memory; we have some projects we're working
on to do that, and there are some additional things that can be done
if we support fast random access to all capture files (including
gzipped capture files, so that involves some work). However, your
scheme would provide a quicker solution for large captures that
exhaust the available main memory and swap space, as long as you can
intercept all the main allocators of main memory (the allocators in
epan/emem.c can be intercepted fairly easily; the allocator used by
GLib might be harder, but it still might be possible).
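As a rough illustration of what "intercepting an allocator" could look like, here is a sketch in which every allocation goes through one function pointer, so the backing store can be swapped without touching callers. None of these names (app_alloc, install_allocator, arena_init, arena_alloc) come from epan/emem.c or GLib; they are hypothetical, and the arena is a simple bump allocator with no free operation. The region handed to arena_init() is assumed to be 16-byte aligned and could be a memory-mapped file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook type: callers never call malloc() directly. */
typedef void *(*alloc_fn)(size_t size);

static void *default_alloc(size_t size)
{
    return malloc(size);
}

static alloc_fn current_alloc = default_alloc;

/* Hypothetical bump arena over a caller-supplied region. */
static unsigned char *arena_base;
static size_t arena_size;
static size_t arena_used;

static void arena_init(void *region, size_t size)
{
    assert(region != NULL);
    arena_base = region;
    arena_size = size;
    arena_used = 0;
}

static void *arena_alloc(size_t size)
{
    /* Round up to 16 bytes so successive blocks stay aligned. */
    size_t aligned = (size + 15) & ~(size_t)15;
    void *p;

    if (aligned < size || aligned > arena_size - arena_used)
        return NULL;                     /* overflow or arena exhausted */
    p = arena_base + arena_used;
    arena_used += aligned;
    return p;
}

/* The single entry point the rest of the program would use. */
static void *app_alloc(size_t size)
{
    return current_alloc(size);
}

/* Swap the backing allocator; NULL restores the malloc-based default. */
static void install_allocator(alloc_fn fn)
{
    current_alloc = fn ? fn : default_alloc;
}
```

The hard part in practice is not the hook itself but making sure that every significant allocation path actually goes through it, which is why the GLib allocator is the open question.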