Wireshark-dev: Re: [Wireshark-dev] Wireshark memory handling
From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Fri, 9 Oct 2009 19:08:08 -0700

On Oct 9, 2009, at 7:43 AM, Jeff Morriss wrote:

One advantage of using memory mapped files instead of swap is that if
your OS is swapping, *everything* is slow.  If only Wireshark is, er,
swapping, only Wireshark is slow.

That depends on the OS's policies for managing main memory - and on any policy hints given to the OS by the application. If, for example, the OS uses the same policy when searching for a page frame to satisfy a page fault, regardless of whether the page is backed by a mapped file or by swap space (an "anonymous" page), the only advantages to memory mapping would be:

1) if the file is mapped into multiple processes' address spaces (and either read-only or not copy-on-write), those processes can share a single page frame for a page from the file - but that's not the case here, as I understand it;

2) if the data in anonymous pages is a copy of data from a file, memory-mapping the file even in only one process means that you don't even temporarily have two copies of the data in memory.
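To make point 2 and the "policy hints" remark concrete, here is a minimal sketch in Python (not Wireshark code). It maps a file read-only, passes the OS a sequential-access hint where the platform offers one, and walks the mapping through a memoryview so that no second copy of the file data is made in the process. The function name `checksum_mapped` and the chunk size are hypothetical, chosen only for illustration, and the OS is free to ignore the hint.

```python
import mmap
import os
import tempfile


def checksum_mapped(path, chunk=1 << 16):
    """Sum the bytes of a file by walking a read-only mapping of it."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Policy hint: we intend to touch the pages front to back.
            # madvise() exists only on some platforms / Python versions,
            # so treat it as optional (assumption: the hint is advisory).
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)
            total = 0
            # Slicing a memoryview does not copy; the bytes are read
            # straight from the file-backed pages.
            with memoryview(m) as view:
                for off in range(0, size, chunk):
                    total += sum(view[off:off + chunk])
            return total


if __name__ == "__main__":
    # Small self-check on a throwaway file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 100)
    try:
        print(checksum_mapped(tmp.name))
    finally:
        os.unlink(tmp.name)
```

Whether this behaves any differently from reading the file into anonymous memory depends, as noted above, on the OS's page replacement policy.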

Using memory mapped files would probably help quite a bit with keeping
the UI responsive because only Wireshark's, for example, packet data
would be on disk but the executable pages and "core" memory like the
statistics could be kept in RAM (or at least whatever the OS gives us).

As per my mail to Erlend, the frame data isn't kept in Wireshark's address space, although reassembled data is (and frame_data structures are, and some or all column text is).

However, if Wireshark reads a large capture file, on many OSes the blocks of the file will be brought into the page pool. On many OSes the "buffer cache" is implemented atop the page pool, so pages being read in with read()/ReadFile() compete for memory with pages faulted in. It may even be that a read is done by mapping the region of the file being read into the kernel's address space and copying from that region into the userland buffer, so that the actual file system reads are done in response to page faults. *Hopefully* the OS will recognize the access as sequential and, at least, not completely blow the page cache if the file is big enough. (Although, if you have enough memory that you *don't* blow the page cache, you might as well keep the pages in memory; the menagerie of capture files I use for Wireshark/tcpdump regression testing of some changes fits entirely in main memory on my machine, so if I run the tests twice in a row, the disk hardly does anything.)
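Rather than only hoping that the OS detects sequential access, an application using read() can say so explicitly on systems that provide posix_fadvise(). The following is a hedged sketch in Python, not a description of what Wireshark does; `read_sequentially` is a hypothetical helper, and the call is skipped on platforms that lack it (it is not available on Windows or macOS, for instance). The hint is advisory only.

```python
import os


def read_sequentially(path, chunk_size=1 << 20):
    """Read a file front to back with read(); return the byte count."""
    # O_BINARY only exists on Windows; fall back to 0 elsewhere.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    total = 0
    try:
        if hasattr(os, "posix_fadvise"):
            # offset 0, length 0 means "the whole file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            buf = os.read(fd, chunk_size)
            if not buf:
                break  # end of file
            total += len(buf)
    finally:
        os.close(fd)
    return total
```

Note that each os.read() still copies data from the page cache into a userland buffer, so the pages compete for memory exactly as described above; the hint can at most influence read-ahead and which pages the OS chooses to evict.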