Wireshark-dev: Re: [Wireshark-dev] Wireshark memory handling
From: didier <dgautheron@xxxxxxxx>
Date: Fri, 09 Oct 2009 03:47:16 +0200
Hi,
On Thursday, 08 October 2009 at 22:15 +0200, Erlend Hamberg wrote:
> Sorry about the late reply. I am one of the other students in the group. 
> Thanks for your answers. I have commented below and would appreciate further 
> feedback.
> 
> On Monday 5. October 2009 20.23.42 Guy Harris wrote:
> > The paper says
> > 
> > 	Since exhausting the available primary memory is the problem ...
> > 
> > What does "primary memory" refer to here?
> 
> That could certainly have been worded more clearly. By primary memory, we mean 
> main memory, as your reasoning led you to conclude.
> 
> The "problem", as we have understood it, and as we have seen it to be, is that 
> Wireshark keeps its internal representation (from reading a capture file) in 
> memory. I write "problem" in quotes, because in most use cases I guess that 
> this is not a problem at all, and this is also how almost any program 
> operates.
> 
> We work for an external customer who uses Wireshark and would like to be able 
> to analyze more data than a machine's virtual memory allows, without having to 
> split up the captured data.
> 
> To be able to do this we looked at the two solutions mentioned in the PDF 
> Håvar sent, namely using a database and using memory-mapped files. Our main 
> focus is 64-bit machines, due to 64-bit OSes' liberal limits on a process's 
> memory space. Doing memory management ourselves, juggling what is mapped into 
> a 2 GiB address space at any time, is considered out of scope for this 
> project. (We are going to work on this until mid-November.)
> 
> [...]
> 
> > In effect, using memory-mapped files allows the application to extend
> > the available backing store beyond what's pre-allocated (note that OS
> > X and Windows NT - "NT" as generic for all NT-based versions of
> > Windows - both use files, rather than a fixed set of separate
> > partitions, as backing store, and I think both will grow existing swap
> > files or add new swap files as necessary; I know OS X does that),
> > making more virtual memory available.
> 
> So, on OS X (and possibly other modern OSes), as long as you have available 
> hard disk space, a process will never run out of memory? (A process can have 
> an address space of ~18 exabytes on 64-bit OS X. [1])
> 
> This would mean that this problem would only continue to exist on operating 
> systems using a fixed swap space, as most (all?) Linux distros still do.
Linux can use swap files too. It doesn't allocate them on demand, that's
all.

I don't see what you would gain with memory-mapped files versus enough swap.
And if you are using Wireshark, i.e. working interactively, it would be slow,
slow as in unusable.
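
To make that concrete: memory-mapping the capture file mostly trades explicit
reads for page faults against the same disk. A minimal sketch of the idea
(Python's mmap module just for illustration, Wireshark itself is C, and the
file name is a placeholder):

    import mmap

    # Map the capture file into the address space instead of reading it
    # into allocated memory.  Pages are faulted in from disk on access and
    # the file itself serves as backing store, so resident memory stays
    # small, but every cold access still costs a disk read.
    with open("capture.pcap", "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            magic = m[:4]          # touching bytes pages them in on demand
            print(magic.hex())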

Using a DB could be a better option, but you need a 'data silo', something
like http://www.monetdb.nl. For such a system, a sparse matrix of 100 million
rows by 200,000 columns should be a trivial data set. It would be faster than
Wireshark at filtering by an order of magnitude or two.
Disclaimer: we are using a proprietary data silo and I have no experience
with MonetDB.

A modified TShark should be able to upload a capture into the database at
around 30,000 packets/second.
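
I don't know what such a modified TShark would do internally; just to show the
general shape of the export, a stock tshark field dump turned into a CSV for
bulk loading (COPY INTO, in MonetDB's case) could look roughly like this. The
field list and file names are made up:

    import csv
    import subprocess

    # One row per packet: let tshark dissect and print selected fields,
    # then write a CSV that a column store can bulk-load.
    fields = ["frame.number", "frame.time_epoch", "ip.src", "ip.dst",
              "tcp.srcport", "tcp.dstport", "frame.len"]
    cmd = ["tshark", "-r", "capture.pcap", "-T", "fields",
           "-E", "separator=/t"]          # /t = tab-separated output
    for f in fields:
        cmd += ["-e", f]

    with open("packets.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(fields)
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for line in proc.stdout:
            writer.writerow(line.rstrip("\n").split("\t"))
        proc.wait()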

No idea what would be better for the interactive front end: a modified
Wireshark or a new application.
No idea if you have enough time to do it either.


For example, here we are using a modified Wireshark.
It can filter simple expressions at around 5-10 million packets/second.

It filters complex expressions at 50,000 to 400,000 packets/second.

But we never use Wireshark if it needs to hit the hard disks (for us, at
roughly 3 times the file size); it's too slow.

If we had to use bigger files I would use MonetDB. I don't know whether using
Wireshark on such a big data set would be useful, though; at some point more
data is just noise.

Note:
A simple expression is a filter expression that uses only protocols or
previously evaluated expressions. For example:
llc && !arp
is a simple expression;
tcp.stream == 0
is not, but once it has been evaluated,
afp && !(tcp.stream == 0)
is one.

Didier