Hi,
On Thursday 08 October 2009 at 22:15 +0200, Erlend Hamberg wrote:
> Sorry about the late reply. I am one of the other students in the group.
> Thanks for your answers. I have commented below and would appreciate further
> feedback.
>
> On Monday 5. October 2009 20.23.42 Guy Harris wrote:
> > The paper says
> >
> >     Since exhausting the available primary memory is the problem ...
> >
> > What does "primary memory" refer to here?
>
> That could certainly have been worded more clearly. By primary memory, we mean
> main memory, as your reasoning led you to.
>
> The "problem", as we have understood it, and as we have seen it to be, is that
> Wireshark keeps its internal representation (from reading a capture file) in
> memory. I write "problem" in quotes, because in most use cases I guess that
> this is not a problem at all, and this is also how almost any program
> operates.
>
> We work for an external customer who uses Wireshark and would like to be able
> to analyze more data than is allowed by a machine's virtual memory without
> having to split up the captured data.
>
> To be able to do this we looked at the two solutions mentioned in the PDF
> Håvar sent, namely using a database and using memory-mapped files. Our main
> focus is 64-bit machines due to 64-bit OS-es' liberal limits on a process'
> memory space. Doing memory management ourselves, juggling what is mapped in
> the 2 GiB memory space at any time, is considered out of the scope of this
> project. (We are going to work on this until mid-November.)
>
> [...]
>
> > In effect, using memory-mapped files allows the application to extend
> > the available backing store beyond what's pre-allocated (note that OS
> > X and Windows NT - "NT" as generic for all NT-based versions of
> > Windows - both use files, rather than a fixed set of separate
> > partitions, as backing store, and I think both will grow existing swap
> > files or add new swap files as necessary; I know OS X does that),
> > making more virtual memory available.
>
> So, on OS X (and possibly other modern OS-es), as long as you have available
> harddisk space, a process will not run out of memory, ever? (A process can
> have address space of ~18 exabytes on 64-bit OS X. [1])
>
> This would mean that this problem would only continue to exist on operating
> systems using a fixed swap space, like most (all?) Linux distros still do.
Linux can use swap files too; it just doesn't allocate them on demand.

I don't see what you would gain from mmapped files compared with having
enough swap. Either way, if you are using Wireshark, i.e. working
interactively, it would be slow, slow as in unusable.

Using a DB could be a better option, but you need a 'data silo',
something like MonetDB (http://www.monetdb.nl). For it, a sparse matrix
of 100 million rows by 200,000 columns should be a trivial data set. It
would be faster than Wireshark for filtering by an order of magnitude or
two.
Disclaimer: we're using a proprietary data silo and I have no experience
with MonetDB.

A modified Tshark should be able to upload a capture at around 30,000
packets/second.
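
To put that rate in perspective, here is a rough back-of-the-envelope
sketch. It assumes one database row per packet and a sustained rate of
30,000 packets/second; the function name and the 100-million-packet
capture size are only illustrative, not measurements.

```python
def upload_seconds(packets, rate_pps=30_000):
    """Estimated time to load `packets` packets at `rate_pps` packets/second.

    Assumes the rate is sustained for the whole capture, which is an
    idealisation.
    """
    if rate_pps <= 0:
        raise ValueError("rate_pps must be positive")
    return packets / rate_pps


if __name__ == "__main__":
    # Hypothetical capture of 100 million packets (one row per packet).
    seconds = upload_seconds(100_000_000)
    print(f"{seconds:.0f} s, i.e. about {seconds / 60:.0f} minutes")
```

So a capture of that size would take on the order of an hour to load
before any filtering can start.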
No idea what would be better for the interactive front end: a modified
Wireshark or a new application.
No idea whether you have enough time to do it, either.

For example, here we are using a modified Wireshark.
It is able to filter simple expressions at around 5-10 million
packets/second, and complex expressions at 50,000 to 400,000
packets/second.
But we never use Wireshark if it needs to hit the hard disks (for us
roughly 3 times the file size); it's too slow.
If we had to use bigger files I would use MonetDB. I don't know whether
using Wireshark on such a big data set would be useful, though; at some
point more data is just noise.

Note:
A simple expression is a filter expression made up only of protocols or
previous expressions. For example:
    llc && !arp
is a simple expression;
    tcp.stream == 0
is not, but after that
    afp && !(tcp.stream == 0)
is one.
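
The rule in the note can be sketched roughly as follows. This is only an
illustration of the definition, not the code from our modified Wireshark:
the protocol list, the `is_simple` name and the `previous` parameter are
made up for the example, and the check only looks at the vocabulary of
the expression, not at whether it is a well-formed filter.

```python
import re

# Hypothetical, deliberately tiny protocol list for illustration only.
KNOWN_PROTOCOLS = {"llc", "arp", "afp", "tcp", "udp", "ip"}
OPERATORS = {"&&", "||", "!", "(", ")"}
PLACEHOLDER = "\x00"  # stands in for an already-evaluated expression

# One token: a boolean operator, a parenthesis, the placeholder, or a bare name.
TOKEN = re.compile(r"\s*(&&|\|\||!|\(|\)|\x00|[A-Za-z_][A-Za-z0-9_-]*)")


def normalize(expr):
    """Collapse runs of whitespace so textual matching is stable."""
    return " ".join(expr.split())


def is_simple(expr, previous=()):
    """True if expr uses only protocols, previous expressions and boolean operators."""
    text = normalize(expr)
    if not text:
        return False
    # Replace previously evaluated expressions, longest first.
    for prev in sorted((normalize(p) for p in previous), key=len, reverse=True):
        if prev:
            text = text.replace(prev, PLACEHOLDER)
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            return False  # a field access, comparison, literal, etc.
        token = match.group(1)
        if (token not in OPERATORS and token != PLACEHOLDER
                and token not in KNOWN_PROTOCOLS):
            return False  # a bare name that is not a known protocol
        pos = match.end()
    return True


if __name__ == "__main__":
    print(is_simple("llc && !arp"))
    print(is_simple("tcp.stream == 0"))
    print(is_simple("afp && !(tcp.stream == 0)", previous=["tcp.stream == 0"]))
```

With the three filters from the note, the first and third come out as
simple and the second does not; the third is simple only because
`tcp.stream == 0` was passed in as a previous expression.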
Didier