Ethereal-dev: Re: [Ethereal-dev] tcp reassembly design thought

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <guy@xxxxxxxxxx>
Date: Wed, 27 Jun 2001 11:55:30 -0700 (PDT)
> I am more and more convinced that the only sufficiently powerful and generic
> solution would be changing ethereal
> to do stateful packet handling. I.e. remembering packet content and tvbuffs
> between packets.

Why is it a requirement that packet *content* - by which I assume you
mean the raw packet data - be stored in memory persistently?

Random access to non-gzipped files, and compressed Sniffer files (i.e.,
compressed using the Sniffer compression scheme, not gzip), should be
reasonably efficient.

Random access to gzipped files is currently extremely inefficient, but
I think I can make it reasonably efficient.  There are already cases
where the lack of efficiency is a nuisance: consider going to the last
packet in a large gzipped capture and using the up arrow key to scroll
up, so that each frame is dissected as it becomes the current frame
(yes, somebody did this at one point, and noticed that it was slow).
There are also changes I plan to make at some point that will make
efficient random access a requirement.

With that:

	on the sequential first pass through the capture file, frame
	data would be saved in memory only until it's fully dissected -
	it would then be freed (so that, for example, all the fragments
	of a fragmented IP/CLNP datagram would be discarded when
	reassembly completes and the reassembled PDU is dissected);

	however, the dissector doing the reassembly would use
	"p_add_proto_data()" to attach, to each frame that is part of a
	larger packet, enough information to allow the dissector to
	later read the other frames from the capture file and
	reassemble the larger packet as necessary;

	that sort of reassembly would be done on later frames.

Think of it as demand paging in software - which means we can

	1) handle capture files too large to be mapped into memory
	   (which a scheme using "mmap()"/"MapViewOfFile()" couldn't do;
	   I would not be at all surprised to hear that people *do* want
	   to handle files that don't fit into memory, and wouldn't want
	   to hear "go buy a 64-bit machine");

	2) handle compressed Sniffer files or gzipped files.
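
A minimal sketch of the re-read-on-demand idea above, in plain C.  The
names here ("frame_loc", "reasm_info", "reassemble_from_file()") are
hypothetical and are not Ethereal or Wiretap APIs; the sketch assumes an
uncompressed file read through stdio, and that the per-frame information
attached by the reassembling dissector is just a list of file offsets and
lengths for the fragments.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: where one frame's payload bytes live in the capture file. */
typedef struct {
    long   offset;  /* byte offset of the payload within the file */
    size_t length;  /* number of payload bytes to read */
} frame_loc;

/* Hypothetical: what a reassembling dissector might attach to a frame,
 * namely the locations of every fragment of the larger packet, in order. */
typedef struct {
    const frame_loc *frags;
    size_t           nfrags;
} reasm_info;

/* Re-read the fragments from the file and concatenate them.
 * Returns a malloc()ed buffer the caller must free(), or NULL on error.
 * No fragment data is kept in memory between calls. */
unsigned char *reassemble_from_file(FILE *fp, const reasm_info *ri,
                                    size_t *out_len)
{
    size_t total = 0, pos = 0, i;
    unsigned char *buf;

    assert(fp != NULL && ri != NULL && out_len != NULL);

    for (i = 0; i < ri->nfrags; i++)
        total += ri->frags[i].length;

    buf = malloc(total ? total : 1);
    if (buf == NULL)
        return NULL;

    for (i = 0; i < ri->nfrags; i++) {
        const frame_loc *fl = &ri->frags[i];

        /* Seek to the fragment and read exactly its payload. */
        if (fseek(fp, fl->offset, SEEK_SET) != 0 ||
            fread(buf + pos, 1, fl->length, fp) != fl->length) {
            free(buf);
            return NULL;
        }
        pos += fl->length;
    }

    *out_len = total;
    return buf;
}
```

The cost per fragment is one seek and one read, which is why the scheme
depends on random access being reasonably efficient for whatever file
type is in use.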

> This could make stateful operations such as reassembly of (out of order?)
> TCP segments much, much easier; in fact, it would be almost trivial.

Eh?

The bulk of the reassembly work would, I think, be the processing of
the TCP headers, not the data.
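
To illustrate that point, here is a rough sketch of the bookkeeping
using only values taken from the headers (sequence number and payload
length).  "seg_hdr" and "contiguous_bytes()" are made-up names for
illustration, not anything in the TCP dissector, and the sketch assumes
the stream carries less than 4GB of data from the base sequence number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment record built purely from header fields. */
typedef struct {
    uint32_t seq;  /* sequence number from the TCP header */
    uint32_t len;  /* payload length, derived from IP and TCP header lengths */
} seg_hdr;

/* Given segments in arrival order (possibly out of order, retransmitted,
 * or overlapping), return how many payload bytes are available as one
 * contiguous run starting at base_seq.  No payload data is examined. */
uint32_t contiguous_bytes(const seg_hdr *segs, size_t n, uint32_t base_seq)
{
    uint32_t have = 0;   /* bytes known contiguous so far */
    int progressed = 1;

    assert(segs != NULL || n == 0);

    /* Repeat until a full pass over the segments extends nothing. */
    while (progressed) {
        progressed = 0;
        for (size_t i = 0; i < n; i++) {
            /* Offsets relative to base_seq; unsigned subtraction keeps
             * this correct across sequence-number wraparound. */
            uint32_t start = segs[i].seq - base_seq;
            uint32_t end   = start + segs[i].len;

            /* The segment touches or overlaps the run and extends it. */
            if (start <= have && end > have) {
                have = end;
                progressed = 1;
            }
        }
    }
    return have;
}
```

For example, segments with (seq, len) of (1000, 100), (1200, 50) and
(1100, 100), arriving in that order with a base of 1000, give 250
contiguous bytes; drop the third and only 100 are contiguous.  The
payload itself would be needed only when the reassembled data is
actually handed to a subdissector.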

> Also, nice features such as searchable faults like
> tcp.segment.overlap.conflict==TRUE would be extremely easy to implement if
> dissect-tcp() knew of every single previous TCP packet it dissected.

> Would mmap()ing (instead of fread()/fwrite()) the capture file in ethereal be
> an option considered for future versions?

It is not something I would consider, as

	1) memory-mapping gzipped or compressed-Sniffer files wouldn't
	   work the way memory-mapping uncompressed files works - to
	   make us handle those files in that fashion, we'd have to
	   allocate a chunk of memory equal to the total *uncompressed*
	   size of the capture, and read it into memory, which would not
	   only consume address space but also consume swap/pagefile
	   space, *and* would require knowing the size of the
	   uncompressed data beforehand, making an extra pass through
	   the file to determine it, or potentially fragmenting the heck
	   out of memory (by continually expanding the size of the chunk
	   of uncompressed data) and further reducing the maximum size
	   of the capture;

	2) it'd limit capture file size for uncompressed files to what
	   can be stuffed into the address space of a process;

	3) I see no evidence that it is necessary or that it'd be a
	   major improvement, if we can instead make random access to
	   gzipped files efficient (which I have to do in order to, for
	   example, reduce the Ethereal memory footprint by replacing
	   GtkCList with a "virtual clist" widget that calls back to a
	   routine to get the column data - meaning that we'd call back
	   to a routine that dissected the frame when a packet summary
	   line is to be drawn on the screen).
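
The "virtual clist" idea in 3) can be sketched without GTK as a list
that stores no row text and instead asks a callback for each row it
needs to draw.  Everything named here ("virtual_list", "row_text_cb",
"draw_visible()", "demo_summary()") is hypothetical; in Ethereal the
callback would presumably read and dissect the frame, which is where
efficient random access comes in.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical callback: fill buf with the summary text for one row.
 * Returns 0 on success, non-zero on failure. */
typedef int (*row_text_cb)(unsigned row, char *buf, size_t buflen,
                           void *user_data);

/* Hypothetical virtual list: knows how many rows exist, stores none. */
typedef struct {
    unsigned    nrows;
    row_text_cb get_text;
    void       *user_data;
} virtual_list;

/* Draw up to 'count' rows starting at 'first'.  Only rows that are
 * actually drawn cost a callback (i.e. a dissection).  Returns the
 * number of rows drawn. */
unsigned draw_visible(const virtual_list *vl, unsigned first,
                      unsigned count, FILE *out)
{
    char line[128];
    unsigned drawn = 0;

    assert(vl != NULL && vl->get_text != NULL && out != NULL);

    for (unsigned row = first; row < vl->nrows && drawn < count; row++) {
        if (vl->get_text(row, line, sizeof line, vl->user_data) != 0)
            break;
        fprintf(out, "%s\n", line);
        drawn++;
    }
    return drawn;
}

/* Stand-in for "re-read and dissect the frame": formats a placeholder
 * summary and counts how often it was called via user_data. */
int demo_summary(unsigned row, char *buf, size_t buflen, void *user_data)
{
    unsigned *calls = user_data;

    (*calls)++;
    snprintf(buf, buflen, "frame %u: summary generated on demand", row + 1);
    return 0;
}
```

With a list of a million rows and a 25-row window, drawing the window
invokes the callback 25 times, so the memory cost no longer scales with
the number of frames in the capture.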