Ethereal-dev: Re: [ethereal-dev] Keeping state for SMB decodes

From: Guy Harris <gharris@xxxxxxxxxxxx>
Date: Sun, 17 Oct 1999 01:16:58 -0700

> The state keeping code can be used for other things as well ...

It currently assumes all addresses are IPv4 addresses.

I'd been looking at a way of generalizing "pi.ip_src" and "pi.ip_dst"
for other address types for other reasons.  I have some code that
defines:

	an "enum" giving the types of link-layer and network-layer
	addresses the code can handle (currently, Ethernet, IPv4, IPv6,
	and IPX);

	an "address" structure containing an address type, an address
	length, and a pointer to (the first element of) an array of that
	many bytes, containing the data for the address;

and that replaces "ip_src" and "ip_dst" in "packet_info" with:

	address dl_src;  /* link-layer source address */
	address dl_dst;  /* link-layer destination address */
	address net_src; /* network-layer source address */
	address net_dst; /* network-layer destination address */
	address src;     /* source address (net if present, DL otherwise) */
	address dst;     /* destination address (net if present, DL otherwise) */

The dissectors set those.  Link-layer dissectors set "dl_src"/"dl_dst"
and "src"/"dst" - the latter are set to the same thing the former are
set to.  Network-layer dissectors do the same for "net_src"/"net_dst"
and "src"/"dst".
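
As a rough sketch of the above (not the actual patch): every identifier
here, including the "AT_" enum values, "packet_info_sketch", and the
helper functions, is a name made up for illustration.  It shows the
link-layer dissector filling in "dl_*" and "src"/"dst", and the
network-layer dissector later overwriting "src"/"dst".

```c
#include <stddef.h>

/* All identifiers below are illustrative; the message does not name them. */
typedef enum { AT_NONE, AT_ETHER, AT_IPV4, AT_IPV6, AT_IPX } address_type;

typedef struct {
    address_type         type;  /* which kind of address this is */
    int                  len;   /* number of bytes pointed to by data */
    const unsigned char *data;  /* address bytes; not owned by this struct */
} address;

typedef struct {
    address dl_src, dl_dst;    /* link-layer addresses */
    address net_src, net_dst;  /* network-layer addresses */
    address src, dst;          /* net if present, DL otherwise */
} packet_info_sketch;

void set_address(address *a, address_type type, int len,
                 const unsigned char *data)
{
    a->type = type;
    a->len  = len;
    a->data = data;
}

/* Link-layer dissector: fill in dl_* and mirror them into src/dst. */
void set_link_addresses(packet_info_sketch *pi,
                        const unsigned char *eth_dst,
                        const unsigned char *eth_src)
{
    set_address(&pi->dl_src, AT_ETHER, 6, eth_src);
    set_address(&pi->dl_dst, AT_ETHER, 6, eth_dst);
    pi->src = pi->dl_src;
    pi->dst = pi->dl_dst;
}

/* Network-layer dissector: fill in net_* and overwrite src/dst. */
void set_ipv4_addresses(packet_info_sketch *pi,
                        const unsigned char *ip_src,
                        const unsigned char *ip_dst)
{
    set_address(&pi->net_src, AT_IPV4, 4, ip_src);
    set_address(&pi->net_dst, AT_IPV4, 4, ip_dst);
    pi->src = pi->net_src;
    pi->dst = pi->net_dst;
}
```

Because the network-layer dissector runs after the link-layer one,
"src"/"dst" end up holding the network-layer address whenever one
exists, while "dl_src"/"dl_dst" keep the link-layer values.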

The dissectors no longer set the columns for source and destination
addresses; the "fill_in_columns()" routine in "file.c" does that based
on the values of "dl_src", "net_src", and "src", and the "dst"
equivalents.  (Columns that are just "addresses" rather than "hardware
addresses" or "net addresses" are set from "src" and "dst" - those will
be the link-layer address if nobody set the network-layer address, and
will be the network-layer address if somebody did set the network-layer
address.)

This may slightly speed up the reading in of capture files, as, if there
isn't any column explicitly set to be a hardware address, we don't
bother generating text for the hardware address if the packet also has a
network-layer address.

My first try at making the SMB code use this replaced the "ip_src" and
"ip_dst" fields of an "smb_request_key" structure with "address"
structures "src" and "dst".   However, as the dissectors set the data
pointers of the "address" structures in "pi" to point either to stuff in
the packet data buffer or to private static buffers (yes, this is a
gross hack, and should be done better), that meant that the address
structures in "smb_request_key" had to contain pointers to
dynamically-allocated copies of the address data.
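
To make the cost concrete, here is a minimal sketch of the kind of deep
copy this requires.  The "address" layout and the names "copy_address"
and "free_address" are assumptions for illustration, not the code from
the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative layout: type, length, and a pointer to the bytes. */
typedef struct {
    int                  type;
    int                  len;
    const unsigned char *data;
} address;

/*
 * Copy *src into *dst, giving dst its own heap-allocated copy of the
 * address bytes so it stays valid after the packet buffer is reused.
 * Returns 0 on success, -1 if the allocation fails.
 */
int copy_address(address *dst, const address *src)
{
    unsigned char *bytes = malloc(src->len > 0 ? (size_t)src->len : 1);

    if (bytes == NULL)
        return -1;
    if (src->len > 0)
        memcpy(bytes, src->data, (size_t)src->len);
    dst->type = src->type;
    dst->len  = src->len;
    dst->data = bytes;
    return 0;
}

/* Release the bytes allocated by copy_address(). */
void free_address(address *a)
{
    free((void *)a->data);
    a->data = NULL;
    a->len  = 0;
}
```

Each key needs two such copies, and every one of them has to be
released with something like "free_address()" when the state is reset.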

This had two problems:

	1) I didn't yet have code to free them all in
	   "smb_init_protocol()";

	2) reading in a really huge capture file with boatloads of SMB
	   packets chewed up a lot of memory.

Given that, and given that I suspect there are several places in
Ethereal where we might want to flag packets as belonging to a
particular "conversation" between two address/port pairs, I added some
routines to manipulate "conversations":

	a routine to throw out all existing conversation data and
	allocate new hash tables, etc. (called before reading in a new
	capture file);

	a routine that takes source and destination addresses and port
	numbers and finds the conversation for those source and
	destination pairs, or allocates a new one if it doesn't find
	one.

The latter routine returns a 32-bit unsigned number which is an
"identifier" for that conversation.  It doesn't know which direction the
packet is going in, so the hash table comparison routine, when comparing
a conversation for A:X <-> B:Y with a packet going between C:Z and D:W,
has to check whether:

	A == C && X == Z && B == D && Y == W

or

	A == D && X == W && B == C && Y == Z
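
A minimal sketch of those two routines and the direction-independent
comparison follows.  It is not the code from the patch: it uses a plain
linked list instead of a hash table to stay self-contained, stores the
address bytes inline in a fixed-size array, and all names
("endpoint_addr", "conversation_init", "find_or_create_conversation")
are invented for the example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ADDR_LEN 16  /* enough for IPv6; an assumption of this sketch */

typedef struct {
    int           type;
    int           len;
    unsigned char bytes[MAX_ADDR_LEN];
} endpoint_addr;

typedef struct conversation {
    struct conversation *next;
    uint32_t             id;
    endpoint_addr        a1, a2;  /* the two endpoints' addresses */
    uint32_t             p1, p2;  /* the two endpoints' ports */
} conversation;

static conversation *conv_list = NULL;
static uint32_t      next_id   = 0;

/* Throw out all conversation data; call before reading a new capture. */
void conversation_init(void)
{
    while (conv_list != NULL) {
        conversation *next = conv_list->next;
        free(conv_list);
        conv_list = next;
    }
    next_id = 0;
}

static int addr_eq(const endpoint_addr *x, const endpoint_addr *y)
{
    return x->type == y->type && x->len == y->len &&
           memcmp(x->bytes, y->bytes, (size_t)x->len) == 0;
}

/* True if the packet matches the conversation in either direction. */
static int conv_matches(const conversation *c,
                        const endpoint_addr *src, uint32_t sport,
                        const endpoint_addr *dst, uint32_t dport)
{
    if (addr_eq(&c->a1, src) && c->p1 == sport &&
        addr_eq(&c->a2, dst) && c->p2 == dport)
        return 1;
    return addr_eq(&c->a1, dst) && c->p1 == dport &&
           addr_eq(&c->a2, src) && c->p2 == sport;
}

/* Return the ID of the matching conversation, creating it if needed. */
uint32_t find_or_create_conversation(const endpoint_addr *src, uint32_t sport,
                                     const endpoint_addr *dst, uint32_t dport)
{
    conversation *c;

    for (c = conv_list; c != NULL; c = c->next) {
        if (conv_matches(c, src, sport, dst, dport))
            return c->id;
    }
    c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->a1 = *src;
    c->p1 = sport;
    c->a2 = *dst;
    c->p2 = dport;
    c->id = next_id++;
    c->next = conv_list;
    conv_list = c;
    return c->id;
}
```

With a real hash table, the hash function has to be symmetric as well
(for example, combining the two endpoints with addition or XOR), so
that both directions of a conversation land in the same bucket before
the comparison routine is ever called.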

I changed the SMB code to first see what conversation the packet
belonged to, and then hash based on the conversation ID and the MID in
the request/response.  We may, however, want to move much of the
manipulation of "conversations" to the transport layer, with the
conversation ID being a field in the "pi" structure, or perhaps even in
the "frame_data" structure; the "Follow TCP Stream" code could use that to
figure out whether a given packet is in the current TCP stream or not
(which would also let it handle TCP-over-IPv6 as well as TCP-over-IPv4,
which it currently doesn't do).  We might even want that for UDP - you
don't have connections in UDP, but you *do* have conversations.
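
For the SMB part, the request key then shrinks to two integers.  A
sketch, assuming a key layout and function names of my own choosing
here (the real hash table callbacks would have whatever signatures the
hash table library expects):

```c
#include <stdint.h>

/* Illustrative key: no addresses, so nothing to allocate or free. */
typedef struct {
    uint32_t conversation;  /* ID returned by the conversation lookup */
    uint16_t mid;           /* SMB Multiplex ID from the request/response */
} smb_request_key;

/* Two keys match if they name the same conversation and the same MID. */
int smb_key_equal(const smb_request_key *a, const smb_request_key *b)
{
    return a->conversation == b->conversation && a->mid == b->mid;
}

/* Any mix of the two fields will do; this one is an arbitrary choice. */
uint32_t smb_key_hash(const smb_request_key *k)
{
    return k->conversation * 31u + k->mid;
}
```

Since the conversation ID is the same for both directions, a response
produces the same key as the request it answers.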

To handle closing a conversation, we'd probably want a call to
"inactivate" a conversation - the structure would stay around, but it
would be removed from the hash table.  The TCP code, for example, would
call this if it saw an initial SYN for a conversation whose last packet
came a significant time ago (2MSL, say); you don't necessarily want to
destroy the conversation on the ACK of the last FIN, because you might
want to associate bogus extra packets with the conversation in case they
were delayed on the network or something.
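
One way the "inactivate" call might look, as a sketch only: the names
are hypothetical, a linked list stands in for the hash table, and the
lookup is by ID rather than by address/port pairs just to keep the
example short.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct conv {
    struct conv *next_active;  /* chain searched by lookups */
    uint32_t     id;
    int          active;       /* nonzero while lookups can find it */
} conv;

static conv *active_list = NULL;

/* Make a conversation visible to lookups. */
void conv_activate(conv *c)
{
    c->active = 1;
    c->next_active = active_list;
    active_list = c;
}

/*
 * Unlink the conversation from the lookup chain without freeing it, so
 * packets already tagged with its ID still refer to a valid structure.
 */
void conv_inactivate(conv *c)
{
    conv **pp;

    for (pp = &active_list; *pp != NULL; pp = &(*pp)->next_active) {
        if (*pp == c) {
            *pp = c->next_active;
            break;
        }
    }
    c->next_active = NULL;
    c->active = 0;
}

/* Stand-in for the real lookup; returns NULL for inactive entries. */
conv *conv_lookup(uint32_t id)
{
    conv *c;

    for (c = active_list; c != NULL; c = c->next_active) {
        if (c->id == id)
            return c;
    }
    return NULL;
}
```

After the call, a later lookup for the same endpoints would miss and
allocate a fresh conversation, which is what you'd want for a reused
port pair.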

Right now, I don't do anything clever with port numbers; however, to
handle, say, SMB over protocols other than TCP, we probably want to
figure out some way of generalizing port numbers.  For example, the LLC
code might use the SSAP and DSAP as port numbers (to handle SMB over
NetBIOS Frame over LLC Type 2); I don't know what would be done for,
say, Novell or OSI protocols.

I'll send a patch, plus the new files to handle conversations, in a
subsequent message.