Ethereal-dev: [Ethereal-dev] RFC: new TVBUFF type
Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.
From: "Ronnie Sahlberg" <rsahlber@xxxxxxxxxxxxxx>
Date: Tue, 22 May 2001 18:10:50 +1000
Hi list, Please give me comments on this idea. I have been thinking a lot over how the IP reassembly is implemented and I really dont like that it requires double backing store in malloc()ed memory. Then I thought, that what IP defragment needs is basically the same thing as what some other interesting features would need. The features I think of are 1: treating a TCP datastream as a "array of bytes". 2: merging parts of various UDP/TCP payloads into larger TVBUFFs which one can dissect/do other interesting stuff with. After deciding on and implementing it usefully I plan to use IP defragmentation as a test/proof of concept. The design I have come up which which would be reasonable generic and (hopefully) useful is: TVBUFF_SPARSE_ARRAY Create a new type of TVBUFF, lets call it TVBUFF_SPARSE_ARRAY. This TVBUFF type can be thought of similar to COMPOSITE, but that one can insert a new TVBUFF segment anywhere not only prepend and append. Also, one can have "holes" in the data which this TVBUFF describes. This type of TVBUFF can be created and destroyed as any other TVBUFF, though none of the accessorts or any other functions in tvbuff.c can operate on this TVBUFF. * tvb_sparse_array_new() * tvb_sparse_array_free() This TVBUFF type is only a placeholder to keep other tvbuffs in until TVBUFF_COMPOSITE can be extracted. Then there will be two functions which only operate on TVBUFF_SPARSE_ARRAY * int tvb_sparse_array_insert(tvb_sparse_array, start, len, tvb, offset) tvb is any ordinary tvb type, not SPARSE_ARRAY. This creates (if nessecary) a TVB_SUBSET of tvb of the data from offset to offset+len and links it into tvb_sparse_array from start to start+len. Linking will be similar to how COMPOSITE works today, ie no copying/duplication of data. If entire tvb_sparse_array[start,start+len] is already mapped (i.e. no holes in this interval) the insertion will fail and return 0. If there is a partial overlap between tvb[offset,offset+len] and tvb_sparse_array[start,start+len], 1+ SUBSETS will be created of tvb[offset,offset+len] to fit the holes in tvb_sparse_array[start,start+len]. To keep track of all REAL_DATA and SUBSET tvbs we have mapped into the SPARSE_ARRAY we keep a "list" of { start, len, tvb, offset } structures in each SPARSE_ARRAY. How this list is sorted and how it is managed I will describe below. I think one can limit the size of start,len,offset to be guint32, ie allow a virtual array of 4Gb for each SPARSE_ARRAY. (not really good, but will do for now) ((e.g. this allows direct mapping of TCP sequence numbers of a packet to start)) One can se this function as populating the "virtual" 4Gb array space of the sparse_array tvb. * tvbuff_t *tvb_sparse_array_extract_tvb(tvb_sparse_array, start, len, flags) This function will scan the tvb_sparse_array from start to start+len and if there are no holes in this interval (i.e. we have all this data mapped from other tvbs) it will create a new COMPOSITE tvb and return this tvb. If there are any data not mapped (a hole) in tvb_sparse_array[start,start+len] the call will fail and return NULL. The flags are for special features, as follow-tcp-stream. 0x0001 Scan back until start of current mapped patch If this flag is set, the mapping will not start from tvb_sparse_array[start] but from as far before start which is possible without passing a hole. 0x0002 Scan forward until end of current mapped patch With this flag set, the returned tvb will not onlky map up to tvb_sparse_array[start+len] but as far beyond start+len as possible without creating a hole. flags==0 will probably be normal usage, but flags ==0x0003 would be useful for Follow-TCP-Stream where one packet in the middle of a conversation is selected and one wants to process as much as possible from the conversation. One must extract a tvb using this function before one can access any data in the sparse_array. One can not extract data or use any accessor on the sparse_array itself. How to structure/sort the patches in the sparse_array. There can potentially be a very large number of REAL_DATA or SUBSETS making up a large SPARSE_ARRAY. I think of NFS over TCP and similar things, ie I can easily imagine a sparse_array consisting of 10s of thousands of SUBSETs. Hash-tables will probably not be very useful. What is there to hash? If one considers start==tcp.sequencenumber or start==ip.fragmentoffset one sees that for these applications, the range of start values will be local in the address domain (e.g. 10Mb contignous over 4Gb address space) also, when extracting data from start to start+len there is no guarantee that start will match the start value of the underlying SUBSET. No one needs to be able to search for which mapped SUBSET will start fall into. Ie if we search for start==27, we want to find the SUBSET that maps 20 to 29 (and extract starting from27). Sequential searches through linked lists will be too slow if the list is big. But double linked lists are fast to scan forward and backwards in. In a double linked list, it is very inexpensive to add a new head or tail. Balanced binary trees are very fast to do a random search in (Logn) but more complicated to do forward/backward scans in (thats what macros are for). Another drawback is that self-balancing trees are very expensive to delete nodes in. It is reasonably easy and cheap to keep a tree balanced after adding new nodes to the tree though. One can also estimate that the most common case will be (always) adding new nodes either to the far left in the tree (start values decreasing as we add new data from new packets, i.e. ip.fragmentoffset decreasing packet by packet (linux sending fragments in the "nonobvious" order)) or to the far right (ever increasing TCP secuence numbers. I think that adding new nodes to the middle of the tree will be a special/unusual case. To manage this datastructure efficiently in the common cases I think the best solution is to implement a a special version of a self-balanced-tree which is optimized for adding leaf-nodes to the far right or far left of the tree. Also, this tree need not support deleting any nodes, either we add nodes or we delete the entire tree. It is very easy to create an very cheap algorithm to add nodes when the common case is adding either nodes to the extreme right or extreme left. (this common case will also easily lead to almost perfectly balanced trees). But sequental scan forward/backward can potentially be an expensive operation in a tree (even a balanced one might need traversing up to 30-35 nodes in the tree of 100.000 elements just to get from one element to the sequentially next one). Sequential scans forward/backward will be used when creating the extracted COMPOSITE scanning from start to start+len. Linked lists are much more efficient in this case. So combine the tree with linked list. When adding new nodes, add the node also to the linked list (always cheap since when linking it to the tree we always know either the element immediately before or after the new one in the list). I have always thought the combination : binary-tree+linked-list is the best data structure there is. Am I crazy? Well, I guess a specialized manager for the binary-tree+linked-list structure can be implemented in a few hundred lines of code. Applications for SPARSE_ARRAY would be: * IP reassembly * TCP streams as array of bytes, extract a COMPOSITE containing all data within an arbitrary subset of any TCP conversation. (if all data exists) * RPCoverTCP Create and dissect the payload of large upper layer protocols, NFS Read/Write over TCP. * NFS files, Merge the COMPOSITES from NFS Read/Writes into a SPARSE_ARRAY and extract a COMPOSITE which spans the entire file. * Allow TCP to link all packets with len>0 into SPARSE_ARRAYs. One for each conversation (and direction). Follow TCP stream would then be extremely easy to reimplement. Follow-TCP-Stream would not need to filter any data, just ask for a COMPOSITE as big as possible (flags==0x0003) from the SPARSE_ARRAY which maps the conversation. * Reconstructing files over FTP would be trivial. * Reconstructing files/transactions over HTTP would be trivial. etc etc
- Follow-Ups:
- Re: [Ethereal-dev] RFC: new TVBUFF type
- From: Guy Harris
- Re: [Ethereal-dev] RFC: new TVBUFF type
- Prev by Date: Re: [Ethereal-dev] patch for SSL detection and Solaris compile
- Next by Date: Re: [Ethereal-dev] Question on applying patches
- Previous by thread: [Ethereal-dev] ethereal (any recent version) and ucd-snmp 4.2
- Next by thread: Re: [Ethereal-dev] RFC: new TVBUFF type
- Index(es):