Ethereal-dev: Re: [Ethereal-dev] Reassembling TCP segments

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <gharris@xxxxxxxxx>
Date: Thu, 01 Dec 2005 12:16:11 -0800
DRZOIDBERG@xxxxxxxx wrote:

Actually, I'm writing an application that tries to retrieve all the HTTP packets carried in TCP packets. These TCP packets are often fragmented into several segments that together make up the whole TCP packet. I don't think I can use a filter to obtain the HTTP packets directly, so I have to reassemble the TCP segments myself. First of all, I've noticed that TCP segments should be reassembled using their sequence numbers, which link the segments together. I also know how this kind of fragmentation works: the next expected sequence number is obtained by adding the segment's payload length to its sequence number. Now, I have some doubts:
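The reassembly rule described in the question (next expected sequence number = sequence number + payload length) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not how Ethereal does it: the function name `reassemble` and its input format (a list of `(seq, payload)` tuples for one direction of one connection, plus the sequence number of the first data byte) are hypothetical, and the sketch simply stops at the first missing segment instead of handling gaps.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def reassemble(first_seq, segments):
    """Hypothetical sketch: rebuild one direction of a TCP byte stream.

    first_seq -- sequence number of the first data byte (the SYN's seq + 1)
    segments  -- iterable of (seq, payload) tuples in any order; may contain
                 retransmissions and overlapping data
    Returns the contiguous bytes starting at first_seq, up to the first gap.
    """
    # Turn absolute sequence numbers into offsets from the start of the
    # stream; the modulo keeps this correct across a wrap past 0xFFFFFFFF.
    pieces = sorted(((seq - first_seq) % SEQ_MOD, payload)
                    for seq, payload in segments)
    stream = bytearray()
    for offset, payload in pieces:
        expected = len(stream)        # offset of the next byte we still need
        if offset > expected:
            break                     # gap: a segment is missing (assumption: give up)
        end = offset + len(payload)
        if end > expected:            # keep only the bytes we don't have yet
            stream += payload[expected - offset:]
    return bytes(stream)
```

For example, `reassemble(1000, [(1003, b"lo, "), (1000, b"Hel"), (1007, b"world"), (1003, b"lo, ")])` yields `b"Hello, world"`: the out-of-order segments are sorted and the retransmitted one is ignored. Note that the result is just a byte stream; nothing in it marks where an HTTP message begins or ends.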

* I don't know how to tell which TCP segment is the first one in the chain of segments that forms a packet.

For HTTP:

* the very first data segments, in each direction, of the entire TCP connection are the first segments of an HTTP request or response;

* you then process the request or response according to the HTTP/1.1 specification until you get to its end (the specification says how the end is found, e.g. from a Content-Length header or from the chunked transfer-coding);

* the next byte in the flow is the first byte of the next request or response.
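The steps above can be sketched roughly as follows. The helpers `message_length` and `split_messages` are hypothetical names, and the sketch only handles bodies delimited by Content-Length or by the chunked transfer-coding. A message with neither is assumed to have no body, which holds for requests but not for every response (responses to HEAD, 1xx/204/304 responses, and responses whose body ends when the connection closes are not handled).

```python
def message_length(buf):
    """Hypothetical sketch: length of the first complete HTTP/1.1 message
    in buf, or None if buf does not contain all of it yet."""
    head_end = buf.find(b"\r\n\r\n")
    if head_end < 0:
        return None                          # header block incomplete
    body_start = head_end + 4
    headers = {}
    for line in buf[:head_end].split(b"\r\n")[1:]:   # skip the start line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()

    if b"chunked" in headers.get(b"transfer-encoding", b"").lower():
        pos = body_start
        while True:
            line_end = buf.find(b"\r\n", pos)
            if line_end < 0:
                return None                  # chunk-size line incomplete
            # chunk-size is hexadecimal, optionally followed by ";extensions"
            size = int(buf[pos:line_end].split(b";")[0], 16)
            pos = line_end + 2
            if size == 0:
                # last chunk: optional trailer fields, then an empty line
                end = buf.find(b"\r\n\r\n", pos - 2)
                return None if end < 0 else end + 4
            if len(buf) < pos + size + 2:
                return None                  # chunk data (plus CRLF) incomplete
            pos += size + 2

    # Assumption: no Content-Length means no body (valid for requests).
    total = body_start + int(headers.get(b"content-length", b"0"))
    return total if len(buf) >= total else None


def split_messages(stream):
    """Split a reassembled byte stream into HTTP messages; the byte after
    one message is the first byte of the next. Returns (messages, leftover)."""
    messages = []
    while stream:
        n = message_length(stream)
        if n is None:
            break                            # rest of the stream not here yet
        messages.append(stream[:n])
        stream = stream[n:]
    return messages, stream
```

Because `split_messages` returns the unparsed leftover, a caller can prepend it to the next chunk of reassembled data and try again; the message boundaries come entirely from the HTTP headers, never from the TCP segments.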

Note that I said "next byte in the flow"; there is *NO* guarantee that an HTTP request or response begins at the beginning of a TCP segment. TCP doesn't supply any notion of "packets" to the protocols running atop it; it supplies a sequenced byte stream, and it's entirely the responsibility of the protocol running atop TCP to divide that stream into packets.

I.e., there's no such thing as a "TCP packet"; there are only TCP segments.
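To make this concrete, here is a small hypothetical helper (`locate`) that, given the payload lengths of consecutive segments and the lengths of the HTTP messages carried in that byte stream, reports the segment in which each message starts and the offset within that segment. The lengths in the example are made up for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(segment_lengths, message_lengths):
    """Hypothetical sketch: for each message, return (segment index,
    offset within that segment) of its first byte. Both arguments are
    lengths in stream order."""
    # Stream offset at which each segment and each message begins.
    seg_starts = [0] + list(accumulate(segment_lengths))[:-1]
    msg_starts = [0] + list(accumulate(message_lengths))[:-1]
    result = []
    for start in msg_starts:
        i = bisect_right(seg_starts, start) - 1   # last segment starting at or before it
        result.append((i, start - seg_starts[i]))
    return result
```

With three 100-byte segments carrying messages of 150, 120 and 30 bytes, `locate([100, 100, 100], [150, 120, 30])` returns `[(0, 0), (1, 50), (2, 70)]`: only the first message starts at a segment boundary.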

* I don't know how to decide whether a TCP segment is the last one in a sequence of segments. I think an acknowledgement for the last segment could be used for this purpose, but if I haven't got this acknowledgement, I don't know when the TCP packet finishes.

As I said, there's no such thing as a "TCP packet"; you can't tell, purely from the information in a TCP header, where HTTP requests start or end. Acknowledgements are *NOT* used to indicate whether a TCP segment is the start or end of a packet for a protocol running atop TCP; they're used *solely* by TCP to indicate that it has received a segment.

In fact, there's no guarantee that an HTTP request starts at the beginning of a TCP segment or ends at the end of a TCP segment.
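As a rough illustration of what an acknowledgement does carry, the sketch below computes the cumulative acknowledgement number a receiver would send: the sequence number of the next byte it expects. It is a simplification with a hypothetical name (`cumulative_ack`); it ignores SYN and FIN, each of which also consumes a sequence number, and selective acknowledgements.

```python
SEQ_MOD = 2 ** 32  # 32-bit sequence space


def cumulative_ack(first_seq, segments):
    """Hypothetical sketch: acknowledgement number after receiving segments.

    first_seq -- sequence number of the first data byte
    segments  -- (seq, payload_length) pairs, in any order
    Ignores SYN/FIN sequence-space consumption and SACK.
    """
    next_offset = 0
    for offset, length in sorted(((seq - first_seq) % SEQ_MOD, n)
                                 for seq, n in segments):
        if offset > next_offset:
            break                   # gap: ACK stays at the first missing byte
        next_offset = max(next_offset, offset + length)
    return (first_seq + next_offset) % SEQ_MOD
```

For example, if a 350-byte request starting at sequence number 1000 arrives in segments of 300 and 200 bytes, `cumulative_ack(1000, [(1000, 300), (1300, 200)])` is 1500, even though the request ends at 1350; the acknowledgement says nothing about where the request ends.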

You will have to process the HTTP requests or responses yourself to see where they end.