Ethereal-dev: Re: [Ethereal-dev] [HTTP]Desegmentation/Reassembly of HTTP headers/bodies
On Oct 8, 2003, at 3:40 AM, Loïc Minier wrote:
First, I preferred using tvb_find_guint8/tvb_strneql functions
instead of tvb_find_line_end because I guessed they could be trouble
with malformed headers. The original code is commented in C++ style.
One problem with "tvb_find_guint8()", at least as you're using it, is
that it assumes that lines end with CR-LF. Perhaps they *should*, but
that doesn't mean that they necessarily *will*.
"tvb_find_line_end()" doesn't care whether the line ends with CR, LF,
CR-LF, or LF-CR.
How would "tvb_find_line_end()" have more problems with malformed
headers than "tvb_find_guint8()"?
Second, I wonder if I should do the "Content-Length:" research inside
the normal loop of dissect_http for adding header. I am not sure
it would be cleaner, and I would not detect the end of headers -
"\r\n\r\n" - easily, it was too complex for my first try.
If the intent is to collect all the headers, and the content, in a
single reassembled frame, you definitely need a loop that runs before
the columns are set or the protocol tree is constructed, and you'd have
to handle Content-Length: in that loop, not the loop that puts the
headers into the protocol tree.
An alternative would be to have a state variable for the conversation,
indicating whether we're processing the request/reply line, the
headers, or the body, along with another state variable giving the
content length, and just do enough reassembly to reassemble a single
header line.
Third, the code miss a content-type filter I do not plan to write, is
it still acceptable when the options are switched off by default?
Perhaps, although it does run the risk of reassembling a really large
amount of content even if the user doesn't care. We can always add a
way to specify what content types should be reassembled if that's a
problem.
Possibly fourth, I read in RFC2616 the Content-Length isn't always
present, but should be for backward compatibility with HTTP 1.0.
If you're referring to section 4.4 "Message Length", then, if
Content-Length is missing, either
1) the message is one that's not allowed to have a message-body, in
which case Ethereal shouldn't even try to reassemble the message body;
2) the message has a Transfer-Encoding field other than
Transfer-Encoding: identity, in which case Ethereal would have to
handle chunked encoding, which is probably something that would be
worth doing eventually, but it's probably not something that needs to
be done now;
3) the message has a media type "multipart/byteranges", in which case
Ethereal would have to handle it, which, again, is probably something
Ethereal should do eventually but that it doesn't need to do now;
4) the message is a reply from the server and the connection is closed
at the end.
Ethereal should, as noted, not even try to process a message-body for
response messages that "MUST NOT" include a message body (although to
tell whether something is a response to a HEAD request we'd have to see
the request, so that might be difficult to handle...). In the case of
non-identity transfer encodings, or multipart/byteranges, it should
probably not reassemble traffic *or* hand it to a subdissector (as it's
not raw data). Otherwise, it could probably assume that the transfer
finishes when the connection is closed, although there's *currently* no
way for TCP to send a "connection closed" indication to the
subdissector.