Ethereal-dev: Re: Patch in reassemble.c (was Re: [Ethereal-dev] Crash in ethereal 0.10.8, some

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Peter Johansson <Peter.Johansson@xxxxxxxxxxxx>
Date: Mon, 25 Jul 2005 22:43:58 +0200
Guy Harris wrote:

Peter Johansson wrote:

I have not forgot about this, I have just been a little bit more busy than usual. I finally tracked down the problem though.
My conclusion is this:
Ethereal crashes in reassemble.c because reassemble.c copies data to a memory area that is not yet allocated (fd_i->flags has the FD_NOT_MALLOCED bit set). I have a solution to this (ensuring that a crash does not occur) which I will post once I have done some cleaning up.

The crash in reassemble.c occurs only as the result of a faulty protocol dissector. In this case it is packet-bittorrent.c that is the reason for the crash. The Bittorrent dissector registers only a heur-dissector (which should be fine). But once the heur test function detects that this TCP stream is in fact Bittorrent data, it creates a conversation, making sure that all future data in the same TCP stream is decoded by the Bittorrent protocol dissector without the use of the heur test function (this too should be fine I guess). The heur test function is capable of telling the calling framework whether the PDU was in fact decoded by this dissector or not by returning TRUE or FALSE packet-bittorrent.c. The function that dissects Bittorrent data based on the fact that it belongs to a conversation does not have the opportunity of telling the calling framework that it in fact cannot decode the supplied PDU if necessary. And this is necessary in the rare event that packet-tcp has marked the current PDU with "[TCP previous segment lost]". In this case some data is missing but the Bittorrent dissector still assumes that the first 4 bytes of the PDU denotes the length of the PDU to be dissected. The problem now is that since data was lost, the length is read using a random offset into the original Bittorrent packet (since some data was lost). My guess is that this could happen for any dissector that is called since the data belongs to a conversation created by the specific dissector when data has been lost.

Should packet-tcp perhaps not call higher level dissector when the PDU is marked with "[TCP previous segment lost]" or at least not perform the try_conversation_dissector(...) call? What would be the better way of ensuring that this does not happen with any of the already existing dissectors? Should perhaps the API at hand for dissectors be changed so that when decoding PDU data, the dissector would be able to return TRUE or FALSE in a similar way to the heur functions? This way, any dissector would be able to tell the lower layer dissector that although it should have handled this PDU, it could not.

What is your opinion?

/ Peter

_______________________________________________
Ethereal-dev mailing list
Ethereal-dev@ethereal.
http://www.ethereal.com/mailman/listinfo/ethereal-dev


Unfortunately:



and

2) I don't understand Swedish so I can't easily tell what the technical discussion above says.

That feature is used by several dissectors, such as the HTTP dissector which, when it's reassembling the entity headers of an HTTP request or response, keeps requesting more data until it sees the blank line at the end of the entity headers, at which point it says the reassembly is complete.

That feature is now broken because:

The "If it was already defragmented and this new fragment goes beyond data limits" loop at the top of "fragment_add_work()" "undoes" the reassembly by pointing fragments that no longer have data, because it was copied to the reassembled chunk and then freed, at the target of the copy in the reassembled chunk, and sets the FD_NOT_MALLOCED flag on those fragments.

The "we have received an entire packet, defragment it and free all fragments" code in "fragment_add_work()" saves the pointer to the old reassembled chunk, allocates a new chunk to hold the reassembled data, and then falls into the "add all data fragments" loop.

The "add all data fragments" loop in "fragment_add_work()" then used to copy *all* the fragments, regardless of whether FD_NOT_MALLOCED was set on the fragment or not, into the newly-allocated chunk. It now copies only the chunks with FD_NOT_MALLOCED set, and reports the others as being "Reassemble error"s.

This means that, in the reassemblies after the first reassembly, some of the data in the reassembled chunk is whatever just happened to be there at the time of the allocation.

The old code *did* work correctly for some captures I have with HTTP traffic in them - FD_NOT_MALLOCED doesn't mean "fd_i->data isn't valid", it means "it's not the address of a mallocated chunk, it's an address *in* a mallocated chunk".

What are the details of the cases where the old code *didn't* work?

It might be that the correct fix is to, in the "we have received an entire packet..." code, set "fd_head->data" to "g_realloc(fd_head->data, max)", which means that the data that was already copied there during previous reassemblies will still be there. However, we *still* need to get rid of the printout of the "Reassemble error" message, because it's bogus to print that message every time we, for example, reassemble HTTP entity headers - which means we should really figure out why we're doing that in cases where it *is* an error, and figure out where to fix that.

"tcp_dissect_pdus()" uses the "continue reassembly" feature - it first tries reassembling the fixed-length portion of the PDU, so that the "get the length" routine has enough of that portion to find out how large the packet is, and then tries reassembling the entire packet, so if the 4-byte header of a presumed BitTorrent packet is split across TCP segments, that code path would be used.

One place where there's *definitely* a risk of problems is a presumed BitTorrent packet where the presumed length field is greater than 2^32-5, so that when 4 is added to it we overflow and get a value *less* than 4. However, going back to at least 0.10.8, if the get_pdu_len routine called by "tcp_dissect_pdus()" returns a value less than the length of the fixed-length portion of the PDU, that's assumed to be an overflow, so it just shows a "Malformed packet" error and quits.

This is a repost since my first attempt (a week ago) did not seem to reach the list.

1) My apologies, my intention was of course not to break the reassembly code...although I don't quite understand what you imply. I thought I had verified that packet reassembly still works after the patch. Am I just not looking at enough layers of dissectors on top of packet-tcp?


Are you looking at HTTP packets with the HTTP entity headers split across TCP segment boundaries and with reassembly of HTTP entity headers enabled?

No, I haven't looked at that. Is there perhaps a sample capture file that can be used for this. I looked (now) at the wiki but could not find one.

If not, you're less likely to see this problem.  As per my message:

>>     1) this completely breaks the feature wherein a TCP dissector,
>> handed a reassembled chunk of data, can indicate that it needs at
>> least N more bytes of data to be added to the reassembled chunk, so
>> that the reassembly has to be continued

reassembly isn't *completely* broken, it's just broken in the case where reassembly is done but the dissector says, when handed the reassembled chunk of data, "sorry, I need even more data" causing reassembly to be restarted.

OK.

2) My end-part of my conversation with Ronnie Sahlberg translated from swedish is attached in the Swedish2English.txt file.

Do you also have a translation of what Ronnie said?

He sent me a private mail (hence the Swedish conversation) and gave me some ideas about what to look for as he wrote the reassembly routines from the beginning. At the same time he requested some sample capture files but he did not see the need to get them after I had presented what I had found so far (the translation that you have already read).

Since the reassembly function is now broken, I guess that the best thing to do right now is to back out of my earlier proposed change to reassembly.c. This will however make ethereal crash again in memcpy called from reasseble.c most certainly when decoding Bittorrent data.

*All* Bittorrent data, or just some? "Update list of packets in real time":

Just some. It seems to occur especially when the capture is missing frames.
Anyhow, it only occurs when Bittorrent dissector is enabled.

only if performing a capture with "Update list of packets in realtime"
enabled.

3. The crash only ocurrs when ethereal decodes gathered data, that is

doesn't affect the way dissection is done (it dissects as it reads the file - i.e., as new packets are put in the file - and puts up the dissection as that happens), but it *might* increase the chances of packets being dropped (as Ethereal's doing a lot more work as packets arrive).

Do you have a capture file that shows this problem? (If Ethereal crashes, the capture file will probably be in /tmp or /var/tmp or C:\temp or...; with the bug fix, you can save it.

Currently, with the capture files that I have, it requires two sets of capture files (I captured bittorrent data to rotating sets of files). Each file is about 20MB.
I will fiddle with this a bit using your changes to reassemble.c.

Should perhaps the API at hand for dissectors be changed so that when decoding PDU data, the dissector would be able to return TRUE or FALSE in a similar way to the heur functions?

It should be changed to allow it to return some "mine" or "not mine" indication. Currently, we have such a mechanism, but it's not really clean. It returns an indication of how many bytes of the packet were dissected; unfortunately, there are places were turning 0 for "not mine" causes problems (yes, really - I could find out where it was, but it'd take some work). I've eliminated one place where that number is used; if I can eliminate all of them (or provide some other way to get that information), we could just have the dissectors return a TRUE/FALSE indication.

That would be neat.

I honestly do not know what the best solution would be for this problem as it is reassemble.c that is vulnerable to dissectors that cannot handle their data correctly.

I understand (now) that my interpretation of FD_NOT_MALLOCED was not correct, the fact still remains though that reassemble.c passed invalid (unallocated data) to memcpy. I only noticed this when also the FD_NOT_MALLOCED bit was set, hence my misinterpretation. I guess that Ronnie Sahlbergs proposal for a new memory allocation API could be used to detect that a memory are is not valid and should not be passed on to for instance memcpy.

The best solution to this problem is to figure out what code path leads there and, based on that, to figure out at what point a check should be inserted to detect it.

I do agree!

A too-large length, at least for a protocol running atop TCP, *shouldn't* cause a problem (other than filling up memory with reassembled data), as tcp_dissect_pdus() checks for overflows.

/ Peter