Wireshark-dev: Re: [Wireshark-dev] Insufficient Data for Heuristic
From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Sat, 22 Feb 2014 16:46:22 -0800
On Feb 22, 2014, at 4:13 PM, Evan Huus <eapache@xxxxxxxxx> wrote:

> If a dissector checks the captured length and finds that it doesn't
> have enough data captured to run its heuristic (assuming there was
> enough on the wire for the packet to be valid), should that count as
> an auto-pass, or an auto-fail (i.e., should the heuristic reject the
> packet, or assume that it's valid and skip the check)?
> 
> My instinct is to count it as a pass; we'll dissect the first few
> fields then throw an exception. I suppose there are potentially other
> dissectors in line that would actually accept the packet, but then
> there might also be cases where there aren't any, and we'd be leaving
> it undissected.

"Leaving it undissected" doesn't depend on the order in which the dissectors' register-handoff routines are run; "letting the first one dissect it" does.
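To make the ordering point concrete, here is a toy C sketch. All names are hypothetical and this is not Wireshark's API. Two heuristics both auto-pass when fewer than 4 bytes are captured, and the packet goes to the first heuristic that accepts. With a short capture, which protocol "wins" depends only on registration order. With enough data, the answer is the same in both orders.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical heuristic signature: raw bytes plus captured length. */
typedef bool (*heur_fn)(const unsigned char *data, size_t captured_len);

/* Both toy heuristics "auto-pass" when they can't see their 4-byte header. */
static bool proto_a(const unsigned char *d, size_t len) { return len < 4 || d[0] == 'A'; }
static bool proto_b(const unsigned char *d, size_t len) { return len < 4 || d[0] == 'B'; }

/* "First one that accepts wins": returns the accepting heuristic, or NULL. */
static heur_fn first_match(const heur_fn *h, size_t n,
                           const unsigned char *d, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (h[i](d, len))
            return h[i];
    return NULL;
}
```

With a 2-byte capture, `{proto_a, proto_b}` picks `proto_a` and `{proto_b, proto_a}` picks `proto_b`; with all 4 bytes of an 'A' packet captured, both orders pick `proto_a`.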

Perhaps it's time to split the "check if this is a packet for this protocol" and "dissect this packet" operations into separate functions.  With that, for any given protocol with zero or more key-based dissector tables and a heuristic dissector table, there would be dissectors registered in the key-based dissector tables (if there are any) and dissectors registered in the heuristic dissector table.  The only difference between the two kinds of table would be that entries in the key-based tables have a key (port number, protocol number, media type, etc.) and entries in the heuristic tables don't.
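A minimal C sketch of what the split might look like, with every type and function name hypothetical (in particular, `packet_t` is a stand-in, not Wireshark's `tvbuff_t`). Each dissector becomes a pair of routines. A key-based entry is the same pair plus a key. For simplicity the key here is an unsigned integer; string keys such as media types would need a different key type.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet buffer (not Wireshark's tvbuff_t). */
typedef struct {
    const unsigned char *data;
    size_t captured_len;  /* bytes actually captured */
    size_t reported_len;  /* bytes that were on the wire */
} packet_t;

/* "Check if this is a packet for this protocol." */
typedef bool (*check_fn)(const packet_t *pkt);
/* "Dissect this packet"; returns the number of bytes consumed. */
typedef size_t (*dissect_fn)(const packet_t *pkt);

/* A dissector split into its two operations; heuristic tables hold these. */
typedef struct {
    const char *protocol;
    check_fn    check;    /* may be NULL only in key-based tables */
    dissect_fn  dissect;
} dissector_entry_t;

/* A key-based table entry is the same thing plus a key. */
typedef struct {
    unsigned          key;    /* e.g. a port or protocol number */
    dissector_entry_t entry;
} keyed_entry_t;

/* Example operations for a hypothetical protocol "FOO" with a 1-byte magic. */
static bool foo_check(const packet_t *pkt)
{
    return pkt->captured_len >= 1 && pkt->data[0] == 0x46;
}

static size_t foo_dissect(const packet_t *pkt)
{
    return pkt->captured_len;  /* placeholder: a real dissector would build a tree */
}
```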

If there are one or more entries in a key-based dissector table matching a given key, the "check if this is a packet for the protocol" routine would be run for each of them; if an entry has no such routine, we'd treat it as having a routine that always says "yes".  If exactly one routine says "yes", we'd call the corresponding "dissect this packet" routine; if more than one does, or if none does, we'd dissect the packet as data.
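The key-based lookup rule can be sketched as follows. The names are hypothetical, and the table is a flat array rather than a real hash table. A NULL check counts as "yes", and the packet is dissected as data (signalled by -1) unless exactly one entry for the key accepts it.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { const unsigned char *data; size_t captured_len; } packet_t;
typedef bool (*check_fn)(const packet_t *pkt);

typedef struct {
    unsigned    key;       /* e.g. a TCP port */
    const char *protocol;
    check_fn    check;     /* NULL is treated as "always yes" */
} keyed_entry_t;

/* Returns the index of the one entry for `key` whose check says yes, or -1
   ("dissect as data") if none or more than one does. */
static int select_keyed(const keyed_entry_t *t, size_t n, unsigned key,
                        const packet_t *pkt)
{
    int chosen = -1;
    size_t matches = 0;

    for (size_t i = 0; i < n; i++) {
        if (t[i].key != key)
            continue;
        if (t[i].check == NULL || t[i].check(pkt)) {
            matches++;
            chosen = (int)i;
        }
    }
    return matches == 1 ? chosen : -1;
}

/* Example checks for hypothetical protocols. */
static bool starts_with_1(const packet_t *pkt)
{
    return pkt->captured_len >= 1 && pkt->data[0] == 1;
}

static bool never(const packet_t *pkt)
{
    (void)pkt;
    return false;
}
```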

If there are one or more heuristic dissectors in a heuristic dissector table, the "check if this is a packet for the protocol" routine would be run for each of them.  (We would reject attempts to register a null "check if this is a packet for the protocol" routine in a heuristic dissector table.)  If exactly one routine says "yes", we'd call the corresponding "dissect this packet" routine; if more than one does, or if none does, we'd dissect the packet as data.
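For the heuristic table, the only new rule is that registration refuses a NULL check routine. Selection is the same "exactly one match" rule as for key-based tables. A sketch with hypothetical names and a fixed-capacity table:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { const unsigned char *data; size_t captured_len; } packet_t;
typedef bool (*check_fn)(const packet_t *pkt);

#define HEUR_MAX 16  /* hypothetical fixed capacity */

typedef struct {
    const char *protocol[HEUR_MAX];
    check_fn    check[HEUR_MAX];
    size_t      count;
} heur_table_t;

/* Registration rejects a NULL check routine (and a full table). */
static bool heur_register(heur_table_t *t, const char *protocol, check_fn check)
{
    if (check == NULL || t->count == HEUR_MAX)
        return false;
    t->protocol[t->count] = protocol;
    t->check[t->count] = check;
    t->count++;
    return true;
}

/* Exactly one match wins; otherwise -1 means "dissect as data". */
static int heur_select(const heur_table_t *t, const packet_t *pkt)
{
    int chosen = -1;
    size_t matches = 0;

    for (size_t i = 0; i < t->count; i++) {
        if (t->check[i](pkt)) {
            matches++;
            chosen = (int)i;
        }
    }
    return matches == 1 ? chosen : -1;
}

/* Example heuristic for a hypothetical protocol starting with "FO". */
static bool looks_like_foo(const packet_t *pkt)
{
    return pkt->captured_len >= 2 && pkt->data[0] == 'F' && pkt->data[1] == 'O';
}
```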

In the cases where more than one matches, we'd note the protocols involved and present them in the "Decode As..." dialog.  If a protocol is selected there, we'd somehow mark its entry as "always use this entry", so that the searches described above for a dissector to hand off to are skipped.
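One way to picture the "always use this entry" mark is a per-entry flag that short-circuits the search. When the search is ambiguous, it also records the candidate entries so a dialog could offer them. This is a hypothetical C sketch; the flag, struct, and function names are assumptions, not existing Wireshark code.

```c
#include <stdbool.h>
#include <stddef.h>

#define ENTRY_MAX 16  /* hypothetical table capacity */

typedef struct { const unsigned char *data; size_t captured_len; } packet_t;
typedef bool (*check_fn)(const packet_t *pkt);

typedef struct {
    const char *protocol;
    check_fn    check;       /* NULL = "always yes" (key-based tables only) */
    bool        always_use;  /* set when the user picks this entry in the dialog */
} entry_t;

typedef struct {
    int    chosen;                 /* entry to dissect with, or -1 for data */
    size_t candidates[ENTRY_MAX];  /* entries whose check said yes */
    size_t ncandidates;
} selection_t;

static selection_t select_entry(const entry_t *t, size_t n, const packet_t *pkt)
{
    selection_t s = { .chosen = -1, .ncandidates = 0 };

    if (n > ENTRY_MAX)
        n = ENTRY_MAX;

    /* A user override skips the search entirely. */
    for (size_t i = 0; i < n; i++) {
        if (t[i].always_use) {
            s.chosen = (int)i;
            return s;
        }
    }

    for (size_t i = 0; i < n; i++)
        if (t[i].check == NULL || t[i].check(pkt))
            s.candidates[s.ncandidates++] = i;

    /* With 0 or more than 1 candidates the packet is shown as data; with
       more than 1, `candidates` is what the dialog would offer. */
    if (s.ncandidates == 1)
        s.chosen = (int)s.candidates[0];
    return s;
}
```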

Under this scheme, if we count "not enough data" as an auto-pass, we'd end up punting the choice of dissector to the user whenever more than one dissector passed.

A variant would be to distinguish a "strong pass" (enough data to check, and the check passed) from a "weak pass" (not enough data to check) and to prefer strong passes to weak ones: if there's exactly one strong pass, choose it; if there's more than one strong pass, or there are no strong passes but at least one weak pass, punt to the user (possibly listing the strong passes before the weak ones).
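The strong/weak variant can be sketched with a three-valued check result. In this hypothetical C code, a check fails if the packet is too short even on the wire. It weak-passes if the packet is long enough on the wire but too little was captured to examine the header. It strong-passes only if it could actually look at the header and the header matched. The selection logic follows the rule described above, ordering strong passes before weak ones when the choice is punted to the user. `magic_check`, `foo_check`, and `bar_check` are illustrative heuristics, not real dissectors.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CHECK_MAX 16  /* hypothetical table capacity */

typedef struct {
    const unsigned char *data;
    size_t captured_len;
    size_t reported_len;
} packet_t;

typedef enum { CHECK_FAIL, CHECK_WEAK_PASS, CHECK_STRONG_PASS } check_result_t;
typedef check_result_t (*check_fn)(const packet_t *pkt);

typedef enum { PICK_DATA, PICK_DISSECTOR, PICK_ASK_USER } pick_kind_t;

typedef struct {
    pick_kind_t kind;
    size_t      index;             /* valid when kind == PICK_DISSECTOR */
    size_t      order[CHECK_MAX];  /* PICK_ASK_USER: strong passes, then weak */
    size_t      norder;
} pick_t;

static pick_t pick(const check_fn *checks, size_t n, const packet_t *pkt)
{
    check_result_t r[CHECK_MAX];
    size_t strong = 0, weak = 0;
    pick_t p = { .kind = PICK_DATA, .index = 0, .norder = 0 };

    if (n > CHECK_MAX)
        n = CHECK_MAX;
    for (size_t i = 0; i < n; i++) {
        r[i] = checks[i](pkt);
        if (r[i] == CHECK_STRONG_PASS)
            strong++;
        else if (r[i] == CHECK_WEAK_PASS)
            weak++;
    }

    if (strong == 1) {             /* exactly one strong pass: use it */
        for (size_t i = 0; i < n; i++)
            if (r[i] == CHECK_STRONG_PASS)
                p.index = i;
        p.kind = PICK_DISSECTOR;
        return p;
    }
    if (strong == 0 && weak == 0)  /* nothing passed: dissect as data */
        return p;

    /* Ambiguous: offer strong passes first, then weak ones. */
    for (size_t i = 0; i < n; i++)
        if (r[i] == CHECK_STRONG_PASS)
            p.order[p.norder++] = i;
    for (size_t i = 0; i < n; i++)
        if (r[i] == CHECK_WEAK_PASS)
            p.order[p.norder++] = i;
    p.kind = PICK_ASK_USER;
    return p;
}

/* Hypothetical heuristic: `need` bytes must be present, starting with `magic`. */
static check_result_t magic_check(const packet_t *pkt, const char *magic, size_t need)
{
    if (pkt->reported_len < need)
        return CHECK_FAIL;       /* too short even on the wire */
    if (pkt->captured_len < need)
        return CHECK_WEAK_PASS;  /* valid length, but too little captured to check */
    return memcmp(pkt->data, magic, strlen(magic)) == 0 ? CHECK_STRONG_PASS : CHECK_FAIL;
}

static check_result_t foo_check(const packet_t *pkt) { return magic_check(pkt, "FOO!", 4); }
static check_result_t bar_check(const packet_t *pkt) { return magic_check(pkt, "BAR", 6); }
```

For a fully captured "FOO!" packet whose reported length is 10 bytes, `foo_check` strong-passes and `bar_check` only weak-passes, so FOO is chosen without asking. If only 2 bytes were captured, both checks weak-pass, so the user would be asked.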