Wireshark-dev: [Wireshark-dev] Idle Thoughts on Parallelized Packet Dissection
From: Evan Huus <eapache@xxxxxxxxx>
Date: Sun, 13 May 2012 10:33:56 -0400
This is a topic that's been stewing in the back of my brain for a
while now, but it's cooked enough that I think it's worth getting
feedback on. This is a long, (overly) detailed email - read with
caution :)

tl;dr: I think it's possible to support parallelized (multi-threaded)
packet dissection in a manner that's both useful (provides a good
distribution of work over multiple cores) and backwards compatible
with dissectors written in the current, single-threaded style. The
changes to the Wireshark core would be significant and intrusive, but
once that work is finished, individual dissectors could be updated at
our convenience.

---

First off, there are a lot of obvious, known problems (like global
variables) that would need to be fixed before multi-threading makes
sense. Most of them appear to be documented at [1]. But we know, more
or less, how to fix those - we just need to put in the work.
Unfortunately, there isn't a lot of motivation to do that work, because
we're still stuck on the second part of the problem.

In Guy's words (from [2], #6), packet dissection is an "embarrassingly
serial" problem in a lot of ways. Dissecting a single protocol in a
single packet can depend on other packets, on other protocols in the
same packet, on conversations and other structures, and on
who-knows-what-else, and right now that information isn't really
stored anywhere (except implicitly through the use of certain APIs,
but that's certainly incomplete). And even if we were to somehow
collect all that information, what would we do with it? Since TCP
conversations (and window calculations, and ack analysis, and ...)
depend on all the previous TCP packets in the capture, does that mean
that we have to revert to strictly single-threaded dissection as soon
as we see a TCP packet? That wouldn't be very useful at all.

It is worth noting that a lot of TCP dissection could be done without
any previous information. The locations and values of the fields
themselves don't depend on anything stored in previous packets.
Neither does the choice of sub-dissector. It's really all of the bells
and whistles (conversations, ack analysis, expert info, etc.) that are
the problem, but right now they're all mixed together with the easy
stuff.

So let's split them up.

---

In broad strokes, the idea is this: instead of registering a single
dissect_proto() function, allow dissectors to register multiple
functions to be chained together for dissecting a single packet. Allow
these registrations to specify various levels of dependency on other
parts of the capture (the three that come to mind are "Totally
Independent", "Conversation Only", and "Everything", but I'm sure
there are others). Then the core can run parts of the dissection in
parallel, making sure for each function that its dependencies are
satisfied before it gets called.
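
As a very rough sketch (every name here is invented; nothing like it
exists in the current API), I imagine the registration looking
something like this:

typedef enum {
    DISSECT_DEP_INDEPENDENT,   /* needs only this packet's bytes        */
    DISSECT_DEP_CONVERSATION,  /* needs conversation state              */
    DISSECT_DEP_EVERYTHING     /* needs arbitrary earlier packets/state */
} dissect_dependency_t;

void
proto_register_foo(void)
{
    /* ... existing field and subtree registration ... */

    /* Two chained stages instead of a single dissect_foo(). */
    register_dissector_stage("foo", dissect_foo_fields,
                             DISSECT_DEP_INDEPENDENT);
    register_dissector_stage("foo", dissect_foo_analysis,
                             DISSECT_DEP_EVERYTHING);
}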

For example, let's consider a bunch of TCP/IP/Ethernet packets (with
nothing below TCP for simplicity's sake). The total work for a single
packet, in the traditional serial dissection, would look like:
1. Ethernet
2. IP
3. TCP

Now let's say that each of those three dissectors has been converted
to use two chained functions, where the first is "Totally Independent"
and the second depends on "Everything". If the newly parallelized
dissectors were run serially, it would look like:
1. Ethernet
---a) Totally Independent
---b) Depends on Everything
2. IP
---a) Totally Independent
---b) Depends on Everything
3. TCP
---a) Totally Independent
---b) Depends on Everything

However, they can now be run at least partially in parallel. Since
sub-dissector choice depends only on a field in the current packet, it
could be placed in the first, "Independent" function (part a) for all
three protocols. With proper parallelization, 1b and 2a could be run
simultaneously, and 2a could trigger 3a whether or not 1b is done yet.
It would also be possible to do inter-packet parallelization, with
step 1a being started in parallel on as many packets as desired.
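
To make the TCP case concrete, the split might look roughly like this
(the function names and the exact stage signature are invented for
illustration; the point is just what ends up in which half):

/* Stage a: "Totally Independent", reads only this packet's bytes. */
static void
dissect_tcp_fields(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree)
{
    /* ports, seq/ack numbers, flags, window, checksum, options, plus
       the call into the sub-dissector chosen by port */
}

/* Stage b: depends on "Everything", runs only once the analysis
   stages of all earlier packets have finished. */
static void
dissect_tcp_analysis(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree)
{
    /* conversation lookup, relative sequence numbers, ACK analysis,
       retransmissions and other expert info, reassembly bookkeeping */
}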

---

This idea is based on the assumption that most protocols have at least
some parts of their dissection that don't depend on anything prior.
These parts can therefore be split out into lower-dependency
functions. Based on what I've seen looking at some common protocol
dissectors, I don't think that's an entirely unreasonable assumption.

It does mean, however, that APIs will need to be enhanced to support
accessing and manipulating proto_trees in more interesting ways: if I
add a field to the tree in my first function and then want to verify it in a
later function, I should be able to pull it from the proto_tree by
field (hf_whatever), rather than finding its value in the tvb again.
If I subsequently want to add expert info to the field, or insert a
generated field immediately after it, I should be able to do so easily
and quickly. Based on my understanding of the current proto_tree
layout, all of this should be doable with some work (and possibly an
extra data structure or two).
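
For instance (proto_tree_get_item_by_hf() and checksum_is_bad are
invented here; expert_add_info_format() is the existing call), the
later "Everything" stage of TCP might do something like:

/* fetch the checksum item the independent stage already added */
proto_item *cksum_ti = proto_tree_get_item_by_hf(tree, hf_tcp_checksum);

/* and hang expert info off it without re-reading the tvb */
if (cksum_ti != NULL && checksum_is_bad)
    expert_add_info_format(pinfo, cksum_ti, PI_CHECKSUM, PI_ERROR,
                           "Bad checksum");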

On the plus side, this method provides a really easy way to maintain
backwards compatibility for older dissectors. Simply leave the current
registration function as a wrapper around the new registration
function with a dependency of "Everything", and dissectors that haven't been
adapted yet will be automatically serialized.
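
Something along these lines, with the signature simplified and the
stage API again hypothetical:

/* The existing registration call becomes a thin shim: an unconverted
   dissector is just one stage that depends on everything, so it ends
   up fully serialized, exactly the behaviour it gets today. */
void
register_dissector(const char *name, dissector_t dissector)
{
    register_dissector_stage(name, dissector, DISSECT_DEP_EVERYTHING);
}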

---

Obviously this is a hugely ambitious undertaking, but I think it's
doable, and the benefits on modern multi-core systems would be
significant.

Please ask questions and provide feedback; I'm sure there are things
I've missed.

Thoughts?

[1] http://wiki.wireshark.org/Development/multithreading
[2] http://wiki.wireshark.org/Development/Wishlist#General_.2F_Unsorted