Wireshark-dev: Re: [Wireshark-dev] [Wireshark-bugs] [Bug 1179] cmake build integration + dbus +
From: "ronnie sahlberg" <ronniesahlberg@xxxxxxxxx>
Date: Tue, 7 Nov 2006 12:11:50 +0000
I think the most interesting usage would be to dissect the packets of
very large traces in parallel across all the cores, greatly speeding up
the dissection/filtering of packets.

If this were done, it would as a side effect also make your usage
example (multiple captures dissected/operated on in parallel) almost
trivial to add, as well as things like a separate GUI thread so one can
operate on the summary and dissection panes even while dissection is
running.


There would be a massive amount of work involved, but it would be
feasible. I myself have no interest in doing this, though, unless some
philanthropist makes a hugely parallel big-box PC materialize on my
doorstep.


My sketchy design would probably be something like:

1, create multiple threads, one thread per core, just before entering
the while loop in cf_read(), then block and wait for all threads to
reach the end of the loop and terminate them. The while loop itself
would thus be parallelized.
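Step 1 could look roughly like the following minimal sketch, assuming POSIX threads and C11 atomics. struct pool_job, dissect_worker() and run_parallel_loop() are hypothetical names, not Wireshark code; the atomic frame counter merely stands in for the serialized packet read described in step 3.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical shared job: workers claim frame numbers from an atomic
 * counter until the (pretend) capture runs dry. */
struct pool_job {
    atomic_int next_frame;    /* next frame number to claim */
    int        total_frames;  /* frames in the pretend capture */
    atomic_int dissected;     /* frames processed so far */
};

/* Body of the former single-threaded while loop in cf_read(). */
static void *dissect_worker(void *arg)
{
    struct pool_job *job = arg;
    for (;;) {
        int frame = atomic_fetch_add(&job->next_frame, 1);
        if (frame >= job->total_frames)
            break;                      /* end of capture */
        /* read, dissect and filter 'frame' here */
        atomic_fetch_add(&job->dissected, 1);
    }
    return NULL;
}

/* Spawn one worker per online core, then block until all of them have
 * reached the end of the loop. */
static int run_parallel_loop(int total_frames)
{
    assert(total_frames >= 0);
    long ncores = 1;
#ifdef _SC_NPROCESSORS_ONLN
    ncores = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncores < 1)
        ncores = 1;
#endif
    struct pool_job job;
    atomic_init(&job.next_frame, 0);
    atomic_init(&job.dissected, 0);
    job.total_frames = total_frames;

    pthread_t *tids = malloc((size_t)ncores * sizeof *tids);
    long started = 0;
    while (tids && started < ncores &&
           pthread_create(&tids[started], NULL, dissect_worker, &job) == 0)
        started++;
    if (started == 0)
        dissect_worker(&job);           /* no threads: run the loop inline */
    for (long i = 0; i < started; i++)
        pthread_join(tids[i], NULL);    /* wait for every worker to finish */
    free(tids);
    return atomic_load(&job.dissected);
}
```

Because every worker keeps claiming frames until the counter passes the end, the total work is the same whatever the core count; only the interleaving changes.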

2, this also means that the functions read_packet() and
add_packet_to_packet_list() would have their signatures changed, since
they can no longer pull structures like wth off the single global cf
structure. These structures would instead have to be made thread-local.
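One way to express step 2 is to pass an explicit per-thread context instead of reaching into a global. This is a minimal sketch; struct dissect_ctx and read_packet_ctx() are hypothetical and do not reflect the real read_packet() signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-thread context: the pieces a worker would otherwise
 * pull off the single global capture_file (wth, scratch space, stats). */
struct dissect_ctx {
    int           thread_id;
    size_t        packets_seen;
    size_t        bytes_seen;
    unsigned char scratch[256];
};

/* Everything the function touches arrives through ctx, so two threads
 * each holding their own ctx never share mutable state. */
static size_t read_packet_ctx(struct dissect_ctx *ctx,
                              const unsigned char *data, size_t len)
{
    assert(ctx != NULL);
    size_t n = len < sizeof ctx->scratch ? len : sizeof ctx->scratch;
    memcpy(ctx->scratch, data, n);
    ctx->packets_seen++;
    ctx->bytes_seen += len;
    return n;   /* bytes copied into this thread's scratch buffer */
}
```

Marking a global _Thread_local would also make it per-thread; passing the context explicitly has the advantage that the dependency shows up in the function signature.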

3, the actual reading of the next packet from the capture file would
obviously have to be protected by a mutex, so only one thread at a time
consumes a packet's bytes from the file.
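A minimal sketch of step 3, assuming a made-up record format (one length byte followed by the payload, not pcap/pcapng). struct shared_reader and reader_next() are hypothetical; the point is that the file offset and the frame number are both advanced inside the same critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct shared_reader {
    pthread_mutex_t      lock;
    const unsigned char *buf;        /* stands in for the open capture file */
    size_t               len;
    size_t               off;        /* protected by lock */
    uint32_t             next_frame; /* protected by lock */
};

struct record {
    uint32_t             frame;      /* 1-based, in file order */
    const unsigned char *data;
    size_t               len;
};

/* Only one thread at a time advances the offset. Assigning the frame
 * number under the same lock keeps numbering in file order even though
 * the dissection that follows may finish out of order. */
static bool reader_next(struct shared_reader *r, struct record *out)
{
    assert(r != NULL && out != NULL);
    bool ok = false;
    pthread_mutex_lock(&r->lock);
    if (r->off < r->len) {
        size_t plen = r->buf[r->off];
        if (r->off + 1 + plen <= r->len) {
            out->frame = ++r->next_frame;
            out->data  = r->buf + r->off + 1;
            out->len   = plen;
            r->off    += 1 + plen;
            ok = true;
        }
    }
    pthread_mutex_unlock(&r->lock);
    return ok;
}
```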

4, all helpers in epan, such as conversations, proto.c and friends,
would have to be audited and all critical sections protected by
mutexes. After a casual review, it does seem like there are few such
critical sections in those files.
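The kind of critical section meant in step 4 is a shared lookup-or-insert. The table below is a hypothetical, heavily simplified stand-in for the conversation code (a fixed array keyed by a 32-bit value), used only to show where the lock has to go.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define CONV_MAX 64

struct conv_table {
    pthread_mutex_t lock;
    int             used;
    uint32_t        key[CONV_MAX];
    unsigned        packets[CONV_MAX];
};

/* Find or create the conversation for key and bump its packet count.
 * Lookup and insert form ONE critical section; locking them separately
 * would let two threads both miss the key and insert it twice. */
static int conv_touch(struct conv_table *t, uint32_t key)
{
    assert(t != NULL);
    int idx = -1;
    pthread_mutex_lock(&t->lock);
    for (int i = 0; i < t->used; i++)
        if (t->key[i] == key) { idx = i; break; }
    if (idx < 0 && t->used < CONV_MAX) {
        idx = t->used++;
        t->key[idx] = key;
        t->packets[idx] = 0;
    }
    if (idx >= 0)
        t->packets[idx]++;
    pthread_mutex_unlock(&t->lock);
    return idx;   /* -1 if the table is full */
}

/* Helper for exercising the table from several threads. */
struct touch_args { struct conv_table *t; uint32_t key; int times; };

static void *touch_many(void *p)
{
    struct touch_args *a = p;
    for (int i = 0; i < a->times; i++)
        conv_touch(a->t, a->key);
    return NULL;
}
```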

5, emem: ep allocators need to allocate from a heap that is specific
to each thread. Some parts of the sp allocators and the associative
arrays need protection.
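For the ep side of step 5, a per-thread bump arena is one possible shape. This is a hypothetical sketch (ep_alloc_tl() and ep_free_all_tl() are not emem functions, and real emem chains chunks instead of failing); _Thread_local gives each thread its own buffer and offset, so the allocation path needs no lock.

```c
#include <assert.h>
#include <stddef.h>

#define EP_ARENA_SIZE 65536

/* One arena and one offset per thread. */
static _Thread_local _Alignas(max_align_t) unsigned char ep_arena[EP_ARENA_SIZE];
static _Thread_local size_t ep_used;

/* Allocate from this thread's arena, rounding sizes up so every
 * returned pointer stays aligned for any object type. */
static void *ep_alloc_tl(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    if (size > EP_ARENA_SIZE)
        return NULL;
    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded > EP_ARENA_SIZE - ep_used)
        return NULL;            /* arena exhausted */
    void *p = ep_arena + ep_used;
    ep_used += rounded;
    assert(ep_used <= EP_ARENA_SIZE);
    return p;
}

/* Called by a worker when it is done with one packet. */
static void ep_free_all_tl(void)
{
    ep_used = 0;
}
```

The sp allocators and associative arrays outlive a single packet and are shared between threads, so they would still need locking along the lines of the previous step.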


6, the clist. Currently our clist semantics are that we only append
new lines to the end of the clist. This would no longer work once we
suddenly start dissecting packets out of order, which means we would
have to prepare the clist to support insert instead of just append.
However, a much better solution might be to always create a row in the
clist for every packet that exists, and just add a new visible/hidden
flag to each row indicating whether the row is to be displayed or not.
Rows would still exist in the clist even if not displayed. An empty row
could then be added to the clist at the same time the packet is read
from the cf file, and dissection would just populate the columns and
set/clear the visible/hidden flag depending on whether the packet is
displayed or not, instead of, as currently, only creating the new row
if the packet is to be displayed.
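The row model described above might be sketched like this. struct pl_row, struct packet_list and the pl_* functions are hypothetical, not the GTK clist API; in a real threaded version, growing the row array would have to be serialized with the fills (a lock, or preallocating from the frame count).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct pl_row {
    bool dissected;
    bool visible;
    char info[32];
};

struct packet_list {
    struct pl_row *rows;
    size_t         nrows;
};

/* Called by the reader, in file order: one empty, hidden row per packet. */
static bool pl_append_empty(struct packet_list *pl)
{
    struct pl_row *r = realloc(pl->rows, (pl->nrows + 1) * sizeof *r);
    if (!r)
        return false;
    pl->rows = r;
    pl->rows[pl->nrows++] = (struct pl_row){ 0 };
    return true;
}

/* Called by whichever worker finished frame idx, in any order. */
static void pl_fill(struct packet_list *pl, size_t idx,
                    const char *info, bool passed_filter)
{
    assert(idx < pl->nrows);
    struct pl_row *row = &pl->rows[idx];
    snprintf(row->info, sizeof row->info, "%s", info);
    row->dissected = true;
    row->visible = passed_filter;
}

static size_t pl_count_visible(const struct packet_list *pl)
{
    size_t n = 0;
    for (size_t i = 0; i < pl->nrows; i++)
        n += pl->rows[i].visible;
    return n;
}
```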

This would then trivially allow several other very nice properties.
You could then control whether a packet is shown or not just by
hiding/unhiding its row in the clist.
For example: filters could operate on transactions instead of just
packets. If, when dissecting a response packet, the response matches
the filter and is to be displayed, it could then just make sure the
matching request is unhidden as well. Or, if a response packet does not
match a filter, it could still check whether the request matched, and
if so the response would be displayed anyway.

This would make filters work much better: if either the request or the
response matched the filter, both would be displayed.
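The transaction rule can be sketched as a pass over the rows, assuming (hypothetically) that the dissector has recorded each packet's own filter result and the index of its request/response partner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct txn_row {
    bool matched;   /* display-filter result for this packet alone */
    long peer;      /* index of the matching request/response, or -1 */
    bool visible;
};

/* A row is shown if it matched, or if its transaction partner matched,
 * so both halves of a request/response pair appear together. */
static void apply_transaction_filter(struct txn_row *rows, size_t n)
{
    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        bool show = rows[i].matched;
        long p = rows[i].peer;
        if (!show && p >= 0 && (size_t)p < n)
            show = rows[p].matched;
        rows[i].visible = show;
    }
}
```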

Another feature: if you have a filtered trace where you see only
packets 1, 10 and 20, but you really want a quick peek at what else
happened between packets 10 and 20, then instead of refiltering the
trace completely to remove all filtering, one could have a small icon
on packet 10 to "expand" and show all the hidden packets between 10 and
the next displayed packet. This would then just be a matter of
unhiding all packets between 10 and 20 and would not really require
any refiltering at all.
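With a visibility flag per row, the "expand" action is just a scan. expand_after() is a hypothetical helper over 0-based row indices; no filter is re-evaluated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Unhide every row after 'from' up to (not including) the next row that
 * is already visible. Returns how many rows were unhidden. */
static size_t expand_after(bool *visible, size_t n, size_t from)
{
    assert(from < n);
    size_t unhidden = 0;
    for (size_t i = from + 1; i < n && !visible[i]; i++) {
        visible[i] = true;
        unhidden++;
    }
    return unhidden;
}
```

With packets 1, 10 and 20 displayed (indices 0, 9 and 19), expanding packet 10 unhides indices 10 through 18, i.e. packets 11 to 19.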

As such, parallel dissection would require changes to the clist, but
these changes would add other nice features we might have wanted even
without multithreading (so the clist work would effectively be
zero-cost to implement).


7, the big one would be to create a new function to register
dissectors. To start with, all dissectors would be as they are today,
thread-unsafe, and when one such dissector is entered the thread would
block until all previous packets have been completely dissected. This
would essentially behave the same as today.
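The "block until all previous packets are done" rule could be built from a mutex, a condition variable and a watermark of contiguously completed frames. This is a minimal sketch with hypothetical names and a fixed frame limit; frames are 0-based.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define GATE_MAX_FRAMES 1024

struct order_gate {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    size_t          watermark;  /* frames 0..watermark-1 are all done */
    bool            done[GATE_MAX_FRAMES];
};

static void gate_init(struct order_gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->watermark = 0;
    for (size_t i = 0; i < GATE_MAX_FRAMES; i++)
        g->done[i] = false;
}

/* Before entering a thread-unsafe dissector for 'frame': wait until every
 * earlier frame is finished, which reproduces today's serial order. */
static void gate_wait_previous(struct order_gate *g, size_t frame)
{
    pthread_mutex_lock(&g->lock);
    while (g->watermark < frame)
        pthread_cond_wait(&g->cond, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

/* A worker finished 'frame' (possibly out of order): advance the
 * watermark over any contiguous run of completed frames, wake waiters. */
static void gate_mark_done(struct order_gate *g, size_t frame)
{
    assert(frame < GATE_MAX_FRAMES);
    pthread_mutex_lock(&g->lock);
    g->done[frame] = true;
    while (g->watermark < GATE_MAX_FRAMES && g->done[g->watermark])
        g->watermark++;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}
```

This only avoids deadlock if frames are handed out in file order (as with a shared reader), so every earlier frame is always owned by some running worker.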

One by one, dissectors could be marked as parallel-safe by registering
them through a new registration routine,
dissector_add_paralell_safe(), which conveniently would only accept
dissectors that do heuristics and not ones returning void (so at the
same time, at zero cost, we would get all dissectors to do proper
heuristic tests verifying the data).
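The type-level part of the proposed dissector_add_paralell_safe() could work like this sketch. The registry, the function-pointer types and both register_* functions are hypothetical (they are not Wireshark's registration API), and the 0xA5 "magic byte" is invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Old-style dissectors return nothing; heuristic ones return a verdict
 * (here: bytes consumed, 0 = "not mine"). */
typedef void (*legacy_dissector_fn)(const unsigned char *data, size_t len);
typedef int  (*heur_dissector_fn)(const unsigned char *data, size_t len);

struct dissector_entry {
    const char         *name;
    bool                parallel_safe;
    legacy_dissector_fn legacy;   /* set for old-style registrations */
    heur_dissector_fn   heur;     /* set for parallel-safe ones */
};

#define MAX_DISSECTORS 32
static struct dissector_entry registry[MAX_DISSECTORS];
static size_t registry_len;

static bool register_entry(struct dissector_entry e)
{
    assert(e.name != NULL);
    if (registry_len == MAX_DISSECTORS)
        return false;
    registry[registry_len++] = e;
    return true;
}

/* Thread-unsafe by default: always run behind the ordering gate. */
static bool register_legacy_dissector(const char *name, legacy_dissector_fn fn)
{
    return register_entry((struct dissector_entry){ name, false, fn, NULL });
}

/* The parameter type only admits dissectors that return a verdict, so
 * passing a void-returning one is diagnosed as an incompatible pointer. */
static bool register_parallel_safe_dissector(const char *name, heur_dissector_fn fn)
{
    return register_entry((struct dissector_entry){ name, true, NULL, fn });
}

static const struct dissector_entry *lookup_dissector(const char *name)
{
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i].name, name) == 0)
            return &registry[i];
    return NULL;
}

/* Example dissectors for the sketch. */
static int dissect_demo(const unsigned char *data, size_t len)
{
    return (len >= 1 && data[0] == 0xA5) ? (int)len : 0;
}

static void dissect_old(const unsigned char *data, size_t len)
{
    (void)data;
    (void)len;
}
```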

These dissectors could either be like Ethernet, which would be
completely parallel-safe, or like ONC-RPC, where we would mark it as
parallel-safe when registering but then explicitly test in the
dissector:

    if (this_is_a_reply && we_have_not_yet_seen_the_matching_request &&
        there_exists_previous_packets_not_yet_fully_dissected) {
            manually_block_until_all_previous_packets_are_fully_dissected();
    }
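The ONC-RPC condition above can be spelled out as a predicate. This is a sketch under assumptions: calls seen so far are kept in a hypothetical XID table (which would itself need a lock when shared between threads), and "previous packets not yet fully dissected" is expressed with a 0-based completion watermark, meaning frames 0..watermark-1 are done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XID_MAX 128

struct xid_table {
    uint32_t xid[XID_MAX];
    size_t   n;
};

static bool xid_seen(const struct xid_table *t, uint32_t xid)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->n; i++)
        if (t->xid[i] == xid)
            return true;
    return false;
}

/* A reply whose call is unknown must wait only while some earlier frame
 * is unfinished; calls and already-matched replies never wait. */
static bool rpc_reply_must_wait(const struct xid_table *t, bool is_reply,
                                uint32_t xid, size_t frame, size_t watermark)
{
    return is_reply && !xid_seen(t, xid) && watermark < frame;
}
```

Once the watermark reaches the reply's frame and the XID is still unknown, the call is genuinely missing from the trace and the reply can be dissected as unmatched.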


Then, one by one, all dissectors would be audited, manually adding the
code that checks when we can dissect the packet right away and when we
have to block and wait until all previous packets have been dissected
first (in case we need state from a previous packet).



I have some notes that I can try to put down on a wiki page over the weekend.


There are a lot of other things as well, but I think the above are
the big items.



On 11/7/06, Joerg Mayer <jmayer@xxxxxxxxx> wrote:
On Tue, Nov 07, 2006 at 09:26:50AM +0000, ronnie sahlberg wrote:
> As someone that has actually studied the feasibility of making
> wireshark multithreaded and what would be required
>
> "... Written with tons of globals variables, non thread safe, ..."

[Good reasons about the real problems of multithreading and lots of old
messages removed]

While that type of usage would be interesting for people with multicore
machines (like me) I don't think that this is the main reason why people
talk about multithreading. The real reason might be that they mean
reentrancy with different contexts - like being able to run two tabs
with different capture files in each at the same time.

 ciao
    Joerg
--
Joerg Mayer                                           <jmayer@xxxxxxxxx>
We are stuck with technology when what we really want is just stuff that
works. Some say that should read Microsoft instead of technology.
_______________________________________________
Wireshark-dev mailing list
Wireshark-dev@xxxxxxxxxxxxx
http://www.wireshark.org/mailman/listinfo/wireshark-dev