[Ethereal-dev] Performance: Ethereal is slow when using large captures.
Ethereal is very slow when working with very large captures containing
hundreds of thousands of packets.
Every operation we do that makes Ethereal rescan the capture file takes a
long time.
I would like to start looking at methods to make Ethereal faster, especially
for large captures, trading a larger memory footprint for faster refiltering
speed, but would like to know if there are others interested in helping.
(You don't have to be a programmer; someone who knows how to run gprof and
read the data would be very, very helpful as well.)
If there is interest, maybe we should ask Gerald to set up an additional
mailing list as well, ethereal-speed@xxxxxxxxxxxx, but that is a question for
later.
If there is interest in this, I propose we take 0.9.16 as the baseline and
measure the execution speed and memory footprint of Ethereal for that version
to start with.
A good example capture might be created by setting the snaplen to 100 and
downloading Ethereal from the website over and over until the capture file
has 100,000 packets (10 MB).
Create three color filters: "ip", "tcp" and "frame.pkt_len>80" (random
filter strings off the top of my head).
Open the UDP and the TCP conversation lists.
Load the capture into Ethereal.
Then measure the time it takes to refilter the capture as fast as possible
10 times using the display filter "tcp.port==80".
After the 10th refilter of the trace, measure the memory footprint of
Ethereal using ps.
Repeat this using a gzipped copy of the capture file.
This could be the baseline.
(The test should take at least a couple of minutes so that we can see any
differences between test implementations more easily.)
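To make the timing repeatable, one could also wrap the refilter code itself
with a GLib timer instead of using a stopwatch. A minimal sketch, assuming
GLib's GTimer is available; the cf_filter_packets() call is only a
placeholder for whatever entry point in file.c actually triggers the rescan:

    /* Timing sketch: measure 10 refilters of the loaded capture.
     * cf_filter_packets() is a placeholder, not the actual API. */
    #include <glib.h>
    #include <stdio.h>
    #include "file.h"      /* capture_file; header name may differ */

    static void
    time_refilter(capture_file *cf)
    {
        GTimer *timer = g_timer_new();   /* starts running immediately */
        int     i;

        for (i = 0; i < 10; i++) {
            cf_filter_packets(cf, "tcp.port==80");   /* placeholder call */
        }

        fprintf(stderr, "10 refilters took %.2f seconds\n",
                g_timer_elapsed(timer, NULL));
        g_timer_destroy(timer);
    }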
After this, it would be good if someone ran this test and collected gprof
data to see where most of the CPU time is spent.
My guess, though I am not sure, is that it is in the actual dissection of the
packets. It would be great to have this verified.
Everything below this line assumes the theory that the dissection of packets
takes up a significant part of the time needed to refilter the capture.
Hypothesis: a lot of time is spent redissecting the packets over and over
every time we refilter the trace.
Possible Solution: reduce the number of times we redissect the packets.
(We would need a preference option called "Faster refiltering at the expense
of memory footprint"; memory is cheap, but CPU power is finite.)
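Just to illustrate, here is a minimal sketch of how such a preference might
be registered through the existing prefs API; the module handle, preference
name and variable are made up, and the exact call signature in 0.9.16 may
differ:

    /* Preference sketch; names are illustrative, signature may differ. */
    #include <glib.h>
    #include <epan/prefs.h>

    /* Hypothetical global toggled by the new preference. */
    static gboolean cache_edt_trees = FALSE;

    /* Assumes 'gui_module' comes from the usual prefs module
     * registration, which is omitted here. */
    void
    register_refilter_speed_pref(module_t *gui_module)
    {
        prefs_register_bool_preference(gui_module, "cache_edt_trees",
            "Faster refiltering at the expense of memory footprint",
            "Keep the full dissection (edt) tree of every packet in memory "
            "so that refiltering does not have to redissect each packet.",
            &cache_edt_trees);
    }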
Currently:
Every time we rescan a capture file, we loop over every single packet and:
dissect the packet completely and build an edt tree (in file.c);
prune the edt tree so it only contains those fields that are part of any of
the filters we use (display filter, color filters, the filters used by taps,
etc.);
evaluate each filter, one at a time, using the now pruned edt tree;
discard the edt tree and go to the next packet.
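In rough sketch form, the current loop does something like the following;
all of the function and field names below are placeholders for the real
file.c/epan code, not the actual API:

    /* Sketch of the CURRENT rescan loop; names are placeholders. */
    for (frame = cf->first_frame; frame != NULL; frame = frame->next) {
        edt_tree *edt;

        /* 1. Dissect the packet completely and build an edt tree. */
        edt = dissect_packet_to_edt(cf, frame);

        /* 2. Prune the tree down to the fields referenced by any filter
         *    (display filter, color filters, tap filters, ...). */
        prune_edt(edt, interesting_fields);

        /* 3. Evaluate each filter against the pruned tree. */
        frame->passed_dfilter = filter_matches(display_filter, edt);
        apply_color_filters(frame, edt);
        run_taps(frame, edt);

        /* 4. Throw the tree away and move on to the next packet. */
        free_edt(edt);
    }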
Possible optimization:
When we dissect the packet the first time, keep the full edt tree around
and never prune it.
The loop would then be:
Loop over all the packets:
for this packet, find the unpruned edt tree (don't dissect the packet at
all);
apply all filters, one at a time, on the unpruned edt tree.
This would, at the expense of keeping the potentially very memory-consuming
edt tree around, eliminate the need to redissect the packet.
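Again as a rough sketch with placeholder names, assuming each frame
structure can hold a pointer to its cached, unpruned tree:

    /* Sketch of the PROPOSED loop; names are placeholders.  frame->edt is
     * assumed to hold the full, unpruned tree from the first dissection. */
    for (frame = cf->first_frame; frame != NULL; frame = frame->next) {
        edt_tree *edt = frame->edt;          /* cached tree, no dissection */

        if (edt == NULL) {
            /* First pass only: dissect once and cache the full tree. */
            edt = frame->edt = dissect_packet_to_edt(cf, frame);
        }

        /* Apply all filters one at a time on the unpruned tree. */
        frame->passed_dfilter = filter_matches(display_filter, edt);
        apply_color_filters(frame, edt);
        run_taps(frame, edt);

        /* Do NOT free the tree; it is reused on the next refilter. */
    }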
The memory required by the edt tree could probably be optimized over time.
The risk is: how will the time it takes to evaluate a filter be affected by
the tree not being pruned? How does the execution time scale with the number
of fields held in the edt tree: O(n) or O(n^2)? Maybe we would have to do
major brain surgery there as well?
Maybe it turns out to be infeasible? Who knows until we have tried.
Still, SOME things we do would require a redissection of the packet anyway,
such as when clicking on a packet in the packet list, but that is not
performance critical.
Can anyone see any obvious flaws in the reasoning above that would make this
infeasible or the approach irrelevant?
Would anyone who knows gprof be interested in doing the measurements on the
"baseline" as suggested, and in keeping the same sample file for all further
performance tests we do, so we know whether we are going in the right
direction or not?