Wireshark-dev: Re: [Wireshark-dev] [Wireshark-commits] rev 35393: /trunk/epan/dissectors/ /trun
From: Jeff Morriss <jeff.morriss.ws@xxxxxxxxx>
Date: Thu, 06 Jan 2011 22:33:12 -0500
On 01/06/2011 12:49 PM, Martin Mathieson wrote:
    Martin, I assume the pre-tvb_get_ptr() code here was similar to this
    change in that it only retrieved the string once?  (I ask since
    several of the strings are used multiple times.)


The diff is at
http://anonsvn.wireshark.org/viewvc/trunk/epan/dissectors/packet-catapult-dct2000.c?r1=28085&r2=28183&pathrev=35393
<http://anonsvn.wireshark.org/viewvc/trunk/epan/dissectors/packet-catapult-dct2000.c?r1=28085&r2=28183&pathrev=35393>.

(Sorry, I don't know why I didn't look that up myself...)

There were some strings that were being retrived in more than one place,
but probably only once for any given frame.  I suspect the slowest part
was probably the block around line 1610 where it was pulling out these 3
strings for every frame.

Well, a couple are added to the protocol item (around line 1610) and then again to col_info (around line 2102).

After remembering that profiling (at least in its easiest form with the 'time' command) isn't so hard, I played around a bit. After building a decent-sized dct2000 file (taking the sample from SampleCaptures and merging it with itself until I had a 276 Mb file), I tried before and after this change and I can't find a measurable difference in the CPU usage. I even tried forcing my (AMD) CPU down to 1 GHz to exaggerate the difference, but I still got only a couple of seconds CPU time difference out of over 5 minutes--and in that case rev 35393's code was faster.

Maybe I'll try tomorrow on a SPARC: I know that memcpy()s are a lot more expensive there than on x86.

    I suppose for cases like this a get_string function that uses the
    TVB as the backing store (and only pushes the
    tvb_get_strsize()+tvb_get_ptr() out of the dissector and into
    tvbuff.c) might be an option to help reduce the prevalence of
    tvb_get_ptr() in dissectors.  That may not be a bad idea because I
    think I saw some other dissectors doing the same thing as this
    one--and why take the performance hit?

My strings are null-terminated, and I was using
tvb_get_ephemeral_string(), whereas I already knew the lengths from when
I first parsed the line, and that they do terminate with a NULL before
the end of the tvb.  Would this function do more than just cast
tvb_get_ptr() to a char* ?  If not, it wouldn't be any more safe.

Note that my change also removed the tvb_getstrsize() in favor of using the _string*z* function (which does it for you) and each string is retrieved exactly once.

My thought on the new function was to only provide a, say, tvb_get_const_stringz() function which:

- gets the string length with tvb_get_strsize() # now we know we'll provide a NULL-terminated string
- calls tvb_get_ptr() to get a pointer (and ensure the TVB is flat)
- return a *const* char*

It is no safer than the code in this dissector (which is already safe) and doesn't leave the dissector writer in a more dangerous position (going off the end of a string--whether it's stored in a TVB or not--is always going to be bad). Its sole purpose is to remove some dissector usage of tvb_get_ptr(). Not sure if it's worth it--maybe I'll see if other dissectors could use it too.