Wireshark-dev: Re: [Wireshark-dev] RFC: sorted value_string + bsearch
From: Anders Broman <anders.broman@xxxxxxxxxxxx>
Date: Fri, 23 Apr 2010 16:18:40 +0200
Hi,
I have the same concerns (unsorted vs). Using a script
Is probably the only feasable option possibly debug code can be used
To scan the vs in the match_strval_fast() call and be used by the build bot but
That would only catch problems with the captures we have.

I'm going to toy with this implementation
When I get the time:
Using the index method should realy improve things.
Index: value_string.c
===================================================================
--- value_string.c      (revision 32542)
+++ value_string.c      (working copy)
@@ -49,6 +49,19 @@
   return ep_strdup_printf(fmt, val);
 }

+const gchar*
+val_to_str_fast(const guint32 val, const value_string_fast *vs, const char *fmt
) {
+  const gchar *ret;
+
+  g_assert(fmt != NULL);
+
+  ret = match_strval_fast(val, vs);
+  if (ret != NULL)
+    return ret;
+
+  return ep_strdup_printf(fmt, val);
+}
+
 /* Tries to match val against each element in the value_string array vs.
    Returns the associated string ptr on a match.
    Returns 'unknown_str', on failure. */
@@ -65,6 +78,19 @@
   return unknown_str;
 }

+const gchar*
+val_to_str_fast_const(const guint32 val, const value_string_fast *vs, const cha
r *unknown_str) {
+  const gchar *ret;
+
+  g_assert(unknown_str != NULL);
+
+  ret = match_strval_fast(val, vs);
+  if (ret != NULL)
+    return ret;
+
+  return unknown_str;
+}
+
 /* Tries to match val against each element in the value_string array vs.
    Returns the associated string ptr, and sets "*idx" to the index in
    that table, on a match, and returns NULL, and sets "*idx" to -1,
@@ -94,6 +120,40 @@
     return match_strval_idx(val, vs, &ignore_me);
 }

+const gchar*
+match_strval_fast(const guint32 val, const value_string_fast *vs) {
+  guint low, idx, max;
+  guint32 item;
+  if(vs) {
+       switch(vs->match_type){
+               case VS_SEARCH:
+                       match_strval( val, vs->vals);
+                       break;
+               case VS_INDEX:
+                       return vs->vals[val].strptr;
+                       break;
+               case VS_BIN_TREE:
+                       /* XXX, bsearch() */
+                       for (low = 0, max = vs->length; low < max; ) {
+                         idx = (low + max) / 2;
+                         item = vs->vals[idx].value;
+
+                         if (val < item)
+                               max = idx;
+                         else if (val > item)
+                               low = idx + 1;
+                         else
+                               return vs->vals[idx].strptr;
+                       }
+                       break;
+               default:
+                       g_assert_not_reached();
+                       break;
+       }/*switch*/
+  }
+  return NULL;
+}
+
 /* Tries to match val against each element in the value_string array vs.
    Returns the associated string ptr on a match.
    Formats val with fmt, and returns the resulting string, on failure. */
Index: value_string.h
===================================================================
--- value_string.h      (revision 32542)
+++ value_string.h      (working copy)
@@ -34,6 +34,24 @@
   const gchar   *strptr;
 } value_string;

+/* The way matching of value is done in a value_string:
+ * 0 Sequential search (as in a normal value string)
+ * 1 The value used as an index(the value string MUST have all values 0-max def
ined)
+ * 2 Binary search, the valuse MUST be in numerical order.
+ */
+#define VS_SEARCH      0
+#define VS_INDEX    1
+#define VS_BIN_TREE 2
+
+typedef struct {
+  guint match_type;                            /* One of the values abowe */
+  guint length;                                        /* length of the array
   */
+  const value_string *vals;     /* the value string        */
+} value_string_fast;
+
+#define VALUE_STRING_FAST(x,y) \
+       value_string_fast x##_fast = { y, array_length(x)-1, x }
+
 /* Struct for the str_to_str, match_strstr_idx, and match_strstr functions */

 typedef struct _string_string {
@@ -59,16 +77,19 @@

 /* Like match_strval_idx(), but doesn't return the index. */
 extern const gchar* match_strval(const guint32 val, const value_string *vs);
+extern const gchar* match_strval_fast(const guint32 val, const value_string_fas
t *vs);

 /* Tries to match val against each element in the value_string array vs.
    Returns the associated string ptr on a match.
    Formats val with fmt, and returns the resulting string, on failure. */
 extern const gchar* val_to_str(const guint32 val, const value_string *vs, const
 char *fmt);
+extern const gchar* val_to_str_fast(const guint32 val, const value_string_fast
*vs, const char *fmt);

 /* Tries to match val against each element in the value_string array vs.
    Returns the associated string ptr on a match.
    Returns 'unknown_str', on failure. */
 extern const gchar* val_to_str_const(const guint32 val, const value_string *vs,
 const char *unknown_str);
+extern const gchar* val_to_str_fast_const(const guint32 val, const value_string
_fast *vs, const char *unknown_str);

 /* Tries to match val against each element in the value_string array vs.
    Returns the associated string ptr, and sets "*idx" to the index in
 

-----Original Message-----
From: wireshark-dev-bounces@xxxxxxxxxxxxx [mailto:wireshark-dev-bounces@xxxxxxxxxxxxx] On Behalf Of Maynard, Chris
Sent: den 23 april 2010 16:05
To: Developer support list for Wireshark
Subject: Re: [Wireshark-dev] RFC: sorted value_string + bsearch

I think this is a good idea, but it obviously requires the value string  array to be properly sorted for it to work well.  And the bigger the array, the more enticing it would be to use this faster method, but I think it will also be potentially harder to ensure that those bigger arrays are indeed properly sorted.  Developers are bound to make mistakes, especially with the very large arrays, and as other developers add new entries, they may not know that a particular array is supposed to be sorted, and even if they do, there's no guarantee that a properly sorted array will remain so.

So, with that in mind ... is there a way, or could there be a way, to verify that certain value string arrays are properly sorted and remain so, perhaps during compilation or at least via a tools/checkXYZ.pl script?

Also, is there a recommendation on the cutoff size of a value string array where one would choose to use the faster method?  Maybe this requires more profiling?  Perhaps it would be a good idea to also have checkXYZ.pl complain about all unsorted value string arrays above a certain size, then they could be more easily identified and switched over to the faster method?

- Chris

________________________________________
From: wireshark-dev-bounces@xxxxxxxxxxxxx [wireshark-dev-bounces@xxxxxxxxxxxxx] On Behalf Of Anders Broman [anders.broman@xxxxxxxxxxxx]
Sent: Friday, April 23, 2010 4:04 AM
To: Developer support list for Wireshark
Subject: Re: [Wireshark-dev] RFC: sorted value_string + bsearch

Hi,
Looking at the code again I think it looks promising a further enhancement could Be to add a flags field indicating the search method if the value_string is "full"
E.g. all values 0-x used the value could be directly used as an index to the sought string.

I just need to find the time to do some profiling/tests...

Regards
Anders

-----Original Message-----
From: wireshark-dev-bounces@xxxxxxxxxxxxx [mailto:wireshark-dev-bounces@xxxxxxxxxxxxx] On Behalf Of Jakub Zawadzki
Sent: den 22 april 2010 23:40
To: Ed Beroset; Developer support list for Wireshark
Subject: Re: [Wireshark-dev] RFC: sorted value_string + bsearch

On Thu, Apr 22, 2010 at 10:47:14AM -0400, Ed Beroset wrote:
> I may have missed it, but have we measured to see if this is worth optimizing in the first place?

I have some old callgrind log, where match_strval (for 58736 frames) is called a lot:

 from dissect_ieee80211 58,7k
 from ieee_80211_add_tagged_parameters 228k

It's problem with ieee80211 dissector, but when you grep dissector sources:

#v+
$ grep -Ir 'val_to_str' ./ | wc -l
3621

$ grep -Ir 'match_strval' ./ | wc -l
305
#v-

It's a common thing to use these functions.

> If not, I think I'd favor choosing code

Come on, this code require only adding:
   static const VALUE_STRING_FAST(some_value_string_array);

I don't see much difference between using:
   value_to_str_fast(..., &foo_fast) and value_to_str(..., foo)

If you don't like automatic variable naming I can replace VALUE_STRING_FAST with:

  #define VALUE_STRING_FAST_INIT(x) = { array_length(x)-1, x }
  static const value_string_fast foo = VALUE_STRING_FAST_INIT(bar);

Dissector maintainer is free to use old/new (or write his own) function he want, but putting 57 new lines to core won't hurt.

> clarity over an inconsequential speedup.

ATM I don't have working profiler to benchmark wireshark, but I made some _raw_ benchmark [1], for 5k entries and searching 0 to 6k three times

Average time for 16 tests:
 val_to_str_fast: 0.02s
      val_to_str: 1.56s

Sorry for not providing real-life benchmark.

Cheers.

[1] http://szara.chmurka.net/value_string-bench.c
CONFIDENTIALITY NOTICE: The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error, please delete it from your system immediately and notify us either by email, telephone or fax. You should not copy, forward, or otherwise disclose the content of the email.

___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev@xxxxxxxxxxxxx>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-request@xxxxxxxxxxxxx?subject=unsubscribe