Wireshark-dev: Re: [Wireshark-dev] guint8* and gchar* ... and Vim ?! :)
Sebastien Tandel wrote:
is there any reason to use guint8* instead of gchar*?
For what purpose?
If you're dealing with an array of 8-bit bytes, or a pointer to a
sequence of those, guint8 is the right type; it makes it clear that
they're bytes, not characters (the data might be binary, it might be
the 16-bit code units of a UTF-16-encoded string, it might be a UTF-8
string, etc.).
I.e., tvb_get_ptr(), for example, should return a "guint8 *", as should
tvb_memdup(), and the raw packet data you get from Wiretap should be
pointed to by a "guint8 *".
Note also that you can safely pass a guint8 or guchar to one of the
<ctype.h> routines, but you can't safely pass a gchar to them: a gchar
might get sign-extended into a negative value if the 8th bit is set,
and those routines are only defined for EOF and for values that fit in
an unsigned char. (I think that none of the popular Windows or modern
UN*X platforms have C compilers where "char" is an unsigned type, so I
think "might" can be replaced by "will" in practice.)
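To illustrate the <ctype.h> point, here is a minimal sketch. The
typedefs are stand-ins for the GLib ones so that it builds without
GLib, and both function names are made up for this example; they are
not Wireshark APIs.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Stand-ins for the GLib typedefs, so the sketch builds without GLib. */
typedef unsigned char guint8;
typedef char gchar;

/* Hypothetical helper: count printable bytes in raw data.
 * guint8 is unsigned, so every value is in 0..255 and can be handed
 * to isprint() directly. */
size_t count_printable_bytes(const guint8 *data, size_t len)
{
    size_t n = 0;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (isprint(data[i]))
            n++;
    }
    return n;
}

/* Hypothetical helper: the same count for a NUL-terminated gchar string.
 * Each gchar is cast to unsigned char first, so a byte with the 8th bit
 * set is never passed to isprint() as a negative int. */
size_t count_printable_chars(const gchar *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (isprint((unsigned char)*s))
            n++;
    }
    return n;
}
```

The cast in the second function is what you'd have to remember at every
<ctype.h> call if the data were kept in a gchar array; with guint8 the
question doesn't arise.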
gcc 4.0 has a new warning telling you that "pointer targets ... differ
in signedness" (which is not such a bad thing).
I suspect most of those warnings are for cases where you're treating
byte sequences as character strings.
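As a rough sketch of where that warning comes from: handing a
"guint8 *" to something like strlen(), which expects a "const char *",
is what gcc complains about. The helper names below are invented for
illustration, and the typedefs are stand-ins for the GLib ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the GLib typedefs, so the sketch builds without GLib. */
typedef unsigned char guint8;
typedef char gchar;

/* Hypothetical helper: length of the NUL-terminated field at the start
 * of a byte buffer, never reading past len bytes.
 * Calling strlen(data) here would draw the signedness warning; memchr()
 * takes a const void *, so no cast is needed. */
size_t byte_field_strlen(const guint8 *data, size_t len)
{
    const guint8 *nul;

    assert(data != NULL || len == 0);
    if (len == 0)
        return 0;
    nul = memchr(data, '\0', len);
    return nul != NULL ? (size_t)(nul - data) : len;
}

/* Hypothetical helper: the one place where bytes are deliberately
 * reinterpreted as a character string. The caller must guarantee that
 * the buffer is NUL-terminated. */
const gchar *bytes_as_string(const guint8 *data)
{
    return (const gchar *)data;
}
```

Keeping the cast in one named function, rather than scattering casts
over every call, at least makes it obvious where bytes are being
treated as characters.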
What I think we *really* need to do, for those cases, is have a
different way of handling strings. The current way we handle strings
doesn't take into account the fact that there are a number of different
character encodings for strings - "ASCII" (which would imply that a byte
with the 8th bit set is an error), ISO 8859/n, the various EUC
encodings, Shift-JIS, KOI8, UTF-8, UTF-16, etc.
See the first item under "Dissector infrastructure" on the Wishlist
page, http://wiki.wireshark.org/Development/Wishlist.
(That discusses two items - the dissector APIs for handling
strings, and the UI aspects of this. The former doesn't require the
latter - we can continue to display non-ASCII characters as escape
sequences - but the latter, which is something we should ultimately do,
requires some way of getting all strings from packets translated into
Unicode.)
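For the "display non-ASCII characters as escape sequences" part, a
minimal sketch might look like the following. It assumes the field is
meant to be ASCII, so any byte with the 8th bit set gets escaped; the
function name and the \xNN output format are assumptions made for this
example, not an existing Wireshark routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the GLib typedef, so the sketch builds without GLib. */
typedef unsigned char guint8;

/* Hypothetical helper: write a NUL-terminated display form of data into
 * out. Printable ASCII is copied as-is; every other byte (and the
 * backslash itself) is written as \xNN. Output is truncated at a whole
 * character or escape if out is too small. Returns the number of
 * characters written, not counting the NUL. */
size_t format_ascii_escaped(const guint8 *data, size_t len,
                            char *out, size_t out_size)
{
    size_t pos = 0;

    assert(out != NULL);
    assert(data != NULL || len == 0);
    if (out_size == 0)
        return 0;

    for (size_t i = 0; i < len; i++) {
        guint8 b = data[i];

        if (b >= 0x20 && b <= 0x7E && b != '\\') {
            if (pos + 1 >= out_size)    /* keep room for the NUL */
                break;
            out[pos++] = (char)b;
        } else {
            if (pos + 4 >= out_size)    /* 4 characters plus the NUL */
                break;
            snprintf(out + pos, out_size - pos, "\\x%02x", (unsigned)b);
            pos += 4;
        }
    }
    out[pos] = '\0';
    return pos;
}
```

Given the five bytes 'c', 'a', 'f', 0xC3, 0xA9 (UTF-8 for "café"), this
yields the 11 characters caf\xc3\xa9. A dissector API that knew the
field's encoding could instead translate those two bytes into a single
Unicode character.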
May we change these guint8* to gchar*? I mean, may we change the type
of the variables concerned rather than adding a cast to every function
call?
Which ones are you thinking of? We shouldn't globally replace guint8
with gchar, as per my comments at the beginning.