Re: [Wireshark-dev] tvb_get_string_enc() doesn't always return valid UTF-8
From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Sun, 26 Jan 2014 14:43:18 -0800
On Jan 26, 2014, at 2:32 PM, Evan Huus <eapache@xxxxxxxxx> wrote:

> OK. I just meant that since tvb_get_string() is currently ASCII, a
> dumb search and replace will let us make the API change now without
> any regressions. We can then audit calls that could be incorrect.

I apologize; I misparsed your question. I read it as "why would dumb search-and-replace of tvb_get_string with tvb_get_string_enc and ENC_ASCII be an easy way to make (part of) the API transition?", i.e. as you saying that dumb search-and-replace didn't sound like a good idea to you. What you actually meant was "so does that mean that we should start by doing a dumb search-and-replace of tvb_get_string with tvb_get_string_enc and ENC_ASCII, as an easy way to make (part of) the API transition?"

(It might've been clearer as "in which case, is dumb search and replace", so that dummies like me read "in which case" as meaning "therefore" rather than "to which case are you referring where...")

> Admittedly, it's easier to track which calls have been audited if we
> do it gradually, so that's probably a better choice anyways.

Yes. In some cases ENC_ASCII may well be appropriate, namely when the protocol spec says that the string must be ASCII (i.e., ASCII, and not ISO 8859-n, and not MacWhatever, and not DOS or Windows code page whatever, and not PickYourEUCMultiByteCodeSet, and not UTF-8...). But absent a "this really means ASCII" comment, an ENC_ASCII that came from a dumb search-and-replace is indistinguishable from an ENC_ASCII that came from looking in the protocol specification and seeing that they really mean ASCII.