Wireshark-dev: Re: [Wireshark-dev] tvb_get_string_enc() doesn't always return valid UTF-8
From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Sun, 26 Jan 2014 13:53:17 -0800
On Jan 21, 2014, at 5:01 AM, Evan Huus <eapache@xxxxxxxxx> wrote:

> On Tue, Jan 21, 2014 at 2:40 AM, Guy Harris <guy@xxxxxxxxxxxx> wrote:
>> 
>> On Jan 20, 2014, at 5:59 PM, Evan Huus <eapache@xxxxxxxxx> wrote:
>> 
>>> In which case is dumb search-and-replace of tvb_get_string with
>>> tvb_get_string_enc and ENC_ASCII an easy way to make (part of) the API
>>> transition?
>> 
>> Did somebody say that had been done or suggest that it be done?
> 
> I thought it was kind of implied when you wrote "We should also
> probably audit all calls to tvb_get_string() and tvb_get_stringz() in
> dissectors and change them to tvb_get_string_enc() with the
> appropriate encoding."

No, it isn't implied; the last four words of what you quote are "with the appropriate encoding".  A dumb search-and-replace won't do that, as ENC_ASCII isn't the appropriate encoding for all those calls; that's why I said "audit", as in "read the code and the spec and see what the right encoding is".

> If tvb_get_string() behaves identically to tvb_get_string_enc(...
> ENC_ASCII) then there doesn't seem much point in having both.

There *isn't* much point in having both.  If we "probably audit all calls to tvb_get_string() and tvb_get_stringz() in dissectors and change them to tvb_get_string_enc() with the appropriate encoding.", the next step would be to eliminate tvb_get_string() and tvb_get_stringz() and then rename tvb_get_string_enc() and tvb_get_stringz_enc() to just tvb_get_string() and tvb_get_stringz() (as "_enc" is then redundant).

Given that we already added a wmem scope argument to tvb_get_string() and tvb_get_stringz(), there's no particular reason not to make further API changes to those routines.