Wireshark-dev: Re: [Wireshark-dev] FT_STRING, FT_STRINGZPAD, and null padding
From: Guy Harris <gharris@xxxxxxxxx>
Date: Sat, 5 Sep 2020 14:03:57 -0700
On Sep 5, 2020, at 9:51 AM, John Thacker <johnthacker@xxxxxxxxx> wrote:

> On Sat, Sep 5, 2020 at 2:49 AM Guy Harris <gharris@xxxxxxxxx> wrote:
> 
>> OK, here's the types of strings that there can be in protocols.
>> 
>> 1. Counted strings.  A counted string has a count of the number of octets (it might be half the number of octets if it's UTF-16 or UCS-2, but it's really a count of octets).  There is nothing special about null characters, other than "if some software treats the string as a C string, it won't work correctly if the string happens to include a null character, so maybe that software shouldn't treat it as a C string".
>> 
>> 2. Null-terminated, non-counted strings.  A null-terminated, non-counted, string has no count of the number of octets; it runs until a null terminator (8-bit, 16-bit, whatever) is seen.  Obviously, that string can't contain a null terminator.
>> 
>> 3. Padded strings in a fixed-size buffer.  If the string is the length of the buffer, there is no null terminator; otherwise, there's a null terminator; the protocol may specify that all octets after the null terminator, if the string isn't the length of the buffer - 1, must be null, or may say there is no such requirement.
>> 
>> 4. Strings that are counted *and* null-terminated.  This is redundant, but maybe somebody wants to make sure that there's a null at the end, to make life easier for implementations that use C strings internally.  This doesn't *really* make things easier, unless you're in a 100% controlled environment, because, in any environment in which the software has to deal with packets from buggy or malicious senders, and uses C strings internally, they have to make sure there's a null terminator *anyway* - they *can't* just copy the string and blithely assume that a terminating null is part of the counted blob.
>> 
>> In the case of a counted string, if the string is a 7-octet value of "foo\0bar", it should show up as exactly that in the packet tree item; if not, we should fix it to do so.
> 
> So this does not happen, at least not when proto_tree_add_item() is used on a FT_STRING.

Then it should be fixed.

> Things that call tvb_format_text() will end up producing "foo\000bar" (because \0 is not one of the special cases handled, it defaults to using octal in format_text() in epan/strutil.c, which is used for non-printable but valid ASCII characters.

That's acceptable - if there's a control-A, it'll show up as \001, not \1.

> However, proto_tree_add_item() does not do that.
> 
> I tested this by hexediting a pcap (from here: https://gitlab.com/wireshark/wireshark/-/issues/16208) to put a NUL in the middle of a counted FT_STRING, and it looked like this:
> 
> I had applied the patch to allow SSIDs to be UTF-8 in that bug, it looks like this:
> 
> Tag: SSID parameter set: admini\000tración
>     Tag Number: SSID parameter set (0)
>     Tag length: 15
>     SSID: admini
>         [Expert Info (Warning/Undecoded): Trailing stray characters]
> 
> (admini\000tración also showed up in the FT_COLUMN). In the first entry, where \000 shows up, tvb_format_text() is used explicitly and then that text is appended. In the one where it is cut off, proto_tree_add_item() is used. So that is a bug that should be fixed, then, we want to display the trailing characters as well as warn about them?

Display them. yes.  Warn about them, possibly, although, once we display them, the warning might not be necessary.

> Currently, we use FT_STRING for the first of those, FT_STRINGZ for the second of those, FT_STRINGZPAD for the third of those, and FT_STRINGZ for the fourth of those; the difference between the two FT_STRINGZ cases is whether a length was specified or not.
> That helps, but isn't there also FT_UINT_STRING for the common case of counted strings where the count of the number of octets appears immediately before the string? It seems like mostly a helper combining "add a FT_UINT[...] and return its value, then add a FT_STRING with that length." So ideally detect_trailing_stray_characters() should also operate on the string part of FT_UINT_STRING, and that just hasn't been added, correct?

It hasn't been added yet.

A lot of this is a work in progress.  I need to do some regression testing to discover fields that have the incorrect FT_STRING... type; when that's done, I'll update epan/proto.c to fix various issues.