Wireshark-dev: Re: [Wireshark-dev] Note about proto_tree_add_unicode_string (r43379)
From: Pascal Quantin <pascal.quantin@xxxxxxxxx>
Date: Tue, 19 Jun 2012 21:14:19 +0200
On 19/06/2012 21:01, Jakub Zawadzki wrote:
> Hi,
>
> A string from tvb_get_ephemeral_string() still needs escaping with format_text(),
> because it doesn't check the encoding.
>
> When you use:
>   tvb_get_ephemeral_string_enc(tvb, offset, length, ENC_UTF_8 | ENC_NA);
>
> it guarantees that the result is encoded in UTF-8:
>  * string as converted from the appropriate encoding to UTF-8 ...
>
> (The code to do it exists only as XXX comments so far, but that is a bug in libwireshark, and no one can blame you for having used the wrong function :))
Hi,

Thanks for the hint (and for adding proto_tree_add_unicode_string :) ).
I am probably still missing something, but when looking at the code for
tvb_get_ephemeral_string_enc, I see:
    case ENC_ASCII:
    default:
        /*
         * For now, we treat bogus values as meaning
         * "ASCII" rather than reporting an error,
         * for the benefit of old dissectors written
         * when the last argument to proto_tree_add_item()
         * was a gboolean for the byte order, not an
         * encoding value, and passed non-zero values
         * other than TRUE to mean "little-endian".
         *
         * XXX - should map all octets with the 8th bit
         * not set to a "substitute" UTF-8 character.
         */
        strbuf = tvb_get_ephemeral_string(tvb, offset, length);
        break;

    case ENC_UTF_8:
        /*
         * XXX - should map all invalid UTF-8 sequences
         * to a "substitute" UTF-8 character.
         */
        strbuf = tvb_get_ephemeral_string(tvb, offset, length);
        break;

Do you mean we should start using tvb_get_ephemeral_string_enc now, so that
our code keeps working once the check for the ASCII 8th bit is in place?
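
For illustration, the 8th-bit check described in the XXX comment of the
ENC_ASCII case could look roughly like the standalone sketch below. It is
only a guess at the intended behaviour, not libwireshark code: the function
name ascii_to_utf8_lossy is hypothetical, it uses malloc() rather than
ephemeral memory, and the choice of U+FFFD as the "substitute" character is
an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libwireshark: copy `len` octets from
 * `src` into a newly malloc()ed, NUL-terminated UTF-8 string.  Octets
 * with the 8th bit set are not ASCII, so each one is replaced by
 * U+FFFD REPLACEMENT CHARACTER (EF BF BD in UTF-8).
 * Returns NULL if the allocation fails; the caller frees the result.
 */
static char *ascii_to_utf8_lossy(const unsigned char *src, size_t len)
{
    static const char replacement[] = "\xEF\xBF\xBD";
    size_t i, out = 0;
    char *dst;

    assert(src != NULL || len == 0);

    /* Worst case: every octet expands to 3 bytes, plus the final NUL. */
    dst = malloc(len * 3 + 1);
    if (dst == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        if (src[i] & 0x80) {
            /* Not ASCII: emit the substitute character instead. */
            memcpy(dst + out, replacement, 3);
            out += 3;
        } else {
            /* Plain 7-bit ASCII is already valid UTF-8. */
            dst[out++] = (char)src[i];
        }
    }
    dst[out] = '\0';
    return dst;
}
```

With such a check in place, a caller asking for ASCII would always get valid
UTF-8 back, so no extra escaping would be needed for encoding reasons. Note
that this sketch copies embedded NUL octets unchanged.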

Regards,
Pascal.