Wireshark · Wireshark-dev: Re: [Wireshark-dev] Wrongly escaped UTF-8 characters in JSON values ( epan/print.c )


From: Andrea Lo Pumo <alopumo@xxxxxxxxx>

Date: Fri, 6 Jul 2018 13:46:34 +0200

From: Dario Lombardo

What do you mean by "I do not know the Wireshark code"? What did you patch? Do you mean you don't know the submission procedure instead?

I mean that I do not know the full implications of changing the code as I did. It worked for me because I am only interested in gsm_sms.sms_text. However, before this patch is accepted, someone with a better understanding of the Wireshark code should check whether it is OK.

What did you patch?

print_escaped_bare() of epan/print.c

2018-07-05 16:01 GMT+02:00 Andrea Lo Pumo <alopumo@xxxxxxxxx>:

I am using "tshark -T json -V -r file.pcap" and specifically I am looking for the gsm_sms.sms_text field.
I get this output:

"gsm_sms.sms_text": "Ok per\u00c3\u00b2 non piove"

Instead, using "tshark -V -r file.pcap" I get:

SMS text: Ok però non piove

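(There is an accent on the "o" of "però".)

The problem is that the \uXXXX escape syntax denotes a UTF-16 code unit (see [1]), while the "ò" in the field is encoded in UTF-8 as the two bytes c3 b2. Wireshark escapes c3 and b2 separately, as if each byte were a UTF-16 code unit, so a JSON parser reads them as U+00C3 and U+00B2 ("Ã²") instead of U+00F2 ("ò").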
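To make the failure concrete, here is a minimal standalone sketch of the byte-wise escaping described above. It is not the Wireshark code: `escape_bytewise` is a hypothetical helper that uses only the C standard library instead of GLib, and it ignores quotes and backslashes, which a real JSON writer must also escape.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not Wireshark code: escapes every byte outside
 * printable ASCII as its own \u00XX sequence, which is the behaviour
 * reported for the JSON output above. */
static void escape_bytewise(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return;
    out[0] = '\0';

    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        char tmp[8];
        size_t len;

        if (*p >= 0x20 && *p < 0x7f)
            snprintf(tmp, sizeof(tmp), "%c", *p);        /* printable ASCII */
        else
            snprintf(tmp, sizeof(tmp), "\\u00%02x", *p); /* one escape per byte */

        len = strlen(tmp);
        if (used + len + 1 > out_size)
            break;                                       /* stop before overflowing */
        memcpy(out + used, tmp, len + 1);
        used += len;
    }
}
```

Feeding it the UTF-8 bytes of "però" (`"per\xc3\xb2"`) yields `per\u00c3\u00b2`, matching the tshark output quoted above rather than the single escape `\u00f2` that a JSON parser would need.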

I solved the problem by changing print_escaped_bare() in epan/print.c as follows:
replace

        default:
            if (g_ascii_isprint(*p))
                fputc(*p, fh);
            else {
                g_snprintf(temp_str, sizeof(temp_str), "\\u00%02x", (guint8)*p);
                fputs(temp_str, fh);
            }

with

        default:
            fputc(*p, fh);

I do not know the Wireshark code, so I am not submitting a patch. This, however, should work because JSON supports UTF-8 (see again [1]).
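One implication worth checking is that passing every byte through unchanged would also emit raw control characters (below 0x20), which JSON does not allow inside strings. The sketch below shows a possible middle ground, assuming the input is already valid UTF-8: it escapes only control characters and writes everything else as-is. `write_json_text` is a hypothetical name, and quotes and backslashes are assumed to be handled elsewhere, as they presumably are by the other switch cases in print_escaped_bare().

```c
#include <stdio.h>

/* Hypothetical helper, not Wireshark code: writes s to fh, escaping only
 * ASCII control characters. Bytes >= 0x80 are passed through unchanged,
 * on the assumption that s is valid UTF-8. Quotes and backslashes are
 * NOT handled here. */
static void write_json_text(FILE *fh, const char *s)
{
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if (*p < 0x20)
            fprintf(fh, "\\u%04x", (unsigned)*p); /* control character */
        else
            fputc(*p, fh);                        /* ASCII or UTF-8 byte */
    }
}
```

With this variant the two bytes c3 b2 reach the output untouched, so a UTF-8 aware consumer sees "ò", while a stray 0x01 byte would still be written as `\u0001`.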

[1] From the JSON page on Wikipedia: JSON exchange in an open ecosystem must be encoded in UTF-8. However, if escaped, those characters must be written using UTF-16 surrogate pairs, a detail missed by some JSON parsers.
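If escaping is preferred over writing raw UTF-8, the rule quoted in [1] could be followed by first decoding each UTF-8 sequence to a code point and then escaping that code point. The following is a rough sketch under stated assumptions, not the Wireshark implementation: `utf8_decode` and `json_escape_codepoint` are hypothetical helpers, and the decoder does not reject overlong encodings or encoded surrogates.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: decodes one UTF-8 sequence starting at s into *cp.
 * Returns the number of bytes consumed, or 0 on malformed input.
 * Simplified: overlong forms and encoded surrogates are not rejected. */
static int utf8_decode(const unsigned char *s, uint32_t *cp)
{
    int n;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xe0) == 0xc0)      { *cp = s[0] & 0x1f; n = 2; }
    else if ((s[0] & 0xf0) == 0xe0) { *cp = s[0] & 0x0f; n = 3; }
    else if ((s[0] & 0xf8) == 0xf0) { *cp = s[0] & 0x07; n = 4; }
    else return 0;

    for (int i = 1; i < n; i++) {
        if ((s[i] & 0xc0) != 0x80)  /* also stops at a premature NUL */
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3f);
    }
    return n;
}

/* Hypothetical helper: writes the JSON escape for cp into out.
 * Code points above U+FFFF become a UTF-16 surrogate pair.
 * out should hold at least 13 bytes. Returns snprintf's result. */
static int json_escape_codepoint(uint32_t cp, char *out, size_t out_size)
{
    if (cp < 0x10000)
        return snprintf(out, out_size, "\\u%04x", (unsigned)cp);

    cp -= 0x10000;
    return snprintf(out, out_size, "\\u%04x\\u%04x",
                    (unsigned)(0xd800 + (cp >> 10)),    /* high surrogate */
                    (unsigned)(0xdc00 + (cp & 0x3ff))); /* low surrogate */
}
```

Under this scheme the bytes c3 b2 decode to U+00F2 and are written as `\u00f2`, and a four-byte sequence such as f0 9f 98 80 (U+1F600) is written as the pair `\ud83d\ude00`.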
