Wireshark-bugs: [Wireshark-bugs] [Bug 10445] New: UTF-8 characters end up escaped in psml output
Date: Thu, 04 Sep 2014 19:57:19 +0000
Bug ID 10445
Summary UTF-8 characters end up escaped in psml output
Product Wireshark
Version 1.12.0
Hardware All
OS All
Status UNCONFIRMED
Severity Normal
Priority Low
Component TShark
Assignee bugzilla-admin@wireshark.org
Reporter joe@qacafe.com

Build Information:
TShark 1.12.0 (Git Rev Unknown from unknown)

Copyright 1998-2014 Gerald Combs <gerald@wireshark.org> and contributors.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Compiled (64-bit) with GLib 2.26.1, with libpcap, with libz 1.2.8, without POSIX
capabilities, without libnl, without SMI, with c-ares 1.7.0, with Lua 5.1,
without Python, with GnuTLS 2.8.5, with Gcrypt 1.4.5, with MIT Kerberos,
without GeoIP.

Running on Linux 2.6.32-358.el6.x86_64, with locale en_US.UTF-8, with libpcap
version 1.4.0, with libz 1.2.8.
Intel(R) Core(TM) i5-4430 CPU @ 3.00GHz

Built using gcc 4.4.7 20120313 (Red Hat 4.4.7-4).

--
Some protocol dissectors now use non-ASCII UTF-8 characters. Notably, TCP, UDP,
and SCTP use the UTF-8 sequence \xe2\x86\x92 (U+2192, a right arrow) instead of
the ASCII-friendly " > ".
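For reference, those three bytes form a standard three-byte UTF-8 sequence: the
lead byte (1110xxxx) contributes 4 bits and each continuation byte (10xxxxxx)
contributes 6, so the code point is (0x2 << 12) | (0x06 << 6) | 0x12 = 0x2192.
The C sketch below does that decoding. decode_utf8_3byte() is a hypothetical
helper for illustration, not a Wireshark function, and it skips the overlong and
surrogate checks a real decoder needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one three-byte UTF-8 sequence
 * (1110xxxx 10xxxxxx 10xxxxxx) into a Unicode code point.
 * Returns 0 if the bytes do not have the three-byte shape.
 * Does NOT reject overlong or surrogate encodings. */
static uint32_t decode_utf8_3byte(const unsigned char b[3])
{
    assert(b != NULL);
    if ((b[0] & 0xF0) != 0xE0 || (b[1] & 0xC0) != 0x80 || (b[2] & 0xC0) != 0x80)
        return 0;
    return ((uint32_t)(b[0] & 0x0F) << 12) |  /* 4 payload bits from lead byte */
           ((uint32_t)(b[1] & 0x3F) << 6)  |  /* 6 bits from 1st continuation */
            (uint32_t)(b[2] & 0x3F);          /* 6 bits from 2nd continuation */
}
```

Applied to {0xE2, 0x86, 0x92} it yields 0x2192; applied to the ASCII " > " it
returns 0, since 0x20 is not a three-byte lead byte.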

Unfortunately, the PSML and PDML output code calls print_escaped_xml(), which
escapes each byte of these UTF-8 characters. UTF-8 is the default encoding for
XML, so these characters don't really need to be escaped.
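As a rough sketch of the alternative, assuming the output is UTF-8 (XML's default
when no encoding is declared), an escaper only has to replace the XML-special
characters and can copy bytes >= 0x80 straight through. escape_xml_keep_utf8()
below is hypothetical and is not the actual print_escaped_xml() code; writing
into a caller-supplied buffer rather than a FILE* is only to keep it easy to
check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not Wireshark's print_escaped_xml): escape only what
 * XML requires and copy every other byte, including the bytes of multi-byte
 * UTF-8 sequences, unchanged. Assumes `src` is valid UTF-8.
 * Writes a NUL-terminated result into `dst` (capacity `dst_size`) and
 * returns the number of bytes written, or -1 if `dst` is too small. */
static int escape_xml_keep_utf8(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;
    assert(src != NULL && dst != NULL && dst_size > 0);

    for (const unsigned char *p = (const unsigned char *)src; *p != '\0'; p++) {
        char tmp[8];
        const char *out;
        switch (*p) {
        case '&':  out = "&amp;";  break;
        case '<':  out = "&lt;";   break;
        case '>':  out = "&gt;";   break;
        case '"':  out = "&quot;"; break;
        case '\'': out = "&apos;"; break;
        default:
            if (*p < 0x20 && *p != '\t' && *p != '\n' && *p != '\r') {
                /* XML 1.0 forbids these control characters outright,
                 * so keep a visible \xNN escape for them. */
                snprintf(tmp, sizeof tmp, "\\x%02x", (unsigned)*p);
            } else {
                /* ASCII text and UTF-8 lead/continuation bytes pass through. */
                tmp[0] = (char)*p;
                tmp[1] = '\0';
            }
            out = tmp;
            break;
        }
        size_t len = strlen(out);
        if (n + len >= dst_size)
            return -1;  /* no room for this piece plus the terminating NUL */
        memcpy(dst + n, out, len);
        n += len;
    }
    dst[n] = '\0';
    return (int)n;
}
```

With this approach the TCP arrow would reach the PSML as the character itself
rather than as \xe2\x86\x92. The sketch does no UTF-8 validation, so invalid
input bytes would still make the resulting document not well-formed.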

The escaped UTF-8 characters end up in the PSML as literal \xNN text, where they
are not very useful.

See the last <section> below.

<packet>
<section>31843</section>
<section>568.363627</section>
<section>193.37.150.253</section>
<section>1.2.3.4</section>
<section>TCP</section>
<section>60</section>
<section>80\xe2\x86\x924267 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0</section>
</packet>
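Until the output changes, a consumer could undo the escaping after parsing the
section text. The sketch below assumes the escapes always take the form \x plus
two hex digits, as in the sample above. unescape_hex_bytes() is a hypothetical
helper, and because a field could legitimately contain the text "\x41", it is a
heuristic rather than an exact inverse.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return the value of one hex digit, or -1 if `c` is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical post-processing helper: rewrite every literal "\xNN" in `s`
 * back into the single byte 0xNN, in place. Other text is left untouched.
 * The write cursor never passes the read cursor, so in-place is safe. */
static void unescape_hex_bytes(char *s)
{
    assert(s != NULL);
    unsigned char *w = (unsigned char *)s;
    for (const char *r = s; *r != '\0'; ) {
        if (r[0] == '\\' && r[1] == 'x' &&
            hexval((unsigned char)r[2]) >= 0 && hexval((unsigned char)r[3]) >= 0) {
            *w++ = (unsigned char)(hexval((unsigned char)r[2]) * 16 +
                                   hexval((unsigned char)r[3]));
            r += 4;  /* consume the whole "\xNN" */
        } else {
            *w++ = (unsigned char)*r++;
        }
    }
    *w = '\0';
}
```

Run on the Info text above, it turns 80\xe2\x86\x924267 back into "80", the
three arrow bytes, and "4267"; an incomplete escape such as a trailing "\x4" is
left as-is.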
