Wireshark-bugs: [Wireshark-bugs] [Bug 8318] New: UTF-16 surrogate pair characters in JSON string
Date: Tue, 12 Feb 2013 05:06:52 +0000
Bug ID 8318
Summary UTF-16 surrogate pair characters in JSON strings trigger DISSECTOR_ASSERT
Classification Unclassified
Product Wireshark
Version SVN
Hardware All
OS All
Status UNCONFIRMED
Severity Minor
Priority Low
Component Wireshark
Assignee bugzilla-admin@wireshark.org
Reporter jyoung@gsu.edu

Build Information:
Version 1.9.0-SVN-47637 (SVN Rev 47637 from /trunk)

Copyright 1998-2013 Gerald Combs <gerald@wireshark.org> and contributors.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Compiled (64-bit) with GTK+ 2.24.10, with Cairo 1.8.6, with Pango 1.30.0, with
GLib 2.32.3, with libpcap, with libz 1.2.3, without POSIX capabilities, without
libnl, with SMI 0.4.8, without c-ares, without ADNS, with Lua 5.1, without
Python, with GnuTLS 2.12.19, with Gcrypt 1.5.0, with MIT Kerberos, with GeoIP,
with PortAudio V19-devel (built Aug 12 2012 22:27:54), with AirPcap.

Running on Mac OS X 10.8.2, build 12C2034 (Darwin 12.2.1), with locale .UTF-8,
with libpcap version 1.1.1, with libz 1.2.5, GnuTLS 2.12.19, Gcrypt 1.5.0,
without AirPcap.
      Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz

Built using gcc 4.2.1 (Apple Inc. build 5666) (dot 3).

Wireshark is Open Source Software released under the GNU General Public
License.

Check the man page and http://www.wireshark.org for more information.
--
A DISSECTOR_ASSERT() is triggered in proto.c's proto_tree_add_unicode_string()
when it is handed a UTF-8 string that packet-json.c's json_string_unescape()
generated from an escaped UTF-16 surrogate character pair.

JSON (RFC 4627) [1] allows arbitrary plane 0 Unicode characters to be encoded
using JavaScript-style six-character "\uXXXX" sequences, where XXXX is a
four-digit hexadecimal value that represents the UTF-16 code unit for a
Unicode character.

Characters from Unicode planes 1-16 are represented by UTF-16 surrogate
character pairs.  The UTF-16 surrogate character pairs are encoded as
twelve-character sequences, i.e. "\uXXXX\uXXXX".
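
To make the twelve-character form concrete, here is a minimal sketch of how a
plane 1-16 code point is split into its two surrogates and written as a JSON
escape.  The helper names codepoint_to_surrogates() and
json_escape_supplementary() are hypothetical and are not taken from the
Wireshark source; the arithmetic is the standard UTF-16 encoding rule.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not a Wireshark function): split a code point in
 * U+10000..U+10FFFF into its UTF-16 high and low surrogates. */
static void codepoint_to_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                              /* 20 significant bits remain */
    *high = (uint16_t)(0xD800 + (cp >> 10));    /* top 10 bits    -> D800..DBFF */
    *low  = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* bottom 10 bits -> DC00..DFFF */
}

/* Hypothetical helper: write the twelve-character JSON escape for cp into
 * buf, which must hold at least 13 bytes (12 characters plus the NUL). */
static void json_escape_supplementary(uint32_t cp, char buf[13])
{
    uint16_t high, low;

    codepoint_to_surrogates(cp, &high, &low);
    snprintf(buf, 13, "\\u%04X\\u%04X", (unsigned)high, (unsigned)low);
}
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF, the example used in RFC 4627)
splits into the surrogates D834 and DD1E, so its escaped form is
"\uD834\uDD1E".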

json_string_unescape() currently fails to determine whether a decoded "\uXXXX"
sequence is one half of a UTF-16 surrogate pair.  When the first (high)
surrogate is encountered, it must be combined with the following (low)
surrogate to produce the final Unicode character.
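
The combining step could be sketched roughly as follows.  This is not the
code in packet-json.c and is not a proposed patch; parse_hex4(),
decode_u_escape() and utf8_encode() are hypothetical names used only to
illustrate the check, and rejecting unpaired surrogates (rather than, say,
substituting U+FFFD) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parse exactly four hex digits; -1 if malformed. */
static int parse_hex4(const char *s, uint32_t *out)
{
    uint32_t v = 0;

    for (int i = 0; i < 4; i++) {
        char c = s[i];
        v <<= 4;
        if (c >= '0' && c <= '9')      v |= (uint32_t)(c - '0');
        else if (c >= 'a' && c <= 'f') v |= (uint32_t)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') v |= (uint32_t)(c - 'A' + 10);
        else return -1;                /* also stops at an early NUL */
    }
    *out = v;
    return 0;
}

/* Hypothetical helper: decode the escape starting at s (which points at the
 * backslash).  Stores the code point in *cp and returns the number of input
 * characters consumed: 6 for a plane 0 character, 12 for a surrogate pair,
 * or 0 for malformed input or an unpaired surrogate. */
static size_t decode_u_escape(const char *s, uint32_t *cp)
{
    uint32_t hi, lo;

    assert(s != NULL && cp != NULL);
    if (s[0] != '\\' || s[1] != 'u' || parse_hex4(s + 2, &hi) != 0)
        return 0;
    if (hi >= 0xDC00 && hi <= 0xDFFF)
        return 0;                      /* low surrogate with no high before it */
    if (hi < 0xD800 || hi > 0xDBFF) {
        *cp = hi;                      /* ordinary plane 0 character */
        return 6;
    }
    /* High surrogate: a second escape in DC00..DFFF must follow. */
    if (s[6] != '\\' || s[7] != 'u' || parse_hex4(s + 8, &lo) != 0)
        return 0;
    if (lo < 0xDC00 || lo > 0xDFFF)
        return 0;
    *cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    return 12;
}

/* Hypothetical helper: encode cp (at most U+10FFFF, not a surrogate) as
 * UTF-8 into out; returns the number of bytes written (1 to 4). */
static size_t utf8_encode(uint32_t cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}
```

With this approach "\uD834\uDD1E" decodes to the single code point U+1D11E,
which is emitted as the four UTF-8 bytes F0 9D 84 9E.  Encoding each
surrogate separately would instead yield two three-byte sequences, which are
not valid UTF-8.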

[1] http://www.ietf.org/rfc/rfc4627.txt

