Wireshark-bugs: [Wireshark-bugs] [Bug 8318] New: UTF-16 surrogate pair characters in JSON string
Bug ID | 8318
Summary | UTF-16 surrogate pair characters in JSON strings trigger DISSECTOR_ASSERT
Classification | Unclassified
Product | Wireshark
Version | SVN
Hardware | All
OS | All
Status | UNCONFIRMED
Severity | Minor
Priority | Low
Component | Wireshark
Assignee | bugzilla-admin@wireshark.org
Reporter | jyoung@gsu.edu
Build Information:
Version 1.9.0-SVN-47637 (SVN Rev 47637 from /trunk)
Copyright 1998-2013 Gerald Combs <gerald@wireshark.org> and contributors.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Compiled (64-bit) with GTK+ 2.24.10, with Cairo 1.8.6, with Pango 1.30.0, with
GLib 2.32.3, with libpcap, with libz 1.2.3, without POSIX capabilities, without
libnl, with SMI 0.4.8, without c-ares, without ADNS, with Lua 5.1, without
Python, with GnuTLS 2.12.19, with Gcrypt 1.5.0, with MIT Kerberos, with GeoIP,
with PortAudio V19-devel (built Aug 12 2012 22:27:54), with AirPcap.
Running on Mac OS X 10.8.2, build 12C2034 (Darwin 12.2.1), with locale .UTF-8,
with libpcap version 1.1.1, with libz 1.2.5, GnuTLS 2.12.19, Gcrypt 1.5.0,
without AirPcap.
Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
Built using gcc 4.2.1 (Apple Inc. build 5666) (dot 3).
Wireshark is Open Source Software released under the GNU General Public
License.
Check the man page and http://www.wireshark.org for more information.
--
A DISSECTOR_ASSERT() is triggered in proto.c's proto_tree_add_unicode_string()
when the UTF-8 string generated by packet-json.c's json_string_unescape()
comes from input containing an escaped UTF-16 surrogate character pair.
JSON (RFC 4627) [1] allows arbitrary plane 0 Unicode characters to be encoded
using JavaScript-style six-character "\uXXXX" sequences, where XXXX is a
four-digit hexadecimal value that represents the UTF-16 value for a Unicode
character.
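As an illustration of the "\uXXXX" form described above, here is a minimal
sketch in C that parses one six-character escape into a 16-bit UTF-16 code
unit. The helper names hex_digit_value() and parse_u_escape() are hypothetical
and are not taken from packet-json.c; the actual dissector code may be
structured quite differently.

```c
#include <stdint.h>

/* Hypothetical helper (not from packet-json.c):
 * value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: parse a six-character "\uXXXX" escape at s into a
 * 16-bit UTF-16 code unit. Returns 1 on success, 0 if s does not start
 * with a well-formed escape. */
static int parse_u_escape(const char *s, uint16_t *unit)
{
    uint16_t value = 0;
    int i;

    if (s[0] != '\\' || s[1] != 'u')
        return 0;
    for (i = 2; i < 6; i++) {
        int d = hex_digit_value(s[i]);
        if (d < 0)
            return 0;   /* a premature NUL also ends up here */
        value = (uint16_t)((value << 4) | d);
    }
    *unit = value;
    return 1;
}
```

Note that the result is a UTF-16 code unit, not necessarily a complete
character: a value in the range 0xD800-0xDFFF is only one half of a surrogate
pair.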
Characters from Unicode planes 1-16 are represented by UTF-16 surrogate
character pairs. The UTF-16 surrogate character pairs are encoded as
twelve-character sequences i.e. "\uXXXX\uXXXX".
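To make the twelve-character form concrete, the following sketch splits a code
point from planes 1-16 into its UTF-16 surrogate pair and formats it as a JSON
escape. It uses the standard UTF-16 arithmetic; the function names
to_surrogate_pair() and format_json_escape() are made up for this example and
do not exist in Wireshark.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: split a code point in U+10000..U+10FFFF into a
 * UTF-16 surrogate pair. Returns 1 on success, 0 if cp is out of range. */
static int to_surrogate_pair(uint32_t cp, uint16_t *high, uint16_t *low)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;                              /* 20 bits remain */
    *high = (uint16_t)(0xD800 + (cp >> 10));    /* upper 10 bits  */
    *low  = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* lower 10 bits  */
    return 1;
}

/* Hypothetical helper: write the twelve-character "\uXXXX\uXXXX" escape
 * for cp into buf, which must hold at least 13 bytes (12 + NUL).
 * Returns 1 on success, 0 otherwise. */
static int format_json_escape(uint32_t cp, char *buf, size_t buflen)
{
    uint16_t high, low;

    if (buflen < 13 || !to_surrogate_pair(cp, &high, &low))
        return 0;
    snprintf(buf, buflen, "\\u%04X\\u%04X", (unsigned)high, (unsigned)low);
    return 1;
}
```

For example, U+1F600 yields the high surrogate 0xD83D and the low surrogate
0xDE00, so its JSON escape is "\uD83D\uDE00".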
json_string_unescape() currently fails to determine whether a decoded "\uXXXX"
sequence is one half of a Unicode surrogate pair. When the first (high)
surrogate is encountered, it must be combined with the following (low)
surrogate to produce the final Unicode character.
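The combining step could look roughly like the sketch below. This is not a
patch against packet-json.c; the names combine_surrogates(), encode_utf8() and
utf16_units_to_utf8() are hypothetical, and rejecting unpaired surrogates with
-1 is an assumption of this example rather than a statement of how the
dissector should report malformed input.

```c
#include <stddef.h>
#include <stdint.h>

#define IS_HIGH_SURROGATE(u) ((u) >= 0xD800 && (u) <= 0xDBFF)
#define IS_LOW_SURROGATE(u)  ((u) >= 0xDC00 && (u) <= 0xDFFF)

/* Combine a high/low surrogate pair into one code point (U+10000..U+10FFFF). */
static uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    return 0x10000 + (((uint32_t)high - 0xD800) << 10)
                   + ((uint32_t)low - 0xDC00);
}

/* Encode a valid, non-surrogate code point as UTF-8 into out (>= 4 bytes).
 * Returns the number of bytes written. */
static size_t encode_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Convert n decoded "\uXXXX" code units to UTF-8, pairing surrogates.
 * Returns the number of bytes written, or -1 on an unpaired surrogate
 * or when fewer than 4 bytes of output space remain. */
static int utf16_units_to_utf8(const uint16_t *units, size_t n,
                               unsigned char *out, size_t outlen)
{
    size_t i = 0, o = 0;

    while (i < n) {
        uint32_t cp = units[i++];

        if (IS_HIGH_SURROGATE(cp)) {
            if (i >= n || !IS_LOW_SURROGATE(units[i]))
                return -1;  /* high surrogate without its partner */
            cp = combine_surrogates((uint16_t)cp, units[i++]);
        } else if (IS_LOW_SURROGATE(cp)) {
            return -1;      /* low surrogate with no preceding high */
        }
        if (outlen - o < 4)
            return -1;      /* conservative: room for the worst case */
        o += encode_utf8(cp, out + o);
    }
    return (int)o;
}
```

With this approach the units 0xD83D 0xDE00 become the single code point
U+1F600 and the four UTF-8 bytes F0 9F 98 80, instead of each surrogate being
converted on its own.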
[1] http://www.ietf.org/rfc/rfc4627.txt