Wireshark · Wireshark-bugs: [Wireshark-bugs] [Bug 8318] New: UTF-16 surrogate pair characters in JSON strings trigger DISSECTOR

Date: Tue, 12 Feb 2013 05:06:52 +0000

Bug ID	8318
Summary	UTF-16 surrogate pair characters in JSON strings trigger DISSECTOR_ASSERT
Classification	Unclassified
Product	Wireshark
Version	SVN
Hardware	All
OS	All
Status	UNCONFIRMED
Severity	Minor
Priority	Low
Component	Wireshark
Assignee	bugzilla-admin@wireshark.org
Reporter	jyoung@gsu.edu

Build Information:
Version 1.9.0-SVN-47637 (SVN Rev 47637 from /trunk)

Copyright 1998-2013 Gerald Combs <gerald@wireshark.org> and contributors.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Compiled (64-bit) with GTK+ 2.24.10, with Cairo 1.8.6, with Pango 1.30.0, with
GLib 2.32.3, with libpcap, with libz 1.2.3, without POSIX capabilities, without
libnl, with SMI 0.4.8, without c-ares, without ADNS, with Lua 5.1, without
Python, with GnuTLS 2.12.19, with Gcrypt 1.5.0, with MIT Kerberos, with GeoIP,
with PortAudio V19-devel (built Aug 12 2012 22:27:54), with AirPcap.

Running on Mac OS X 10.8.2, build 12C2034 (Darwin 12.2.1), with locale .UTF-8,
with libpcap version 1.1.1, with libz 1.2.5, GnuTLS 2.12.19, Gcrypt 1.5.0,
without AirPcap.
      Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz

Built using gcc 4.2.1 (Apple Inc. build 5666) (dot 3).

Wireshark is Open Source Software released under the GNU General Public
License.

Check the man page and http://www.wireshark.org for more information.
--
A DISSECTOR_ASSERT() is triggered in proto.c's proto_tree_add_unicode_string()
when it is passed a UTF-8 string that packet-json.c's json_string_unescape()
generated from input containing an escaped UTF-16 surrogate character pair.

JSON (RFC 4627) [1] allows arbitrary plane 0 Unicode characters to be encoded
using JavaScript-style six-character "\uXXXX" sequences, where XXXX is a
four-digit hexadecimal value that represents the UTF-16 code unit for a
Unicode character.

Characters from Unicode planes 1-16 are represented by UTF-16 surrogate
character pairs.  The UTF-16 surrogate character pairs are encoded as
twelve-character sequences i.e. "\uXXXX\uXXXX".
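
As an illustration of the encoding described above, here is a minimal C sketch
that writes the JSON escape for a single code point. The helper name
json_escape_code_point() is hypothetical and does not exist in Wireshark; it
only shows how the two surrogate values are derived.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not a Wireshark function): write the JSON "\uXXXX"
 * escape for code point cp into out. Code points above U+FFFF are written
 * as a surrogate pair, "\uXXXX\uXXXX". Returns the number of characters
 * written (6 or 12), or -1 if cp is not a valid Unicode scalar value or
 * the buffer is too small. */
static int json_escape_code_point(uint32_t cp, char *out, size_t out_len)
{
    /* Surrogate values and anything above U+10FFFF are not characters. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;

    if (cp <= 0xFFFF) {
        /* Plane 0: a single six-character escape. */
        if (out_len < 7)
            return -1;
        return snprintf(out, out_len, "\\u%04X", (unsigned)cp);
    }

    /* Planes 1-16: split the 20-bit offset into two 10-bit halves. */
    if (out_len < 13)
        return -1;
    uint32_t offset = cp - 0x10000;
    unsigned high = 0xD800 + (unsigned)(offset >> 10);   /* first surrogate  */
    unsigned low  = 0xDC00 + (unsigned)(offset & 0x3FF); /* second surrogate */
    return snprintf(out, out_len, "\\u%04X\\u%04X", high, low);
}
```

With this sketch, U+1F600 is written as "\uD83D\uDE00" and U+00E9 as "\u00E9".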

json_string_unescape() currently does not check whether a decoded "\uXXXX"
sequence is one half of a Unicode surrogate character pair.  When the first
(high) surrogate is encountered, it must be combined with the following (low)
surrogate to produce the final Unicode character.
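
The combining step could look roughly like the following C sketch. This is
not the code in packet-json.c and not a proposed patch; the function names
parse_hex4(), decode_u_escape() and utf8_encode() are hypothetical, and
rejecting unpaired surrogates (rather than, say, substituting U+FFFD) is an
assumption made here for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse exactly four hex digits at s; return their value, or -1 if any
 * character (including a terminating NUL) is not a hex digit. */
static int parse_hex4(const char *s)
{
    int value = 0;
    for (int i = 0; i < 4; i++) {
        char c = s[i];
        int digit;
        if (c >= '0' && c <= '9')      digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return -1;
        value = (value << 4) | digit;
    }
    return value;
}

/* Decode one "\uXXXX" escape, or one "\uXXXX\uXXXX" surrogate pair, from
 * the NUL-terminated string s. On success, store the code point in *cp and
 * return the number of input characters consumed (6 or 12). Return 0 for
 * malformed input or an unpaired surrogate. */
static size_t decode_u_escape(const char *s, uint32_t *cp)
{
    if (s[0] != '\\' || s[1] != 'u')
        return 0;
    int first = parse_hex4(s + 2);
    if (first < 0)
        return 0;
    if (first >= 0xDC00 && first <= 0xDFFF)
        return 0;                       /* low surrogate with no high one */
    if (first < 0xD800 || first > 0xDBFF) {
        *cp = (uint32_t)first;          /* ordinary plane 0 character */
        return 6;
    }
    /* High surrogate: a "\uXXXX" low surrogate must follow immediately. */
    if (s[6] != '\\' || s[7] != 'u')
        return 0;
    int second = parse_hex4(s + 8);
    if (second < 0xDC00 || second > 0xDFFF)
        return 0;
    *cp = 0x10000 + (((uint32_t)first - 0xD800) << 10)
                  + ((uint32_t)second - 0xDC00);
    return 12;
}

/* Encode a valid Unicode scalar value as UTF-8 into out (room for at
 * least 4 bytes); return the number of bytes written. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, "\uD83D\uDE00" decodes to U+1F600, which utf8_encode() writes as
the four bytes F0 9F 98 80. Encoding each surrogate separately as a
three-byte sequence instead yields invalid UTF-8.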

[1] http://www.ietf.org/rfc/rfc4627.txt
