Wireshark-bugs: [Wireshark-bugs] [Bug 8318] New: UTF-16 surrogate pair characters in JSON string
      
      
    
    
        
| Bug ID | 8318 |
| Summary | UTF-16 surrogate pair characters in JSON strings trigger DISSECTOR_ASSERT |
| Classification | Unclassified |
| Product | Wireshark |
| Version | SVN |
| Hardware | All |
| OS | All |
| Status | UNCONFIRMED |
| Severity | Minor |
| Priority | Low |
| Component | Wireshark |
| Assignee | bugzilla-admin@wireshark.org |
| Reporter | jyoung@gsu.edu |
      
        
        Build Information:
Version 1.9.0-SVN-47637 (SVN Rev 47637 from /trunk)
Copyright 1998-2013 Gerald Combs <gerald@wireshark.org> and contributors.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Compiled (64-bit) with GTK+ 2.24.10, with Cairo 1.8.6, with Pango 1.30.0, with
GLib 2.32.3, with libpcap, with libz 1.2.3, without POSIX capabilities, without
libnl, with SMI 0.4.8, without c-ares, without ADNS, with Lua 5.1, without
Python, with GnuTLS 2.12.19, with Gcrypt 1.5.0, with MIT Kerberos, with GeoIP,
with PortAudio V19-devel (built Aug 12 2012 22:27:54), with AirPcap.
Running on Mac OS X 10.8.2, build 12C2034 (Darwin 12.2.1), with locale .UTF-8,
with libpcap version 1.1.1, with libz 1.2.5, GnuTLS 2.12.19, Gcrypt 1.5.0,
without AirPcap.
      Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
Built using gcc 4.2.1 (Apple Inc. build 5666) (dot 3).
Wireshark is Open Source Software released under the GNU General Public
License.
Check the man page and http://www.wireshark.org for more information.
--
A DISSECTOR_ASSERT() is triggered in proto.c's proto_tree_add_unicode_string()
when it is passed a UTF-8 string that packet-json.c's json_string_unescape()
generated from an escaped UTF-16 surrogate pair.
JSON (RFC 4627) [1] allows any plane 0 Unicode character to be encoded as a
JavaScript-style six-character "\uXXXX" sequence, where XXXX is a four-digit
hexadecimal number giving the UTF-16 value of the character.
Characters from Unicode planes 1-16 are represented by UTF-16 surrogate
pairs.  Each surrogate pair is encoded as a twelve-character sequence,
i.e. "\uXXXX\uXXXX".
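As an illustration of the two escape forms, here is a minimal sketch that produces the JSON escape for a given code point. The helper name json_escape_code_point() is hypothetical and is not part of Wireshark or any JSON library; it only demonstrates the standard UTF-16 surrogate arithmetic.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not a Wireshark function): write the JSON "\uXXXX"
 * escape for code point cp into buf.  Returns the number of characters
 * written (excluding the terminating NUL), or -1 if cp is not a valid
 * Unicode scalar value or buf is too small. */
static int json_escape_code_point(uint32_t cp, char *buf, size_t buflen)
{
    /* Surrogate code points and values above U+10FFFF are not characters. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;

    if (cp < 0x10000) {
        /* Plane 0: a single six-character escape. */
        if (buflen < 7)
            return -1;
        return snprintf(buf, buflen, "\\u%04X", (unsigned)cp);
    }

    /* Planes 1-16: split the 20-bit offset into a high and a low surrogate,
     * giving a twelve-character escape. */
    if (buflen < 13)
        return -1;
    uint32_t offset = cp - 0x10000;
    unsigned high = 0xD800 + (unsigned)(offset >> 10);
    unsigned low  = 0xDC00 + (unsigned)(offset & 0x3FF);
    return snprintf(buf, buflen, "\\u%04X\\u%04X", high, low);
}
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) is in plane 1 and comes out as "\uD834\uDD1E", while U+00E9 is in plane 0 and comes out as "\u00E9".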
json_string_unescape() currently does not check whether a decoded "\uXXXX"
sequence is one half of a UTF-16 surrogate pair.  When the first (high)
surrogate is encountered, it must be combined with the following (low)
surrogate to produce the final Unicode character.
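The combining step described above could look roughly like the following. This is a sketch only, not the actual Wireshark fix: all names are hypothetical, it works on already-parsed 16-bit values rather than on the raw JSON text, and replacing an unpaired surrogate with U+FFFD is an assumption about the desired behaviour.

```c
#include <stddef.h>
#include <stdint.h>

#define IS_HIGH_SURROGATE(u) ((u) >= 0xD800 && (u) <= 0xDBFF)
#define IS_LOW_SURROGATE(u)  ((u) >= 0xDC00 && (u) <= 0xDFFF)

/* Combine a high/low surrogate pair into a code point in planes 1-16. */
static uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    return 0x10000 + (((uint32_t)high - 0xD800) << 10)
                   + ((uint32_t)low - 0xDC00);
}

/* Encode cp as UTF-8 into out (room for 4 bytes).  Returns the number of
 * bytes written, or 0 if cp is a surrogate or above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Convert n decoded "\uXXXX" values to UTF-8.  out needs room for at least
 * 3 * n bytes.  Assumption: an unpaired surrogate becomes U+FFFD. */
static size_t utf16_units_to_utf8(const uint16_t *units, size_t n,
                                  unsigned char *out)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = units[i];
        if (IS_HIGH_SURROGATE(cp) && i + 1 < n &&
            IS_LOW_SURROGATE(units[i + 1])) {
            cp = combine_surrogates(units[i], units[i + 1]);
            i++; /* the low surrogate has been consumed as well */
        } else if (IS_HIGH_SURROGATE(cp) || IS_LOW_SURROGATE(cp)) {
            cp = 0xFFFD; /* REPLACEMENT CHARACTER */
        }
        len += utf8_encode(cp, out + len);
    }
    return len;
}
```

With this approach the pair 0xD834, 0xDD1E yields the single four-byte UTF-8 sequence F0 9D 84 9E for U+1D11E, rather than two separately encoded surrogate values.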
[1] http://www.ietf.org/rfc/rfc4627.txt
         
      
      