Wireshark-bugs: [Wireshark-bugs] [Bug 6613] "matches" operator fails to match hex
https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=6613
Tony Trinh <tony19@xxxxxxxxx> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tony19@xxxxxxxxx
--- Comment #3 from Tony Trinh <tony19@xxxxxxxxx> 2011-11-24 15:36:48 EST ---
(In reply to comment #2)
That's a great point. G_REGEX_RAW does indeed disable metacharacter matching
for Unicode characters (but literal, i.e. absolute, values are still matched).
That is, G_REGEX_RAW allows these display filters to "work":
udp && data matches "\x1F\xFF\xCA\xFE"
udp && data matches "Ä ä Ü ü ß"
...but not these (when evaluating UCPs):
udp && data matches "\\w+" # double backslash is necessary
udp && data matches "\\w{7}$"
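The difference between the two groups of filters can be illustrated by analogy. The sketch below uses Python's re module, not GLib's GRegex, so it only approximates the behaviour described above: a bytes pattern stands in for G_REGEX_RAW, and a str pattern stands in for leaving the flag unset. The sample word "spät" is an assumption chosen for illustration.

```python
import re

# "spät" encoded as UTF-8: the ä becomes the two bytes c3 a4.
payload = "spät".encode("utf-8")

# Byte-oriented matching (loosely analogous to G_REGEX_RAW):
# literal byte values match, even sequences that are not valid UTF-8.
raw_literal = re.search(rb"\xc3\xa4", payload) is not None
raw_binary = re.search(rb"\x1f\xff\xca\xfe", b"\x00\x1f\xff\xca\xfe\x00") is not None

# In byte mode \w only covers ASCII word characters, so the ä bytes
# prevent the whole word from matching.
raw_word = re.fullmatch(rb"\w+", payload) is not None

# Character-oriented matching (loosely analogous to G_REGEX_RAW unset):
# \w is Unicode-aware, so the decoded word matches in full.
text_word = re.fullmatch(r"\w+", payload.decode("utf-8")) is not None

print(raw_literal, raw_binary, raw_word, text_word)
```

Note that the byte sequence 1f ff ca fe cannot be decoded as UTF-8 at all, which is why it can only be matched in the byte-oriented mode.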
For instance, I would use the dfilter below to match the following packet. It
searches for UDP packets whose data ends with at least 10 word characters
(UCPs included). The packet would show up in the results only if G_REGEX_RAW
were not set.
Display Filter:
udp && data matches "\\w{10}$"
UDP Packet:
0000 00 00 00 00 00 00 00 00 00 00 00 00 08 00 45 00 ..............E.
0010 00 2d 98 19 00 00 40 11 00 00 c0 a8 01 82 01 01 .-....@.........
0020 01 01 d0 f6 04 57 00 19 c4 56 77 69 65 20 73 70 .....W...Vwie sp
0030 c3 a4 74 20 69 73 74 20 65 73 3f ..t ist es?
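To make the payload easier to read, here is a small sketch that extracts the UDP data from the dump above and decodes it as UTF-8. It assumes an Ethernet II frame carrying IPv4 with no VLAN tag, which is what the bytes above appear to be; the variable names are only illustrative.

```python
# Hex bytes copied from the dump above (offsets and ASCII column removed).
frame = bytes.fromhex(
    "00 00 00 00 00 00 00 00 00 00 00 00 08 00 45 00 "
    "00 2d 98 19 00 00 40 11 00 00 c0 a8 01 82 01 01 "
    "01 01 d0 f6 04 57 00 19 c4 56 77 69 65 20 73 70 "
    "c3 a4 74 20 69 73 74 20 65 73 3f"
)

ETH_HEADER_LEN = 14                      # Ethernet II header
ip_header_len = (frame[ETH_HEADER_LEN] & 0x0F) * 4  # IHL field, in bytes
UDP_HEADER_LEN = 8

udp_offset = ETH_HEADER_LEN + ip_header_len
# UDP length field (header + data) sits 4 bytes into the UDP header.
udp_len = int.from_bytes(frame[udp_offset + 4:udp_offset + 6], "big")

payload = frame[udp_offset + UDP_HEADER_LEN:udp_offset + udp_len]
text = payload.decode("utf-8")

print(len(payload), repr(text))
```

The ä in the decoded text is the two-byte sequence c3 a4 at offset 0x30 of the dump.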
There seems to be more than one use case for the "matches" operator:
* Match UTF8-encoded strings as absolute values
* Match UTF8-encoded strings as regex metacharacter patterns
* Match raw byte sequences
I see all three cases as being very useful. I can think of 3 possible
solutions:
1) Set G_REGEX_RAW only if the pattern contains two-letter hex. This
assumes that all non-two-letter hex can be matched without G_REGEX_RAW.
2) Add a flag-character to the beginning of the pattern, not passed to
GRegex (e.g., matches "@\xFF"). If the pattern actually needed the literal '@',
it would have to be escaped with '\'. The flag tells Wireshark to set
G_REGEX_RAW.
3) Add a new operator (e.g., "rmatches") that always sets G_REGEX_RAW.
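As a rough illustration of option 2, the flag handling could look something like the sketch below. It is written in Python rather than Wireshark's C, and the function name split_raw_flag and its return shape are hypothetical; only the '@' flag and the '\' escape come from the proposal above.

```python
from typing import Tuple


def split_raw_flag(pattern: str, flag: str = "@") -> Tuple[bool, str]:
    """Return (use_raw, pattern_for_regex) for a user-supplied pattern.

    Hypothetical helper: a leading flag character requests raw matching
    and is stripped; a leading backslash-escaped flag is a literal flag
    character and does not request raw matching.
    """
    if pattern.startswith("\\" + flag):
        # "\@abc" -> literal "@abc", raw mode not requested.
        return False, pattern[1:]
    if pattern.startswith(flag):
        # "@\xFF" -> "\xFF", raw mode requested.
        return True, pattern[len(flag):]
    return False, pattern


print(split_raw_flag("@\\xFF"))
print(split_raw_flag("\\@home"))
print(split_raw_flag("\\w+"))
```

The caller would then set G_REGEX_RAW only when the first element of the result is true.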
I like #1 because it's transparent to the user, and it's the easiest of the
three to implement. The caveat with #1 is that patterns can't contain a mixture
of metacharacter patterns and two-letter hex, but this never worked to begin
with.
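A minimal sketch of the check that option 1 implies, again in Python rather than Wireshark's C. It assumes the check runs on the pattern text as the user typed it, before any escape processing; the names HEX_ESCAPE and needs_raw are hypothetical.

```python
import re

# A "\xHH" escape that is not itself preceded by an escaping backslash.
# The lookbehind plus the optional backslash pairs skip cases such as
# "\\x41", where the backslash is a literal and "x41" is plain text.
HEX_ESCAPE = re.compile(r"(?<!\\)(?:\\\\)*\\x[0-9A-Fa-f]{2}")


def needs_raw(pattern: str) -> bool:
    """Return True if the pattern contains a two-letter hex escape."""
    return HEX_ESCAPE.search(pattern) is not None


print(needs_raw(r"\x1F\xFF\xCA\xFE"))  # hex only
print(needs_raw(r"\w{10}$"))           # metacharacters only
print(needs_raw(r"\w+\xFF"))           # mixed: still selects raw mode
```

The last call shows the caveat: a mixed pattern selects raw mode, so its metacharacter part would not be Unicode-aware.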