On Sun, Mar 21, 2004 at 06:12:33PM -0800, Richard Sharpe wrote:
> It seems that there are going to be a number of cases where Ethereal does
> not handle internationalization. For example, if someone has an SMB
> capture that contains file names that are in a non-ASCII character set,
> they might have difficulty entering text strings to perform functions
> like:
>
> smb.file contains "some non-ASCII string"

There are two issues here.

The first issue is character sets in fields - not all string fields are
in the same character set, and even if they are, they might use
different encodings (UTF-8 vs. some 2-byte encoding, for example).

The second issue is user input - is the value of a text entry field ISO
8859-x, or UTF-8, or....?
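
To make the two issues concrete, here is a small Python sketch (Python
only for brevity; none of this is Ethereal code). The file name, the
helper names, and the choice of encodings are all made up for
illustration. It shows that a byte-wise "contains" only works when the
field and the typed text happen to share an encoding, while comparing
after decoding both sides works regardless.

```python
# Hypothetical file name containing U+00E9 (e with acute accent) twice.
NAME = "r\u00e9sum\u00e9.txt"
NEEDLE = "r\u00e9sum\u00e9"


def bytes_contain(field_bytes, typed_bytes):
    """Naive match: compare raw bytes, ignoring encodings entirely."""
    return typed_bytes in field_bytes


def text_contains(field_bytes, field_encoding, typed_bytes, input_encoding):
    """Decode both sides to text first, then compare characters."""
    field_text = field_bytes.decode(field_encoding)
    typed_text = typed_bytes.decode(input_encoding)
    return typed_text in field_text


# Issue 1: the same name as it might appear in two different fields.
field_utf16 = NAME.encode("utf-16-le")     # a 2-byte encoding
field_latin1 = NAME.encode("iso-8859-1")   # a 1-byte encoding

# Issue 2: the same typed text as two different widgets might deliver it.
typed_utf8 = NEEDLE.encode("utf-8")
typed_latin1 = NEEDLE.encode("iso-8859-1")

if __name__ == "__main__":
    print(bytes_contain(field_latin1, typed_latin1))  # encodings agree
    print(bytes_contain(field_latin1, typed_utf8))    # encodings differ
    print(bytes_contain(field_utf16, typed_utf8))     # encodings differ
    print(text_contains(field_utf16, "utf-16-le", typed_utf8, "utf-8"))
```

The point of text_contains() is that it needs to be told both
encodings; that is exactly the information we don't currently carry
around.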

The first issue causes problems even when you *aren't* filtering fields,
so we need to somehow handle it. I suspect the right answer is to have
part of the value of a string field be the character set and encoding of
the field; we could canonicalize into some standard encoding, e.g.
UTF-8, but if we're just building a protocol tree to do filtering, and
the filter doesn't involve a particular field, canonicalizing the
field's value is a waste of time (then again, so is storing the value at
all...).
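
A rough sketch of what "the encoding is part of the value" could look
like, again in Python rather than C and with entirely hypothetical names
(StringFieldValue, text(), contains(), decode_count are not part of
Ethereal). The raw bytes and the encoding are stored as-is, and the
canonical form is produced only the first time something asks for it,
so fields that no filter touches are never converted.

```python
class StringFieldValue:
    """Hypothetical string field value: raw bytes plus their encoding."""

    def __init__(self, raw, encoding):
        self.raw = raw              # bytes exactly as they appeared in the packet
        self.encoding = encoding    # e.g. "utf-16-le" or "iso-8859-1"
        self._text = None           # canonical form, filled in on demand
        self.decode_count = 0       # only here to show how often we convert

    def text(self):
        """Return the canonical (decoded) value, converting at most once."""
        if self._text is None:
            # Undecodable bytes become U+FFFD instead of raising an error.
            self._text = self.raw.decode(self.encoding, errors="replace")
            self.decode_count += 1
        return self._text

    def contains(self, needle):
        """Filter-style substring test against the canonical form."""
        return needle in self.text()


if __name__ == "__main__":
    name = StringFieldValue("r\u00e9sum\u00e9.txt".encode("utf-16-le"), "utf-16-le")
    print(name.decode_count)                      # nothing converted yet
    print(name.contains("r\u00e9sum\u00e9"))
    print(name.contains("txt"))
    print(name.decode_count)                      # converted only once
```

This doesn't address the "why store the value at all" point; it only
defers the conversion cost until a filter or the display actually needs
the string.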

The second issue might be soluble fairly straightforwardly if the text
entry field is UTF-8 (which it might be in GTK2) *and* if the GUI
includes input methods to let you enter arbitrary characters (e.g., the
Character Palette in Mac OS X).