Ethereal-dev: Re: [Ethereal-dev] UCP protocols dissector (sms)

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Mon, 15 Dec 2003 01:32:30 -0800
On Thu, Dec 11, 2003 at 11:57:38AM +0300, Taras wrote:
> Is it possible to add support for IRA encoded strings

By "IRA" do you mean the ISO 646 character set?

If so, then would that require a run-time configuration option to
specify which national variant of ISO 646 is being used?

> to ./epan/ftypes/ftype-string.c file?

It's probably not the right thing to do.  ftype-string.c is for the
FT_STRING types, and there shouldn't be different string types for
different character sets.  For one thing, there are fields where the
character set can't be determined at compile time, just as there are
fields where the byte order can't be determined at compile time.  For
another thing, there are a *lot* of character sets that Ethereal should,
eventually, be capable of handling, and it'd probably be best if the
strings handled in epan/ftypes/ftype-string.c were in some standardized
character set and encoding.

In the short term, at least for those national variants that map to ISO
8859/1, the way to handle ISO 646 would be to map it to ISO 8859/1 and
pass the mapped string as the value to "proto_tree_add_string()".
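
As a rough illustration of that mapping, here is a minimal sketch for one
national variant, the German one (DIN 66003), whose replacement characters
all exist in ISO 8859/1.  The function names are hypothetical, not existing
Ethereal routines, and a real dissector would presumably select the table
for whichever variant is in use:

```c
#include <stddef.h>

/* Hypothetical helper (not an Ethereal API): map one character of the
 * German ISO 646 variant (DIN 66003) to its ISO 8859/1 code. */
static unsigned char
iso646_de_to_latin1_char(unsigned char c)
{
    switch (c & 0x7F) {         /* ISO 646 is a 7-bit code */
    case 0x40: return 0xA7;     /* section sign */
    case 0x5B: return 0xC4;     /* A with diaeresis */
    case 0x5C: return 0xD6;     /* O with diaeresis */
    case 0x5D: return 0xDC;     /* U with diaeresis */
    case 0x7B: return 0xE4;     /* a with diaeresis */
    case 0x7C: return 0xF6;     /* o with diaeresis */
    case 0x7D: return 0xFC;     /* u with diaeresis */
    case 0x7E: return 0xDF;     /* sharp s */
    default:   return c & 0x7F; /* same code as in ASCII */
    }
}

/* Map "len" bytes from "in" into "out" and NUL-terminate the result.
 * "out" must have room for len + 1 bytes. */
static void
iso646_de_to_latin1(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i;

    for (i = 0; i < len; i++)
        out[i] = iso646_de_to_latin1_char(in[i]);
    out[len] = '\0';
}
```

The NUL-terminated buffer filled in by the second routine is what the
dissector would then hand to "proto_tree_add_string()" as the string value.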

That's not the ideal answer outside of the Americas and Western Europe
if there are ISO 646 variants that require characters not in 8859/1.  If
that's the case, in the medium term, it might be useful to, at least
temporarily, have some option for Ethereal to specify which 8859/x
variant is being used.

That's not the ideal answer outside of locales using only single-byte
character sets, however.  In the long term, Ethereal should probably
have the string types have values in some ISO 10646 encoding (UTF-8,
etc.), and, for cases where the packet data isn't UTF-8, have the
dissector translate from the character set in the packet (some Windows
code page, some Mac character set, some EUC character set, some other
"extended ASCII" character set, EBCDIC, UTF-8-encoded ISO 10646, 16-bit
Unicode, etc.) into that 10646 encoding.
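
To give an idea of what such a translation step looks like, here is a
minimal sketch for the simplest case, ISO 8859/1 to UTF-8.  The function
name is hypothetical, and other source character sets would need their own
tables or a conversion library rather than this bit-shuffling:

```c
#include <stddef.h>

/* Hypothetical helper (not an Ethereal API): convert "len" bytes of
 * ISO 8859/1 text to NUL-terminated UTF-8.  Each input byte becomes at
 * most two output bytes, so "out" must have room for 2 * len + 1 bytes.
 * Returns the number of bytes written, not counting the NUL. */
static size_t
latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* 0x00-0x7F: encoded as a single byte, unchanged */
            out[n++] = c;
        } else {
            /* 0x80-0xFF: two bytes, 110000xx 10xxxxxx */
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

This works for 8859/1 only because its code values are the same as the
first 256 ISO 10646 code points; that's not true of the other character
sets listed above.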