|From: Guy Harris
|
|
|On Wed, May 26, 2004 at 12:23:59AM +0200, Olivier Biot wrote:
|> I spotted another issue: the HTML files are based on the generated
|> short AUTHORS file; it gets incorrectly incorporated into the HTML
|> documentation. This is probably also the case for the UNIX man pages.
|>
|> Do for example a grep on 'mayer' in order to understand what I mean.
|> You'll see 2 unexpected characters instead of a
|> "lowercase-O-with-German-Umlaut".
|
|Presumably you mean "ISO 8859-1 lowercase-O-with-German-Umlaut" (or, to
|use the term in the ISO 8859-1 files I have, "LATIN SMALL LETTER O WITH
|DIAERESIS", perhaps because there are languages other than German that
|use that glyph - BTW, what umlauts are there *other* than the German
|one? :-), the code for which might mean something other than
|"lowercase-O-with-umlaut" in other character encodings, e.g. other ISO
|8859 encodings (although those might all have lowercase-O-with-diaeresis
|there - 8859-2 appears to have it there, for example), various two-byte
|Asian character sets (in which case it might be, for example, the first
|byte of a two-byte character), or UTF-8.
This is correct, although I would expect *NIX man pages to be rendered in
the OS character set.
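To illustrate the "2 unexpected characters" symptom, here is a minimal Python sketch. The name used is a made-up example, not an entry from the AUTHORS file. It assumes the text is stored as UTF-8 but read as ISO 8859-1.

```python
# Sketch: a UTF-8 "o with diaeresis" read as ISO 8859-1 becomes two characters.
name = "M\u00f6ller"  # hypothetical name containing U+00F6

utf8_bytes = name.encode("utf-8")         # U+00F6 becomes the two bytes C3 B6
latin1_bytes = name.encode("iso-8859-1")  # U+00F6 becomes the single byte F6

# Decoding the UTF-8 bytes with the wrong charset turns C3 B6 into
# U+00C3 and U+00B6, i.e. two unrelated characters.
misread = utf8_bytes.decode("iso-8859-1")

print(len(utf8_bytes), len(latin1_bytes))
print(ascii(misread))
```

The byte counts differ by one, and the misread string is one character longer than the original name.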
|So how *should* we deal with the UTF-8 in the AUTHORS file when copying
|the authors to the man pages?
I think we'll require another tool for generating man pages (and HTML
documentation?): recode.
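As a rough idea of what such a recoding step would have to do, here is a Python sketch (not the recode tool itself). The function name is hypothetical, and replacing unmappable characters with "?" is an assumption; another policy might suit us better.

```python
# Sketch: convert UTF-8 input to ISO 8859-1 before it goes into a man page.
def recode_utf8_to_latin1(data: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as ISO 8859-1.

    Characters with no ISO 8859-1 equivalent are replaced by b"?"
    (an assumed policy for this sketch).
    """
    text = data.decode("utf-8")
    return text.encode("iso-8859-1", errors="replace")


if __name__ == "__main__":
    sample = "J\u00f6rg".encode("utf-8")  # hypothetical AUTHORS fragment
    print(recode_utf8_to_latin1(sample))
```

This only helps if the man page reader really expects ISO 8859-1, which is the open question above.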
Regarding the HTML issue, I tried adding an <?xml version="1.0"
encoding="UTF-8"> declaration to the HTML output, but that didn't solve
it :(.
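One possible alternative for the HTML side, sketched in Python: emit non-ASCII characters as numeric character references, so the page no longer depends on the declared encoding. The function name is hypothetical, and this assumes we can post-process the text before it is written into the HTML.

```python
# Sketch: make text safe for HTML regardless of the page's charset.
def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a numeric character
    reference such as &#246;. ASCII characters pass through unchanged,
    so markup characters like < and & are NOT escaped here."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_html("J\u00f6rg"))
```

I have not tried this against our actual HTML generation, so take it as an idea only.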
Regards,
Olivier