[Wireshark-bugs] [Bug 12763] tshark print 'TCP flags' with non-ascci chars on so
Comment #9 on bug 12763 from Gerald Combs
(In reply to Francois-Xavier Le Bail from comment #8)
> $ ./tshark -r pkt-1.pcap | iconv -f UTF-8 -t ASCII//TRANSLIT
> 1 0.000000000 192.139.46.66 iconv: illegal input sequence at position 30
> $ ./tshark -V -r pkt-1.pcap | iconv -f UTF-8 -t ASCII//TRANSLIT
> [TCP Flags: iconv: illegal input sequence at position 3090
> with LANG=C and iconv, it's better for the arrow, not for the dot.
> with LANG=C.UTF-8 and iconv, errors.
> See the changes when filter with grep.
> Why not just '->' and '.' for tshark ?
TShark doesn't generate the arrows and dots; the TCP dissector and other
dissectors do. Text from dissectors is printed to stdout by TShark and rendered
in a window by Wireshark. For the purposes of this bug, dissector output can be
split into four categories:
- Plain ASCII
- Conservative, non-ASCII UTF-8 sequences which are present in the dissector
sources
- Other valid, non-ASCII UTF-8 sequences which dissectors might read from the
wire
- Invalid UTF-8 sequences which buggy dissectors might generate
The following conservative sequences are used in the dissector source code:
- Middle dot: ·
- Degree sign: °
- Mu: µ
- Arrows: → ← ↔︎
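The four categories can be told apart mechanically. Below is a minimal Python sketch, assuming the output arrives as raw bytes. `classify` and `CONSERVATIVE` are hypothetical names used only for this illustration, and the set simply mirrors the characters listed above rather than anything extracted from the Wireshark sources. Strict UTF-8 decoding catches the fourth category; the remaining non-ASCII characters are then checked against the conservative set.

```python
# Hypothetical set mirroring the conservative characters listed above.
CONSERVATIVE = {
    "\u00b7",  # · middle dot
    "\u00b0",  # ° degree sign
    "\u00b5",  # µ micro sign
    "\u2192",  # → rightwards arrow
    "\u2190",  # ← leftwards arrow
    "\u2194",  # ↔ left right arrow
    "\ufe0e",  # variation selector-15, which may follow ↔ (text presentation)
}


def classify(data: bytes) -> str:
    """Return 'ascii', 'conservative', 'other-utf8', or 'invalid-utf8'."""
    try:
        # Strict decoding raises on malformed sequences (the fourth category).
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid-utf8"
    non_ascii = {ch for ch in text if ord(ch) > 0x7F}
    if not non_ascii:
        return "ascii"
    if non_ascii <= CONSERVATIVE:  # every non-ASCII char is in the known set
        return "conservative"
    return "other-utf8"
```

For example, `classify("[TCP Flags: ·······AP···]".encode())` returns `"conservative"`, while a truncated two-byte sequence such as `b"\xc3\x28"` returns `"invalid-utf8"`.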
Note that these are all part of code page 437, which shipped with the original
IBM PC. I'm pretty sure some or all of them were available in the VT3xx or
VT2xx series of terminals, and they're all present in Windows Glyph List 4. I
think it's reasonable to expect a general-purpose computer in 2016 to be able
to render them. Either way, you still have to deal with valid and sometimes
invalid sequences from the wire.
For environments that don't support UTF-8, there are a number of utilities that
will transliterate UTF-8 to ASCII, including iconv, uconv, and recode.
I'm not sure why iconv isn't working for you, but it transliterates "→" to "->"
and "·" to "." here on both macOS and Linux. I ran a few experiments with
recode and uconv, and recode also worked well:
tshark ... | recode -f utf8..ascii
"uconv -x 'Any-Latin;Latin-ASCII'" translated the dots but passed the arrows
through unchanged here.
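Where none of those utilities is available, the same kind of transliteration can be sketched in a few lines of Python. This is a hypothetical filter, not part of TShark or Wireshark. It decodes with `errors="replace"` so invalid UTF-8 becomes "?" instead of stopping the conversion. It maps "→" to "->" and "·" to ".", matching the iconv and recode results above, and strips accents from other characters. The stand-ins for ←, ↔, ° and µ are assumptions chosen for this sketch.

```python
import unicodedata
from typing import Iterable, TextIO

# Assumed ASCII stand-ins. "->" and "." match what iconv/recode produced in the
# experiments above; the other replacements are choices made for this sketch.
FALLBACK = str.maketrans({
    "\u2192": "->",   # → rightwards arrow
    "\u2190": "<-",   # ← leftwards arrow
    "\u2194": "<->",  # ↔ left right arrow
    "\ufe0e": "",     # variation selector-15 (text presentation), dropped
    "\u00b7": ".",    # · middle dot
    "\u00b0": "deg",  # ° degree sign
    "\u00b5": "u",    # µ micro sign, as in "us" for microseconds
})


def to_ascii(data: bytes) -> str:
    """Transliterate one chunk of dissector output to plain ASCII."""
    # Malformed UTF-8 becomes U+FFFD instead of aborting the whole conversion.
    text = data.decode("utf-8", errors="replace")
    # Map the known characters first; NFKD would otherwise turn µ into Greek mu.
    text = text.translate(FALLBACK)
    out = []
    for ch in unicodedata.normalize("NFKD", text):
        if ord(ch) < 0x80:
            out.append(ch)       # plain ASCII passes through
        elif unicodedata.combining(ch):
            continue             # drop accents split off by NFKD (é -> e)
        else:
            out.append("?")      # anything else, including U+FFFD
    return "".join(out)


def filter_stream(src: Iterable[bytes], dst: TextIO) -> None:
    """Copy lines from a binary stream to a text stream, transliterating each."""
    for raw_line in src:
        dst.write(to_ascii(raw_line))
```

To use it as a pipe filter, save it as a script, add `import sys` and a `__main__` guard that calls `filter_stream(sys.stdin.buffer, sys.stdout)`, then run something like `tshark -V -r pkt-1.pcap | python3 script.py`. Reading `sys.stdin.buffer` rather than `sys.stdin` hands `to_ascii` the raw bytes, so malformed input cannot raise during decoding before it gets there.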