Wireshark-dev: Re: [Wireshark-dev] TPG tool files
From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Thu, 13 Sep 2012 10:58:10 -0700
On Sep 13, 2012, at 10:35 AM, Jeff Morriss <jeff.morriss.ws@xxxxxxxxx> wrote:

> Gisle Vanem wrote:
>> The files in tools/tpg seems to have DOS/Win line-endings (CR-LF). This doesn't work well with the Cygwin compiled perl I have here. Could you pass them through dos2unix before adding to svn?
> 
> Those files all have svn:eol-style set to 'native' (like most of the rest of Wireshark) which means they should have, well, native line endings.  I forget exactly what that means in the Cygwin world though...

The problem is that "in the Cygwin world", you're running Windows, and, in Windows, native line endings are CR-LF, but:

	http://cygwin.com/faq-nochunks.html#faq.api.cr-lf


"5.3.  How is the DOS/Unix CR/LF thing handled?

Let's start with some background.

On POSIX systems, a file is a file and what the file contains is whatever the program/programmer/user told it to put into it. In Windows, a file is also a file and what the file contains depends not only on the program/programmer/user but also the file processing mode.

When processing in text mode, certain values of data are treated specially. A \n (new line, NL) written to the file will prepend a \r (carriage return, CR) so that if you printf("Hello\n") you in fact get "Hello\r\n". Upon reading this combination, the \r is removed and the number of bytes returned by the read is 1 less than was actually read. This tends to confuse programs dependent on ftell() and fseek(). A Ctrl-Z encountered while reading a file sets the End Of File flags even though it truly isn't the end of file.

One of Cygwin's goals is to make it possible to mix Cygwin-ported POSIX programs with generic Windows programs. As a result, Cygwin allows to open files in text mode. In the accompanying tools, tools that deal with binaries (e.g. objdump) operate in POSIX binary mode and many (but not all) tools that deal with text files (e.g. bash) operate in text mode. There are also some text tools which operate in a mixed mode. They read files always in text mode, but write files in binary mode, or they write in the mode (text or binary) which is specified by the underlying mount point. For a description of mount points, see the Cygwin User's Guide.

Actually there's no really good reason to do text mode processing since it only slows down reading and writing files. Additionally many Windows applications can deal with POSIX \n line endings just fine (unfortunate exception: Notepad). So we suggest to use binary mode as much as possible and only convert files from or to DOS text mode using tools specifically created to do that job, for instance, d2u and u2d from the cygutils package.

It is rather easy for the porter of a Unix package to fix the source code by supplying the appropriate file processing mode switches to the open/fopen functions. Treat all text files as text and treat all binary files as binary. To be specific, you can select binary mode by adding O_BINARY to the second argument of an open call, or "b" to second argument of an fopen call. You can also call setmode (fd, O_BINARY). To select text mode add O_TEXT to the second argument of an open call, or "t" to second argument of an fopen call, or just call setmode (fd, O_TEXT).

You can also avoid to change the source code at all by linking an additional object file to your executable. Cygwin provides various object files in the /usr/lib directory which, when linked to an executable, changes the default open modes of any file opened within the executed process itself. The files are

  binmode.o      - Open all files in binary mode.
  textmode.o     - Open all files in text mode.
  textreadmode.o - Open all files opened for reading in text mode.
  automode.o     - Open all files opened for reading in text mode,
                   all files opened for writing in binary mode.

Note: Linking against these object files does not change the open mode of files propagated to a process by its parent process, for instance, if the process is part of a shell pipe expression.

Note that of the above flags only the "b" fopen flags are defined by ANSI. They exist under most flavors of Unix. However, using O_BINARY, O_TEXT, or the "t" flag is non-portable."

so a given Cygwin ported-from-UNIX program might or might not handle CR-LF line endings.
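
As an illustration of the text-mode translation the FAQ describes, here is a minimal sketch in Python (chosen only for illustration; it is not what tools/tpg or Cygwin use). It assumes Python 3's open() with newline="\r\n" as a stand-in for a Windows text-mode write, and the helper names (write_text_mode, read_binary_mode, read_text_mode, demo) are hypothetical:

```python
import os
import tempfile


def write_text_mode(path, text):
    # newline="\r\n" imitates a Windows text-mode write on any platform:
    # every "\n" written is stored on disk as "\r\n".
    with open(path, "w", newline="\r\n") as f:
        f.write(text)


def read_binary_mode(path):
    # Binary mode returns the bytes exactly as stored, CRs included.
    with open(path, "rb") as f:
        return f.read()


def read_text_mode(path):
    # newline=None enables universal newlines: "\r\n" comes back as "\n",
    # so the caller sees one character fewer than there are bytes on disk.
    with open(path, "r", newline=None) as f:
        return f.read()


def demo():
    # Write one line in "text mode", then read it back both ways.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_text_mode(path, "Hello\n")
        return read_binary_mode(path), read_text_mode(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    raw, cooked = demo()
    print(raw)                     # b'Hello\r\n'
    print(len(raw), len(cooked))   # 7 6
```

A reader that opens the file in binary mode sees the trailing \r on every line, which is the situation a binary-mode program is in when handed a file with CR-LF line endings.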

I'm not sure what the right fix for this is.  Googling for

	cygwin perl script line endings

found a bunch of stuff about problems with Perl scripts processing text files, and some stuff about problems with *bash* handling shell scripts with CR-LF line endings, but nothing specifically about Perl itself handling Perl scripts with CR-LF line endings.
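
For reference, the dos2unix-style conversion requested at the top of the thread amounts to replacing each CR-LF pair with a bare LF. The following is a rough Python sketch of that idea, not the actual dos2unix or d2u implementation; the function names (has_crlf, crlf_to_lf, convert_file) are hypothetical, and it assumes the file fits in memory:

```python
def has_crlf(data: bytes) -> bool:
    # True if the data contains at least one CR-LF pair.
    return b"\r\n" in data


def crlf_to_lf(data: bytes) -> bytes:
    # Replace every CR-LF pair with LF; lone CRs are left untouched.
    return data.replace(b"\r\n", b"\n")


def convert_file(path) -> bool:
    # Read and write in binary mode so no further newline translation
    # happens. Returns True if the file was rewritten.
    with open(path, "rb") as f:
        data = f.read()
    if not has_crlf(data):
        return False
    with open(path, "wb") as f:
        f.write(crlf_to_lf(data))
    return True
```

Note that if svn:eol-style stays set to 'native', a fresh checkout on Windows would presumably bring the CR-LF endings back, so a conversion like this only addresses the working copy.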