Wireshark-dev: Re: [Wireshark-dev] Adding pcap-ng pipe support to dumpcap
From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Fri, 1 Sep 2017 14:03:33 -0700
On Aug 31, 2017, at 10:13 PM, Anthony Coddington <anthony.coddington@xxxxxxxxxx> wrote:

> Wtap filetypes could indicate that they do/do not support sequential only access with a flag in dump_open_table. There is already a flag for writing_must_seek. I'm sure at some point I saw some support in wtap internals for trying to read from pipes/stdin without random seeking support?

Libwiretap, in principle, requires seeking, because it determines the file type by trying each file type in turn: for each one, it seeks back to the beginning of the file and calls that type's "open" routine, which attempts to read the file as a file of that type.  It keeps going until some file type claims the file, an I/O error occurs, or it runs out of file types.
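To make the probing loop concrete, here's a minimal sketch in C.  It is not libwiretap's code: mem_file, probe_result, identify_file and the two made-up "TYPA"/"TYPB" magic-number formats are hypothetical stand-ins.  The interesting part is the rewind to offset 0 before each open routine runs, which is the seek that a pipe can't do directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tri-state result for a per-format "open" routine. */
typedef enum { PROBE_ERROR = -1, PROBE_NOT_MINE = 0, PROBE_MINE = 1 } probe_result;

/* In-memory stand-in for a seekable file. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;
} mem_file;

/* Read up to n bytes at the current position; returns bytes copied. */
static size_t mem_read(mem_file *f, void *buf, size_t n) {
    assert(f->pos <= f->len);
    size_t avail = f->len - f->pos;
    if (n > avail)
        n = avail;
    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return n;
}

typedef probe_result (*open_routine)(mem_file *f);

/* Claim the file if its first four bytes match 'magic'. */
static probe_result check_magic(mem_file *f, const char *magic) {
    unsigned char head[4];
    if (mem_read(f, head, sizeof head) != sizeof head)
        return PROBE_NOT_MINE;   /* too short to be this type */
    return memcmp(head, magic, sizeof head) == 0 ? PROBE_MINE : PROBE_NOT_MINE;
}

static probe_result open_type_a(mem_file *f) { return check_magic(f, "TYPA"); }
static probe_result open_type_b(mem_file *f) { return check_magic(f, "TYPB"); }

/* Try each routine in turn, rewinding to offset 0 before every attempt.
 * Returns the index of the routine that claimed the file, -1 if none did,
 * or -2 if a routine reported an error. */
static int identify_file(mem_file *f, const open_routine *routines, size_t n) {
    for (size_t i = 0; i < n; i++) {
        f->pos = 0;   /* the "seek back to the beginning" step */
        probe_result r = routines[i](f);
        if (r == PROBE_MINE)
            return (int)i;
        if (r == PROBE_ERROR)
            return -2;
    }
    return -1;
}
```

With a real file handle, `f->pos = 0` would be an lseek() (or its buffered equivalent), which is exactly what fails on a pipe.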

In practice, the low-level code for reading from files has a buffer and implements seeking within the buffer by just adjusting an internal pointer.  So, *if* the first buffer read contains enough data for every file type's open routine to decide whether the file is or isn't of that type, we can determine the file type without ever changing the file descriptor's seek position, which allows it to work when reading from a pipe.

For files with a magic number at the beginning, that could work: the buffer size is 4K (unless the OS supports the st_blksize member of the stat structure and, for a given file/file system, the returned "recommended" block size isn't 4K, e.g. 8K on an 8K/1K BSD FFS file system), and for most (if not all) file types with magic numbers, the magic number is within the first 4K bytes of the file, so the tests just keep reading out of the buffer.
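A minimal sketch of the "seek within the buffer" trick, assuming a single initial read() into a fixed 4K buffer; head_buffer, hb_init, hb_seek and hb_read are hypothetical names, not the file_wrappers.c API.  Seeking back to offset 0 only moves a cursor, so it works on a pipe, but any offset past the buffered data can't be reached without a real lseek().  Note that a single read() on a pipe may return fewer bytes than requested, which is why a short first read matters.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_SIZE 4096   /* assumed default; real code might size this from st_blksize */

typedef struct {
    unsigned char buf[BUF_SIZE];
    size_t filled;   /* bytes obtained by the initial read() */
    size_t pos;      /* logical position; stream offset 0 is buf[0] */
} head_buffer;

/* Do one read() into the buffer, as the first buffered read would. */
static int hb_init(head_buffer *hb, int fd) {
    ssize_t n = read(fd, hb->buf, sizeof hb->buf);
    if (n < 0)
        return -1;
    hb->filled = (size_t)n;
    hb->pos = 0;
    return 0;
}

/* "Seek" by moving the cursor.  Only offsets inside the buffered data are
 * reachable; anything else would need lseek(), which fails on a pipe. */
static bool hb_seek(head_buffer *hb, size_t off) {
    if (off > hb->filled)
        return false;
    hb->pos = off;
    return true;
}

/* Copy up to n bytes out of the buffer; returns bytes copied. */
static size_t hb_read(head_buffer *hb, void *out, size_t n) {
    assert(hb->pos <= hb->filled);
    size_t avail = hb->filled - hb->pos;
    if (n > avail)
        n = avail;
    memcpy(out, hb->buf + hb->pos, n);
    hb->pos += n;
    return n;
}
```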

That would require, however, either that:

	1) if the wiretap/file_wrappers.c code reads into the buffer, and doesn't get a full 4K, subsequent reads append to the buffer until it fills, rather than having the code process what's in the buffer and then do the next read starting at the beginning of the buffer

or

	2) anything writing to a pipe is guaranteed to write enough data for all magic numbers in its first write, and all OSes buffer that full amount, so that the first read gets enough data for all magic numbers.
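Option 1 amounts to a fill loop like the following sketch (plain POSIX read(); fill_buffer is a hypothetical name).  Each read() appends after the bytes already buffered, so a writer that dribbles out its first few bytes in several small writes still ends up with a full 4K, or the whole stream if it's shorter, before any open routine looks at it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill buf with up to cap bytes from fd.  Each read() appends after the bytes
 * already obtained, so several short reads from a pipe accumulate instead of
 * each one being processed on its own.  Stops when the buffer is full or at EOF.
 * Returns the number of bytes buffered, or -1 on a read error. */
static ssize_t fill_buffer(int fd, unsigned char *buf, size_t cap) {
    size_t have = 0;
    while (have < cap) {
        ssize_t n = read(fd, buf + have, cap - have);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: just retry */
            return -1;
        }
        if (n == 0)
            break;          /* EOF: the stream is shorter than the buffer */
        have += (size_t)n;
    }
    return (ssize_t)have;
}
```

The cost is that the first open routine can't run until 4K has arrived or the writer has closed the pipe, which is fine for file-type detection but worth keeping in mind for a live capture that starts slowly.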

For file formats unfortunate enough not to have a magic number, the open routine has to be able to say "this is my file" or "this is not my file" given only the first 4K bytes of data, or we need to increase the "guaranteed minimum, unless the file itself is smaller" amount of initially-buffered data to something big enough for all file types.

> If necessary wtap could be extended to have wtap filetypes return a 'need more data' rather than WTAP_ERR_SHORT_READ or blocking if they don't have the rest of the packet but have not reached EOF.

If there's currently any place in libwiretap where we return WTAP_ERR_SHORT_READ if we haven't finished reading a record but we haven't reached EOF, rather than just reading until we *do* finish reading the record, that's a colossal bug and needs to be fixed immediately.
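A sketch of the intended behavior: keep calling read() until the record is complete, and report a short read only when EOF arrives part-way through a record.  rec_status and read_record are hypothetical names; in libwiretap the corresponding error is WTAP_ERR_SHORT_READ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

typedef enum { REC_OK, REC_EOF, REC_SHORT_READ, REC_IO_ERROR } rec_status;

/* Read exactly len bytes of a record.  Partial reads are simply retried, so a
 * pipe that delivers the record in pieces still yields REC_OK.  A short-read
 * error is reported only if EOF arrives in the middle of the record; EOF
 * exactly at a record boundary is a clean end of file. */
static rec_status read_record(int fd, void *buf, size_t len) {
    size_t have = 0;
    while (have < len) {
        ssize_t n = read(fd, (unsigned char *)buf + have, len - have);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return REC_IO_ERROR;
        }
        if (n == 0)
            return have == 0 ? REC_EOF : REC_SHORT_READ;
        have += (size_t)n;         /* got part of the record: keep reading */
    }
    return REC_OK;
}
```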

We *could*, I guess, provide a "non-blocking" mode, wherein if an attempt to read from a file returns EAGAIN before we're finished reading a record, we return an equivalent "trying to read more stuff would block" error, but:

	1) that change would go all the way down into the middle of wiretap/file_wrappers.c, and would probably complicate that code path significantly;

	2) once that's done, other code paths also get more complicated (look at the stuff that reads from files vs. the stuff that captures from pipes!);

so it may be that, if we want to be able to capture from a pipe without blocking the UI while waiting for input from the pipe, a simpler solution would be to put the "capture from a pipe" code into a separate thread that just does ordinary blocking reads from the pipe.  (Back when Wireshark^WEthereal was originally written, threading wasn't supported by many of the platforms on which it ran.  Now, we use threading in dumpcap and the GTK+ UI, with the GLib g_thread_ routines.)
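A minimal sketch of the separate-thread approach.  The post mentions GLib's g_thread_ routines; to keep this self-contained it uses POSIX threads instead, and pipe_feed, pipe_reader, pipe_feed_start and pipe_feed_available are hypothetical names.  The reader thread does ordinary blocking read() calls and appends to a mutex-protected buffer; the UI side only takes the mutex briefly to see how much has arrived, so it never waits on the pipe itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* State shared by the pipe-reader thread and the UI (hypothetical design). */
typedef struct {
    int fd;                      /* read end of the capture pipe */
    pthread_mutex_t lock;        /* protects everything below */
    unsigned char data[65536];   /* bytes received so far (fixed cap for brevity) */
    size_t len;
    bool done;                   /* reader saw EOF or an error */
    int err;                     /* errno on failure, 0 on clean EOF */
} pipe_feed;

/* Thread body: plain blocking read() calls; only this thread waits on the pipe. */
static void *pipe_reader(void *arg) {
    pipe_feed *pf = arg;
    unsigned char chunk[4096];
    for (;;) {
        ssize_t n = read(pf->fd, chunk, sizeof chunk);
        int saved_errno = errno;
        if (n < 0 && saved_errno == EINTR)
            continue;
        pthread_mutex_lock(&pf->lock);
        if (n > 0) {
            size_t room = sizeof pf->data - pf->len;
            size_t take = (size_t)n < room ? (size_t)n : room;
            memcpy(pf->data + pf->len, chunk, take);
            pf->len += take;
            if (take < (size_t)n) {      /* out of space: stop reading */
                pf->err = ENOBUFS;
                pf->done = true;
            }
        } else {                         /* 0 = EOF, < 0 = read error */
            pf->err = (n < 0) ? saved_errno : 0;
            pf->done = true;
        }
        bool finished = pf->done;
        pthread_mutex_unlock(&pf->lock);
        if (finished)
            return NULL;
    }
}

/* Start the reader thread on fd.  Returns 0 or a pthread error code. */
static int pipe_feed_start(pipe_feed *pf, int fd, pthread_t *tid) {
    memset(pf, 0, sizeof *pf);
    pf->fd = fd;
    pthread_mutex_init(&pf->lock, NULL);
    return pthread_create(tid, NULL, pipe_reader, pf);
}

/* UI side: never blocks on the pipe, only briefly on the mutex. */
static size_t pipe_feed_available(pipe_feed *pf, bool *done) {
    pthread_mutex_lock(&pf->lock);
    assert(pf->len <= sizeof pf->data);
    size_t n = pf->len;
    *done = pf->done;
    pthread_mutex_unlock(&pf->lock);
    return n;
}
```

In a real UI the availability check would run from a timer or idle callback and hand complete records on to the rest of the code; the fixed 64K buffer is only there to keep the example short.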

> There is also nothing stopping having an interface for extcap that requires dumping to an intermediate file for files that require seeking.

One file type that requires seeking is the NetMon format; you can't read a NetMon file until the entire file has been written, so you just couldn't use it for extcap.  But I'm not sure there's any good reason to use it for extcap anyway.

Another is the NetXRay/Windows Sniffer format, where the file's contents are probably an in-memory ring buffer dumped to the file, so you start reading in the middle of the file and, when you hit the end of the file, you wrap around.  But I'm not sure there's a good reason to use that format, either.
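To illustrate why a ring-buffer layout needs random access, here's a sketch that reads such a region in chronological order.  The layout (ring starting at file offset base, oldest byte at ring offset oldest) is an assumption for illustration, not the actual NetXRay header format, and read_ring and pread_all are hypothetical names.  It issues pread() at two different offsets, which fails with ESPIPE on a pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* pread() exactly len bytes at offset off; 0 on success, -1 on error or EOF. */
static int pread_all(int fd, unsigned char *buf, size_t len, off_t off) {
    while (len > 0) {
        ssize_t n = pread(fd, buf, len, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;   /* error (e.g. ESPIPE on a pipe) or unexpected EOF */
        buf += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/* Copy a ring of ring_len bytes stored at file offset base, whose oldest
 * byte is at ring offset oldest, into out[] in chronological order. */
static int read_ring(int fd, off_t base, size_t ring_len, size_t oldest,
                     unsigned char *out) {
    assert(oldest <= ring_len);
    size_t tail = ring_len - oldest;
    /* First the part from the oldest byte to the end of the ring... */
    if (pread_all(fd, out, tail, base + (off_t)oldest) != 0)
        return -1;
    /* ...then wrap around to the beginning of the ring. */
    if (pread_all(fd, out + tail, oldest, base) != 0)
        return -1;
    return 0;
}
```

On a seekable file this works; on a pipe the first pread() fails, so a format like this can only be read from a stream by first copying it to a file.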

There are cases where we seek forward, but on a sequential stream we can implement that by just reading data and throwing it away, using file_read() with a null pointer as the buffer into which to read data.
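A sketch of forward seeking on a sequential stream by reading and discarding.  It uses plain POSIX read() into a scratch buffer rather than libwiretap's file_read(), and skip_bytes is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Skip count bytes on a sequential stream by reading into a scratch buffer
 * and discarding the data.  Returns 0 on success, -1 on error or if EOF
 * arrives before count bytes have been skipped. */
static int skip_bytes(int fd, size_t count) {
    unsigned char scratch[4096];
    while (count > 0) {
        size_t want = count < sizeof scratch ? count : sizeof scratch;
        ssize_t n = read(fd, scratch, want);
        if (n < 0 && errno == EINTR)
            continue;      /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;     /* read error, or EOF before the target offset */
        count -= (size_t)n;
    }
    return 0;
}
```

This only works forward; going backward past what's still buffered has no equivalent on a pipe.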