Wireshark-dev: Re: [Wireshark-dev] Revive the happy-shark repository?
From: Jirka Novak <j.novak@xxxxxxxxxxxx>
Date: Sat, 23 Jan 2021 12:49:18 +0100
Hi,

> Years ago we added a repository for dissector regression tests at
> https://github.com/wireshark/happy-shark. Unfortunately it hasn't
> received much attention, and instead we've been adding dissector tests
> in the main repository. Should we
> 
> - Import happy-shark into GitLab and move our current dissector tests
> there?
> 
> - Retire happy-shark and do all of our testing in the main repository?
> 
> - Something else?
> 
> I'm leaning toward the first option for the simple reason that it will
> minimize the number of files we accrue in test/captures.

Personally, I miss having happy-shark alive. Very often I search the
Internet for a sample which contains e.g. a correct sequence of messages
for a protocol (with no success). I imagine it has the potential to be a
worldwide library of samples...
But in that case it should contain not only the sample, but also its
description/context: where and why it was collected and how it should be
decoded. I proposed this for happy-shark a long time ago.

Back to the questions. I think that before deciding we should agree
on the goal and the related process. Here are my ideas.

When I added samples to happy-shark, my idea was to use it as a QA tool.
Therefore, for each sample I stored a description of why the sample is
there and what is important to check during decoding. For some samples I
stored a "script" which extracted e.g. only the sequence numbers, which
were then compared.
Then I added versioning of the output, to be able to track that e.g.
Wireshark version 2.0 decoded the sample one way and version 2.2 does
it a different way. Both cases could be correct because there was a
change/improvement in decoding between the versions.
I proposed to store the outputs from the latest version of each branch
and from a few development branches.
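
To illustrate the per-sample "script" and the versioned outputs, here is
a minimal sketch. It assumes tshark is used for the extraction (-r,
-T fields and -e are existing tshark options). The field name tcp.seq,
the directory layout and all function names are only illustrative
assumptions, not what happy-shark actually contains.

```python
from pathlib import Path


def tshark_fields_cmd(capture, fields, tshark="tshark"):
    """Build a tshark command that prints only the selected fields."""
    cmd = [tshark, "-r", str(capture), "-T", "fields"]
    for field in fields:
        cmd += ["-e", field]
    return cmd


def expected_path(root, version, sample):
    """Hypothetical layout: <root>/expected/<version>/<sample>.txt"""
    return Path(root) / "expected" / version / (sample + ".txt")


def outputs_match(actual_text, expected_text):
    """Compare line by line, ignoring differences in line endings."""
    return actual_text.splitlines() == expected_text.splitlines()
```

The command from tshark_fields_cmd("sample1.pcap", ["tcp.seq"]) could be
run with subprocess.run(..., capture_output=True, text=True) and its
stdout compared with the file at expected_path(root, "2.2", "sample1").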

It can be used in a different way too, e.g. between versions 2.0 and
2.2 this set of samples is decoded differently. Someone should check
why.
If it is correct, store the new output in the repository. If it is
incorrect, we can ask the author of the change to fix it and provide
them with the sample where the issue is observed.
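
A rough sketch of such a comparison between two versions, assuming the
outputs are already loaded as text keyed by sample name. The function
names and the version labels are made up for illustration.

```python
import difflib


def changed_samples(old, new):
    """Return the sorted names of samples whose output differs.

    old/new map a sample name to its decoded output text. Only samples
    present in both versions are compared.
    """
    return sorted(name for name in old.keys() & new.keys()
                  if old[name] != new[name])


def decoding_diff(old_output, new_output, old_label="2.0", new_label="2.2"):
    """Return a unified diff (list of lines) to show to a reviewer."""
    return list(difflib.unified_diff(
        old_output.splitlines(), new_output.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm=""))
```

The list from changed_samples() is what someone would have to review.
The diff for each of those samples can be attached when asking the
author of the change to have a look.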

The idea described above means that there should be a team which is
engaged for every "stable" release and checks whether all samples are
decoded the expected way.
For that, a separate project is probably more suitable.

Another point of view is that we can trigger the same check for every
build and notify the author that a new change altered the decoding of
sample1, sample3101, ... The author should then check it and, if the new
decoding is correct, update all affected sample outputs as part of the
patch. For some changes this could be a very extensive task.
For this case, making happy-shark part of the main repository is
probably more suitable.

The question is whether the author (everyone) should be allowed to
update the stored/compared outputs. If they do it incorrectly, the QA
check will pass even though the decoding is incorrect.
On the other hand, if we create a team for it, I'm not sure whether we
can find a team which understands every protocol well enough to decide
whether a new change is correct or not.
We are working in open source, therefore I think the open approach is
more suitable.

There might be something in between.
Store happy-shark in a separate repository. A user can decide whether to
clone it locally. If so, a special build script or target will check the
samples with the local build. If not, the check will be skipped.
On GitLab-triggered builds, happy-shark will always be available and the
checks will be run. Fixing the decoding of failed samples might be
optional. We can record/track which commit broke which samples.
For every build we can create "statistics", e.g. how many samples are
decoded the correct way (meaning the same way as stored in the
happy-shark repository).
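
The optional check and the per-build "statistics" could look roughly
like this. It is only a sketch: the function names and the shape of the
result are my assumptions, and a real build target would have to fill in
the per-sample results itself.

```python
from pathlib import Path


def find_samples_repo(path):
    """Return the happy-shark checkout, or None if it was not cloned."""
    repo = Path(path)
    return repo if repo.is_dir() else None


def summarize(results):
    """Summarize per-sample results for one build.

    results maps a sample name to True when the decoding matches the
    stored output and to False otherwise.
    """
    failed = sorted(name for name, ok in results.items() if not ok)
    return {
        "total": len(results),
        "passed": len(results) - len(failed),
        "failed": failed,
    }
```

When find_samples_repo() returns None the target can simply skip the
check. Storing the "failed" list together with the commit ID would give
the record of which commit broke which samples.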

						Best regards,

							Jirka Novak