Ethereal-dev: Re: [Ethereal-dev] manuf file munging

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Joerg Mayer <jmayer@xxxxxxxxx>
Date: Sun, 17 Nov 2002 19:35:04 +0100
On Thu, Oct 31, 2002 at 11:52:11AM -0800, Chris Waters wrote:
> The manufacturer names in the manuf file are often not all that helpful. For
> example, any Linksys equipment shows up with an address like: The_93:45:34
> since their name is "The Linksys Group".
> 
> Is the purpose of the manuf.tmpl file to provide replacement names for those
> addresses that don't have good names?
> 
> Alternatively the make-manuf script could be modified to clean the names up:
> 
> * Convert to consistent case.
> * Remove any leading "The ".
> * Remove any punctuation.
> * Replace any spaces with underscores.
> * Truncate all names to a reasonable length, say 10 characters.

Attached is a patch to make-manuf that does this. I've ignored Guy's ideas
for now because this was much easier to do in Perl :-)
This patch is intended for testing/comments right now. Depending on the
feedback I'll check it in.
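
For anyone who wants to try the five cleanup steps from Chris's list in isolation, here is a rough standalone sketch. It is not the attached patch: the sub name clean_name and the default limit of 10 characters are assumptions made for illustration, and it deliberately does nothing about suffixes such as "Inc" or "Ltd".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of make-manuf: applies the five cleanup
# steps from the quoted list to a single manufacturer name.
sub clean_name {
    my ($name, $maxlen) = @_;
    $maxlen = 10 unless defined $maxlen;   # assumed default, per the suggestion

    $name =~ s/^\s*the\s+//i;       # remove a leading "The "
    $name =~ s/[[:punct:]]+/ /g;    # replace punctuation with spaces
    $name =~ s/^\s+|\s+$//g;        # trim leading and trailing whitespace
    $name =~ s/(\w+)/\u\L$1/g;      # consistent case: capitalize each word
    $name =~ s/\s+/_/g;             # spaces become underscores
    $name = substr($name, 0, $maxlen);  # truncate to the length limit
    $name =~ s/_+$//;               # drop an underscore left at the cut
    return $name;
}

print clean_name("The Linksys Group"), "\n";
print clean_name("CISCO SYSTEMS, INC."), "\n";
```

With the assumed 10-character limit, the first name comes out as Linksys_Gr and the second as Cisco_Syst, which shows why the limit itself is worth discussing before anything gets checked in.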

 Ciao
       Jörg

--
Joerg Mayer                                          <jmayer@xxxxxxxxx>
I found out that "pro" means "instead of" (as in proconsul). Now I know
what proactive means.
Changelog: <jmayer@xxxxxxxxx>
- Reorganize code into blocks (read template, read ieee, read cb, write).
- Add function shorten, which does some pretty-printing of the manuf string,
  and use it for ieee and cb.

Index: make-manuf
===================================================================
RCS file: /usr/local/cvsroot/ethereal/make-manuf,v
retrieving revision 1.7
diff -u -p -r1.7 make-manuf
--- make-manuf	9 Sep 2002 19:38:09 -0000	1.7
+++ make-manuf	17 Nov 2002 18:26:23 -0000
@@ -14,12 +14,15 @@
 # precedence.
 
 # LWP is part of the standard Perl module libwww 
+
 eval "require LWP::UserAgent;";
 if( $@ ) {
   die "LWP isn't installed. It is part of the standard Perl\n" .
 	" module libwww.  Bailing.\n";
 }
 
+$agent    = LWP::UserAgent->new;
+
 $template = "manuf.tmpl";
 $wkatmpl  = "wka.tmpl";
 $outfile  = "manuf";
@@ -38,50 +41,65 @@ $cb_skipped   = 0;
 $ieee_added   = 0;
 $ieee_skipped = 0;
 
-$agent    = LWP::UserAgent->new;
-
-print "Fetching $cb_url.\n";
-$request  = HTTP::Request->new(GET => $cb_url);
-$result   = $agent->request($request);
-
-if (!$result->is_success) {
-  die ("Error fetching $cb_url: " . $result->status_line . "\n");
+sub shorten
+{
+  my $origmanuf = shift; 
+  my $manuf = " " . $origmanuf . " ";
+  # Remove any punctuation
+  $manuf =~ tr/,.()/    /;
+  # A standalone & isn't needed
+  $manuf =~ s/ \& / /g;
+  # Remove any "the", "inc", "plc" ...
+  $manuf =~ s/\s(the|inc|incorporated|plc|systems|corp|corporation|a\/s|ab|ag|kg|gmbh|co|company|limited|ltd)(?= )//gi;
+  # Cleanup multiple spaces
+  $manuf =~ s/^\s+//g;
+  $manuf =~ s/\s+/ /g;
+  # Truncate all names to a reasonable length, say 10 characters.
+  $manuf = substr($manuf, 0, 20); # XXX 20 for testing only
+  # Remove trailing whitespaces
+  $manuf =~ s/\s+$//g;
+  # Convert to consistent case
+  $manuf =~ s/(\w+)/\u\L$1/g;
+  # Replace any spaces with underscores
+  $manuf =~ s/\s+/_/g;
+
+  if ($manuf =~ /\Q$origmanuf\E/i) {
+    return $manuf;
+  } else {
+    return "$manuf\t\t# $origmanuf";
+  }
 }
-$cb_list = $result->content;
 
-print "Fetching $ieee_url.\n";
-$request  = HTTP::Request->new(GET => $ieee_url);
-$result   = $agent->request($request);
-
-if (!$result->is_success) {
-  die ("Error fetching $ieee_url: " . $result->status_line . "\n");
-}
-$ieee_list = $result->content;
+# Write out the header and populate the OUI list with our entries.
 
 open (TMPL, "< $template") || 
   die "Couldn't open template file for reading ($template)\n";
 
-open (WKATMPL, "< $wkatmpl") || 
-  die "Couldn't open well-known address template file for reading ($wkatmpl)\n";
-
-open (OUT, "> $outfile") ||
-  die "Couldn't open output file for writing ($outfile)\n";
-
-# Write out the header and populate the OUI list with our entries.
 while ($line = <TMPL>) {
   chomp($line);
   if ($line !~ /^$oui_re\s+\S/ && $inheader) {
-    print(OUT "$line\n");
+    $header .= "$line\n";
   } elsif (($oui, $manuf) = ($line =~ /^($oui_re)\s+(\S.*)$/)) {
     $inheader = 0;
     # Ensure OUI is all upper-case
     $oui =~ tr/a-f/A-F/;
+    # $oui_list{$oui} = &shorten($manuf);
     $oui_list{$oui} = $manuf;
     $tmpl_added++;
   }
 }
 
 # Add IEEE entries for OUIs not yet known.
+
+print "Fetching $ieee_url.\n";
+$request  = HTTP::Request->new(GET => $ieee_url);
+$result   = $agent->request($request);
+
+if (!$result->is_success) {
+  die ("Error fetching $ieee_url: " . $result->status_line . "\n");
+}
+$ieee_list = $result->content;
+
 foreach $line (split(/\n/, $ieee_list)) {
   if (($oui, $manuf) = ($line =~ /^($ieee_re)\s+\(hex\)\s+(\S.*)$/)) {
     $oui =~ tr /-/:/;  # The IEEE bytes are separated by dashes.
@@ -91,13 +109,23 @@ foreach $line (split(/\n/, $ieee_list)) 
       printf "$oui - Skipping IEEE \"$manuf\" in favor of \"$oui_list{$oui}\"\n";
       $ieee_skipped++;
     } else {
-      $oui_list{$oui} = $manuf;
+      $oui_list{$oui} = &shorten($manuf);
       $ieee_added++;
     }
   }
 }
 
 # Add CaveBear entries for OUIs not yet known.
+
+print "Fetching $cb_url.\n";
+$request  = HTTP::Request->new(GET => $cb_url);
+$result   = $agent->request($request);
+
+if (!$result->is_success) {
+  die ("Error fetching $cb_url: " . $result->status_line . "\n");
+}
+$cb_list = $result->content;
+
 foreach $line (split(/\n/, $cb_list)) {
   if (($oui, $manuf) = ($line =~ /^($cb_re)\s+(\S.*)$/)) {
     ($h1, $h2, $h3) = ($oui =~ /($hp)($hp)($hp)/);  # The CaveBear bytes have no separators
@@ -108,21 +136,30 @@ foreach $line (split(/\n/, $cb_list)) {
       printf "$oui - Skipping CaveBear \"$manuf\" in favor of \"$oui_list{$oui}\"\n";
       $cb_skipped++;
     } else {
-      $oui_list{$oui} = $manuf;
+      $oui_list{$oui} = &shorten($manuf);
       $cb_added++;
     }
   }
 }
 
+# Write output file
+
+open (OUT, "> $outfile") ||
+  die "Couldn't open output file for writing ($outfile)\n";
+
+print(OUT "$header");
+
 foreach $oui (sort(keys %oui_list)) {
   print(OUT "$oui\t$oui_list{$oui}\n");
 }
 
-#
 # Write out a blank line separating the OUIs from the well-known
 # addresses, and then read the well-known address template file
 # and write it to the manuf file.
-#
+
+open (WKATMPL, "< $wkatmpl") || 
+  die "Couldn't open well-known address template file for reading ($wkatmpl)\n";
+
 # XXX - it'd be nice to get this from the Cavebear file, but inferring
 # the address mask from entries in that file involves some work.
 #