Wireshark-bugs: [Wireshark-bugs] [Bug 6755] New: slow loading/processing of conversations with o
Date: Mon, 23 Jan 2012 06:54:27 -0800 (PST)
https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=6755

           Summary: slow loading/processing of conversations with over
                    500k frames.
           Product: Wireshark
           Version: SVN
          Platform: x86
        OS/Version: All
            Status: NEW
          Severity: Enhancement
          Priority: Medium
         Component: Wireshark
        AssignedTo: bugzilla-admin@xxxxxxxxxxxxx
        ReportedBy: const.crist@xxxxxxxxxxxxxx


Cristian Constantin <const.crist@xxxxxxxxxxxxxx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #7718|                            |review_for_checkin?
              Flags|                            |

Created attachment 7718
  --> https://bugs.wireshark.org/bugzilla/attachment.cgi?id=7718
patch

Build Information:
wireshark 1.7.1 (SVN Rev 40560 from /trunk)

Copyright 1998-2012 Gerald Combs <gerald@xxxxxxxxxxxxx> and contributors.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Compiled (64-bit) with GTK+ 2.24.4, with Cairo 1.10.2, with Pango 1.28.4, with
GLib 2.28.6, with libpcap (version unknown), with libz 1.2.3.4, with POSIX
capabilities (Linux), with SMI 0.4.8, with c-ares 1.7.5, with Lua 5.1, without
Python, with GnuTLS 2.12.10, with Gcrypt 1.5.0, with MIT Kerberos, with GeoIP,
with PortAudio V19-devel (built Jul 20 2011 00:01:38), without AirPcap.

Running on Linux 3.1.0-1-amd64, with locale en_US.UTF-8, with libpcap version
1.1.1, with libz 1.2.3.4, GnuTLS 2.12.10, Gcrypt 1.5.0.

Built using gcc 4.6.1.
--
wireshark dumping and analysis of the packets uses
the concept of "conversations"; in case of the 
{udp,tcp,sctp}/ip packets the conversations are identified 
using the pair (ip_address, port) per protocol type.

for (sip) proxy to proxy, (sigtran) server to server aso traffic 
the conversations can become quite large pretty quickly.

however it seems that wireshark does not do very well at
handling these large conversations at all. for example, when
using the svn version on the following architecture:

4xAMD Phenom(tm) II X4 940 Processor
4GB RAM

it takes around 15 minutes to load and display a pcap file
of about 600MB containing conversations having over 500k packets.

curiosity made me to do some profiling (using oprofile).
this is what I get: 


CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 750000
samples  %        image name               symbol name
342405   49.2224  libwireshark.so.0.0.1    conversation_lookup_hashtable
327297   47.0506  libwireshark.so.0.0.1    conversation_new
1036      0.1489  libglib-2.0.so.0.2800.6  g_hash_table_lookup
982       0.1412  libwireshark.so.0.0.1    guint8_pbrk
900       0.1294  libwireshark.so.0.0.1    serv_name_lookup

so, the suspects are:

* conversation_new()
* conversation_lookup_hashtable()

I had a look at both of them.

1. conversation_new()

conversations are stored in a hash table implemented using GHashTable from
glib.
however elements having _exactly_ the same key (in this case the ip:port,
protocol)
are kept in separate lists.

what I find really strange here is the folowing loop:

    if (conversation) {
        for (; conversation->next; conversation = conversation->next)
            ;
        conversation->next = se_alloc(sizeof(conversation_t));
        conversation = conversation->next;
    } else {
        conversation = se_alloc(sizeof(conversation_t));
    }

if it is needed to add the new conversation always at the
end of the list than why not _storing_ a pointer to the
end of the list?

2. conversation_lookup_hashtable()

now, this loop looks suspicious to me:

if (match) {
        for (conversation = match->next; conversation; conversation =
conversation->next) {
            if ((conversation->setup_frame <= frame_num)
                && (conversation->setup_frame > match->setup_frame))
                match = conversation;
        }
        if (match->setup_frame > frame_num)
            match = NULL;
}

if it is needed to find a conversation based on some frame number,
why not having the conversation list sorted on frame numbers?

I attach a patch against svn trunk that tries to improve the problems described
above.

with the patch applied, on the same machine, the same file took approx. 25
seconds
to be decoded and displayed. here is how the profiling looks like after
applying the patch:

CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 750000
samples  %        image name               symbol name
4601      4.6589  libwireshark.so.0.0.1    guint8_pbrk
4545      4.6022  libglib-2.0.so.0.2800.6  g_atomic_pointer_get
4484      4.5404  libglib-2.0.so.0.2800.6  g_hash_table_lookup
3252      3.2929  libwireshark.so.0.0.1    compute_offset_length
2824      2.8595  libc-2.13.so             __memcpy_sse2
2771      2.8059  libwireshark.so.0.0.1    emem_free_all
2364      2.3938  libglib-2.0.so.0.2800.6  g_slice_free1
2116      2.1426  libwireshark.so.0.0.1    dissect_sip_common
2069      2.0950  libwireshark.so.0.0.1    emem_alloc_chunk
1931      1.9553  libwireshark.so.0.0.1    check_offset_length_no_exception

I have tested the patch with: udp traffic (sip), tcp traffic (http, ftp).
I have run the fuzz tests for the udp packets as well (6 passes, all O.K.)

pls. let me know what you think.

-- 
Configure bugmail: https://bugs.wireshark.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching all bug changes.