Wireshark-dev: [Wireshark-dev] Fwd: tvb allocator (was: Re: [Wireshark-commits] master b6d20a2:
From: Evan Huus <eapache@xxxxxxxxx>
Date: Fri, 11 Jul 2014 21:08:34 -0400
(sorry Balint for the double-post, I don't know why my "reply" button dropped the mailing list)

---------- Forwarded message ----------
From: Evan Huus <eapache@xxxxxxxxx>
Date: Fri, Jul 11, 2014 at 9:05 PM
Subject: Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
To: Bálint Reczey <balint@xxxxxxxxxxxxxxx>


On Fri, Jul 11, 2014 at 8:27 PM, Bálint Réczey <balint@xxxxxxxxxxxxxxx> wrote:
Hi Evan,

2014-07-11 23:51 GMT+02:00 Evan Huus <eapache@xxxxxxxxx>:
> On Fri, Jul 11, 2014 at 5:12 PM, Bálint Réczey <balint@xxxxxxxxxxxxxxx>
> wrote:
>>
>> Hi All,
>>
>> Please provide the input data for letting others reproduce the results
>> or perform the performance tests on pcap files already available to
>> the public.
>>
>> I'm not a fan of implementing custom memory management methods because
>> partly because I highly doubt we can beat jemalloc easily on
>> performance
>
>
> The only place we reliably beat jemalloc (or even glib) is when we have a
> large number of allocations that live together, and can be freed with
> free_all. Anything else is basically noise. As Jakub's test noted, the main
> block allocator is actually slightly slower than g_slice_* if the frees are
> done manually.
>
>>
>> and custom allocation methods can also have nasty bugs
>> like the one observed in OpenSSL:
>> http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse
>>
>> I wrote a short post about making all programs in Debian resistant to
>> malloc() related attacks using ASAN and wmem in its current form
>> prevents implementing the protection:
>>
>> http://balintreczey.hu/blog/proposing-amd64-hardened-architecture-for-debian/
>
>
> It's not clear to me from reading the blog post or the mail to debian what
> the actual protections would be, or why wmem would prevent them from
> working. Could you clarify please? Glib has its own allocator logic
> internally for g_slice_*, does this also cause problems?
I plan using ASAN for all programs which would catch (among others)
use-after-free and reading below or over the malloc()-ed
memory area. Those can't be caught if the program uses another layer
of bulk memory allocations.
g_malloc() and g_slice_* has the same problem, but they can be
overrideb by passing G_SLICE=always-malloc .

I know wmem has simple allocator

Which can be foreced everywhere by passing WIRESHARK_DEBUG_WMEM_OVERRIDE=simple like we do in "valgrind-wireshark.sh". I don't see how this is different from using G_SLICE=always-malloc?
 
but wmem_free() is really
inefficient

I grant that wmem_simple_free has a terrible worst-case behaviour, but its practical efficiency is actually the best of anything I've tried. The only alternative which doesn't involve adding a prefix to the memory block (which would break ASAN/valgrind/etc) is a hash table, which is actually substantially slower.
 
and as a fix I would like to propose removing wmem_free()
from the API.
IMO Wireshark needs wmem_alloc() for allocating and freeing memory
needed during whole scopes, but should not offer wmem_free() and
wmem_realloc() to let us able to provide efficient per-scope
allocations.

Hmm. We need free and realloc to efficiently implement certain data structures, and I see you've addressed that below, so I will include my answer there.
 
Dissectors which should simply do g_malloc()/g_free() for
allocations when they know when they need to free memory

(which they can't do safely if they also call proto_tree functions inbetween)
 
and using
wmem_free() also does not keep the promise of having a
wmem_alloc()-ated object available during the whole scope.

I don't think that promise was ever made, really. The promise is that such memory won't be available after the scope, not that it will be available right until the end.
 
Wmem also have a lot of data structures reimplemented using
wmem-backed memory, but I think it would be way easier to use GLists
registered to be g_list_free()-d using wmem_register_callback() than
using wmem_list_* functions for manipulating per-scope allocated lists
for example.

This is a follow-up to the performance concern. If we use callbacks to free everything, we lose the *substantial* speed-up we get from the free_all operation. I grant that we should avoid optimizations unless well-justified, but I do believe this is well-justified given the benefit.

Taking it to the extreme, wmem could be made *extraordinarily* simple if it were nothing more than a linked list of callbacks (some to g_free, some to g_list_free, etc) and we used glib memory for everything. It would just be very, very, very slow. The simplicity has an obvious appeal, but I can't justify a performance hit of that magnitude.

Cheers,
Balint

>
>>
>> Please don't sacrifice protection for 2% speedup. Please keep wmem
>> usage for cases where it is used for garbage collecting (free() after
>> end of frame/capture file) not when the allocation and deallocation
>> are already done properly.
>>
>> Thanks,
>> Balint
>>
>> 2014-07-11 8:58 GMT+02:00 Jakub Zawadzki <darkjames-ws@xxxxxxxxxxxx>:
>> > Hi,
>> >
>> > On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote:
>> >> If we're in topic of optimizing 'slower' [de]allocations in common
>> >> functions:
>> >>
>> >> - tvb allocation/deallocation (2.5%, or 3.4% when no filtering)
>> >>
>> >>    243,931,671  *  ???:tvb_new
>> >> [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
>> >>    202,052,290  >   ???:g_slice_alloc (2463493x)
>> >> [/usr/lib64/libglib-2.0.so.0.3600.4]
>> >>
>> >>    291,765,126  *  ???:tvb_free_chain
>> >> [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
>> >>    256,390,635  >   ???:g_slice_free1 (2435843x)
>> >> [/usr/lib64/libglib-2.0.so.0.3600.4]
>> >
>> >> This, or next week I'll try to do tvb.
>> >
>> > ... or maybe this week:
>> >
>> > ver0 | 18,055,719,820 (-----------) | Base version
>> > 96f0585268f1cc4e820767c4038c10ed4915c12a
>> > ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to
>> > wmem with file scope
>> > ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to
>> > wmem with file/packet scope
>> > ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to
>> > simple object allocator with epan scope
>> > ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to
>> > simple object allocator with file scope
>> >
>> > I have uploaded patches & profiler outputs to:
>> > http://www.wireshark.org/~darkjames/tvb-opt-allocator/
>> >
>> > Please review, and check what version is OK to be applied.
>> >
>> >
>> > P.S: I'll might find some time to do ver5 with slab allocator, but it'll
>> > look like object allocator API with epan scope.
>> >
>> > Cheers,
>> > Jakub.
>> >
>> > ___________________________________________________________________________
>> > Sent via:    Wireshark-dev mailing list <wireshark-dev@xxxxxxxxxxxxx>
>> > Archives:    http://www.wireshark.org/lists/wireshark-dev
>> > Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
>> >
>> > mailto:wireshark-dev-request@xxxxxxxxxxxxx?subject=unsubscribe
>>
>> ___________________________________________________________________________
>> Sent via:    Wireshark-dev mailing list <wireshark-dev@xxxxxxxxxxxxx>
>> Archives:    http://www.wireshark.org/lists/wireshark-dev
>> Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
>>
>> mailto:wireshark-dev-request@xxxxxxxxxxxxx?subject=unsubscribe
>
>