\chapter{Conclusion}

% \noindent
% ====================
% 
% Writing Points:
% \begin{itemize}
% \item
% Summarize u-benchmark suite.
% \item
% Summarize @uHeapLmmm@.
% \item
% Make recommendations on memory allocator design.
% \end{itemize}
% 
% \noindent
% ====================

The goal of this thesis was to build a low-latency memory allocator for both KT and UT multi-threaded systems that is competitive with the best current memory allocators, while extending the feature set of existing and new allocator routines.
The new llheap memory allocator achieves all of these goals, while maintaining and managing sticky allocation information without a performance loss.
Hence, it becomes possible to use @realloc@ frequently as a safe operation, rather than just occasionally, because the sticky properties of the original allocation are preserved across the reallocation.
Furthermore, the ability to query sticky properties and information allows programmers to write safer programs, because it is possible to dynamically match the allocation style of allocations returned from unknown library routines.

Extending the C allocation API with @resize@, advanced @realloc@, @aalloc@, @amemalign@, and @cmemalign@ means programmers no longer hand-code these useful allocation operations, removing a common source of mistakes.
The ability to use \CFA's advanced type-system (and possibly \CC's) to provide a single allocation routine with completely orthogonal sticky properties shows how far the allocation API can be pushed, which increases safety and greatly simplifies programmers' use of dynamic allocation.
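
As a rough illustration of one allocation routine with orthogonal properties, the following sketch approximates the idea in plain C using a compound literal with designated initializers.
It is not the \CFA interface: the names @ALLOC@, @alloc_props@, and @alloc_props_impl@ are hypothetical, and the routine is built on POSIX @posix_memalign@, @malloc@, and @memset@ rather than on llheap.
Internally, it performs the steps a programmer would otherwise hand-code for an aligned, zero-filled array (the role of @cmemalign@), including the overflow check on the element count, which hand-written versions commonly omit.
\begin{lstlisting}[language=C]
#define _POSIX_C_SOURCE 200112L
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical property record: each field is independent of the others. */
struct alloc_props {
    size_t dim;    /* number of elements (the macro defaults this to 1) */
    size_t align;  /* 0 => default malloc alignment */
    bool   zero;   /* true => zero-fill the storage */
};

static void *alloc_props_impl(size_t elem_size, struct alloc_props p) {
    /* Reject dim * elem_size overflow instead of allocating too little. */
    if (elem_size != 0 && p.dim > SIZE_MAX / elem_size) return NULL;
    size_t size = p.dim * elem_size;
    void *mem = NULL;
    if (p.align == 0) {
        mem = malloc(size);
    } else if (posix_memalign(&mem, p.align, size) != 0) {
        mem = NULL;  /* posix_memalign reports errors via its return value */
    }
    if (mem != NULL && p.zero) memset(mem, 0, size);
    return mem;
}

/* One allocation routine: properties may be given in any order and any
   combination; a later designator overrides the default .dim = 1.
   Before C23, at least one property must follow the type argument. */
#define ALLOC(T, ...) \
    ((T *)alloc_props_impl(sizeof(T), (struct alloc_props){ .dim = 1, __VA_ARGS__ }))
\end{lstlisting}
For example, @ALLOC(double, .dim = 100, .align = 64, .zero = true)@ requests an aligned, zero-filled array, while @ALLOC(int, .zero = true)@ requests a single zero-filled @int@.
Because the sketch sits on top of the standard allocator, its properties are not sticky: a later @realloc@ of the result does not preserve the alignment or zero fill.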

Providing comprehensive statistics for all allocation operations is invaluable in understanding and debugging a program's dynamic behaviour.
No other memory allocator provides comprehensive statistics gathering.
This capability was used extensively during the development of llheap to verify its behaviour.
As well, providing a debugging mode where allocations are checked, along with internal pre/post conditions and invariants, is extremely useful, especially for students.
While not as powerful as the @valgrind@ interpreter, a large number of allocation mistakes are detected.
Finally, contention-free statistics gathering and debugging have a low enough cost to be used in production code.
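
One common way to make statistics gathering contention-free, assumed here purely for illustration and not necessarily llheap's design, is to keep counters per thread, so the allocation fast path performs only plain increments, and to fold them into global totals only when a thread publishes them.
The wrapper names @counted_malloc@, @counted_free@, and @publish_stats@ are hypothetical and wrap the standard @malloc@ and @free@; they are not llheap routines.
\begin{lstlisting}[language=C]
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread statistics: only the owning thread updates its
   record, so the allocation fast path needs no atomics or locks. */
struct alloc_stats {
    unsigned long long malloc_calls;
    unsigned long long malloc_bytes;
    unsigned long long free_calls;
};

static _Thread_local struct alloc_stats local_stats;

/* Global totals, touched only when a thread publishes its counters. */
static atomic_ullong total_malloc_calls;
static atomic_ullong total_malloc_bytes;
static atomic_ullong total_free_calls;

void *counted_malloc(size_t size) {
    local_stats.malloc_calls += 1;   /* plain increments: no contention */
    local_stats.malloc_bytes += size;
    return malloc(size);
}

void counted_free(void *p) {
    local_stats.free_calls += 1;
    free(p);
}

/* Fold this thread's counts into the global totals and reset them;
   this is the only synchronized step. */
void publish_stats(void) {
    atomic_fetch_add(&total_malloc_calls, local_stats.malloc_calls);
    atomic_fetch_add(&total_malloc_bytes, local_stats.malloc_bytes);
    atomic_fetch_add(&total_free_calls, local_stats.free_calls);
    local_stats = (struct alloc_stats){0};
}
\end{lstlisting}
In an allocator, @publish_stats@ would presumably be invoked at thread termination or when statistics are printed, and counters for the remaining allocation routines would be added in the same way.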

The ability to compile llheap with static/dynamic linking and optional statistics/debugging provides programmers with multiple mechanisms to balance performance and safety.
These allocator versions are easy to use because they can be linked to an application without recompilation.

Starting a micro-benchmark test-suite for comparing allocators, rather than relying on a suite of arbitrary programs, has been an interesting challenge.
The current micro-benchmark suite provides some understanding of an allocator's implementation properties without examining its implementation.
For example, the memory micro-benchmark quickly identified how several of the allocators work at the global level.
It was not possible to show in the thesis how the micro-benchmarks' adjustment knobs were used to tune each test to an interesting point.
Many graphs were created and discarded until a few were selected for the thesis.


\section{Future Work}

A careful walk-through of the allocator fastpath should yield additional optimizations for a slight performance gain.
In particular, examining the implementation of rpmalloc, which is often the fastest allocator, may reveal further fastpath techniques.

The micro-benchmarks project requires more testing and analysis.
Additional allocation patterns are needed to extract meaningful information about allocators, along with determining the best tuning knobs within each allocation pattern.
Also, identifying ways to visualize the results of the micro-benchmarks is a work in progress.

After llheap is made available on GitHub, interacting with its users to locate problems and improvements will make llheap a more robust memory allocator.