Context Navigation

← Previous Changeset
Next Changeset →

Changeset 3a038fa

Timestamp:

Feb 23, 2022, 10:31:12 AM (4 years ago)

Author:

Peter A. Buhr <pabuhr@…>

Branches:

ADT, ast-experimental, enum, master, pthread-emulation, qualifiedEnum, stuck-waitfor-destruct

Children:

Parents:

Message:

more updates to added text

Location:

doc/theses/mubeen_zulfiqar_MMath

Files:

: 5 edited

allocator.tex (modified) (1 diff)
background.tex (modified) (26 diffs)
benchmarks.tex (modified) (1 diff)
figures/AllocatorComponents.fig (modified) (5 diffs)
intro.tex (modified) (8 diffs)

Legend:

: Unmodified
: Added
: Removed

doc/theses/mubeen_zulfiqar_MMath/allocator.tex

-              rf53afafb
+              r3a038fa
 We added a few more features and routines to the allocator's C interface that can make the allocator more usable to the programmers. THese features will programmer more control on the dynamic memory allocation.
+\subsection{Out of Memory}
+Most allocators use @nullptr@ to indicate an allocation failure, specifically out of memory;
+hence the need to return an alternate value for a zero-sized allocation.
+The alternative is to abort a program when out of memory.
+In theory, notifying the programmer allows recovery;
+in practice, it is almost impossible to gracefully when out of memory, so the cheaper approach of returning @nullptr@ for a zero-sized allocation is chosen.
 \subsection{\lstinline{void * aalloc( size_t dim, size_t elemSize )}}
 @aalloc@ is an extension of malloc. It allows programmer to allocate a dynamic array of objects without calculating the total size of array explicitly. The only alternate of this routine in the other allocators is calloc but calloc also fills the dynamic memory with 0 which makes it slower for a programmer who only wants to dynamically allocate an array of objects without filling it with 0.

doc/theses/mubeen_zulfiqar_MMath/background.tex

-              rf53afafb
+              r3a038fa
 \chapter{Background}
-\section{Memory-Allocator Background}
-\label{s:MemoryAllocatorBackground}
 A program dynamically allocates and deallocates the storage for a variable, referred to as an \newterm{object}, through calls such as @malloc@ and @free@ in C, and @new@ and @delete@ in \CC.
 Space for each allocated object comes from the dynamic-allocation zone.
 A \newterm{memory allocator} is a complex data-structure and code that manages the layout of objects in the dynamic-allocation zone.
+A \newterm{memory allocator} contains a complex data-structure and code that manages the layout of objects in the dynamic-allocation zone.
 The management goals are to make allocation/deallocation operations as fast as possible while densely packing objects to make efficient use of memory.
 Objects cannot be moved to aid the packing process.
+Objects in C/\CC cannot be moved to aid the packing process, only adjacent free storage can be \newterm{coalesced} into larger free areas.
 The allocator grows or shrinks the dynamic-allocation zone to obtain storage for objects and reduce memory usage via operating-system calls, such as @mmap@ or @sbrk@ in UNIX.
 \subsection{Allocator Components}
+\section{Allocator Components}
 \label{s:AllocatorComponents}
 There are two important parts to a memory allocator, management and storage data (see \VRef[Figure]{f:AllocatorComponents}), collectively called the \newterm{heap}.
+\VRef[Figure]{f:AllocatorComponents} shows the two important data components for a memory allocator, management and storage, collectively called the \newterm{heap}.
 The \newterm{management data} is a data structure located at a known memory address and contains all information necessary to manage the storage data.
 The management data starts with fixed-sized information in the static-data memory that flows into the dynamic-allocation memory.
+The \newterm{storage data} is composed of allocated and freed objects, and reserved memory.
+Allocated objects (white) are variable sized and allocated to and maintained by the program.
+Freed objects (light grey) are memory deallocated by the program that is linked to form a list facilitating easy location of storage for new allocations.
+The \newterm{storage data} is composed of allocated and freed objects, and \newterm{reserved memory}.
+Allocated objects (white) are variable sized, and allocated and maintained by the program;
+\ie only the program knows the location of allocated storage, not the memory allocator.
+\begin{figure}[h]
+\centering
+\input{AllocatorComponents}
+\caption{Allocator Components (Heap)}
+\label{f:AllocatorComponents}
+\end{figure}
+Freed objects (light grey) are memory deallocated by the program, which are linked into one or more lists facilitating easy location for new allocations.
 Often the free list is chained internally so it does not consume additional storage, \ie the link fields are placed at known locations in the unused memory blocks.
 Reserved memory (dark grey) is one or more blocks of memory obtained from the operating system but not yet allocated to the program;
 if there are multiple reserved blocks, they are also chained together, usually internally.
-\begin{figure}
-\centering
-\input{AllocatorComponents}
-\caption{Allocator Components (Heap)}
-\label{f:AllocatorComponents}
-\end{figure}
 Allocated and freed objects typically have additional management data embedded within them.
 \VRef[Figure]{f:AllocatedObject} shows an allocated object with a header, trailer, and padding/spacing around the object.
+\VRef[Figure]{f:AllocatedObject} shows an allocated object with a header, trailer, and alignment padding and spacing around the object.
 The header contains information about the object, \eg size, type, etc.
 The trailer may be used to simplify an allocation implementation, \eg coalescing, and/or for security purposes to mark the end of an object.
 …
 \subsection{Single-Threaded Memory-Allocator}
+\section{Single-Threaded Memory-Allocator}
 \label{s:SingleThreadedMemoryAllocator}
 …
 \subsubsection{Fragmentation}
+\subsection{Fragmentation}
 \label{s:Fragmentation}
 Fragmentation is memory requested from the operating system but not used by the program;
 hence, allocated objects are not fragmentation.
 Fragmentation is often divided into internal or external (see~\VRef[Figure]{f:InternalExternalFragmentation}).
+\VRef[Figure]{f:InternalExternalFragmentation}) shows fragmentation is divided into two forms: internal or external.
 \begin{figure}
 …
 \newterm{Internal fragmentation} is memory space that is allocated to the program, but is not intended to be accessed by the program, such as headers, trailers, padding, and spacing around an allocated object.
 This memory is typically used by the allocator for management purposes or required by the architecture for correctness (\eg alignment).
+This memory is typically used by the allocator for management purposes or required by the architecture for correctness, \eg alignment.
 Internal fragmentation is problematic when management space is a significant proportion of an allocated object.
 For example, if internal fragmentation is as large as the object being managed, then the memory usage for that object is doubled.
 …
 This memory is problematic in two ways: heap blowup and highly fragmented memory.
 \newterm{Heap blowup} occurs when memory freed by the program is not reused for future allocations leading to potentially unbounded external fragmentation growth~\cite{Berger00}.
 Heap blowup can occur due to allocator policies that are too restrictive in reusing freed memory.
+Heap blowup can occur due to allocator policies that are too restrictive in reusing freed memory and/or no coalescing of free storage.
 Memory can become \newterm{highly fragmented} after multiple allocations and deallocations of objects.
 \VRef[Figure]{f:MemoryFragmentation} shows an example of how a small block of memory fragments as objects are allocated and deallocated over time.
 …
 For example, \VRef[Figure]{f:Contiguous} and \VRef[Figure]{f:HighlyFragmented} have the same quantity of external fragmentation, but \VRef[Figure]{f:HighlyFragmented} is highly fragmented.
 If there is a request to allocate a large object, \VRef[Figure]{f:Contiguous} is more likely to be able to satisfy it with existing free memory, while \VRef[Figure]{f:HighlyFragmented} likely has to request more memory from the operating system.
-For a single-threaded memory allocator, three basic approaches for controlling fragmentation have been identified~\cite{Johnstone99}.
-The first approach is a \newterm{sequential-fit algorithm} with one list of free objects that is searched for a block large enough to fit a requested object size.
-Different search policies determine the free object selected, \eg the first free object large enough or closest to the requested size.
-Any storage larger than the request can become spacing after the object or be split into a smaller free object.
-The cost of the search depends on the shape and quality of the free list, \eg a linear versus a binary-tree free-list, a sorted versus unsorted free-list.
 \begin{figure}
 …
 \label{f:FragmentationQuality}
 \end{figure}
+For a single-threaded memory allocator, three basic approaches for controlling fragmentation have been identified~\cite{Johnstone99}.
+The first approach is a \newterm{sequential-fit algorithm} with one list of free objects that is searched for a block large enough to fit a requested object size.
+Different search policies determine the free object selected, \eg the first free object large enough or closest to the requested size.
+Any storage larger than the request can become spacing after the object or be split into a smaller free object.
+The cost of the search depends on the shape and quality of the free list, \eg a linear versus a binary-tree free-list, a sorted versus unsorted free-list.
 The second approach is a \newterm{segregated} or \newterm{binning algorithm} with a set of lists for different sized freed objects.
 …
 When an object is deallocated it is coalesced with the objects immediately before and after it in memory, if they are free, turning them into one larger object.
 Coalescing can be done eagerly at each deallocation or lazily when an allocation cannot be fulfilled.
+In all cases, coalescing increases allocation latency, hence some allocations can cause unbounded delays during coalescing.
 While coalescing does not reduce external fragmentation, the coalesced blocks improve fragmentation quality so future allocations are less likely to cause heap blowup.
 Splitting and coalescing can be used with other algorithms to avoid highly fragmented memory.
 \subsubsection{Locality}
+\subsection{Locality}
 \label{s:Locality}
 …
 Hardware takes advantage of temporal and spatial locality through multiple levels of caching (\ie memory hierarchy).
 When an object is accessed, the memory physically located around the object is also cached with the expectation that the current and nearby objects will be referenced within a short period of time.
 For example, entire cache lines are transfered between memory and cache and entire virtual-memory pages are transferred between disk and memory.
 A program exhibiting good locality has better performance due to fewer cache misses and page faults.
+For example, entire cache lines are transferred between memory and cache and entire virtual-memory pages are transferred between disk and memory.
+A program exhibiting good locality has better performance due to fewer cache misses and page faults\footnote{With the advent of large RAM memory, paging is becoming less of an issue in modern programming.}.
 Temporal locality is largely controlled by how a program accesses its variables~\cite{Feng05}.
 …
 For temporal locality, an allocator can return storage for new allocations that was just freed as these memory locations are still \emph{warm} in the memory hierarchy.
 For spatial locality, an allocator can place objects used together close together in memory, so the working set of the program fits into the fewest possible cache lines and pages.
 However, usage patterns are different for every program as is the underlying hardware architecture (\ie memory hierarchy);
+However, usage patterns are different for every program as is the underlying hardware memory architecture;
 hence, no general-purpose memory-allocator can provide ideal locality for every program on every computer.
 …
 \subsection{Multi-Threaded Memory-Allocator}
+\section{Multi-Threaded Memory-Allocator}
 \label{s:MultiThreadedMemoryAllocator}
 A multi-threaded memory-allocator does not run any threads itself, but is used by a multi-threaded program.
 In addition to single-threaded design issues of locality and fragmentation, a multi-threaded allocator may be simultaneously accessed by multiple threads, and hence, must deal with concurrency issues such as mutual exclusion, false sharing, and additional forms of heap blowup.
 \subsubsection{Mutual Exclusion}
+In addition to single-threaded design issues of locality and fragmentation, a multi-threaded allocator may be simultaneously accessed by multiple threads, and hence, must deal with concurrency issues such as ymutual exclusion, false sharing, and additional forms of heap blowup.
+\subsection{Mutual Exclusion}
 \label{s:MutualExclusion}
 Mutual exclusion provides sequential access to the management data of the heap.
+\newterm{Mutual exclusion} provides sequential access to the shared management data of the heap.
 There are two performance issues for mutual exclusion.
 First is the overhead necessary to perform (at least) a hardware atomic operation every time a shared resource is accessed.
 …
 Contention can be reduced in a number of ways:
 using multiple fine-grained locks versus a single lock, spreading the contention across a number of locks;
 using trylock and generating new storage if the lock is busy, yielding a space vs contention trade-off;
+using trylock and generating new storage if the lock is busy, yielding a classic space versus time tradeoff;
 using one of the many lock-free approaches for reducing contention on basic data-structure operations~\cite{Oyama99}.
 However, all of these approaches have degenerate cases where contention occurs.
 \subsubsection{False Sharing}
+\subsection{False Sharing}
 \label{s:FalseSharing}
 False sharing is a dynamic phenomenon leading to cache thrashing.
 When two or more threads on separate CPUs simultaneously change different objects sharing a cache line, the change invalidates the other thread's associated cache, even though these threads may be uninterested in the modified object.
+When two or more threads on separate CPUs simultaneously change different objects sharing a cache line, the change invalidates the other thread's associated cache, even though these threads may be uninterested in the other modified object.
 False sharing can occur in three different ways: program induced, allocator-induced active, and allocator-induced passive;
 a memory allocator can only affect the latter two.
 \newterm{Program-induced false-sharing} occurs when one thread passes an object sharing a cache line to another thread, and both threads modify the respective objects.
 For example, in \VRef[Figure]{f:ProgramInducedFalseSharing}, when Task$_1$ passes Object$_2$ to Task$_2$, a false-sharing situation forms when Task$_1$ modifies Object$_1$ and Task$_2$ modifies Object$_2$.
+\paragraph{\newterm{Program-induced false-sharing}} occurs when one thread passes an object sharing a cache line to another thread, and both threads modify the respective objects.
+\VRef[Figure]{f:ProgramInducedFalseSharing} shows when Task$_1$ passes Object$_2$ to Task$_2$, a false-sharing situation forms when Task$_1$ modifies Object$_1$ and Task$_2$ modifies Object$_2$.
 Changes to Object$_1$ invalidate CPU$_2$'s cache line, and changes to Object$_2$ invalidate CPU$_1$'s cache line.
 …
 \end{figure}
 \newterm{Allocator-induced active false-sharing} occurs when objects are allocated within the same cache line but to different threads.
+\paragraph{\newterm{Allocator-induced active false-sharing}} occurs when objects are allocated within the same cache line but to different threads.
 For example, in \VRef[Figure]{f:AllocatorInducedActiveFalseSharing}, each task allocates an object and loads a cache-line of memory into its associated cache.
 Again, changes to Object$_1$ invalidate CPU$_2$'s cache line, and changes to Object$_2$ invalidate CPU$_1$'s cache line.
 \newterm{Allocator-induced passive false-sharing} is another form of allocator-induced false-sharing caused by program-induced false-sharing.
+\paragraph{\newterm{Allocator-induced passive false-sharing}} is another form of allocator-induced false-sharing caused by program-induced false-sharing.
 When an object in a program-induced false-sharing situation is deallocated, a future allocation of that object may cause passive false-sharing.
 For example, in \VRef[Figure]{f:AllocatorInducedPassiveFalseSharing}, Task$_1$ passes Object$_2$ to Task$_2$, and Task$_2$ subsequently deallocates Object$_2$.
 …
 \subsubsection{Heap Blowup}
+\subsection{Heap Blowup}
 \label{s:HeapBlowup}
 …
 \subsection{Multiple Heaps}
+\section{Multiple Heaps}
 \label{s:MultipleHeaps}
 …
 \subsubsection{Ownership}
+\subsection{Ownership}
 \label{s:Ownership}
 …
 \subsection{Object Containers}
+\section{Object Containers}
 \label{s:ObjectContainers}
 …
 \subsubsection{Container Ownership}
+\subsection{Container Ownership}
 \label{s:ContainerOwnership}
 …
 \subsubsection{Container Size}
+\subsection{Container Size}
 \label{s:ContainerSize}
 …
 \subsubsection{Container Free-Lists}
+\subsection{Container Free-Lists}
 \label{s:containersfreelists}
 …
 \subsection{Allocation Buffer}
+\section{Allocation Buffer}
 \label{s:AllocationBuffer}
 …
 \subsection{Lock-Free Operations}
+\section{Lock-Free Operations}
 \label{s:LockFreeOperations}
 …
 ====================
-\section{Background}
 % FIXME: cite wasik
 \cite{wasik.thesis}
 \subsection{Memory Allocation}
 With dynamic allocation being an important feature of C, there are many standalone memory allocators that have been designed for different purposes. For this thesis, we chose 7 of the most popular and widely used memory allocators.
+\section{Existing Memory Allocators}
+With dynamic allocation being an important feature of C, there are many stand-alone memory allocators that have been designed for different purposes. For this thesis, we chose 7 of the most popular and widely used memory allocators.
 \paragraph{dlmalloc}
 …
 \paragraph{hoard}
 Hoard (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and using a heap layer framework. It has per-thred heaps that have thread-local free-lists, and a gloabl shared heap. (FIX ME: cite wasik)
+Hoard (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and using a heap layer framework. It has per-thread heaps that have thread-local free-lists, and a global shared heap. (FIX ME: cite wasik)
 \paragraph{jemalloc}
 …
 \paragraph{rpmalloc}
 rpmalloc (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and uses per-thread heap. Each heap has multiple size-classes and each size-calss contains memory regions of the relevant size.
+rpmalloc (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and uses per-thread heap. Each heap has multiple size-classes and each size-class contains memory regions of the relevant size.
 \paragraph{tbb malloc}
 …
 \paragraph{tc malloc}
 tcmalloc (FIX ME: cite allocator) is a thread-safe allocator. It uses per-thread cache to store free objects that prevents contention on shared resources in multi-threaded application. A central free-list is used to refill per-thread cache when it gets empty.
-\subsection{Benchmarks}
-There are multiple benchmarks that are built individually and evaluate different aspects of a memory allocator. But, there is not standard set of benchamrks that can be used to evaluate multiple aspects of memory allocators.
-\paragraph{threadtest}
-(FIX ME: cite benchmark and hoard) Each thread repeatedly allocates and then deallocates 100,000 objects. Runtime of the benchmark evaluates its efficiency.
-\paragraph{shbench}
-(FIX ME: cite benchmark and hoard) Each thread allocates and randomly frees a number of random-sized objects. It is a stress test that also uses runtime to determine efficiency of the allocator.
-\paragraph{larson}
-(FIX ME: cite benchmark and hoard) Larson simulates a server environment. Multiple threads are created where each thread allocator and free a number of objects within a size range. Some objects are passed from threads to the child threads to free. It caluculates memory operations per second as an indicator of memory allocator's performance.

doc/theses/mubeen_zulfiqar_MMath/benchmarks.tex

-              rf53afafb
+              r3a038fa
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Benchmarks}
+There are multiple benchmarks that are built individually and evaluate different aspects of a memory allocator. But, there is not standard set of benchamrks that can be used to evaluate multiple aspects of memory allocators.
+\paragraph{threadtest}
+(FIX ME: cite benchmark and hoard) Each thread repeatedly allocates and then deallocates 100,000 objects. Runtime of the benchmark evaluates its efficiency.
+\paragraph{shbench}
+(FIX ME: cite benchmark and hoard) Each thread allocates and randomly frees a number of random-sized objects. It is a stress test that also uses runtime to determine efficiency of the allocator.
+\paragraph{larson}
+(FIX ME: cite benchmark and hoard) Larson simulates a server environment. Multiple threads are created where each thread allocator and free a number of objects within a size range. Some objects are passed from threads to the child threads to free. It caluculates memory operations per second as an indicator of memory allocator's performance.
 \section{Performance Matrices of Memory Allocators}

doc/theses/mubeen_zulfiqar_MMath/figures/AllocatorComponents.fig

-              rf53afafb
+              r3a038fa
 #FIG 3.2  Produced by xfig version 3.2.5
+#FIG 3.2  Produced by xfig version 3.2.7b
 Landscape
 Center
 Inches
 Letter
+Letter
 .00
 Single
 -2
 2
-2325 2775 3000 3150
-2 0 50 -1 0 11 0.0000 2 135 660 3000 2925 reserved\001
-2 0 50 -1 0 11 0.0000 2 135 600 3000 3075 memory\001
--6
 2400 1575 3000 1950
 2 0 50 -1 0 11 0.0000 2 180 555 3000 1875 objects\001
 2 0 50 -1 0 11 0.0000 2 135 300 3000 1725 free\001
+2 0 50 -1 0 11 0.0000 2 165 495 3000 1875 objects\001
+2 0 50 -1 0 11 0.0000 2 120 270 3000 1725 free\001
 -6
 2400 2100 2700 2700
 …
 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
 2100 2700 2400 2400 2400 2400 2100 2700 2100
+-6
+2325 2850 3000 3225
+2 0 50 -1 0 11 0.0000 2 135 600 3000 3000 reserved\001
+2 0 50 -1 0 11 0.0000 2 135 585 3000 3150 memory\001
 -6
 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
 …
 2 0 1 0 7 60 -1 13 0.000 0 0 -1 0 0 5
 2700 6300 2700 6300 3300 3300 3300 3300 2700
-2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
-3300 3600 3300 3600 3600 3300 3600 3300 3300
-2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
-3300 5100 3300 5100 3600 4500 3600 4500 3300
-2 0 1 0 7 50 -1 17 0.000 0 0 -1 0 0 5
-3300 6300 3300 6300 3600 5100 3600 5100 3300
-2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
-3600 4500 3600 4500 3900 3300 3900 3300 3600
-2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
-3600 5400 3600 5400 3900 4500 3900 4500 3600
-2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
-3600 6300 3600 6300 3900 5400 3900 5400 3600
-2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
-3900 3900 3900 3900 4200 3300 4200 3300 3900
-2 0 1 0 7 50 -1 17 0.000 0 0 -1 0 0 5
-3900 5700 3900 5700 4200 5100 4200 5100 3900
-2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
-3900 5100 3900 5100 4200 3900 4200 3900 3900
-2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
-3900 6300 3900 6300 4200 5700 4200 5700 3900
 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
 1 1.00 45.00 90.00
 …
 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
 1 1.00 45.00 90.00
-1950 5100 3300
-1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
-1 1.00 45.00 90.00
-3450 5100 3900
-1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
-1 1.00 45.00 90.00
 2250 3300 1800
 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
 …
 2550 3300 2700
 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
 1275 3150 4200
 1 0 50 -1 0 11 0.0000 2 135 885 2325 1425 Static Zone\001
 1 0 50 -1 0 11 0.0000 2 180 1950 4800 1425 Dynamic-Allocation Zone\001
 0 0 50 -1 2 11 0.0000 2 180 645 3300 1725 Storage\001
 2 0 50 -1 2 11 0.0000 2 180 1050 2325 2475 Management\001
+1275 3150 3300
+1 0 50 -1 0 11 0.0000 2 120 795 2325 1425 Static Zone\001
+1 0 50 -1 0 11 0.0000 2 165 1845 4800 1425 Dynamic-Allocation Zone\001
+0 0 50 -1 2 11 0.0000 2 165 585 3300 1725 Storage\001
+2 0 50 -1 2 11 0.0000 2 165 1005 2325 2475 Management\001

doc/theses/mubeen_zulfiqar_MMath/intro.tex

-              rf53afafb
+              r3a038fa
 \chapter{Introduction}
-\section{Introduction}
 % Shared-memory multi-processor computers are ubiquitous and important for improving application performance.
 …
 % Therefore, providing high-performance, scalable memory-management is important for virtually all shared-memory multi-threaded programs.
+\vspace*{-23pt}
 Memory management takes a sequence of program generated allocation/deallocation requests and attempts to satisfy them within a fixed-sized block of memory while minimizing the total amount of memory used.
 A general-purpose dynamic-allocation algorithm cannot anticipate future allocation requests so its output is rarely optimal.
 …
 \subsection{Memory Structure}
+\section{Memory Structure}
 \label{s:MemoryStructure}
 …
 Dynamic code/data memory is managed by the dynamic loader for libraries loaded at runtime, which is complex especially in a multi-threaded program~\cite{Huang06}.
 However, changes to the dynamic code/data space are typically infrequent, many occurring at program startup, and are largely outside of a program's control.
 Stack memory is managed by the program call-mechanism using simple LIFO management, which works well for sequential programs.
+Stack memory is managed by the program call-mechanism using a simple LIFO technique, which works well for sequential programs.
 For multi-threaded programs (and coroutines), a new stack is created for each thread;
 these thread stacks are commonly created in dynamic-allocation memory.
 …
 \subsection{Dynamic Memory-Management}
+\section{Dynamic Memory-Management}
 \label{s:DynamicMemoryManagement}
 Modern programming languages manage dynamic-allocation memory in different ways.
 Some languages, such as Lisp~\cite{CommonLisp}, Java~\cite{Java}, Go~\cite{Go}, Haskell~\cite{Haskell}, provide explicit allocation but \emph{implicit} deallocation of data through garbage collection~\cite{Wilson92}.
+Some languages, such as Lisp~\cite{CommonLisp}, Java~\cite{Java}, Haskell~\cite{Haskell}, Go~\cite{Go}, provide explicit allocation but \emph{implicit} deallocation of data through garbage collection~\cite{Wilson92}.
 In general, garbage collection supports memory compaction, where dynamic (live) data is moved during runtime to better utilize space.
 However, moving data requires finding pointers to it and updating them to reflect new data locations.
 …
 However, high-performance memory-allocators for kernel and user multi-threaded programs are still being designed and improved.
 For this reason, several alternative general-purpose allocators have been written for C/\CC with the goal of scaling in a multi-threaded program~\cite{Berger00,mtmalloc,streamflow,tcmalloc}.
 This work examines the design of high-performance allocators for use by kernel and user multi-threaded applications written in C/\CC.
 \subsection{Contributions}
+This thesis examines the design of high-performance allocators for use by kernel and user multi-threaded applications written in C/\CC.
+\section{Contributions}
 \label{s:Contributions}
 This work provides the following contributions in the area of concurrent dynamic allocation:
+\begin{enumerate}
+\item
+Implementation of a new stand-lone concurrent memory allocator ($\approx$1,200 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions of the allocator for programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading).
+\item
+Adopt the return of @nullptr@ for a zero-sized allocation, rather than an actual memory address, both of which can be passed to @free@.
+Most allocators use @nullptr@ to indicate an allocation failure, such as full memory;
+hence the need to return an alternate value for a zero-sized allocation.
+The alternative is to abort the program on allocation failure.
+In theory, notifying the programmer of a failure allows recovery;
+in practice, it is almost impossible to gracefully recover from allocation failure, especially full memory, so adopting the cheaper return @nullptr@ for a zero-sized allocation is chosen.
+\item
+Extended the standard C heap functionality by preserving with each allocation its original request size versus the amount allocated due to bucketing, if an allocation is zero fill, and the allocation alignment.
+\begin{enumerate}[leftmargin=*]
+\item
+Implementation of a new stand-lone concurrent low-latency memory-allocator ($\approx$1,200 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions of the allocator for programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading).
+\item
+Adopt returning of @nullptr@ for a zero-sized allocation, rather than an actual memory address, both of which can be passed to @free@.
+\item
+Extended the standard C heap functionality by preserving with each allocation its original request size versus the amount allocated, if an allocation is zero fill, and the allocation alignment.
 \item
 …
 \item
+Provide complete and fast allocation statistics to help understand program behaviour:
+Provide mostly contention-free allocation and free operations via a heap-per-kernel-thread implementation.
+\item
+Provide complete, fast, and contention-free allocation statistics to help understand program behaviour:
 \begin{itemize}
 \item
 …
 \item
+Provide mostly contention-free allocation and free operations via a heap-per-kernel-thread implementation.
+\item
+Provide extensive contention-free runtime checks to valid allocation operations and identify the amount of unfreed storage at program termination.
+Provide extensive runtime checks to valid allocation operations and identify the amount of unfreed storage at program termination.
 \item

Note: See TracChangeset for help on using the changeset viewer.

Download in other formats: