Timestamp:
Feb 23, 2022, 10:31:12 AM (3 years ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
ADT, ast-experimental, enum, master, pthread-emulation, qualifiedEnum
Children:
cc7bbe6
Parents:
f53afafb
Message:

more updates to added text

Location:
doc/theses/mubeen_zulfiqar_MMath
Files:
5 edited

  • doc/theses/mubeen_zulfiqar_MMath/allocator.tex

    rf53afafb r3a038fa  
    118118We added a few more features and routines to the allocator's C interface that can make the allocator more usable to programmers. These features give the programmer more control over dynamic memory allocation.
    119119
     120\subsection{Out of Memory}
     121
     122Most allocators use @nullptr@ to indicate an allocation failure, specifically out of memory;
     123hence the need to return an alternate value for a zero-sized allocation.
     124The alternative is to abort a program when out of memory.
     125In theory, notifying the programmer allows recovery;
     126in practice, it is almost impossible to recover gracefully when out of memory, so the cheaper approach of returning @nullptr@ for a zero-sized allocation is chosen.
     127
     128
    120129\subsection{\lstinline{void * aalloc( size_t dim, size_t elemSize )}}
    121130@aalloc@ is an extension of @malloc@. It allows the programmer to allocate a dynamic array of objects without explicitly calculating the total array size. The only alternative to this routine in other allocators is @calloc@, but @calloc@ also fills the dynamic memory with 0, which makes it slower for a programmer who only wants to allocate an array of objects without zero-filling it.
  • doc/theses/mubeen_zulfiqar_MMath/background.tex

    rf53afafb r3a038fa  
    11\chapter{Background}
    22
    3 
    4 
    5 \section{Memory-Allocator Background}
    6 \label{s:MemoryAllocatorBackground}
    73
    84A program dynamically allocates and deallocates the storage for a variable, referred to as an \newterm{object}, through calls such as @malloc@ and @free@ in C, and @new@ and @delete@ in \CC.
    95Space for each allocated object comes from the dynamic-allocation zone.
    10 A \newterm{memory allocator} is a complex data-structure and code that manages the layout of objects in the dynamic-allocation zone.
     6A \newterm{memory allocator} contains a complex data-structure and code that manages the layout of objects in the dynamic-allocation zone.
    117The management goals are to make allocation/deallocation operations as fast as possible while densely packing objects to make efficient use of memory.
    12 Objects cannot be moved to aid the packing process.
     8Objects in C/\CC cannot be moved to aid the packing process, only adjacent free storage can be \newterm{coalesced} into larger free areas.
    139The allocator grows or shrinks the dynamic-allocation zone to obtain storage for objects and reduce memory usage via operating-system calls, such as @mmap@ or @sbrk@ in UNIX.
    1410
    1511
    16 \subsection{Allocator Components}
     12\section{Allocator Components}
    1713\label{s:AllocatorComponents}
    1814
    19 There are two important parts to a memory allocator, management and storage data (see \VRef[Figure]{f:AllocatorComponents}), collectively called the \newterm{heap}.
     15\VRef[Figure]{f:AllocatorComponents} shows the two important data components for a memory allocator, management and storage, collectively called the \newterm{heap}.
    2016The \newterm{management data} is a data structure located at a known memory address and contains all information necessary to manage the storage data.
    2117The management data starts with fixed-sized information in the static-data memory that flows into the dynamic-allocation memory.
    22 The \newterm{storage data} is composed of allocated and freed objects, and reserved memory.
    23 Allocated objects (white) are variable sized and allocated to and maintained by the program.
    24 Freed objects (light grey) are memory deallocated by the program that is linked to form a list facilitating easy location of storage for new allocations.
     18The \newterm{storage data} is composed of allocated and freed objects, and \newterm{reserved memory}.
     19Allocated objects (white) are variable sized, and allocated and maintained by the program;
     20\ie only the program knows the location of allocated storage, not the memory allocator.
     21\begin{figure}[h]
     22\centering
     23\input{AllocatorComponents}
     24\caption{Allocator Components (Heap)}
     25\label{f:AllocatorComponents}
     26\end{figure}
     27Freed objects (light grey) are memory deallocated by the program, which are linked into one or more lists facilitating easy location for new allocations.
    2528Often the free list is chained internally so it does not consume additional storage, \ie the link fields are placed at known locations in the unused memory blocks.
    2629Reserved memory (dark grey) is one or more blocks of memory obtained from the operating system but not yet allocated to the program;
    2730if there are multiple reserved blocks, they are also chained together, usually internally.
    2831
    29 \begin{figure}
    30 \centering
    31 \input{AllocatorComponents}
    32 \caption{Allocator Components (Heap)}
    33 \label{f:AllocatorComponents}
    34 \end{figure}
    35 
    3632Allocated and freed objects typically have additional management data embedded within them.
    37 \VRef[Figure]{f:AllocatedObject} shows an allocated object with a header, trailer, and padding/spacing around the object.
     33\VRef[Figure]{f:AllocatedObject} shows an allocated object with a header, trailer, and alignment padding and spacing around the object.
    3834The header contains information about the object, \eg size, type, etc.
    3935The trailer may be used to simplify an allocation implementation, \eg coalescing, and/or for security purposes to mark the end of an object.
     
    5349
    5450
    55 \subsection{Single-Threaded Memory-Allocator}
     51\section{Single-Threaded Memory-Allocator}
    5652\label{s:SingleThreadedMemoryAllocator}
    5753
     
    6157
    6258
    63 \subsubsection{Fragmentation}
     59\subsection{Fragmentation}
    6460\label{s:Fragmentation}
    6561
    6662Fragmentation is memory requested from the operating system but not used by the program;
    6763hence, allocated objects are not fragmentation.
    68 Fragmentation is often divided into internal or external (see~\VRef[Figure]{f:InternalExternalFragmentation}).
      64\VRef[Figure]{f:InternalExternalFragmentation} shows fragmentation is divided into two forms: internal and external.
    6965
    7066\begin{figure}
     
    7672
    7773\newterm{Internal fragmentation} is memory space that is allocated to the program, but is not intended to be accessed by the program, such as headers, trailers, padding, and spacing around an allocated object.
    78 This memory is typically used by the allocator for management purposes or required by the architecture for correctness (\eg alignment).
     74This memory is typically used by the allocator for management purposes or required by the architecture for correctness, \eg alignment.
    7975Internal fragmentation is problematic when management space is a significant proportion of an allocated object.
    8076For example, if internal fragmentation is as large as the object being managed, then the memory usage for that object is doubled.
     
    8480This memory is problematic in two ways: heap blowup and highly fragmented memory.
    8581\newterm{Heap blowup} occurs when memory freed by the program is not reused for future allocations leading to potentially unbounded external fragmentation growth~\cite{Berger00}.
    86 Heap blowup can occur due to allocator policies that are too restrictive in reusing freed memory.
     82Heap blowup can occur due to allocator policies that are too restrictive in reusing freed memory and/or no coalescing of free storage.
    8783Memory can become \newterm{highly fragmented} after multiple allocations and deallocations of objects.
    8884\VRef[Figure]{f:MemoryFragmentation} shows an example of how a small block of memory fragments as objects are allocated and deallocated over time.
     
    9187For example, \VRef[Figure]{f:Contiguous} and \VRef[Figure]{f:HighlyFragmented} have the same quantity of external fragmentation, but \VRef[Figure]{f:HighlyFragmented} is highly fragmented.
    9288If there is a request to allocate a large object, \VRef[Figure]{f:Contiguous} is more likely to be able to satisfy it with existing free memory, while \VRef[Figure]{f:HighlyFragmented} likely has to request more memory from the operating system.
    93 
    94 For a single-threaded memory allocator, three basic approaches for controlling fragmentation have been identified~\cite{Johnstone99}.
    95 The first approach is a \newterm{sequential-fit algorithm} with one list of free objects that is searched for a block large enough to fit a requested object size.
    96 Different search policies determine the free object selected, \eg the first free object large enough or closest to the requested size.
    97 Any storage larger than the request can become spacing after the object or be split into a smaller free object.
    98 The cost of the search depends on the shape and quality of the free list, \eg a linear versus a binary-tree free-list, a sorted versus unsorted free-list.
    9989
    10090\begin{figure}
     
    115105\label{f:FragmentationQuality}
    116106\end{figure}
     107
     108For a single-threaded memory allocator, three basic approaches for controlling fragmentation have been identified~\cite{Johnstone99}.
     109The first approach is a \newterm{sequential-fit algorithm} with one list of free objects that is searched for a block large enough to fit a requested object size.
     110Different search policies determine the free object selected, \eg the first free object large enough or closest to the requested size.
     111Any storage larger than the request can become spacing after the object or be split into a smaller free object.
     112The cost of the search depends on the shape and quality of the free list, \eg a linear versus a binary-tree free-list, a sorted versus unsorted free-list.
    117113
    118114The second approach is a \newterm{segregated} or \newterm{binning algorithm} with a set of lists for different sized freed objects.
     
    131127When an object is deallocated it is coalesced with the objects immediately before and after it in memory, if they are free, turning them into one larger object.
    132128Coalescing can be done eagerly at each deallocation or lazily when an allocation cannot be fulfilled.
      129In all cases, coalescing increases allocation latency; hence, some allocations can experience unbounded delays during coalescing.
    133130While coalescing does not reduce external fragmentation, the coalesced blocks improve fragmentation quality so future allocations are less likely to cause heap blowup.
    134131Splitting and coalescing can be used with other algorithms to avoid highly fragmented memory.
    135132
    136133
    137 \subsubsection{Locality}
     134\subsection{Locality}
    138135\label{s:Locality}
    139136
     
    144141Hardware takes advantage of temporal and spatial locality through multiple levels of caching (\ie memory hierarchy).
    145142When an object is accessed, the memory physically located around the object is also cached with the expectation that the current and nearby objects will be referenced within a short period of time.
    146 For example, entire cache lines are transfered between memory and cache and entire virtual-memory pages are transferred between disk and memory.
    147 A program exhibiting good locality has better performance due to fewer cache misses and page faults.
     143For example, entire cache lines are transferred between memory and cache and entire virtual-memory pages are transferred between disk and memory.
     144A program exhibiting good locality has better performance due to fewer cache misses and page faults\footnote{With the advent of large RAM memory, paging is becoming less of an issue in modern programming.}.
    148145
    149146Temporal locality is largely controlled by how a program accesses its variables~\cite{Feng05}.
     
    151148For temporal locality, an allocator can return storage for new allocations that was just freed as these memory locations are still \emph{warm} in the memory hierarchy.
    152149For spatial locality, an allocator can place objects used together close together in memory, so the working set of the program fits into the fewest possible cache lines and pages.
    153 However, usage patterns are different for every program as is the underlying hardware architecture (\ie memory hierarchy);
     150However, usage patterns are different for every program as is the underlying hardware memory architecture;
    154151hence, no general-purpose memory-allocator can provide ideal locality for every program on every computer.
    155152
     
    161158
    162159
    163 \subsection{Multi-Threaded Memory-Allocator}
     160\section{Multi-Threaded Memory-Allocator}
    164161\label{s:MultiThreadedMemoryAllocator}
    165162
    166163A multi-threaded memory-allocator does not run any threads itself, but is used by a multi-threaded program.
    167 In addition to single-threaded design issues of locality and fragmentation, a multi-threaded allocator may be simultaneously accessed by multiple threads, and hence, must deal with concurrency issues such as mutual exclusion, false sharing, and additional forms of heap blowup.
    168 
    169 
    170 \subsubsection{Mutual Exclusion}
      164In addition to single-threaded design issues of locality and fragmentation, a multi-threaded allocator may be simultaneously accessed by multiple threads, and hence, must deal with concurrency issues such as mutual exclusion, false sharing, and additional forms of heap blowup.
     165
     166
     167\subsection{Mutual Exclusion}
    171168\label{s:MutualExclusion}
    172169
    173 Mutual exclusion provides sequential access to the management data of the heap.
     170\newterm{Mutual exclusion} provides sequential access to the shared management data of the heap.
    174171There are two performance issues for mutual exclusion.
    175172First is the overhead necessary to perform (at least) a hardware atomic operation every time a shared resource is accessed.
     
    177174Contention can be reduced in a number of ways:
    178175using multiple fine-grained locks versus a single lock, spreading the contention across a number of locks;
    179 using trylock and generating new storage if the lock is busy, yielding a space vs contention trade-off;
     176using trylock and generating new storage if the lock is busy, yielding a classic space versus time tradeoff;
    180177using one of the many lock-free approaches for reducing contention on basic data-structure operations~\cite{Oyama99}.
    181178However, all of these approaches have degenerate cases where contention occurs.
    182179
    183180
    184 \subsubsection{False Sharing}
     181\subsection{False Sharing}
    185182\label{s:FalseSharing}
    186183
    187184False sharing is a dynamic phenomenon leading to cache thrashing.
    188 When two or more threads on separate CPUs simultaneously change different objects sharing a cache line, the change invalidates the other thread's associated cache, even though these threads may be uninterested in the modified object.
     185When two or more threads on separate CPUs simultaneously change different objects sharing a cache line, the change invalidates the other thread's associated cache, even though these threads may be uninterested in the other modified object.
    189186False sharing can occur in three different ways: program induced, allocator-induced active, and allocator-induced passive;
    190187a memory allocator can only affect the latter two.
    191188
    192 \newterm{Program-induced false-sharing} occurs when one thread passes an object sharing a cache line to another thread, and both threads modify the respective objects.
    193 For example, in \VRef[Figure]{f:ProgramInducedFalseSharing}, when Task$_1$ passes Object$_2$ to Task$_2$, a false-sharing situation forms when Task$_1$ modifies Object$_1$ and Task$_2$ modifies Object$_2$.
     189\paragraph{\newterm{Program-induced false-sharing}} occurs when one thread passes an object sharing a cache line to another thread, and both threads modify the respective objects.
      190\VRef[Figure]{f:ProgramInducedFalseSharing} shows that when Task$_1$ passes Object$_2$ to Task$_2$, a false-sharing situation forms when Task$_1$ modifies Object$_1$ and Task$_2$ modifies Object$_2$.
    194191Changes to Object$_1$ invalidate CPU$_2$'s cache line, and changes to Object$_2$ invalidate CPU$_1$'s cache line.
    195192
     
    214211\end{figure}
    215212
    216 \newterm{Allocator-induced active false-sharing} occurs when objects are allocated within the same cache line but to different threads.
     213\paragraph{\newterm{Allocator-induced active false-sharing}} occurs when objects are allocated within the same cache line but to different threads.
    217214For example, in \VRef[Figure]{f:AllocatorInducedActiveFalseSharing}, each task allocates an object and loads a cache-line of memory into its associated cache.
    218215Again, changes to Object$_1$ invalidate CPU$_2$'s cache line, and changes to Object$_2$ invalidate CPU$_1$'s cache line.
    219216
    220 \newterm{Allocator-induced passive false-sharing} is another form of allocator-induced false-sharing caused by program-induced false-sharing.
     217\paragraph{\newterm{Allocator-induced passive false-sharing}} is another form of allocator-induced false-sharing caused by program-induced false-sharing.
    221218When an object in a program-induced false-sharing situation is deallocated, a future allocation of that object may cause passive false-sharing.
    222219For example, in \VRef[Figure]{f:AllocatorInducedPassiveFalseSharing}, Task$_1$ passes Object$_2$ to Task$_2$, and Task$_2$ subsequently deallocates Object$_2$.
     
    224221
    225222
    226 \subsubsection{Heap Blowup}
     223\subsection{Heap Blowup}
    227224\label{s:HeapBlowup}
    228225
     
    256253
    257254
    258 \subsection{Multiple Heaps}
     255\section{Multiple Heaps}
    259256\label{s:MultipleHeaps}
    260257
     
    373370
    374371
    375 \subsubsection{Ownership}
     372\subsection{Ownership}
    376373\label{s:Ownership}
    377374
     
    428425
    429426
    430 \subsection{Object Containers}
     427\section{Object Containers}
    431428\label{s:ObjectContainers}
    432429
     
    490487
    491488
    492 \subsubsection{Container Ownership}
     489\subsection{Container Ownership}
    493490\label{s:ContainerOwnership}
    494491
     
    562559
    563560
    564 \subsubsection{Container Size}
     561\subsection{Container Size}
    565562\label{s:ContainerSize}
    566563
     
    602599
    603600
    604 \subsubsection{Container Free-Lists}
     601\subsection{Container Free-Lists}
    605602\label{s:containersfreelists}
    606603
     
    676673
    677674
    678 \subsection{Allocation Buffer}
     675\section{Allocation Buffer}
    679676\label{s:AllocationBuffer}
    680677
     
    703700
    704701
    705 \subsection{Lock-Free Operations}
     702\section{Lock-Free Operations}
    706703\label{s:LockFreeOperations}
    707704
     
    741738====================
    742739
    743 \section{Background}
    744 
    745740% FIXME: cite wasik
    746741\cite{wasik.thesis}
    747742
    748 \subsection{Memory Allocation}
    749 With dynamic allocation being an important feature of C, there are many standalone memory allocators that have been designed for different purposes. For this thesis, we chose 7 of the most popular and widely used memory allocators.
     743\section{Existing Memory Allocators}
     744With dynamic allocation being an important feature of C, there are many stand-alone memory allocators that have been designed for different purposes. For this thesis, we chose 7 of the most popular and widely used memory allocators.
    750745
    751746\paragraph{dlmalloc}
     
    753748
    754749\paragraph{hoard}
    755 Hoard (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and using a heap layer framework. It has per-thred heaps that have thread-local free-lists, and a gloabl shared heap. (FIX ME: cite wasik)
      750Hoard (FIX ME: cite allocator) is a thread-safe multi-threaded allocator built using a heap-layer framework. It has per-thread heaps with thread-local free-lists, and a global shared heap. (FIX ME: cite wasik)
    756751
    757752\paragraph{jemalloc}
     
    762757
    763758\paragraph{rpmalloc}
    764 rpmalloc (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and uses per-thread heap. Each heap has multiple size-classes and each size-calss contains memory regions of the relevant size.
      759rpmalloc (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and uses per-thread heaps. Each heap has multiple size-classes and each size-class contains memory regions of the relevant size.
    765760
    766761\paragraph{tbb malloc}
     
    769764\paragraph{tc malloc}
    770765tcmalloc (FIX ME: cite allocator) is a thread-safe allocator. It uses a per-thread cache to store free objects, which prevents contention on shared resources in multi-threaded applications. A central free-list refills the per-thread cache when it becomes empty.
    771 
    772 \subsection{Benchmarks}
    773 There are multiple benchmarks that are built individually and evaluate different aspects of a memory allocator. But, there is not standard set of benchamrks that can be used to evaluate multiple aspects of memory allocators.
    774 
    775 \paragraph{threadtest}
    776 (FIX ME: cite benchmark and hoard) Each thread repeatedly allocates and then deallocates 100,000 objects. Runtime of the benchmark evaluates its efficiency.
    777 
    778 \paragraph{shbench}
    779 (FIX ME: cite benchmark and hoard) Each thread allocates and randomly frees a number of random-sized objects. It is a stress test that also uses runtime to determine efficiency of the allocator.
    780 
    781 \paragraph{larson}
    782 (FIX ME: cite benchmark and hoard) Larson simulates a server environment. Multiple threads are created where each thread allocator and free a number of objects within a size range. Some objects are passed from threads to the child threads to free. It caluculates memory operations per second as an indicator of memory allocator's performance.
  • doc/theses/mubeen_zulfiqar_MMath/benchmarks.tex

    rf53afafb r3a038fa  
    4141%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    4242
     43
     44\section{Benchmarks}
      45There are multiple benchmarks that are built individually and evaluate different aspects of a memory allocator. However, there is no standard set of benchmarks that can be used to evaluate multiple aspects of memory allocators.
     46
     47\paragraph{threadtest}
      48(FIX ME: cite benchmark and hoard) Each thread repeatedly allocates and then deallocates 100,000 objects. The runtime of the benchmark evaluates the allocator's efficiency.
     49
     50\paragraph{shbench}
      51(FIX ME: cite benchmark and hoard) Each thread allocates and randomly frees a number of random-sized objects. It is a stress test that also uses runtime to determine the efficiency of the allocator.
     52
     53\paragraph{larson}
      54(FIX ME: cite benchmark and hoard) Larson simulates a server environment. Multiple threads are created, where each thread allocates and frees a number of objects within a size range. Some objects are passed from threads to child threads to be freed. It calculates memory operations per second as an indicator of the memory allocator's performance.
     55
     56
    4357\section{Performance Matrices of Memory Allocators}
    4458
  • doc/theses/mubeen_zulfiqar_MMath/figures/AllocatorComponents.fig

    rf53afafb r3a038fa  
    1 #FIG 3.2  Produced by xfig version 3.2.5
     1#FIG 3.2  Produced by xfig version 3.2.7b
    22Landscape
    33Center
    44Inches
    5 Letter 
     5Letter
    66100.00
    77Single
    88-2
    991200 2
    10 6 2325 2775 3000 3150
    11 4 2 0 50 -1 0 11 0.0000 2 135 660 3000 2925 reserved\001
    12 4 2 0 50 -1 0 11 0.0000 2 135 600 3000 3075 memory\001
    13 -6
    14106 2400 1575 3000 1950
    15 4 2 0 50 -1 0 11 0.0000 2 180 555 3000 1875 objects\001
    16 4 2 0 50 -1 0 11 0.0000 2 135 300 3000 1725 free\001
     114 2 0 50 -1 0 11 0.0000 2 165 495 3000 1875 objects\001
     124 2 0 50 -1 0 11 0.0000 2 120 270 3000 1725 free\001
    1713-6
    18146 2400 2100 2700 2700
     
    21172 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
    2218         2700 2100 2700 2400 2400 2400 2400 2100 2700 2100
     19-6
     206 2325 2850 3000 3225
     214 2 0 50 -1 0 11 0.0000 2 135 600 3000 3000 reserved\001
     224 2 0 50 -1 0 11 0.0000 2 135 585 3000 3150 memory\001
    2323-6
    24242 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
     
    46462 2 0 1 0 7 60 -1 13 0.000 0 0 -1 0 0 5
    4747         3300 2700 6300 2700 6300 3300 3300 3300 3300 2700
    48 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
    49          3300 3300 3600 3300 3600 3600 3300 3600 3300 3300
    50 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
    51          4500 3300 5100 3300 5100 3600 4500 3600 4500 3300
    52 2 2 0 1 0 7 50 -1 17 0.000 0 0 -1 0 0 5
    53          5100 3300 6300 3300 6300 3600 5100 3600 5100 3300
    54 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
    55          3300 3600 4500 3600 4500 3900 3300 3900 3300 3600
    56 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
    57          4500 3600 5400 3600 5400 3900 4500 3900 4500 3600
    58 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
    59          5400 3600 6300 3600 6300 3900 5400 3900 5400 3600
    60 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
    61          3300 3900 3900 3900 3900 4200 3300 4200 3300 3900
    62 2 2 0 1 0 7 50 -1 17 0.000 0 0 -1 0 0 5
    63          5100 3900 5700 3900 5700 4200 5100 4200 5100 3900
    64 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
    65          3900 3900 5100 3900 5100 4200 3900 4200 3900 3900
    66 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
    67          5700 3900 6300 3900 6300 4200 5700 4200 5700 3900
    68482 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
    6949        1 1 1.00 45.00 90.00
     
    80602 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
    8161        1 1 1.00 45.00 90.00
    82          5850 1950 5100 3300
    83 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
    84         1 1 1.00 45.00 90.00
    85          5700 3450 5100 3900
    86 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
    87         1 1 1.00 45.00 90.00
    8862         2550 2250 3300 1800
    89632 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
     
    9165         2550 2550 3300 2700
    92662 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
    93          3150 1275 3150 4200
    94 4 1 0 50 -1 0 11 0.0000 2 135 885 2325 1425 Static Zone\001
    95 4 1 0 50 -1 0 11 0.0000 2 180 1950 4800 1425 Dynamic-Allocation Zone\001
    96 4 0 0 50 -1 2 11 0.0000 2 180 645 3300 1725 Storage\001
    97 4 2 0 50 -1 2 11 0.0000 2 180 1050 2325 2475 Management\001
     67         3150 1275 3150 3300
     684 1 0 50 -1 0 11 0.0000 2 120 795 2325 1425 Static Zone\001
     694 1 0 50 -1 0 11 0.0000 2 165 1845 4800 1425 Dynamic-Allocation Zone\001
     704 0 0 50 -1 2 11 0.0000 2 165 585 3300 1725 Storage\001
     714 2 0 50 -1 2 11 0.0000 2 165 1005 2325 2475 Management\001
  • doc/theses/mubeen_zulfiqar_MMath/intro.tex

    rf53afafb r3a038fa  
    11\chapter{Introduction}
    2 
    3 
    4 \section{Introduction}
    52
    63% Shared-memory multi-processor computers are ubiquitous and important for improving application performance.
     
    96% Therefore, providing high-performance, scalable memory-management is important for virtually all shared-memory multi-threaded programs.
    107
     8\vspace*{-23pt}
    119Memory management takes a sequence of program-generated allocation/deallocation requests and attempts to satisfy them within a fixed-sized block of memory while minimizing the total amount of memory used.
    1210A general-purpose dynamic-allocation algorithm cannot anticipate future allocation requests so its output is rarely optimal.
     
    1614
    1715
    18 \subsection{Memory Structure}
     16\section{Memory Structure}
    1917\label{s:MemoryStructure}
    2018
     
    2523Dynamic code/data memory is managed by the dynamic loader for libraries loaded at runtime, which is complex especially in a multi-threaded program~\cite{Huang06}.
    2624However, changes to the dynamic code/data space are typically infrequent, many occurring at program startup, and are largely outside of a program's control.
    27 Stack memory is managed by the program call-mechanism using simple LIFO management, which works well for sequential programs.
     25Stack memory is managed by the program call-mechanism using a simple LIFO technique, which works well for sequential programs.
    2826For multi-threaded programs (and coroutines), a new stack is created for each thread;
    2927these thread stacks are commonly created in dynamic-allocation memory.
     
    3937
    4038
    41 \subsection{Dynamic Memory-Management}
     39\section{Dynamic Memory-Management}
    4240\label{s:DynamicMemoryManagement}
    4341
    4442Modern programming languages manage dynamic-allocation memory in different ways.
    45 Some languages, such as Lisp~\cite{CommonLisp}, Java~\cite{Java}, Go~\cite{Go}, Haskell~\cite{Haskell}, provide explicit allocation but \emph{implicit} deallocation of data through garbage collection~\cite{Wilson92}.
     43Some languages, such as Lisp~\cite{CommonLisp}, Java~\cite{Java}, Haskell~\cite{Haskell}, Go~\cite{Go}, provide explicit allocation but \emph{implicit} deallocation of data through garbage collection~\cite{Wilson92}.
    4644In general, garbage collection supports memory compaction, where dynamic (live) data is moved during runtime to better utilize space.
    4745However, moving data requires finding pointers to it and updating them to reflect new data locations.
     
    5856However, high-performance memory-allocators for kernel and user multi-threaded programs are still being designed and improved.
    5957For this reason, several alternative general-purpose allocators have been written for C/\CC with the goal of scaling in a multi-threaded program~\cite{Berger00,mtmalloc,streamflow,tcmalloc}.
    60 This work examines the design of high-performance allocators for use by kernel and user multi-threaded applications written in C/\CC.
    61 
    62 
    63 \subsection{Contributions}
     58This thesis examines the design of high-performance allocators for use by kernel and user multi-threaded applications written in C/\CC.
     59
     60
     61\section{Contributions}
    6462\label{s:Contributions}
    6563
    6664This work provides the following contributions in the area of concurrent dynamic allocation:
    67 \begin{enumerate}
    68 \item
    69 Implementation of a new stand-lone concurrent memory allocator ($\approx$1,200 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions of the allocator for programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading).
    70 
    71 \item
    72 Adopt the return of @nullptr@ for a zero-sized allocation, rather than an actual memory address, both of which can be passed to @free@.
    73 Most allocators use @nullptr@ to indicate an allocation failure, such as full memory;
    74 hence the need to return an alternate value for a zero-sized allocation.
    75 The alternative is to abort the program on allocation failure.
    76 In theory, notifying the programmer of a failure allows recovery;
    77 in practice, it is almost impossible to gracefully recover from allocation failure, especially full memory, so adopting the cheaper return @nullptr@ for a zero-sized allocation is chosen.
    78 
    79 \item
    80 Extended the standard C heap functionality by preserving with each allocation its original request size versus the amount allocated due to bucketing, if an allocation is zero fill, and the allocation alignment.
     65\begin{enumerate}[leftmargin=*]
     66\item
      67Implementation of a new stand-alone concurrent low-latency memory-allocator ($\approx$1,200 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions of the allocator for programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading).
     68
     69\item
     70Adopt returning of @nullptr@ for a zero-sized allocation, rather than an actual memory address, both of which can be passed to @free@.
     71
     72\item
      73Extended the standard C heap functionality by preserving, with each allocation, its original request size versus the amount allocated, whether the allocation is zero filled, and the allocation alignment.
    8174
    8275\item
     
    117110
    118111\item
    119 Provide complete and fast allocation statistics to help understand program behaviour:
     112Provide mostly contention-free allocation and free operations via a heap-per-kernel-thread implementation.
     113
     114\item
     115Provide complete, fast, and contention-free allocation statistics to help understand program behaviour:
    120116\begin{itemize}
    121117\item
     
    129125
    130126\item
    131 Provide mostly contention-free allocation and free operations via a heap-per-kernel-thread implementation.
    132 
    133 \item
    134 Provide extensive contention-free runtime checks to valid allocation operations and identify the amount of unfreed storage at program termination.
      127Provide extensive runtime checks to validate allocation operations and identify the amount of unfreed storage at program termination.
    135128
    136129\item