Changeset 3a038fa for doc/theses/mubeen_zulfiqar_MMath
- Timestamp:
- Feb 23, 2022, 10:31:12 AM (3 years ago)
- Branches:
- ADT, ast-experimental, enum, master, pthread-emulation, qualifiedEnum
- Children:
- cc7bbe6
- Parents:
- f53afafb
- Location:
- doc/theses/mubeen_zulfiqar_MMath
- Files:
-
- 5 edited
doc/theses/mubeen_zulfiqar_MMath/allocator.tex
rf53afafb r3a038fa 118 118 We added a few more features and routines to the allocator's C interface that make the allocator more usable for programmers. These features give the programmer more control over dynamic memory allocation. 119 119 120 \subsection{Out of Memory} 121 122 Most allocators use @nullptr@ to indicate an allocation failure, specifically out of memory; 123 hence the need to return an alternate value for a zero-sized allocation. 124 The alternative is to abort a program when out of memory. 125 In theory, notifying the programmer allows recovery; 126 in practice, it is almost impossible to recover gracefully when out of memory, so the cheaper approach of returning @nullptr@ for a zero-sized allocation is chosen. 127 128 120 129 \subsection{\lstinline{void * aalloc( size_t dim, size_t elemSize )}} 121 130 @aalloc@ is an extension of @malloc@. It allows the programmer to allocate a dynamic array of objects without explicitly calculating the total size of the array. The only alternative to this routine in other allocators is @calloc@, but @calloc@ also fills the dynamic memory with 0, which makes it slower for a programmer who only wants to dynamically allocate an array of objects without zero fill. -
doc/theses/mubeen_zulfiqar_MMath/background.tex
rf53afafb r3a038fa 1 1 \chapter{Background} 2 2 3 4 5 \section{Memory-Allocator Background}6 \label{s:MemoryAllocatorBackground}7 3 8 4 A program dynamically allocates and deallocates the storage for a variable, referred to as an \newterm{object}, through calls such as @malloc@ and @free@ in C, and @new@ and @delete@ in \CC. 9 5 Space for each allocated object comes from the dynamic-allocation zone. 10 A \newterm{memory allocator} is a complex data-structure and code that manages the layout of objects in the dynamic-allocation zone.6 A \newterm{memory allocator} contains a complex data-structure and code that manages the layout of objects in the dynamic-allocation zone. 11 7 The management goals are to make allocation/deallocation operations as fast as possible while densely packing objects to make efficient use of memory. 12 Objects cannot be moved to aid the packing process.8 Objects in C/\CC cannot be moved to aid the packing process, only adjacent free storage can be \newterm{coalesced} into larger free areas. 13 9 The allocator grows or shrinks the dynamic-allocation zone to obtain storage for objects and reduce memory usage via operating-system calls, such as @mmap@ or @sbrk@ in UNIX. 14 10 15 11 16 \s ubsection{Allocator Components}12 \section{Allocator Components} 17 13 \label{s:AllocatorComponents} 18 14 19 There are two important parts to a memory allocator, management and storage data (see \VRef[Figure]{f:AllocatorComponents}), collectively called the \newterm{heap}.15 \VRef[Figure]{f:AllocatorComponents} shows the two important data components for a memory allocator, management and storage, collectively called the \newterm{heap}. 20 16 The \newterm{management data} is a data structure located at a known memory address and contains all information necessary to manage the storage data. 21 17 The management data starts with fixed-sized information in the static-data memory that flows into the dynamic-allocation memory. 
22 The \newterm{storage data} is composed of allocated and freed objects, and reserved memory. 23 Allocated objects (white) are variable sized and allocated to and maintained by the program. 24 Freed objects (light grey) are memory deallocated by the program that is linked to form a list facilitating easy location of storage for new allocations. 18 The \newterm{storage data} is composed of allocated and freed objects, and \newterm{reserved memory}. 19 Allocated objects (white) are variable sized, and allocated and maintained by the program; 20 \ie only the program knows the location of allocated storage, not the memory allocator. 21 \begin{figure}[h] 22 \centering 23 \input{AllocatorComponents} 24 \caption{Allocator Components (Heap)} 25 \label{f:AllocatorComponents} 26 \end{figure} 27 Freed objects (light grey) are memory deallocated by the program, which are linked into one or more lists facilitating easy location for new allocations. 25 28 Often the free list is chained internally so it does not consume additional storage, \ie the link fields are placed at known locations in the unused memory blocks. 26 29 Reserved memory (dark grey) is one or more blocks of memory obtained from the operating system but not yet allocated to the program; 27 30 if there are multiple reserved blocks, they are also chained together, usually internally. 28 31 29 \begin{figure}30 \centering31 \input{AllocatorComponents}32 \caption{Allocator Components (Heap)}33 \label{f:AllocatorComponents}34 \end{figure}35 36 32 Allocated and freed objects typically have additional management data embedded within them. 37 \VRef[Figure]{f:AllocatedObject} shows an allocated object with a header, trailer, and padding/spacing around the object.33 \VRef[Figure]{f:AllocatedObject} shows an allocated object with a header, trailer, and alignment padding and spacing around the object. 38 34 The header contains information about the object, \eg size, type, etc. 
39 35 The trailer may be used to simplify an allocation implementation, \eg coalescing, and/or for security purposes to mark the end of an object. … … 53 49 54 50 55 \s ubsection{Single-Threaded Memory-Allocator}51 \section{Single-Threaded Memory-Allocator} 56 52 \label{s:SingleThreadedMemoryAllocator} 57 53 … … 61 57 62 58 63 \subs ubsection{Fragmentation}59 \subsection{Fragmentation} 64 60 \label{s:Fragmentation} 65 61 66 62 Fragmentation is memory requested from the operating system but not used by the program; 67 63 hence, allocated objects are not fragmentation. 68 Fragmentation is often divided into internal or external (see~\VRef[Figure]{f:InternalExternalFragmentation}).64 \VRef[Figure]{f:InternalExternalFragmentation}) shows fragmentation is divided into two forms: internal or external. 69 65 70 66 \begin{figure} … … 76 72 77 73 \newterm{Internal fragmentation} is memory space that is allocated to the program, but is not intended to be accessed by the program, such as headers, trailers, padding, and spacing around an allocated object. 78 This memory is typically used by the allocator for management purposes or required by the architecture for correctness (\eg alignment).74 This memory is typically used by the allocator for management purposes or required by the architecture for correctness, \eg alignment. 79 75 Internal fragmentation is problematic when management space is a significant proportion of an allocated object. 80 76 For example, if internal fragmentation is as large as the object being managed, then the memory usage for that object is doubled. … … 84 80 This memory is problematic in two ways: heap blowup and highly fragmented memory. 85 81 \newterm{Heap blowup} occurs when memory freed by the program is not reused for future allocations leading to potentially unbounded external fragmentation growth~\cite{Berger00}. 
86 Heap blowup can occur due to allocator policies that are too restrictive in reusing freed memory .82 Heap blowup can occur due to allocator policies that are too restrictive in reusing freed memory and/or no coalescing of free storage. 87 83 Memory can become \newterm{highly fragmented} after multiple allocations and deallocations of objects. 88 84 \VRef[Figure]{f:MemoryFragmentation} shows an example of how a small block of memory fragments as objects are allocated and deallocated over time. … … 91 87 For example, \VRef[Figure]{f:Contiguous} and \VRef[Figure]{f:HighlyFragmented} have the same quantity of external fragmentation, but \VRef[Figure]{f:HighlyFragmented} is highly fragmented. 92 88 If there is a request to allocate a large object, \VRef[Figure]{f:Contiguous} is more likely to be able to satisfy it with existing free memory, while \VRef[Figure]{f:HighlyFragmented} likely has to request more memory from the operating system. 93 94 For a single-threaded memory allocator, three basic approaches for controlling fragmentation have been identified~\cite{Johnstone99}.95 The first approach is a \newterm{sequential-fit algorithm} with one list of free objects that is searched for a block large enough to fit a requested object size.96 Different search policies determine the free object selected, \eg the first free object large enough or closest to the requested size.97 Any storage larger than the request can become spacing after the object or be split into a smaller free object.98 The cost of the search depends on the shape and quality of the free list, \eg a linear versus a binary-tree free-list, a sorted versus unsorted free-list.99 89 100 90 \begin{figure} … … 115 105 \label{f:FragmentationQuality} 116 106 \end{figure} 107 108 For a single-threaded memory allocator, three basic approaches for controlling fragmentation have been identified~\cite{Johnstone99}. 
109 The first approach is a \newterm{sequential-fit algorithm} with one list of free objects that is searched for a block large enough to fit a requested object size. 110 Different search policies determine the free object selected, \eg the first free object large enough or closest to the requested size. 111 Any storage larger than the request can become spacing after the object or be split into a smaller free object. 112 The cost of the search depends on the shape and quality of the free list, \eg a linear versus a binary-tree free-list, a sorted versus unsorted free-list. 117 113 118 114 The second approach is a \newterm{segregated} or \newterm{binning algorithm} with a set of lists for different sized freed objects. … … 131 127 When an object is deallocated it is coalesced with the objects immediately before and after it in memory, if they are free, turning them into one larger object. 132 128 Coalescing can be done eagerly at each deallocation or lazily when an allocation cannot be fulfilled. 129 In all cases, coalescing increases allocation latency, hence some allocations can cause unbounded delays during coalescing. 133 130 While coalescing does not reduce external fragmentation, the coalesced blocks improve fragmentation quality so future allocations are less likely to cause heap blowup. 134 131 Splitting and coalescing can be used with other algorithms to avoid highly fragmented memory. 135 132 136 133 137 \subs ubsection{Locality}134 \subsection{Locality} 138 135 \label{s:Locality} 139 136 … … 144 141 Hardware takes advantage of temporal and spatial locality through multiple levels of caching (\ie memory hierarchy). 145 142 When an object is accessed, the memory physically located around the object is also cached with the expectation that the current and nearby objects will be referenced within a short period of time. 
146 For example, entire cache lines are transfer ed between memory and cache and entire virtual-memory pages are transferred between disk and memory.147 A program exhibiting good locality has better performance due to fewer cache misses and page faults .143 For example, entire cache lines are transferred between memory and cache and entire virtual-memory pages are transferred between disk and memory. 144 A program exhibiting good locality has better performance due to fewer cache misses and page faults\footnote{With the advent of large RAM memory, paging is becoming less of an issue in modern programming.}. 148 145 149 146 Temporal locality is largely controlled by how a program accesses its variables~\cite{Feng05}. … … 151 148 For temporal locality, an allocator can return storage for new allocations that was just freed as these memory locations are still \emph{warm} in the memory hierarchy. 152 149 For spatial locality, an allocator can place objects used together close together in memory, so the working set of the program fits into the fewest possible cache lines and pages. 153 However, usage patterns are different for every program as is the underlying hardware architecture (\ie memory hierarchy);150 However, usage patterns are different for every program as is the underlying hardware memory architecture; 154 151 hence, no general-purpose memory-allocator can provide ideal locality for every program on every computer. 155 152 … … 161 158 162 159 163 \s ubsection{Multi-Threaded Memory-Allocator}160 \section{Multi-Threaded Memory-Allocator} 164 161 \label{s:MultiThreadedMemoryAllocator} 165 162 166 163 A multi-threaded memory-allocator does not run any threads itself, but is used by a multi-threaded program. 
167 In addition to single-threaded design issues of locality and fragmentation, a multi-threaded allocator may be simultaneously accessed by multiple threads, and hence, must deal with concurrency issues such as mutual exclusion, false sharing, and additional forms of heap blowup.168 169 170 \subs ubsection{Mutual Exclusion}164 In addition to single-threaded design issues of locality and fragmentation, a multi-threaded allocator may be simultaneously accessed by multiple threads, and hence, must deal with concurrency issues such as ymutual exclusion, false sharing, and additional forms of heap blowup. 165 166 167 \subsection{Mutual Exclusion} 171 168 \label{s:MutualExclusion} 172 169 173 Mutual exclusion provides sequential access to themanagement data of the heap.170 \newterm{Mutual exclusion} provides sequential access to the shared management data of the heap. 174 171 There are two performance issues for mutual exclusion. 175 172 First is the overhead necessary to perform (at least) a hardware atomic operation every time a shared resource is accessed. … … 177 174 Contention can be reduced in a number of ways: 178 175 using multiple fine-grained locks versus a single lock, spreading the contention across a number of locks; 179 using trylock and generating new storage if the lock is busy, yielding a space vs contention trade-off;176 using trylock and generating new storage if the lock is busy, yielding a classic space versus time tradeoff; 180 177 using one of the many lock-free approaches for reducing contention on basic data-structure operations~\cite{Oyama99}. 181 178 However, all of these approaches have degenerate cases where contention occurs. 182 179 183 180 184 \subs ubsection{False Sharing}181 \subsection{False Sharing} 185 182 \label{s:FalseSharing} 186 183 187 184 False sharing is a dynamic phenomenon leading to cache thrashing. 
188 When two or more threads on separate CPUs simultaneously change different objects sharing a cache line, the change invalidates the other thread's associated cache, even though these threads may be uninterested in the modified object.185 When two or more threads on separate CPUs simultaneously change different objects sharing a cache line, the change invalidates the other thread's associated cache, even though these threads may be uninterested in the other modified object. 189 186 False sharing can occur in three different ways: program induced, allocator-induced active, and allocator-induced passive; 190 187 a memory allocator can only affect the latter two. 191 188 192 \ newterm{Program-induced false-sharing} occurs when one thread passes an object sharing a cache line to another thread, and both threads modify the respective objects.193 For example, in \VRef[Figure]{f:ProgramInducedFalseSharing},when Task$_1$ passes Object$_2$ to Task$_2$, a false-sharing situation forms when Task$_1$ modifies Object$_1$ and Task$_2$ modifies Object$_2$.189 \paragraph{\newterm{Program-induced false-sharing}} occurs when one thread passes an object sharing a cache line to another thread, and both threads modify the respective objects. 190 \VRef[Figure]{f:ProgramInducedFalseSharing} shows when Task$_1$ passes Object$_2$ to Task$_2$, a false-sharing situation forms when Task$_1$ modifies Object$_1$ and Task$_2$ modifies Object$_2$. 194 191 Changes to Object$_1$ invalidate CPU$_2$'s cache line, and changes to Object$_2$ invalidate CPU$_1$'s cache line. 195 192 … … 214 211 \end{figure} 215 212 216 \ newterm{Allocator-induced active false-sharing} occurs when objects are allocated within the same cache line but to different threads.213 \paragraph{\newterm{Allocator-induced active false-sharing}} occurs when objects are allocated within the same cache line but to different threads. 
217 214 For example, in \VRef[Figure]{f:AllocatorInducedActiveFalseSharing}, each task allocates an object and loads a cache-line of memory into its associated cache. 218 215 Again, changes to Object$_1$ invalidate CPU$_2$'s cache line, and changes to Object$_2$ invalidate CPU$_1$'s cache line. 219 216 220 \ newterm{Allocator-induced passive false-sharing} is another form of allocator-induced false-sharing caused by program-induced false-sharing.217 \paragraph{\newterm{Allocator-induced passive false-sharing}} is another form of allocator-induced false-sharing caused by program-induced false-sharing. 221 218 When an object in a program-induced false-sharing situation is deallocated, a future allocation of that object may cause passive false-sharing. 222 219 For example, in \VRef[Figure]{f:AllocatorInducedPassiveFalseSharing}, Task$_1$ passes Object$_2$ to Task$_2$, and Task$_2$ subsequently deallocates Object$_2$. … … 224 221 225 222 226 \subs ubsection{Heap Blowup}223 \subsection{Heap Blowup} 227 224 \label{s:HeapBlowup} 228 225 … … 256 253 257 254 258 \s ubsection{Multiple Heaps}255 \section{Multiple Heaps} 259 256 \label{s:MultipleHeaps} 260 257 … … 373 370 374 371 375 \subs ubsection{Ownership}372 \subsection{Ownership} 376 373 \label{s:Ownership} 377 374 … … 428 425 429 426 430 \s ubsection{Object Containers}427 \section{Object Containers} 431 428 \label{s:ObjectContainers} 432 429 … … 490 487 491 488 492 \subs ubsection{Container Ownership}489 \subsection{Container Ownership} 493 490 \label{s:ContainerOwnership} 494 491 … … 562 559 563 560 564 \subs ubsection{Container Size}561 \subsection{Container Size} 565 562 \label{s:ContainerSize} 566 563 … … 602 599 603 600 604 \subs ubsection{Container Free-Lists}601 \subsection{Container Free-Lists} 605 602 \label{s:containersfreelists} 606 603 … … 676 673 677 674 678 \s ubsection{Allocation Buffer}675 \section{Allocation Buffer} 679 676 \label{s:AllocationBuffer} 680 677 … … 703 700 704 701 705 \s 
ubsection{Lock-Free Operations}702 \section{Lock-Free Operations} 706 703 \label{s:LockFreeOperations} 707 704 … … 741 738 ==================== 742 739 743 \section{Background}744 745 740 % FIXME: cite wasik 746 741 \cite{wasik.thesis} 747 742 748 \s ubsection{Memory Allocation}749 With dynamic allocation being an important feature of C, there are many stand alone memory allocators that have been designed for different purposes. For this thesis, we chose 7 of the most popular and widely used memory allocators.743 \section{Existing Memory Allocators} 744 With dynamic allocation being an important feature of C, there are many stand-alone memory allocators that have been designed for different purposes. For this thesis, we chose 7 of the most popular and widely used memory allocators. 750 745 751 746 \paragraph{dlmalloc} … … 753 748 754 749 \paragraph{hoard} 755 Hoard (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and using a heap layer framework. It has per-thre d heaps that have thread-local free-lists, and a gloabl shared heap. (FIX ME: cite wasik)750 Hoard (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and using a heap layer framework. It has per-thread heaps that have thread-local free-lists, and a global shared heap. (FIX ME: cite wasik) 756 751 757 752 \paragraph{jemalloc} … … 762 757 763 758 \paragraph{rpmalloc} 764 rpmalloc (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and uses per-thread heap. Each heap has multiple size-classes and each size-c alss contains memory regions of the relevant size.759 rpmalloc (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and uses per-thread heap. Each heap has multiple size-classes and each size-class contains memory regions of the relevant size. 765 760 766 761 \paragraph{tbb malloc} … … 769 764 \paragraph{tc malloc} 770 765 tcmalloc (FIX ME: cite allocator) is a thread-safe allocator. 
It uses a per-thread cache to store free objects, which prevents contention on shared resources in multi-threaded applications. A central free-list is used to refill a per-thread cache when it becomes empty. 771 772 \subsection{Benchmarks} 773 There are multiple benchmarks that are built individually and evaluate different aspects of a memory allocator. But, there is not standard set of benchamrks that can be used to evaluate multiple aspects of memory allocators. 774 775 \paragraph{threadtest} 776 (FIX ME: cite benchmark and hoard) Each thread repeatedly allocates and then deallocates 100,000 objects. Runtime of the benchmark evaluates its efficiency. 777 778 \paragraph{shbench} 779 (FIX ME: cite benchmark and hoard) Each thread allocates and randomly frees a number of random-sized objects. It is a stress test that also uses runtime to determine efficiency of the allocator. 780 781 \paragraph{larson} 782 (FIX ME: cite benchmark and hoard) Larson simulates a server environment. Multiple threads are created where each thread allocator and free a number of objects within a size range. Some objects are passed from threads to the child threads to free. It caluculates memory operations per second as an indicator of memory allocator's performance. -
doc/theses/mubeen_zulfiqar_MMath/benchmarks.tex
rf53afafb r3a038fa 41 41 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 42 42 43 44 \section{Benchmarks} 45 There are multiple benchmarks that are built individually and evaluate different aspects of a memory allocator. However, there is no standard set of benchmarks that can be used to evaluate multiple aspects of memory allocators. 46 47 \paragraph{threadtest} 48 (FIX ME: cite benchmark and hoard) Each thread repeatedly allocates and then deallocates 100,000 objects. The runtime of the benchmark evaluates the allocator's efficiency. 49 50 \paragraph{shbench} 51 (FIX ME: cite benchmark and hoard) Each thread allocates and randomly frees a number of random-sized objects. It is a stress test that also uses runtime to determine the efficiency of the allocator. 52 53 \paragraph{larson} 54 (FIX ME: cite benchmark and hoard) Larson simulates a server environment. Multiple threads are created, where each thread allocates and frees a number of objects within a size range. Some objects are passed from parent threads to child threads to be freed. It calculates memory operations per second as an indicator of the memory allocator's performance. 55 56 43 57 \section{Performance Metrics of Memory Allocators} 44 58 -
doc/theses/mubeen_zulfiqar_MMath/figures/AllocatorComponents.fig
rf53afafb r3a038fa 1 #FIG 3.2 Produced by xfig version 3.2. 51 #FIG 3.2 Produced by xfig version 3.2.7b 2 2 Landscape 3 3 Center 4 4 Inches 5 Letter 5 Letter 6 6 100.00 7 7 Single 8 8 -2 9 9 1200 2 10 6 2325 2775 3000 315011 4 2 0 50 -1 0 11 0.0000 2 135 660 3000 2925 reserved\00112 4 2 0 50 -1 0 11 0.0000 2 135 600 3000 3075 memory\00113 -614 10 6 2400 1575 3000 1950 15 4 2 0 50 -1 0 11 0.0000 2 1 80 555 3000 1875 objects\00116 4 2 0 50 -1 0 11 0.0000 2 1 35 300 3000 1725 free\00111 4 2 0 50 -1 0 11 0.0000 2 165 495 3000 1875 objects\001 12 4 2 0 50 -1 0 11 0.0000 2 120 270 3000 1725 free\001 17 13 -6 18 14 6 2400 2100 2700 2700 … … 21 17 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 22 18 2700 2100 2700 2400 2400 2400 2400 2100 2700 2100 19 -6 20 6 2325 2850 3000 3225 21 4 2 0 50 -1 0 11 0.0000 2 135 600 3000 3000 reserved\001 22 4 2 0 50 -1 0 11 0.0000 2 135 585 3000 3150 memory\001 23 23 -6 24 24 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 … … 46 46 2 2 0 1 0 7 60 -1 13 0.000 0 0 -1 0 0 5 47 47 3300 2700 6300 2700 6300 3300 3300 3300 3300 2700 48 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 549 3300 3300 3600 3300 3600 3600 3300 3600 3300 330050 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 551 4500 3300 5100 3300 5100 3600 4500 3600 4500 330052 2 2 0 1 0 7 50 -1 17 0.000 0 0 -1 0 0 553 5100 3300 6300 3300 6300 3600 5100 3600 5100 330054 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 555 3300 3600 4500 3600 4500 3900 3300 3900 3300 360056 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 557 4500 3600 5400 3600 5400 3900 4500 3900 4500 360058 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 559 5400 3600 6300 3600 6300 3900 5400 3900 5400 360060 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 561 3300 3900 3900 3900 3900 4200 3300 4200 3300 390062 2 2 0 1 0 7 50 -1 17 0.000 0 0 -1 0 0 563 5100 3900 5700 3900 5700 4200 5100 4200 5100 390064 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 565 3900 3900 5100 3900 5100 4200 3900 4200 3900 390066 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 567 5700 3900 6300 3900 6300 4200 5700 4200 5700 390068 48 
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2 69 49 1 1 1.00 45.00 90.00 … … 80 60 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2 81 61 1 1 1.00 45.00 90.00 82 5850 1950 5100 330083 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 284 1 1 1.00 45.00 90.0085 5700 3450 5100 390086 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 287 1 1 1.00 45.00 90.0088 62 2550 2250 3300 1800 89 63 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2 … … 91 65 2550 2550 3300 2700 92 66 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 93 3150 1275 3150 420094 4 1 0 50 -1 0 11 0.0000 2 1 35 885 2325 1425 Static Zone\00195 4 1 0 50 -1 0 11 0.0000 2 1 80 19504800 1425 Dynamic-Allocation Zone\00196 4 0 0 50 -1 2 11 0.0000 2 1 80 645 3300 1725 Storage\00197 4 2 0 50 -1 2 11 0.0000 2 1 80 10502325 2475 Management\00167 3150 1275 3150 3300 68 4 1 0 50 -1 0 11 0.0000 2 120 795 2325 1425 Static Zone\001 69 4 1 0 50 -1 0 11 0.0000 2 165 1845 4800 1425 Dynamic-Allocation Zone\001 70 4 0 0 50 -1 2 11 0.0000 2 165 585 3300 1725 Storage\001 71 4 2 0 50 -1 2 11 0.0000 2 165 1005 2325 2475 Management\001 -
doc/theses/mubeen_zulfiqar_MMath/intro.tex
rf53afafb r3a038fa 1 1 \chapter{Introduction} 2 3 4 \section{Introduction}5 2 6 3 % Shared-memory multi-processor computers are ubiquitous and important for improving application performance. … … 9 6 % Therefore, providing high-performance, scalable memory-management is important for virtually all shared-memory multi-threaded programs. 10 7 8 \vspace*{-23pt} 11 9 Memory management takes a sequence of program generated allocation/deallocation requests and attempts to satisfy them within a fixed-sized block of memory while minimizing the total amount of memory used. 12 10 A general-purpose dynamic-allocation algorithm cannot anticipate future allocation requests so its output is rarely optimal. … … 16 14 17 15 18 \s ubsection{Memory Structure}16 \section{Memory Structure} 19 17 \label{s:MemoryStructure} 20 18 … … 25 23 Dynamic code/data memory is managed by the dynamic loader for libraries loaded at runtime, which is complex especially in a multi-threaded program~\cite{Huang06}. 26 24 However, changes to the dynamic code/data space are typically infrequent, many occurring at program startup, and are largely outside of a program's control. 27 Stack memory is managed by the program call-mechanism using simple LIFO management, which works well for sequential programs.25 Stack memory is managed by the program call-mechanism using a simple LIFO technique, which works well for sequential programs. 28 26 For multi-threaded programs (and coroutines), a new stack is created for each thread; 29 27 these thread stacks are commonly created in dynamic-allocation memory. … … 39 37 40 38 41 \s ubsection{Dynamic Memory-Management}39 \section{Dynamic Memory-Management} 42 40 \label{s:DynamicMemoryManagement} 43 41 44 42 Modern programming languages manage dynamic-allocation memory in different ways. 
45 Some languages, such as Lisp~\cite{CommonLisp}, Java~\cite{Java}, Go~\cite{Go}, Haskell~\cite{Haskell}, provide explicit allocation but \emph{implicit} deallocation of data through garbage collection~\cite{Wilson92}.43 Some languages, such as Lisp~\cite{CommonLisp}, Java~\cite{Java}, Haskell~\cite{Haskell}, Go~\cite{Go}, provide explicit allocation but \emph{implicit} deallocation of data through garbage collection~\cite{Wilson92}. 46 44 In general, garbage collection supports memory compaction, where dynamic (live) data is moved during runtime to better utilize space. 47 45 However, moving data requires finding pointers to it and updating them to reflect new data locations. … … 58 56 However, high-performance memory-allocators for kernel and user multi-threaded programs are still being designed and improved. 59 57 For this reason, several alternative general-purpose allocators have been written for C/\CC with the goal of scaling in a multi-threaded program~\cite{Berger00,mtmalloc,streamflow,tcmalloc}. 60 This workexamines the design of high-performance allocators for use by kernel and user multi-threaded applications written in C/\CC.61 62 63 \s ubsection{Contributions}58 This thesis examines the design of high-performance allocators for use by kernel and user multi-threaded applications written in C/\CC. 59 60 61 \section{Contributions} 64 62 \label{s:Contributions} 65 63 66 64 This work provides the following contributions in the area of concurrent dynamic allocation: 67 \begin{enumerate} 68 \item 69 Implementation of a new stand-lone concurrent memory allocator ($\approx$1,200 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions of the allocator for programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading). 70 71 \item 72 Adopt the return of @nullptr@ for a zero-sized allocation, rather than an actual memory address, both of which can be passed to @free@. 
73 Most allocators use @nullptr@ to indicate an allocation failure, such as full memory; 74 hence the need to return an alternate value for a zero-sized allocation. 75 The alternative is to abort the program on allocation failure. 76 In theory, notifying the programmer of a failure allows recovery; 77 in practice, it is almost impossible to gracefully recover from allocation failure, especially full memory, so adopting the cheaper return @nullptr@ for a zero-sized allocation is chosen. 78 79 \item 80 Extended the standard C heap functionality by preserving with each allocation its original request size versus the amount allocated due to bucketing, if an allocation is zero fill, and the allocation alignment. 65 \begin{enumerate}[leftmargin=*] 66 \item 67 Implementation of a new stand-lone concurrent low-latency memory-allocator ($\approx$1,200 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions of the allocator for programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading). 68 69 \item 70 Adopt returning of @nullptr@ for a zero-sized allocation, rather than an actual memory address, both of which can be passed to @free@. 71 72 \item 73 Extended the standard C heap functionality by preserving with each allocation its original request size versus the amount allocated, if an allocation is zero fill, and the allocation alignment. 81 74 82 75 \item … … 117 110 118 111 \item 119 Provide complete and fast allocation statistics to help understand program behaviour: 112 Provide mostly contention-free allocation and free operations via a heap-per-kernel-thread implementation. 113 114 \item 115 Provide complete, fast, and contention-free allocation statistics to help understand program behaviour: 120 116 \begin{itemize} 121 117 \item … … 129 125 130 126 \item 131 Provide mostly contention-free allocation and free operations via a heap-per-kernel-thread implementation. 
132 133 \item 134 Provide extensive contention-free runtime checks to valid allocation operations and identify the amount of unfreed storage at program termination. 127 Provide extensive runtime checks to valid allocation operations and identify the amount of unfreed storage at program termination. 135 128 136 129 \item