Changes in / [13cdc8c:bdfd0bd]
- Location:
- doc/theses/mubeen_zulfiqar_MMath
- Files:
-
- 1 deleted
- 3 edited
-
Makefile (modified) (1 diff)
-
background.tex (modified) (24 diffs)
-
figures/UserKernelHeaps.fig (deleted)
-
intro.tex (modified) (5 diffs)
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mubeen_zulfiqar_MMath/Makefile
r13cdc8c rbdfd0bd 36 36 # File Dependencies # 37 37 38 %.dvi : ${TeXSRC} ${FigSRC:%.fig=%.tex} ${PicSRC:%.fig=%.pstex} ${BIBSRC} Makefile | ${Build}38 ${Build}/%.dvi : ${TeXSRC} ${FigSRC:%.fig=%.tex} ${PicSRC:%.fig=%.pstex} ${BIBSRC} Makefile | ${Build} 39 39 ${LaTeX} ${BASE} 40 40 ${BibTeX} ${Build}/${BASE} -
doc/theses/mubeen_zulfiqar_MMath/background.tex
r13cdc8c rbdfd0bd 34 34 \VRef[Figure]{f:AllocatorComponents} shows the two important data components for a memory allocator, management and storage, collectively called the \newterm{heap}. 35 35 The \newterm{management data} is a data structure located at a known memory address and contains all information necessary to manage the storage data. 36 The management data starts with fixed-sized information in the static-data memory that references components inthe dynamic-allocation memory.36 The management data starts with fixed-sized information in the static-data memory that flows into the dynamic-allocation memory. 37 37 The \newterm{storage data} is composed of allocated and freed objects, and \newterm{reserved memory}. 38 38 Allocated objects (white) are variable sized, and allocated and maintained by the program; … … 44 44 \label{f:AllocatorComponents} 45 45 \end{figure} 46 Freed objects (light grey) represent memory deallocated by the program, which are linked into one or more lists facilitating easy location ofnew allocations.46 Freed objects (light grey) are memory deallocated by the program, which are linked into one or more lists facilitating easy location for new allocations. 47 47 Often the free list is chained internally so it does not consume additional storage, \ie the link fields are placed at known locations in the unused memory blocks. 48 48 Reserved memory (dark grey) is one or more blocks of memory obtained from the operating system but not yet allocated to the program; … … 81 81 Fragmentation is memory requested from the operating system but not used by the program; 82 82 hence, allocated objects are not fragmentation. 83 \VRef[Figure]{f:InternalExternalFragmentation} shows fragmentation is divided into two forms: internal or external.83 \VRef[Figure]{f:InternalExternalFragmentation}) shows fragmentation is divided into two forms: internal or external. 84 84 85 85 \begin{figure} … … 96 96 An allocator should strive to keep internal management information to a minimum. 97 97 98 \newterm{External fragmentation} is all memory space reserved from the operating system but not allocated to the program~\cite{Wilson95,Lim98,Siebert00}, which includes all external management data, freed objects, and reserved memory.98 \newterm{External fragmentation} is all memory space reserved from the operating system but not allocated to the program~\cite{Wilson95,Lim98,Siebert00}, which includes freed objects, all external management data, and reserved memory. 99 99 This memory is problematic in two ways: heap blowup and highly fragmented memory. 100 100 \newterm{Heap blowup} occurs when memory freed by the program is not reused for future allocations leading to potentially unbounded external fragmentation growth~\cite{Berger00}. … … 125 125 \end{figure} 126 126 127 For a single-threaded memory allocator, three basic approaches for controlling fragmentation areidentified~\cite{Johnstone99}.127 For a single-threaded memory allocator, three basic approaches for controlling fragmentation have been identified~\cite{Johnstone99}. 128 128 The first approach is a \newterm{sequential-fit algorithm} with one list of free objects that is searched for a block large enough to fit a requested object size. 129 129 Different search policies determine the free object selected, \eg the first free object large enough or closest to the requested size. … … 132 132 133 133 The second approach is a \newterm{segregated} or \newterm{binning algorithm} with a set of lists for different sized freed objects. 134 When an object is allocated, the requested size is rounded up to the nearest bin-size, often leading tospacing after the object.134 When an object is allocated, the requested size is rounded up to the nearest bin-size, possibly with spacing after the object. 135 135 A binning algorithm is fast at finding free memory of the appropriate size and allocating it, since the first free object on the free list is used. 136 136 The fewer bin-sizes, the fewer lists need to be searched and maintained; … … 158 158 Temporal locality commonly occurs during an iterative computation with a fix set of disjoint variables, while spatial locality commonly occurs when traversing an array. 159 159 160 Hardware takes advantage of temporal and spatial locality through multiple levels of caching , \ie memory hierarchy.160 Hardware takes advantage of temporal and spatial locality through multiple levels of caching (\ie memory hierarchy). 161 161 When an object is accessed, the memory physically located around the object is also cached with the expectation that the current and nearby objects will be referenced within a short period of time. 162 162 For example, entire cache lines are transferred between memory and cache and entire virtual-memory pages are transferred between disk and memory. … … 171 171 172 172 There are a number of ways a memory allocator can degrade locality by increasing the working set. 173 For example, a memory allocator may access multiple free objects before finding one to satisfy an allocation request , \eg sequential-fit algorithm.173 For example, a memory allocator may access multiple free objects before finding one to satisfy an allocation request (\eg sequential-fit algorithm). 174 174 If there are a (large) number of objects accessed in very different areas of memory, the allocator may perturb the program's memory hierarchy causing multiple cache or page misses~\cite{Grunwald93}. 175 175 Another way locality can be degraded is by spatially separating related data. … … 181 181 182 182 A multi-threaded memory-allocator does not run any threads itself, but is used by a multi-threaded program. 183 In addition to single-threaded design issues of fragmentation and locality, a multi-threaded allocator issimultaneously accessed by multiple threads, and hence, must deal with concurrency issues such as mutual exclusion, false sharing, and additional forms of heap blowup.183 In addition to single-threaded design issues of locality and fragmentation, a multi-threaded allocator may be simultaneously accessed by multiple threads, and hence, must deal with concurrency issues such as mutual exclusion, false sharing, and additional forms of heap blowup. 184 184 185 185 … … 192 192 Second is when multiple threads contend for a shared resource simultaneously, and hence, some threads must wait until the resource is released. 193 193 Contention can be reduced in a number of ways: 194 \begin{itemize}[itemsep=0pt]195 \item196 194 using multiple fine-grained locks versus a single lock, spreading the contention across a number of locks; 197 \item198 195 using trylock and generating new storage if the lock is busy, yielding a classic space versus time tradeoff; 199 \item200 196 using one of the many lock-free approaches for reducing contention on basic data-structure operations~\cite{Oyama99}. 201 \end{itemize} 202 However, all of these approaches have degenerate cases where program contention is high, which occurs outside of the allocator. 197 However, all of these approaches have degenerate cases where contention occurs. 203 198 204 199 … … 280 275 \label{s:MultipleHeaps} 281 276 282 A multi-threaded allocator has potentially multiple threads and heaps.277 A single-threaded allocator has at most one thread and heap, while a multi-threaded allocator has potentially multiple threads and heaps. 283 278 The multiple threads cause complexity, and multiple heaps are a mechanism for dealing with the complexity. 284 279 The spectrum ranges from multiple threads using a single heap, denoted as T:1 (see \VRef[Figure]{f:SingleHeap}), to multiple threads sharing multiple heaps, denoted as T:H (see \VRef[Figure]{f:SharedHeaps}), to one thread per heap, denoted as 1:1 (see \VRef[Figure]{f:PerThreadHeap}), which is almost back to a single-threaded allocator. … … 344 339 An alternative implementation is for all heaps to share one reserved memory, which requires a separate lock for the reserved storage to ensure mutual exclusion when acquiring new memory. 345 340 Because multiple threads can allocate/free/reallocate adjacent storage, all forms of false sharing may occur. 346 Other storage-management options are to use @mmap@ to set aside (large) areas of virtual memory for each heap and suballocate each heap's storage within that area , pushing part of the storage management complexity back to the operating system.341 Other storage-management options are to use @mmap@ to set aside (large) areas of virtual memory for each heap and suballocate each heap's storage within that area. 347 342 348 343 \begin{figure} … … 373 368 374 369 375 \paragraph{1:1 model (thread heaps)} where each thread has its own heap eliminating most contention and locking because threads seldom access another thread's heap (see ownership in \VRef{s:Ownership}).370 \paragraph{1:1 model (thread heaps)} where each thread has its own heap, which eliminates most contention and locking because threads seldom accesses another thread's heap (see ownership in \VRef{s:Ownership}). 376 371 An additional benefit of thread heaps is improved locality due to better memory layout. 377 372 As each thread only allocates from its heap, all objects for a thread are consolidated in the storage area for that heap, better utilizing each CPUs cache and accessing fewer pages. … … 385 380 Second is to place the thread heap on a list of available heaps and reuse it for a new thread in the future. 386 381 Destroying the thread heap immediately may reduce external fragmentation sooner, since all free objects are freed to the global heap and may be reused by other threads. 387 Alternatively, reusing thread heaps may improve performance if the inheriting thread makes similar allocation requests as the thread that previously held the thread heap because any unfreed storage is immediately accessible..382 Alternatively, reusing thread heaps may improve performance if the inheriting thread makes similar allocation requests as the thread that previously held the thread heap. 388 383 389 384 … … 393 388 However, an important goal of user-level threading is for fast operations (creation/termination/context-switching) by not interacting with the operating system, which allows the ability to create large numbers of high-performance interacting threads ($>$ 10,000). 394 389 It is difficult to retain this goal, if the user-threading model is directly involved with the heap model. 395 \VRef[Figure]{f:UserLevelKernelHeaps} shows that virtually all user-level threading systems use whatever kernel-level heap-model isprovided by the language runtime.390 \VRef[Figure]{f:UserLevelKernelHeaps} shows that virtually all user-level threading systems use whatever kernel-level heap-model provided by the language runtime. 396 391 Hence, a user thread allocates/deallocates from/to the heap of the kernel thread on which it is currently executing. 397 392 … … 405 400 Adopting this model results in a subtle problem with shared heaps. 406 401 With kernel threading, an operation that is started by a kernel thread is always completed by that thread. 407 For example, if a kernel thread starts an allocation/deallocation on a shared heap, it always completes that operation with that heap even if preempted, \ie any locking correctness associated with the shared heap is preserved across preemption. 402 For example, if a kernel thread starts an allocation/deallocation on a shared heap, it always completes that operation with that heap even if preempted. 403 Any correctness locking associated with the shared heap is preserved across preemption. 408 404 409 405 However, this correctness property is not preserved for user-level threading. … … 413 409 However, eagerly disabling/enabling time-slicing on the allocation/deallocation fast path is expensive, because preemption is rare (10--100 milliseconds). 414 410 Instead, techniques exist to lazily detect this case in the interrupt handler, abort the preemption, and return to the operation so it can complete atomically. 415 Occasionally ignoring a preemption should be benign , but a persistent lack of preemption can result in both short and long term starvation.411 Occasionally ignoring a preemption should be benign. 416 412 417 413 … … 434 430 435 431 \newterm{Ownership} defines which heap an object is returned-to on deallocation. 436 If a thread returns an object to the heap it was originally allocated from, aheap has ownership of its objects.437 Alternatively, a thread can return an object to the heap it is currently a ssociated with, which can be any heap accessible during a thread's lifetime.432 If a thread returns an object to the heap it was originally allocated from, the heap has ownership of its objects. 433 Alternatively, a thread can return an object to the heap it is currently allocating from, which can be any heap accessible during a thread's lifetime. 438 434 \VRef[Figure]{f:HeapsOwnership} shows an example of multiple heaps (minus the global heap) with and without ownership. 439 435 Again, the arrows indicate the direction memory conceptually moves for each kind of operation. … … 543 539 Only with the 1:1 model and ownership is active and passive false-sharing avoided (see \VRef{s:Ownership}). 544 540 Passive false-sharing may still occur, if delayed ownership is used. 545 Finally, a completely free container can become reserved storage and be reset to allocate objects of a new size or freed to the global heap.546 541 547 542 \begin{figure} … … 558 553 \caption{Free-list Structure with Container Ownership} 559 554 \end{figure} 555 556 A fragmented heap has multiple containers that may be partially or completely free. 557 A completely free container can become reserved storage and be reset to allocate objects of a new size. 558 When a heap reaches a threshold of free objects, it moves some free storage to the global heap for reuse to prevent heap blowup. 559 Without ownership, when a heap frees objects to the global heap, individual objects must be passed, and placed on the global-heap's free-list. 560 Containers cannot be freed to the global heap unless completely free because 560 561 561 562 When a container changes ownership, the ownership of all objects within it change as well. … … 568 569 Note, once the object is freed by Task$_1$, no more false sharing can occur until the container changes ownership again. 569 570 To prevent this form of false sharing, container movement may be restricted to when all objects in the container are free. 570 One implementation approach that increases the freedom to return a free container to the operating system involves allocating containers using a call like @mmap@, which allows memory at an arbitrary address to be returned versus only storage at the end of the contiguous @sbrk@ area , again pushing storage management complexity back to the operating system.571 One implementation approach that increases the freedom to return a free container to the operating system involves allocating containers using a call like @mmap@, which allows memory at an arbitrary address to be returned versus only storage at the end of the contiguous @sbrk@ area. 571 572 572 573 \begin{figure} … … 699 700 \end{figure} 700 701 701 As mentioned, an implementation may have only one heap interactwith the global heap, so the other heap can be simplified.702 As mentioned, an implementation may have only one heap deal with the global heap, so the other heap can be simplified. 702 703 For example, if only the private heap interacts with the global heap, the public heap can be reduced to a lock-protected free-list of objects deallocated by other threads due to ownership, called a \newterm{remote free-list}. 703 704 To avoid heap blowup, the private heap allocates from the remote free-list when it reaches some threshold or it has no free storage. … … 720 721 An allocation buffer is reserved memory (see~\VRef{s:AllocatorComponents}) not yet allocated to the program, and is used for allocating objects when the free list is empty. 721 722 That is, rather than requesting new storage for a single object, an entire buffer is requested from which multiple objects are allocated later. 722 Any heap may use an allocation buffer, resulting in allocation from the buffer before requesting objects (containers) from the global heap or operating system, respectively.723 Both any heap may use an allocation buffer, resulting in allocation from the buffer before requesting objects (containers) from the global heap or operating system, respectively. 723 724 The allocation buffer reduces contention and the number of global/operating-system calls. 724 725 For coalescing, a buffer is split into smaller objects by allocations, and recomposed into larger buffer areas during deallocations. 725 726 726 Allocation buffers are useful initially when there are no freed objects in a heap because many allocations usually occur when a thread starts (simple bump allocation).727 Allocation buffers are useful initially when there are no freed objects in a heap because many allocations usually occur when a thread starts. 727 728 Furthermore, to prevent heap blowup, objects should be reused before allocating a new allocation buffer. 728 Thus, allocation buffers are often allocated more frequently at program/thread start, and then allocations often diminish.729 Thus, allocation buffers are often allocated more frequently at program/thread start, and then their use often diminishes. 729 730 730 731 Using an allocation buffer with a thread heap avoids active false-sharing, since all objects in the allocation buffer are allocated to the same thread. … … 745 746 \label{s:LockFreeOperations} 746 747 747 A \newterm{lock-free algorithm} guarantees safe concurrent-access to a data structure, so that at least one thread makes progress, but an individual task has no execution bound andmay starve~\cite[pp.~745--746]{Herlihy93}.748 (A \newterm{wait-free algorithm} puts a bound on the number of steps any thread takes to complete an operation to prevent starvation.) 748 A lock-free algorithm guarantees safe concurrent-access to a data structure, so that at least one thread can make progress in the system, but an individual task has no bound to execution, and hence, may starve~\cite[pp.~745--746]{Herlihy93}. 749 % A wait-free algorithm puts a finite bound on the number of steps any thread takes to complete an operation, so an individual task cannot starve 749 750 Lock-free operations can be used in an allocator to reduce or eliminate the use of locks. 750 While locks and lock-free data-structures often have equal performance, lock-free has the advantage of not holding a lock across preemption so other threads can continue to make progress.751 With respect to the heap, these situations are unlikely unless all threads make extremely high use of dynamic-memory allocation, which can be an indication of poor design.751 Locks are a problem for high contention or if the thread holding the lock is preempted and other threads attempt to use that lock. 752 With respect to the heap, these situations are unlikely unless all threads makes extremely high use of dynamic-memory allocation, which can be an indication of poor design. 752 753 Nevertheless, lock-free algorithms can reduce the number of context switches, since a thread does not yield/block while waiting for a lock; 753 on the other hand, a thread may busy-wait for an unbounded period holding a processor.754 on the other hand, a thread may busy-wait for an unbounded period. 754 755 Finally, lock-free implementations have greater complexity and hardware dependency. 755 756 Lock-free algorithms can be applied most easily to simple free-lists, \eg remote free-list, to allow lock-free insertion and removal from the head of a stack. 756 Implementing lock-free operations for more complex data-structures (queue~\cite{Valois94}/deque~\cite{Sundell08}) is correspondinglyymore complex.757 Implementing lock-free operations for more complex data-structures (queue~\cite{Valois94}/deque~\cite{Sundell08}) is more complex. 757 758 Michael~\cite{Michael04} and Gidenstam \etal \cite{Gidenstam05} have created lock-free variations of the Hoard allocator. -
doc/theses/mubeen_zulfiqar_MMath/intro.tex
r13cdc8c rbdfd0bd 48 48 Attempts have been made to perform quasi garbage collection in C/\CC~\cite{Boehm88}, but it is a compromise. 49 49 This thesis only examines dynamic memory-management with \emph{explicit} deallocation. 50 While garbage collection and compaction are not part this work, many of the work'sresults are applicable to the allocation phase in any memory-management approach.50 While garbage collection and compaction are not part this work, many of the results are applicable to the allocation phase in any memory-management approach. 51 51 52 52 Most programs use a general-purpose allocator, often the one provided implicitly by the programming-language's runtime. … … 65 65 \begin{enumerate}[leftmargin=*] 66 66 \item 67 Implementation of a new stand-lone concurrent low-latency memory-allocator ($\approx$1,200 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions of the allocator for the programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading). 68 69 \item 70 Adopt @nullptr@ return for a zero-sized allocation, rather than an actual memory address, which can be passed to @free@. 71 72 \item 73 Extend the standard C heap functionality by preserving with each allocation: 74 \begin{itemize}[itemsep=0pt] 75 \item 76 its request size plus the amount allocated, 77 \item 78 whether an allocation is zero fill, 79 \item 80 and allocation alignment. 81 \end{itemize} 82 83 \item 84 Use the preserved zero fill and alignment as \emph{sticky} properties for @realloc@ to zero-fill and align when storage is extended or copied. 67 Implementation of a new stand-lone concurrent low-latency memory-allocator ($\approx$1,200 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions of the allocator for programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading). 68 69 \item 70 Adopt returning of @nullptr@ for a zero-sized allocation, rather than an actual memory address, both of which can be passed to @free@. 71 72 \item 73 Extended the standard C heap functionality by preserving with each allocation its original request size versus the amount allocated, if an allocation is zero fill, and the allocation alignment. 74 75 \item 76 Use the zero fill and alignment as \emph{sticky} properties for @realloc@, to realign existing storage, or preserve existing zero-fill and alignment when storage is copied. 85 77 Without this extension, it is unsafe to @realloc@ storage initially allocated with zero-fill/alignment as these properties are not preserved when copying. 86 78 This silent generation of a problem is unintuitive to programmers and difficult to locate because it is transient. … … 94 86 @resize( oaddr, alignment, size )@ re-purpose an old allocation with new alignment but \emph{without} preserving fill. 95 87 \item 96 @realloc( oaddr, alignment, size )@ same as @realloc@ but adding or changing alignment.88 @realloc( oaddr, alignment, size )@ same as previous @realloc@ but adding or changing alignment. 97 89 \item 98 90 @aalloc( dim, elemSize )@ same as @calloc@ except memory is \emph{not} zero filled. … … 104 96 105 97 \item 106 Provide additional heap wrapper functions in \CFA creating anorthogonal set of allocation operations and properties.98 Provide additional heap wrapper functions in \CFA to provide a complete orthogonal set of allocation operations and properties. 107 99 108 100 \item … … 117 109 @malloc_size( addr )@ returns the size of the memory allocation pointed-to by @addr@. 118 110 \item 119 @malloc_usable_size( addr )@ returns the usable (total)size of the memory pointed-to by @addr@, i.e., the bin size containing the allocation, where @malloc_size( addr )@ $\le$ @malloc_usable_size( addr )@.111 @malloc_usable_size( addr )@ returns the usable size of the memory pointed-to by @addr@, i.e., the bin size containing the allocation, where @malloc_size( addr )@ $\le$ @malloc_usable_size( addr )@. 120 112 \end{itemize} 121 113
Note:
See TracChangeset
for help on using the changeset viewer.