Changeset a114743 for doc/theses


Timestamp:
Mar 30, 2022, 10:11:02 PM
Author:
Peter A. Buhr <pabuhr@…>
Branches:
ADT, ast-experimental, enum, master, pthread-emulation, qualifiedEnum
Children:
ee3da78
Parents:
65781a8
Message:

proofread intro and background chapters

Location:
doc/theses/mubeen_zulfiqar_MMath
Files:
2 edited

  • doc/theses/mubeen_zulfiqar_MMath/background.tex

    r65781a8 ra114743  
    3434\VRef[Figure]{f:AllocatorComponents} shows the two important data components for a memory allocator, management and storage, collectively called the \newterm{heap}.
    3535The \newterm{management data} is a data structure located at a known memory address and contains all information necessary to manage the storage data.
    36 The management data starts with fixed-sized information in the static-data memory that flows into the dynamic-allocation memory.
     36The management data starts with fixed-sized information in the static-data memory that references components in the dynamic-allocation memory.
    3737The \newterm{storage data} is composed of allocated and freed objects, and \newterm{reserved memory}.
    3838Allocated objects (white) are variable sized, and allocated and maintained by the program;
     
    4444\label{f:AllocatorComponents}
    4545\end{figure}
    46 Freed objects (light grey) are memory deallocated by the program, which are linked into one or more lists facilitating easy location for new allocations.
     46Freed objects (light grey) represent memory deallocated by the program, which are linked into one or more lists facilitating easy location of new allocations.
    4747Often the free list is chained internally so it does not consume additional storage, \ie the link fields are placed at known locations in the unused memory blocks.
    4848Reserved memory (dark grey) is one or more blocks of memory obtained from the operating system but not yet allocated to the program;
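
The internal chaining described above can be sketched in C. This is a minimal illustration, not the thesis allocator: it assumes each free block is at least pointer sized, so the link field reuses the first word of the unused block.

\begin{lstlisting}
// Minimal sketch of an internally chained free list: the link field is
// placed inside the unused block, so the list consumes no extra storage.
typedef struct FreeBlock {
	struct FreeBlock * next;            // link stored in the free block itself
} FreeBlock;

static FreeBlock * freeHead = NULL;     // head of the free list

static void pushFree( void * block ) {  // deallocation: link at head
	FreeBlock * fb = (FreeBlock *)block;
	fb->next = freeHead;
	freeHead = fb;
}

static void * popFree( void ) {         // allocation: unlink at head
	FreeBlock * fb = freeHead;
	if ( fb != NULL ) freeHead = fb->next;
	return fb;
}
\end{lstlisting}
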
     
    8181Fragmentation is memory requested from the operating system but not used by the program;
    8282hence, allocated objects are not fragmentation.
    83 \VRef[Figure]{f:InternalExternalFragmentation}) shows fragmentation is divided into two forms: internal or external.
     83\VRef[Figure]{f:InternalExternalFragmentation} shows fragmentation is divided into two forms: internal or external.
    8484
    8585\begin{figure}
     
    9696An allocator should strive to keep internal management information to a minimum.
    9797
    98 \newterm{External fragmentation} is all memory space reserved from the operating system but not allocated to the program~\cite{Wilson95,Lim98,Siebert00}, which includes freed objects, all external management data, and reserved memory.
     98\newterm{External fragmentation} is all memory space reserved from the operating system but not allocated to the program~\cite{Wilson95,Lim98,Siebert00}, which includes all external management data, freed objects, and reserved memory.
    9999This memory is problematic in two ways: heap blowup and highly fragmented memory.
    100100\newterm{Heap blowup} occurs when memory freed by the program is not reused for future allocations leading to potentially unbounded external fragmentation growth~\cite{Berger00}.
     
    125125\end{figure}
    126126
    127 For a single-threaded memory allocator, three basic approaches for controlling fragmentation have been identified~\cite{Johnstone99}.
     127For a single-threaded memory allocator, three basic approaches for controlling fragmentation are identified~\cite{Johnstone99}.
    128128The first approach is a \newterm{sequential-fit algorithm} with one list of free objects that is searched for a block large enough to fit a requested object size.
    129129Different search policies determine the free object selected, \eg the first free object large enough or closest to the requested size.
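
As an illustration only (the @size@ field and list layout are assumptions, not the thesis implementation), a first-fit search over such a free list might look like the following sketch.

\begin{lstlisting}
#include <stddef.h>

// Sketch of a first-fit sequential search; each free block is assumed to
// carry its size.  Best-fit instead scans the entire list, remembering the
// block closest to the requested size.
typedef struct Free {
	size_t size;
	struct Free * next;
} Free;

static Free * firstFit( Free ** head, size_t request ) {
	for ( Free ** prev = head; *prev != NULL; prev = &(*prev)->next ) {
		if ( (*prev)->size >= request ) {
			Free * block = *prev;
			*prev = block->next;        // unlink first block large enough
			return block;               // splitting of any excess is elided
		}
	}
	return NULL;                        // no fit: fall back on reserved memory
}
\end{lstlisting}
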
     
    132132
    133133The second approach is a \newterm{segregated} or \newterm{binning algorithm} with a set of lists for different sized freed objects.
    134 When an object is allocated, the requested size is rounded up to the nearest bin-size, possibly with spacing after the object.
     134When an object is allocated, the requested size is rounded up to the nearest bin-size, often leading to spacing after the object.
    135135A binning algorithm is fast at finding free memory of the appropriate size and allocating it, since the first free object on the free list is used.
    136136The fewer bin-sizes, the fewer lists need to be searched and maintained;
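
A hypothetical binning fast path, assuming power-of-two bin sizes and reusing the @Free@ type from the previous sketch; real allocators use denser size classes to reduce the spacing noted above.

\begin{lstlisting}
#include <stddef.h>

// Sketch: round a request up to the nearest power-of-two bin and take the
// first object on that bin's free list.
#define NBINS 32
static Free * bin[NBINS];               // one free list per bin size

static unsigned binIndex( size_t request ) {
	unsigned idx = 0;
	for ( size_t size = 1; size < request; size <<= 1 ) idx += 1;
	return idx;                         // request rounded up to 2^idx
}

static void * binAlloc( size_t request ) {
	unsigned idx = binIndex( request );
	Free * block = bin[idx];
	if ( block != NULL ) bin[idx] = block->next;
	return block;                       // NULL => refill bin from reserved memory
}
\end{lstlisting}
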
     
    158158Temporal locality commonly occurs during an iterative computation with a fixed set of disjoint variables, while spatial locality commonly occurs when traversing an array.
    159159
    160 Hardware takes advantage of temporal and spatial locality through multiple levels of caching (\ie memory hierarchy).
     160Hardware takes advantage of temporal and spatial locality through multiple levels of caching, \ie memory hierarchy.
    161161When an object is accessed, the memory physically located around the object is also cached with the expectation that the current and nearby objects will be referenced within a short period of time.
    162162For example, entire cache lines are transferred between memory and cache and entire virtual-memory pages are transferred between disk and memory.
     
    171171
    172172There are a number of ways a memory allocator can degrade locality by increasing the working set.
    173 For example, a memory allocator may access multiple free objects before finding one to satisfy an allocation request (\eg sequential-fit algorithm).
     173For example, a memory allocator may access multiple free objects before finding one to satisfy an allocation request, \eg sequential-fit algorithm.
    174174If there are a (large) number of objects accessed in very different areas of memory, the allocator may perturb the program's memory hierarchy causing multiple cache or page misses~\cite{Grunwald93}.
    175175Another way locality can be degraded is by spatially separating related data.
     
    181181
    182182A multi-threaded memory-allocator does not run any threads itself, but is used by a multi-threaded program.
    183 In addition to single-threaded design issues of locality and fragmentation, a multi-threaded allocator may be simultaneously accessed by multiple threads, and hence, must deal with concurrency issues such as mutual exclusion, false sharing, and additional forms of heap blowup.
     183In addition to single-threaded design issues of fragmentation and locality, a multi-threaded allocator is simultaneously accessed by multiple threads, and hence, must deal with concurrency issues such as mutual exclusion, false sharing, and additional forms of heap blowup.
    184184
    185185
     
    192192Second is when multiple threads contend for a shared resource simultaneously, and hence, some threads must wait until the resource is released.
    193193Contention can be reduced in a number of ways:
     194\begin{itemize}[itemsep=0pt]
     195\item
    194196using multiple fine-grained locks versus a single lock, spreading the contention across a number of locks;
     197\item
    195198using trylock and generating new storage if the lock is busy, yielding a classic space versus time tradeoff;
     199\item
    196200using one of the many lock-free approaches for reducing contention on basic data-structure operations~\cite{Oyama99}.
    197 However, all of these approaches have degenerate cases where contention occurs.
     201\end{itemize}
     202However, all of these approaches have degenerate cases where program contention is high, which is outside the allocator's control.
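
The trylock alternative in the list above can be sketched with POSIX threads; @sharedAlloc@ and @freshAlloc@ are hypothetical helpers standing in for the locked shared-heap path and for carving new storage.

\begin{lstlisting}
#include <pthread.h>
#include <stddef.h>

extern void * sharedAlloc( size_t size );   // hypothetical: allocate from shared heap
extern void * freshAlloc( size_t size );    // hypothetical: generate new storage

static pthread_mutex_t heapLock = PTHREAD_MUTEX_INITIALIZER;

// Space versus time: if the heap lock is busy, allocate fresh storage
// (more space) rather than waiting for the lock (more time).
void * tryAlloc( size_t size ) {
	if ( pthread_mutex_trylock( &heapLock ) == 0 ) {
		void * p = sharedAlloc( size );
		pthread_mutex_unlock( &heapLock );
		return p;
	}
	return freshAlloc( size );
}
\end{lstlisting}
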
    198203
    199204
     
    275280\label{s:MultipleHeaps}
    276281
    277 A single-threaded allocator has at most one thread and heap, while a multi-threaded allocator has potentially multiple threads and heaps.
     282A multi-threaded allocator has potentially multiple threads and heaps.
    278283The multiple threads cause complexity, and multiple heaps are a mechanism for dealing with the complexity.
    279284The spectrum ranges from multiple threads using a single heap, denoted as T:1 (see \VRef[Figure]{f:SingleHeap}), to multiple threads sharing multiple heaps, denoted as T:H (see \VRef[Figure]{f:SharedHeaps}), to one thread per heap, denoted as 1:1 (see \VRef[Figure]{f:PerThreadHeap}), which is almost back to a single-threaded allocator.
     
    339344An alternative implementation is for all heaps to share one reserved memory, which requires a separate lock for the reserved storage to ensure mutual exclusion when acquiring new memory.
    340345Because multiple threads can allocate/free/reallocate adjacent storage, all forms of false sharing may occur.
    341 Other storage-management options are to use @mmap@ to set aside (large) areas of virtual memory for each heap and suballocate each heap's storage within that area.
     346Other storage-management options are to use @mmap@ to set aside (large) areas of virtual memory for each heap and suballocate each heap's storage within that area, pushing part of the storage management complexity back to the operating system.
    342347
    343348\begin{figure}
     
    368373
    369374
    370 \paragraph{1:1 model (thread heaps)} where each thread has its own heap, which eliminates most contention and locking because threads seldom accesses another thread's heap (see ownership in \VRef{s:Ownership}).
     375\paragraph{1:1 model (thread heaps)} where each thread has its own heap, eliminating most contention and locking because threads seldom access another thread's heap (see ownership in \VRef{s:Ownership}).
    371376An additional benefit of thread heaps is improved locality due to better memory layout.
    372377As each thread only allocates from its heap, all objects for a thread are consolidated in the storage area for that heap, better utilizing each CPU's cache and accessing fewer pages.
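
A minimal sketch of the 1:1 fast path using C11 thread-local storage; @newHeap@ and @heapAlloc@ are illustrative names, not the thesis interface.

\begin{lstlisting}
#include <stddef.h>

typedef struct Heap Heap;
extern Heap * newHeap( void );                      // hypothetical heap constructor
extern void * heapAlloc( Heap * h, size_t size );   // hypothetical per-heap allocation

static _Thread_local Heap * myHeap = NULL;          // one heap per thread

// Fast path needs no locking because only this thread touches myHeap.
void * threadAlloc( size_t size ) {
	if ( myHeap == NULL ) myHeap = newHeap();       // lazily created on first use
	return heapAlloc( myHeap, size );
}
\end{lstlisting}
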
     
    380385Second is to place the thread heap on a list of available heaps and reuse it for a new thread in the future.
    381386Destroying the thread heap immediately may reduce external fragmentation sooner, since all free objects are freed to the global heap and may be reused by other threads.
    382 Alternatively, reusing thread heaps may improve performance if the inheriting thread makes similar allocation requests as the thread that previously held the thread heap.
     387Alternatively, reusing thread heaps may improve performance if the inheriting thread makes allocation requests similar to those of the thread that previously held the thread heap, because any unfreed storage is immediately accessible.
    383388
    384389
     
    388393However, an important goal of user-level threading is fast operations (creation/termination/context-switching) achieved by not interacting with the operating system, which allows creating large numbers of high-performance interacting threads ($>$ 10,000).
    389394It is difficult to retain this goal if the user-threading model is directly involved with the heap model.
    390 \VRef[Figure]{f:UserLevelKernelHeaps} shows that virtually all user-level threading systems use whatever kernel-level heap-model provided by the language runtime.
     395\VRef[Figure]{f:UserLevelKernelHeaps} shows that virtually all user-level threading systems use whatever kernel-level heap-model is provided by the language runtime.
    391396Hence, a user thread allocates/deallocates from/to the heap of the kernel thread on which it is currently executing.
    392397
     
    400405Adopting this model results in a subtle problem with shared heaps.
    401406With kernel threading, an operation that is started by a kernel thread is always completed by that thread.
    402 For example, if a kernel thread starts an allocation/deallocation on a shared heap, it always completes that operation with that heap even if preempted.
    403 Any correctness locking associated with the shared heap is preserved across preemption.
     407For example, if a kernel thread starts an allocation/deallocation on a shared heap, it always completes that operation with that heap even if preempted, \ie any locking correctness associated with the shared heap is preserved across preemption.
    404408
    405409However, this correctness property is not preserved for user-level threading.
     
    409413However, eagerly disabling/enabling time-slicing on the allocation/deallocation fast path is expensive, because preemption is rare (10--100 milliseconds).
    410414Instead, techniques exist to lazily detect this case in the interrupt handler, abort the preemption, and return to the operation so it can complete atomically.
    411 Occasionally ignoring a preemption should be benign.
     415Occasionally ignoring a preemption should be benign, but a persistent lack of preemption can result in both short- and long-term starvation.
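
One possible shape of the lazy technique, shown only as a sketch and reusing declarations from the previous sketch: the allocation fast path raises a per-thread flag, and the time-slice handler defers preemption while the flag is set. All names here are assumptions.

\begin{lstlisting}
static _Thread_local volatile int inAllocator = 0;  // set on the allocation fast path

void * umalloc( size_t size ) {
	inAllocator = 1;                        // mark the non-preemptible region
	void * p = heapAlloc( myHeap, size );   // hypothetical fast path
	inAllocator = 0;
	return p;
}

// Called from the time-slice interrupt handler: returning nonzero aborts
// this (rare) preemption so the allocation completes atomically.
int deferPreemption( void ) {
	return inAllocator;
}
\end{lstlisting}
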
    412416
    413417
     
    430434
    431435\newterm{Ownership} defines which heap an object is returned-to on deallocation.
    432 If a thread returns an object to the heap it was originally allocated from, the heap has ownership of its objects.
    433 Alternatively, a thread can return an object to the heap it is currently allocating from, which can be any heap accessible during a thread's lifetime.
     436If a thread returns an object to the heap it was originally allocated from, the heap has ownership of its objects.
     437Alternatively, a thread can return an object to the heap it is currently associated with, which can be any heap accessible during a thread's lifetime.
    434438\VRef[Figure]{f:HeapsOwnership} shows an example of multiple heaps (minus the global heap) with and without ownership.
    435439Again, the arrows indicate the direction memory conceptually moves for each kind of operation.
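
The ownership distinction can be made concrete with a small sketch, assuming each object carries a header naming its owner heap and reusing the @Heap@ type and @myHeap@ variable from earlier; the layout and names are illustrative.

\begin{lstlisting}
typedef struct Header {
	Heap * owner;                       // heap the object was allocated from
} Header;

extern void heapFree( Heap * h, void * addr );  // hypothetical per-heap free

void ufree( void * addr ) {
	Header * hdr = (Header *)addr - 1;  // header assumed to precede the object
#ifdef OWNERSHIP
	heapFree( hdr->owner, addr );       // return to the originating heap
#else
	heapFree( myHeap, addr );           // return to the current thread's heap
#endif
}
\end{lstlisting}
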
     
    539543Only with the 1:1 model and ownership is active and passive false-sharing avoided (see \VRef{s:Ownership}).
    540544Passive false-sharing may still occur if delayed ownership is used.
     545Finally, a completely free container can become reserved storage and be reset to allocate objects of a new size, or be freed to the global heap.
    541546
    542547\begin{figure}
     
    553558\caption{Free-list Structure with Container Ownership}
    554559\end{figure}
    555 
    556 A fragmented heap has multiple containers that may be partially or completely free.
    557 A completely free container can become reserved storage and be reset to allocate objects of a new size.
    558 When a heap reaches a threshold of free objects, it moves some free storage to the global heap for reuse to prevent heap blowup.
    559 Without ownership, when a heap frees objects to the global heap, individual objects must be passed, and placed on the global-heap's free-list.
    560 Containers cannot be freed to the global heap unless completely free because
    561560
    562561When a container changes ownership, the ownership of all objects within it changes as well.
     
    569568Note, once the object is freed by Task$_1$, no more false sharing can occur until the container changes ownership again.
    570569To prevent this form of false sharing, container movement may be restricted to when all objects in the container are free.
    571 One implementation approach that increases the freedom to return a free container to the operating system involves allocating containers using a call like @mmap@, which allows memory at an arbitrary address to be returned versus only storage at the end of the contiguous @sbrk@ area.
     570One implementation approach that increases the freedom to return a free container to the operating system involves allocating containers using a call like @mmap@, which allows memory at an arbitrary address to be returned versus only storage at the end of the contiguous @sbrk@ area, again pushing storage management complexity back to the operating system.
    572571
    573572\begin{figure}
     
    700699\end{figure}
    701700
    702 As mentioned, an implementation may have only one heap deal with the global heap, so the other heap can be simplified.
     701As mentioned, an implementation may have only one heap interact with the global heap, so the other heap can be simplified.
    703702For example, if only the private heap interacts with the global heap, the public heap can be reduced to a lock-protected free-list of objects deallocated by other threads due to ownership, called a \newterm{remote free-list}.
    704703To avoid heap blowup, the private heap allocates from the remote free-list when it reaches some threshold or it has no free storage.
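
A remote free-list reduces the public heap to a single lock-protected list; a sketch, reusing the @FreeBlock@ type from the first sketch (names are assumptions).

\begin{lstlisting}
#include <pthread.h>

typedef struct RemoteList {
	pthread_mutex_t lock;
	FreeBlock * head;
} RemoteList;

// Other threads push objects they do not own onto the owner's remote list.
void remoteFree( RemoteList * r, void * block ) {
	pthread_mutex_lock( &r->lock );
	((FreeBlock *)block)->next = r->head;
	r->head = (FreeBlock *)block;
	pthread_mutex_unlock( &r->lock );
}

// The owning thread drains the whole list when it reaches a threshold or
// has no free storage, avoiding heap blowup.
FreeBlock * remoteDrain( RemoteList * r ) {
	pthread_mutex_lock( &r->lock );
	FreeBlock * list = r->head;
	r->head = NULL;
	pthread_mutex_unlock( &r->lock );
	return list;
}
\end{lstlisting}
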
     
    721720An allocation buffer is reserved memory (see~\VRef{s:AllocatorComponents}) not yet allocated to the program, and is used for allocating objects when the free list is empty.
    722721That is, rather than requesting new storage for a single object, an entire buffer is requested from which multiple objects are allocated later.
    723 Both any heap may use an allocation buffer, resulting in allocation from the buffer before requesting objects (containers) from the global heap or operating system, respectively.
     722Any heap may use an allocation buffer, resulting in allocation from the buffer before requesting objects (containers) from the global heap or operating system, respectively.
    724723The allocation buffer reduces contention and the number of global/operating-system calls.
    725724For coalescing, a buffer is split into smaller objects by allocations, and recomposed into larger buffer areas during deallocations.
    726725
    727 Allocation buffers are useful initially when there are no freed objects in a heap because many allocations usually occur when a thread starts.
     726Allocation buffers are useful initially when there are no freed objects in a heap because many allocations usually occur when a thread starts (simple bump allocation).
    728727Furthermore, to prevent heap blowup, objects should be reused before allocating a new allocation buffer.
    729 Thus, allocation buffers are often allocated more frequently at program/thread start, and then their use often diminishes.
     728Thus, allocation buffers are often allocated more frequently at program/thread start, and then allocations often diminish.
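
Bump allocation from a buffer is only a few instructions; a minimal sketch (per the blowup caveat above, free lists are consulted before the buffer).

\begin{lstlisting}
#include <stddef.h>

typedef struct Buffer {
	char * next;                        // bump pointer into reserved memory
	char * end;                         // one past the end of the buffer
} Buffer;

// Simple bump allocation: no search, no locking within a thread heap.
static void * bufferAlloc( Buffer * b, size_t size ) {
	if ( (size_t)(b->end - b->next) < size ) return NULL;  // buffer exhausted
	void * p = b->next;
	b->next += size;                    // alignment handling elided
	return p;
}
\end{lstlisting}
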
    730729
    731730Using an allocation buffer with a thread heap avoids active false-sharing, since all objects in the allocation buffer are allocated to the same thread.
     
    746745\label{s:LockFreeOperations}
    747746
    748 A lock-free algorithm guarantees safe concurrent-access to a data structure, so that at least one thread can make progress in the system, but an individual task has no bound to execution, and hence, may starve~\cite[pp.~745--746]{Herlihy93}.
    749 % A wait-free algorithm puts a finite bound on the number of steps any thread takes to complete an operation, so an individual task cannot starve
     747A \newterm{lock-free algorithm} guarantees safe concurrent-access to a data structure, so that at least one thread makes progress, but an individual task has no execution bound and may starve~\cite[pp.~745--746]{Herlihy93}.
     748(A \newterm{wait-free algorithm} puts a bound on the number of steps any thread takes to complete an operation to prevent starvation.)
    750749Lock-free operations can be used in an allocator to reduce or eliminate the use of locks.
    751 Locks are a problem for high contention or if the thread holding the lock is preempted and other threads attempt to use that lock.
    752 With respect to the heap, these situations are unlikely unless all threads makes extremely high use of dynamic-memory allocation, which can be an indication of poor design.
     750While locks and lock-free data-structures often have equal performance, the lock-free approach has the advantage of not holding a lock across preemption, so other threads can continue to make progress.
     751With respect to the heap, high contention is unlikely unless all threads make extremely high use of dynamic-memory allocation, which can be an indication of poor design.
    753752Nevertheless, lock-free algorithms can reduce the number of context switches, since a thread does not yield/block while waiting for a lock;
    754 on the other hand, a thread may busy-wait for an unbounded period.
     753on the other hand, a thread may busy-wait for an unbounded period holding a processor.
    755754Finally, lock-free implementations have greater complexity and hardware dependency.
    756755Lock-free algorithms can be applied most easily to simple free-lists, \eg remote free-list, to allow lock-free insertion and removal from the head of a stack.
    757 Implementing lock-free operations for more complex data-structures (queue~\cite{Valois94}/deque~\cite{Sundell08}) is more complex.
     756Implementing lock-free operations for more complex data-structures (queue~\cite{Valois94}/deque~\cite{Sundell08}) is correspondingly more complex.
    758757Michael~\cite{Michael04} and Gidenstam \etal \cite{Gidenstam05} have created lock-free variations of the Hoard allocator.
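
For the simple head-of-stack case, lock-free push/pop can be sketched with C11 atomics; this sketch deliberately ignores the ABA problem, which a real allocator must address, \eg with version counters or hazard pointers.

\begin{lstlisting}
#include <stdatomic.h>

typedef struct Node { struct Node * next; } Node;
static _Atomic(Node *) top = NULL;

void lfPush( Node * n ) {               // lock-free insertion at the head
	Node * old = atomic_load( &top );
	do {
		n->next = old;
	} while ( ! atomic_compare_exchange_weak( &top, &old, n ) );
}

Node * lfPop( void ) {                  // lock-free removal from the head
	Node * old = atomic_load( &top );
	while ( old != NULL &&
			! atomic_compare_exchange_weak( &top, &old, old->next ) );
	return old;                         // NULL when the stack is empty
}
\end{lstlisting}
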
  • doc/theses/mubeen_zulfiqar_MMath/intro.tex

    r65781a8 ra114743  
    4848Attempts have been made to perform quasi garbage collection in C/\CC~\cite{Boehm88}, but it is a compromise.
    4949This thesis only examines dynamic memory-management with \emph{explicit} deallocation.
    50 While garbage collection and compaction are not part this work, many of the results are applicable to the allocation phase in any memory-management approach.
     50While garbage collection and compaction are not part of this work, many of the work's results are applicable to the allocation phase in any memory-management approach.
    5151
    5252Most programs use a general-purpose allocator, often the one provided implicitly by the programming-language's runtime.
     
    6565\begin{enumerate}[leftmargin=*]
    6666\item
    67 Implementation of a new stand-lone concurrent low-latency memory-allocator ($\approx$1,200 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions of the allocator for programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading).
    68 
    69 \item
    70 Adopt returning of @nullptr@ for a zero-sized allocation, rather than an actual memory address, both of which can be passed to @free@.
    71 
    72 \item
    73 Extended the standard C heap functionality by preserving with each allocation its original request size versus the amount allocated, if an allocation is zero fill, and the allocation alignment.
    74 
    75 \item
    76 Use the zero fill and alignment as \emph{sticky} properties for @realloc@, to realign existing storage, or preserve existing zero-fill and alignment when storage is copied.
     67Implementation of a new stand-alone concurrent low-latency memory-allocator ($\approx$1,200 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions of the allocator for the programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading).
     68
     69\item
     70Adopt @nullptr@ return for a zero-sized allocation, rather than an actual memory address, which can be passed to @free@.
     71
     72\item
     73Extend the standard C heap functionality by preserving with each allocation:
     74\begin{itemize}[itemsep=0pt]
     75\item
     76its request size plus the amount allocated,
     77\item
     78whether an allocation is zero fill,
     79\item
     80and allocation alignment.
     81\end{itemize}
     82
     83\item
     84Use the preserved zero fill and alignment as \emph{sticky} properties for @realloc@ to zero-fill and align when storage is extended or copied.
    7785Without this extension, it is unsafe to @realloc@ storage initially allocated with zero-fill/alignment as these properties are not preserved when copying.
    7886This silent generation of a problem is unintuitive to programmers and difficult to locate because it is transient.
     
    8694@resize( oaddr, alignment, size )@ re-purposes an old allocation with new alignment but \emph{without} preserving fill.
    8795\item
    88 @realloc( oaddr, alignment, size )@ same as previous @realloc@ but adding or changing alignment.
     96@realloc( oaddr, alignment, size )@ same as @realloc@ but adding or changing alignment.
    8997\item
    9098@aalloc( dim, elemSize )@ same as @calloc@ except memory is \emph{not} zero filled.
     
    96104
    97105\item
    98 Provide additional heap wrapper functions in \CFA to provide a complete orthogonal set of allocation operations and properties.
     106Provide additional heap wrapper functions in \CFA, creating an orthogonal set of allocation operations and properties.
    99107
    100108\item
     
    109117@malloc_size( addr )@ returns the size of the memory allocation pointed-to by @addr@.
    110118\item
    111 @malloc_usable_size( addr )@ returns the usable size of the memory pointed-to by @addr@, i.e., the bin size containing the allocation, where @malloc_size( addr )@ $\le$ @malloc_usable_size( addr )@.
     119@malloc_usable_size( addr )@ returns the usable (total) size of the memory pointed-to by @addr@, \ie the bin size containing the allocation, where @malloc_size( addr )@ $\le$ @malloc_usable_size( addr )@.
    112120\end{itemize}
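
A short usage sketch of the extended interface listed above; behaviour is as described, the allocator's header is assumed, and exact prototypes follow the list.

\begin{lstlisting}
int * p = aalloc( 10, sizeof(int) );        // like calloc, but not zero filled
p = realloc( p, 64, 20 * sizeof(int) );     // grow and align to 64 bytes
size_t req = malloc_size( p );              // original request size
size_t tot = malloc_usable_size( p );       // bin size >= malloc_size( p )
free( p );
\end{lstlisting}
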
    113121