Changeset 187570f


Timestamp: May 11, 2023, 10:27:52 AM
Author: caparsons <caparson@…>
Branches: ADT, ast-experimental, master
Children: c5a2c96
Parents: c34a1a4 (diff), d697527 (diff)
Note: this is a merge changeset; the changes displayed below correspond to the merge itself.
Message: Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc
Location: doc/papers/llheap
Files: 1 added, 1 edited
  • doc/papers/llheap/Paper.tex

--- doc/papers/llheap/Paper.tex (c34a1a4)
+++ doc/papers/llheap/Paper.tex (187570f)
 
 The principle of locality recognizes that programs tend to reference a small set of data, called a working set, for a certain period of time, where a working set is composed of temporal and spatial accesses~\cite{Denning05}.
-Temporal clustering implies a group of objects are accessed repeatedly within a short time period, while spatial clustering implies a group of objects physically close together (nearby addresses) are accessed repeatedly within a short time period.
-Temporal locality commonly occurs during an iterative computation with a fixed set of disjoint variables, while spatial locality commonly occurs when traversing an array.
-
+% Temporal clustering implies a group of objects are accessed repeatedly within a short time period, while spatial clustering implies a group of objects physically close together (nearby addresses) are accessed repeatedly within a short time period.
+% Temporal locality commonly occurs during an iterative computation with a fixed set of disjoint variables, while spatial locality commonly occurs when traversing an array.
 Hardware takes advantage of temporal and spatial locality through multiple levels of caching, \ie memory hierarchy.
-When an object is accessed, the memory physically located around the object is also cached with the expectation that the current and nearby objects will be referenced within a short period of time.
+% When an object is accessed, the memory physically located around the object is also cached with the expectation that the current and nearby objects will be referenced within a short period of time.
 For example, entire cache lines are transferred between memory and cache and entire virtual-memory pages are transferred between disk and memory.
-A program exhibiting good locality has better performance due to fewer cache misses and page faults\footnote{With the advent of large RAM memory, paging is becoming less of an issue in modern programming.}.
+% A program exhibiting good locality has better performance due to fewer cache misses and page faults\footnote{With the advent of large RAM memory, paging is becoming less of an issue in modern programming.}.
 
 Temporal locality is largely controlled by how a program accesses its variables~\cite{Feng05}.
…
 For temporal locality, an allocator can return storage for new allocations that was just freed as these memory locations are still \emph{warm} in the memory hierarchy.
 For spatial locality, an allocator can place objects used together close together in memory, so the working set of the program fits into the fewest possible cache lines and pages.
-However, usage patterns are different for every program as is the underlying hardware memory architecture;
-hence, no general-purpose memory-allocator can provide ideal locality for every program on every computer.
+% However, usage patterns are different for every program as is the underlying hardware memory architecture;
+% hence, no general-purpose memory-allocator can provide ideal locality for every program on every computer.
 
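
The warm-storage reuse described here is typically realized as a LIFO (stack) free list per size class, where the most recently freed block, whose cache lines are most likely still resident, is handed out first. A minimal C sketch; the names are illustrative, not from the paper:

#include <stddef.h>

// LIFO (stack) free list for one size class: the most recently freed
// block is reused first, while its cache lines are likely still warm.
typedef struct FreeNode { struct FreeNode * next; } FreeNode;

static FreeNode * free_list = NULL;        // head = most recently freed

static void * alloc_block( void ) {
	FreeNode * node = free_list;
	if ( node != NULL ) { free_list = node->next; return node; }  // warm reuse
	return NULL;                           // fall back to reserved memory / OS
}

static void free_block( void * storage ) {
	FreeNode * node = storage;             // freed block repurposed as a list node
	node->next = free_list;                // push on front => LIFO order
	free_list = node;
}
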
 There are a number of ways a memory allocator can degrade locality by increasing the working set.
-For example, a memory allocator may access multiple free objects before finding one to satisfy an allocation request, \eg sequential-fit algorithm.
-If there are a (large) number of objects accessed in very different areas of memory, the allocator may perturb the program's memory hierarchy causing multiple cache or page misses~\cite{Grunwald93}.
+For example, a memory allocator may access multiple free objects before finding one to satisfy an allocation request, \eg sequential-fit algorithm, which can perturb the program's memory hierarchy causing multiple cache or page misses~\cite{Grunwald93}.
 Another way locality can be degraded is by spatially separating related data.
 For example, in a binning allocator, objects of different sizes are allocated from different bins that may be located in different pages of memory.
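
A sequential-fit (first-fit) search makes the working-set growth concrete: every free block examined before a fit is found drags another cache line into the working set. A sketch, assuming a simple singly-linked free list:

#include <stddef.h>

// Sequential-fit (first-fit) search: each free block visited before a
// fit is found touches another, possibly distant, cache line.
typedef struct Block { size_t size; struct Block * next; } Block;

static Block * first_fit( Block ** head, size_t request ) {
	for ( Block ** link = head; *link != NULL; link = &(*link)->next ) {
		if ( (*link)->size >= request ) {  // big enough?
			Block * block = *link;
			*link = block->next;           // unlink from the free list
			return block;
		}
	}
	return NULL;                           // no fit: extend the heap
}
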
     
 Second is when multiple threads contend for a shared resource simultaneously, and hence, some threads must wait until the resource is released.
 Contention can be reduced in a number of ways:
-\begin{itemize}[itemsep=0pt]
-\item
-using multiple fine-grained locks versus a single lock, spreading the contention across a number of locks;
-\item
-using trylock and generating new storage if the lock is busy, yielding a classic space versus time tradeoff;
-\item
-using one of the many lock-free approaches for reducing contention on basic data-structure operations~\cite{Oyama99}.
-\end{itemize}
+1) Using multiple fine-grained locks versus a single lock, spreading the contention across a number of locks.
+2) Using trylock and generating new storage if the lock is busy, yielding a classic space versus time tradeoff (see the sketch below).
+3) Using one of the many lock-free approaches for reducing contention on basic data-structure operations~\cite{Oyama99}.
 However, all of these approaches have degenerate cases where program contention is high, which occurs outside of the allocator.
 
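
The trylock variant (item 2 above) might look as follows in C; heap_alloc and fresh_alloc are hypothetical helpers standing in for the allocator's internals:

#include <pthread.h>
#include <stddef.h>

// Trylock strategy: if the heap lock is busy, carve the allocation from
// fresh storage instead of waiting, trading space for time.
extern void * heap_alloc( size_t size );   // requires heap_lock to be held
extern void * fresh_alloc( size_t size );  // bump-allocate new storage, no lock

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

void * try_alloc( size_t size ) {
	if ( pthread_mutex_trylock( &heap_lock ) == 0 ) {  // uncontended?
		void * storage = heap_alloc( size );
		pthread_mutex_unlock( &heap_lock );
		return storage;
	}
	return fresh_alloc( size );            // busy: new storage, more space used
}
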
     
 a memory allocator can only affect the latter two.
 
-\paragraph{Program-induced false-sharing}
-occurs when one thread passes an object sharing a cache line to another thread, and both threads modify the respective objects.
-Figure~\ref{f:ProgramInducedFalseSharing} shows when Thread$_1$ passes Object$_2$ to Thread$_2$, a false-sharing situation forms when Thread$_1$ modifies Object$_1$ and Thread$_2$ modifies Object$_2$.
-Changes to Object$_1$ invalidate CPU$_2$'s cache line, and changes to Object$_2$ invalidate CPU$_1$'s cache line.
-
-\begin{figure}
-\centering
-\subfloat[Program-Induced False-Sharing]{
-	\input{ProgramFalseSharing}
-	\label{f:ProgramInducedFalseSharing}
-} \\
-\vspace{5pt}
-\subfloat[Allocator-Induced Active False-Sharing]{
-	\input{AllocInducedActiveFalseSharing}
-	\label{f:AllocatorInducedActiveFalseSharing}
-} \\
-\vspace{5pt}
-\subfloat[Allocator-Induced Passive False-Sharing]{
-	\input{AllocInducedPassiveFalseSharing}
-	\label{f:AllocatorInducedPassiveFalseSharing}
-} % subfloat
-\caption{False Sharing}
-\label{f:FalseSharing}
-\end{figure}
-
-\paragraph{Allocator-induced active false-sharing}
-\label{s:AllocatorInducedActiveFalseSharing}
-occurs when objects are allocated within the same cache line but to different threads.
-For example, in Figure~\ref{f:AllocatorInducedActiveFalseSharing}, each thread allocates an object and loads a cache-line of memory into its associated cache.
-Again, changes to Object$_1$ invalidate CPU$_2$'s cache line, and changes to Object$_2$ invalidate CPU$_1$'s cache line.
-
-\paragraph{Allocator-induced passive false-sharing}
-\label{s:AllocatorInducedPassiveFalseSharing}
-is another form of allocator-induced false-sharing caused by program-induced false-sharing.
-When an object in a program-induced false-sharing situation is deallocated, a future allocation of that object may cause passive false-sharing.
-For example, in Figure~\ref{f:AllocatorInducedPassiveFalseSharing}, Thread$_1$ passes Object$_2$ to Thread$_2$, and Thread$_2$ subsequently deallocates Object$_2$.
-Allocator-induced passive false-sharing occurs when Object$_2$ is reallocated to Thread$_2$ while Thread$_1$ is still using Object$_1$.
+Assume two objects, object$_1$ and object$_2$, share a cache line.
+\newterm{Program-induced false-sharing} occurs when thread$_1$ passes a reference to object$_2$ to thread$_2$, and then thread$_1$ modifies object$_1$ while thread$_2$ modifies object$_2$.
+% Figure~\ref{f:ProgramInducedFalseSharing} shows when Thread$_1$ passes Object$_2$ to Thread$_2$, a false-sharing situation forms when Thread$_1$ modifies Object$_1$ and Thread$_2$ modifies Object$_2$.
+% Changes to Object$_1$ invalidate CPU$_2$'s cache line, and changes to Object$_2$ invalidate CPU$_1$'s cache line.
+% \begin{figure}
+% \centering
+% \subfloat[Program-Induced False-Sharing]{
+%	\input{ProgramFalseSharing}
+%	\label{f:ProgramInducedFalseSharing}
+% } \\
+% \vspace{5pt}
+% \subfloat[Allocator-Induced Active False-Sharing]{
+%	\input{AllocInducedActiveFalseSharing}
+%	\label{f:AllocatorInducedActiveFalseSharing}
+% } \\
+% \vspace{5pt}
+% \subfloat[Allocator-Induced Passive False-Sharing]{
+%	\input{AllocInducedPassiveFalseSharing}
+%	\label{f:AllocatorInducedPassiveFalseSharing}
+% } subfloat
+% \caption{False Sharing}
+% \label{f:FalseSharing}
+% \end{figure}
+\newterm{Allocator-induced active false-sharing}\label{s:AllocatorInducedActiveFalseSharing} occurs when object$_1$ and object$_2$ are heap allocated and their references are passed to thread$_1$ and thread$_2$, which modify the objects.
+% For example, in Figure~\ref{f:AllocatorInducedActiveFalseSharing}, each thread allocates an object and loads a cache-line of memory into its associated cache.
+% Again, changes to Object$_1$ invalidate CPU$_2$'s cache line, and changes to Object$_2$ invalidate CPU$_1$'s cache line.
+\newterm{Allocator-induced passive false-sharing}\label{s:AllocatorInducedPassiveFalseSharing} occurs
+% is another form of allocator-induced false-sharing caused by program-induced false-sharing.
+% When an object in a program-induced false-sharing situation is deallocated, a future allocation of that object may cause passive false-sharing.
+when thread$_1$ passes object$_2$ to thread$_2$, and thread$_2$ subsequently deallocates object$_2$, and then object$_2$ is reallocated to thread$_2$ while thread$_1$ is still using object$_1$.
 
 
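
All three situations reduce to the same hardware effect, which the following C fragment makes concrete: two adjacent objects in one cache line updated by two threads cause the line to bounce between CPUs even though no datum is logically shared:

#include <pthread.h>
#include <stddef.h>

// object1 and object2 sit in the same cache line, so concurrent updates
// ping-pong the line between CPUs; padding each object out to the
// cache-line size (typically 64 bytes) removes the effect.
static struct { long object1; long object2; } shared;  // adjacent => same line

static void * worker1( void * arg ) {
	for ( int i = 0; i < 10000000; i += 1 ) shared.object1 += 1;
	return NULL;
}
static void * worker2( void * arg ) {
	for ( int i = 0; i < 10000000; i += 1 ) shared.object2 += 1;
	return NULL;
}

int main( void ) {
	pthread_t t1, t2;
	pthread_create( &t1, NULL, worker1, NULL );
	pthread_create( &t2, NULL, worker2, NULL );
	pthread_join( t1, NULL );
	pthread_join( t2, NULL );
}
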
     
 
 The following features are used in the construction of multi-threaded memory-allocators:
-\begin{list}{\arabic{enumi}.}{\usecounter{enumi}\topsep=0.5ex\parsep=0pt\itemsep=0pt}
-\item multiple heaps
-\begin{list}{\alph{enumii})}{\usecounter{enumii}\topsep=0.5ex\parsep=0pt\itemsep=0pt}
-\item with or without a global heap
-\item with or without ownership
-\end{list}
-\item object containers
-\begin{list}{\alph{enumii})}{\usecounter{enumii}\topsep=0.5ex\parsep=0pt\itemsep=0pt}
-\item with or without ownership
-\item fixed or variable sized
-\item global or local free-lists
-\end{list}
+\begin{enumerate}[itemsep=0pt]
+\item multiple heaps: with or without a global heap, or with or without heap ownership.
+\item object containers: with or without ownership, fixed or variable sized, global or local free-lists.
 \item hybrid private/public heap
 \item allocation buffer
 \item lock-free operations
-\end{list}
+\end{enumerate}
 The first feature, multiple heaps, pertains to different kinds of heaps.
 The second feature, object containers, pertains to the organization of objects within the storage area.
     
 The multiple threads cause complexity, and multiple heaps are a mechanism for dealing with the complexity.
 The spectrum ranges from multiple threads using a single heap, denoted as T:1 (see Figure~\ref{f:SingleHeap}), to multiple threads sharing multiple heaps, denoted as T:H (see Figure~\ref{f:SharedHeaps}), to one thread per heap, denoted as 1:1 (see Figure~\ref{f:PerThreadHeap}), which is almost back to a single-threaded allocator.
-
-
-\paragraph{T:1 model} where all threads allocate and deallocate objects from one heap.
-Memory is obtained from the freed objects, or reserved memory in the heap, or from the operating system (OS);
-the heap may also return freed memory to the operating system.
-The arrows indicate the direction memory conceptually moves for each kind of operation: allocation moves memory along the path from the heap/operating-system to the user application, while deallocation moves memory along the path from the application back to the heap/operating-system.
-To safely handle concurrency, a single heap uses locking to provide mutual exclusion.
-Whether using a single lock for all heap operations or fine-grained locking for different operations, a single heap may be a significant source of contention for programs with a large amount of memory allocation.
 
 \begin{figure}
…
 \end{figure}
 
+\paragraph{T:1 model} where all threads allocate and deallocate objects from one heap.
+Memory is obtained from the freed objects, or reserved memory in the heap, or from the operating system (OS);
+the heap may also return freed memory to the operating system.
+The arrows indicate the direction memory conceptually moves for each kind of operation: allocation moves memory along the path from the heap/operating-system to the user application, while deallocation moves memory along the path from the application back to the heap/operating-system.
+To safely handle concurrency, either a single lock may be used for all heap operations or fine-grained locking may be used for different operations.
+Regardless, a single heap may be a significant source of contention for programs with a large amount of memory allocation.
 
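
A minimal sketch of the T:1 fast path with a single coarse lock; heap_alloc_locked and heap_free_locked are hypothetical internals, not the paper's API:

#include <pthread.h>
#include <stddef.h>

// T:1 model: every thread funnels through one heap protected here by a
// single lock, the coarsest case described above.
extern void * heap_alloc_locked( size_t size );  // assume heap_lock held
extern void heap_free_locked( void * storage );

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

void * t1_malloc( size_t size ) {
	pthread_mutex_lock( &heap_lock );      // all threads contend here
	void * storage = heap_alloc_locked( size );
	pthread_mutex_unlock( &heap_lock );
	return storage;
}

void t1_free( void * storage ) {
	pthread_mutex_lock( &heap_lock );
	heap_free_locked( storage );
	pthread_mutex_unlock( &heap_lock );
}
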
 \paragraph{T:H model} where each thread allocates storage from several heaps depending on certain criteria, with the goal of reducing contention by spreading allocations/deallocations across the heaps.
 The decision on when to create a new heap and which heap a thread allocates from depends on the allocator design.
+To determine which heap to access, each thread must point to its associated heap in some way.
 The performance goal is to reduce the ratio of heaps to threads.
-In general, locking is required, since more than one thread may concurrently access a heap during its lifetime, but contention is reduced because fewer threads access a specific heap.
-
-For example, multiple heaps are managed in a pool, starting with a single or a fixed number of heaps that increase\-/decrease depending on contention\-/space issues.
-At creation, a thread is associated with a heap from the pool.
-In some implementations of this model, when the thread attempts an allocation and its associated heap is locked (contention), it scans for an unlocked heap in the pool.
-If an unlocked heap is found, the thread changes its association and uses that heap.
-If all heaps are locked, the thread may create a new heap, use it, and then place the new heap into the pool;
-or the thread can block waiting for a heap to become available.
-While the heap-pool approach often minimizes the number of extant heaps, the worse case can result in more heaps than threads;
-\eg if the number of threads is large at startup with many allocations creating a large number of heaps and then the number of threads reduces.
-
-Threads using multiple heaps need to determine the specific heap to access for an allocation/deallocation, \ie association of thread to heap.
-A number of techniques are used to establish this association.
-The simplest approach is for each thread to have a pointer to its associated heap (or to administrative information that points to the heap), and this pointer changes if the association changes.
-For threading systems with thread-local storage, the heap pointer is created using this mechanism;
-otherwise, the heap routines must simulate thread-local storage using approaches like hashing the thread's stack-pointer or thread-id to find its associated heap.
-
-The storage management for multiple heaps is more complex than for a single heap (see Figure~\ref{f:AllocatorComponents}).
-Figure~\ref{f:MultipleHeapStorage} illustrates the general storage layout for multiple heaps.
-Allocated and free objects are labelled by the thread or heap they are associated with.
-(Links between free objects are removed for simplicity.)
-The management information in the static zone must be able to locate all heaps in the dynamic zone.
+However, the worst case can result in more heaps than threads, \eg if the number of threads is large at startup with many allocations creating a large number of heaps and then the number of threads reduces.
+Locking is required, since more than one thread may concurrently access a heap during its lifetime, but contention is reduced because fewer threads access a specific heap.
+
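
The thread-to-heap association is commonly held in thread-local storage; a sketch, where Heap and the pool helpers are hypothetical placeholders:

#include <stddef.h>

// T:H association via thread-local storage: each thread caches a
// pointer to its assigned heap and re-points it if the association
// changes (e.g., when the pool reassigns heaps under contention).
typedef struct Heap Heap;
extern Heap * pool_assign_heap( void );    // pick or create a heap in the pool
extern void * heap_alloc( Heap * heap, size_t size );  // locks the heap inside

static _Thread_local Heap * my_heap = NULL;  // per-thread heap association

void * th_malloc( size_t size ) {
	if ( my_heap == NULL ) my_heap = pool_assign_heap();  // first allocation
	return heap_alloc( my_heap, size );
}
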
+% For example, multiple heaps are managed in a pool, starting with a single or a fixed number of heaps that increase\-/decrease depending on contention\-/space issues.
+% At creation, a thread is associated with a heap from the pool.
+% In some implementations of this model, when the thread attempts an allocation and its associated heap is locked (contention), it scans for an unlocked heap in the pool.
+% If an unlocked heap is found, the thread changes its association and uses that heap.
+% If all heaps are locked, the thread may create a new heap, use it, and then place the new heap into the pool;
+% or the thread can block waiting for a heap to become available.
+% While the heap-pool approach often minimizes the number of extant heaps, the worse case can result in more heaps than threads;
+% \eg if the number of threads is large at startup with many allocations creating a large number of heaps and then the number of threads reduces.
+
+% Threads using multiple heaps need to determine the specific heap to access for an allocation/deallocation, \ie association of thread to heap.
+% A number of techniques are used to establish this association.
+% The simplest approach is for each thread to have a pointer to its associated heap (or to administrative information that points to the heap), and this pointer changes if the association changes.
+% For threading systems with thread-local storage, the heap pointer is created using this mechanism;
+% otherwise, the heap routines must simulate thread-local storage using approaches like hashing the thread's stack-pointer or thread-id to find its associated heap.
+
+% The storage management for multiple heaps is more complex than for a single heap (see Figure~\ref{f:AllocatorComponents}).
+% Figure~\ref{f:MultipleHeapStorage} illustrates the general storage layout for multiple heaps.
+% Allocated and free objects are labelled by the thread or heap they are associated with.
+% (Links between free objects are removed for simplicity.)
+The management information for multiple heaps in the static zone must be able to locate all heaps.
 The management information for the heaps must reside in the dynamic-allocation zone if there are a variable number.
 Each heap in the dynamic zone is composed of a list of free objects and a pointer to its reserved memory.
     
 Other storage-management options are to use @mmap@ to set aside (large) areas of virtual memory for each heap and suballocate each heap's storage within that area, pushing part of the storage management complexity back to the operating system.
 
-\begin{figure}
-\centering
-\input{MultipleHeapsStorage}
-\caption{Multiple-Heap Storage}
-\label{f:MultipleHeapStorage}
-\end{figure}
+% \begin{figure}
+% \centering
+% \input{MultipleHeapsStorage}
+% \caption{Multiple-Heap Storage}
+% \label{f:MultipleHeapStorage}
+% \end{figure}
 
 Multiple heaps increase external fragmentation as the ratio of heaps to threads increases, which can lead to heap blowup.
     
 \paragraph{1:1 model (thread heaps)} where each thread has its own heap eliminating most contention and locking because threads seldom access another thread's heap (see ownership in Section~\ref{s:Ownership}).
 An additional benefit of thread heaps is improved locality due to better memory layout.
-As each thread only allocates from its heap, all objects for a thread are consolidated in the storage area for that heap, better utilizing each CPUs cache and accessing fewer pages.
+As each thread only allocates from its heap, all objects are consolidated in the storage area for that heap, better utilizing each CPU's cache and accessing fewer pages.
 In contrast, the T:H model spreads each thread's objects over a larger area in different heaps.
 Thread heaps can also eliminate allocator-induced active false-sharing, if memory is acquired so it does not overlap at crucial boundaries with memory for another thread's heap.
 For example, assuming page boundaries coincide with cache-line boundaries, if a thread heap always acquires pages of memory, then no two threads share a page or cache line unless pointers are passed among them.
-Hence, allocator-induced active false-sharing in Figure~\ref{f:AllocatorInducedActiveFalseSharing} cannot occur because the memory for thread heaps never overlaps.
+Hence, allocator-induced active false-sharing cannot occur because the memory for thread heaps never overlaps.
 
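
Page-granular acquisition of thread-heap storage can be sketched with mmap, whose mappings are page aligned; the helper name is illustrative:

#include <sys/mman.h>
#include <stddef.h>

// Thread-heap storage acquired in whole pages from mmap (bytes assumed
// a multiple of the page size): if cache lines never span page
// boundaries, two thread heaps never share a cache line, so no
// allocator-induced active false-sharing.
static void * thread_heap_reserve( size_t bytes ) {
	void * area = mmap( NULL, bytes, PROT_READ | PROT_WRITE,
	                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0 );
	return area == MAP_FAILED ? NULL : area;   // page-aligned reserved storage
}
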
 When a thread terminates, there are two options for handling its thread heap.
…
 With kernel threading, an operation that is started by a kernel thread is always completed by that thread.
 For example, if a kernel thread starts an allocation/deallocation on a shared heap, it always completes that operation with that heap even if preempted, \ie any locking correctness associated with the shared heap is preserved across preemption.
-
 However, this correctness property is not preserved for user-level threading.
 A user thread can start an allocation/deallocation on one kernel thread, be preempted (time slice), and continue running on a different kernel thread to complete the operation~\cite{Dice02}.
…
 However, eagerly disabling/enabling time-slicing on the allocation/deallocation fast path is expensive, because preemption does not happen that frequently.
 Instead, techniques exist to lazily detect this case in the interrupt handler, abort the preemption, and return to the operation so it can complete atomically.
-Occasionally ignoring a preemption should be benign, but a persistent lack of preemption can result in both short and long term starvation.
+Occasionally ignoring a preemption should be benign, but a persistent lack of preemption can result in both short and long term starvation;
+techniques like rollforward can be used to force an eventual preemption.
 
 
     
 \end{figure}
 
-Figure~\ref{f:MultipleHeapStorageOwnership} shows the effect of ownership on storage layout.
-(For simplicity, assume the heaps all use the same size of reserves storage.)
-In contrast to Figure~\ref{f:MultipleHeapStorage}, each reserved area used by a heap only contains free storage for that particular heap because threads must return free objects back to the owner heap.
-Again, because multiple threads can allocate/free/reallocate adjacent storage in the same heap, all forms of false sharing may occur.
-The exception is for the 1:1 model if reserved memory does not overlap a cache-line because all allocated storage within a used area is associated with a single thread.
-In this case, there is no allocator-induced active false-sharing (see Figure~\ref{f:AllocatorInducedActiveFalseSharing}) because two adjacent allocated objects used by different threads cannot share a cache-line.
-As well, there is no allocator-induced passive false-sharing (see Figure~\ref{f:AllocatorInducedActiveFalseSharing}) because two adjacent allocated objects used by different threads cannot occur because free objects are returned to the owner heap.
+% Figure~\ref{f:MultipleHeapStorageOwnership} shows the effect of ownership on storage layout.
+% (For simplicity, assume the heaps all use the same size of reserved storage.)
+% In contrast to Figure~\ref{f:MultipleHeapStorage}, each reserved area used by a heap only contains free storage for that particular heap because threads must return free objects back to the owner heap.
 % Passive false-sharing may still occur, if delayed ownership is used (see below).
 
-\begin{figure}
-\centering
-\input{MultipleHeapsOwnershipStorage.pstex_t}
-\caption{Multiple-Heap Storage with Ownership}
-\label{f:MultipleHeapStorageOwnership}
-\end{figure}
+% \begin{figure}
+% \centering
+% \input{MultipleHeapsOwnershipStorage.pstex_t}
+% \caption{Multiple-Heap Storage with Ownership}
+% \label{f:MultipleHeapStorageOwnership}
+% \end{figure}
 
 The main advantage of ownership is preventing heap blowup by returning storage for reuse by the owner heap.
 Ownership prevents the classical problem where one thread performs allocations from one heap, passes the object to another thread, and the receiving thread deallocates the object to another heap, hence draining the initial heap of storage.
-As well, allocator-induced passive false-sharing is eliminated because returning an object to its owner heap means it can never be allocated to another thread.
-For example, in Figure~\ref{f:AllocatorInducedPassiveFalseSharing}, the deallocation by Thread$_2$ returns Object$_2$ back to Thread$_1$'s heap;
-hence a subsequent allocation by Thread$_2$ cannot return this storage.
+Because multiple threads can allocate/free/reallocate adjacent storage in the same heap, all forms of false sharing may occur.
+The exception is for the 1:1 model if reserved memory does not overlap a cache-line because all allocated storage within a used area is associated with a single thread.
+In this case, there is no allocator-induced active false-sharing because two adjacent allocated objects used by different threads cannot share a cache-line.
+Finally, there is no allocator-induced passive false-sharing because adjacent objects used by different threads cannot occur, as free objects are returned to the owner heap.
+% For example, in Figure~\ref{f:AllocatorInducedPassiveFalseSharing}, the deallocation by Thread$_2$ returns Object$_2$ back to Thread$_1$'s heap;
+% hence a subsequent allocation by Thread$_2$ cannot return this storage.
 The disadvantage of ownership is deallocating to another thread's heap so heaps are no longer private and require locks to provide safe concurrent access.
 
     
 It is better for returning threads to immediately return to the receiving thread's batch list as the receiving thread has better knowledge when to incorporate the batch list into its free pool.
 Batching leverages the fact that most allocation patterns use the contention-free fast-path, so locking on the batch list is rare for both the returning and receiving threads.
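
A batch list amounts to a small locked stack per heap: remote threads push freed objects, and the owner drains the whole batch at once, so the lock is touched only on the rare remote paths. A sketch with hypothetical names:

#include <pthread.h>
#include <stddef.h>

// Per-heap batch list for returning objects to their owner heap:
// remote threads push freed objects under a lock; the owner drains the
// whole batch into its free pool when convenient.
typedef struct FreeNode { struct FreeNode * next; } FreeNode;

typedef struct Heap {
	pthread_mutex_t batch_lock;
	FreeNode * batch;                      // objects freed by other threads
} Heap;

void remote_free( Heap * owner, void * storage ) {
	FreeNode * node = storage;
	pthread_mutex_lock( &owner->batch_lock );  // rare: remote frees only
	node->next = owner->batch;
	owner->batch = node;
	pthread_mutex_unlock( &owner->batch_lock );
}

FreeNode * drain_batch( Heap * heap ) {    // called by the owner thread
	pthread_mutex_lock( &heap->batch_lock );
	FreeNode * list = heap->batch;
	heap->batch = NULL;
	pthread_mutex_unlock( &heap->batch_lock );
	return list;                           // owner merges into its free pool
}
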
-
-It is possible for heaps to steal objects rather than return them and then reallocate these objects again when storage runs out on a heap.
-However, stealing can result in passive false-sharing.
-For example, in Figure~\ref{f:AllocatorInducedPassiveFalseSharing}, Object$_2$ may be deallocated to Thread$_2$'s heap initially.
-If Thread$_2$ reallocates Object$_2$ before it is returned to its owner heap, then passive false-sharing may occur.
-
-
-\subsection{Object Containers}
-\label{s:ObjectContainers}
-
-Bracketing every allocation with headers/trailers can result in significant internal fragmentation, as shown in Figure~\ref{f:ObjectHeaders}.
-Especially if the headers contain redundant management information, then storing that information is a waste of storage, \eg object size may be the same for many objects because programs only allocate a small set of object sizes.
-As well, it can result in poor cache usage, since only a portion of the cache line is holding useful information from the program's perspective.
-Spatial locality can also be negatively affected leading to poor cache locality~\cite{Feng05}:
-while the header and object are together in memory, they are generally not accessed together;
-\eg the object is accessed by the program when it is allocated, while the header is accessed by the allocator when the object is free.
+Finally, it is possible for heaps to temporarily steal owned objects rather than return them immediately and then reallocate these objects again.
+It is unclear whether the complexity of this approach is worthwhile.
+% However, stealing can result in passive false-sharing.
+% For example, in Figure~\ref{f:AllocatorInducedPassiveFalseSharing}, Object$_2$ may be deallocated to Thread$_2$'s heap initially.
+% If Thread$_2$ reallocates Object$_2$ before it is returned to its owner heap, then passive false-sharing may occur.
+
 
 \begin{figure}
     
 \end{figure}
 
-An alternative approach factors common header/trailer information to a separate location in memory and organizes associated free storage into blocks called \newterm{object containers} (\newterm{superblocks} in~\cite{Berger00}), as in Figure~\ref{f:ObjectContainer}.
+
+\subsection{Object Containers}
+\label{s:ObjectContainers}
+
+Associating header data with every allocation can result in significant internal fragmentation, as shown in Figure~\ref{f:ObjectHeaders}, especially if the headers contain redundant data, \eg object size may be the same for many objects because programs only allocate a small set of object sizes.
+As well, the redundant data can result in poor cache usage, since only a portion of the cache line is holding useful data from the program's perspective.
+Spatial locality can also be negatively affected leading to poor cache locality~\cite{Feng05}.
+While the header and object are spatially together in memory, they are generally not accessed temporally together;
+\eg an object is accessed by the program after it is allocated, while the header is accessed by the allocator after it is free.
+
+The alternative factors common header data to a separate location in memory and organizes associated free storage into blocks called \newterm{object containers} (\newterm{superblocks} in~\cite{Berger00}), as in Figure~\ref{f:ObjectContainer}.
 The header for the container holds information necessary for all objects in the container;
 a trailer may also be used at the end of the container.
     
 
 The difficulty with object containers lies in finding the object header/trailer given only the object address, since that is normally the only information passed to the deallocation operation.
-One way to do this is to start containers on aligned addresses in memory, then truncate the lower bits of the object address to obtain the header address (or round up and subtract the trailer size to obtain the trailer address).
+One way is to start containers on aligned addresses in memory, then truncate the lower bits of the object address to obtain the header address (or round up and subtract the trailer size to obtain the trailer address).
 For example, if an object at address 0xFC28\,EF08 is freed and containers are aligned on 64\,KB (0x0001\,0000) addresses, then the container header is at 0xFC28\,0000.
 
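
In C, the truncation is a single mask of the low-order address bits; CONTAINER_SIZE and the header type are illustrative:

#include <stdint.h>

// Locating a container header by address truncation: with containers
// aligned on CONTAINER_SIZE boundaries, clearing the low bits of any
// object address yields the container's header address, e.g.,
// 0xFC28EF08 & ~0xFFFF == 0xFC280000 for 64 KB containers.
#define CONTAINER_SIZE 0x10000u            // 64 KB, must be a power of two

typedef struct ContainerHeader ContainerHeader;  // size, ownership, ...

static ContainerHeader * container_header( void * object ) {
	return (ContainerHeader *)( (uintptr_t)object & ~(uintptr_t)(CONTAINER_SIZE - 1) );
}
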
-Normally, a container has homogeneous objects of fixed size, with fixed information in the header that applies to all container objects (\eg object size and ownership).
-This approach greatly reduces internal fragmentation since far fewer headers are required, and potentially increases spatial locality as a cache line or page holds more objects since the objects are closer together due to the lack of headers.
-However, although similar objects are close spatially within the same container, different sized objects are further apart in separate containers.
+Normally, a container has homogeneous objects with common information factored into the container header, \eg object size and ownership.
+This approach greatly reduces internal fragmentation since far fewer headers are required, and potentially increases spatial locality as a cache line or page holds more objects since the objects are closer together.
+However, different sized objects are further apart in separate containers.
 Depending on the program, this may or may not improve locality.
 If the program uses several objects from a small number of containers in its working set, then locality is improved since fewer cache lines and pages are required.
 If the program uses many containers, there is poor locality, as both caching and paging increase.
-Another drawback is that external fragmentation may be increased since containers reserve space for objects that may never be allocated by the program, \ie there are often multiple containers for each size only partially full.
+Another drawback is that external fragmentation may be increased since containers reserve space for objects that may never be allocated, \ie there are often multiple containers for each size that are only partially full.
 However, external fragmentation can be reduced by using small containers.
    869846
     
    879856Each object header stores the object's heterogeneous information, such as its size, while the container header stores the homogeneous information, such as the owner when using ownership.
    880857This approach allows containers to hold different types of objects, but does not completely separate headers from objects.
    881 The benefit of the container in this case is to reduce some redundant information that is factored into the container header.
    882 
    883 In summary, object containers trade off internal fragmentation for external fragmentation by isolating common administration information to remove/reduce internal fragmentation, but at the cost of external fragmentation as some portion of a container may not be used and this portion is unusable for other kinds of allocations.
    884 A consequence of this tradeoff is its effect on spatial locality, which can produce positive or negative results depending on program access-patterns.
     858% The benefit of the container in this case is to reduce some redundant information that is factored into the container header.
     859
     860% In summary, object containers trade off internal fragmentation for external fragmentation by isolating common administration information to remove/reduce internal fragmentation, but at the cost of external fragmentation as some portion of a container may not be used and this portion is unusable for other kinds of allocations.
     861% A consequence of this tradeoff is its effect on spatial locality, which can produce positive or negative results depending on program access-patterns.
    885862
    886863
     
 
 Without ownership, objects in a container are deallocated to the heap currently associated with the thread that frees the object.
-Thus, different objects in a container may be on different heap free-lists (see Figure~\ref{f:ContainerNoOwnershipFreelist}).
-With ownership, all objects in a container belong to the same heap (see Figure~\ref{f:ContainerOwnershipFreelist}), so ownership of an object is determined by the container owner.
+Thus, different objects in a container may be on different heap free-lists. % (see Figure~\ref{f:ContainerNoOwnershipFreelist}).
+With ownership, all objects in a container belong to the same heap,
+% (see Figure~\ref{f:ContainerOwnershipFreelist}),
+so ownership of an object is determined by the container owner.
 If multiple threads can allocate/free/reallocate adjacent storage in the same heap, all forms of false sharing may occur.
 Only with the 1:1 model and ownership is active and passive false-sharing avoided (see Section~\ref{s:Ownership}).
     
 Finally, a completely free container can become reserved storage and be reset to allocate objects of a new size or freed to the global heap.
 
-\begin{figure}
-\centering
-\subfloat[No Ownership]{
-	\input{ContainerNoOwnershipFreelist}
-	\label{f:ContainerNoOwnershipFreelist}
-} % subfloat
-\vrule
-\subfloat[Ownership]{
-	\input{ContainerOwnershipFreelist}
-	\label{f:ContainerOwnershipFreelist}
-} % subfloat
-\caption{Free-list Structure with Container Ownership}
-\end{figure}
+% \begin{figure}
+% \centering
+% \subfloat[No Ownership]{
+%	\input{ContainerNoOwnershipFreelist}
+%	\label{f:ContainerNoOwnershipFreelist}
+% } % subfloat
+% \vrule
+% \subfloat[Ownership]{
+%	\input{ContainerOwnershipFreelist}
+%	\label{f:ContainerOwnershipFreelist}
+% } % subfloat
+% \caption{Free-list Structure with Container Ownership}
+% \end{figure}
 
 When a container changes ownership, the ownership of all objects within it changes as well.
     
 
 Using containers with ownership increases external fragmentation since a new container for a requested object size must be allocated separately for each thread requesting it.
-In Figure~\ref{f:ExternalFragmentationContainerOwnership}, using object ownership allocates 80\% more space than without ownership.
-
-\begin{figure}
-\centering
-\subfloat[No Ownership]{
-	\input{ContainerNoOwnership}
-} % subfloat
-\\
-\subfloat[Ownership]{
-	\input{ContainerOwnership}
-} % subfloat
-\caption{External Fragmentation with Container Ownership}
-\label{f:ExternalFragmentationContainerOwnership}
-\end{figure}
+% In Figure~\ref{f:ExternalFragmentationContainerOwnership}, using object ownership allocates 80\% more space than without ownership.
+
+% \begin{figure}
+% \centering
+% \subfloat[No Ownership]{
+%	\input{ContainerNoOwnership}
+% } % subfloat
+% \\
+% \subfloat[Ownership]{
+%	\input{ContainerOwnership}
+% } % subfloat
+% \caption{External Fragmentation with Container Ownership}
+% \label{f:ExternalFragmentationContainerOwnership}
+% \end{figure}
 