Changeset 187570f


Timestamp: May 11, 2023, 10:27:52 AM
Author: caparsons <caparson@…>
Branches: ADT, ast-experimental, master
Children: c5a2c96
Parents: c34a1a4 (diff), d697527 (diff)
Note: this is a merge changeset; the changes displayed below correspond to the merge itself.
Message: Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc
Location: doc/papers/llheap
Files: 1 added, 1 edited
  • doc/papers/llheap/Paper.tex

--- doc/papers/llheap/Paper.tex (c34a1a4)
+++ doc/papers/llheap/Paper.tex (187570f)
 
 The principle of locality recognizes that programs tend to reference a small set of data, called a working set, for a certain period of time, where a working set is composed of temporal and spatial accesses~\cite{Denning05}.
-Temporal clustering implies a group of objects are accessed repeatedly within a short time period, while spatial clustering implies a group of objects physically close together (nearby addresses) are accessed repeatedly within a short time period.
-Temporal locality commonly occurs during an iterative computation with a fixed set of disjoint variables, while spatial locality commonly occurs when traversing an array.
-
+% Temporal clustering implies a group of objects are accessed repeatedly within a short time period, while spatial clustering implies a group of objects physically close together (nearby addresses) are accessed repeatedly within a short time period.
+% Temporal locality commonly occurs during an iterative computation with a fixed set of disjoint variables, while spatial locality commonly occurs when traversing an array.
 Hardware takes advantage of temporal and spatial locality through multiple levels of caching, \ie memory hierarchy.
-When an object is accessed, the memory physically located around the object is also cached with the expectation that the current and nearby objects will be referenced within a short period of time.
+% When an object is accessed, the memory physically located around the object is also cached with the expectation that the current and nearby objects will be referenced within a short period of time.
 For example, entire cache lines are transferred between memory and cache and entire virtual-memory pages are transferred between disk and memory.
-A program exhibiting good locality has better performance due to fewer cache misses and page faults\footnote{With the advent of large RAM memory, paging is becoming less of an issue in modern programming.}.
+% A program exhibiting good locality has better performance due to fewer cache misses and page faults\footnote{With the advent of large RAM memory, paging is becoming less of an issue in modern programming.}.
 
 Temporal locality is largely controlled by how a program accesses its variables~\cite{Feng05}.
…
 For temporal locality, an allocator can return storage for new allocations that was just freed as these memory locations are still \emph{warm} in the memory hierarchy.
 For spatial locality, an allocator can place objects used together close together in memory, so the working set of the program fits into the fewest possible cache lines and pages.
-However, usage patterns are different for every program as is the underlying hardware memory architecture;
-hence, no general-purpose memory-allocator can provide ideal locality for every program on every computer.
+% However, usage patterns are different for every program as is the underlying hardware memory architecture;
+% hence, no general-purpose memory-allocator can provide ideal locality for every program on every computer.
 
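
The warm-storage reuse described here is typically realized as a LIFO (stack) free list per size class, where the most recently freed block, whose cache lines are most likely still resident, is handed out first. A minimal C sketch; the names are illustrative, not from the paper:

#include <stddef.h>

// LIFO (stack) free list for one size class: the most recently freed
// block is reused first, while its cache lines are likely still warm.
typedef struct FreeNode { struct FreeNode * next; } FreeNode;

static FreeNode * free_list = NULL;        // head = most recently freed

static void * alloc_block( void ) {
	FreeNode * node = free_list;
	if ( node != NULL ) { free_list = node->next; return node; }  // warm reuse
	return NULL;                           // fall back to reserved memory / OS
}

static void free_block( void * storage ) {
	FreeNode * node = storage;             // freed block repurposed as a list node
	node->next = free_list;                // push on front => LIFO order
	free_list = node;
}
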
 There are a number of ways a memory allocator can degrade locality by increasing the working set.
-For example, a memory allocator may access multiple free objects before finding one to satisfy an allocation request, \eg sequential-fit algorithm.
-If there are a (large) number of objects accessed in very different areas of memory, the allocator may perturb the program's memory hierarchy causing multiple cache or page misses~\cite{Grunwald93}.
+For example, a memory allocator may access multiple free objects before finding one to satisfy an allocation request, \eg sequential-fit algorithm, which can perturb the program's memory hierarchy causing multiple cache or page misses~\cite{Grunwald93}.
 Another way locality can be degraded is by spatially separating related data.
 For example, in a binning allocator, objects of different sizes are allocated from different bins that may be located in different pages of memory.
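
A sequential-fit (first-fit) search makes the working-set growth concrete: every free block examined before a fit is found drags another cache line into the working set. A sketch, assuming a simple singly-linked free list:

#include <stddef.h>

// Sequential-fit (first-fit) search: each free block visited before a
// fit is found touches another, possibly distant, cache line.
typedef struct Block { size_t size; struct Block * next; } Block;

static Block * first_fit( Block ** head, size_t request ) {
	for ( Block ** link = head; *link != NULL; link = &(*link)->next ) {
		if ( (*link)->size >= request ) {  // big enough?
			Block * block = *link;
			*link = block->next;           // unlink from the free list
			return block;
		}
	}
	return NULL;                           // no fit: extend the heap
}
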
     
 Second is when multiple threads contend for a shared resource simultaneously, and hence, some threads must wait until the resource is released.
 Contention can be reduced in a number of ways:
-\begin{itemize}[itemsep=0pt]
-\item
-using multiple fine-grained locks versus a single lock, spreading the contention across a number of locks;
-\item
-using trylock and generating new storage if the lock is busy, yielding a classic space versus time tradeoff;
-\item
-using one of the many lock-free approaches for reducing contention on basic data-structure operations~\cite{Oyama99}.
-\end{itemize}
+1) Using multiple fine-grained locks versus a single lock, spreading the contention across a number of locks.
+2) Using trylock and generating new storage if the lock is busy, yielding a classic space versus time tradeoff (see the sketch below).
+3) Using one of the many lock-free approaches for reducing contention on basic data-structure operations~\cite{Oyama99}.
 However, all of these approaches have degenerate cases where program contention is high, which occurs outside of the allocator.
 
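
The trylock variant (item 2 above) might look as follows in C; heap_alloc and fresh_alloc are hypothetical helpers standing in for the allocator's internals:

#include <pthread.h>
#include <stddef.h>

// Trylock strategy: if the heap lock is busy, carve the allocation from
// fresh storage instead of waiting, trading space for time.
extern void * heap_alloc( size_t size );   // requires heap_lock to be held
extern void * fresh_alloc( size_t size );  // bump-allocate new storage, no lock

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

void * try_alloc( size_t size ) {
	if ( pthread_mutex_trylock( &heap_lock ) == 0 ) {  // uncontended?
		void * storage = heap_alloc( size );
		pthread_mutex_unlock( &heap_lock );
		return storage;
	}
	return fresh_alloc( size );            // busy: new storage, more space used
}
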
     
 a memory allocator can only affect the latter two.
 
-\paragraph{Program-induced false-sharing}
-occurs when one thread passes an object sharing a cache line to another thread, and both threads modify the respective objects.
-Figure~\ref{f:ProgramInducedFalseSharing} shows when Thread$_1$ passes Object$_2$ to Thread$_2$, a false-sharing situation forms when Thread$_1$ modifies Object$_1$ and Thread$_2$ modifies Object$_2$.
-Changes to Object$_1$ invalidate CPU$_2$'s cache line, and changes to Object$_2$ invalidate CPU$_1$'s cache line.
-
-\begin{figure}
-\centering
-\subfloat[Program-Induced False-Sharing]{
-	\input{ProgramFalseSharing}
-	\label{f:ProgramInducedFalseSharing}
-} \\
-\vspace{5pt}
-\subfloat[Allocator-Induced Active False-Sharing]{
-	\input{AllocInducedActiveFalseSharing}
-	\label{f:AllocatorInducedActiveFalseSharing}
-} \\
-\vspace{5pt}
-\subfloat[Allocator-Induced Passive False-Sharing]{
-	\input{AllocInducedPassiveFalseSharing}
-	\label{f:AllocatorInducedPassiveFalseSharing}
-} % subfloat
-\caption{False Sharing}
-\label{f:FalseSharing}
-\end{figure}
-
-\paragraph{Allocator-induced active false-sharing}
-\label{s:AllocatorInducedActiveFalseSharing}
-occurs when objects are allocated within the same cache line but to different threads.
-For example, in Figure~\ref{f:AllocatorInducedActiveFalseSharing}, each thread allocates an object and loads a cache-line of memory into its associated cache.
-Again, changes to Object$_1$ invalidate CPU$_2$'s cache line, and changes to Object$_2$ invalidate CPU$_1$'s cache line.
-
-\paragraph{Allocator-induced passive false-sharing}
-\label{s:AllocatorInducedPassiveFalseSharing}
-is another form of allocator-induced false-sharing caused by program-induced false-sharing.
-When an object in a program-induced false-sharing situation is deallocated, a future allocation of that object may cause passive false-sharing.
-For example, in Figure~\ref{f:AllocatorInducedPassiveFalseSharing}, Thread$_1$ passes Object$_2$ to Thread$_2$, and Thread$_2$ subsequently deallocates Object$_2$.
-Allocator-induced passive false-sharing occurs when Object$_2$ is reallocated to Thread$_2$ while Thread$_1$ is still using Object$_1$.
+Assume two objects, object$_1$ and object$_2$, share a cache line.
+\newterm{Program-induced false-sharing} occurs when thread$_1$ passes a reference to object$_2$ to thread$_2$, and then thread$_1$ modifies object$_1$ while thread$_2$ modifies object$_2$.
+% Figure~\ref{f:ProgramInducedFalseSharing} shows when Thread$_1$ passes Object$_2$ to Thread$_2$, a false-sharing situation forms when Thread$_1$ modifies Object$_1$ and Thread$_2$ modifies Object$_2$.
+% Changes to Object$_1$ invalidate CPU$_2$'s cache line, and changes to Object$_2$ invalidate CPU$_1$'s cache line.
+% \begin{figure}
+% \centering
+% \subfloat[Program-Induced False-Sharing]{
+%	\input{ProgramFalseSharing}
+%	\label{f:ProgramInducedFalseSharing}
+% } \\
+% \vspace{5pt}
+% \subfloat[Allocator-Induced Active False-Sharing]{
+%	\input{AllocInducedActiveFalseSharing}
+%	\label{f:AllocatorInducedActiveFalseSharing}
+% } \\
+% \vspace{5pt}
+% \subfloat[Allocator-Induced Passive False-Sharing]{
+%	\input{AllocInducedPassiveFalseSharing}
+%	\label{f:AllocatorInducedPassiveFalseSharing}
+% } subfloat
+% \caption{False Sharing}
+% \label{f:FalseSharing}
+% \end{figure}
+\newterm{Allocator-induced active false-sharing}\label{s:AllocatorInducedActiveFalseSharing} occurs when object$_1$ and object$_2$ are heap allocated and their references are passed to thread$_1$ and thread$_2$, which modify the objects.
+% For example, in Figure~\ref{f:AllocatorInducedActiveFalseSharing}, each thread allocates an object and loads a cache-line of memory into its associated cache.
+% Again, changes to Object$_1$ invalidate CPU$_2$'s cache line, and changes to Object$_2$ invalidate CPU$_1$'s cache line.
+\newterm{Allocator-induced passive false-sharing}\label{s:AllocatorInducedPassiveFalseSharing} occurs
+% is another form of allocator-induced false-sharing caused by program-induced false-sharing.
+% When an object in a program-induced false-sharing situation is deallocated, a future allocation of that object may cause passive false-sharing.
+when thread$_1$ passes object$_2$ to thread$_2$, and thread$_2$ subsequently deallocates object$_2$, and then object$_2$ is reallocated to thread$_2$ while thread$_1$ is still using object$_1$.
 
 
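
All three situations reduce to the same hardware effect, which the following C fragment makes concrete: two adjacent objects in one cache line updated by two threads cause the line to bounce between CPUs even though no datum is logically shared:

#include <pthread.h>
#include <stddef.h>

// object1 and object2 sit in the same cache line, so concurrent updates
// ping-pong the line between CPUs; padding each object out to the
// cache-line size (typically 64 bytes) removes the effect.
static struct { long object1; long object2; } shared;  // adjacent => same line

static void * worker1( void * arg ) {
	for ( int i = 0; i < 10000000; i += 1 ) shared.object1 += 1;
	return NULL;
}
static void * worker2( void * arg ) {
	for ( int i = 0; i < 10000000; i += 1 ) shared.object2 += 1;
	return NULL;
}

int main( void ) {
	pthread_t t1, t2;
	pthread_create( &t1, NULL, worker1, NULL );
	pthread_create( &t2, NULL, worker2, NULL );
	pthread_join( t1, NULL );
	pthread_join( t2, NULL );
}
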
     
 
 The following features are used in the construction of multi-threaded memory-allocators:
-\begin{list}{\arabic{enumi}.}{\usecounter{enumi}\topsep=0.5ex\parsep=0pt\itemsep=0pt}
-\item multiple heaps
-\begin{list}{\alph{enumii})}{\usecounter{enumii}\topsep=0.5ex\parsep=0pt\itemsep=0pt}
-\item with or without a global heap
-\item with or without ownership
-\end{list}
-\item object containers
-\begin{list}{\alph{enumii})}{\usecounter{enumii}\topsep=0.5ex\parsep=0pt\itemsep=0pt}
-\item with or without ownership
-\item fixed or variable sized
-\item global or local free-lists
-\end{list}
+\begin{enumerate}[itemsep=0pt]
+\item multiple heaps: with or without a global heap, or with or without heap ownership.
+\item object containers: with or without ownership, fixed or variable sized, global or local free-lists.
 \item hybrid private/public heap
 \item allocation buffer
 \item lock-free operations
-\end{list}
+\end{enumerate}
 The first feature, multiple heaps, pertains to different kinds of heaps.
 The second feature, object containers, pertains to the organization of objects within the storage area.
     
 The multiple threads cause complexity, and multiple heaps are a mechanism for dealing with the complexity.
 The spectrum ranges from multiple threads using a single heap, denoted as T:1 (see Figure~\ref{f:SingleHeap}), to multiple threads sharing multiple heaps, denoted as T:H (see Figure~\ref{f:SharedHeaps}), to one thread per heap, denoted as 1:1 (see Figure~\ref{f:PerThreadHeap}), which is almost back to a single-threaded allocator.
-
-
-\paragraph{T:1 model} where all threads allocate and deallocate objects from one heap.
-Memory is obtained from the freed objects, or reserved memory in the heap, or from the operating system (OS);
-the heap may also return freed memory to the operating system.
-The arrows indicate the direction memory conceptually moves for each kind of operation: allocation moves memory along the path from the heap/operating-system to the user application, while deallocation moves memory along the path from the application back to the heap/operating-system.
-To safely handle concurrency, a single heap uses locking to provide mutual exclusion.
-Whether using a single lock for all heap operations or fine-grained locking for different operations, a single heap may be a significant source of contention for programs with a large amount of memory allocation.
 
 \begin{figure}
…
 \end{figure}
 
+\paragraph{T:1 model} where all threads allocate and deallocate objects from one heap.
+Memory is obtained from the freed objects, or reserved memory in the heap, or from the operating system (OS);
+the heap may also return freed memory to the operating system.
+The arrows indicate the direction memory conceptually moves for each kind of operation: allocation moves memory along the path from the heap/operating-system to the user application, while deallocation moves memory along the path from the application back to the heap/operating-system.
+To safely handle concurrency, either a single lock may be used for all heap operations or fine-grained locking may be used for different operations.
+Regardless, a single heap may be a significant source of contention for programs with a large amount of memory allocation.
 
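
A minimal sketch of the T:1 fast path with a single coarse lock; heap_alloc_locked and heap_free_locked are hypothetical internals, not the paper's API:

#include <pthread.h>
#include <stddef.h>

// T:1 model: every thread funnels through one heap protected here by a
// single lock, the coarsest case described above.
extern void * heap_alloc_locked( size_t size );  // assume heap_lock held
extern void heap_free_locked( void * storage );

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

void * t1_malloc( size_t size ) {
	pthread_mutex_lock( &heap_lock );      // all threads contend here
	void * storage = heap_alloc_locked( size );
	pthread_mutex_unlock( &heap_lock );
	return storage;
}

void t1_free( void * storage ) {
	pthread_mutex_lock( &heap_lock );
	heap_free_locked( storage );
	pthread_mutex_unlock( &heap_lock );
}
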
 \paragraph{T:H model} where each thread allocates storage from several heaps depending on certain criteria, with the goal of reducing contention by spreading allocations/deallocations across the heaps.
 The decision on when to create a new heap and which heap a thread allocates from depends on the allocator design.
+To determine which heap to access, each thread must point to its associated heap in some way.
 The performance goal is to reduce the ratio of heaps to threads.
-In general, locking is required, since more than one thread may concurrently access a heap during its lifetime, but contention is reduced because fewer threads access a specific heap.
-
-For example, multiple heaps are managed in a pool, starting with a single or a fixed number of heaps that increase\-/decrease depending on contention\-/space issues.
-At creation, a thread is associated with a heap from the pool.
-In some implementations of this model, when the thread attempts an allocation and its associated heap is locked (contention), it scans for an unlocked heap in the pool.
-If an unlocked heap is found, the thread changes its association and uses that heap.
-If all heaps are locked, the thread may create a new heap, use it, and then place the new heap into the pool;
-or the thread can block waiting for a heap to become available.
-While the heap-pool approach often minimizes the number of extant heaps, the worse case can result in more heaps than threads;
-\eg if the number of threads is large at startup with many allocations creating a large number of heaps and then the number of threads reduces.
-
-Threads using multiple heaps need to determine the specific heap to access for an allocation/deallocation, \ie association of thread to heap.
-A number of techniques are used to establish this association.
-The simplest approach is for each thread to have a pointer to its associated heap (or to administrative information that points to the heap), and this pointer changes if the association changes.
-For threading systems with thread-local storage, the heap pointer is created using this mechanism;
-otherwise, the heap routines must simulate thread-local storage using approaches like hashing the thread's stack-pointer or thread-id to find its associated heap.
-
-The storage management for multiple heaps is more complex than for a single heap (see Figure~\ref{f:AllocatorComponents}).
-Figure~\ref{f:MultipleHeapStorage} illustrates the general storage layout for multiple heaps.
-Allocated and free objects are labelled by the thread or heap they are associated with.
-(Links between free objects are removed for simplicity.)
-The management information in the static zone must be able to locate all heaps in the dynamic zone.
+However, the worst case can result in more heaps than threads, \eg if the number of threads is large at startup with many allocations creating a large number of heaps and then the number of threads reduces.
+Locking is required, since more than one thread may concurrently access a heap during its lifetime, but contention is reduced because fewer threads access a specific heap.
+
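
The thread-to-heap association is commonly held in thread-local storage; a sketch, where Heap and the pool helpers are hypothetical placeholders:

#include <stddef.h>

// T:H association via thread-local storage: each thread caches a
// pointer to its assigned heap and re-points it if the association
// changes (e.g., when the pool reassigns heaps under contention).
typedef struct Heap Heap;
extern Heap * pool_assign_heap( void );    // pick or create a heap in the pool
extern void * heap_alloc( Heap * heap, size_t size );  // locks the heap inside

static _Thread_local Heap * my_heap = NULL;  // per-thread heap association

void * th_malloc( size_t size ) {
	if ( my_heap == NULL ) my_heap = pool_assign_heap();  // first allocation
	return heap_alloc( my_heap, size );
}
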
+% For example, multiple heaps are managed in a pool, starting with a single or a fixed number of heaps that increase\-/decrease depending on contention\-/space issues.
+% At creation, a thread is associated with a heap from the pool.
+% In some implementations of this model, when the thread attempts an allocation and its associated heap is locked (contention), it scans for an unlocked heap in the pool.
+% If an unlocked heap is found, the thread changes its association and uses that heap.
+% If all heaps are locked, the thread may create a new heap, use it, and then place the new heap into the pool;
+% or the thread can block waiting for a heap to become available.
+% While the heap-pool approach often minimizes the number of extant heaps, the worse case can result in more heaps than threads;
+% \eg if the number of threads is large at startup with many allocations creating a large number of heaps and then the number of threads reduces.
+
+% Threads using multiple heaps need to determine the specific heap to access for an allocation/deallocation, \ie association of thread to heap.
+% A number of techniques are used to establish this association.
+% The simplest approach is for each thread to have a pointer to its associated heap (or to administrative information that points to the heap), and this pointer changes if the association changes.
+% For threading systems with thread-local storage, the heap pointer is created using this mechanism;
+% otherwise, the heap routines must simulate thread-local storage using approaches like hashing the thread's stack-pointer or thread-id to find its associated heap.
+
+% The storage management for multiple heaps is more complex than for a single heap (see Figure~\ref{f:AllocatorComponents}).
+% Figure~\ref{f:MultipleHeapStorage} illustrates the general storage layout for multiple heaps.
+% Allocated and free objects are labelled by the thread or heap they are associated with.
+% (Links between free objects are removed for simplicity.)
+The management information for multiple heaps in the static zone must be able to locate all heaps.
 The management information for the heaps must reside in the dynamic-allocation zone if there are a variable number.
 Each heap in the dynamic zone is composed of a list of free objects and a pointer to its reserved memory.
     
 Other storage-management options are to use @mmap@ to set aside (large) areas of virtual memory for each heap and suballocate each heap's storage within that area, pushing part of the storage management complexity back to the operating system.
 
-\begin{figure}
-\centering
-\input{MultipleHeapsStorage}
-\caption{Multiple-Heap Storage}
-\label{f:MultipleHeapStorage}
-\end{figure}
+% \begin{figure}
+% \centering
+% \input{MultipleHeapsStorage}
+% \caption{Multiple-Heap Storage}
+% \label{f:MultipleHeapStorage}
+% \end{figure}
 
 Multiple heaps increase external fragmentation as the ratio of heaps to threads increases, which can lead to heap blowup.
     
 \paragraph{1:1 model (thread heaps)} where each thread has its own heap eliminating most contention and locking because threads seldom access another thread's heap (see ownership in Section~\ref{s:Ownership}).
 An additional benefit of thread heaps is improved locality due to better memory layout.
-As each thread only allocates from its heap, all objects for a thread are consolidated in the storage area for that heap, better utilizing each CPUs cache and accessing fewer pages.
+As each thread only allocates from its heap, all objects are consolidated in the storage area for that heap, better utilizing each CPU's cache and accessing fewer pages.
 In contrast, the T:H model spreads each thread's objects over a larger area in different heaps.
 Thread heaps can also eliminate allocator-induced active false-sharing, if memory is acquired so it does not overlap at crucial boundaries with memory for another thread's heap.
 For example, assuming page boundaries coincide with cache-line boundaries, if a thread heap always acquires pages of memory, then no two threads share a page or cache line unless pointers are passed among them.
-Hence, allocator-induced active false-sharing in Figure~\ref{f:AllocatorInducedActiveFalseSharing} cannot occur because the memory for thread heaps never overlaps.
+Hence, allocator-induced active false-sharing cannot occur because the memory for thread heaps never overlaps.
 
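
Page-granular acquisition of thread-heap storage can be sketched with mmap, whose mappings are page aligned; the helper name is illustrative:

#include <sys/mman.h>
#include <stddef.h>

// Thread-heap storage acquired in whole pages from mmap (bytes assumed
// a multiple of the page size): if cache lines never span page
// boundaries, two thread heaps never share a cache line, so no
// allocator-induced active false-sharing.
static void * thread_heap_reserve( size_t bytes ) {
	void * area = mmap( NULL, bytes, PROT_READ | PROT_WRITE,
	                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0 );
	return area == MAP_FAILED ? NULL : area;   // page-aligned reserved storage
}
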
 When a thread terminates, there are two options for handling its thread heap.
…
 With kernel threading, an operation that is started by a kernel thread is always completed by that thread.
 For example, if a kernel thread starts an allocation/deallocation on a shared heap, it always completes that operation with that heap even if preempted, \ie any locking correctness associated with the shared heap is preserved across preemption.
-
 However, this correctness property is not preserved for user-level threading.
 A user thread can start an allocation/deallocation on one kernel thread, be preempted (time slice), and continue running on a different kernel thread to complete the operation~\cite{Dice02}.
…
 However, eagerly disabling/enabling time-slicing on the allocation/deallocation fast path is expensive, because preemption does not happen that frequently.
 Instead, techniques exist to lazily detect this case in the interrupt handler, abort the preemption, and return to the operation so it can complete atomically.
-Occasionally ignoring a preemption should be benign, but a persistent lack of preemption can result in both short and long term starvation.
+Occasionally ignoring a preemption should be benign, but a persistent lack of preemption can result in both short and long term starvation;
+techniques like rollforward can be used to force an eventual preemption.
 
 
     
 \end{figure}
 
-Figure~\ref{f:MultipleHeapStorageOwnership} shows the effect of ownership on storage layout.
-(For simplicity, assume the heaps all use the same size of reserves storage.)
-In contrast to Figure~\ref{f:MultipleHeapStorage}, each reserved area used by a heap only contains free storage for that particular heap because threads must return free objects back to the owner heap.
-Again, because multiple threads can allocate/free/reallocate adjacent storage in the same heap, all forms of false sharing may occur.
-The exception is for the 1:1 model if reserved memory does not overlap a cache-line because all allocated storage within a used area is associated with a single thread.
-In this case, there is no allocator-induced active false-sharing (see Figure~\ref{f:AllocatorInducedActiveFalseSharing}) because two adjacent allocated objects used by different threads cannot share a cache-line.
-As well, there is no allocator-induced passive false-sharing (see Figure~\ref{f:AllocatorInducedActiveFalseSharing}) because two adjacent allocated objects used by different threads cannot occur because free objects are returned to the owner heap.
+% Figure~\ref{f:MultipleHeapStorageOwnership} shows the effect of ownership on storage layout.
+% (For simplicity, assume the heaps all use the same size of reserved storage.)
+% In contrast to Figure~\ref{f:MultipleHeapStorage}, each reserved area used by a heap only contains free storage for that particular heap because threads must return free objects back to the owner heap.
 % Passive false-sharing may still occur, if delayed ownership is used (see below).
 
-\begin{figure}
-\centering
-\input{MultipleHeapsOwnershipStorage.pstex_t}
-\caption{Multiple-Heap Storage with Ownership}
-\label{f:MultipleHeapStorageOwnership}
-\end{figure}
+% \begin{figure}
+% \centering
+% \input{MultipleHeapsOwnershipStorage.pstex_t}
+% \caption{Multiple-Heap Storage with Ownership}
+% \label{f:MultipleHeapStorageOwnership}
+% \end{figure}
 
 The main advantage of ownership is preventing heap blowup by returning storage for reuse by the owner heap.
 Ownership prevents the classical problem where one thread performs allocations from one heap, passes the object to another thread, and the receiving thread deallocates the object to another heap, hence draining the initial heap of storage.
-As well, allocator-induced passive false-sharing is eliminated because returning an object to its owner heap means it can never be allocated to another thread.
-For example, in Figure~\ref{f:AllocatorInducedPassiveFalseSharing}, the deallocation by Thread$_2$ returns Object$_2$ back to Thread$_1$'s heap;
-hence a subsequent allocation by Thread$_2$ cannot return this storage.
+Because multiple threads can allocate/free/reallocate adjacent storage in the same heap, all forms of false sharing may occur.
+The exception is for the 1:1 model if reserved memory does not overlap a cache-line because all allocated storage within a used area is associated with a single thread.
+In this case, there is no allocator-induced active false-sharing because two adjacent allocated objects used by different threads cannot share a cache-line.
+Finally, there is no allocator-induced passive false-sharing because adjacent objects used by different threads cannot occur, as free objects are returned to the owner heap.
+% For example, in Figure~\ref{f:AllocatorInducedPassiveFalseSharing}, the deallocation by Thread$_2$ returns Object$_2$ back to Thread$_1$'s heap;
+% hence a subsequent allocation by Thread$_2$ cannot return this storage.
 The disadvantage of ownership is deallocating to another thread's heap so heaps are no longer private and require locks to provide safe concurrent access.
 
     
 It is better for returning threads to immediately return to the receiving thread's batch list as the receiving thread has better knowledge when to incorporate the batch list into its free pool.
 Batching leverages the fact that most allocation patterns use the contention-free fast-path, so locking on the batch list is rare for both the returning and receiving threads.
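
A batch list amounts to a small locked stack per heap: remote threads push freed objects, and the owner drains the whole batch at once, so the lock is touched only on the rare remote paths. A sketch with hypothetical names:

#include <pthread.h>
#include <stddef.h>

// Per-heap batch list for returning objects to their owner heap:
// remote threads push freed objects under a lock; the owner drains the
// whole batch into its free pool when convenient.
typedef struct FreeNode { struct FreeNode * next; } FreeNode;

typedef struct Heap {
	pthread_mutex_t batch_lock;
	FreeNode * batch;                      // objects freed by other threads
} Heap;

void remote_free( Heap * owner, void * storage ) {
	FreeNode * node = storage;
	pthread_mutex_lock( &owner->batch_lock );  // rare: remote frees only
	node->next = owner->batch;
	owner->batch = node;
	pthread_mutex_unlock( &owner->batch_lock );
}

FreeNode * drain_batch( Heap * heap ) {    // called by the owner thread
	pthread_mutex_lock( &heap->batch_lock );
	FreeNode * list = heap->batch;
	heap->batch = NULL;
	pthread_mutex_unlock( &heap->batch_lock );
	return list;                           // owner merges into its free pool
}
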
-
-It is possible for heaps to steal objects rather than return them and then reallocate these objects again when storage runs out on a heap.
-However, stealing can result in passive false-sharing.
-For example, in Figure~\ref{f:AllocatorInducedPassiveFalseSharing}, Object$_2$ may be deallocated to Thread$_2$'s heap initially.
-If Thread$_2$ reallocates Object$_2$ before it is returned to its owner heap, then passive false-sharing may occur.
-
-
-\subsection{Object Containers}
-\label{s:ObjectContainers}
-
-Bracketing every allocation with headers/trailers can result in significant internal fragmentation, as shown in Figure~\ref{f:ObjectHeaders}.
-Especially if the headers contain redundant management information, then storing that information is a waste of storage, \eg object size may be the same for many objects because programs only allocate a small set of object sizes.
-As well, it can result in poor cache usage, since only a portion of the cache line is holding useful information from the program's perspective.
-Spatial locality can also be negatively affected leading to poor cache locality~\cite{Feng05}:
-while the header and object are together in memory, they are generally not accessed together;
-\eg the object is accessed by the program when it is allocated, while the header is accessed by the allocator when the object is free.
+Finally, it is possible for heaps to temporarily steal owned objects rather than return them immediately and then reallocate these objects again.
+It is unclear whether the complexity of this approach is worthwhile.
+% However, stealing can result in passive false-sharing.
+% For example, in Figure~\ref{f:AllocatorInducedPassiveFalseSharing}, Object$_2$ may be deallocated to Thread$_2$'s heap initially.
+% If Thread$_2$ reallocates Object$_2$ before it is returned to its owner heap, then passive false-sharing may occur.
+
 
 \begin{figure}
     
 \end{figure}
 
-An alternative approach factors common header/trailer information to a separate location in memory and organizes associated free storage into blocks called \newterm{object containers} (\newterm{superblocks} in~\cite{Berger00}), as in Figure~\ref{f:ObjectContainer}.
+
+\subsection{Object Containers}
+\label{s:ObjectContainers}
+
+Associating header data with every allocation can result in significant internal fragmentation, as shown in Figure~\ref{f:ObjectHeaders}, especially if the headers contain redundant data, \eg object size may be the same for many objects because programs only allocate a small set of object sizes.
+As well, the redundant data can result in poor cache usage, since only a portion of the cache line is holding useful data from the program's perspective.
+Spatial locality can also be negatively affected leading to poor cache locality~\cite{Feng05}.
+While the header and object are spatially together in memory, they are generally not accessed temporally together;
+\eg an object is accessed by the program after it is allocated, while the header is accessed by the allocator after it is free.
+
+The alternative factors common header data to a separate location in memory and organizes associated free storage into blocks called \newterm{object containers} (\newterm{superblocks} in~\cite{Berger00}), as in Figure~\ref{f:ObjectContainer}.
 The header for the container holds information necessary for all objects in the container;
 a trailer may also be used at the end of the container.
     
 
 The difficulty with object containers lies in finding the object header/trailer given only the object address, since that is normally the only information passed to the deallocation operation.
-One way to do this is to start containers on aligned addresses in memory, then truncate the lower bits of the object address to obtain the header address (or round up and subtract the trailer size to obtain the trailer address).
+One way is to start containers on aligned addresses in memory, then truncate the lower bits of the object address to obtain the header address (or round up and subtract the trailer size to obtain the trailer address).
 For example, if an object at address 0xFC28\,EF08 is freed and containers are aligned on 64\,KB (0x0001\,0000) addresses, then the container header is at 0xFC28\,0000.
 
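
In C, the truncation is a single mask of the low-order address bits; CONTAINER_SIZE and the header type are illustrative:

#include <stdint.h>

// Locating a container header by address truncation: with containers
// aligned on CONTAINER_SIZE boundaries, clearing the low bits of any
// object address yields the container's header address, e.g.,
// 0xFC28EF08 & ~0xFFFF == 0xFC280000 for 64 KB containers.
#define CONTAINER_SIZE 0x10000u            // 64 KB, must be a power of two

typedef struct ContainerHeader ContainerHeader;  // size, ownership, ...

static ContainerHeader * container_header( void * object ) {
	return (ContainerHeader *)( (uintptr_t)object & ~(uintptr_t)(CONTAINER_SIZE - 1) );
}
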
-Normally, a container has homogeneous objects of fixed size, with fixed information in the header that applies to all container objects (\eg object size and ownership).
-This approach greatly reduces internal fragmentation since far fewer headers are required, and potentially increases spatial locality as a cache line or page holds more objects since the objects are closer together due to the lack of headers.
-However, although similar objects are close spatially within the same container, different sized objects are further apart in separate containers.
+Normally, a container has homogeneous objects with common information factored into the container header, \eg object size and ownership.
+This approach greatly reduces internal fragmentation since far fewer headers are required, and potentially increases spatial locality as a cache line or page holds more objects since the objects are closer together.
+However, different sized objects are further apart in separate containers.
 Depending on the program, this may or may not improve locality.
 If the program uses several objects from a small number of containers in its working set, then locality is improved since fewer cache lines and pages are required.
 If the program uses many containers, there is poor locality, as both caching and paging increase.
-Another drawback is that external fragmentation may be increased since containers reserve space for objects that may never be allocated by the program, \ie there are often multiple containers for each size only partially full.
+Another drawback is that external fragmentation may be increased since containers reserve space for objects that may never be allocated, \ie there are often multiple containers for each size that are only partially full.
 However, external fragmentation can be reduced by using small containers.
    869846
     
    879856Each object header stores the object's heterogeneous information, such as its size, while the container header stores the homogeneous information, such as the owner when using ownership.
    880857This approach allows containers to hold different types of objects, but does not completely separate headers from objects.
    881 The benefit of the container in this case is to reduce some redundant information that is factored into the container header.
    882 
    883 In summary, object containers trade off internal fragmentation for external fragmentation by isolating common administration information to remove/reduce internal fragmentation, but at the cost of external fragmentation as some portion of a container may not be used and this portion is unusable for other kinds of allocations.
    884 A consequence of this tradeoff is its effect on spatial locality, which can produce positive or negative results depending on program access-patterns.
     858% The benefit of the container in this case is to reduce some redundant information that is factored into the container header.
     859
     860% In summary, object containers trade off internal fragmentation for external fragmentation by isolating common administration information to remove/reduce internal fragmentation, but at the cost of external fragmentation as some portion of a container may not be used and this portion is unusable for other kinds of allocations.
     861% A consequence of this tradeoff is its effect on spatial locality, which can produce positive or negative results depending on program access-patterns.
    885862
    886863
     
 
 Without ownership, objects in a container are deallocated to the heap currently associated with the thread that frees the object.
-Thus, different objects in a container may be on different heap free-lists (see Figure~\ref{f:ContainerNoOwnershipFreelist}).
-With ownership, all objects in a container belong to the same heap (see Figure~\ref{f:ContainerOwnershipFreelist}), so ownership of an object is determined by the container owner.
+Thus, different objects in a container may be on different heap free-lists. % (see Figure~\ref{f:ContainerNoOwnershipFreelist}).
+With ownership, all objects in a container belong to the same heap,
+% (see Figure~\ref{f:ContainerOwnershipFreelist}),
+so ownership of an object is determined by the container owner.
 If multiple threads can allocate/free/reallocate adjacent storage in the same heap, all forms of false sharing may occur.
 Only with the 1:1 model and ownership is active and passive false-sharing avoided (see Section~\ref{s:Ownership}).
     
 Finally, a completely free container can become reserved storage and be reset to allocate objects of a new size or freed to the global heap.
 
-\begin{figure}
-\centering
-\subfloat[No Ownership]{
-	\input{ContainerNoOwnershipFreelist}
-	\label{f:ContainerNoOwnershipFreelist}
-} % subfloat
-\vrule
-\subfloat[Ownership]{
-	\input{ContainerOwnershipFreelist}
-	\label{f:ContainerOwnershipFreelist}
-} % subfloat
-\caption{Free-list Structure with Container Ownership}
-\end{figure}
+% \begin{figure}
+% \centering
+% \subfloat[No Ownership]{
+%	\input{ContainerNoOwnershipFreelist}
+%	\label{f:ContainerNoOwnershipFreelist}
+% } % subfloat
+% \vrule
+% \subfloat[Ownership]{
+%	\input{ContainerOwnershipFreelist}
+%	\label{f:ContainerOwnershipFreelist}
+% } % subfloat
+% \caption{Free-list Structure with Container Ownership}
+% \end{figure}
 
 When a container changes ownership, the ownership of all objects within it changes as well.
     
 
 Using containers with ownership increases external fragmentation since a new container for a requested object size must be allocated separately for each thread requesting it.
-In Figure~\ref{f:ExternalFragmentationContainerOwnership}, using object ownership allocates 80\% more space than without ownership.
-
-\begin{figure}
-\centering
-\subfloat[No Ownership]{
-	\input{ContainerNoOwnership}
-} % subfloat
-\\
-\subfloat[Ownership]{
-	\input{ContainerOwnership}
-} % subfloat
-\caption{External Fragmentation with Container Ownership}
-\label{f:ExternalFragmentationContainerOwnership}
-\end{figure}
+% In Figure~\ref{f:ExternalFragmentationContainerOwnership}, using object ownership allocates 80\% more space than without ownership.
+
+% \begin{figure}
+% \centering
+% \subfloat[No Ownership]{
+%	\input{ContainerNoOwnership}
+% } % subfloat
+% \\
+% \subfloat[Ownership]{
+%	\input{ContainerOwnership}
+% } % subfloat
+% \caption{External Fragmentation with Container Ownership}
+% \label{f:ExternalFragmentationContainerOwnership}
+% \end{figure}
 