Timestamp:
Jun 17, 2021, 10:38:05 PM (3 years ago)
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
6e50a6b
Parents:
07b4970 (diff), 572a02f (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

File:
1 edited

  • doc/theses/mubeen_zulfiqar_MMath/benchmarks.tex

    r07b4970 rdcbfcbc  
    4141%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    4242
    43 \section Performance Matrices of Memory Allocators
     43\section{Performance Metrics of Memory Allocators}
    4444
    4545When it comes to memory allocators, there are no set standards of performance. The performance of a memory allocator depends heavily on the usage pattern of the application. An allocator that performs best for one application X might perform worst for another application whose memory usage pattern is completely different. It is extremely difficult to build one universally best memory allocator that outperforms every other allocator for every usage pattern. As a result, there is no standard set of benchmarks for evaluating a memory allocator's performance.
    4646
    4747If we break down the goals of a memory allocator, there are two basic metrics on which its performance is evaluated.
    48 
    49 1. Memory Overhead
    50 2. Speed
    51 
    52         /subsection Memory Overhead
    53         Memory overhead is the extra memory that a memory allocator takes from OS which is not requested by the application. Ideally, an allocator should get just enough memory from OS that can fulfill application's request and should return this memory to OS as soon as applications frees it. But, allocators retain more memory compared to what application has asked for which causes memory overhead. Memory overhead can happen for various reasons.
    54 
    55                 /subsubsection Fragmentation
    56                 Fragmentation is one of the major reasons behind memory overhead. Fragmentation happens because of situations that are either necassary for proper functioning of the allocator such as internal memory management and book-keeping or are out of allocator's control such as application's usage pattern.
    57 
    58                         /subsubsubsection Internal Fragmentation
    59                         For internal book-keeping, allocators divide raw memory given by OS into chunks, blocks, or lists that can fulfill application's requested size. Allocators use memory given by OS for creating headers, footers etc. to store information about these chunks, blocks, or lists. This increases usage of memory in-addition to the memory requested by application as the allocators need to store their book-keeping information. This extra usage of memory for allocator's own book-keeping is called Internal Fragmentation. Although it cases memory overhead but this overhead is necassary for an allocator's proper funtioning.
    60 
     48\begin{enumerate}
     49\item
     50Memory Overhead
     51\item
     52Speed
     53\end{enumerate}
     54
     55\subsection{Memory Overhead}
     56Memory overhead is the extra memory that a memory allocator takes from the OS beyond what the application requested. Ideally, an allocator should obtain just enough memory from the OS to fulfill the application's requests and should return this memory to the OS as soon as the application frees it. In practice, allocators retain more memory than the application has asked for, which causes memory overhead. Memory overhead can arise for various reasons.
     57
     58\subsubsection{Fragmentation}
     59Fragmentation is one of the major causes of memory overhead. It arises from situations that are either necessary for the proper functioning of the allocator, such as internal memory management and book-keeping, or outside the allocator's control, such as the application's usage pattern.
     60
     61\paragraph{Internal Fragmentation}
     62For internal book-keeping, allocators divide the raw memory given by the OS into chunks, blocks, or lists that can fulfill the application's requested sizes. Allocators also use some of this memory to create headers, footers, etc. that store information about these chunks, blocks, or lists. This increases memory usage beyond what the application requested. The extra memory used for the allocator's own book-keeping is called internal fragmentation. Although it causes memory overhead, this overhead is necessary for the allocator to function properly.
    6163
    6264*** FIX ME: Insert a figure of internal fragmentation with explanation
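
As a rough illustration of internal fragmentation, the following C sketch estimates the extra bytes reserved for a single request. The 16-byte header and the rounding of the payload up to a multiple of 16 bytes are assumptions chosen for the example, not properties of any particular allocator, and `chunk_size` and `internal_fragmentation` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; real allocators differ. */
#define HEADER_SIZE 16u /* hypothetical per-chunk book-keeping header */
#define ALIGNMENT   16u /* hypothetical payload granularity */

/* Total bytes reserved to satisfy a request of `requested` bytes:
   the header plus the payload rounded up to the granularity. */
static size_t chunk_size(size_t requested) {
    size_t payload = (requested + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    return HEADER_SIZE + payload;
}

/* Bytes reserved beyond what the application asked for. */
static size_t internal_fragmentation(size_t requested) {
    size_t total = chunk_size(requested);
    assert(total >= requested);
    return total - requested;
}
```

Under these assumptions, a 24-byte request occupies a 48-byte chunk, so half of the chunk is overhead.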
    6365
    64                         /subsubsubsection External Fragmentation
    65                         External fragmentation is the free bits of memory between or around chunks of memory that are currently in-use of the application. Segmentation in memory due to application's usage pattern causes external fragmentation. The memory which is part of external fragmentation is completely free as it is neither used by allocator's internal book-keeping nor by the application. Ideally, an allocator should return a segment of memory back to the OS as soon as application frees it. But, this is not always the case. Allocators get memory from OS in one of the two ways.
    66 
    67                         \begin{itemize}
    68                         \item
    69                         MMap: an allocator can ask OS for whole pages in mmap area. Then, the allocator segments the page internally and fulfills application's request.
    70                         \item
    71                         Heap: an allocator can ask OS for memory in heap area using system calls such as sbrk. Heap are grows downwards and shrinks upwards.
    72                         \begin{itemize}
    73 
    74                         If an allocator uses mmap area, it can only return extra memory back to OS if the whole page is free i.e. no chunk on the page is in-use of the application. Even if one chunk on the whole page is currently in-use of the application, the allocator has to retain the whole page.
    75 
    76                         If an allocator uses the heap area, it can only return the continous free memory at the end of the heap area that is currently in allocator's possession as heap area shrinks upwards. If there are free bits of memory in-between chunks of memory that are currently in-use of the application, the allocator can not return these free bits.
    77 
    78 *** FIX ME: Insert a figure of above scenrio with explanation
    79 
    80                         Even if the entire heap area is free except one small chunk at the end of heap area that is being used by the application, the allocator cannot return the free heap area back to the OS as it is not a continous region at the end of heap area.
    81 
    82 *** FIX ME: Insert a figure of above scenrio with explanation
    83 
    84                         Such scenerios cause external fragmentation but it is out of the allocator's control and depend on application's usage pattern.
    85 
    86                 /subsubsection Internal Memory Management
    87                 Allocators such as je-malloc (FIX ME: insert reference) pro-actively get some memory from the OS and divide it into chunks of certain sizes that can be used in-future to fulfill application's request. This causes memory overhead as these chunks are made before application's request. There is also the possibility that an application may not even request memory of these sizes during their whole life-time.
    88 
    89 *** FIX ME: Insert a figure of above scenrio with explanation
    90 
    91                 Allocators such as rp-malloc (FIX ME: insert reference) maintain lists or blocks of sized memory segments that is freed by the application for future use. These lists are maintained without any guarantee that application will even request these sizes again.
    92 
    93                 Such tactics are usually used to gain speed as allocator will not have to get raw memory from OS and manage it at the time of application's request but they do cause memory overhead.
    94 
    95         Fragmentation and managed sized chunks of free memory can lead to Heap Blowup as the allocator may not be able to use the fragments or sized free chunks of free memory to fulfill application's requests of other sizes.
    96 
    97         /subsection Speed
    98         When it comes to performance evaluation of any piece of software, its runtime is usually the first thing that is evaluated. The same is true for memory allocators but, in case of memory allocators, speed does not only mean the runtime of memory allocator's routines but there are other factors too.
    99 
    100                 /subsubsection Runtime Speed
    101                 Low runtime is the main goal of a memory allocator when it comes it proving its speed. Runtime is the time that it takes for a routine of memory allocator to complete its execution. As mentioned in (FIX ME: refernce to routines' list), there four basic routines that are used in memory allocation. Ideally, each routine of a memory allocator should be fast. Some memory allocator designs use pro-active measures (FIX ME: local refernce) to gain speed when allocating some memory to the application. Some memory allocators do memory allocation faster than memory freeing (FIX ME: graph refernce) while others show similar speed whether memory is allocated or freed.
    102 
    103                 /subsubsection Memory Access Speed
    104                 Runtime speed is not the only speed matrix in memory allocators. The memory that a memory allocator has allocated to the application also needs to be accessible as quick as possible. The application should be able to read/write allocated memory quickly. The allocation method of a memory allocator may introduce some delays when it comes to memory access speed, which is specially important in concurrent applications. Ideally, a memory allocator should allocate all memory on a cache-line to only one thread and no cache-line should be shared among multiple threads. If a memory allocator allocates memory to multple threads on a same cache line, then cache may get invalidated more frequesntly when two different threads running on two different processes will try to read/write the same memory region. On the other hand, if one cache-line is used by only one thread then the cache may get invalidated less frequently. This sharing of one cache-line among multiple threads is called false sharing (FIX ME: cite wasik).
    105 
    106                         /subsubsubsection Active False Sharing
    107                         Active false sharing is the sharing of one cache-line among multiple threads that is caused by memory allocator. It happens when two threads request memory from memory allocator and the allocator allocates memory to both of them on the same cache-line. After that, if the threads are running on different processes who have their own caches and both threads start reading/writing the allocated memory simultanously, their caches will start getting invalidated every time the other thread writes something to the memory. This will cause the application to slow down as the process has to load cache much more frequently.
    108 
    109 *** FIX ME: Insert a figure of above scenrio with explanation
    110 
    111                         /subsubsubsection Passive False Sharing
    112                         Passive false sharing is the kind of false sharing which is caused by the application and not the memory allocator. The memory allocator may preservce passive false sharing in future instead of eradicating it. But, passive false sharing is initiated by the application.
    113 
    114                                 /subsubsubsubsection Program Induced Passive False Sharing
    115                                 Program induced false sharing is completely out of memory allocator's control and is purely caused by the application. When a thread in the application creates multiple objects in the dynamic area and allocator allocates memory for these objects on the same cache-line as the objects are created by the same thread. Passive false sharing will occur if this thread passes one of these objects to another thread but it retains the rest of these objects or it passes some/all of the remaining objects to some third thread(s). Now, one cache-line is shared among multiple threads but it is caused by the application and not the allocator. It is out of allocator's control and has the similar performance impact as Active False Sharing (FIX ME: cite local) if these threads, who are sharing the same cache-line, start reading/writing the given objects simultanously.
     66\paragraph{External Fragmentation}
     67External fragmentation is the free memory between or around chunks that are currently in use by the application. It is caused by segmentation of memory due to the application's usage pattern. Memory that is part of external fragmentation is completely free: it is used neither by the allocator's internal book-keeping nor by the application. Ideally, an allocator should return a segment of memory to the OS as soon as the application frees it, but this is not always possible. Allocators get memory from the OS in one of two ways.
     68
     69\begin{itemize}
     70\item
     71MMap: an allocator can ask the OS for whole pages in the mmap area. The allocator then divides each page internally to fulfill the application's requests.
     72\item
     73Heap: an allocator can ask the OS for memory in the heap area using system calls such as sbrk. The heap area grows downwards and shrinks upwards, so it changes size only at its end.
     74\begin{itemize}
     75\item
     76If an allocator uses the mmap area, it can return extra memory to the OS only if the whole page is free, i.e., no chunk on the page is in use by the application. If even one chunk on the page is in use, the allocator has to retain the whole page.
     77\item
     78If an allocator uses the heap area, it can return only the contiguous free memory at the end of the heap area currently in its possession, because the heap area shrinks upwards. If there are free pieces of memory between chunks that are in use by the application, the allocator cannot return them.
     79
     80*** FIX ME: Insert a figure of above scenario with explanation
     81\item
     82Even if the entire heap area is free except for one small chunk at the end that is in use by the application, the allocator cannot return the free memory to the OS because it is not a contiguous region at the end of the heap area.
     83
     84*** FIX ME: Insert a figure of above scenario with explanation
     85
     86\item
     87Such scenarios cause external fragmentation, but they are outside the allocator's control and depend on the application's usage pattern.
     88\end{itemize}
     89\end{itemize}
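
The heap-area cases above can be sketched with a small model. This is only an illustration: the `struct chunk` layout and the helper names are hypothetical, and the array simply lists chunks from the start of the heap area to its end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one chunk in the heap area,
   listed from the start of the heap area to its end. */
struct chunk {
    size_t size;
    bool   in_use;
};

/* Bytes in the contiguous free run at the end of the heap area:
   the only memory that can be returned by shrinking the heap. */
static size_t returnable_bytes(const struct chunk *heap, size_t n) {
    size_t bytes = 0;
    while (n > 0 && !heap[n - 1].in_use) {
        bytes += heap[n - 1].size;
        n--;
    }
    return bytes;
}

/* Free bytes that cannot be returned because an in-use chunk lies
   between them and the end of the heap area. */
static size_t trapped_free_bytes(const struct chunk *heap, size_t n) {
    size_t free_total = 0;
    for (size_t i = 0; i < n; i++) {
        if (!heap[i].in_use) {
            free_total += heap[i].size;
        }
    }
    assert(free_total >= returnable_bytes(heap, n));
    return free_total - returnable_bytes(heap, n);
}
```

With a 4096-byte free chunk followed by a single 16-byte chunk in use at the end, `returnable_bytes` is 0 and all 4096 free bytes are trapped, which matches the last scenario described above.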
     90
     91\subsubsection{Internal Memory Management}
     92Allocators such as jemalloc (FIX ME: insert reference) proactively get memory from the OS and divide it into chunks of certain sizes that can be used later to fulfill the application's requests. This causes memory overhead because these chunks are created before the application requests them. It is also possible that the application never requests memory of these sizes during its whole lifetime.
     93
     94*** FIX ME: Insert a figure of above scenario with explanation
     95
     96Allocators such as rpmalloc (FIX ME: insert reference) maintain lists or blocks of sized memory segments that have been freed by the application, keeping them for future use. These lists are maintained without any guarantee that the application will request these sizes again.
     97
     98Such tactics are usually used to gain speed, as the allocator does not have to get raw memory from the OS and manage it at the time of the application's request, but they do cause memory overhead.
     99
     100Fragmentation and managed sized chunks of free memory can lead to heap blowup, as the allocator may be unable to use the fragments or the sized free chunks to fulfill the application's requests for other sizes.
     101
     102\subsection{Speed}
     103When evaluating the performance of any piece of software, its runtime is usually the first thing measured. The same is true for memory allocators, but for an allocator, speed means more than the runtime of its routines; other factors matter too.
     104
     105\subsubsection{Runtime Speed}
     106Low runtime is the main goal of a memory allocator when it comes to proving its speed. Runtime is the time a routine of the memory allocator takes to complete its execution. As mentioned in (FIX ME: reference to routines' list), there are four basic routines used in memory allocation. Ideally, each routine should be fast. Some allocator designs use proactive measures (FIX ME: local reference) to gain speed when allocating memory to the application. Some allocators allocate memory faster than they free it (FIX ME: graph reference), while others show similar speed for both.
     107
     108\subsubsection{Memory Access Speed}
     109Runtime speed is not the only speed metric for memory allocators. The memory that an allocator hands to the application also needs to be accessible as quickly as possible: the application should be able to read and write allocated memory quickly. An allocator's allocation method may introduce delays in memory access, which is especially important in concurrent applications. Ideally, an allocator should give all the memory on a cache line to a single thread, so that no cache line is shared among multiple threads. If an allocator gives memory on the same cache line to multiple threads, the cache line may be invalidated more frequently when two threads running on two different processors read and write memory on that line. If a cache line is used by only one thread, it is invalidated less frequently. This sharing of one cache line among multiple threads is called false sharing (FIX ME: cite wasik).
     110
     111\paragraph{Active False Sharing}
     112Active false sharing is the sharing of one cache line among multiple threads that is caused by the memory allocator. It happens when two threads request memory and the allocator gives both of them memory on the same cache line. If the threads then run on different processors that have their own caches and both start reading and writing the allocated memory simultaneously, each processor's copy of the cache line is invalidated every time the other thread writes to it. This slows the application down, as each processor has to reload the cache line much more frequently.
     113
     114*** FIX ME: Insert a figure of above scenario with explanation
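
Whether two allocations can falsely share depends only on whether their addresses fall on the same cache line. A minimal C sketch of that check follows; the 64-byte line size is an assumption (common on x86-64 but not universal), and `cache_line_of` and `same_cache_line` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u /* assumption: typical, but hardware dependent */

/* Index of the cache line that contains address p. */
static uintptr_t cache_line_of(const void *p) {
    return (uintptr_t)p / CACHE_LINE_SIZE;
}

/* True if the two addresses lie on the same cache line. */
static bool same_cache_line(const void *a, const void *b) {
    return cache_line_of(a) == cache_line_of(b);
}
```

If an allocator returns two such addresses to two different threads, the situation described above as active false sharing can arise.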
     115
     116\paragraph{Passive False Sharing}
     117Passive false sharing is false sharing that is caused by the application and not the memory allocator. The allocator may preserve passive false sharing in the future instead of eliminating it, but passive false sharing is initiated by the application.
     118
     119\subparagraph{Program Induced Passive False Sharing}
     120Program induced false sharing is completely outside the memory allocator's control and is caused purely by the application. Suppose a thread in the application creates multiple objects in the dynamic area, and the allocator places these objects on the same cache line because they are created by the same thread. Passive false sharing occurs if this thread passes one of these objects to another thread while retaining the rest, or passes some or all of the remaining objects to other threads. Now one cache line is shared among multiple threads, but the sharing is caused by the application and not the allocator. It has a performance impact similar to that of active false sharing (FIX ME: cite local) if the threads sharing the cache line start reading and writing the given objects simultaneously.
    116121
    117122*** FIX ME: Insert a figure of above scenario 1 with explanation
     
    119124*** FIX ME: Insert a figure of above scenario 2 with explanation
    120125
    121                                 /subsubsubsubsection Program Induced Allocator Preserved Passive False Sharing
    122                                 Program induced allocator preserved passive false sharing is another interesting case of passive false sharing. Both the application and the allocator are partially responsible for it. It starts the same as Program Induced False Sharing (FIX ME: cite local). Once, an application thread has created multiple dynamic objects on the same cache-line and ditributed these objects among multiple threads causing sharing of one cache-line among multiple threads (Program Induced Passive False Sharing). This kind of false sharing occurs when one of these threads, which got the object on the shared cache-line, frees the passed object then re-allocates another object but the allocator returns the same object (on the shared cache-line) that this thread just freed. Although, the application caused the false sharing to happen in the frst place however, to prevent furthur false sharing, the allocator should have returned the new object on some other cache-line which is only shared by the allocating thread. When it comes to performnce impact, this passive false sharing will slow down the application just like any other kind of false sharing if the threads sharing the cache-line start reading/writing the objects simultanously.
     126\subparagraph{Program Induced Allocator Preserved Passive False Sharing}
     127Program induced allocator preserved passive false sharing is another interesting case of passive false sharing, for which both the application and the allocator are partially responsible. It starts the same way as program induced passive false sharing (FIX ME: cite local): an application thread creates multiple dynamic objects on the same cache line and distributes them among multiple threads, so that one cache line is shared among those threads. The allocator preserved form occurs when one of these threads frees the object it was passed and then allocates another object, and the allocator returns the same object (on the shared cache line) that the thread just freed. Although the application caused the false sharing in the first place, to prevent further false sharing the allocator should have returned the new object on some other cache line that is used only by the allocating thread. In terms of performance impact, this passive false sharing slows the application down just like any other kind of false sharing if the threads sharing the cache line start reading and writing the objects simultaneously.
     128
    123129
    124130*** FIX ME: Insert a figure of above scenario with explanation
     
    130136%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    131137
    132 \section Micro Benchmark Suite
     138\section{Micro Benchmark Suite}
    133139The aim of the micro benchmark suite is to create a set of programs that can evaluate a memory allocator based on the performance metrics described in (FIX ME: local cite). These programs can be taken as a standard to benchmark an allocator's basic goals. They report an allocator's memory overhead and speed under a certain allocation pattern. The speed of the allocator is benchmarked in different ways. Similarly, false sharing in an allocator is measured in multiple ways. These benchmarks evaluate the allocator under an allocation pattern that is configurable and can be changed using a few knobs to observe the allocator's performance under a desired allocation pattern.
    134140
     
    139145*** FIX ME: Add knobs items after finalize
    140146
    141         /subsection Memory Benchmark
    142         Memory benchmark measures memory overhead of an allocator. It allocates a number of dynamic objects. Then, by reading /self/proc/maps, gets the total memory that the allocator has reuested from the OS. Finally, it calculates the memory head by taking the difference between the memory the allocator has requested from the OS and the memory that program has allocated.
    143         *** FIX ME: Insert a figure of above benchmark with description
    144 
    145                 /subsubsection Relevant Knobs
    146                 *** FIX ME: Insert Relevant Knobs
    147 
    148         /subsection Speed Benchmark
    149         Speed benchmark calculates the runtime speed of an allocator's functions (FIX ME: cite allocator routines). It does by measuring the runtime of allocator routines in two different ways.
    150 
    151                 /subsubsection Speed Time
    152                 The time method does a certain amount of work by calling each routine of the allocator (FIX ME: cite allocator routines) a specific time. It calculates the total time it took to perform this workload. Then, it divides the time it took by the workload and calculates the average time taken by the allocator's routine.
    153                 *** FIX ME: Insert a figure of above benchmark with description
    154 
    155                         /subsubsubsection Relevant Knobs
    156                         *** FIX ME: Insert Relevant Knobs
    157 
    158                 /subsubsection Speed Workload
    159                 The worload method uses the opposite approach. It calls the allocator's routines for a specific amount of time and measures how much work was done during that time. Then, similar to the time method, it divides the time by the workload done during that time and calculates the average time taken by the allocator's routine.
    160                 *** FIX ME: Insert a figure of above benchmark with description
    161 
    162                         /subsubsubsection Relevant Knobs
    163                         *** FIX ME: Insert Relevant Knobs
    164 
    165         /subsection Cache Scratch
    166         Cache Scratch benchmark measures program induced allocator preserved passive false sharing (FIX ME CITE) in an allocator. It does so in two ways.
    167 
    168                 /subsubsection Cache Scratch Time
    169                 Cache Scratch Time allocates dynamic objects. Then, it benchmarks program induced allocator preserved passive false sharing (FIX ME CITE) in an allocator by measuring the time it takes to read/write these objects.
    170                 *** FIX ME: Insert a figure of above benchmark with description
    171 
    172                         /subsubsubsection Relevant Knobs
    173                         *** FIX ME: Insert Relevant Knobs
    174 
    175                 /subsubsection Cache Scratch Layout
    176                 Cache Scratch Layout also allocates dynamic objects. Then, it benchmarks program induced allocator preserved passive false sharing (FIX ME CITE) by using heap addresses returned by the allocator. It calculates how many objects were allocated to different threads on the same cache line.
    177                 *** FIX ME: Insert a figure of above benchmark with description
    178 
    179                         /subsubsubsection Relevant Knobs
    180                         *** FIX ME: Insert Relevant Knobs
    181 
    182         /subsection Cache Thrash
    183         Cache Thrash benchmark measures allocator induced passive false sharing (FIX ME CITE) in an allocator. It also does so in two ways.
    184 
    185                 /subsubsection Cache Thrash Time
    186                 Cache Thrash Time allocates dynamic objects. Then, it benchmarks allocator induced false sharing (FIX ME CITE) in an allocator by measuring the time it takes to read/write these objects.
    187                 *** FIX ME: Insert a figure of above benchmark with description
    188 
    189                         /subsubsubsection Relevant Knobs
    190                         *** FIX ME: Insert Relevant Knobs
    191 
    192                 /subsubsection Cache Thrash Layout
    193                 Cache Thrash Layout also allocates dynamic objects. Then, it benchmarks allocator induced false sharing (FIX ME CITE) by using heap addresses returned by the allocator. It calculates how many objects were allocated to different threads on the same cache line.
    194                 *** FIX ME: Insert a figure of above benchmark with description
    195 
    196                         /subsubsubsection Relevant Knobs
    197                         *** FIX ME: Insert Relevant Knobs
    198 
    199 /section Results
     147\subsection{Memory Benchmark}
     148The memory benchmark measures the memory overhead of an allocator. It allocates a number of dynamic objects. Then, by reading /proc/self/maps, it gets the total memory that the allocator has requested from the OS. Finally, it calculates the memory overhead as the difference between the memory the allocator has requested from the OS and the memory the program has allocated.
     149*** FIX ME: Insert a figure of above benchmark with description
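
The arithmetic behind this measurement can be sketched as follows. This is a sketch rather than the benchmark's actual code: `mapping_size` and `memory_overhead` are hypothetical helpers, and deciding which lines of the Linux maps file count as allocator memory is left to the caller because the text does not specify it. Each line of that file begins with a hexadecimal `start-end` address range.

```c
#include <assert.h>
#include <stdio.h>

/* Size in bytes of the mapping described by one line of the maps file,
   or 0 if the line does not start with a "start-end" address range. */
static unsigned long mapping_size(const char *line) {
    unsigned long start, end;
    if (sscanf(line, "%lx-%lx", &start, &end) != 2 || end < start) {
        return 0;
    }
    return end - start;
}

/* Overhead as described above: memory obtained from the OS minus the
   memory the program allocated. */
static unsigned long memory_overhead(unsigned long from_os,
                                     unsigned long allocated) {
    assert(from_os >= allocated);
    return from_os - allocated;
}
```

A caller would sum `mapping_size` over the relevant lines and pass the total, together with the number of bytes the program requested, to `memory_overhead`.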
     150
     151\subsubsection{Relevant Knobs}
     152*** FIX ME: Insert Relevant Knobs
     153
     154\subsection{Speed Benchmark}
     155The speed benchmark calculates the runtime speed of an allocator's functions (FIX ME: cite allocator routines). It does so by measuring the runtime of allocator routines in two different ways.
     156
     157\subsubsection{Speed Time}
     158The time method does a certain amount of work by calling each routine of the allocator (FIX ME: cite allocator routines) a specific number of times. It measures the total time taken to perform this workload. Then, it divides the total time by the workload to calculate the average time taken by an allocator routine.
     159*** FIX ME: Insert a figure of above benchmark with description
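
A minimal sketch of the time method, assuming a POSIX system with `clock_gettime` and using only a `malloc`/`free` pair as the workload. The helper names are hypothetical, and the real benchmark may exercise more routines and use a different clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Nanoseconds elapsed between two monotonic clock readings. */
static double elapsed_ns(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) * 1e9 +
           (double)(b.tv_nsec - a.tv_nsec);
}

/* Performs a fixed workload of `count` malloc/free pairs and returns the
   average time per pair in nanoseconds. */
static double average_ns_per_pair(size_t count, size_t object_size) {
    assert(count > 0);
    struct timespec start, stop;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < count; i++) {
        /* volatile discourages the compiler from removing the pair */
        void *volatile p = malloc(object_size);
        free(p);
    }
    clock_gettime(CLOCK_MONOTONIC, &stop);
    return elapsed_ns(start, stop) / (double)count;
}
```

The workload is fixed in advance and only the elapsed time varies between allocators.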
     160
     161\paragraph{Relevant Knobs}
     162*** FIX ME: Insert Relevant Knobs
     163
     164\subsubsection{Speed Workload}
     165The workload method uses the opposite approach. It calls the allocator's routines for a specific amount of time and measures how much work was done during that time. Then, like the time method, it divides the time by the work done to calculate the average time taken by an allocator routine.
     166*** FIX ME: Insert a figure of above benchmark with description
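
A sketch of the workload method under the same assumptions (POSIX `clock_gettime`, a `malloc`/`free` pair as the unit of work, hypothetical helper names). Reading the clock after every pair adds measurement overhead of its own; a real benchmark might check the clock less often or stop the loop with a timer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Nanoseconds elapsed between two monotonic clock readings. */
static double elapsed_ns(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) * 1e9 +
           (double)(b.tv_nsec - a.tv_nsec);
}

/* Repeats malloc/free pairs until at least `duration_ns` has passed.
   Returns the number of pairs completed and stores the measured
   elapsed time in *elapsed_out. */
static unsigned long pairs_in_duration(double duration_ns, size_t object_size,
                                       double *elapsed_out) {
    assert(duration_ns > 0.0 && elapsed_out != NULL);
    struct timespec start, now;
    unsigned long pairs = 0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        void *volatile p = malloc(object_size);
        free(p);
        pairs++;
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (elapsed_ns(start, now) < duration_ns);
    *elapsed_out = elapsed_ns(start, now);
    return pairs;
}
```

Dividing the stored elapsed time by the returned count gives the average time per pair, as in the time method.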
     167
     168\paragraph{Relevant Knobs}
     169*** FIX ME: Insert Relevant Knobs
     170
     171\subsection{Cache Scratch}
     172The Cache Scratch benchmark measures program induced allocator preserved passive false sharing (FIX ME CITE) in an allocator. It does so in two ways.
     173
     174\subsubsection{Cache Scratch Time}
     175Cache Scratch Time allocates dynamic objects. Then, it benchmarks program-induced, allocator-preserved passive false sharing (FIX ME CITE) in an allocator by measuring the time it takes to read/write these objects.
     176*** FIX ME: Insert a figure of above benchmark with description
     177
     178\paragraph{Relevant Knobs}
     179*** FIX ME: Insert Relevant Knobs
     180
     181\subsubsection{Cache Scratch Layout}
     182Cache Scratch Layout also allocates dynamic objects. Then, it benchmarks program-induced, allocator-preserved passive false sharing (FIX ME CITE) by using the heap addresses returned by the allocator. It calculates how many objects were allocated to different threads on the same cache line.
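The layout calculation can be illustrated with a short C sketch. It assumes a 64-byte cache line, compares only the starting address of each object (objects are assumed not to straddle a line boundary), and counts every object that shares a line with at least one object owned by a different thread. The name \texttt{falsely\_shared} and the representation of owners as integer thread ids are hypothetical choices for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */

/* Count the objects whose starting address lies on the same cache line as
 * an object owned by a different thread. addr[i] is the address of object
 * i (a pointer cast to uintptr_t) and owner[i] is the id of its thread. */
size_t falsely_shared(const uintptr_t addr[], const int owner[], size_t n) {
    assert(n == 0 || (addr != NULL && owner != NULL));
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uintptr_t line_i = addr[i] / CACHE_LINE;
        for (size_t j = 0; j < n; j++) {
            if (owner[j] != owner[i] && addr[j] / CACHE_LINE == line_i) {
                count++;   /* object i shares a line with another thread */
                break;
            }
        }
    }
    return count;
}
```

For instance, objects at addresses 0x1000 and 0x1010 owned by threads 0 and 1 fall on the same 64-byte line, so both are counted, while an object at 0x1040 falls on the next line and is not.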
     183*** FIX ME: Insert a figure of above benchmark with description
     184
     185\paragraph{Relevant Knobs}
     186*** FIX ME: Insert Relevant Knobs
     187
     188\subsection{Cache Thrash}
     189The Cache Thrash benchmark measures allocator-induced passive false sharing (FIX ME CITE) in an allocator. It also does so in two ways.
     190
     191\subsubsection{Cache Thrash Time}
     192Cache Thrash Time allocates dynamic objects. Then, it benchmarks allocator-induced false sharing (FIX ME CITE) in an allocator by measuring the time it takes to read/write these objects.
     193*** FIX ME: Insert a figure of above benchmark with description
     194
     195\paragraph{Relevant Knobs}
     196*** FIX ME: Insert Relevant Knobs
     197
     198\subsubsection{Cache Thrash Layout}
     199Cache Thrash Layout also allocates dynamic objects. Then, it benchmarks allocator-induced false sharing (FIX ME CITE) by using the heap addresses returned by the allocator. It calculates how many objects were allocated to different threads on the same cache line.
     200*** FIX ME: Insert a figure of above benchmark with description
     201
     202\paragraph{Relevant Knobs}
     203*** FIX ME: Insert Relevant Knobs
     204
     205\section{Results}
    200206*** FIX ME: add configuration details of memory allocators
    201207
    202         /subsection Memory Benchmark
    203 
    204                 /subsubsection Relevant Knobs
    205 
    206         /subsection Speed Benchmark
    207 
    208                 /subsubsection Speed Time
    209 
    210                         /subsubsubsection Relevant Knobs
    211 
    212                 /subsubsection Speed Workload
    213 
    214                         /subsubsubsection Relevant Knobs
    215 
    216         /subsection Cache Scratch
    217 
    218                 /subsubsection Cache Scratch Time
    219 
    220                         /subsubsubsection Relevant Knobs
    221 
    222                 /subsubsection Cache Scratch Layout
    223 
    224                         /subsubsubsection Relevant Knobs
    225 
    226         /subsection Cache Thrash
    227 
    228                 /subsubsection Cache Thrash Time
    229 
    230                         /subsubsubsection Relevant Knobs
    231 
    232                 /subsubsection Cache Thrash Layout
    233 
    234                         /subsubsubsection Relevant Knobs
     208\subsection{Memory Benchmark}
     209
     210\subsubsection{Relevant Knobs}
     211
     212\subsection{Speed Benchmark}
     213
     214\subsubsection{Speed Time}
     215
     216\paragraph{Relevant Knobs}
     217
     218\subsubsection{Speed Workload}
     219
     220\paragraph{Relevant Knobs}
     221
     222\subsection{Cache Scratch}
     223
     224\subsubsection{Cache Scratch Time}
     225
     226\paragraph{Relevant Knobs}
     227
     228\subsubsection{Cache Scratch Layout}
     229
     230\paragraph{Relevant Knobs}
     231
     232\subsection{Cache Thrash}
     233
     234\subsubsection{Cache Thrash Time}
     235
     236\paragraph{Relevant Knobs}
     237
     238\subsubsection{Cache Thrash Layout}
     239
     240\paragraph{Relevant Knobs}