\chapter{Performance}
\label{c:Performance}

This chapter uses the micro-benchmarks from \VRef[Chapter]{s:Benchmarks} to test a number of current memory allocators, including llheap.
The goal is to see if llheap is competitive with the currently popular memory allocators.


\section{Machine Specification}

The performance experiments were run on two different multi-core architectures (x64 and ARM) to determine if there is consistency across platforms:
\begin{itemize}
\item
\textbf{Algol} Huawei ARM TaiShan 2280 V2 Kunpeng 920, 24-core socket $\times$ 4, 2.6 GHz, GCC version 9.4.0
\item
\textbf{Nasus} AMD EPYC 7662, 64-core socket $\times$ 2, 2.0 GHz, GCC version 9.3.0
\end{itemize}


\section{Existing Memory Allocators}
\label{sec:curAllocatorSec}

With dynamic allocation being an important feature of C, there are many stand-alone memory allocators that have been designed for different purposes.
For this thesis, 7 of the most popular and widely used memory allocators were selected for comparison, along with llheap.

\paragraph{llheap (\textsf{llh})}
is the thread-safe allocator from \VRef[Chapter]{c:Allocator}.
\\
\textbf{Version:} 1.0\\
\textbf{Configuration:} Compiled with dynamic linking, but without statistics or debugging.\\
\textbf{Compilation command:} @make@

\paragraph{glibc (\textsf{glc})}
\cite{glibc} is the default glibc thread-safe allocator.
\\
\textbf{Version:} Ubuntu GLIBC 2.31-0ubuntu9.7 2.31\\
\textbf{Configuration:} Compiled by Ubuntu 20.04.\\
\textbf{Compilation command:} N/A

\paragraph{dlmalloc (\textsf{dl})}
\cite{dlmalloc} is a thread-safe allocator with a single-threaded design and a single heap.
It maintains free-lists of different sizes to store freed dynamic memory.
\\
\textbf{Version:} 2.8.6\\
\textbf{Configuration:} Compiled with the preprocessor symbol @USE_LOCKS@ defined.\\
\textbf{Compilation command:} @gcc -g3 -O3 -Wall -Wextra -fno-builtin-malloc -fno-builtin-calloc@ @-fno-builtin-realloc -fno-builtin-free -fPIC -shared -DUSE_LOCKS -o libdlmalloc.so malloc-2.8.6.c@

\paragraph{hoard (\textsf{hrd})}
\cite{hoard} is a thread-safe allocator that is multi-threaded and uses a heap layer framework. It has per-thread heaps that have thread-local free-lists, and a global shared heap.
\\
\textbf{Version:} 3.13\\
\textbf{Configuration:} Compiled with hoard's default configurations and @Makefile@.\\
\textbf{Compilation command:} @make all@

\paragraph{jemalloc (\textsf{je})}
\cite{jemalloc} is a thread-safe allocator that uses multiple arenas. Each thread is assigned an arena.
Each arena has chunks that contain contiguous memory regions of the same size. An arena has multiple chunks that contain regions of multiple sizes.
\\
\textbf{Version:} 5.2.1\\
\textbf{Configuration:} Compiled with jemalloc's default configurations and @Makefile@.\\
\textbf{Compilation command:} @autogen.sh; configure; make; make install@

\paragraph{ptmalloc3 (\textsf{pt3})}
\cite{ptmalloc3} is a modification of dlmalloc.
It is a thread-safe multi-threaded memory allocator that uses multiple heaps.
Each ptmalloc3 heap has a design similar to dlmalloc's heap.
\\
\textbf{Version:} 1.8\\
\textbf{Configuration:} Compiled with ptmalloc3's @Makefile@ using option ``linux-shared''.\\
\textbf{Compilation command:} @make linux-shared@

\paragraph{rpmalloc (\textsf{rp})}
\cite{rpmalloc} is a thread-safe allocator that is multi-threaded and uses a per-thread heap.
Each heap has multiple size-classes and each size-class contains memory regions of the relevant size.
\\
\textbf{Version:} 1.4.1\\
\textbf{Configuration:} Compiled with rpmalloc's default configurations and the ninja build system.\\
\textbf{Compilation command:} @python3 configure.py; ninja@

\paragraph{tbb malloc (\textsf{tbb})}
\cite{tbbmalloc} is a thread-safe allocator that is multi-threaded and uses a private heap for each thread.
Each private-heap has multiple bins of different sizes. Each bin contains free regions of the same size.
\\
\textbf{Version:} intel tbb 2020 update 2, tbb\_interface\_version == 11102\\
\textbf{Configuration:} Compiled with tbbmalloc's default configurations and @Makefile@.\\
\textbf{Compilation command:} @make@

% \section{Experiment Environment}
% We used our micro benchmark suite (FIX ME: cite mbench) to evaluate these memory allocators \ref{sec:curAllocatorSec} and our own memory allocator uHeap \ref{sec:allocatorSec}.

\section{Experiments}

Each micro-benchmark is configured and run with each of the allocators.
The less time an allocator takes to complete a benchmark the better, so lower in the graphs is better, except for the Memory micro-benchmark graphs.
All graphs use a log scale on the Y-axis, except for the Memory micro-benchmark (see \VRef{s:MemoryMicroBenchmark}).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% CHURN
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Churn Micro-Benchmark}

Churn tests allocators for speed under intensive dynamic memory usage (see \VRef{s:ChurnBenchmark}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[thread:]
1, 2, 4, 8, 16, 32, 48
\item[spots:]
16
\item[obj:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\end{description}

% -maxS : 500
% -minS : 50
% -stepS : 50
% -distroS : fisher
% -objN : 100000
% -cSpots : 16
% -threadN : 1, 2, 4, 8, 16

\VRef[Figure]{fig:churn} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/churn} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/churn} }
\caption{Churn}
\label{fig:churn}
\end{figure}

\paragraph{Assessment}
All allocators did well in this micro-benchmark, except for \textsf{dl} on ARM.
\textsf{dl} is the slowest, indicating some small bottleneck with respect to the other allocators.
\textsf{je} is the fastest, with only a small benefit over the other allocators.
% llheap is slightly slower because it uses ownership, where many of the allocations have remote frees, which requires locking.
% When llheap is compiled without ownership, its performance is the same as the other allocators (not shown).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% THRASH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Cache Thrash}
\label{sec:cache-thrash-perf}

Thrash tests memory allocators for active false-sharing (see \VRef{sec:benchThrashSec}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[threads:]
1, 2, 4, 8, 16, 32, 48
\item[iterations:]
1,000
\item[cacheRW:]
1,000,000
\item[size:]
1
\end{description}

% * Each allocator was tested for its performance across different number of threads.
% Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.

\VRef[Figure]{fig:cacheThrash} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/cache_thrash_0-thrash} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/cache_thrash_0-thrash} }
\caption{Cache Thrash}
\label{fig:cacheThrash}
\end{figure}

\paragraph{Assessment}
All allocators did well in this micro-benchmark, except for \textsf{dl} and \textsf{pt3}.
\textsf{dl} uses a single heap for all threads, so it is understandable that it generates so much active false-sharing.
Requests from different threads are dealt with sequentially by the single heap (using a single lock), which can allocate objects to different threads on the same cache line.
\textsf{pt3} uses the T:H model, so multiple threads can use one heap, but its active false-sharing is less than that of \textsf{dl}.
The rest of the memory allocators generate little or no active false-sharing.

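The cache-line effect described above can be pictured with a simple address check.
The following sketch is illustrative only and is not part of the benchmark suite: it assumes a 64-byte cache line, and the constant @CACHE_LINE@ and both helper names are hypothetical.
It reports whether two objects start on the same cache line, which is the situation that arises when a single heap hands out consecutive small objects to different threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed cache-line size in bytes; query the hardware for the real value. */
#define CACHE_LINE 64u

/* Index of the cache line containing address p. */
static uintptr_t cache_line_of(const void *p) {
    return (uintptr_t)p / CACHE_LINE;
}

/* True when the two objects start on the same cache line, so writes to
   them by different threads can cause false sharing. */
static bool shares_cache_line(const void *a, const void *b) {
    return cache_line_of(a) == cache_line_of(b);
}
```

With 1-byte objects, as in the configuration above, two allocations returned back-to-back by one heap are usually only a few bytes apart, so this check would typically succeed for them.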
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SCRATCH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Cache Scratch}

Scratch tests memory allocators for program-induced allocator-preserved passive false-sharing (see \VRef{s:CacheScratch}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[threads:]
1, 2, 4, 8, 16, 32, 48
\item[iterations:]
1,000
\item[cacheRW:]
1,000,000
\item[size:]
1
\end{description}

% * Each allocator was tested for its performance across different number of threads.
% Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.

\VRef[Figure]{fig:cacheScratch} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/cache_scratch_0-scratch} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/cache_scratch_0-scratch} }
\caption{Cache Scratch}
\label{fig:cacheScratch}
\end{figure}

\paragraph{Assessment}
This micro-benchmark divides the allocators into two groups.
First is the high-performer group: \textsf{llh}, \textsf{je}, and \textsf{rp}.
These memory allocators generate little or no passive false-sharing and their performance difference is negligible.
Second is the low-performer group, which includes the rest of the memory allocators.
These memory allocators have significant program-induced passive false-sharing, with \textsf{hrd} the worst performing allocator.
All of the allocators in this group are sharing heaps among threads at some level.

Interestingly, allocators such as \textsf{hrd} and \textsf{glc} performed well in the cache thrash micro-benchmark (see \VRef{sec:cache-thrash-perf}), but these allocators are among the low performers in cache scratch.
This suggests these allocators do not actively produce false-sharing, but preserve program-induced passive false-sharing.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SPEED
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Speed Micro-Benchmark}

Speed tests memory allocators for runtime latency (see \VRef{s:SpeedMicroBenchmark}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects:]
100,000
\item[workers:]
1, 2, 4, 8, 16, 32, 48
\end{description}

% -maxS : 500
% -minS : 50
% -stepS : 50
% -distroS : fisher
% -objN : 1000000
% -threadN : \{ 1, 2, 4, 8, 16 \} *

%* Each allocator was tested for its performance across different number of threads.
%Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.

\VRefrange[Figures]{fig:speed-3-malloc}{fig:speed-14-malloc-calloc-realloc-free} show 12 figures, one figure for each chain of the speed benchmark.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{itemize}
\item \VRef[Figure]{fig:speed-3-malloc} shows results for chain: malloc
\item \VRef[Figure]{fig:speed-4-realloc} shows results for chain: realloc
\item \VRef[Figure]{fig:speed-5-free} shows results for chain: free
\item \VRef[Figure]{fig:speed-6-calloc} shows results for chain: calloc
\item \VRef[Figure]{fig:speed-7-malloc-free} shows results for chain: malloc-free
\item \VRef[Figure]{fig:speed-8-realloc-free} shows results for chain: realloc-free
\item \VRef[Figure]{fig:speed-9-calloc-free} shows results for chain: calloc-free
\item \VRef[Figure]{fig:speed-10-malloc-realloc} shows results for chain: malloc-realloc
\item \VRef[Figure]{fig:speed-11-calloc-realloc} shows results for chain: calloc-realloc
\item \VRef[Figure]{fig:speed-12-malloc-realloc-free} shows results for chain: malloc-realloc-free
\item \VRef[Figure]{fig:speed-13-calloc-realloc-free} shows results for chain: calloc-realloc-free
\item \VRef[Figure]{fig:speed-14-malloc-calloc-realloc-free} shows results for chain: malloc-realloc-free-calloc
\end{itemize}

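As a rough picture of what a chain exercises, the following sketch applies the chain malloc-realloc-free to a set of objects, one routine at a time.
It is a minimal illustration under stated assumptions, not the benchmark's code: the function name @run_chain@, its parameters, and the phase-by-phase loop structure are hypothetical, and the timing, threading, and size distribution of the actual benchmark are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Apply the chain malloc -> realloc -> free to n objects.
   Returns the number of failed allocation calls (0 when all succeed).
   Assumes size and new_size are both non-zero. */
static size_t run_chain(size_t n, size_t size, size_t new_size) {
    void **objs = malloc(n * sizeof *objs);
    if (objs == NULL) return n;                 /* no table: nothing can be run */
    size_t failures = 0;

    for (size_t i = 0; i < n; i++) {            /* phase 1: malloc */
        objs[i] = malloc(size);
        if (objs[i] == NULL) failures++;
    }
    for (size_t i = 0; i < n; i++) {            /* phase 2: realloc */
        if (objs[i] == NULL) continue;
        void *p = realloc(objs[i], new_size);
        if (p == NULL) failures++;              /* the old block stays valid */
        else objs[i] = p;
    }
    for (size_t i = 0; i < n; i++)              /* phase 3: free */
        free(objs[i]);                          /* free(NULL) is a no-op */

    free(objs);
    return failures;
}
```

A call such as @run_chain(100000, 50, 500)@ mirrors the object count and size bounds of the configuration above, although the real benchmark draws sizes from a distribution instead of using two fixed sizes.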
\paragraph{Assessment}
This micro-benchmark divides the allocation chains into two groups: with and without @calloc@.
@calloc@ uses @memset@ to set the allocated memory to zero, which dominates the cost of the allocation chain (a large increase in execution time) and levels performance across the allocators.
But the difference among the allocators in a @calloc@ chain still gives an idea of their relative performance.

All allocators did well in this micro-benchmark across all allocation chains, except for \textsf{dl}, \textsf{pt3}, and \textsf{hrd}.
Again, the low-performing allocators are sharing heaps among threads, so contention causes execution time to increase as the number of threads increases.
Furthermore, chains with @free@ can trigger coalescing, which slows the fast path.
The high-performing allocators all illustrate low latency across the allocation chains, \ie there are no performance spikes as the chain lengthens that might be caused by contention and/or coalescing.
Low latency is important for applications that are sensitive to unknown execution delays.

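The zero-filling cost attributed to @calloc@ above can be seen in a minimal sketch that expresses it as an allocation followed by @memset@.
The name @calloc_sketch@ is hypothetical, and this is not how any of the tested allocators implements @calloc@; real implementations may skip the zero-fill when the memory is already known to be zero, for example when it comes fresh from the operating system.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate an array of n elements of the given size, zero-filled.
   Returns NULL if n * size overflows or the allocation fails. */
static void *calloc_sketch(size_t n, size_t size) {
    if (size != 0 && n > SIZE_MAX / size) return NULL;   /* overflow check */
    size_t bytes = n * size;
    void *p = malloc(bytes);
    if (p != NULL) memset(p, 0, bytes);   /* touches every byte of the block */
    return p;
}
```

Because the @memset@ visits every byte, its cost grows with the object size and is the same regardless of which allocator supplied the memory, which is consistent with the levelling effect noted above.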
%speed-3-malloc.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-3-malloc} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-3-malloc} }
\caption{Speed benchmark chain: malloc}
\label{fig:speed-3-malloc}
\end{figure}

%speed-4-realloc.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-4-realloc} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-4-realloc} }
\caption{Speed benchmark chain: realloc}
\label{fig:speed-4-realloc}
\end{figure}

%speed-5-free.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-5-free} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-5-free} }
\caption{Speed benchmark chain: free}
\label{fig:speed-5-free}
\end{figure}

%speed-6-calloc.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-6-calloc} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-6-calloc} }
\caption{Speed benchmark chain: calloc}
\label{fig:speed-6-calloc}
\end{figure}

%speed-7-malloc-free.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-7-malloc-free} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-7-malloc-free} }
\caption{Speed benchmark chain: malloc-free}
\label{fig:speed-7-malloc-free}
\end{figure}

%speed-8-realloc-free.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-8-realloc-free} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-8-realloc-free} }
\caption{Speed benchmark chain: realloc-free}
\label{fig:speed-8-realloc-free}
\end{figure}

%speed-9-calloc-free.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-9-calloc-free} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-9-calloc-free} }
\caption{Speed benchmark chain: calloc-free}
\label{fig:speed-9-calloc-free}
\end{figure}

%speed-10-malloc-realloc.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-10-malloc-realloc} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-10-malloc-realloc} }
\caption{Speed benchmark chain: malloc-realloc}
\label{fig:speed-10-malloc-realloc}
\end{figure}

%speed-11-calloc-realloc.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-11-calloc-realloc} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-11-calloc-realloc} }
\caption{Speed benchmark chain: calloc-realloc}
\label{fig:speed-11-calloc-realloc}
\end{figure}

%speed-12-malloc-realloc-free.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-12-malloc-realloc-free} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-12-malloc-realloc-free} }
\caption{Speed benchmark chain: malloc-realloc-free}
\label{fig:speed-12-malloc-realloc-free}
\end{figure}

%speed-13-calloc-realloc-free.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-13-calloc-realloc-free} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-13-calloc-realloc-free} }
\caption{Speed benchmark chain: calloc-realloc-free}
\label{fig:speed-13-calloc-realloc-free}
\end{figure}

%speed-14-{m,c,re}alloc-free.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-14-m-c-re-alloc-free} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-14-m-c-re-alloc-free} }
\caption{Speed benchmark chain: malloc-calloc-realloc-free}
\label{fig:speed-14-malloc-calloc-realloc-free}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% MEMORY
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newpage
\subsection{Memory Micro-Benchmark}
\label{s:MemoryMicroBenchmark}

This experiment is run with the following two configurations for each allocator.
The difference between the two configurations is the number of producers and consumers.
Configuration 1 has one producer and one consumer, and configuration 2 has 4 producers, where each producer has 4 consumers.

\noindent
Configuration 1:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[producer (K):]
1
\item[consumer (M):]
1
\item[round:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects (N):]
100,000
\end{description}

% -threadA : 1
% -threadF : 1
% -maxS : 500
% -minS : 50
% -stepS : 50
% -distroS : fisher
% -objN : 100000
% -consumeS: 100000

\noindent
Configuration 2:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[producer (K):]
4
\item[consumer (M):]
4
\item[round:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects (N):]
100,000
\end{description}

% -threadA : 4
% -threadF : 4
% -maxS : 500
% -minS : 50
% -stepS : 50
% -distroS : fisher
% -objN : 100000
% -consumeS: 100000

% \begin{table}[b]
% \centering
% \begin{tabular}{ |c|c|c| }
% \hline
% Memory Allocator & Configuration 1 Result & Configuration 2 Result\\
% \hline
% llh & \VRef[Figure]{fig:mem-1-prod-1-cons-100-llh} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-llh}\\
% \hline
% dl & \VRef[Figure]{fig:mem-1-prod-1-cons-100-dl} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-dl}\\
% \hline
% glibc & \VRef[Figure]{fig:mem-1-prod-1-cons-100-glc} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-glc}\\
% \hline
% hoard & \VRef[Figure]{fig:mem-1-prod-1-cons-100-hrd} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-hrd}\\
% \hline
% je & \VRef[Figure]{fig:mem-1-prod-1-cons-100-je} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-je}\\
% \hline
% pt3 & \VRef[Figure]{fig:mem-1-prod-1-cons-100-pt3} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-pt3}\\
% \hline
% rp & \VRef[Figure]{fig:mem-1-prod-1-cons-100-rp} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-rp}\\
% \hline
% tbb & \VRef[Figure]{fig:mem-1-prod-1-cons-100-tbb} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-tbb}\\
% \hline
% \end{tabular}
% \caption{Memory benchmark results}
% \label{table:mem-benchmark-figs}
% \end{table}
% Table \ref{table:mem-benchmark-figs} shows the list of figures that contain memory benchmark results.

\VRefrange[Figures]{fig:mem-1-prod-1-cons-100-llh}{fig:mem-4-prod-4-cons-100-tbb} show 16 figures, two figures for each of the 8 allocators, one for each configuration.
Each figure has 2 graphs, one for each experiment environment.
Each graph has the following 5 subgraphs that show memory usage and statistics throughout the micro-benchmark's lifetime.
\begin{itemize}
\item \textit{\textbf{current\_req\_mem(B)}} shows the amount of dynamic memory requested by the benchmark and currently in use.
\item \textit{\textbf{heap}}* shows the memory requested by the program (allocator) from the system that lies in the heap (@sbrk@) area.
\item \textit{\textbf{mmap\_so}}* shows the memory requested by the program (allocator) from the system that lies in the @mmap@ area.
\item \textit{\textbf{mmap}}* shows the memory requested by the program (allocator or shared libraries) from the system that lies in the @mmap@ area.
\item \textit{\textbf{total\_dynamic}} shows the total usage of dynamic memory by the benchmark program, which is the sum of \textit{heap}, \textit{mmap}, and \textit{mmap\_so}.
\end{itemize}
* These statistics are gathered by monitoring a process's @/proc/self/maps@ file.

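Each line of @/proc/self/maps@ begins with a hexadecimal address range, followed by permissions, offset, device, inode, and an optional name such as @[heap]@.
The sketch below shows one way a region's size and its heap membership could be read from such a line.
It is not the benchmark's monitoring code: the function names @region_size@ and @is_heap_region@ are hypothetical, and how the benchmark separates \textit{mmap} from \textit{mmap\_so} is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Size in bytes of the region described by one /proc/<pid>/maps line, e.g.
   "55d0c8a00000-55d0c8a21000 rw-p 00000000 00:00 0   [heap]".
   Returns 0 if the line does not start with a valid address range. */
static unsigned long region_size(const char *line) {
    unsigned long start, end;
    if (sscanf(line, "%lx-%lx", &start, &end) != 2 || end < start) return 0;
    return end - start;
}

/* Non-zero when the line describes the sbrk-managed heap region. */
static int is_heap_region(const char *line) {
    return strstr(line, "[heap]") != NULL;
}
```

Summing @region_size@ over the lines for which @is_heap_region@ holds gives a value comparable to the \textit{heap} curve at one polling instant.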
The X-axis shows the time when the memory information is polled.
The Y-axis shows the memory usage in bytes.

For this experiment, the difference between the memory requested by the benchmark (\textit{current\_req\_mem(B)}) and the memory that the process has received from the system (\textit{heap}, \textit{mmap}) should be minimal.
This difference is the memory overhead caused by the allocator and shows the level of fragmentation in the allocator.

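Taking the description above literally, and using the graph labels as symbols with every quantity measured at the same polling time, the overhead can be written as
\[
\mathit{overhead} = (\mathit{heap} + \mathit{mmap}) - \mathit{current\_req\_mem} .
\]
This is only a restatement of the text; whether \textit{mmap\_so} should also be counted on the received side depends on how much of it is attributable to the allocator.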
\paragraph{Assessment}
First, the differences in the shape of the curves between architectures (top ARM, bottom x64) are small, with the differences being in the amount of memory used.
Hence, it is possible to focus on either the top or bottom graph.

Second, the heap curve is 0 for four memory allocators: \textsf{hrd}, \textsf{je}, \textsf{pt3}, and \textsf{rp}, indicating these memory allocators only use @mmap@ to get memory from the system and ignore the @sbrk@ area.

The total dynamic memory is higher for \textsf{hrd} and \textsf{tbb} than the other allocators.
The main reason is the use of superblocks (see \VRef{s:ObjectContainers}) containing objects of the same size.
These superblocks are maintained throughout the life of the program.

\textsf{pt3} is the only memory allocator where the total dynamic memory goes down in the second half of the program lifetime when the memory is freed by the benchmark program.
This makes \textsf{pt3} the only memory allocator that gives memory back to the operating system as it is freed by the program.

541 | % FOR 1 THREAD
|
---|
542 |
|
---|
543 | %mem-1-prod-1-cons-100-llh.eps
|
---|
544 | \begin{figure}
|
---|
545 | \centering
|
---|
546 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-llh} }
|
---|
547 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-llh} }
|
---|
548 | \caption{Memory benchmark results with Configuration-1 for llh memory allocator}
|
---|
549 | \label{fig:mem-1-prod-1-cons-100-llh}
|
---|
550 | \end{figure}
|
---|
551 |
|
---|
552 | %mem-1-prod-1-cons-100-dl.eps
|
---|
553 | \begin{figure}
|
---|
554 | \centering
|
---|
555 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-dl} }
|
---|
556 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-dl} }
|
---|
557 | \caption{Memory benchmark results with Configuration-1 for dl memory allocator}
|
---|
558 | \label{fig:mem-1-prod-1-cons-100-dl}
|
---|
559 | \end{figure}
|
---|
560 |
|
---|
561 | %mem-1-prod-1-cons-100-glc.eps
|
---|
562 | \begin{figure}
|
---|
563 | \centering
|
---|
564 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-glc} }
|
---|
565 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-glc} }
|
---|
566 | \caption{Memory benchmark results with Configuration-1 for glibc memory allocator}
|
---|
567 | \label{fig:mem-1-prod-1-cons-100-glc}
|
---|
568 | \end{figure}
|
---|
569 |
|
---|
570 | %mem-1-prod-1-cons-100-hrd.eps
|
---|
571 | \begin{figure}
|
---|
572 | \centering
|
---|
573 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-hrd} }
|
---|
574 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-hrd} }
|
---|
575 | \caption{Memory benchmark results with Configuration-1 for the hoard memory allocator}
|
---|
576 | \label{fig:mem-1-prod-1-cons-100-hrd}
|
---|
577 | \end{figure}
|
---|
578 |
|
---|
579 | %mem-1-prod-1-cons-100-je.eps
|
---|
580 | \begin{figure}
|
---|
581 | \centering
|
---|
582 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-je} }
|
---|
583 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-je} }
|
---|
584 | \caption{Memory benchmark results with Configuration-1 for the \textsf{je} memory allocator}
|
---|
585 | \label{fig:mem-1-prod-1-cons-100-je}
|
---|
586 | \end{figure}
|
---|
587 |
|
---|
588 | %mem-1-prod-1-cons-100-pt3.eps
|
---|
589 | \begin{figure}
|
---|
590 | \centering
|
---|
591 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-pt3} }
|
---|
592 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-pt3} }
|
---|
593 | \caption{Memory benchmark results with Configuration-1 for the \textsf{pt3} memory allocator}
|
---|
594 | \label{fig:mem-1-prod-1-cons-100-pt3}
|
---|
595 | \end{figure}
|
---|
596 |
|
---|
597 | %mem-1-prod-1-cons-100-rp.eps
|
---|
598 | \begin{figure}
|
---|
599 | \centering
|
---|
600 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-rp} }
|
---|
601 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-rp} }
|
---|
602 | \caption{Memory benchmark results with Configuration-1 for the \textsf{rp} memory allocator}
|
---|
603 | \label{fig:mem-1-prod-1-cons-100-rp}
|
---|
604 | \end{figure}
|
---|
605 |
|
---|
606 | %mem-1-prod-1-cons-100-tbb.eps
|
---|
607 | \begin{figure}
|
---|
608 | \centering
|
---|
609 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-tbb} }
|
---|
610 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-tbb} }
|
---|
611 | \caption{Memory benchmark results with Configuration-1 for the \textsf{tbb} memory allocator}
|
---|
612 | \label{fig:mem-1-prod-1-cons-100-tbb}
|
---|
613 | \end{figure}
|
---|
614 |
|
---|
615 | % FOR 4 THREADS
|
---|
616 |
|
---|
617 | %mem-4-prod-4-cons-100-llh.eps
|
---|
618 | \begin{figure}
|
---|
619 | \centering
|
---|
620 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-llh} }
|
---|
621 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-llh} }
|
---|
622 | \caption{Memory benchmark results with Configuration-2 for the \textsf{llh} memory allocator}
|
---|
623 | \label{fig:mem-4-prod-4-cons-100-llh}
|
---|
624 | \end{figure}
|
---|
625 |
|
---|
626 | %mem-4-prod-4-cons-100-dl.eps
|
---|
627 | \begin{figure}
|
---|
628 | \centering
|
---|
629 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-dl} }
|
---|
630 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-dl} }
|
---|
631 | \caption{Memory benchmark results with Configuration-2 for the \textsf{dl} memory allocator}
|
---|
632 | \label{fig:mem-4-prod-4-cons-100-dl}
|
---|
633 | \end{figure}
|
---|
634 |
|
---|
635 | %mem-4-prod-4-cons-100-glc.eps
|
---|
636 | \begin{figure}
|
---|
637 | \centering
|
---|
638 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-glc} }
|
---|
639 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-glc} }
|
---|
640 | \caption{Memory benchmark results with Configuration-2 for the glibc memory allocator}
|
---|
641 | \label{fig:mem-4-prod-4-cons-100-glc}
|
---|
642 | \end{figure}
|
---|
643 |
|
---|
644 | %mem-4-prod-4-cons-100-hrd.eps
|
---|
645 | \begin{figure}
|
---|
646 | \centering
|
---|
647 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-hrd} }
|
---|
648 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-hrd} }
|
---|
649 | \caption{Memory benchmark results with Configuration-2 for the hoard memory allocator}
|
---|
650 | \label{fig:mem-4-prod-4-cons-100-hrd}
|
---|
651 | \end{figure}
|
---|
652 |
|
---|
653 | %mem-4-prod-4-cons-100-je.eps
|
---|
654 | \begin{figure}
|
---|
655 | \centering
|
---|
656 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-je} }
|
---|
657 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-je} }
|
---|
658 | \caption{Memory benchmark results with Configuration-2 for the \textsf{je} memory allocator}
|
---|
659 | \label{fig:mem-4-prod-4-cons-100-je}
|
---|
660 | \end{figure}
|
---|
661 |
|
---|
662 | %mem-4-prod-4-cons-100-pt3.eps
|
---|
663 | \begin{figure}
|
---|
664 | \centering
|
---|
665 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-pt3} }
|
---|
666 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-pt3} }
|
---|
667 | \caption{Memory benchmark results with Configuration-2 for the \textsf{pt3} memory allocator}
|
---|
668 | \label{fig:mem-4-prod-4-cons-100-pt3}
|
---|
669 | \end{figure}
|
---|
670 |
|
---|
671 | %mem-4-prod-4-cons-100-rp.eps
|
---|
672 | \begin{figure}
|
---|
673 | \centering
|
---|
674 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-rp} }
|
---|
675 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-rp} }
|
---|
676 | \caption{Memory benchmark results with Configuration-2 for the \textsf{rp} memory allocator}
|
---|
677 | \label{fig:mem-4-prod-4-cons-100-rp}
|
---|
678 | \end{figure}
|
---|
679 |
|
---|
680 | %mem-4-prod-4-cons-100-tbb.eps
|
---|
681 | \begin{figure}
|
---|
682 | \centering
|
---|
683 | \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-tbb} }
|
---|
684 | \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-tbb} }
|
---|
685 | \caption{Memory benchmark results with Configuration-2 for the \textsf{tbb} memory allocator}
|
---|
686 | \label{fig:mem-4-prod-4-cons-100-tbb}
|
---|
687 | \end{figure}
|
---|