\chapter{Performance}
\label{c:Performance}

This chapter uses the micro-benchmarks from \VRef[Chapter]{s:Benchmarks} to test a number of current memory allocators, including llheap.
The goal is to see if llheap is competitive with the current best memory allocators.


\section{Machine Specification}

The performance experiments were run on two different multi-core architectures (x64 and ARM) to determine if there is consistency across platforms:
\begin{itemize}
\item
\textbf{Nasus} AMD EPYC 7662, 64-core socket $\times$ 2, 2.0 GHz, GCC version 9.3.0
\item
\textbf{Algol} Huawei ARM TaiShan 2280 V2 Kunpeng 920, 24-core socket $\times$ 4, 2.6 GHz, GCC version 9.4.0
\end{itemize}


\section{Existing Memory Allocators}
\label{sec:curAllocatorSec}

With dynamic allocation being an important feature of C, there are many stand-alone memory allocators that have been designed for different purposes.
For this thesis, 7 of the most popular and widely used memory allocators were selected for comparison, along with llheap.

\paragraph{llheap (\textsf{llh})}
is the thread-safe allocator from \VRef[Chapter]{c:Allocator}.
\\
\textbf{Version:} 1.0\\
\textbf{Configuration:} Compiled with dynamic linking, but without statistics or debugging.\\
\textbf{Compilation command:} @make@

\paragraph{glibc (\textsf{glc})}
\cite{glibc} is the default gcc thread-safe allocator.
\\
\textbf{Version:} Ubuntu GLIBC 2.31-0ubuntu9.7 2.31\\
\textbf{Configuration:} Compiled by Ubuntu 20.04.\\
\textbf{Compilation command:} N/A

\paragraph{dlmalloc (\textsf{dl})}
\cite{dlmalloc} is a single-heap allocator designed for single-threaded use; it is made thread-safe here by compiling with locking enabled (@USE_LOCKS@).
It maintains free-lists of different sizes to store freed dynamic memory.
\\
\textbf{Version:} 2.8.6\\
\textbf{Configuration:} Compiled with preprocessor @USE_LOCKS@.\\
\textbf{Compilation command:} @gcc -g3 -O3 -Wall -Wextra -fno-builtin-malloc -fno-builtin-calloc@ @-fno-builtin-realloc -fno-builtin-free -fPIC -shared -DUSE_LOCKS -o libdlmalloc.so malloc-2.8.6.c@

\paragraph{hoard (\textsf{hrd})}
\cite{hoard} is a thread-safe, multi-threaded allocator built on a heap-layer framework.
It has per-thread heaps with thread-local free-lists, and a global shared heap.
\\
\textbf{Version:} 3.13\\
\textbf{Configuration:} Compiled with hoard's default configurations and @Makefile@.\\
\textbf{Compilation command:} @make all@

\paragraph{jemalloc (\textsf{je})}
\cite{jemalloc} is a thread-safe allocator that uses multiple arenas. Each thread is assigned an arena.
Each arena has chunks that contain contiguous memory regions of the same size. An arena has multiple chunks that contain regions of multiple sizes.
\\
\textbf{Version:} 5.2.1\\
\textbf{Configuration:} Compiled with jemalloc's default configurations and @Makefile@.\\
\textbf{Compilation command:} @autogen.sh; configure; make; make install@

\paragraph{ptmalloc3 (\textsf{pt3})}
\cite{ptmalloc3} is a modification of dlmalloc.
It is a thread-safe multi-threaded memory allocator that uses multiple heaps.
Each ptmalloc3 heap has a design similar to dlmalloc's heap.
\\
\textbf{Version:} 1.8\\
\textbf{Configuration:} Compiled with ptmalloc3's @Makefile@ using option ``linux-shared''.\\
\textbf{Compilation command:} @make linux-shared@

\paragraph{rpmalloc (\textsf{rp})}
\cite{rpmalloc} is a thread-safe allocator that is multi-threaded and uses a per-thread heap.
Each heap has multiple size-classes and each size-class contains memory regions of the relevant size.
\\
\textbf{Version:} 1.4.1\\
\textbf{Configuration:} Compiled with rpmalloc's default configurations and ninja build system.\\
\textbf{Compilation command:} @python3 configure.py; ninja@

\paragraph{tbb malloc (\textsf{tbb})}
\cite{tbbmalloc} is a thread-safe allocator that is multi-threaded and uses a private heap for each thread.
Each private heap has multiple bins of different sizes. Each bin contains free regions of the same size.
\\
\textbf{Version:} intel tbb 2020 update 2, tbb\_interface\_version == 11102\\
\textbf{Configuration:} Compiled with tbbmalloc's default configurations and @Makefile@.\\
\textbf{Compilation command:} @make@

% \section{Experiment Environment}
% We used our micro benchmark suite (FIX ME: cite mbench) to evaluate these memory allocators \ref{sec:curAllocatorSec} and our own memory allocator uHeap \ref{sec:allocatorSec}.

\section{Experiments}

Each micro-benchmark is configured and run with each of the allocators.
The less time an allocator takes to complete a benchmark the better, so lower in the graphs is better.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% CHURN
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Churn Micro-Benchmark}

Churn tests allocators for speed under intensive dynamic memory usage (see \VRef{s:ChurnBenchmark}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[thread:]
1, 2, 4, 8, 16
\item[spots:]
16
\item[obj:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\end{description}
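
As a rough illustration of the kind of work this configuration implies, the following C sketch repeatedly frees and replaces the object held in one of a fixed number of spots.
It is a single-threaded simplification under stated assumptions, not the benchmark's actual code: the function name @churn_sketch@ is hypothetical, spots are visited round-robin, and sizes simply cycle from min to max by step instead of following the benchmark's size distribution.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical single-threaded churn loop: every iteration frees whatever
// occupies a spot and replaces it with a freshly allocated object.
// Sizes cycle through min, min+step, ..., max (a stand-in for the
// benchmark's size distribution, which is not reproduced here).
// Returns the number of successful allocations, or 0 on bad arguments.
size_t churn_sketch(size_t spots, size_t objects, size_t min, size_t max, size_t step) {
    if (spots == 0 || step == 0 || min == 0 || min > max) return 0;
    void **slot = calloc(spots, sizeof(void *));
    if (slot == NULL) return 0;
    size_t done = 0, size = min;
    for (size_t i = 0; i < objects; i += 1) {
        size_t s = i % spots;               // round-robin spot selection (assumption)
        free(slot[s]);                      // free(NULL) is a no-op on the first pass
        slot[s] = malloc(size);
        if (slot[s] == NULL) break;
        memset(slot[s], 0xA5, size);        // touch the storage so it is really used
        done += 1;
        size = (size + step > max) ? min : size + step;
    }
    for (size_t s = 0; s < spots; s += 1) free(slot[s]);
    free(slot);
    return done;
}
```

With the values listed above, a call such as @churn_sketch(16, 100000, 50, 500, 50)@ performs 100,000 allocations over 16 spots using the 10 sizes 50, 100, \ldots, 500.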

% -maxS : 500
% -minS : 50
% -stepS : 50
% -distroS : fisher
% -objN : 100000
% -cSpots : 16
% -threadN : 1, 2, 4, 8, 16

\VRef[Figure]{fig:churn} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/churn} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/churn} }
\caption{Churn}
\label{fig:churn}
\end{figure}

All allocators did well in this micro-benchmark, except for \textsf{dl} on the ARM.
llheap is slightly slower because it uses ownership, where many of the allocations have remote frees, which requires locking.
When llheap is compiled without ownership, its performance is the same as the other allocators (not shown).


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% THRASH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Cache Thrash}

Thrash tests memory allocators for active false sharing (see \VRef{sec:benchThrashSec}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[threads:]
1, 2, 4, 8, 16
\item[iterations:]
1,000
\item[cacheRW:]
1,000,000
\item[size:]
1
\end{description}
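
To make the effect being measured concrete, the following C sketch shows two helpers: one checks whether two objects fall on the same cache line, and one performs the repeated read-write pass that makes such sharing expensive when the objects belong to different threads.
This is an illustration under assumptions, not the benchmark's code: the names @same_cache_line@ and @cache_rw@ are hypothetical, and the cache-line size is passed in by the caller (64 bytes is common but not universal).

```c
#include <stddef.h>
#include <stdint.h>

// Returns 1 if the two addresses lie on the same cache line of the given
// size (in bytes), 0 otherwise. The line size is supplied by the caller.
int same_cache_line(const void *a, const void *b, size_t line) {
    return (uintptr_t)a / line == (uintptr_t)b / line;
}

// Read and write every byte of an object `times` times. When two threads
// run this on distinct objects that share a cache line, the line bounces
// between cores even though no data is logically shared.
// Returns the sum of the values written, so the work is observable.
unsigned long cache_rw(volatile char *obj, size_t size, unsigned long times) {
    unsigned long sum = 0;
    for (unsigned long t = 0; t < times; t += 1) {
        for (size_t i = 0; i < size; i += 1) {
            obj[i] = (char)(obj[i] + 1);
            sum += (unsigned char)obj[i];
        }
    }
    return sum;
}
```

With a size of 1 as configured above, two objects allocated back-to-back for different threads can easily land on the same line, which is the situation @same_cache_line@ detects.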

% * Each allocator was tested for its performance across different number of threads.
% Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.

\VRef[Figure]{fig:cacheThrash} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/cache-time-0-thrash} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/cache-time-0-thrash} }
\caption{Cache Thrash}
\label{fig:cacheThrash}
\end{figure}

All allocators did well in this micro-benchmark, except for \textsf{dl} and \textsf{pt3} on the x64.
Either the memory allocators generate little active false-sharing or the micro-benchmark is not generating scenarios that cause active false-sharing.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SCRATCH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Cache Scratch}

Scratch tests memory allocators for program-induced allocator-preserved passive false-sharing (see \VRef{s:CacheScratch}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[threads:]
1, 2, 4, 8, 16
\item[iterations:]
1,000
\item[cacheRW:]
1,000,000
\item[size:]
1
\end{description}

% * Each allocator was tested for its performance across different number of threads.
% Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.

\VRef[Figure]{fig:cacheScratch} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/cache-time-0-scratch} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/cache-time-0-scratch} }
\caption{Cache Scratch}
\label{fig:cacheScratch}
\end{figure}

All allocators did well in this micro-benchmark on the ARM.
Allocators \textsf{llh}, \textsf{je}, and \textsf{rp} did well on the x64, while the remaining allocators experienced significant slowdowns from the false sharing.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SPEED
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Speed Micro-Benchmark}

Speed tests memory allocators for runtime latency (see \VRef{s:SpeedMicroBenchmark}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects:]
1,000,000
\item[workers:]
1, 2, 4, 8, 16
\end{description}

% -maxS : 500
% -minS : 50
% -stepS : 50
% -distroS : fisher
% -objN : 1000000
% -threadN : \{ 1, 2, 4, 8, 16 \} *

%* Each allocator was tested for its performance across different number of threads.
%Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.

\VRefrange[Figures]{fig:speed-3-malloc}{fig:speed-14-malloc-calloc-realloc-free} show 12 figures, one figure for each chain of the speed benchmark.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{itemize}
\item \VRef[Figure]{fig:speed-3-malloc} shows results for chain: malloc
\item \VRef[Figure]{fig:speed-4-realloc} shows results for chain: realloc
\item \VRef[Figure]{fig:speed-5-free} shows results for chain: free
\item \VRef[Figure]{fig:speed-6-calloc} shows results for chain: calloc
\item \VRef[Figure]{fig:speed-7-malloc-free} shows results for chain: malloc-free
\item \VRef[Figure]{fig:speed-8-realloc-free} shows results for chain: realloc-free
\item \VRef[Figure]{fig:speed-9-calloc-free} shows results for chain: calloc-free
\item \VRef[Figure]{fig:speed-10-malloc-realloc} shows results for chain: malloc-realloc
\item \VRef[Figure]{fig:speed-11-calloc-realloc} shows results for chain: calloc-realloc
\item \VRef[Figure]{fig:speed-12-malloc-realloc-free} shows results for chain: malloc-realloc-free
\item \VRef[Figure]{fig:speed-13-calloc-realloc-free} shows results for chain: calloc-realloc-free
\item \VRef[Figure]{fig:speed-14-malloc-calloc-realloc-free} shows results for chain: malloc-calloc-realloc-free
\end{itemize}

All allocators did well in this micro-benchmark across all allocation chains, except for \textsf{dl} and \textsf{pt3}.

%speed-3-malloc.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-3-malloc} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-3-malloc} }
\caption{Speed benchmark chain: malloc}
\label{fig:speed-3-malloc}
\end{figure}

%speed-4-realloc.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-4-realloc} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-4-realloc} }
\caption{Speed benchmark chain: realloc}
\label{fig:speed-4-realloc}
\end{figure}

%speed-5-free.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-5-free} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-5-free} }
\caption{Speed benchmark chain: free}
\label{fig:speed-5-free}
\end{figure}

%speed-6-calloc.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-6-calloc} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-6-calloc} }
\caption{Speed benchmark chain: calloc}
\label{fig:speed-6-calloc}
\end{figure}

%speed-7-malloc-free.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-7-malloc-free} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-7-malloc-free} }
\caption{Speed benchmark chain: malloc-free}
\label{fig:speed-7-malloc-free}
\end{figure}

%speed-8-realloc-free.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-8-realloc-free} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-8-realloc-free} }
\caption{Speed benchmark chain: realloc-free}
\label{fig:speed-8-realloc-free}
\end{figure}

%speed-9-calloc-free.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-9-calloc-free} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-9-calloc-free} }
\caption{Speed benchmark chain: calloc-free}
\label{fig:speed-9-calloc-free}
\end{figure}

%speed-10-malloc-realloc.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-10-malloc-realloc} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-10-malloc-realloc} }
\caption{Speed benchmark chain: malloc-realloc}
\label{fig:speed-10-malloc-realloc}
\end{figure}

%speed-11-calloc-realloc.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-11-calloc-realloc} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-11-calloc-realloc} }
\caption{Speed benchmark chain: calloc-realloc}
\label{fig:speed-11-calloc-realloc}
\end{figure}

%speed-12-malloc-realloc-free.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-12-malloc-realloc-free} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-12-malloc-realloc-free} }
\caption{Speed benchmark chain: malloc-realloc-free}
\label{fig:speed-12-malloc-realloc-free}
\end{figure}

%speed-13-calloc-realloc-free.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-13-calloc-realloc-free} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-13-calloc-realloc-free} }
\caption{Speed benchmark chain: calloc-realloc-free}
\label{fig:speed-13-calloc-realloc-free}
\end{figure}

%speed-14-{m,c,re}alloc-free.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-14-m-c-re-alloc-free} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-14-m-c-re-alloc-free} }
\caption{Speed benchmark chain: malloc-calloc-realloc-free}
\label{fig:speed-14-malloc-calloc-realloc-free}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% MEMORY
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newpage
\subsection{Memory Micro-Benchmark}

This experiment is run with the following two configurations for each allocator.
The difference between the two configurations is the number of producers and consumers.
Configuration 1 has one producer and one consumer, and configuration 2 has 4 producers, where each producer has 4 consumers.

\noindent
Configuration 1:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[producer (K):]
1
\item[consumer (M):]
1
\item[round:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects (N):]
100,000
\end{description}

% -threadA : 1
% -threadF : 1
% -maxS : 500
% -minS : 50
% -stepS : 50
% -distroS : fisher
% -objN : 100000
% -consumeS: 100000

\noindent
Configuration 2:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[producer (K):]
4
\item[consumer (M):]
4
\item[round:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects (N):]
100,000
\end{description}

% -threadA : 4
% -threadF : 4
% -maxS : 500
% -minS : 50
% -stepS : 50
% -distroS : fisher
% -objN : 100000
% -consumeS: 100000

\begin{table}[b]
\centering
 \begin{tabular}{ |c|c|c| }
 \hline
 Memory Allocator & Configuration 1 Result & Configuration 2 Result\\
 \hline
 llh & \VRef[Figure]{fig:mem-1-prod-1-cons-100-cfa} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-cfa}\\
 \hline
 dl & \VRef[Figure]{fig:mem-1-prod-1-cons-100-dl} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-dl}\\
 \hline
 glibc & \VRef[Figure]{fig:mem-1-prod-1-cons-100-glc} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-glc}\\
 \hline
 hoard & \VRef[Figure]{fig:mem-1-prod-1-cons-100-hrd} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-hrd}\\
 \hline
 je & \VRef[Figure]{fig:mem-1-prod-1-cons-100-je} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-je}\\
 \hline
 pt3 & \VRef[Figure]{fig:mem-1-prod-1-cons-100-pt3} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-pt3}\\
 \hline
 rp & \VRef[Figure]{fig:mem-1-prod-1-cons-100-rp} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-rp}\\
 \hline
 tbb & \VRef[Figure]{fig:mem-1-prod-1-cons-100-tbb} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-tbb}\\
 \hline
 \end{tabular}
\caption{Memory benchmark results}
\label{table:mem-benchmark-figs}
\end{table}

\VRefrange[Figures]{fig:mem-1-prod-1-cons-100-cfa}{fig:mem-4-prod-4-cons-100-tbb} show 16 figures, two figures for each of the 8 allocators, one for each configuration.
Table \ref{table:mem-benchmark-figs} lists the figures that contain the memory benchmark results.

Each figure has 2 graphs, one for each experiment environment.
Each graph has the following 5 subgraphs that show the program's memory usage and statistics throughout the program's lifetime.

\begin{itemize}
\item \textit{\textbf{current\_req\_mem(B)}} shows the amount of dynamic memory requested by the benchmark and currently in use.
\item \textit{\textbf{heap}}* shows the memory requested by the program (allocator) from the system that lies in the heap area.
\item \textit{\textbf{mmap\_so}}* shows the memory requested by the program (allocator) from the system that lies in the mmap area.
\item \textit{\textbf{mmap}}* shows the memory requested by the program (allocator or shared libraries) from the system that lies in the mmap area.
\item \textit{\textbf{total\_dynamic}} shows the total usage of dynamic memory by the benchmark program, which is the sum of heap, mmap, and mmap\_so.
\end{itemize}

* These statistics are gathered by monitoring the \textit{/proc/self/maps} file of the process on a Linux system.
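
A minimal sketch of this kind of monitoring is shown below, assuming the standard Linux layout of a \textit{/proc/self/maps} line (address range, permissions, offset, device, inode, optional path).
The names @classify_mapping@ and @mapped_bytes@ and the three-way classification are hypothetical; how the benchmark itself separates heap, mmap, and mmap\_so regions is not reproduced here.

```c
#include <stdio.h>
#include <string.h>

// Kinds of mapping distinguished by this sketch (an assumption, not the
// benchmark's own categories).
enum region { REGION_HEAP, REGION_NAMED, REGION_ANON };

// Parse one line of /proc/<pid>/maps, e.g.
//   "55d0c8a00000-55d0c8a21000 rw-p 00000000 00:00 0    [heap]"
// Stores the mapping's length in *bytes and returns its kind:
// REGION_HEAP for "[heap]", REGION_NAMED for any other path or label,
// REGION_ANON for an unnamed mapping, or -1 if the line does not parse.
int classify_mapping(const char *line, unsigned long *bytes) {
    unsigned long start, end;
    char perms[5], path[256] = "";
    int n = sscanf(line, "%lx-%lx %4s %*s %*s %*s %255[^\n]",
                   &start, &end, perms, path);
    if (n < 3 || end < start) return -1;
    *bytes = end - start;
    if (strcmp(path, "[heap]") == 0) return REGION_HEAP;
    return path[0] != '\0' ? REGION_NAMED : REGION_ANON;
}

// Sum the sizes of all mappings of the given kind for the calling process.
// Returns 0 if /proc/self/maps cannot be opened (e.g., not on Linux).
unsigned long mapped_bytes(int kind) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) return 0;
    char line[512];
    unsigned long total = 0, bytes;
    while (fgets(line, sizeof line, maps) != NULL) {
        if (classify_mapping(line, &bytes) == kind) total += bytes;
    }
    fclose(maps);
    return total;
}
```

Calling @mapped_bytes(REGION_HEAP)@ periodically during a run would yield one data series comparable in spirit to the heap subgraph.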

For each subgraph, the X-axis shows the time during the program's lifetime at which the data point was generated.
The Y-axis shows the memory usage in bytes.

For the experiment, at any given time in the program's life, the difference between the memory requested by the benchmark (\textit{current\_req\_mem(B)})
and the memory that the process has received from the system (\textit{heap}, \textit{mmap}) should be minimal.
This difference is the memory overhead caused by the allocator and shows the level of fragmentation in the allocator.
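
As a sketch, this overhead can be written using the subgraph names as symbols; the notation $O(t)$ is introduced here only for illustration and does not appear in the benchmark output:
\[
O(t) = \bigl( \textit{heap}(t) + \textit{mmap}(t) \bigr) - \textit{current\_req\_mem}(t)
\]
A smaller $O(t)$ at a given time $t$ indicates that less of the memory obtained from the system is lost to allocator overhead.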

%mem-1-prod-1-cons-100-cfa.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-cfa} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-cfa} }
\caption{Memory benchmark results with 1 producer for llh memory allocator}
\label{fig:mem-1-prod-1-cons-100-cfa}
\end{figure}

%mem-1-prod-1-cons-100-dl.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-dl} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-dl} }
\caption{Memory benchmark results with 1 producer for dl memory allocator}
\label{fig:mem-1-prod-1-cons-100-dl}
\end{figure}

%mem-1-prod-1-cons-100-glc.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-glc} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-glc} }
\caption{Memory benchmark results with 1 producer for glibc memory allocator}
\label{fig:mem-1-prod-1-cons-100-glc}
\end{figure}

%mem-1-prod-1-cons-100-hrd.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-hrd} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-hrd} }
\caption{Memory benchmark results with 1 producer for hoard memory allocator}
\label{fig:mem-1-prod-1-cons-100-hrd}
\end{figure}

%mem-1-prod-1-cons-100-je.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-je} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-je} }
\caption{Memory benchmark results with 1 producer for je memory allocator}
\label{fig:mem-1-prod-1-cons-100-je}
\end{figure}

%mem-1-prod-1-cons-100-pt3.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-pt3} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-pt3} }
\caption{Memory benchmark results with 1 producer for pt3 memory allocator}
\label{fig:mem-1-prod-1-cons-100-pt3}
\end{figure}

%mem-1-prod-1-cons-100-rp.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-rp} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-rp} }
\caption{Memory benchmark results with 1 producer for rp memory allocator}
\label{fig:mem-1-prod-1-cons-100-rp}
\end{figure}

%mem-1-prod-1-cons-100-tbb.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-tbb} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-tbb} }
\caption{Memory benchmark results with 1 producer for tbb memory allocator}
\label{fig:mem-1-prod-1-cons-100-tbb}
\end{figure}

%mem-4-prod-4-cons-100-cfa.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-cfa} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-cfa} }
\caption{Memory benchmark results with 4 producers for llh memory allocator}
\label{fig:mem-4-prod-4-cons-100-cfa}
\end{figure}

%mem-4-prod-4-cons-100-dl.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-dl} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-dl} }
\caption{Memory benchmark results with 4 producers for dl memory allocator}
\label{fig:mem-4-prod-4-cons-100-dl}
\end{figure}

%mem-4-prod-4-cons-100-glc.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-glc} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-glc} }
\caption{Memory benchmark results with 4 producers for glibc memory allocator}
\label{fig:mem-4-prod-4-cons-100-glc}
\end{figure}

%mem-4-prod-4-cons-100-hrd.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-hrd} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-hrd} }
\caption{Memory benchmark results with 4 producers for hoard memory allocator}
\label{fig:mem-4-prod-4-cons-100-hrd}
\end{figure}

%mem-4-prod-4-cons-100-je.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-je} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-je} }
\caption{Memory benchmark results with 4 producers for je memory allocator}
\label{fig:mem-4-prod-4-cons-100-je}
\end{figure}

%mem-4-prod-4-cons-100-pt3.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-pt3} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-pt3} }
\caption{Memory benchmark results with 4 producers for pt3 memory allocator}
\label{fig:mem-4-prod-4-cons-100-pt3}
\end{figure}

%mem-4-prod-4-cons-100-rp.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-rp} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-rp} }
\caption{Memory benchmark results with 4 producers for rp memory allocator}
\label{fig:mem-4-prod-4-cons-100-rp}
\end{figure}

%mem-4-prod-4-cons-100-tbb.eps
\begin{figure}
\centering
 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-tbb} }
 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-tbb} }
\caption{Memory benchmark results with 4 producers for tbb memory allocator}
\label{fig:mem-4-prod-4-cons-100-tbb}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% ANALYSIS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%