% source: doc/theses/mubeen_zulfiqar_MMath/performance.tex @ b81ab1c6
% Last change: b81ab1c6, checked in by Peter A. Buhr <pabuhr@…>: second proofread of chapter performance
\chapter{Performance}
\label{c:Performance}

This chapter uses the micro-benchmarks from \VRef[Chapter]{s:Benchmarks} to test a number of current memory allocators, including llheap.
The goal is to see if llheap is competitive with the current best memory allocators.


\section{Machine Specification}

The performance experiments were run on two different multi-core architectures (x64 and ARM) to determine if there is consistency across platforms:
\begin{itemize}
\item
\textbf{Nasus} AMD EPYC 7662, 64-core socket $\times$ 2, 2.0 GHz, GCC version 9.3.0
\item
\textbf{Algol} Huawei ARM TaiShan 2280 V2 Kunpeng 920, 24-core socket $\times$ 4, 2.6 GHz, GCC version 9.4.0
\end{itemize}
\section{Existing Memory Allocators}
\label{sec:curAllocatorSec}

With dynamic allocation being an important feature of C, there are many stand-alone memory allocators that have been designed for different purposes.
For this thesis, seven of the most popular and widely used memory allocators were selected for comparison, along with llheap.

\paragraph{llheap (\textsf{llh})}
is the thread-safe allocator from \VRef[Chapter]{c:Allocator}.
\\
\textbf{Version:} 1.0\\
\textbf{Configuration:} Compiled with dynamic linking, but without statistics or debugging.\\
\textbf{Compilation command:} @make@

\paragraph{glibc (\textsf{glc})}
\cite{glibc} is the default thread-safe allocator used by programs compiled with gcc on Linux.
\\
\textbf{Version:} Ubuntu GLIBC 2.31-0ubuntu9.7 2.31\\
\textbf{Configuration:} Compiled by Ubuntu 20.04.\\
\textbf{Compilation command:} N/A

\paragraph{dlmalloc (\textsf{dl})}
\cite{dlmalloc} is a thread-safe allocator with a single heap that is protected by a lock, so allocation is not concurrent across threads.
It maintains free-lists of different sizes to store freed dynamic memory.
\\
\textbf{Version:} 2.8.6\\
\textbf{Configuration:} Compiled with preprocessor @USE_LOCKS@.\\
\textbf{Compilation command:} @gcc -g3 -O3 -Wall -Wextra -fno-builtin-malloc -fno-builtin-calloc@ @-fno-builtin-realloc -fno-builtin-free -fPIC -shared -DUSE_LOCKS -o libdlmalloc.so malloc-2.8.6.c@

\paragraph{hoard (\textsf{hrd})}
\cite{hoard} is a thread-safe allocator that is multi-threaded and uses a heap-layer framework.
It has per-thread heaps that have thread-local free-lists, and a global shared heap.
\\
\textbf{Version:} 3.13\\
\textbf{Configuration:} Compiled with hoard's default configurations and @Makefile@.\\
\textbf{Compilation command:} @make all@

\paragraph{jemalloc (\textsf{je})}
\cite{jemalloc} is a thread-safe allocator that uses multiple arenas. Each thread is assigned an arena.
Each arena has chunks that contain contiguous memory regions of the same size. An arena has multiple chunks that contain regions of multiple sizes.
\\
\textbf{Version:} 5.2.1\\
\textbf{Configuration:} Compiled with jemalloc's default configurations and @Makefile@.\\
\textbf{Compilation command:} @autogen.sh; configure; make; make install@

\paragraph{ptmalloc3 (\textsf{pt3})}
\cite{pt3malloc} is a modification of dlmalloc.
It is a thread-safe multi-threaded memory allocator that uses multiple heaps.
Each ptmalloc3 heap has a design similar to dlmalloc's heap.
\\
\textbf{Version:} 1.8\\
\textbf{Configuration:} Compiled with ptmalloc3's @Makefile@ using option ``linux-shared''.\\
\textbf{Compilation command:} @make linux-shared@

\paragraph{rpmalloc (\textsf{rp})}
\cite{rpmalloc} is a thread-safe allocator that is multi-threaded and uses a per-thread heap.
Each heap has multiple size-classes and each size-class contains memory regions of the relevant size.
\\
\textbf{Version:} 1.4.1\\
\textbf{Configuration:} Compiled with rpmalloc's default configurations and ninja build system.\\
\textbf{Compilation command:} @python3 configure.py; ninja@

\paragraph{tbb malloc (\textsf{tbb})}
\cite{tbbmalloc} is a thread-safe allocator that is multi-threaded and uses a private heap for each thread.
Each private heap has multiple bins of different sizes. Each bin contains free regions of the same size.
\\
\textbf{Version:} Intel TBB 2020 Update 2, tbb\_interface\_version == 11102\\
\textbf{Configuration:} Compiled with tbbmalloc's default configurations and @Makefile@.\\
\textbf{Compilation command:} @make@

% \section{Experiment Environment}
% We used our micro benchmark suite (FIX ME: cite mbench) to evaluate these memory allocators \ref{sec:curAllocatorSec} and our own memory allocator uHeap \ref{sec:allocatorSec}.
\section{Experiments}

Each micro-benchmark is configured and run with each of the allocators.
The less time an allocator takes to complete a benchmark the better, so lower in the graphs is better.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% CHURN
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Churn Micro-Benchmark}

Churn tests allocators for speed under intensive dynamic memory usage (see \VRef{s:ChurnBenchmark}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[thread:]
1, 2, 4, 8, 16
\item[spots:]
16
\item[obj:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\end{description}
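The following C fragment is a minimal single-worker sketch of the kind of allocation pattern these parameters describe; it is not the benchmark's code.
The names @churn_worker@, @pick_size@, and @next_random@ are hypothetical, sizes are drawn uniformly rather than with the \textsf{fisher} distribution, the choice of a random spot per allocation is an assumption, and cross-thread (remote) frees are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical constants mirroring the configuration above. */
enum { SPOTS = 16, MIN_SIZE = 50, MAX_SIZE = 500, STEP = 50 };

/* Small linear congruential generator, so the sketch is deterministic. */
static unsigned int next_random(unsigned int *state) {
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Pick one of the sizes 50, 100, ..., 500 (a uniform choice is an assumption). */
static size_t pick_size(unsigned int *state) {
    unsigned int classes = (MAX_SIZE - MIN_SIZE) / STEP + 1;  /* 10 sizes */
    return MIN_SIZE + (size_t)(next_random(state) % classes) * STEP;
}

/* Allocate `objects` objects; each goes into a random spot, first freeing
   the object already stored there. The remaining spots are drained at the
   end, so the returned number of frees equals `objects`. */
size_t churn_worker(size_t objects, unsigned int seed) {
    void *spots[SPOTS] = { NULL };
    size_t frees = 0;
    for (size_t i = 0; i < objects; i += 1) {
        size_t s = next_random(&seed) % SPOTS;
        if (spots[s] != NULL) { free(spots[s]); frees += 1; }
        spots[s] = malloc(pick_size(&seed));
        assert(spots[s] != NULL);
    }
    for (size_t s = 0; s < SPOTS; s += 1) {
        if (spots[s] != NULL) { free(spots[s]); frees += 1; }
    }
    return frees;
}
```

Because every allocation is paired with exactly one free, the time spent in such a loop is dominated by the allocator's allocation and deallocation paths rather than by the program's own work.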
% -maxS          : 500
% -minS          : 50
% -stepS                 : 50
% -distroS       : fisher
% -objN          : 100000
% -cSpots                : 16
% -threadN       : 1, 2, 4, 8, 16

\VRef[Figure]{fig:churn} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/churn} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/churn} }
\caption{Churn}
\label{fig:churn}
\end{figure}

All allocators did well in this micro-benchmark, except for \textsf{dl} on the ARM.
llheap is slightly slower because it uses ownership, and many of the allocations in this benchmark have remote frees, which require locking.
When llheap is compiled without ownership, its performance is the same as the other allocators (not shown).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% THRASH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Cache Thrash}

Thrash tests memory allocators for active false sharing (see \VRef{sec:benchThrashSec}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[threads:]
1, 2, 4, 8, 16
\item[iterations:]
1,000
\item[cacheRW:]
1,000,000
\item[size:]
1
\end{description}
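False sharing arises when objects used by different threads occupy the same cache line, which is likely for very small objects such as the 1-byte size used here.
As a small illustration, the hypothetical helper @same_cache_line@ below checks this condition for two addresses, assuming a 64-byte cache line; the actual line size is machine dependent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cache-line size in bytes; not taken from the test machines. */
enum { CACHE_LINE = 64 };

/* True when both addresses fall inside the same CACHE_LINE-sized block,
   so writes to one can invalidate the cached copy of the other. */
bool same_cache_line(const void *p, const void *q) {
    return (uintptr_t)p / CACHE_LINE == (uintptr_t)q / CACHE_LINE;
}
```

Applying such a check to the objects that an allocator hands to different threads gives a rough indication of whether active false sharing can occur.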
% * Each allocator was tested for its performance across different number of threads.
% Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.

\VRef[Figure]{fig:cacheThrash} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/cache-time-0-thrash} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/cache-time-0-thrash} }
\caption{Cache Thrash}
\label{fig:cacheThrash}
\end{figure}

All allocators did well in this micro-benchmark, except for \textsf{dl} and \textsf{pt3} on the x64.
Either the memory allocators generate little active false-sharing or the micro-benchmark is not generating scenarios that cause active false-sharing.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SCRATCH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Cache Scratch}

Scratch tests memory allocators for program-induced allocator-preserved passive false-sharing (see \VRef{s:CacheScratch}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[threads:]
1, 2, 4, 8, 16
\item[iterations:]
1,000
\item[cacheRW:]
1,000,000
\item[size:]
1
\end{description}
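One way to picture the allocator-preserved part of this scenario: if a thread frees an object that shares a cache line with another thread's object and its next allocation of the same size returns the same address, the false sharing created by the program persists.
The hypothetical helper @reuses_freed_address@ below is only a single-threaded sketch of that check; its result depends entirely on the allocator in use.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate, free, and allocate again with the same size; report whether
   the allocator handed back the address that was just freed. */
bool reuses_freed_address(size_t size) {
    void *first = malloc(size);
    if (first == NULL) return false;
    uintptr_t old = (uintptr_t)first;   /* record the address before freeing */
    free(first);
    void *second = malloc(size);
    bool same = second != NULL && (uintptr_t)second == old;
    free(second);                       /* free(NULL) is a no-op */
    return same;
}
```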
% * Each allocator was tested for its performance across different number of threads.
% Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.

\VRef[Figure]{fig:cacheScratch} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/cache-time-0-scratch} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/cache-time-0-scratch} }
\caption{Cache Scratch}
\label{fig:cacheScratch}
\end{figure}

All allocators did well in this micro-benchmark on the ARM.
Allocators \textsf{llh}, \textsf{je}, and \textsf{rp} did well on the x64, while the remaining allocators experienced significant slowdowns from the false sharing.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SPEED
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Speed Micro-Benchmark}

Speed tests memory allocators for runtime latency (see \VRef{s:SpeedMicroBenchmark}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects:]
1,000,000
\item[workers:]
1, 2, 4, 8, 16
\end{description}

% -maxS    :  500
% -minS    :  50
% -stepS   :  50
% -distroS :  fisher
% -objN    :  1000000
% -threadN    : \{ 1, 2, 4, 8, 16 \} *

%* Each allocator was tested for its performance across different number of threads.
%Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.

\VRefrange[Figures]{fig:speed-3-malloc}{fig:speed-14-malloc-calloc-realloc-free} show 12 figures, one figure for each chain of the speed benchmark.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{itemize}
\item \VRef[Figure]{fig:speed-3-malloc} shows results for chain: malloc
\item \VRef[Figure]{fig:speed-4-realloc} shows results for chain: realloc
\item \VRef[Figure]{fig:speed-5-free} shows results for chain: free
\item \VRef[Figure]{fig:speed-6-calloc} shows results for chain: calloc
\item \VRef[Figure]{fig:speed-7-malloc-free} shows results for chain: malloc-free
\item \VRef[Figure]{fig:speed-8-realloc-free} shows results for chain: realloc-free
\item \VRef[Figure]{fig:speed-9-calloc-free} shows results for chain: calloc-free
\item \VRef[Figure]{fig:speed-10-malloc-realloc} shows results for chain: malloc-realloc
\item \VRef[Figure]{fig:speed-11-calloc-realloc} shows results for chain: calloc-realloc
\item \VRef[Figure]{fig:speed-12-malloc-realloc-free} shows results for chain: malloc-realloc-free
\item \VRef[Figure]{fig:speed-13-calloc-realloc-free} shows results for chain: calloc-realloc-free
\item \VRef[Figure]{fig:speed-14-malloc-calloc-realloc-free} shows results for chain: malloc-calloc-realloc-free
\end{itemize}
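To make the notion of a chain concrete, the fragment below is a minimal sketch that times one chain, malloc-realloc-free, for a single worker.
The function name @time_malloc_realloc_free@ is hypothetical, applying each routine to all objects before moving to the next routine is an assumption about how a chain is executed, and CPU time from @clock@ is used only for portability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Time the chain malloc-realloc-free over `objects` objects of `size` bytes.
   Returns elapsed CPU time in seconds, 0.0 for an empty run, or -1.0 if the
   bookkeeping array cannot be allocated. */
double time_malloc_realloc_free(size_t objects, size_t size) {
    assert(size > 0);
    if (objects == 0) return 0.0;
    void **objs = malloc(objects * sizeof(void *));
    if (objs == NULL) return -1.0;

    clock_t start = clock();
    for (size_t i = 0; i < objects; i += 1) {       /* stage 1: malloc */
        objs[i] = malloc(size);
        assert(objs[i] != NULL);
        *(volatile char *)objs[i] = 1;              /* touch so the call is not optimized away */
    }
    for (size_t i = 0; i < objects; i += 1) {       /* stage 2: realloc to twice the size */
        void *p = realloc(objs[i], 2 * size);
        assert(p != NULL);
        objs[i] = p;
    }
    for (size_t i = 0; i < objects; i += 1) {       /* stage 3: free */
        free(objs[i]);
    }
    clock_t end = clock();

    free(objs);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

The other chains follow the same shape with a different sequence of stages.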
All allocators did well in this micro-benchmark across all allocation chains, except for \textsf{dl} and \textsf{pt3}.

%speed-3-malloc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-3-malloc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-3-malloc} }
\caption{Speed benchmark chain: malloc}
\label{fig:speed-3-malloc}
\end{figure}

%speed-4-realloc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-4-realloc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-4-realloc} }
\caption{Speed benchmark chain: realloc}
\label{fig:speed-4-realloc}
\end{figure}

%speed-5-free.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-5-free} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-5-free} }
\caption{Speed benchmark chain: free}
\label{fig:speed-5-free}
\end{figure}

%speed-6-calloc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-6-calloc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-6-calloc} }
\caption{Speed benchmark chain: calloc}
\label{fig:speed-6-calloc}
\end{figure}

%speed-7-malloc-free.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-7-malloc-free} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-7-malloc-free} }
\caption{Speed benchmark chain: malloc-free}
\label{fig:speed-7-malloc-free}
\end{figure}

%speed-8-realloc-free.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-8-realloc-free} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-8-realloc-free} }
\caption{Speed benchmark chain: realloc-free}
\label{fig:speed-8-realloc-free}
\end{figure}
%speed-9-calloc-free.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-9-calloc-free} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-9-calloc-free} }
\caption{Speed benchmark chain: calloc-free}
\label{fig:speed-9-calloc-free}
\end{figure}

%speed-10-malloc-realloc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-10-malloc-realloc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-10-malloc-realloc} }
\caption{Speed benchmark chain: malloc-realloc}
\label{fig:speed-10-malloc-realloc}
\end{figure}

%speed-11-calloc-realloc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-11-calloc-realloc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-11-calloc-realloc} }
\caption{Speed benchmark chain: calloc-realloc}
\label{fig:speed-11-calloc-realloc}
\end{figure}

%speed-12-malloc-realloc-free.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-12-malloc-realloc-free} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-12-malloc-realloc-free} }
\caption{Speed benchmark chain: malloc-realloc-free}
\label{fig:speed-12-malloc-realloc-free}
\end{figure}

%speed-13-calloc-realloc-free.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-13-calloc-realloc-free} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-13-calloc-realloc-free} }
\caption{Speed benchmark chain: calloc-realloc-free}
\label{fig:speed-13-calloc-realloc-free}
\end{figure}

%speed-14-{m,c,re}alloc-free.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-14-m-c-re-alloc-free} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-14-m-c-re-alloc-free} }
\caption{Speed benchmark chain: malloc-calloc-realloc-free}
\label{fig:speed-14-malloc-calloc-realloc-free}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% MEMORY
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newpage
\subsection{Memory Micro-Benchmark}

This experiment is run with the following two configurations for each allocator.
The difference between the two configurations is the number of producers and consumers.
Configuration 1 has one producer and one consumer, and configuration 2 has 4 producers, where each producer has 4 consumers.

\noindent
Configuration 1:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[producer (K):]
1
\item[consumer (M):]
1
\item[round:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects (N):]
100,000
\end{description}

% -threadA :  1
% -threadF :  1
% -maxS    :  500
% -minS    :  50
% -stepS   :  50
% -distroS :  fisher
% -objN    :  100000
% -consumeS:  100000

\noindent
Configuration 2:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[producer (K):]
4
\item[consumer (M):]
4
\item[round:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects (N):]
100,000
\end{description}

% -threadA :  4
% -threadF :  4
% -maxS    :  500
% -minS    :  50
% -stepS   :  50
% -distroS :  fisher
% -objN    :  100000
% -consumeS:  100000
\begin{table}[b]
\centering
    \begin{tabular}{ |c|c|c| }
     \hline
    Memory Allocator & Configuration 1 Result & Configuration 2 Result\\
     \hline
    llh & \VRef[Figure]{fig:mem-1-prod-1-cons-100-cfa} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-cfa}\\
     \hline
    dl & \VRef[Figure]{fig:mem-1-prod-1-cons-100-dl} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-dl}\\
     \hline
    glibc & \VRef[Figure]{fig:mem-1-prod-1-cons-100-glc} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-glc}\\
     \hline
    hoard & \VRef[Figure]{fig:mem-1-prod-1-cons-100-hrd} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-hrd}\\
     \hline
    je & \VRef[Figure]{fig:mem-1-prod-1-cons-100-je} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-je}\\
     \hline
    pt3 & \VRef[Figure]{fig:mem-1-prod-1-cons-100-pt3} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-pt3}\\
     \hline
    rp & \VRef[Figure]{fig:mem-1-prod-1-cons-100-rp} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-rp}\\
     \hline
    tbb & \VRef[Figure]{fig:mem-1-prod-1-cons-100-tbb} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-tbb}\\
     \hline
    \end{tabular}
\caption{Memory benchmark results}
\label{table:mem-benchmark-figs}
\end{table}

\VRefrange[Figures]{fig:mem-1-prod-1-cons-100-cfa}{fig:mem-4-prod-4-cons-100-tbb} show 16 figures, two figures for each of the 8 allocators, one for each configuration.
\VRef[Table]{table:mem-benchmark-figs} lists the figures that contain the memory benchmark results.

Each figure has 2 graphs, one for each experiment environment.
Each graph has the following 5 subgraphs that show the program's memory usage and statistics throughout the program's lifetime.

\begin{itemize}
\item \textit{\textbf{current\_req\_mem(B)}} shows the amount of dynamic memory requested by the benchmark and currently in use.
\item \textit{\textbf{heap}}* shows the memory requested by the program (allocator) from the system that lies in the heap area.
\item \textit{\textbf{mmap\_so}}* shows the memory requested by the program (allocator) from the system that lies in the mmap area.
\item \textit{\textbf{mmap}}* shows the memory requested by the program (allocator or shared libraries) from the system that lies in the mmap area.
\item \textit{\textbf{total\_dynamic}} shows the total usage of dynamic memory by the benchmark program, which is the sum of heap, mmap, and mmap\_so.
\end{itemize}

* These statistics are gathered by monitoring the \textit{/proc/self/maps} file of the process on Linux.
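As an illustration of how such a statistic can be read, the sketch below sums the sizes of the mappings that Linux labels @[heap]@ in a maps file.
The function names @mapping_size_if@ and @heap_bytes@ are hypothetical, and how the benchmark itself classifies mappings into \textit{heap}, \textit{mmap}, and \textit{mmap\_so} is not shown here.

```c
#include <stdio.h>
#include <string.h>

/* One line of a maps file starts with "start-end" in hexadecimal.
   Return end - start if the line also contains `tag`, otherwise 0. */
unsigned long mapping_size_if(const char *line, const char *tag) {
    unsigned long start, end;
    if (sscanf(line, "%lx-%lx", &start, &end) != 2) return 0;
    if (strstr(line, tag) == NULL) return 0;
    return end - start;
}

/* Sum the sizes of all mappings tagged "[heap]" in an already opened
   maps file. Lines longer than the buffer are not handled specially. */
unsigned long heap_bytes(FILE *maps) {
    char line[512];
    unsigned long total = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        total += mapping_size_if(line, "[heap]");
    }
    return total;
}
```

On Linux, passing a stream opened on \textit{/proc/self/maps} to @heap_bytes@ at regular intervals would yield one data point per sample.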

For each subgraph, the X-axis shows the time during the program lifetime at which the data point was generated.
The Y-axis shows the memory usage in bytes.

For the experiment, at any given time in the program's life, the difference between the memory requested by the benchmark (\textit{current\_req\_mem(B)})
and the memory that the process has received from the system (\textit{heap}, \textit{mmap}) should be minimal.
This difference is the memory overhead caused by the allocator and shows the level of fragmentation in the allocator.
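As a sketch of this comparison, write $R(t)$ for \textit{current\_req\_mem(B)} at time $t$ and $S(t)$ for the memory received from the system.
The symbols $R$, $S$, and $O$ are introduced here only for illustration, and taking $S$ as the sum of \textit{heap} and \textit{mmap} follows the sentence above; whether \textit{mmap\_so} should also be counted is left open.
\[
O(t) = S(t) - R(t), \qquad S(t) = \mathit{heap}(t) + \mathit{mmap}(t).
\]
For example, with hypothetical values $R(t) = 30$~MB and $S(t) = 36$~MB, the overhead is $O(t) = 6$~MB, that is, $O(t)/R(t) = 20\%$ above the requested memory.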
%mem-1-prod-1-cons-100-cfa.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-cfa} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-cfa} }
\caption{Memory benchmark results with 1 producer for the llh memory allocator}
\label{fig:mem-1-prod-1-cons-100-cfa}
\end{figure}

%mem-1-prod-1-cons-100-dl.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-dl} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-dl} }
\caption{Memory benchmark results with 1 producer for the dl memory allocator}
\label{fig:mem-1-prod-1-cons-100-dl}
\end{figure}

%mem-1-prod-1-cons-100-glc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-glc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-glc} }
\caption{Memory benchmark results with 1 producer for the glibc memory allocator}
\label{fig:mem-1-prod-1-cons-100-glc}
\end{figure}

%mem-1-prod-1-cons-100-hrd.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-hrd} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-hrd} }
\caption{Memory benchmark results with 1 producer for the hoard memory allocator}
\label{fig:mem-1-prod-1-cons-100-hrd}
\end{figure}

%mem-1-prod-1-cons-100-je.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-je} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-je} }
\caption{Memory benchmark results with 1 producer for the je memory allocator}
\label{fig:mem-1-prod-1-cons-100-je}
\end{figure}

%mem-1-prod-1-cons-100-pt3.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-pt3} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-pt3} }
\caption{Memory benchmark results with 1 producer for the pt3 memory allocator}
\label{fig:mem-1-prod-1-cons-100-pt3}
\end{figure}

%mem-1-prod-1-cons-100-rp.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-rp} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-rp} }
\caption{Memory benchmark results with 1 producer for the rp memory allocator}
\label{fig:mem-1-prod-1-cons-100-rp}
\end{figure}

%mem-1-prod-1-cons-100-tbb.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-tbb} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-tbb} }
\caption{Memory benchmark results with 1 producer for the tbb memory allocator}
\label{fig:mem-1-prod-1-cons-100-tbb}
\end{figure}
%mem-4-prod-4-cons-100-cfa.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-cfa} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-cfa} }
\caption{Memory benchmark results with 4 producers for the llh memory allocator}
\label{fig:mem-4-prod-4-cons-100-cfa}
\end{figure}

%mem-4-prod-4-cons-100-dl.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-dl} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-dl} }
\caption{Memory benchmark results with 4 producers for the dl memory allocator}
\label{fig:mem-4-prod-4-cons-100-dl}
\end{figure}

%mem-4-prod-4-cons-100-glc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-glc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-glc} }
\caption{Memory benchmark results with 4 producers for the glibc memory allocator}
\label{fig:mem-4-prod-4-cons-100-glc}
\end{figure}

%mem-4-prod-4-cons-100-hrd.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-hrd} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-hrd} }
\caption{Memory benchmark results with 4 producers for the hoard memory allocator}
\label{fig:mem-4-prod-4-cons-100-hrd}
\end{figure}

%mem-4-prod-4-cons-100-je.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-je} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-je} }
\caption{Memory benchmark results with 4 producers for the je memory allocator}
\label{fig:mem-4-prod-4-cons-100-je}
\end{figure}

%mem-4-prod-4-cons-100-pt3.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-pt3} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-pt3} }
\caption{Memory benchmark results with 4 producers for the pt3 memory allocator}
\label{fig:mem-4-prod-4-cons-100-pt3}
\end{figure}

%mem-4-prod-4-cons-100-rp.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-rp} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-rp} }
\caption{Memory benchmark results with 4 producers for the rp memory allocator}
\label{fig:mem-4-prod-4-cons-100-rp}
\end{figure}

%mem-4-prod-4-cons-100-tbb.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-tbb} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-tbb} }
\caption{Memory benchmark results with 4 producers for the tbb memory allocator}
\label{fig:mem-4-prod-4-cons-100-tbb}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% ANALYSIS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%