% source: doc/theses/mubeen_zulfiqar_MMath/performance.tex @ 9939dc3
% Last change on this file since 9939dc3 was d8075d28, checked in by m3zulfiq <m3zulfiq@…>, 2 years ago
% corrected benchmark configurations as per last run

\chapter{Performance}
\label{c:Performance}

This chapter uses the micro-benchmarks from \VRef[Chapter]{s:Benchmarks} to test a number of current memory allocators, including llheap.
The goal is to see if llheap is competitive with the current best memory allocators.


\section{Machine Specification}

The performance experiments were run on two different multi-core architectures (x64 and ARM) to determine if there is consistency across platforms:
\begin{itemize}
\item
\textbf{Nasus} AMD EPYC 7662, 64-core socket $\times$ 2, 2.0 GHz, GCC version 9.3.0
\item
\textbf{Algol} Huawei ARM TaiShan 2280 V2 Kunpeng 920, 24-core socket $\times$ 4, 2.6 GHz, GCC version 9.4.0
\end{itemize}


\section{Existing Memory Allocators}
\label{sec:curAllocatorSec}

With dynamic allocation being an important feature of C, many stand-alone memory allocators have been designed for different purposes.
For this thesis, seven of the most popular and widely used memory allocators were selected for comparison, along with llheap.

\paragraph{llheap (\textsf{llh})}
is the thread-safe allocator from \VRef[Chapter]{c:Allocator}.
\\
\textbf{Version:} 1.0\\
\textbf{Configuration:} Compiled with dynamic linking, but without statistics or debugging.\\
\textbf{Compilation command:} @make@

\paragraph{glibc (\textsf{glc})}
\cite{glibc} is the default thread-safe allocator of the GNU C library.
\\
\textbf{Version:} Ubuntu GLIBC 2.31-0ubuntu9.7 2.31\\
\textbf{Configuration:} Compiled by Ubuntu 20.04.\\
\textbf{Compilation command:} N/A

\paragraph{dlmalloc (\textsf{dl})}
\cite{dlmalloc} is a single-heap allocator made thread-safe by a global lock, so requests from all threads are serialized through one heap.
It maintains free-lists of different sizes to store freed dynamic memory.
\\
\textbf{Version:} 2.8.6\\
\textbf{Configuration:} Compiled with the preprocessor flag @USE_LOCKS@.\\
\textbf{Compilation command:} @gcc -g3 -O3 -Wall -Wextra -fno-builtin-malloc -fno-builtin-calloc@ @-fno-builtin-realloc -fno-builtin-free -fPIC -shared -DUSE_LOCKS -o libdlmalloc.so malloc-2.8.6.c@
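The following minimal C sketch illustrates the serialization that @USE_LOCKS@ implies. It is not dlmalloc's code. It assumes one shared heap, stood in for by the system @malloc@ and @free@, guarded by one global mutex, so every request from every thread waits on the same lock. The names @locked_malloc@, @locked_free@, @heap_lock@, and @run_locked@ are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical sketch, not dlmalloc's code: one shared heap (stood in for by
   the system malloc/free) guarded by one global lock, so all requests from
   all threads are serialized. */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long heap_ops = 0;          /* protected by heap_lock */

void *locked_malloc(size_t size) {
    pthread_mutex_lock(&heap_lock);         /* every thread queues here */
    void *p = malloc(size);
    heap_ops += 1;
    pthread_mutex_unlock(&heap_lock);
    return p;
}

void locked_free(void *p) {
    pthread_mutex_lock(&heap_lock);
    free(p);
    heap_ops += 1;
    pthread_mutex_unlock(&heap_lock);
}

static void *worker(void *arg) {
    long rounds = *(const long *)arg;
    for (long i = 0; i < rounds; i += 1) {
        void *p = locked_malloc(64);
        assert(p != NULL);
        locked_free(p);
    }
    return NULL;
}

/* Runs nthreads workers (at most 16); returns the total heap operations. */
unsigned long run_locked(int nthreads, long rounds) {
    pthread_t tids[16];
    if (nthreads > 16) nthreads = 16;
    for (int t = 0; t < nthreads; t += 1)
        pthread_create(&tids[t], NULL, worker, &rounds);
    for (int t = 0; t < nthreads; t += 1)
        pthread_join(tids[t], NULL);
    return heap_ops;
}
```

As threads are added, they spend a growing fraction of their time waiting on @heap_lock@. This is the kind of contention that a single-lock design invites.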

\paragraph{hoard (\textsf{hrd})}
\cite{hoard} is a thread-safe, multi-threaded allocator built on a heap-layers framework. It has per-thread heaps with thread-local free-lists, and a global shared heap.
\\
\textbf{Version:} 3.13\\
\textbf{Configuration:} Compiled with hoard's default configuration and @Makefile@.\\
\textbf{Compilation command:} @make all@

\paragraph{jemalloc (\textsf{je})}
\cite{jemalloc} is a thread-safe allocator that uses multiple arenas, where each thread is assigned an arena.
Each arena has chunks that contain contiguous memory regions of the same size, and an arena has multiple chunks that contain regions of multiple sizes.
\\
\textbf{Version:} 5.2.1\\
\textbf{Configuration:} Compiled with jemalloc's default configuration and @Makefile@.\\
\textbf{Compilation command:} @autogen.sh; configure; make; make install@

\paragraph{ptmalloc3 (\textsf{pt3})}
\cite{ptmalloc3} is a modification of dlmalloc.
It is a thread-safe, multi-threaded memory allocator that uses multiple heaps.
Each ptmalloc3 heap has a design similar to dlmalloc's heap.
\\
\textbf{Version:} 1.8\\
\textbf{Configuration:} Compiled with ptmalloc3's @Makefile@ using option ``linux-shared''.\\
\textbf{Compilation command:} @make linux-shared@

\paragraph{rpmalloc (\textsf{rp})}
\cite{rpmalloc} is a thread-safe, multi-threaded allocator that uses per-thread heaps.
Each heap has multiple size-classes and each size-class contains memory regions of the relevant size.
\\
\textbf{Version:} 1.4.1\\
\textbf{Configuration:} Compiled with rpmalloc's default configuration and the ninja build system.\\
\textbf{Compilation command:} @python3 configure.py; ninja@

\paragraph{tbb malloc (\textsf{tbb})}
\cite{tbbmalloc} is a thread-safe, multi-threaded allocator that uses a private heap for each thread.
Each private heap has multiple bins of different sizes, and each bin contains free regions of the same size.
\\
\textbf{Version:} Intel TBB 2020 Update 2, tbb\_interface\_version == 11102\\
\textbf{Configuration:} Compiled with tbbmalloc's default configuration and @Makefile@.\\
\textbf{Compilation command:} @make@

% \section{Experiment Environment}
% We used our micro benchmark suite (FIX ME: cite mbench) to evaluate these memory allocators \ref{sec:curAllocatorSec} and our own memory allocator uHeap \ref{sec:allocatorSec}.

\section{Experiments}

Each micro-benchmark is configured and run with each of the allocators.
The less time an allocator takes to complete a benchmark, the better, so lower in the graphs is better.
All graphs use a log scale on the Y-axis, except for the Memory micro-benchmark (see \VRef{s:MemoryMicroBenchmark}).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% CHURN
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Churn Micro-Benchmark}

Churn tests allocators for speed under intensive dynamic memory usage (see \VRef{s:ChurnBenchmark}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[thread:]
1, 2, 4, 8, 16, 32, 48
\item[spots:]
16
\item[obj:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\end{description}

% -maxS          : 500
% -minS          : 50
% -stepS                 : 50
% -distroS       : fisher
% -objN          : 100000
% -cSpots                : 16
% -threadN       : 1, 2, 4, 8, 16
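The following is a simplified, single-thread sketch of a churn kernel under these settings. It makes several assumptions:
\begin{itemize}
\item Each thread owns an array of @spots@ slots.
\item For @obj@ iterations, the thread frees the object in a randomly chosen slot and replaces it with a new object.
\item The new object's size is drawn from @min@ to @max@ in increments of @step@.
\end{itemize}
The uniform random choice via POSIX @rand_r@ stands in for the @fisher@ distribution, which is defined in \VRef{s:ChurnBenchmark}. The name @churn@ and its parameter names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

/* Hypothetical churn kernel: spots/objs/min/max/step mirror the settings above.
   Sizes are drawn uniformly from {min, min+step, ..., max}; this is an
   assumption standing in for the benchmark's fisher distribution.
   Returns the total number of bytes requested. */
unsigned long churn(int spots, long objs, size_t min, size_t max, size_t step,
                    unsigned int seed) {
    void **slot = calloc((size_t)spots, sizeof(void *)); /* all slots start empty */
    assert(slot != NULL);
    size_t nsizes = (max - min) / step + 1;   /* 50..500 step 50 gives 10 sizes */
    unsigned long bytes = 0;
    for (long i = 0; i < objs; i += 1) {
        int s = rand_r(&seed) % spots;        /* pick a random spot */
        free(slot[s]);                        /* free(NULL) is a no-op */
        size_t size = min + ((size_t)rand_r(&seed) % nsizes) * step;
        slot[s] = malloc(size);
        assert(slot[s] != NULL);
        ((char *)slot[s])[0] = 1;             /* touch the new object */
        bytes += size;
    }
    for (int s = 0; s < spots; s += 1) free(slot[s]);
    free(slot);
    return bytes;
}
```

With the settings above (@min@ 50, @max@ 500, @step@ 50), there are ten possible object sizes.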

\VRef[Figure]{fig:churn} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's results are shown in a different color.

\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/churn} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/churn} }
\caption{Churn}
\label{fig:churn}
\end{figure}

\paragraph{Assessment}
All allocators did well in this micro-benchmark, except for \textsf{dl} on the ARM machine (Algol).
\textsf{dl} is the slowest, indicating a small bottleneck relative to the other allocators.
\textsf{je} is the fastest, with only a small benefit over the other allocators.
% llheap is slightly slower because it uses ownership, where many of the allocations have remote frees, which requires locking.
% When llheap is compiled without ownership, its performance is the same as the other allocators (not shown).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% THRASH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Cache Thrash}
\label{sec:cache-thrash-perf}

Thrash tests memory allocators for active false sharing (see \VRef{sec:benchThrashSec}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[threads:]
1, 2, 4, 8, 16, 32, 48
\item[iterations:]
1,000
\item[cacheRW:]
1,000,000
\item[size:]
1
\end{description}

% * Each allocator was tested for its performance across different number of threads.
% Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.
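This sketch assumes the thrash benchmark follows the classic cache-thrash design from the Hoard benchmark suite. In that design, each thread repeatedly allocates an object of @size@ bytes, writes to it @cacheRW@ times, and frees it, for @iterations@ rounds. The following minimal C sketch shows that loop. The names @struct thrash_args@, @thrash_worker@, and @run_thrash@ are hypothetical. Each thread touches only its own objects, so any slowdown as threads are added comes from the allocator placing objects of different threads on the same cache line.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread arguments; the first three mirror the settings above. */
struct thrash_args {
    long iterations;   /* allocate/write/free rounds */
    long cacheRW;      /* writes to each object */
    size_t size;       /* object size in bytes */
    long writes;       /* result: writes performed by this thread */
};

static void *thrash_worker(void *arg) {
    struct thrash_args *a = arg;
    long writes = 0;   /* local, so the sketch itself does not false-share */
    for (long i = 0; i < a->iterations; i += 1) {
        volatile unsigned char *obj = malloc(a->size);  /* volatile keeps the writes */
        assert(obj != NULL);
        for (long r = 0; r < a->cacheRW; r += 1) {
            obj[(size_t)r % a->size] = (unsigned char)r; /* repeatedly dirty the line */
            writes += 1;
        }
        free((void *)obj);
    }
    a->writes = writes;
    return NULL;
}

/* Runs nthreads workers (at most 64); returns the total number of writes. */
long run_thrash(int nthreads, long iterations, long cacheRW, size_t size) {
    pthread_t tids[64];
    struct thrash_args args[64];
    long total = 0;
    if (nthreads > 64) nthreads = 64;
    for (int t = 0; t < nthreads; t += 1) {
        args[t] = (struct thrash_args){ iterations, cacheRW, size, 0 };
        pthread_create(&tids[t], NULL, thrash_worker, &args[t]);
    }
    for (int t = 0; t < nthreads; t += 1) {
        pthread_join(tids[t], NULL);
        total += args[t].writes;
    }
    return total;
}
```

Each worker counts its writes in a local variable and stores the total once at the end. This keeps the adjacent @args@ entries from themselves becoming a source of false sharing.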

\VRef[Figure]{fig:cacheThrash} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's results are shown in a different color.

\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/cache_thrash_0-thrash} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/cache_thrash_0-thrash} }
\caption{Cache Thrash}
\label{fig:cacheThrash}
\end{figure}

\paragraph{Assessment}
All allocators did well in this micro-benchmark, except for \textsf{dl} and \textsf{pt3}.
Because \textsf{dl} uses a single heap for all threads, it is unsurprising that it generates so much active false-sharing.
Requests from different threads are serviced sequentially by the single heap (using a single lock), which can allocate objects for different threads on the same cache line.
\textsf{pt3} uses the T:H model, so multiple threads can share one heap, but its active false-sharing is less than that of \textsf{dl}.
The rest of the memory allocators generate little or no active false-sharing.
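Whether two objects can falsely share comes down to whether their addresses fall in the same cache line. The following minimal C check makes this concrete. The helper name @same_cache_line@ is hypothetical. The line size is a parameter because it is assumed here rather than measured; 64 bytes is typical on x64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when p and q fall in the same cache line of
   'line' bytes (line must be a power of two, e.g., 64). */
bool same_cache_line(const void *p, const void *q, size_t line) {
    assert(line != 0 && (line & (line - 1)) == 0);  /* power of two */
    uintptr_t mask = ~((uintptr_t)line - 1);        /* clears the offset bits */
    return ((uintptr_t)p & mask) == ((uintptr_t)q & mask);
}
```

Applied to the addresses returned by two back-to-back @malloc(1)@ calls made for different threads, it reports whether an allocator packed those objects into one line. The answer depends on the allocator.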

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SCRATCH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Cache Scratch}

Scratch tests memory allocators for program-induced allocator-preserved passive false-sharing (see \VRef{s:CacheScratch}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[threads:]
1, 2, 4, 8, 16, 32, 48
\item[iterations:]
1,000
\item[cacheRW:]
1,000,000
\item[size:]
1
\end{description}

% * Each allocator was tested for its performance across different number of threads.
% Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.
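This sketch assumes the cache-scratch design from the Hoard benchmark suite. Under that design, scratch differs from thrash only in its setup. The main thread allocates one object per worker, and these back-to-back allocations can share a cache line. It hands each object to a worker, which frees it before running the same allocate/write/free loop as in thrash. An allocator that recycles the freed object for the worker's next request preserves the false sharing the program created. In the minimal C sketch below, @struct scratch_args@, @scratch_worker@, and @run_scratch@ are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scratch setup; iterations/cacheRW/size mirror the settings above. */
struct scratch_args {
    char *handed;             /* object allocated by the main thread */
    long iterations, cacheRW;
    size_t size;
    int reused;               /* result: 1 if the first new object reused 'handed' */
    long writes;              /* result: writes performed by this thread */
};

static void *scratch_worker(void *arg) {
    struct scratch_args *a = arg;
    uintptr_t handed_addr = (uintptr_t)a->handed;  /* record before freeing */
    free(a->handed);                               /* free the main thread's object */
    long writes = 0;
    for (long i = 0; i < a->iterations; i += 1) {
        volatile unsigned char *obj = malloc(a->size);
        assert(obj != NULL);
        if (i == 0) a->reused = ((uintptr_t)obj == handed_addr);
        for (long r = 0; r < a->cacheRW; r += 1) {
            obj[(size_t)r % a->size] = (unsigned char)r;
            writes += 1;
        }
        free((void *)obj);
    }
    a->writes = writes;
    return NULL;
}

/* Runs nthreads workers (at most 64); returns the total number of writes. */
long run_scratch(int nthreads, long iterations, long cacheRW, size_t size) {
    pthread_t tids[64];
    struct scratch_args args[64];
    long total = 0;
    if (nthreads > 64) nthreads = 64;
    for (int t = 0; t < nthreads; t += 1) {  /* main thread allocates back to back */
        args[t] = (struct scratch_args){ malloc(size), iterations, cacheRW, size, 0, 0 };
        assert(args[t].handed != NULL);
    }
    for (int t = 0; t < nthreads; t += 1)
        pthread_create(&tids[t], NULL, scratch_worker, &args[t]);
    for (int t = 0; t < nthreads; t += 1) {
        pthread_join(tids[t], NULL);
        total += args[t].writes;
    }
    return total;
}
```

The @reused@ flag records whether a worker's first new object landed at the address of the object it freed. Its value depends on the allocator, and a value of 1 is exactly the recycling that preserves passive false-sharing. The address is saved as an integer before @free@ because using a freed pointer value is undefined in C.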

\VRef[Figure]{fig:cacheScratch} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's results are shown in a different color.

\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/cache_scratch_0-scratch} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/cache_scratch_0-scratch} }
\caption{Cache Scratch}
\label{fig:cacheScratch}
\end{figure}

\paragraph{Assessment}
This micro-benchmark divides the allocators into two groups.
First is the high-performer group: \textsf{llh}, \textsf{je}, and \textsf{rp}.
These memory allocators generate little or no passive false-sharing, and their performance difference is negligible.
Second is the low-performer group, which includes the rest of the memory allocators.
These memory allocators preserve significant program-induced passive false-sharing, with \textsf{hrd} the worst performer.
All of the allocators in this group share heaps among threads at some level.

Interestingly, allocators such as \textsf{hrd} and \textsf{glc} performed well in the cache-thrash micro-benchmark (see \VRef{sec:cache-thrash-perf}),
but they are among the low performers in cache scratch.
This result suggests these allocators do not actively produce false-sharing but do preserve program-induced passive false-sharing.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SPEED
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Speed Micro-Benchmark}

Speed tests memory allocators for runtime latency (see \VRef{s:SpeedMicroBenchmark}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects:]
100,000
\item[workers:]
1, 2, 4, 8, 16, 32, 48
\end{description}

% -maxS    :  500
% -minS    :  50
% -stepS   :  50
% -distroS :  fisher
% -objN    :  1000000
% -threadN    : \{ 1, 2, 4, 8, 16 \} *

%* Each allocator was tested for its performance across different number of threads.
%Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.

\VRefrange[Figures]{fig:speed-3-malloc}{fig:speed-14-malloc-calloc-realloc-free} show 12 figures, one for each allocation chain of the speed benchmark.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's results are shown in a different color.

\begin{itemize}
\item \VRef[Figure]{fig:speed-3-malloc} shows results for chain: malloc
\item \VRef[Figure]{fig:speed-4-realloc} shows results for chain: realloc
\item \VRef[Figure]{fig:speed-5-free} shows results for chain: free
\item \VRef[Figure]{fig:speed-6-calloc} shows results for chain: calloc
\item \VRef[Figure]{fig:speed-7-malloc-free} shows results for chain: malloc-free
\item \VRef[Figure]{fig:speed-8-realloc-free} shows results for chain: realloc-free
\item \VRef[Figure]{fig:speed-9-calloc-free} shows results for chain: calloc-free
\item \VRef[Figure]{fig:speed-10-malloc-realloc} shows results for chain: malloc-realloc
\item \VRef[Figure]{fig:speed-11-calloc-realloc} shows results for chain: calloc-realloc
\item \VRef[Figure]{fig:speed-12-malloc-realloc-free} shows results for chain: malloc-realloc-free
\item \VRef[Figure]{fig:speed-13-calloc-realloc-free} shows results for chain: calloc-realloc-free
\item \VRef[Figure]{fig:speed-14-malloc-calloc-realloc-free} shows results for chain: malloc-calloc-realloc-free
\end{itemize}
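To make concrete what one chain measures, the following minimal, single-thread C sketch times a malloc-realloc-free chain over a number of objects. It uses POSIX @clock_gettime@ with @CLOCK_MONOTONIC@. The name @time_malloc_realloc_free@ and its parameters are hypothetical. The benchmark's actual loop structure and size pattern are defined in \VRef{s:SpeedMicroBenchmark}.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical timing of one chain: malloc -> realloc -> free per object.
   Returns the elapsed nanoseconds for 'objects' iterations. */
long long time_malloc_realloc_free(long objects, size_t min, size_t max) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < objects; i += 1) {
        char *p = malloc(min);         /* first link: malloc */
        assert(p != NULL);
        p[0] = 'x';
        char *q = realloc(p, max);     /* second link: grow with realloc */
        assert(q != NULL);
        q[max - 1] = 'y';
        free(q);                       /* last link: free */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}
```

A real harness must also stop the compiler from eliding allocation pairs whose results are unused. One option is the @-fno-builtin-malloc@ family of flags used above to build \textsf{dl}.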

\paragraph{Assessment}
The results of this micro-benchmark fall into two groups: chains with and without @calloc@.
@calloc@ uses @memset@ to set the allocated memory to zero, which dominates the cost of the allocation chain (a large increase in execution time) and levels performance across the allocators.
But the differences among the allocators in a @calloc@ chain still give an idea of their relative performance.
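The extra cost of a @calloc@ link follows from its semantics, shown in the minimal C sketch below. Beyond a @malloc@ of @n * size@ bytes, it must check that product for overflow and then zero every byte, so its cost grows with the request size. The name @sketch_calloc@ is hypothetical. Real allocators may skip the zero fill for memory freshly obtained from the operating system, which is already zeroed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical calloc built on malloc: overflow check plus zero fill. */
void *sketch_calloc(size_t n, size_t size) {
    if (size != 0 && n > SIZE_MAX / size) return NULL;  /* n * size would overflow */
    size_t bytes = n * size;
    void *p = malloc(bytes);
    if (p != NULL) memset(p, 0, bytes);  /* the step whose cost grows with size */
    return p;
}
```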

All allocators did well in this micro-benchmark across all allocation chains, except for \textsf{dl}, \textsf{pt3}, and \textsf{hrd}.
Again, the low-performing allocators share heaps among threads, so contention increases their execution time as the number of threads grows.
Furthermore, chains with @free@ can trigger coalescing, which slows the fast path.
The high-performing allocators all exhibit low latency across the allocation chains, \ie there are no performance spikes, which contention or coalescing might cause, as the chain lengthens.
Low latency is important for applications that are sensitive to unknown execution delays.

%speed-3-malloc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-3-malloc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-3-malloc} }
\caption{Speed benchmark chain: malloc}
\label{fig:speed-3-malloc}
\end{figure}

%speed-4-realloc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-4-realloc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-4-realloc} }
\caption{Speed benchmark chain: realloc}
\label{fig:speed-4-realloc}
\end{figure}

%speed-5-free.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-5-free} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-5-free} }
\caption{Speed benchmark chain: free}
\label{fig:speed-5-free}
\end{figure}

%speed-6-calloc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-6-calloc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-6-calloc} }
\caption{Speed benchmark chain: calloc}
\label{fig:speed-6-calloc}
\end{figure}

%speed-7-malloc-free.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-7-malloc-free} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-7-malloc-free} }
\caption{Speed benchmark chain: malloc-free}
\label{fig:speed-7-malloc-free}
\end{figure}

%speed-8-realloc-free.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-8-realloc-free} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-8-realloc-free} }
\caption{Speed benchmark chain: realloc-free}
\label{fig:speed-8-realloc-free}
\end{figure}

%speed-9-calloc-free.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-9-calloc-free} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-9-calloc-free} }
\caption{Speed benchmark chain: calloc-free}
\label{fig:speed-9-calloc-free}
\end{figure}

%speed-10-malloc-realloc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-10-malloc-realloc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-10-malloc-realloc} }
\caption{Speed benchmark chain: malloc-realloc}
\label{fig:speed-10-malloc-realloc}
\end{figure}

%speed-11-calloc-realloc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-11-calloc-realloc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-11-calloc-realloc} }
\caption{Speed benchmark chain: calloc-realloc}
\label{fig:speed-11-calloc-realloc}
\end{figure}

%speed-12-malloc-realloc-free.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-12-malloc-realloc-free} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-12-malloc-realloc-free} }
\caption{Speed benchmark chain: malloc-realloc-free}
\label{fig:speed-12-malloc-realloc-free}
\end{figure}

%speed-13-calloc-realloc-free.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-13-calloc-realloc-free} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-13-calloc-realloc-free} }
\caption{Speed benchmark chain: calloc-realloc-free}
\label{fig:speed-13-calloc-realloc-free}
\end{figure}

%speed-14-{m,c,re}alloc-free.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-14-m-c-re-alloc-free} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-14-m-c-re-alloc-free} }
\caption{Speed benchmark chain: malloc-calloc-realloc-free}
\label{fig:speed-14-malloc-calloc-realloc-free}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% MEMORY
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newpage
\subsection{Memory Micro-Benchmark}
\label{s:MemoryMicroBenchmark}

This experiment is run with the following two configurations for each allocator.
The difference between the two configurations is the number of producers and consumers.
Configuration 1 has one producer and one consumer, and configuration 2 has 4 producers, where each producer has 4 consumers.

\noindent
Configuration 1:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[producer (K):]
1
\item[consumer (M):]
1
\item[round:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects (N):]
100,000
\end{description}

% -threadA :  1
% -threadF :  1
% -maxS    :  500
% -minS    :  50
% -stepS   :  50
% -distroS :  fisher
% -objN    :  100000
% -consumeS:  100000

\noindent
Configuration 2:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[producer (K):]
4
\item[consumer (M):]
4
\item[round:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects (N):]
100,000
\end{description}

% -threadA :  4
% -threadF :  4
% -maxS    :  500
% -minS    :  50
% -stepS   :  50
% -distroS :  fisher
% -objN    :  100000
% -consumeS:  100000
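What distinguishes this experiment is that objects are allocated by one thread and freed by another. The following minimal C sketch covers Configuration 1 (one producer, one consumer). It assumes the producer passes each object to the consumer through a bounded queue protected by a mutex and condition variables. The consumer frees every object it receives, so every free is remote. The queue, its capacity @QCAP@, the cycled sizes, and the names @produce@, @consume@, and @run_pair@ are hypothetical and are not taken from the benchmark.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define QCAP 64                        /* hypothetical queue capacity */

struct queue {
    void *buf[QCAP];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

static struct queue q = { .lock = PTHREAD_MUTEX_INITIALIZER,
                          .not_empty = PTHREAD_COND_INITIALIZER,
                          .not_full = PTHREAD_COND_INITIALIZER };
static long objects;                   /* objects passed per run */
static long freed;                     /* remote frees done by the consumer */

static void *produce(void *arg) {
    (void)arg;
    for (long i = 0; i < objects; i += 1) {
        /* sizes 50..500 in steps of 50, cycled (not the fisher distribution) */
        void *p = malloc(50 + (size_t)(i % 10) * 50);
        assert(p != NULL);
        pthread_mutex_lock(&q.lock);
        while (q.count == QCAP) pthread_cond_wait(&q.not_full, &q.lock);
        q.buf[(q.head + q.count) % QCAP] = p;
        q.count += 1;
        pthread_cond_signal(&q.not_empty);
        pthread_mutex_unlock(&q.lock);
    }
    return NULL;
}

static void *consume(void *arg) {
    (void)arg;
    for (long i = 0; i < objects; i += 1) {
        pthread_mutex_lock(&q.lock);
        while (q.count == 0) pthread_cond_wait(&q.not_empty, &q.lock);
        void *p = q.buf[q.head];
        q.head = (q.head + 1) % QCAP;
        q.count -= 1;
        pthread_cond_signal(&q.not_full);
        pthread_mutex_unlock(&q.lock);
        free(p);                       /* freed by a thread that did not allocate it */
        freed += 1;
    }
    return NULL;
}

/* Passes n objects from one producer to one consumer; returns the frees done. */
long run_pair(long n) {
    pthread_t prod, cons;
    objects = n;
    freed = 0;
    pthread_create(&prod, NULL, produce, NULL);
    pthread_create(&cons, NULL, consume, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return freed;
}
```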

% \begin{table}[b]
% \centering
%     \begin{tabular}{ |c|c|c| }
%      \hline
%     Memory Allocator & Configuration 1 Result & Configuration 2 Result\\
%      \hline
%     llh & \VRef[Figure]{fig:mem-1-prod-1-cons-100-llh} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-llh}\\
%      \hline
%     dl & \VRef[Figure]{fig:mem-1-prod-1-cons-100-dl} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-dl}\\
%      \hline
%     glibc & \VRef[Figure]{fig:mem-1-prod-1-cons-100-glc} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-glc}\\
%      \hline
%     hoard & \VRef[Figure]{fig:mem-1-prod-1-cons-100-hrd} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-hrd}\\
%      \hline
%     je & \VRef[Figure]{fig:mem-1-prod-1-cons-100-je} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-je}\\
%      \hline
%     pt3 & \VRef[Figure]{fig:mem-1-prod-1-cons-100-pt3} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-pt3}\\
%      \hline
%     rp & \VRef[Figure]{fig:mem-1-prod-1-cons-100-rp} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-rp}\\
%      \hline
%     tbb & \VRef[Figure]{fig:mem-1-prod-1-cons-100-tbb} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-tbb}\\
%      \hline
%     \end{tabular}
% \caption{Memory benchmark results}
% \label{table:mem-benchmark-figs}
% \end{table}
% Table \ref{table:mem-benchmark-figs} shows the list of figures that contain memory benchmark results.

\VRefrange[Figures]{fig:mem-1-prod-1-cons-100-llh}{fig:mem-4-prod-4-cons-100-tbb} show 16 figures, two for each of the 8 allocators, one for each configuration.
Each figure has 2 graphs, one for each machine.
Each graph shows the following 5 curves, which track memory usage and statistics throughout the micro-benchmark's lifetime.
\begin{itemize}
\item \textit{\textbf{current\_req\_mem(B)}} shows the amount of dynamic memory requested by the benchmark and currently in use.
\item \textit{\textbf{heap}}* shows the memory requested by the program (allocator) from the system that lies in the heap (@sbrk@) area.
\item \textit{\textbf{mmap\_so}}* shows the memory requested by the program (allocator) from the system that lies in the @mmap@ area.
\item \textit{\textbf{mmap}}* shows the memory requested by the program (allocator or shared libraries) from the system that lies in the @mmap@ area.
\item \textit{\textbf{total\_dynamic}} shows the total usage of dynamic memory by the benchmark program, which is the sum of \textit{heap}, \textit{mmap}, and \textit{mmap\_so}.
\end{itemize}
* These statistics are gathered by monitoring a process's @/proc/self/maps@ file.
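The following minimal C sketch shows such monitoring. It assumes the Linux format of @/proc/self/maps@, where each line holds an address range, permissions, offset, device, inode, and an optional path. It sums the @[heap]@ mapping (the @sbrk@ area) separately from anonymous mappings, which are those with no path. The sketch does not reproduce how the thesis's tooling splits mappings into \textit{mmap} and \textit{mmap\_so}. The name @map_usage@ is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reader: total bytes in the [heap] (sbrk) area and in anonymous
   mappings (no pathname) of this process. Returns 0 on success, -1 on error. */
int map_usage(unsigned long *heap, unsigned long *anon) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL) return -1;
    char line[512];
    *heap = *anon = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi;
        char path[256] = "";           /* stays empty for anonymous mappings */
        /* fields: range perms offset dev inode [path] */
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %255s", &lo, &hi, path) < 2)
            continue;
        if (strcmp(path, "[heap]") == 0) *heap += hi - lo;
        else if (path[0] == '\0') *anon += hi - lo;
    }
    fclose(f);
    return 0;
}
```

Polling this function periodically while a benchmark runs yields time series comparable in spirit to the \textit{heap} and @mmap@ curves.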

The X-axis shows the time when the memory information is polled.
The Y-axis shows the memory usage in bytes.

For this experiment, the difference between the memory requested by the benchmark (\textit{current\_req\_mem(B)}) and the memory that the process has received from the system (\textit{heap}, \textit{mmap}) should be minimal.
This difference is the memory overhead caused by the allocator and shows the level of fragmentation in the allocator.
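Take \textit{total\_dynamic} $= \textit{heap} + \textit{mmap} + \textit{mmap\_so}$ as defined above. The overhead at a polling point is then $\textit{total\_dynamic} - \textit{current\_req\_mem}$. The minimal C sketch below computes it. The names @mem_sample@, @overhead_bytes@, and @overhead_ratio@ are hypothetical. The ratio to the requested memory is one convenient normalization, not a metric used in this chapter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sample of the curves described above, all in bytes. */
typedef struct { size_t heap, mmap, mmap_so, current_req_mem; } mem_sample;

/* Memory held from the system beyond what the program currently requests. */
size_t overhead_bytes(mem_sample s) {
    size_t total_dynamic = s.heap + s.mmap + s.mmap_so;
    return total_dynamic > s.current_req_mem ? total_dynamic - s.current_req_mem : 0;
}

/* Overhead relative to requested memory; 0.0 means no overhead. */
double overhead_ratio(mem_sample s) {
    return s.current_req_mem == 0 ? 0.0
         : (double)overhead_bytes(s) / (double)s.current_req_mem;
}
```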

\paragraph{Assessment}
First, the differences in the shape of the curves between architectures (top ARM, bottom x64) are small; the main difference is the amount of memory used.
Hence, it is possible to focus on either the top or bottom graph.

Second, the heap curve is 0 for four memory allocators: \textsf{hrd}, \textsf{je}, \textsf{pt3}, and \textsf{rp}, indicating these memory allocators only use @mmap@ to get memory from the system and ignore the @sbrk@ area.

The total dynamic memory is higher for \textsf{hrd} and \textsf{tbb} than for the other allocators.
The main reason is the use of superblocks (see \VRef{s:ObjectContainers}) containing objects of the same size.
These superblocks are maintained throughout the life of the program.

\textsf{pt3} is the only memory allocator where the total dynamic memory goes down in the second half of the program lifetime, when the memory is freed by the benchmark program.
This behavior makes \textsf{pt3} the only memory allocator in this experiment that gives memory back to the operating system as it is freed by the program.

% FOR 1 THREAD

%mem-1-prod-1-cons-100-llh.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-llh} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-llh} }
\caption{Memory benchmark results with Configuration-1 for llh memory allocator}
\label{fig:mem-1-prod-1-cons-100-llh}
\end{figure}

%mem-1-prod-1-cons-100-dl.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-dl} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-dl} }
\caption{Memory benchmark results with Configuration-1 for dl memory allocator}
\label{fig:mem-1-prod-1-cons-100-dl}
\end{figure}

%mem-1-prod-1-cons-100-glc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-glc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-glc} }
\caption{Memory benchmark results with Configuration-1 for glibc memory allocator}
\label{fig:mem-1-prod-1-cons-100-glc}
\end{figure}

%mem-1-prod-1-cons-100-hrd.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-hrd} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-hrd} }
\caption{Memory benchmark results with Configuration-1 for hoard memory allocator}
\label{fig:mem-1-prod-1-cons-100-hrd}
\end{figure}

%mem-1-prod-1-cons-100-je.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-je} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-je} }
\caption{Memory benchmark results with Configuration-1 for je memory allocator}
\label{fig:mem-1-prod-1-cons-100-je}
\end{figure}

%mem-1-prod-1-cons-100-pt3.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-pt3} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-pt3} }
\caption{Memory benchmark results with Configuration-1 for pt3 memory allocator}
\label{fig:mem-1-prod-1-cons-100-pt3}
\end{figure}

%mem-1-prod-1-cons-100-rp.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-rp} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-rp} }
\caption{Memory benchmark results with Configuration-1 for rp memory allocator}
\label{fig:mem-1-prod-1-cons-100-rp}
\end{figure}

%mem-1-prod-1-cons-100-tbb.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-tbb} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-tbb} }
\caption{Memory benchmark results with Configuration-1 for tbb memory allocator}
\label{fig:mem-1-prod-1-cons-100-tbb}
\end{figure}

% FOR 4 THREADS

%mem-4-prod-4-cons-100-llh.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-llh} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-llh} }
\caption{Memory benchmark results with Configuration-2 for llh memory allocator}
\label{fig:mem-4-prod-4-cons-100-llh}
\end{figure}

%mem-4-prod-4-cons-100-dl.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-dl} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-dl} }
\caption{Memory benchmark results with Configuration-2 for dl memory allocator}
\label{fig:mem-4-prod-4-cons-100-dl}
\end{figure}

%mem-4-prod-4-cons-100-glc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-glc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-glc} }
\caption{Memory benchmark results with Configuration-2 for glibc memory allocator}
\label{fig:mem-4-prod-4-cons-100-glc}
\end{figure}

%mem-4-prod-4-cons-100-hrd.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-hrd} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-hrd} }
\caption{Memory benchmark results with Configuration-2 for hoard memory allocator}
\label{fig:mem-4-prod-4-cons-100-hrd}
\end{figure}

%mem-4-prod-4-cons-100-je.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-je} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-je} }
\caption{Memory benchmark results with Configuration-2 for je memory allocator}
\label{fig:mem-4-prod-4-cons-100-je}
\end{figure}

%mem-4-prod-4-cons-100-pt3.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-pt3} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-pt3} }
\caption{Memory benchmark results with Configuration-2 for pt3 memory allocator}
\label{fig:mem-4-prod-4-cons-100-pt3}
\end{figure}

%mem-4-prod-4-cons-100-rp.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-rp} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-rp} }
\caption{Memory benchmark results with Configuration-2 for rp memory allocator}
\label{fig:mem-4-prod-4-cons-100-rp}
\end{figure}

%mem-4-prod-4-cons-100-tbb.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-tbb} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-tbb} }
\caption{Memory benchmark results with Configuration-2 for tbb memory allocator}
\label{fig:mem-4-prod-4-cons-100-tbb}
\end{figure}