source: doc/theses/mubeen_zulfiqar_MMath/performance.tex @ ccb29b4

Last change on this file since ccb29b4 was fb6691a, checked in by Peter A. Buhr <pabuhr@…>, 2 years ago
final proofread of Mubeen's MMath thesis
\chapter{Performance}
\label{c:Performance}

This chapter uses the micro-benchmarks from \VRef[Chapter]{s:Benchmarks} to test a number of current memory allocators, including llheap.
The goal is to see if llheap is competitive with the currently popular memory allocators.


\section{Machine Specification}

The performance experiments were run on two different multi-core architectures (x64 and ARM) to determine if there is consistency across platforms:
\begin{itemize}
\item
\textbf{Algol} Huawei ARM TaiShan 2280 V2 Kunpeng 920, 24-core socket $\times$ 4, 2.6 GHz, GCC version 9.4.0
\item
\textbf{Nasus} AMD EPYC 7662, 64-core socket $\times$ 2, 2.0 GHz, GCC version 9.3.0
\end{itemize}


\section{Existing Memory Allocators}
\label{sec:curAllocatorSec}

Because dynamic allocation is an important feature of C, many stand-alone memory allocators have been designed for different purposes.
For this thesis, seven of the most popular and widely used memory allocators were selected for comparison with llheap.

\paragraph{llheap (\textsf{llh})}
is the thread-safe allocator from \VRef[Chapter]{c:Allocator}.
\\
\textbf{Version:} 1.0\\
\textbf{Configuration:} Compiled with dynamic linking, but without statistics or debugging.\\
\textbf{Compilation command:} @make@

\paragraph{glibc (\textsf{glc})}
\cite{glibc} is the default glibc thread-safe allocator.
\\
\textbf{Version:} Ubuntu GLIBC 2.31-0ubuntu9.7 2.31\\
\textbf{Configuration:} Compiled by Ubuntu 20.04.\\
\textbf{Compilation command:} N/A

\paragraph{dlmalloc (\textsf{dl})}
\cite{dlmalloc} is a thread-safe allocator with a single heap shared by all threads, where concurrent requests are serialized by a lock.
It maintains free-lists of different sizes to store freed dynamic memory.
\\
\textbf{Version:} 2.8.6\\
\textbf{Configuration:} Compiled with the preprocessor macro @USE_LOCKS@.\\
\textbf{Compilation command:} @gcc -g3 -O3 -Wall -Wextra -fno-builtin-malloc -fno-builtin-calloc@ @-fno-builtin-realloc -fno-builtin-free -fPIC -shared -DUSE_LOCKS -o libdlmalloc.so malloc-2.8.6.c@

\paragraph{hoard (\textsf{hrd})}
\cite{hoard} is a thread-safe, multi-threaded allocator built on a heap-layer framework. It has per-thread heaps with thread-local free-lists, and a global shared heap.
\\
\textbf{Version:} 3.13\\
\textbf{Configuration:} Compiled with hoard's default configuration and @Makefile@.\\
\textbf{Compilation command:} @make all@

\paragraph{jemalloc (\textsf{je})}
\cite{jemalloc} is a thread-safe allocator that uses multiple arenas. Each thread is assigned an arena.
Each arena has chunks that contain contiguous memory regions of the same size. An arena has multiple chunks that contain regions of multiple sizes.
\\
\textbf{Version:} 5.2.1\\
\textbf{Configuration:} Compiled with jemalloc's default configuration and @Makefile@.\\
\textbf{Compilation command:} @autogen.sh; configure; make; make install@

\paragraph{ptmalloc3 (\textsf{pt3})}
\cite{ptmalloc3} is a modification of dlmalloc.
It is a thread-safe, multi-threaded memory allocator that uses multiple heaps.
Each ptmalloc3 heap has a similar design to dlmalloc's heap.
\\
\textbf{Version:} 1.8\\
\textbf{Configuration:} Compiled with ptmalloc3's @Makefile@ using option ``linux-shared''.\\
\textbf{Compilation command:} @make linux-shared@

\paragraph{rpmalloc (\textsf{rp})}
\cite{rpmalloc} is a thread-safe, multi-threaded allocator that uses per-thread heaps.
Each heap has multiple size-classes and each size-class contains memory regions of the relevant size.
\\
\textbf{Version:} 1.4.1\\
\textbf{Configuration:} Compiled with rpmalloc's default configuration and the ninja build system.\\
\textbf{Compilation command:} @python3 configure.py; ninja@

\paragraph{tbb malloc (\textsf{tbb})}
\cite{tbbmalloc} is a thread-safe, multi-threaded allocator that uses a private heap for each thread.
Each private heap has multiple bins of different sizes. Each bin contains free regions of the same size.
\\
\textbf{Version:} Intel TBB 2020 Update 2, tbb\_interface\_version == 11102\\
\textbf{Configuration:} Compiled with tbbmalloc's default configuration and @Makefile@.\\
\textbf{Compilation command:} @make@

% \section{Experiment Environment}
% We used our micro benchmark suite (FIX ME: cite mbench) to evaluate these memory allocators \ref{sec:curAllocatorSec} and our own memory allocator uHeap \ref{sec:allocatorSec}.

\section{Experiments}

Each micro-benchmark is configured and run with each of the allocators.
The less time an allocator takes to complete a benchmark, the better; hence, lower in the graphs is better, except for the Memory micro-benchmark graphs.
All graphs use a log scale on the Y-axis, except for the Memory micro-benchmark (see \VRef{s:MemoryMicroBenchmark}).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% CHURN
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Churn Micro-Benchmark}

Churn tests allocators for speed under intensive dynamic memory usage (see \VRef{s:ChurnBenchmark}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[threads:]
1, 2, 4, 8, 16, 32, 48
\item[spots:]
16
\item[obj:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\end{description}
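To make the configuration above concrete, the following C sketch mimics one churn thread. It illustrates the idea under stated assumptions and is not the benchmark code from \VRef{s:ChurnBenchmark}. Each iteration picks one of the @spots@ slots at random, frees the object it holds, and allocates a replacement whose size lies between @min@ and @max@ in multiples of @step@. The function name @churn@ and the xorshift random-number generator are hypothetical, and sizes are drawn uniformly rather than from the benchmark's fisher distribution.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Small xorshift PRNG (assumption: stands in for the benchmark's generator). */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Hypothetical single-thread churn loop; returns the number of successful allocations. */
static size_t churn(size_t objN, size_t spots, size_t minS, size_t maxS, size_t stepS, uint32_t seed) {
    assert(spots > 0 && stepS > 0 && minS > 0 && minS <= maxS);
    void **slot = calloc(spots, sizeof(void *));   /* every slot starts empty (NULL) */
    if (slot == NULL) return 0;
    uint32_t state = seed != 0 ? seed : 1u;        /* xorshift state must be nonzero */
    size_t nsizes = (maxS - minS) / stepS + 1;     /* 50..500 in steps of 50 gives 10 sizes */
    size_t allocs = 0;
    for (size_t i = 0; i < objN; i += 1) {
        size_t s = xorshift32(&state) % spots;     /* choose a random slot */
        free(slot[s]);                             /* free(NULL) is a no-op */
        size_t size = minS + (xorshift32(&state) % nsizes) * stepS;
        slot[s] = malloc(size);
        if (slot[s] != NULL) {
            memset(slot[s], 0xA5, size);           /* touch the new object */
            allocs += 1;
        }
    }
    for (size_t s = 0; s < spots; s += 1) free(slot[s]);  /* release the survivors */
    free(slot);
    return allocs;
}
```

For example, @churn(100000, 16, 50, 500, 50, 1)@ corresponds to the obj, spots, min, max, and step values listed above. Running one such loop per thread gives the multi-threaded variant.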

% -maxS : 500
% -minS : 50
% -stepS : 50
% -distroS : fisher
% -objN : 100000
% -cSpots : 16
% -threadN : 1, 2, 4, 8, 16

\VRef[Figure]{fig:churn} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/churn} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/churn} }
\caption{Churn}
\label{fig:churn}
\end{figure}

\paragraph{Assessment}
All allocators did well in this micro-benchmark, except for \textsf{dl} on the ARM.
\textsf{dl} is the slowest, indicating some small bottleneck with respect to the other allocators.
\textsf{je} is the fastest, with only a small benefit over the other allocators.
% llheap is slightly slower because it uses ownership, where many of the allocations have remote frees, which requires locking.
% When llheap is compiled without ownership, its performance is the same as the other allocators (not shown).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% THRASH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Cache Thrash}
\label{sec:cache-thrash-perf}

Thrash tests memory allocators for active false-sharing (see \VRef{sec:benchThrashSec}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[threads:]
1, 2, 4, 8, 16, 32, 48
\item[iterations:]
1,000
\item[cacheRW:]
1,000,000
\item[size:]
1
\end{description}

% * Each allocator was tested for its performance across different number of threads.
% Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.

\VRef[Figure]{fig:cacheThrash} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/cache_thrash_0-thrash} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/cache_thrash_0-thrash} }
\caption{Cache Thrash}
\label{fig:cacheThrash}
\end{figure}

\paragraph{Assessment}
All allocators did well in this micro-benchmark, except for \textsf{dl} and \textsf{pt3}.
Because \textsf{dl} uses a single heap for all threads, it is unsurprising that it generates significant active false-sharing.
Requests from different threads are serviced sequentially by the single heap (using a single lock), which can allocate objects for different threads on the same cache line.
\textsf{pt3} uses the T:H model, so multiple threads can share one heap, but it generates less active false-sharing than \textsf{dl}.
The rest of the memory allocators generate little or no active false-sharing.
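The cache-line argument above can be checked directly. Two objects share a cache line when their addresses fall in the same line-sized, line-aligned block. The C sketch below assumes a 64-byte cache line (@LINE@ is a hypothetical constant, because the real line size is machine dependent). It allocates two 1-byte objects back to back, which is what a single-heap allocator may do for requests that arrive from two different threads. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LINE 64  /* assumed cache-line size in bytes */

/* Return 1 if addresses a and b lie in the same LINE-byte, LINE-aligned block. */
static int same_cache_line(const void *a, const void *b) {
    assert((LINE & (LINE - 1)) == 0);  /* the line size must be a power of two */
    return ((uintptr_t)a / LINE) == ((uintptr_t)b / LINE);
}

/* Allocate two 1-byte objects consecutively and report whether they share a line. */
static int adjacent_allocations_share_line(void) {
    char *p = malloc(1);
    char *q = malloc(1);
    int shared = (p != NULL && q != NULL) ? same_cache_line(p, q) : 0;
    free(q);
    free(p);
    return shared;
}
```

Whether @adjacent_allocations_share_line@ returns 1 depends on each allocator's minimum block size and header layout. If the two requests had come from different threads, a result of 1 means each thread's writes invalidate the other thread's cached copy of the line.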

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SCRATCH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Cache Scratch}

Scratch tests memory allocators for program-induced allocator-preserved passive false-sharing (see \VRef{s:CacheScratch}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[threads:]
1, 2, 4, 8, 16, 32, 48
\item[iterations:]
1,000
\item[cacheRW:]
1,000,000
\item[size:]
1
\end{description}

% * Each allocator was tested for its performance across different number of threads.
% Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.

\VRef[Figure]{fig:cacheScratch} shows the results for Algol and Nasus.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/cache_scratch_0-scratch} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/cache_scratch_0-scratch} }
\caption{Cache Scratch}
\label{fig:cacheScratch}
\end{figure}

\paragraph{Assessment}
This micro-benchmark divides the allocators into two groups.
First is the high-performer group: \textsf{llh}, \textsf{je}, and \textsf{rp}.
These memory allocators generate little or no passive false-sharing, and the performance differences among them are negligible.
Second is the low-performer group, which includes the rest of the memory allocators.
These memory allocators preserve significant program-induced passive false-sharing, with \textsf{hrd} performing the worst.
All of the allocators in this group share heaps among threads at some level.

Interestingly, allocators such as \textsf{hrd} and \textsf{glc} performed well in the cache-thrash micro-benchmark (see \VRef{sec:cache-thrash-perf}), but they are among the low performers in cache scratch.
This result suggests these allocators do not actively produce false-sharing, but do preserve program-induced passive false-sharing.
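The following C sketch shows the mechanism behind allocator-preserved passive false-sharing. It is not the benchmark's code from \VRef{s:CacheScratch}. The main thread allocates an object and hands it to a worker thread. The worker frees the object and immediately requests another of the same size. If the worker receives the same address back, the allocator has kept the new object on the original cache line, so any sharing the program induced on that line persists. The names @handoff@, @worker@, and @remote_free_reuses_block@ are hypothetical. Compile with @-pthread@.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* State handed from the main thread to the worker (hypothetical names). */
struct handoff {
    void *given;     /* object allocated by the main thread */
    void *replaced;  /* object the worker allocates after freeing 'given' */
    size_t size;
};

static void *worker(void *arg) {
    struct handoff *h = arg;
    free(h->given);                 /* remote free: the worker frees the main thread's object */
    h->given = NULL;
    h->replaced = malloc(h->size);  /* same-size request made on the worker thread */
    return NULL;
}

/* Return 1 if the worker's new object reuses the freed address, 0 if not, -1 on error. */
static int remote_free_reuses_block(size_t size) {
    struct handoff h = { malloc(size), NULL, size };
    if (h.given == NULL) return -1;
    uintptr_t old = (uintptr_t)h.given;  /* record the address as an integer before it is freed */
    pthread_t t;
    if (pthread_create(&t, NULL, worker, &h) != 0) {
        free(h.given);
        return -1;
    }
    pthread_join(t, NULL);
    int reused = h.replaced != NULL && (uintptr_t)h.replaced == old;
    free(h.replaced);
    return reused;
}
```

The result is allocator dependent. A result of 1 is the behaviour that preserves the program-induced sharing that the cache-scratch benchmark measures.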

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SPEED
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Speed Micro-Benchmark}

Speed tests memory allocators for runtime latency (see \VRef{s:SpeedMicroBenchmark}).
This experiment was run with the following configuration:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects:]
100,000
\item[workers:]
1, 2, 4, 8, 16, 32, 48
\end{description}

% -maxS : 500
% -minS : 50
% -stepS : 50
% -distroS : fisher
% -objN : 1000000
% -threadN : \{ 1, 2, 4, 8, 16 \} *

%* Each allocator was tested for its performance across different number of threads.
%Experiment was repeated for each allocator for 1, 2, 4, 8, and 16 threads by setting the configuration -threadN.

\VRefrange[Figures]{fig:speed-3-malloc}{fig:speed-14-malloc-calloc-realloc-free} show 12 figures, one for each chain of the speed benchmark.
The X-axis shows the number of threads;
the Y-axis shows the total experiment time.
Each allocator's performance at each thread count is shown in a different color.

\begin{itemize}
\item \VRef[Figure]{fig:speed-3-malloc} shows results for chain: malloc
\item \VRef[Figure]{fig:speed-4-realloc} shows results for chain: realloc
\item \VRef[Figure]{fig:speed-5-free} shows results for chain: free
\item \VRef[Figure]{fig:speed-6-calloc} shows results for chain: calloc
\item \VRef[Figure]{fig:speed-7-malloc-free} shows results for chain: malloc-free
\item \VRef[Figure]{fig:speed-8-realloc-free} shows results for chain: realloc-free
\item \VRef[Figure]{fig:speed-9-calloc-free} shows results for chain: calloc-free
\item \VRef[Figure]{fig:speed-10-malloc-realloc} shows results for chain: malloc-realloc
\item \VRef[Figure]{fig:speed-11-calloc-realloc} shows results for chain: calloc-realloc
\item \VRef[Figure]{fig:speed-12-malloc-realloc-free} shows results for chain: malloc-realloc-free
\item \VRef[Figure]{fig:speed-13-calloc-realloc-free} shows results for chain: calloc-realloc-free
\item \VRef[Figure]{fig:speed-14-malloc-calloc-realloc-free} shows results for chain: malloc-calloc-realloc-free
\end{itemize}

\paragraph{Assessment}
This micro-benchmark divides the allocation chains into two groups: with and without @calloc@.
@calloc@ uses @memset@ to set the allocated memory to zero, which dominates the cost of the allocation chain (a large increase in execution time) and levels performance across the allocators.
However, the differences among the allocators in a @calloc@ chain still give an idea of their relative performance.
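The extra cost that @calloc@ adds over @malloc@ can be seen in a minimal sketch of a @calloc@ built from @malloc@ and @memset@. The name @sketch_calloc@ is hypothetical. Real allocators may skip the @memset@ for memory freshly obtained with @mmap@, because the operating system already zero-fills it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical calloc: overflow-checked size computation, then malloc + memset. */
static void *sketch_calloc(size_t nmemb, size_t size) {
    if (size != 0 && nmemb > SIZE_MAX / size) return NULL;  /* nmemb * size would overflow */
    size_t bytes = nmemb * size;
    void *p = malloc(bytes);
    if (p != NULL) memset(p, 0, bytes);  /* zero-filling costs time proportional to bytes */
    return p;
}
```

The @memset@ touches every byte of the request, while the allocation fast path does a roughly constant amount of work. This is why zero-filling dominates the @calloc@ chains.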

All allocators did well in this micro-benchmark across all allocation chains, except for \textsf{dl}, \textsf{pt3}, and \textsf{hrd}.
Again, the low-performing allocators share heaps among threads, so contention increases their execution time as the number of threads increases.
Furthermore, chains with @free@ can trigger coalescing, which slows the fast path.
The high-performing allocators all exhibit low latency across the allocation chains, \ie there are no performance spikes, caused by contention and/or coalescing, as the chain lengthens.
Low latency is important for applications that are sensitive to unknown execution delays.
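Per-chain latency of the kind plotted in the speed figures can be pictured with the following C sketch. It times a fixed number of @malloc@-@free@ pairs using POSIX @clock_gettime@ with @CLOCK_MONOTONIC@. It is not the benchmark's code from \VRef{s:SpeedMicroBenchmark}. The name @time_malloc_free_chain@ is hypothetical, and sizes simply cycle from @min@ to @max@ in steps of @step@ rather than following the fisher distribution.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Time 'iterations' malloc-free pairs over sizes minS..maxS in steps of stepS; return nanoseconds. */
static long long time_malloc_free_chain(size_t iterations, size_t minS, size_t maxS, size_t stepS) {
    assert(stepS > 0 && minS <= maxS);
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    size_t size = minS;
    for (size_t i = 0; i < iterations; i += 1) {
        volatile char *p = malloc(size);  /* volatile store discourages eliding the pair */
        if (p != NULL) p[0] = 1;
        free((void *)p);
        size = (size + stepS > maxS) ? minS : size + stepS;  /* cycle through the size range */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (long long)(end.tv_sec - start.tv_sec) * 1000000000LL + (end.tv_nsec - start.tv_nsec);
}
```

Dividing the returned time by @iterations@ gives an average cost per chain. The other chains follow the same pattern, with @calloc@ or @realloc@ calls substituted into the loop body.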

%speed-3-malloc.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-3-malloc} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-3-malloc} }
\caption{Speed benchmark chain: malloc}
\label{fig:speed-3-malloc}
\end{figure}

%speed-4-realloc.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-4-realloc} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-4-realloc} }
\caption{Speed benchmark chain: realloc}
\label{fig:speed-4-realloc}
\end{figure}

%speed-5-free.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-5-free} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-5-free} }
\caption{Speed benchmark chain: free}
\label{fig:speed-5-free}
\end{figure}

%speed-6-calloc.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-6-calloc} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-6-calloc} }
\caption{Speed benchmark chain: calloc}
\label{fig:speed-6-calloc}
\end{figure}

%speed-7-malloc-free.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-7-malloc-free} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-7-malloc-free} }
\caption{Speed benchmark chain: malloc-free}
\label{fig:speed-7-malloc-free}
\end{figure}

%speed-8-realloc-free.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-8-realloc-free} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-8-realloc-free} }
\caption{Speed benchmark chain: realloc-free}
\label{fig:speed-8-realloc-free}
\end{figure}

%speed-9-calloc-free.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-9-calloc-free} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-9-calloc-free} }
\caption{Speed benchmark chain: calloc-free}
\label{fig:speed-9-calloc-free}
\end{figure}

%speed-10-malloc-realloc.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-10-malloc-realloc} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-10-malloc-realloc} }
\caption{Speed benchmark chain: malloc-realloc}
\label{fig:speed-10-malloc-realloc}
\end{figure}

%speed-11-calloc-realloc.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-11-calloc-realloc} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-11-calloc-realloc} }
\caption{Speed benchmark chain: calloc-realloc}
\label{fig:speed-11-calloc-realloc}
\end{figure}

%speed-12-malloc-realloc-free.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-12-malloc-realloc-free} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-12-malloc-realloc-free} }
\caption{Speed benchmark chain: malloc-realloc-free}
\label{fig:speed-12-malloc-realloc-free}
\end{figure}

%speed-13-calloc-realloc-free.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-13-calloc-realloc-free} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-13-calloc-realloc-free} }
\caption{Speed benchmark chain: calloc-realloc-free}
\label{fig:speed-13-calloc-realloc-free}
\end{figure}

%speed-14-{m,c,re}alloc-free.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/speed-14-m-c-re-alloc-free} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/speed-14-m-c-re-alloc-free} }
\caption{Speed benchmark chain: malloc-calloc-realloc-free}
\label{fig:speed-14-malloc-calloc-realloc-free}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% MEMORY
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newpage
\subsection{Memory Micro-Benchmark}
\label{s:MemoryMicroBenchmark}

This experiment is run with the following two configurations for each allocator.
The difference between the two configurations is the number of producers and consumers.
Configuration 1 has one producer and one consumer, and configuration 2 has 4 producers, where each producer has 4 consumers.

\noindent
Configuration 1:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[producer (K):]
1
\item[consumer (M):]
1
\item[round:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects (N):]
100,000
\end{description}

% -threadA : 1
% -threadF : 1
% -maxS : 500
% -minS : 50
% -stepS : 50
% -distroS : fisher
% -objN : 100000
% -consumeS: 100000

\noindent
Configuration 2:
\begin{description}[itemsep=0pt,parsep=0pt]
\item[producer (K):]
4
\item[consumer (M):]
4
\item[round:]
100,000
\item[max:]
500
\item[min:]
50
\item[step:]
50
\item[distro:]
fisher
\item[objects (N):]
100,000
\end{description}

% -threadA : 4
% -threadF : 4
% -maxS : 500
% -minS : 50
% -stepS : 50
% -distroS : fisher
% -objN : 100000
% -consumeS: 100000
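The producer/consumer structure of these configurations can be sketched in C with POSIX threads. This is a simplification, not the benchmark code. There is one producer and one consumer, they run back to back rather than concurrently, and the hand-off is a single array rather than rounds of @N@ objects. Because every object allocated on the producer thread is freed on the consumer thread, each @free@ is a remote free from the allocator's point of view. The names @batch@, @produce@, @consume_all@, and @producer_consumer@ are hypothetical. Compile with @-pthread@.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hand-off between the producer and consumer threads (hypothetical). */
struct batch {
    void **obj;        /* objects allocated by the producer */
    size_t n;          /* number of objects (N) */
    size_t allocated;  /* successful allocations by the producer */
    size_t freed;      /* objects released by the consumer */
};

static void *produce(void *arg) {
    struct batch *b = arg;
    for (size_t i = 0; i < b->n; i += 1) {
        size_t size = 50 + (i % 10) * 50;  /* cycle through 50, 100, ..., 500 bytes */
        b->obj[i] = malloc(size);
        if (b->obj[i] != NULL) {
            memset(b->obj[i], 1, size);
            b->allocated += 1;
        }
    }
    return NULL;
}

static void *consume_all(void *arg) {
    struct batch *b = arg;
    for (size_t i = 0; i < b->n; i += 1) {
        if (b->obj[i] != NULL) {
            free(b->obj[i]);  /* remote free: the object was allocated on another thread */
            b->obj[i] = NULL;
            b->freed += 1;
        }
    }
    return NULL;
}

/* Run a producer thread, then a consumer thread; return the number of objects freed. */
static size_t producer_consumer(size_t n) {
    struct batch b = { calloc(n, sizeof(void *)), n, 0, 0 };
    if (b.obj == NULL) return 0;
    pthread_t t;
    if (pthread_create(&t, NULL, produce, &b) != 0) {
        free(b.obj);
        return 0;
    }
    pthread_join(t, NULL);  /* joining makes the producer's writes visible here */
    if (pthread_create(&t, NULL, consume_all, &b) == 0) {
        pthread_join(t, NULL);
    } else {
        consume_all(&b);    /* fall back to freeing on this thread */
    }
    free(b.obj);
    return b.freed;
}
```

Scaling this pattern to several producers, each paired with its own consumers, gives the shape of configuration 2.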

% \begin{table}[b]
% \centering
% \begin{tabular}{ \|c\|c\|c\| }
% \hline
% Memory Allocator & Configuration 1 Result & Configuration 2 Result\\
% \hline
% llh & \VRef[Figure]{fig:mem-1-prod-1-cons-100-llh} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-llh}\\
% \hline
% dl & \VRef[Figure]{fig:mem-1-prod-1-cons-100-dl} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-dl}\\
% \hline
% glibc & \VRef[Figure]{fig:mem-1-prod-1-cons-100-glc} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-glc}\\
% \hline
% hoard & \VRef[Figure]{fig:mem-1-prod-1-cons-100-hrd} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-hrd}\\
% \hline
% je & \VRef[Figure]{fig:mem-1-prod-1-cons-100-je} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-je}\\
% \hline
% pt3 & \VRef[Figure]{fig:mem-1-prod-1-cons-100-pt3} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-pt3}\\
% \hline
% rp & \VRef[Figure]{fig:mem-1-prod-1-cons-100-rp} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-rp}\\
% \hline
% tbb & \VRef[Figure]{fig:mem-1-prod-1-cons-100-tbb} & \VRef[Figure]{fig:mem-4-prod-4-cons-100-tbb}\\
% \hline
% \end{tabular}
% \caption{Memory benchmark results}
% \label{table:mem-benchmark-figs}
% \end{table}
% Table \ref{table:mem-benchmark-figs} shows the list of figures that contain memory benchmark results.

\VRefrange[Figures]{fig:mem-1-prod-1-cons-100-llh}{fig:mem-4-prod-4-cons-100-tbb} show 16 figures, two for each of the 8 allocators, one per configuration.
Each figure has 2 graphs, one for each machine.
Each graph has the following five curves, which show memory usage and statistics throughout the micro-benchmark's lifetime.
\begin{itemize}
\item \textit{\textbf{current\_req\_mem(B)}} shows the amount of dynamic memory requested and currently in use by the benchmark.
\item \textit{\textbf{heap}}* shows the memory requested by the program (allocator) from the system that lies in the heap (@sbrk@) area.
\item \textit{\textbf{mmap\_so}}* shows the memory requested by the program (allocator) from the system that lies in the @mmap@ area.
\item \textit{\textbf{mmap}}* shows the memory requested by the program (allocator or shared libraries) from the system that lies in the @mmap@ area.
\item \textit{\textbf{total\_dynamic}} shows the total usage of dynamic memory by the benchmark program, which is the sum of \textit{heap}, \textit{mmap}, and \textit{mmap\_so}.
\end{itemize}
* These statistics are gathered by monitoring a process's @/proc/self/maps@ file.
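The following is a minimal sketch of how such statistics can be collected; it is not the benchmark's own code. Each line of @/proc/self/maps@ starts with a hexadecimal @start-end@ address range. It is followed by the permissions, offset, device, inode, and an optional pathname, and the region managed by @sbrk@ is labelled @[heap]@. The C code below sums the @[heap]@ regions and the anonymous mappings (lines with no pathname). Treating anonymous mappings as the allocator's @mmap@ memory is an assumption, because thread stacks and other anonymous regions are counted as well. The names @parse_maps_line@ and @sum_maps@ are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum region { REGION_HEAP, REGION_ANON, REGION_OTHER, REGION_BAD };

/* Skip n whitespace-separated fields, then any whitespace that follows them. */
static const char *skip_fields(const char *s, int n) {
    for (int i = 0; i < n; i += 1) {
        while (*s == ' ' || *s == '\t') s += 1;
        while (*s != '\0' && *s != ' ' && *s != '\t' && *s != '\n') s += 1;
    }
    while (*s == ' ' || *s == '\t') s += 1;
    return s;
}

/* Classify one maps line ("start-end perms offset dev inode [path]") and store its size. */
static enum region parse_maps_line(const char *line, unsigned long long *bytes) {
    char *dash;
    unsigned long long start = strtoull(line, &dash, 16);
    if (dash == line || *dash != '-') return REGION_BAD;
    char *rest;
    unsigned long long end = strtoull(dash + 1, &rest, 16);
    if (rest == dash + 1 || end < start) return REGION_BAD;
    *bytes = end - start;
    const char *path = skip_fields(rest, 4);                  /* perms, offset, dev, inode */
    if (strncmp(path, "[heap]", 6) == 0) return REGION_HEAP;  /* sbrk area */
    if (*path == '\0' || *path == '\n') return REGION_ANON;   /* no pathname */
    return REGION_OTHER;
}

/* Sum the [heap] and anonymous-mapping sizes of the calling process (Linux only). */
static int sum_maps(unsigned long long *heap, unsigned long long *anon) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL) return -1;
    char line[4096];  /* assumption: longer lines are rare and may be misclassified */
    *heap = 0;
    *anon = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long long bytes = 0;
        switch (parse_maps_line(line, &bytes)) {
        case REGION_HEAP: *heap += bytes; break;
        case REGION_ANON: *anon += bytes; break;
        default: break;
        }
    }
    fclose(f);
    return 0;
}
```

Polling @sum_maps@ at fixed intervals from a separate thread would produce time series similar in spirit to the \textit{heap} and \textit{mmap} curves.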

The X-axis shows the time when the memory information is polled.
The Y-axis shows the memory usage in bytes.

For this experiment, the difference between the memory requested by the benchmark (\textit{current\_req\_mem(B)}) and the memory that the process has received from the system (\textit{heap}, \textit{mmap}) should be minimal.
This difference is the memory overhead caused by the allocator and shows the level of fragmentation in the allocator.
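The difference described above can be written as a formula over the curve names already defined. It is one way to quantify the overhead at a polling time $t$:
\[
\mathit{overhead}(t) = \mathit{heap}(t) + \mathit{mmap}(t) - \mathit{current\_req\_mem}(t)
\]
A value close to zero indicates little fragmentation. Because \textit{mmap} can also include memory mapped by shared libraries, this difference may overstate the allocator's own overhead.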

\paragraph{Assessment}
First, the differences in the shape of the curves between architectures (top ARM, bottom x64) are small; the main difference is in the amount of memory used.
Hence, it is possible to focus on either the top or bottom graph.

Second, the heap curve is 0 for four memory allocators: \textsf{hrd}, \textsf{je}, \textsf{pt3}, and \textsf{rp}, indicating these memory allocators only use @mmap@ to get memory from the system and ignore the @sbrk@ area.

The total dynamic memory is higher for \textsf{hrd} and \textsf{tbb} than for the other allocators.
The main reason is the use of superblocks (see \VRef{s:ObjectContainers}) containing objects of the same size.
These superblocks are maintained throughout the life of the program.

\textsf{pt3} is the only memory allocator whose total dynamic memory goes down in the second half of the program's lifetime, when the benchmark program frees its memory.
Hence, \textsf{pt3} is the only memory allocator that returns memory to the operating system as the program frees it.

% FOR 1 THREAD

%mem-1-prod-1-cons-100-llh.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-llh} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-llh} }
\caption{Memory benchmark results with Configuration-1 for llh memory allocator}
\label{fig:mem-1-prod-1-cons-100-llh}
\end{figure}

%mem-1-prod-1-cons-100-dl.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-dl} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-dl} }
\caption{Memory benchmark results with Configuration-1 for dl memory allocator}
\label{fig:mem-1-prod-1-cons-100-dl}
\end{figure}

%mem-1-prod-1-cons-100-glc.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-glc} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-glc} }
\caption{Memory benchmark results with Configuration-1 for glibc memory allocator}
\label{fig:mem-1-prod-1-cons-100-glc}
\end{figure}

%mem-1-prod-1-cons-100-hrd.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-hrd} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-hrd} }
\caption{Memory benchmark results with Configuration-1 for hoard memory allocator}
\label{fig:mem-1-prod-1-cons-100-hrd}
\end{figure}

%mem-1-prod-1-cons-100-je.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-je} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-je} }
\caption{Memory benchmark results with Configuration-1 for je memory allocator}
\label{fig:mem-1-prod-1-cons-100-je}
\end{figure}

%mem-1-prod-1-cons-100-pt3.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-pt3} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-pt3} }
\caption{Memory benchmark results with Configuration-1 for pt3 memory allocator}
\label{fig:mem-1-prod-1-cons-100-pt3}
\end{figure}

%mem-1-prod-1-cons-100-rp.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-rp} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-rp} }
\caption{Memory benchmark results with Configuration-1 for rp memory allocator}
\label{fig:mem-1-prod-1-cons-100-rp}
\end{figure}

%mem-1-prod-1-cons-100-tbb.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-tbb} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-tbb} }
\caption{Memory benchmark results with Configuration-1 for tbb memory allocator}
\label{fig:mem-1-prod-1-cons-100-tbb}
\end{figure}

% FOR 4 THREADS

%mem-4-prod-4-cons-100-llh.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-llh} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-llh} }
\caption{Memory benchmark results with Configuration-2 for llh memory allocator}
\label{fig:mem-4-prod-4-cons-100-llh}
\end{figure}

%mem-4-prod-4-cons-100-dl.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-dl} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-dl} }
\caption{Memory benchmark results with Configuration-2 for dl memory allocator}
\label{fig:mem-4-prod-4-cons-100-dl}
\end{figure}

%mem-4-prod-4-cons-100-glc.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-glc} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-glc} }
\caption{Memory benchmark results with Configuration-2 for glibc memory allocator}
\label{fig:mem-4-prod-4-cons-100-glc}
\end{figure}

%mem-4-prod-4-cons-100-hrd.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-hrd} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-hrd} }
\caption{Memory benchmark results with Configuration-2 for hoard memory allocator}
\label{fig:mem-4-prod-4-cons-100-hrd}
\end{figure}

%mem-4-prod-4-cons-100-je.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-je} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-je} }
\caption{Memory benchmark results with Configuration-2 for je memory allocator}
\label{fig:mem-4-prod-4-cons-100-je}
\end{figure}

%mem-4-prod-4-cons-100-pt3.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-pt3} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-pt3} }
\caption{Memory benchmark results with Configuration-2 for pt3 memory allocator}
\label{fig:mem-4-prod-4-cons-100-pt3}
\end{figure}

%mem-4-prod-4-cons-100-rp.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-rp} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-rp} }
\caption{Memory benchmark results with Configuration-2 for rp memory allocator}
\label{fig:mem-4-prod-4-cons-100-rp}
\end{figure}

%mem-4-prod-4-cons-100-tbb.eps
\begin{figure}
\centering
\subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-tbb} }
\subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-tbb} }
\caption{Memory benchmark results with Configuration-2 for tbb memory allocator}
\label{fig:mem-4-prod-4-cons-100-tbb}
\end{figure}
