source: doc/theses/mubeen_zulfiqar_MMath/performance.tex @ f6e6a55

Last change on this file since f6e6a55 was 2e9b59b, checked in by m3zulfiq <m3zulfiq@…>, 2 years ago

added benchmark and evaluations chapter to thesis

\chapter{Performance}
\label{c:Performance}

\section{Machine Specification}

The performance experiments were run on two different multicore systems to determine if there is consistency across platforms:
\begin{itemize}
\item
{\bf Nasus} AMD EPYC 7662, 64-core socket $\times$ 2, 2.0 GHz, GCC version 9.3.0
\item
{\bf Algol} Huawei ARM TaiShan 2280 V2 Kunpeng 920, 24-core socket $\times$ 4, 2.6 GHz, GCC version 9.4.0
\end{itemize}


\section{Existing Memory Allocators}\label{sec:curAllocatorSec}
Because dynamic allocation is an important feature of C, many stand-alone memory allocators have been designed for different purposes. For this thesis, we chose 7 of the most popular and widely used memory allocators.

\subsection{dlmalloc}
dlmalloc (FIX ME: cite allocator with download link) is a single-threaded, single-heap allocator that is made thread-safe by locking. dlmalloc maintains free lists for different sizes to store freed dynamic memory. (FIX ME: cite wasik)
\\
\\
{\bf Version:} 2.8.6\\
{\bf Configuration:} Compiled with the preprocessor flag USE\_LOCKS.\\
{\bf Compilation command:}\\
cc -g3 -O3 -Wall -Wextra -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -fPIC -shared -DUSE\_LOCKS -o libdlmalloc.so malloc-2.8.6.c

\subsection{hoard}
Hoard (FIX ME: cite allocator) is a thread-safe, multi-threaded allocator built using a heap-layer framework. It has per-thread heaps with thread-local free lists, and a global shared heap. (FIX ME: cite wasik)
\\
\\
{\bf Version:} 3.13\\
{\bf Configuration:} Compiled with Hoard's default configuration and Makefile.\\
{\bf Compilation command:}\\
make all

\subsection{jemalloc}
jemalloc (FIX ME: cite allocator) is a thread-safe allocator that uses multiple arenas. Each thread is assigned an arena. Each arena has chunks that contain contiguous memory regions of the same size. An arena has multiple chunks, so it can serve regions of multiple sizes.
\\
\\
{\bf Version:} 5.2.1\\
{\bf Configuration:} Compiled with jemalloc's default configuration and Makefile.\\
{\bf Compilation command:}\\
./autogen.sh\\
./configure\\
make\\
make install

\subsection{pt3malloc}
pt3malloc (FIX ME: cite allocator) is a modification of dlmalloc. It is a thread-safe, multi-threaded memory allocator that uses multiple heaps. Each pt3malloc heap has a design similar to dlmalloc's heap.
\\
\\
{\bf Version:} 1.8\\
{\bf Configuration:} Compiled with pt3malloc's Makefile using the option ``linux-shared''.\\
{\bf Compilation command:}\\
make linux-shared

\subsection{rpmalloc}
rpmalloc (FIX ME: cite allocator) is a thread-safe, multi-threaded allocator that uses a per-thread heap. Each heap has multiple size classes, and each size class contains memory regions of the relevant size.
\\
\\
{\bf Version:} 1.4.1\\
{\bf Configuration:} Compiled with rpmalloc's default configuration and the ninja build system.\\
{\bf Compilation command:}\\
python3 configure.py\\
ninja

\subsection{tbb malloc}
tbb malloc (FIX ME: cite allocator) is a thread-safe, multi-threaded allocator that uses a private heap for each thread. Each private heap has multiple bins of different sizes. Each bin contains free regions of the same size.
\\
\\
{\bf Version:} Intel TBB 2020 Update 2, tbb\_interface\_version == 11102\\
{\bf Configuration:} Compiled with tbbmalloc's default configuration and Makefile.\\
{\bf Compilation command:}\\
make

\section{Experiment Environment}
We used our micro-benchmark suite (FIX ME: cite mbench) to evaluate the memory allocators described in Section~\ref{sec:curAllocatorSec} and our own memory allocator, uHeap (Section~\ref{sec:allocatorSec}).

\section{Results}
FIX ME: add experiment, knobs, graphs, description+analysis

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% CHURN
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Churn Benchmark}

The churn benchmark tested the speed of memory allocators under intensive dynamic memory usage.

This experiment was run with the following configuration:
\begin{itemize}
\item
-maxS : 500
\item
-minS : 50
\item
-stepS : 50
\item
-distroS : fisher
\item
-objN : 100000
\item
-cSpots : 16
\item
-threadN : \{ 1, 2, 4, 8, 16 \} *
\end{itemize}

* Each allocator was tested across different numbers of threads. The experiment was repeated for each allocator with 1, 2, 4, 8, and 16 threads by setting -threadN.
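
As an illustration only, the following C sketch shows one way a churn loop driven by these options could look. It is not the code of the benchmark suite: the names \texttt{churn\_config}, \texttt{pick\_size}, and \texttt{churn} are hypothetical, each object is assumed to replace the previous occupant of one of the -cSpots slots, and a uniform size choice stands in for the ``fisher'' distribution, which is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bundle mirroring the -minS/-maxS/-stepS/-objN/-cSpots options. */
typedef struct {
    size_t min_size;   /* -minS   */
    size_t max_size;   /* -maxS   */
    size_t step_size;  /* -stepS  */
    size_t objects;    /* -objN   */
    size_t spots;      /* -cSpots */
} churn_config;

/* xorshift32 generator, so the sketch does not depend on rand() state. */
static uint32_t next_random(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Pick a size from {min, min+step, ..., max}. Uniform choice is an
 * assumption standing in for the benchmark's "fisher" distribution. */
static size_t pick_size(const churn_config *cfg, uint32_t *state) {
    size_t classes = (cfg->max_size - cfg->min_size) / cfg->step_size + 1;
    return cfg->min_size + (next_random(state) % classes) * cfg->step_size;
}

/* Allocate cfg->objects objects, each one replacing (freeing) the previous
 * occupant of a spot. Returns the number of successful allocations. */
size_t churn(const churn_config *cfg, uint32_t seed) {
    if (cfg->spots == 0 || cfg->step_size == 0 || cfg->max_size < cfg->min_size)
        return 0;
    void **spots = calloc(cfg->spots, sizeof *spots);
    if (spots == NULL)
        return 0;

    uint32_t state = seed ? seed : 1u;  /* xorshift state must be nonzero */
    size_t done = 0;
    for (size_t i = 0; i < cfg->objects; i++) {
        size_t s = i % cfg->spots;
        free(spots[s]);                 /* free(NULL) is a no-op on the first pass */
        spots[s] = malloc(pick_size(cfg, &state));
        if (spots[s] == NULL)
            break;
        done++;
    }

    for (size_t s = 0; s < cfg->spots; s++)
        free(spots[s]);
    free(spots);
    return done;
}
```

With the configuration listed above, a single thread would call \texttt{churn} with the values 50, 500, 50, 100000, and 16; a multi-threaded run would presumably execute the same loop in each of the -threadN threads.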

Figure~\ref{fig:churn} shows the results for both Algol and Nasus.
The x-axis shows the number of threads, with each allocator's performance at each thread count drawn in a different color.
The y-axis shows the total time the experiment took to finish.

\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.9\textwidth]{evaluations/algol-perf-eps/churn} }
    \subfigure[Nasus]{ \includegraphics[width=0.9\textwidth]{evaluations/nasus-perf-eps/churn} }
\caption{Churn}
\label{fig:churn}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% THRASH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Cache Thrash}

The thrash benchmark tested memory allocators for active false sharing.

This experiment was run with the following configuration:
\begin{itemize}
\item
-cacheIt : 1000
\item
-cacheRep : 1000000
\item
-cacheObj : 1
\item
-threadN : \{ 1, 2, 4, 8, 16 \} *
\end{itemize}

* Each allocator was tested across different numbers of threads. The experiment was repeated for each allocator with 1, 2, 4, 8, and 16 threads by setting -threadN.
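
To make the term concrete: active false sharing arises when objects used by different threads occupy the same cache line, so that a write by one thread invalidates the other thread's cached copy. The C sketch below shows that condition and the kind of per-thread work that exposes it. It is a minimal illustration under stated assumptions, not the benchmark's code: \texttt{CACHE\_LINE} (64 bytes), \texttt{same\_cache\_line}, and \texttt{hammer} are hypothetical names, and the loop only approximates what -cacheRep might control.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size in bytes; 64 is common but hardware-specific. */
#define CACHE_LINE 64u

/* Nonzero when both addresses fall within the same cache line. */
static int same_cache_line(const void *a, const void *b) {
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}

/* Work one thread would perform on its own object: `reps` read-modify-write
 * passes over every byte. If another thread's object shares the line, each
 * write forces that line to move between the cores' caches.
 * Requires size >= 1. Returns the final value of the first byte. */
static unsigned char hammer(volatile unsigned char *obj, size_t size, size_t reps) {
    for (size_t r = 0; r < reps; r++)
        for (size_t k = 0; k < size; k++)
            obj[k] = (unsigned char)(obj[k] + 1);
    return obj[0];
}
```

An allocator is susceptible to active false sharing if two objects it returns to different threads can satisfy \texttt{same\_cache\_line}; the slowdown then appears when each thread runs a loop like \texttt{hammer} on its own object.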

Figure~\ref{fig:cacheThrash} shows the results for both Algol and Nasus.
The x-axis shows the number of threads, with each allocator's performance at each thread count drawn in a different color.
The y-axis shows the total time the experiment took to finish.

\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.9\textwidth]{evaluations/algol-perf-eps/cache-time-0-thrash} }
    \subfigure[Nasus]{ \includegraphics[width=0.9\textwidth]{evaluations/nasus-perf-eps/cache-time-0-thrash} }
\caption{Cache Thrash}
\label{fig:cacheThrash}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SCRATCH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Cache Scratch}

The scratch benchmark tested memory allocators for program-induced, allocator-preserved passive false sharing.

This experiment was run with the following configuration:
\begin{itemize}
\item
-cacheIt : 1000
\item
-cacheRep : 1000000
\item
-cacheObj : 1
\item
-threadN : \{ 1, 2, 4, 8, 16 \} *
\end{itemize}

* Each allocator was tested across different numbers of threads. The experiment was repeated for each allocator with 1, 2, 4, 8, and 16 threads by setting -threadN.
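
The following C sketch illustrates the pattern this kind of test exercises; it is an assumption about the general shape of a scratch-style worker, not the benchmark's actual code. The program induces the sharing by allocating neighbouring objects in one thread and handing one to each worker. The allocator preserves it if, after the worker frees the handed-in object, the worker's next allocations reuse the same cache line. The name \texttt{scratch\_worker} and its parameters are hypothetical; \texttt{iterations} and \texttt{repetitions} are only guessed to correspond to -cacheIt and -cacheRep.

```c
#include <stddef.h>
#include <stdlib.h>

/* Work done by one thread. `handed_in` was allocated by another thread
 * (possibly next to other threads' objects) and is freed immediately.
 * The worker then repeatedly allocates, writes, reads, and frees its own
 * object. Returns a checksum so the accesses cannot be optimized away. */
unsigned long scratch_worker(char *handed_in, size_t obj_size,
                             size_t iterations, size_t repetitions) {
    unsigned long sum = 0;
    free(handed_in);  /* may return the memory to this thread's heap */

    for (size_t it = 0; it < iterations; it++) {
        /* volatile keeps every store and load in the generated code */
        volatile char *obj = malloc(obj_size);
        if (obj == NULL)
            break;
        for (size_t rep = 0; rep < repetitions; rep++) {
            for (size_t k = 0; k < obj_size; k++) {
                obj[k] = (char)k;
                sum += (unsigned char)obj[k];
            }
        }
        free((void *)obj);
    }
    return sum;
}
```

If the allocator returns the freed object to the heap of the freeing thread, the new object can land on a cache line still shared with other workers, and the write loop slows down as the thread count grows.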

Figure~\ref{fig:cacheScratch} shows the results for both Algol and Nasus.
The x-axis shows the number of threads, with each allocator's performance at each thread count drawn in a different color.
The y-axis shows the total time the experiment took to finish.

\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.9\textwidth]{evaluations/algol-perf-eps/cache-time-0-scratch} }
    \subfigure[Nasus]{ \includegraphics[width=0.9\textwidth]{evaluations/nasus-perf-eps/cache-time-0-scratch} }
\caption{Cache Scratch}
\label{fig:cacheScratch}
\end{figure}
183
184%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
185%% SPEED
186%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
187
188\subsection{Speed Benchmark}

The speed benchmark measured the runtime speed of individual allocation routines (malloc, calloc, realloc, free) and of chains of these routines.

This experiment is controlled by the following configuration options:

-threadN :  sets number of threads, K\\
-cSpots  :  sets number of spots for churn, M\\
-objN    :  sets number of objects per thread, N\\
-maxS    :  sets max object size\\
-minS    :  sets min object size\\
-stepS   :  sets object size increment\\
-distroS :  sets object size distribution
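
As a rough illustration of what timing one such chain involves, the C sketch below times a malloc followed by free over a batch of objects. It is a sketch under stated assumptions, not the benchmark's implementation: \texttt{seconds\_now} and \texttt{time\_malloc\_free} are hypothetical names, a single fixed object size replaces the -minS/-maxS/-stepS range, and wall-clock time from the C11 routine \texttt{timespec\_get} is used for portability.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Current wall-clock time in seconds (C11 timespec_get). */
static double seconds_now(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Time the chain malloc -> free over `objects` objects of `size` bytes.
 * Returns elapsed seconds, or -1.0 if the bookkeeping array cannot be
 * allocated. The bookkeeping itself is kept outside the timed region. */
double time_malloc_free(size_t objects, size_t size) {
    void **objs = malloc(objects * sizeof *objs);
    if (objs == NULL)
        return -1.0;

    double start = seconds_now();
    for (size_t i = 0; i < objects; i++)
        objs[i] = malloc(size);   /* allocation phase */
    for (size_t i = 0; i < objects; i++)
        free(objs[i]);            /* deallocation phase; free(NULL) is safe */
    double elapsed = seconds_now() - start;

    free(objs);
    return elapsed;
}
```

Other chains, such as calloc followed by realloc and free, would follow the same shape with the corresponding routine in each phase. Compiling with flags such as -fno-builtin-malloc and -fno-builtin-free, as in the dlmalloc command above, helps keep the compiler from optimizing the calls away.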

%speed-1-malloc-null.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-1-malloc-null}
\caption{speed-1-malloc-null}
\label{fig:speed-1-malloc-null}
\end{figure}

%speed-2-free-null.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-2-free-null}
\caption{speed-2-free-null}
\label{fig:speed-2-free-null}
\end{figure}

%speed-3-malloc.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-3-malloc}
\caption{speed-3-malloc}
\label{fig:speed-3-malloc}
\end{figure}

%speed-4-realloc.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-4-realloc}
\caption{speed-4-realloc}
\label{fig:speed-4-realloc}
\end{figure}

%speed-5-free.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-5-free}
\caption{speed-5-free}
\label{fig:speed-5-free}
\end{figure}

%speed-6-calloc.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-6-calloc}
\caption{speed-6-calloc}
\label{fig:speed-6-calloc}
\end{figure}

%speed-7-malloc-free.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-7-malloc-free}
\caption{speed-7-malloc-free}
\label{fig:speed-7-malloc-free}
\end{figure}

%speed-8-realloc-free.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-8-realloc-free}
\caption{speed-8-realloc-free}
\label{fig:speed-8-realloc-free}
\end{figure}

%speed-9-calloc-free.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-9-calloc-free}
\caption{speed-9-calloc-free}
\label{fig:speed-9-calloc-free}
\end{figure}

%speed-10-malloc-realloc.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-10-malloc-realloc}
\caption{speed-10-malloc-realloc}
\label{fig:speed-10-malloc-realloc}
\end{figure}

%speed-11-calloc-realloc.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-11-calloc-realloc}
\caption{speed-11-calloc-realloc}
\label{fig:speed-11-calloc-realloc}
\end{figure}

%speed-12-malloc-realloc-free.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-12-malloc-realloc-free}
\caption{speed-12-malloc-realloc-free}
\label{fig:speed-12-malloc-realloc-free}
\end{figure}

%speed-13-calloc-realloc-free.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-13-calloc-realloc-free}
\caption{speed-13-calloc-realloc-free}
\label{fig:speed-13-calloc-realloc-free}
\end{figure}

%speed-14-{m,c,re}alloc-free.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-14-{m,c,re}alloc-free}
\caption{speed-14-\{m,c,re\}alloc-free}
\label{fig:speed-14-{m,c,re}alloc-free}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% MEMORY
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Memory Benchmark}

%mem-1-prod-1-cons-100-cfa.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-cfa}
\caption{mem-1-prod-1-cons-100-cfa}
\label{fig:mem-1-prod-1-cons-100-cfa}
\end{figure}

%mem-1-prod-1-cons-100-dl.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-dl}
\caption{mem-1-prod-1-cons-100-dl}
\label{fig:mem-1-prod-1-cons-100-dl}
\end{figure}

%mem-1-prod-1-cons-100-glc.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-glc}
\caption{mem-1-prod-1-cons-100-glc}
\label{fig:mem-1-prod-1-cons-100-glc}
\end{figure}

%mem-1-prod-1-cons-100-hrd.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-hrd}
\caption{mem-1-prod-1-cons-100-hrd}
\label{fig:mem-1-prod-1-cons-100-hrd}
\end{figure}

%mem-1-prod-1-cons-100-je.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-je}
\caption{mem-1-prod-1-cons-100-je}
\label{fig:mem-1-prod-1-cons-100-je}
\end{figure}

%mem-1-prod-1-cons-100-pt3.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-pt3}
\caption{mem-1-prod-1-cons-100-pt3}
\label{fig:mem-1-prod-1-cons-100-pt3}
\end{figure}

%mem-1-prod-1-cons-100-rp.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-rp}
\caption{mem-1-prod-1-cons-100-rp}
\label{fig:mem-1-prod-1-cons-100-rp}
\end{figure}

%mem-1-prod-1-cons-100-tbb.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-tbb}
\caption{mem-1-prod-1-cons-100-tbb}
\label{fig:mem-1-prod-1-cons-100-tbb}
\end{figure}

%mem-4-prod-4-cons-100-cfa.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-cfa}
\caption{mem-4-prod-4-cons-100-cfa}
\label{fig:mem-4-prod-4-cons-100-cfa}
\end{figure}

%mem-4-prod-4-cons-100-dl.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-dl}
\caption{mem-4-prod-4-cons-100-dl}
\label{fig:mem-4-prod-4-cons-100-dl}
\end{figure}

%mem-4-prod-4-cons-100-glc.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-glc}
\caption{mem-4-prod-4-cons-100-glc}
\label{fig:mem-4-prod-4-cons-100-glc}
\end{figure}

%mem-4-prod-4-cons-100-hrd.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-hrd}
\caption{mem-4-prod-4-cons-100-hrd}
\label{fig:mem-4-prod-4-cons-100-hrd}
\end{figure}

%mem-4-prod-4-cons-100-je.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-je}
\caption{mem-4-prod-4-cons-100-je}
\label{fig:mem-4-prod-4-cons-100-je}
\end{figure}

%mem-4-prod-4-cons-100-pt3.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-pt3}
\caption{mem-4-prod-4-cons-100-pt3}
\label{fig:mem-4-prod-4-cons-100-pt3}
\end{figure}

%mem-4-prod-4-cons-100-rp.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-rp}
\caption{mem-4-prod-4-cons-100-rp}
\label{fig:mem-4-prod-4-cons-100-rp}
\end{figure}

%mem-4-prod-4-cons-100-tbb.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-tbb}
\caption{mem-4-prod-4-cons-100-tbb}
\label{fig:mem-4-prod-4-cons-100-tbb}
\end{figure}