source: doc/theses/mubeen_zulfiqar_MMath/performance.tex @ ba897d21

Last change on this file since ba897d21 was ba897d21, checked in by m3zulfiq <m3zulfiq@…>, 2 years ago

added benchmark and evaluations chapter to thesis

\chapter{Performance}

\section{Machine Specification}

The performance experiments were run on two different multicore systems to determine whether the results are consistent across platforms:
\begin{itemize}
\item
{\bf Nasus} AMD EPYC 7662, 64-core socket $\times$ 2, 2.0 GHz, GCC version 9.3.0
\item
{\bf Algol} Huawei ARM TaiShan 2280 V2 Kunpeng 920, 24-core socket $\times$ 4, 2.6 GHz, GCC version 9.4.0
\end{itemize}


\section{Existing Memory Allocators}\label{sec:curAllocatorSec}
Dynamic allocation is an important feature of C, so many stand-alone memory allocators have been designed for different purposes. For this thesis, we chose seven of the most popular and widely used memory allocators.

\subsection{dlmalloc}
dlmalloc (FIX ME: cite allocator with download link) is a single-heap allocator designed for single-threaded programs; it is made thread-safe by compiling it with USE\_LOCKS, which protects the heap with a lock. dlmalloc maintains free lists of different sizes to store freed dynamic memory. (FIX ME: cite wasik)
\\
\\
{\bf Version:} 2.8.6\\
{\bf Configuration:} Compiled with the pre-processor macro USE\_LOCKS defined.\\
{\bf Compilation command:}\\
cc -g3 -O3 -Wall -Wextra -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -fPIC -shared -DUSE\_LOCKS -o libdlmalloc.so malloc-2.8.6.c
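
The single-heap design can be illustrated with the C sketch below: one set of free lists, one per size class, shared by all threads and protected by a single lock, in the spirit of dlmalloc compiled with USE\_LOCKS. It is a hypothetical toy, not dlmalloc's implementation; the names \texttt{toyAlloc} and \texttt{toyFree}, the eight 16-byte size classes, and the use of the system \texttt{malloc} as backing memory are assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical toy, NOT dlmalloc's code: one heap whose free lists (one per
// size class) are shared by all threads and protected by a single lock.
#define NUMCLASSES 8   // assumed classes: 16, 32, ..., 128 bytes
#define CLASSBYTES 16

typedef struct FreeNode { struct FreeNode * next; } FreeNode;

static FreeNode * freeLists[NUMCLASSES];                      // one free list per size class
static pthread_mutex_t heapLock = PTHREAD_MUTEX_INITIALIZER;  // the single heap lock

static size_t sizeClass( size_t size ) {                      // request size -> free-list index
	assert( size > 0 && size <= NUMCLASSES * CLASSBYTES );
	return (size - 1) / CLASSBYTES;
}

void * toyAlloc( size_t size ) {
	size_t c = sizeClass( size );
	pthread_mutex_lock( &heapLock );                          // every thread serializes here
	FreeNode * node = freeLists[c];
	if ( node != NULL ) freeLists[c] = node->next;            // reuse a previously freed block
	pthread_mutex_unlock( &heapLock );
	if ( node == NULL ) node = malloc( (c + 1) * CLASSBYTES ); // assumed backing store
	return node;
}

// The caller passes the size because this toy keeps no per-block header.
void toyFree( void * ptr, size_t size ) {
	FreeNode * node = ptr;
	size_t c = sizeClass( size );
	pthread_mutex_lock( &heapLock );
	node->next = freeLists[c];                                // push onto the class's free list
	freeLists[c] = node;
	pthread_mutex_unlock( &heapLock );
}
```

Because every call acquires the same lock, concurrent threads serialize on this heap, so a single-heap design can be expected to scale poorly as the thread count grows.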

\subsection{hoard}
Hoard (FIX ME: cite allocator) is a thread-safe, multi-threaded allocator built on a heap-layer framework. It has per-thread heaps with thread-local free lists, and a global shared heap. (FIX ME: cite wasik)
\\
\\
{\bf Version:} 3.13\\
{\bf Configuration:} Compiled with Hoard's default configuration and Makefile.\\
{\bf Compilation command:}\\
make all
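
The split between thread-local free lists and a global shared heap can be sketched as follows, assuming a single object size and a fixed per-thread limit \texttt{LOCALMAX} (both hypothetical). A free goes to the calling thread's list, which needs no lock, until that list holds \texttt{LOCALMAX} objects; further frees go to the locked global heap, and allocation tries the local list before the global one. Hoard itself moves memory between heaps in larger units called superblocks, which this toy omits.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical toy, NOT Hoard's code: per-thread heaps plus one global heap,
// for a single object size and without Hoard's superblocks.
#define OBJBYTES 64   // assumed object size
#define LOCALMAX 4    // assumed per-thread limit before objects move to the global heap

typedef struct Node { struct Node * next; } Node;

static _Thread_local Node * localList;   // this thread's free list: no lock needed
static _Thread_local int localCount;
static Node * globalList;                // global shared heap
static pthread_mutex_t globalLock = PTHREAD_MUTEX_INITIALIZER;

void * heapAlloc( void ) {
	Node * n = localList;
	if ( n != NULL ) {                   // fast path: thread-local reuse
		localList = n->next;
		localCount -= 1;
		return n;
	}
	pthread_mutex_lock( &globalLock );   // slow path: take from the global heap
	n = globalList;
	if ( n != NULL ) globalList = n->next;
	pthread_mutex_unlock( &globalLock );
	return n != NULL ? (void *)n : malloc( OBJBYTES );
}

void heapFree( void * ptr ) {
	Node * n = ptr;
	assert( n != NULL );
	if ( localCount < LOCALMAX ) {       // keep the object in this thread's heap
		n->next = localList;
		localList = n;
		localCount += 1;
		return;
	}
	pthread_mutex_lock( &globalLock );   // local heap is full: give the object to the global heap
	n->next = globalList;
	globalList = n;
	pthread_mutex_unlock( &globalLock );
}
```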

\subsection{jemalloc}
jemalloc (FIX ME: cite allocator) is a thread-safe allocator that uses multiple arenas, and each thread is assigned an arena. Each arena has chunks that contain contiguous memory regions of the same size; an arena has multiple chunks, which together contain regions of multiple sizes.
\\
\\
{\bf Version:} 5.2.1\\
{\bf Configuration:} Compiled with jemalloc's default configuration and Makefile.\\
{\bf Compilation command:}\\
./autogen.sh\\
./configure\\
make\\
make install
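
Assigning each thread an arena can be sketched as below. The round-robin choice, the four arenas, and the names \texttt{threadArena} and \texttt{arenaAlloc} are assumptions for illustration; jemalloc's own arena-selection policy may differ, and this toy arena only counts allocations instead of carving same-size regions out of chunks. The point is that threads bound to different arenas never contend for the same arena lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical toy, NOT jemalloc's code: each thread is bound to one arena on
// first use, chosen round-robin. NARENAS and the policy are assumptions.
#define NARENAS 4

typedef struct Arena {
	pthread_mutex_t lock;     // only threads sharing this arena contend for it
	size_t allocations;       // stands in for the arena's chunks and regions
} Arena;

static Arena arenas[NARENAS];
static pthread_once_t arenasOnce = PTHREAD_ONCE_INIT;
static atomic_uint nextArena;              // round-robin cursor
static _Thread_local Arena * myArena;      // arena assigned to the calling thread

static void initArenas( void ) {
	for ( int i = 0; i < NARENAS; i += 1 ) pthread_mutex_init( &arenas[i].lock, NULL );
}

Arena * threadArena( void ) {
	if ( myArena == NULL ) {               // first allocation by this thread
		pthread_once( &arenasOnce, initArenas );
		myArena = &arenas[atomic_fetch_add( &nextArena, 1 ) % NARENAS];
	}
	return myArena;
}

void * arenaAlloc( size_t size ) {
	Arena * a = threadArena();
	pthread_mutex_lock( &a->lock );
	a->allocations += 1;                   // a real arena would carve a region from a chunk here
	pthread_mutex_unlock( &a->lock );
	return malloc( size );                 // toy: backing memory from the system allocator
}
```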

\subsection{pt3malloc}
pt3malloc (FIX ME: cite allocator) is a modification of dlmalloc. It is a thread-safe, multi-threaded memory allocator that uses multiple heaps, each with a design similar to dlmalloc's heap.
\\
\\
{\bf Version:} 1.8\\
{\bf Configuration:} Compiled with pt3malloc's Makefile using option ``linux-shared''.\\
{\bf Compilation command:}\\
make linux-shared
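
One way a multi-heap allocator can spread threads across its heaps is to try-lock the heaps in turn and use the first idle one, as in the sketch below. The heap count, the names, and the fallback of blocking on the starting heap when all heaps are busy are assumptions; this is not pt3malloc's code, and a real allocator might instead create a new heap at that point.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

// Hypothetical sketch, NOT pt3malloc's code: pick a heap by try-locking heaps
// in turn, so a thread rarely waits while another heap is idle.
#define NHEAPS 4   // assumed number of heaps

typedef struct Heap {
	pthread_mutex_t lock;
	int id;        // a real heap would hold dlmalloc-style free lists here
} Heap;

static Heap heaps[NHEAPS];

void initHeaps( void ) {
	for ( int i = 0; i < NHEAPS; i += 1 ) {
		pthread_mutex_init( &heaps[i].lock, NULL );
		heaps[i].id = i;
	}
}

// Returns a locked heap, starting the search at heap `start`; the caller unlocks it.
Heap * lockSomeHeap( int start ) {
	assert( start >= 0 && start < NHEAPS );
	for ( int k = 0; k < NHEAPS; k += 1 ) {
		Heap * h = &heaps[(start + k) % NHEAPS];
		if ( pthread_mutex_trylock( &h->lock ) == 0 ) return h;  // found an idle heap
	}
	pthread_mutex_lock( &heaps[start].lock );   // all busy: assumed fallback is to wait
	return &heaps[start];
}
```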

\subsection{rpmalloc}
rpmalloc (FIX ME: cite allocator) is a thread-safe, multi-threaded allocator that uses a per-thread heap. Each heap has multiple size classes, and each size class contains memory regions of the corresponding size.
\\
\\
{\bf Version:} 1.4.1\\
{\bf Configuration:} Compiled with rpmalloc's default configuration and the ninja build system.\\
{\bf Compilation command:}\\
python3 configure.py\\
ninja
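
A per-thread heap with size classes can be sketched as follows. Because the heap is thread-local, allocation and free touch no lock. The 16-byte granularity, the 32 classes, and the restriction that a block is freed by the thread that allocated it are assumptions of this toy; rpmalloc's actual size-class layout is not reproduced, and a real allocator must also handle frees from other threads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical toy, NOT rpmalloc's code: a thread-local heap with size classes.
#define GRANULE 16    // assumed size-class granularity in bytes
#define NCLASSES 32   // assumed classes: 16, 32, ..., 512 bytes

typedef struct Block { struct Block * next; } Block;
typedef struct ThreadHeap { Block * classes[NCLASSES]; } ThreadHeap;  // one free list per class

static _Thread_local ThreadHeap heap;   // each thread gets its own zero-initialized heap

size_t classOf( size_t size ) {         // round size up to a multiple of GRANULE
	assert( size > 0 && size <= GRANULE * NCLASSES );
	return (size + GRANULE - 1) / GRANULE - 1;
}

void * threadAlloc( size_t size ) {
	size_t c = classOf( size );
	Block * b = heap.classes[c];
	if ( b != NULL ) {                  // no lock: the heap is private to this thread
		heap.classes[c] = b->next;
		return b;
	}
	return malloc( (c + 1) * GRANULE ); // toy backing store
}

// Toy restriction: must be called by the thread that allocated ptr.
void threadFree( void * ptr, size_t size ) {
	Block * b = ptr;
	size_t c = classOf( size );
	b->next = heap.classes[c];
	heap.classes[c] = b;
}
```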

\subsection{tbb malloc}
tbb malloc (FIX ME: cite allocator) is a thread-safe, multi-threaded allocator that uses a private heap for each thread. Each private heap has multiple bins of different sizes, and each bin contains free regions of the same size.
\\
\\
{\bf Version:} Intel TBB 2020 Update 2, tbb\_interface\_version == 11102\\
{\bf Configuration:} Compiled with tbbmalloc's default configuration and Makefile.\\
{\bf Compilation command:}\\
make
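
That each bin holds free regions of a single size can be illustrated by carving one block of memory into equal-size regions and threading them onto a free list, as sketched below; a thread's private heap would own one such bin per size. The \texttt{Bin} layout and the functions \texttt{binInit}, \texttt{binAlloc}, and \texttt{binFree} are hypothetical and do not reflect TBB's internal data structures.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical sketch, NOT TBB's code: one bin whose free regions all have the same size.
typedef struct Region { struct Region * next; } Region;
typedef struct Bin { size_t regionSize; Region * freeList; size_t freeCount; } Bin;

// Carves `block` into as many whole regions of `regionSize` bytes as fit.
void binInit( Bin * bin, void * block, size_t blockBytes, size_t regionSize ) {
	assert( regionSize >= sizeof(Region) && regionSize % sizeof(void *) == 0 );
	bin->regionSize = regionSize;
	bin->freeList = NULL;
	bin->freeCount = 0;
	char * base = block;
	for ( size_t off = 0; off + regionSize <= blockBytes; off += regionSize ) {
		Region * r = (Region *)(base + off);    // every region has the same size
		r->next = bin->freeList;
		bin->freeList = r;
		bin->freeCount += 1;
	}
}

void * binAlloc( Bin * bin ) {
	Region * r = bin->freeList;
	if ( r == NULL ) return NULL;               // bin exhausted: a real allocator would refill it
	bin->freeList = r->next;
	bin->freeCount -= 1;
	return r;
}

void binFree( Bin * bin, void * ptr ) {
	Region * r = ptr;
	r->next = bin->freeList;
	bin->freeList = r;
	bin->freeCount += 1;
}
```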

\section{Experiment Environment}
We used our micro-benchmark suite (FIX ME: cite mbench) to evaluate the memory allocators in Section~\ref{sec:curAllocatorSec} and our own memory allocator, uHeap (Section~\ref{sec:allocatorSec}).

\section{Results}
FIX ME: add experiment, knobs, graphs, description+analysis

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% CHURN
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Churn Benchmark}

The churn benchmark tests allocator speed under intensive dynamic-memory usage.

This experiment was run with the following configuration:

-maxS            : 500

-minS            : 50

-stepS           : 50

-distroS         : fisher

-objN            : 100000

-cSpots          : 16

-threadN         : \{ 1, 2, 4, 8, 16 \} *

* Each allocator was tested with different numbers of threads: the experiment was repeated for each allocator with 1, 2, 4, 8, and 16 threads by setting the -threadN option.
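
The churn code itself is not listed in this chapter, so the following C sketch shows one plausible single-thread reading of the options above. It assumes that each of the objN allocations is stored in one of cSpots randomly chosen slots after the slot's previous object is freed, and that object sizes range over minS, minS + stepS, \ldots, maxS. Sizes are drawn uniformly here; the ``fisher'' distribution used in the experiment is not reproduced, and the names \texttt{churn} and \texttt{ChurnConfig} are hypothetical. In the experiment, threadN threads would run such a loop concurrently while the total time is measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical sketch of one thread's churn loop; the option semantics are
// ASSUMED from their names, and sizes are uniform instead of "fisher".
typedef struct ChurnConfig { size_t minS, maxS, stepS, objN, cSpots; } ChurnConfig;

static uint32_t nextRand( uint32_t * state ) {   // xorshift32: deterministic for a given seed
	uint32_t x = *state;
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *state = x;
}

// Returns the total number of bytes requested from malloc.
size_t churn( const ChurnConfig * cfg, uint32_t seed ) {
	assert( seed != 0 && cfg->stepS > 0 && cfg->minS > 0 && cfg->minS <= cfg->maxS && cfg->cSpots > 0 );
	size_t nSizes = (cfg->maxS - cfg->minS) / cfg->stepS + 1;   // number of distinct sizes
	void ** spots = calloc( cfg->cSpots, sizeof(void *) );     // all spots start empty
	if ( spots == NULL ) return 0;
	size_t requested = 0;
	for ( size_t i = 0; i < cfg->objN; i += 1 ) {
		size_t spot = nextRand( &seed ) % cfg->cSpots;
		size_t size = cfg->minS + (nextRand( &seed ) % nSizes) * cfg->stepS;
		free( spots[spot] );                                    // free( NULL ) is a no-op
		spots[spot] = malloc( size );
		if ( spots[spot] != NULL ) ((char *)spots[spot])[0] = 1; // touch the new object
		requested += size;
	}
	for ( size_t s = 0; s < cfg->cSpots; s += 1 ) free( spots[s] );
	free( spots );
	return requested;
}
```

With the configuration above, minS~$=50$, maxS~$=500$, and stepS~$=50$ give $(500-50)/50+1 = 10$ distinct object sizes.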

Figure~\ref{fig:churn} shows the results on both Algol and Nasus.
The x-axis shows the number of threads, and each allocator's result at each thread count is shown in a different color.
The y-axis shows the total time the experiment took to finish.

\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.9\textwidth]{evaluations/algol-perf-eps/churn} }
    \subfigure[Nasus]{ \includegraphics[width=0.9\textwidth]{evaluations/nasus-perf-eps/churn} }
\caption{Churn}
\label{fig:churn}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% THRASH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Cache Thrash}

The cache-thrash benchmark tests memory allocators for active false sharing.

This experiment was run with the following configuration:

-cacheIt        : 1000

-cacheRep       : 1000000

-cacheObj       : 1

-threadN        : \{ 1, 2, 4, 8, 16 \} *

* Each allocator was tested with different numbers of threads: the experiment was repeated for each allocator with 1, 2, 4, 8, and 16 threads by setting the -threadN option.
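
Active false sharing arises when an allocator hands objects on the same cache line to different threads, so that each thread's writes to its own object invalidate the line in the other cores' caches. The sketch below shows a cache-thrash worker under assumed meanings of the options: cacheIt is the number of allocate/free iterations per thread, cacheRep the number of write passes over each object, and cacheObj the object size in bytes. These meanings and the names \texttt{runThrash} and \texttt{thrashWorker} are assumptions, not the benchmark's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

// Hypothetical sketch of cache thrash. ASSUMED option meanings: cacheIt =
// allocate/free iterations per thread, cacheRep = write passes per object,
// cacheObj = object size in bytes.
#define MAXTHREADS 64

typedef struct ThrashArgs { size_t cacheIt, cacheRep, cacheObj, writes; } ThrashArgs;

static void * thrashWorker( void * arg ) {
	ThrashArgs * a = arg;
	size_t done = 0;                                  // local counter: no shared counter to thrash
	for ( size_t it = 0; it < a->cacheIt; it += 1 ) {
		volatile char * obj = malloc( a->cacheObj );  // placement is up to the allocator
		if ( obj == NULL ) continue;
		for ( size_t r = 0; r < a->cacheRep; r += 1 )
			for ( size_t b = 0; b < a->cacheObj; b += 1 ) obj[b] = (char)r; // volatile: writes are kept
		free( (void *)obj );
		done += 1;
	}
	a->writes = done * a->cacheRep * a->cacheObj;
	return NULL;
}

// Runs threadN workers concurrently and returns the total number of writes.
size_t runThrash( int threadN, size_t cacheIt, size_t cacheRep, size_t cacheObj ) {
	pthread_t tids[MAXTHREADS];
	ThrashArgs args[MAXTHREADS];
	assert( threadN > 0 && threadN <= MAXTHREADS );
	for ( int t = 0; t < threadN; t += 1 ) {
		args[t] = (ThrashArgs){ cacheIt, cacheRep, cacheObj, 0 };
		int rc = pthread_create( &tids[t], NULL, thrashWorker, &args[t] );
		assert( rc == 0 );
		(void)rc;
	}
	size_t total = 0;
	for ( int t = 0; t < threadN; t += 1 ) {
		pthread_join( tids[t], NULL );
		total += args[t].writes;
	}
	return total;
}
```

A quick check should use much smaller values than the experiment so that it finishes quickly; any slowdown from false sharing depends entirely on where the allocator places the objects. Compile with \texttt{-pthread}.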

Figure~\ref{fig:cacheThrash} shows the results on both Algol and Nasus.
The x-axis shows the number of threads, and each allocator's result at each thread count is shown in a different color.
The y-axis shows the total time the experiment took to finish.

\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.9\textwidth]{evaluations/algol-perf-eps/cache-time-0-thrash} }
    \subfigure[Nasus]{ \includegraphics[width=0.9\textwidth]{evaluations/nasus-perf-eps/cache-time-0-thrash} }
\caption{Cache Thrash}
\label{fig:cacheThrash}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SCRATCH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Cache Scratch}

The cache-scratch benchmark tests memory allocators for program-induced passive false sharing that the allocator preserves.

This experiment was run with the following configuration:

-cacheIt        : 1000

-cacheRep       : 1000000

-cacheObj       : 1

-threadN        : \{ 1, 2, 4, 8, 16 \} *

* Each allocator was tested with different numbers of threads: the experiment was repeated for each allocator with 1, 2, 4, 8, and 16 threads by setting the -threadN option.
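
Passive false sharing starts in the program: one thread allocates small objects back to back, so neighboring objects probably share a cache line, and hands one to each worker. If a worker frees its object and the allocator then returns that same memory to the worker's next allocation, the allocator preserves the false sharing. The sketch below, with the same assumed option meanings as for cache thrash (cacheIt iterations, cacheRep write passes, cacheObj bytes), counts how often a worker's allocation lands on the cache line of the object it was handed. The 64-byte line size and the names \texttt{runScratch} and \texttt{scratchWorker} are assumptions, not the benchmark's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical sketch of cache scratch under ASSUMED option meanings.
#define CACHELINE 64    // assumed cache-line size in bytes
#define MAXTHREADS 64

typedef struct ScratchArgs { void * handoff; size_t cacheIt, cacheRep, cacheObj, reused; } ScratchArgs;

static void * scratchWorker( void * arg ) {
	ScratchArgs * a = arg;
	uintptr_t line = (uintptr_t)a->handoff / CACHELINE;   // line of the object handed over
	free( a->handoff );                                   // freed by a thread that did not allocate it
	size_t reused = 0;
	for ( size_t it = 0; it < a->cacheIt; it += 1 ) {
		volatile char * obj = malloc( a->cacheObj );
		if ( obj == NULL ) continue;
		if ( (uintptr_t)obj / CACHELINE == line ) reused += 1;  // false sharing preserved
		for ( size_t r = 0; r < a->cacheRep; r += 1 )
			for ( size_t b = 0; b < a->cacheObj; b += 1 ) obj[b] = (char)r;
		free( (void *)obj );
	}
	a->reused = reused;
	return NULL;
}

// Returns how many worker allocations reused the cache line of the handed-over
// object; the count depends entirely on the allocator.
size_t runScratch( int threadN, size_t cacheIt, size_t cacheRep, size_t cacheObj ) {
	pthread_t tids[MAXTHREADS];
	ScratchArgs args[MAXTHREADS];
	assert( threadN > 0 && threadN <= MAXTHREADS );
	for ( int t = 0; t < threadN; t += 1 )    // back-to-back small allocations by one thread
		args[t] = (ScratchArgs){ malloc( cacheObj ), cacheIt, cacheRep, cacheObj, 0 };
	for ( int t = 0; t < threadN; t += 1 ) {
		int rc = pthread_create( &tids[t], NULL, scratchWorker, &args[t] );
		assert( rc == 0 );
		(void)rc;
	}
	size_t total = 0;
	for ( int t = 0; t < threadN; t += 1 ) {
		pthread_join( tids[t], NULL );
		total += args[t].reused;
	}
	return total;
}
```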

Figure~\ref{fig:cacheScratch} shows the results on both Algol and Nasus.
The x-axis shows the number of threads, and each allocator's result at each thread count is shown in a different color.
The y-axis shows the total time the experiment took to finish.

\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.9\textwidth]{evaluations/algol-perf-eps/cache-time-0-scratch} }
    \subfigure[Nasus]{ \includegraphics[width=0.9\textwidth]{evaluations/nasus-perf-eps/cache-time-0-scratch} }
\caption{Cache Scratch}
\label{fig:cacheScratch}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SPEED
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Speed Benchmark}

The speed benchmark measures the time each allocator takes to execute the allocation routines malloc, free, calloc, and realloc, both individually and in combinations such as malloc followed by free.

This experiment is configured with the following options:

-threadN :  sets the number of threads, K\\
-cSpots  :  sets the number of spots for churn, M\\
-objN    :  sets the number of objects per thread, N\\
-maxS    :  sets the maximum object size\\
-minS    :  sets the minimum object size\\
-stepS   :  sets the object size increment\\
-distroS :  sets the object size distribution
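
As an illustration of how one operation mix from the speed figures (malloc followed by free) might be timed, the following C sketch allocates objN objects whose sizes cycle through minS, minS + stepS, \ldots, maxS, frees them all, and reports the elapsed time from \texttt{clock\_gettime} with \texttt{CLOCK\_MONOTONIC}. The benchmark's actual code is not shown in this chapter; the function name \texttt{timeMallocFree}, the allocate-all-then-free-all order, and the cyclic size choice are assumptions.

```c
#define _POSIX_C_SOURCE 199309L   // for clock_gettime under strict ISO C modes
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

// Hypothetical sketch: time objN mallocs followed by objN frees, in nanoseconds.
static uint64_t nowNs( void ) {
	struct timespec ts;
	clock_gettime( CLOCK_MONOTONIC, &ts );
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

uint64_t timeMallocFree( size_t objN, size_t minS, size_t maxS, size_t stepS ) {
	assert( objN > 0 && stepS > 0 && minS <= maxS );
	void ** objs = malloc( objN * sizeof(void *) );   // holds the objects between the two phases
	if ( objs == NULL ) return 0;
	size_t nSizes = (maxS - minS) / stepS + 1;
	uint64_t start = nowNs();
	for ( size_t i = 0; i < objN; i += 1 )            // phase 1: all allocations
		objs[i] = malloc( minS + (i % nSizes) * stepS );
	for ( size_t i = 0; i < objN; i += 1 )            // phase 2: all frees
		free( objs[i] );
	uint64_t elapsed = nowNs() - start;
	free( objs );
	return elapsed;
}
```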

%speed-1-malloc-null.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-1-malloc-null}
\caption{speed-1-malloc-null}
\label{fig:speed-1-malloc-null}
\end{figure}

%speed-2-free-null.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-2-free-null}
\caption{speed-2-free-null}
\label{fig:speed-2-free-null}
\end{figure}

%speed-3-malloc.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-3-malloc}
\caption{speed-3-malloc}
\label{fig:speed-3-malloc}
\end{figure}

%speed-4-realloc.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-4-realloc}
\caption{speed-4-realloc}
\label{fig:speed-4-realloc}
\end{figure}

%speed-5-free.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-5-free}
\caption{speed-5-free}
\label{fig:speed-5-free}
\end{figure}

%speed-6-calloc.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-6-calloc}
\caption{speed-6-calloc}
\label{fig:speed-6-calloc}
\end{figure}

%speed-7-malloc-free.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-7-malloc-free}
\caption{speed-7-malloc-free}
\label{fig:speed-7-malloc-free}
\end{figure}

%speed-8-realloc-free.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-8-realloc-free}
\caption{speed-8-realloc-free}
\label{fig:speed-8-realloc-free}
\end{figure}

%speed-9-calloc-free.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-9-calloc-free}
\caption{speed-9-calloc-free}
\label{fig:speed-9-calloc-free}
\end{figure}

%speed-10-malloc-realloc.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-10-malloc-realloc}
\caption{speed-10-malloc-realloc}
\label{fig:speed-10-malloc-realloc}
\end{figure}

%speed-11-calloc-realloc.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-11-calloc-realloc}
\caption{speed-11-calloc-realloc}
\label{fig:speed-11-calloc-realloc}
\end{figure}

%speed-12-malloc-realloc-free.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-12-malloc-realloc-free}
\caption{speed-12-malloc-realloc-free}
\label{fig:speed-12-malloc-realloc-free}
\end{figure}

%speed-13-calloc-realloc-free.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-13-calloc-realloc-free}
\caption{speed-13-calloc-realloc-free}
\label{fig:speed-13-calloc-realloc-free}
\end{figure}

%speed-14-{m,c,re}alloc-free.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/speed-14-{m,c,re}alloc-free}
\caption{speed-14-\{m,c,re\}alloc-free}
\label{fig:speed-14-{m,c,re}alloc-free}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% MEMORY
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Memory Benchmark}

%mem-1-prod-1-cons-100-cfa.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-cfa}
\caption{mem-1-prod-1-cons-100-cfa}
\label{fig:mem-1-prod-1-cons-100-cfa}
\end{figure}

%mem-1-prod-1-cons-100-dl.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-dl}
\caption{mem-1-prod-1-cons-100-dl}
\label{fig:mem-1-prod-1-cons-100-dl}
\end{figure}

%mem-1-prod-1-cons-100-glc.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-glc}
\caption{mem-1-prod-1-cons-100-glc}
\label{fig:mem-1-prod-1-cons-100-glc}
\end{figure}

%mem-1-prod-1-cons-100-hrd.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-hrd}
\caption{mem-1-prod-1-cons-100-hrd}
\label{fig:mem-1-prod-1-cons-100-hrd}
\end{figure}

%mem-1-prod-1-cons-100-je.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-je}
\caption{mem-1-prod-1-cons-100-je}
\label{fig:mem-1-prod-1-cons-100-je}
\end{figure}

%mem-1-prod-1-cons-100-pt3.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-pt3}
\caption{mem-1-prod-1-cons-100-pt3}
\label{fig:mem-1-prod-1-cons-100-pt3}
\end{figure}

%mem-1-prod-1-cons-100-rp.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-rp}
\caption{mem-1-prod-1-cons-100-rp}
\label{fig:mem-1-prod-1-cons-100-rp}
\end{figure}

%mem-1-prod-1-cons-100-tbb.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-tbb}
\caption{mem-1-prod-1-cons-100-tbb}
\label{fig:mem-1-prod-1-cons-100-tbb}
\end{figure}

%mem-4-prod-4-cons-100-cfa.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-cfa}
\caption{mem-4-prod-4-cons-100-cfa}
\label{fig:mem-4-prod-4-cons-100-cfa}
\end{figure}

%mem-4-prod-4-cons-100-dl.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-dl}
\caption{mem-4-prod-4-cons-100-dl}
\label{fig:mem-4-prod-4-cons-100-dl}
\end{figure}

%mem-4-prod-4-cons-100-glc.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-glc}
\caption{mem-4-prod-4-cons-100-glc}
\label{fig:mem-4-prod-4-cons-100-glc}
\end{figure}

%mem-4-prod-4-cons-100-hrd.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-hrd}
\caption{mem-4-prod-4-cons-100-hrd}
\label{fig:mem-4-prod-4-cons-100-hrd}
\end{figure}

%mem-4-prod-4-cons-100-je.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-je}
\caption{mem-4-prod-4-cons-100-je}
\label{fig:mem-4-prod-4-cons-100-je}
\end{figure}

%mem-4-prod-4-cons-100-pt3.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-pt3}
\caption{mem-4-prod-4-cons-100-pt3}
\label{fig:mem-4-prod-4-cons-100-pt3}
\end{figure}

%mem-4-prod-4-cons-100-rp.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-rp}
\caption{mem-4-prod-4-cons-100-rp}
\label{fig:mem-4-prod-4-cons-100-rp}
\end{figure}

%mem-4-prod-4-cons-100-tbb.eps
\begin{figure}
\centering
\includegraphics[width=1\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-tbb}
\caption{mem-4-prod-4-cons-100-tbb}
\label{fig:mem-4-prod-4-cons-100-tbb}
\end{figure}