\chapter{Performance}
\label{c:Performance}

\noindent
====================

Writing Points:
\begin{itemize}
\item
Machine Specification
\item
Allocators and their details
\item
Benchmarks and their details
\item
Results
\end{itemize}

\noindent
====================

\section{Machine Specification}

The performance experiments were run on three different multicore systems to determine whether the results are consistent across platforms:
\begin{itemize}
\item
AMD EPYC 7662, 64-core socket $\times$ 2, 2.0 GHz
\item
Huawei ARM TaiShan 2280 V2 Kunpeng 920, 24-core socket $\times$ 4, 2.6 GHz
\item
Intel Xeon Gold 5220R, 24-core (48-thread) socket $\times$ 2, 2.2 GHz
\end{itemize}

\section{Existing Memory Allocators}
Because dynamic allocation is an important feature of C, many stand-alone memory allocators have been designed for different purposes. For this thesis, we chose seven of the most popular and widely used memory allocators.

\paragraph{dlmalloc}
dlmalloc (FIX ME: cite allocator) is a single-heap allocator designed for single-threaded programs; it is thread-safe, but all threads share its one heap. dlmalloc maintains free lists of different sizes to store freed dynamic memory. (FIX ME: cite wasik)

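The idea of size-segregated free lists can be illustrated with a small C sketch. This is not dlmalloc's code: the 16-byte granularity, the 32 bins, and the names \texttt{toy\_alloc}, \texttt{toy\_free}, and \texttt{bin\_index} are hypothetical, and \texttt{toy\_free} takes the block size from its caller, whereas dlmalloc records each chunk's size in a header. A freed block is pushed onto the list for its rounded size, and a later request in the same size class reuses it without any search.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative model of size-segregated free lists (not dlmalloc's code).
   Requests are rounded up to a multiple of GRAIN bytes, and each rounded
   size has its own singly linked list ("bin") of freed blocks. */
#define GRAIN 16
#define NBINS 32                       /* bins cover 1 .. NBINS * GRAIN bytes */

typedef struct free_block { struct free_block *next; } free_block;

static free_block *bins[NBINS];        /* bins[i] holds blocks of (i + 1) * GRAIN bytes */

static size_t bin_index(size_t size) {
    assert(size > 0 && size <= (size_t)NBINS * GRAIN);
    return (size + GRAIN - 1) / GRAIN - 1;   /* 1..16 -> 0, 17..32 -> 1, ... */
}

void *toy_alloc(size_t size) {
    size_t i = bin_index(size);
    free_block *b = bins[i];
    if (b != NULL) {                   /* reuse a freed block from the matching bin */
        bins[i] = b->next;
        return b;
    }
    return malloc((i + 1) * GRAIN);    /* bin empty: get fresh memory from the system */
}

/* Simplification: the caller supplies the size; dlmalloc keeps it in a header. */
void toy_free(void *p, size_t size) {
    free_block *b = p;
    size_t i = bin_index(size);
    b->next = bins[i];                 /* push onto the front of the bin (LIFO) */
    bins[i] = b;
}
\end{verbatim}

For example, after \texttt{toy\_free(p, 24)}, a call to \texttt{toy\_alloc(30)} returns \texttt{p}, because both 24 and 30 bytes round up to the same 32-byte bin.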
\paragraph{Hoard}
Hoard (FIX ME: cite allocator) is a thread-safe allocator designed for multi-threaded programs and built on a heap-layer framework. It has per-thread heaps with thread-local free lists, backed by a global shared heap. (FIX ME: cite wasik)

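The interaction between a per-thread heap and the global heap can be sketched as follows. The sketch handles a single block size, runs single-threaded, and omits both the superblocks that Hoard actually manages and the locking of the global heap; \texttt{LOCAL\_LIMIT}, \texttt{heap\_alloc}, and \texttt{heap\_free} are hypothetical names. Allocation tries the thread's own free list first, then the global heap, and finally the system; when a thread's heap holds too many free blocks, the surplus is handed back to the global heap so that other threads can reuse it.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified model of per-thread heaps backed by a global heap, for one
   block size. Hypothetical names; real Hoard manages superblocks and locks
   the global heap, both omitted in this single-threaded sketch. */
#define BLOCK_SIZE  64
#define LOCAL_LIMIT 4                  /* max free blocks a thread heap may keep */

typedef struct block { struct block *next; } block;

typedef struct heap {
    block *free_list;
    size_t nfree;
} heap;

static heap global_heap;               /* shared by all threads in the real design */

static void push(heap *h, block *b) {
    b->next = h->free_list;
    h->free_list = b;
    h->nfree++;
}

static block *pop(heap *h) {
    block *b = h->free_list;
    if (b != NULL) {
        h->free_list = b->next;
        h->nfree--;
    }
    return b;
}

void *heap_alloc(heap *local) {
    block *b = pop(local);                  /* 1. thread-local free list */
    if (b == NULL) b = pop(&global_heap);   /* 2. shared global heap */
    if (b == NULL) b = malloc(BLOCK_SIZE);  /* 3. fresh memory from the system */
    return b;
}

void heap_free(heap *local, void *p) {
    assert(p != NULL);
    push(local, p);
    if (local->nfree > LOCAL_LIMIT)         /* too much idle memory in this thread: */
        push(&global_heap, pop(local));     /* give a block back for other threads */
}
\end{verbatim}

In this sketch, returning surplus blocks bounds how much idle memory a single thread can hold, at the cost of touching shared state on some frees.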
\paragraph{jemalloc}
jemalloc (FIX ME: cite allocator) is a thread-safe allocator that uses multiple arenas, and each thread is assigned to an arena. An arena is divided into chunks; each chunk contains contiguous memory regions of a single size, so an arena with multiple chunks can serve regions of multiple sizes.

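The chunk layout described above, in which every region of a chunk has the same size, can be modelled with a bitmap that records which regions are in use. The following is an illustrative sketch rather than jemalloc's implementation; the 64-region limit and the names \texttt{slab}, \texttt{slab\_alloc}, and \texttt{slab\_free} are assumptions.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative run of contiguous, equally sized regions tracked by a bitmap
   (not jemalloc's code). Region count and names are hypothetical. */
#define NREGIONS 64

typedef struct slab {
    size_t region_size;                /* every region in this slab has this size */
    uint64_t used;                     /* bit i set => region i is allocated */
    unsigned char *base;               /* start of NREGIONS * region_size bytes */
} slab;

void *slab_alloc(slab *s) {
    for (unsigned i = 0; i < NREGIONS; i++) {
        uint64_t bit = UINT64_C(1) << i;
        if (!(s->used & bit)) {        /* first free region wins */
            s->used |= bit;
            return s->base + (size_t)i * s->region_size;
        }
    }
    return NULL;                       /* slab full: caller would try another chunk */
}

void slab_free(slab *s, void *p) {
    size_t offset = (size_t)((unsigned char *)p - s->base);
    assert(offset % s->region_size == 0);   /* p must point at a region start */
    size_t i = offset / s->region_size;     /* address arithmetic gives the index */
    assert(i < NREGIONS);
    s->used &= ~(UINT64_C(1) << i);
}
\end{verbatim}

Because all regions of one slab share a size, \texttt{slab\_free} recovers a region's index from its address by division, so this model needs no per-object size header.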
\paragraph{ptmalloc}
ptmalloc (FIX ME: cite allocator) is a modification of dlmalloc. It is a thread-safe allocator for multi-threaded programs that uses multiple heaps, each of which has a design similar to dlmalloc's single heap.

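Using multiple heaps lets threads avoid waiting on one another. One way to choose a heap is sketched below: a thread first tries the heap (arena) it used most recently and, if another thread holds that arena's lock, tries the next one instead of blocking. The policy, the four-arena limit, and the names \texttt{arena\_acquire} and \texttt{arena\_release} are assumptions for illustration. The lock is a C11 \texttt{atomic\_flag} used as a try-lock, and when every arena is busy the sketch returns \texttt{NULL}, where a real allocator might instead create a new arena.

\begin{verbatim}
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical arena-selection sketch: try-lock arenas starting from the
   one this thread used last, so threads spread out instead of waiting. */
#define NARENAS 4

typedef struct arena {
    atomic_flag lock;                  /* held while a thread allocates here */
    /* per-arena bins, similar to a dlmalloc heap, would follow here */
} arena;

static arena arenas[NARENAS] = {
    { ATOMIC_FLAG_INIT }, { ATOMIC_FLAG_INIT },
    { ATOMIC_FLAG_INIT }, { ATOMIC_FLAG_INIT }
};

static _Thread_local size_t last_arena;    /* arena this thread used most recently */

/* Returns a locked arena, or NULL if every arena is currently busy. */
arena *arena_acquire(void) {
    for (size_t k = 0; k < NARENAS; k++) {
        size_t i = (last_arena + k) % NARENAS;
        /* test_and_set returns the previous state: false means we took the lock */
        if (!atomic_flag_test_and_set_explicit(&arenas[i].lock, memory_order_acquire)) {
            last_arena = i;
            return &arenas[i];
        }
    }
    return NULL;                       /* a real allocator might create a new arena */
}

void arena_release(arena *a) {
    assert(a >= arenas && a < arenas + NARENAS);
    atomic_flag_clear_explicit(&a->lock, memory_order_release);
}
\end{verbatim}

With this policy, a thread keeps reusing the same arena while it is uncontended, which tends to keep that thread's data in one heap.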
\paragraph{rpmalloc}
rpmalloc (FIX ME: cite allocator) is a thread-safe allocator for multi-threaded programs that uses per-thread heaps. Each heap has multiple size classes, and each size class contains memory regions of the corresponding size.

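How request sizes map onto size classes can be shown with a two-tier table: fine spacing for small requests and coarser spacing for larger ones, which keeps the number of classes small while bounding the rounding waste within each tier. The 16-byte and 512-byte spacings, the limits, and the names \texttt{size\_class} and \texttt{class\_size} are hypothetical and are not taken from rpmalloc.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-tier size-class table: 16-byte spacing up to 1024 bytes,
   then 512-byte spacing up to 32 KiB. Constants are illustrative only. */
#define SMALL_GRAIN  16
#define SMALL_MAX    1024
#define MEDIUM_GRAIN 512
#define MEDIUM_MAX   (32 * 1024)
#define NSMALL       (SMALL_MAX / SMALL_GRAIN)   /* classes 16, 32, ..., 1024 */

/* Map a request size to its size-class index. */
size_t size_class(size_t size) {
    assert(size > 0 && size <= MEDIUM_MAX);
    if (size <= SMALL_MAX)
        return (size + SMALL_GRAIN - 1) / SMALL_GRAIN - 1;           /* 0 .. 63 */
    return NSMALL + (size - SMALL_MAX + MEDIUM_GRAIN - 1) / MEDIUM_GRAIN - 1;
}

/* Size of the regions that serve a given class. */
size_t class_size(size_t cls) {
    if (cls < NSMALL)
        return (cls + 1) * SMALL_GRAIN;
    return SMALL_MAX + (cls - NSMALL + 1) * MEDIUM_GRAIN;
}
\end{verbatim}

For example, a 100-byte request falls in the 112-byte class (12 bytes of internal fragmentation), while a 1025-byte request falls in the 1536-byte class.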
\paragraph{tbbmalloc}
tbbmalloc (FIX ME: cite allocator) is a thread-safe allocator for multi-threaded programs that uses a private heap for each thread. Each private heap has multiple bins of different sizes, and each bin contains free regions of the same size.

\paragraph{tcmalloc}
tcmalloc (FIX ME: cite allocator) is a thread-safe allocator. It uses a per-thread cache of free objects, which avoids contention on shared resources in multi-threaded applications. When a thread's cache becomes empty, it is refilled from a central free list.

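The per-thread cache and its refill path can be sketched as follows for a single object size. The batch size, the names \texttt{tc\_alloc}, \texttt{tc\_free}, and \texttt{refill\_cache}, and the policy of moving a whole batch per refill are assumptions. The cache lives in C11 thread-local storage (\texttt{\_Thread\_local}), while the central list, which a real allocator would protect with a lock, is left unsynchronized because the sketch is single-threaded.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sketch of a per-thread cache refilled in batches from a central free list
   (one object size). Hypothetical names; the central list would need a lock
   in a real multi-threaded allocator. */
#define OBJ_SIZE 32
#define BATCH    8

typedef struct object { struct object *next; } object;

static object *central_list;                 /* shared by all threads */
static _Thread_local object *thread_cache;   /* private to each thread */
static _Thread_local size_t cache_count;

static void central_grow(void) {             /* carve fresh objects from the system */
    unsigned char *mem = malloc(BATCH * OBJ_SIZE);
    if (mem == NULL) return;
    for (size_t i = 0; i < BATCH; i++) {
        object *o = (object *)(mem + i * OBJ_SIZE);
        o->next = central_list;
        central_list = o;
    }
}

static void refill_cache(void) {             /* move up to BATCH objects at once */
    if (central_list == NULL) central_grow();
    while (central_list != NULL && cache_count < BATCH) {
        object *o = central_list;
        central_list = o->next;
        o->next = thread_cache;
        thread_cache = o;
        cache_count++;
    }
}

void *tc_alloc(void) {
    if (thread_cache == NULL) refill_cache(); /* only the slow path touches shared state */
    object *o = thread_cache;
    if (o != NULL) {
        thread_cache = o->next;
        cache_count--;
    }
    return o;
}

void tc_free(void *p) {                       /* frees go to the local cache */
    assert(p != NULL);
    object *o = p;
    o->next = thread_cache;
    thread_cache = o;
    cache_count++;
}
\end{verbatim}

Only \texttt{refill\_cache} touches the central list, so in this model a thread reaches shared state once per batch of allocations rather than on every call.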
\section{Memory Allocators}
In addition to our stand-alone memory allocator, uHeapLmmm, these experiments use the seven memory allocators listed below.

\begin{tabularx}{0.8\textwidth} {
  | >{\raggedright\arraybackslash}X
  | >{\centering\arraybackslash}X
  | >{\raggedleft\arraybackslash}X | }
\hline
Memory Allocator & Version & Configurations \\
\hline
dlmalloc & & \\
\hline
Hoard & & \\
\hline
jemalloc & & \\
\hline
ptmalloc3 & & \\
\hline
rpmalloc & & \\
\hline
tbbmalloc & & \\
\hline
tcmalloc & & \\
\hline
\end{tabularx}

%(FIX ME: complete table)

\section{Experiment Environment}
We conducted these experiments ... (FIX ME: what machine and which specifications to add).

We used our micro-benchmark suite (FIX ME: cite mbench) to evaluate the other memory allocators (FIX ME: cite above memory allocators) and our uHeapLmmm.

\section{Results}

\subsection{Memory Benchmark}
FIX ME: add experiment, knobs, graphs, and description

\subsection{Speed Benchmark}
FIX ME: add experiment, knobs, graphs, and description

\subsubsection{Speed Time}
FIX ME: add experiment, knobs, graphs, and description

\subsubsection{Speed Workload}
FIX ME: add experiment, knobs, graphs, and description

\subsection{Cache Scratch}
FIX ME: add experiment, knobs, graphs, and description

\subsubsection{Cache Scratch Time}
FIX ME: add experiment, knobs, graphs, and description

\subsubsection{Cache Scratch Layout}
FIX ME: add experiment, knobs, graphs, and description

\subsection{Cache Thrash}
FIX ME: add experiment, knobs, graphs, and description

\subsubsection{Cache Thrash Time}
FIX ME: add experiment, knobs, graphs, and description

\subsubsection{Cache Thrash Layout}
FIX ME: add experiment, knobs, graphs, and description