source: doc/theses/mubeen_zulfiqar_MMath/intro.tex @ c3e41cda

Last change on this file since c3e41cda was 29d8c02, checked in by m3zulfiq <m3zulfiq@…>, 2 years ago

made corrections in thesis as per feedback from the readers

1\chapter{Introduction}
2
3% Shared-memory multi-processor computers are ubiquitous and important for improving application performance.
4% However, writing programs that take advantage of multiple processors is not an easy task~\cite{Alexandrescu01b}, \eg shared resources can become a bottleneck when increasing (scaling) threads.
5% One crucial shared resource is program memory, since it is used by all threads in a shared-memory concurrent-program~\cite{Berger00}.
6% Therefore, providing high-performance, scalable memory-management is important for virtually all shared-memory multi-threaded programs.
7
8\vspace*{-23pt}
9Memory management takes a sequence of program-generated allocation/deallocation requests and attempts to satisfy them within a fixed-sized block of memory while minimizing the total amount of memory used.
10A general-purpose dynamic-allocation algorithm cannot anticipate future allocation requests, so its output is rarely optimal.
11However, memory allocators do take advantage of regularities in allocation patterns for typical programs to produce excellent results, both in time and space (similar to LRU paging).
12In general, allocators use a number of similar techniques, each optimizing specific allocation patterns.
13Nevertheless, memory allocators are a series of compromises, occasionally with some static or dynamic tuning parameters to optimize specific program-request patterns.
14
15
16\section{Memory Structure}
17\label{s:MemoryStructure}
18
19\VRef[Figure]{f:ProgramAddressSpace} shows the typical layout of a program's address space divided into the following zones (right to left): static code/data, dynamic allocation, dynamic code/data, and stack, with free memory surrounding the dynamic code/data~\cite{memlayout}.
20Static code and data are placed into memory at load time from the executable and are fixed-sized at runtime.
21Dynamic-allocation memory starts empty and grows/shrinks as the program dynamically creates/deletes variables with independent lifetime.
22The programming-language's runtime manages this area, where management complexity is a function of the mechanism for deleting variables.
23Dynamic code/data memory is managed by the dynamic loader for libraries loaded at runtime, which is complex, especially in a multi-threaded program~\cite{Huang06}.
24However, changes to the dynamic code/data space are typically infrequent, many occurring at program startup, and are largely outside of a program's control.
25Stack memory is managed by the program call-mechanism using a simple LIFO technique, which works well for sequential programs.
26For multi-threaded programs (and coroutines), a new stack is created for each thread;
27these thread stacks are commonly created in dynamic-allocation memory.
28This thesis focuses on management of the dynamic-allocation memory.
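
The zones described above can be observed from a running program.
The following is a minimal sketch, not part of the allocator: the helper name @show_zones@ is hypothetical, and the printed addresses and their relative order depend on the platform and on address-space randomization.

```c
#include <stdio.h>
#include <stdlib.h>

static int static_datum = 42;   // lives in the static code/data zone

// Hypothetical helper: print one address from each of three zones.
// Returns 0 on success, -1 if the heap allocation fails.
int show_zones(FILE *out) {
    int stack_datum = 0;                            // lives in this function's stack frame
    int *heap_datum = malloc(sizeof *heap_datum);   // lives in dynamic-allocation memory
    if (heap_datum == NULL) return -1;
    fprintf(out, "static %p\n", (void *)&static_datum);
    fprintf(out, "heap   %p\n", (void *)heap_datum);
    fprintf(out, "stack  %p\n", (void *)&stack_datum);
    free(heap_datum);
    return 0;
}
```

Calling @show_zones( stdout )@ prints three addresses that typically fall in widely separated ranges, one per zone.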
29
30\begin{figure}
31\centering
32\input{AddressSpace}
33\vspace{-5pt}
34\caption{Program Address Space Divided into Zones}
35\label{f:ProgramAddressSpace}
36\end{figure}
37
38
39\section{Dynamic Memory-Management}
40\label{s:DynamicMemoryManagement}
41
42Modern programming languages manage dynamic-allocation memory in different ways.
43Some languages, such as Lisp~\cite{CommonLisp}, Java~\cite{Java}, Haskell~\cite{Haskell}, and Go~\cite{Go}, provide explicit allocation but \emph{implicit} deallocation of data through garbage collection~\cite{Wilson92}.
44In general, garbage collection supports memory compaction, where dynamic (live) data is moved during runtime to better utilize space.
45However, moving data requires finding pointers to it and updating them to reflect new data locations.
46Programming languages such as C~\cite{C}, \CC~\cite{C++}, and Rust~\cite{Rust} provide the programmer with explicit allocation \emph{and} deallocation of data.
47These languages cannot find and subsequently move live data because pointers can be created to any storage zone, including internal components of allocated objects, and may contain temporary invalid values generated by pointer arithmetic.
48Attempts have been made to perform quasi garbage collection in C/\CC~\cite{Boehm88}, but it is a compromise.
49This thesis only examines dynamic memory-management with \emph{explicit} deallocation.
50While garbage collection and compaction are not part of this work, many of the work's results are applicable to the allocation phase in any memory-management approach.
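
The obstacle to moving live data in C can be illustrated with a short sketch.
The type @Node@ and the function @interior_pointer_demo@ are hypothetical names chosen for illustration.
The pointer @inner@ refers to the middle of an allocated object, and @past@ is an address produced by pointer arithmetic that refers to no element at all; a runtime that moved the object would have to find and correct both.

```c
#include <stdlib.h>

struct Node { int key; int payload[4]; };

// Returns 1 if the interior pointer reads back the value written through it.
int interior_pointer_demo(void) {
    struct Node *n = malloc(sizeof *n);     // explicit allocation
    if (n == NULL) return 0;
    int *inner = &n->payload[2];            // pointer into the middle of the object
    int *past = n->payload + 4;             // one-past-the-end: legal to form, not to dereference
    *inner = 7;
    int ok = (past - inner == 2) && (n->payload[2] == 7);
    free(n);                                // explicit deallocation
    return ok;
}
```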
51
52Most programs use a general-purpose allocator, often the one provided implicitly by the programming-language's runtime.
53When this allocator proves inadequate, programmers often write specialized allocators for specific needs.
54C and \CC allow easy replacement of the default memory allocator with an alternative specialized or general-purpose memory-allocator.
55Jikes RVM MMTk~\cite{MMTk} provides a similar generalization for the Java virtual machine.
56However, high-performance memory-allocators for kernel and user multi-threaded programs are still being designed and improved.
57For this reason, several alternative general-purpose allocators have been written for C/\CC with the goal of scaling in a multi-threaded program~\cite{Berger00,mtmalloc,streamflow,tcmalloc}.
58This thesis examines the design of high-performance allocators for use by kernel and user multi-threaded applications written in C/\CC.
59
60
61\section{Contributions}
62\label{s:Contributions}
63
64This work provides the following contributions in the area of concurrent dynamic allocation:
65\begin{enumerate}[leftmargin=*]
66\item
67Implementation of a new stand-alone concurrent low-latency memory-allocator ($\approx$1,200 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions of the allocator for the programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading).
68
69\item
70Extend the standard C heap functionality by preserving with each allocation:
71\begin{itemize}[itemsep=0pt]
72\item
73its request size plus the amount allocated,
74\item
75whether an allocation is zero fill,
76\item
77and allocation alignment.
78\end{itemize}
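
One possible way to preserve such properties is a small header placed in front of each allocation.
The following is a minimal sketch layered over the standard @malloc@, assuming a header layout invented for illustration; it is not the layout used by the allocator in this thesis, and the alignment property is omitted for brevity.
All names beginning with @sketch_@ and the type @SketchHeader@ are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical per-allocation header (illustration only).
typedef union {
    struct { size_t request; bool zero_fill; } info;
    max_align_t pad;    // pads the header so the user data stays suitably aligned
} SketchHeader;

// Allocate size bytes and remember the request size and zero-fill property.
void *sketch_alloc(size_t size, bool zero_fill) {
    if (size > SIZE_MAX - sizeof(SketchHeader)) return NULL;   // header + size would overflow
    SketchHeader *h = malloc(sizeof *h + size);
    if (h == NULL) return NULL;
    h->info.request = size;
    h->info.zero_fill = zero_fill;
    if (zero_fill) memset(h + 1, 0, size);
    return h + 1;       // user data starts right after the header
}

// Query the preserved properties of an allocation made by sketch_alloc.
size_t sketch_size(const void *addr) {
    return ((const SketchHeader *)addr - 1)->info.request;
}

bool sketch_zero_fill(const void *addr) {
    return ((const SketchHeader *)addr - 1)->info.zero_fill;
}

void sketch_free(void *addr) {
    if (addr != NULL) free((SketchHeader *)addr - 1);
}
```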
79
80\item
81Use the preserved zero fill and alignment as \emph{sticky} properties for @realloc@ to zero-fill and align when storage is extended or copied.
82Without this extension, it is unsafe to @realloc@ storage initially allocated with zero-fill/alignment as these properties are not preserved when copying.
83This silent generation of a problem is unintuitive to programmers and difficult to locate because it is transient.
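
The zero-fill half of the problem can be sketched with the standard library alone.
Because the standard @realloc@ does not record the original request size or the zero-fill property, the hypothetical helper @sketch_realloc_zero@ below requires the caller to supply the old size; an allocator that preserves these properties needs no such argument.
The sketch assumes @new_size@ is greater than zero.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: resize a zero-filled allocation and keep it zero filled.
// old_size must be the size previously requested for addr.
void *sketch_realloc_zero(void *addr, size_t old_size, size_t new_size) {
    unsigned char *p = realloc(addr, new_size);
    if (p == NULL) return NULL;     // on failure the original block is still valid
    if (new_size > old_size)
        memset(p + old_size, 0, new_size - old_size);   // zero only the extension
    return p;
}
```

With a plain @realloc@, the bytes beyond the old size are indeterminate, so a program that relies on the storage remaining zero filled fails only when those bytes happen to be non-zero.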
84
85\item
86Provide additional heap operations to complete programmer expectation with respect to accessing different allocation properties.
87\begin{itemize}
88\item
89@resize( oaddr, size )@ re-purpose an old allocation for a new type \emph{without} preserving fill or alignment.
90\item
91@resize( oaddr, alignment, size )@ re-purpose an old allocation with new alignment but \emph{without} preserving fill.
92\item
93@realloc( oaddr, alignment, size )@ same as @realloc@ but adding or changing alignment.
94\item
95@aalloc( dim, elemSize )@ same as @calloc@ except memory is \emph{not} zero filled.
96\item
97@amemalign( alignment, dim, elemSize )@ same as @aalloc@ with memory alignment.
98\item
99@cmemalign( alignment, dim, elemSize )@ same as @calloc@ with memory alignment.
100\end{itemize}
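
The intended semantics of two of these operations can be approximated on top of standard calls.
The following is a minimal sketch, assuming POSIX @posix_memalign@ is available; the functions @sketch_aalloc@ and @sketch_cmemalign@ are hypothetical stand-ins and not the allocator's implementation.

```c
#define _POSIX_C_SOURCE 200112L     // request the posix_memalign declaration
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Sketch of aalloc: array allocation like calloc, but without zero fill.
void *sketch_aalloc(size_t dim, size_t elem_size) {
    if (elem_size != 0 && dim > SIZE_MAX / elem_size) return NULL;  // dim * elem_size would overflow
    return malloc(dim * elem_size);
}

// Sketch of cmemalign: aligned array allocation with zero fill.
// posix_memalign requires a power-of-two alignment that is a multiple of sizeof(void *).
void *sketch_cmemalign(size_t alignment, size_t dim, size_t elem_size) {
    if (elem_size != 0 && dim > SIZE_MAX / elem_size) return NULL;
    void *p = NULL;
    if (posix_memalign(&p, alignment, dim * elem_size) != 0) return NULL;
    if (p != NULL) memset(p, 0, dim * elem_size);
    return p;
}
```

Both sketches return storage that is released with the standard @free@.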
101
102\item
103Provide additional heap wrapper functions in \CFA creating a more usable set of allocation operations and properties.
104
105\item
106Provide additional query operations to access information about an allocation:
107\begin{itemize}
108\item
109@malloc_alignment( addr )@ returns the alignment of the allocation pointed-to by @addr@.
110If the allocation is not aligned or @addr@ is @NULL@, the minimal alignment is returned.
111\item
112@malloc_zero_fill( addr )@ returns a boolean result indicating if the memory pointed-to by @addr@ is allocated with zero fill, e.g., by @calloc@/@cmemalign@.
113\item
114@malloc_size( addr )@ returns the size of the memory allocation pointed-to by @addr@.
115\item
116@malloc_usable_size( addr )@ returns the usable (total) size of the memory pointed-to by @addr@, i.e., the bin size containing the allocation, where @malloc_size( addr )@ $\le$ @malloc_usable_size( addr )@.
117\end{itemize}
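
Of these queries, only @malloc_usable_size@ is available in the GNU C Library; the others are extensions of this work.
The following sketch, assuming glibc's @<malloc.h>@, shows the relationship between the requested and usable sizes; the function name @usable_slack@ is hypothetical.

```c
#include <malloc.h>     // glibc-specific: declares malloc_usable_size
#include <stdlib.h>

// Returns the slack (usable minus requested bytes) for one allocation, or -1 on failure.
long usable_slack(size_t request) {
    void *p = malloc(request);
    if (p == NULL) return -1;
    size_t usable = malloc_usable_size(p);  // never less than the request
    free(p);
    return (long)(usable - request);
}
```

A non-zero result indicates the allocator rounded the request up, e.g., to its bin size.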
118
119\item
120Provide complete, fast, and contention-free allocation statistics to help understand allocation behaviour:
121\begin{itemize}
122\item
123@malloc_stats()@ print memory-allocation statistics on the file-descriptor set by @malloc_stats_fd@.
124\item
125@malloc_info( options, stream )@ print memory-allocation statistics as an XML string on the specified file-descriptor set by @malloc_stats_fd@.
126\item
127@malloc_stats_fd( fd )@ set file-descriptor number for printing memory-allocation statistics (default @STDERR_FILENO@).
128This file descriptor is used implicitly by @malloc_stats@ and @malloc_info@.
129\end{itemize}
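
For comparison, the GNU C Library provides @malloc_stats@ and @malloc_info@ with similar names but no @malloc_stats_fd@.
The following sketch, assuming glibc, shows how its versions are called; it does not show the output format or behaviour of the allocator in this thesis, and the function name @dump_allocator_state@ is hypothetical.

```c
#include <malloc.h>     // glibc-specific: declares malloc_stats and malloc_info
#include <stdio.h>
#include <stdlib.h>

// Returns 0 if the XML report was written, non-zero otherwise.
int dump_allocator_state(FILE *stream) {
    void *p = malloc(1000);             // ensure the heap has been used
    malloc_stats();                     // glibc: human-readable summary on standard error
    int rc = malloc_info(0, stream);    // glibc: XML report; options must be 0
    free(p);
    return rc;
}
```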
130
131\item
132Provide extensive runtime checks to validate allocation operations and identify the amount of unfreed storage at program termination.
133
134\item
135Build 4 different versions of the allocator:
136\begin{itemize}
137\item
138static or dynamic linking
139\item
140statistic/debugging (testing) or no statistic/debugging (performance)
141\end{itemize}
142A program may link to any of these 4 versions of the allocator often without recompilation.
143(It is possible to separate statistics and debugging, giving 8 different versions.)
144
145\item
146Provide a micro-benchmark test-suite for comparing allocators rather than relying on a suite of arbitrary programs.
147These micro-benchmarks have adjustment knobs to simulate allocation patterns hard-coded into arbitrary test programs.
148\end{enumerate}
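
The idea of a micro-benchmark with adjustment knobs can be sketched as follows.
This is a hypothetical example and not one of the benchmarks in the suite: the type @ChurnKnobs@, its fields, and the function @run_churn@ are invented for illustration, and timing is omitted.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical knobs controlling the simulated allocation pattern.
typedef struct {
    size_t live_objects;    // number of allocations kept alive at once
    size_t min_size;        // smallest request in bytes
    size_t max_size;        // largest request in bytes
    size_t rounds;          // number of free/allocate replacements
    unsigned seed;          // fixed seed for a reproducible request sequence
} ChurnKnobs;

// Repeatedly replaces a random live object with a new one of random size.
// Returns the number of successful allocations, or 0 if the knobs are invalid.
size_t run_churn(ChurnKnobs k) {
    if (k.live_objects == 0 || k.max_size < k.min_size) return 0;
    void **slots = calloc(k.live_objects, sizeof *slots);
    if (slots == NULL) return 0;
    size_t done = 0;
    srand(k.seed);
    for (size_t i = 0; i < k.rounds; i++) {
        size_t slot = (size_t)rand() % k.live_objects;
        size_t size = k.min_size + (size_t)rand() % (k.max_size - k.min_size + 1);
        free(slots[slot]);                      // free(NULL) is a no-op for empty slots
        slots[slot] = malloc(size);
        if (slots[slot] != NULL) {
            memset(slots[slot], 0xA5, size);    // touch the storage so it is really used
            done++;
        }
    }
    for (size_t i = 0; i < k.live_objects; i++) free(slots[i]);
    free(slots);
    return done;
}
```

Changing the knobs, rather than the program, varies the object-size distribution and the amount of live storage, so the same code can be linked against different allocators for comparison.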
149
150\begin{comment}
151\noindent
152====================
153
154Writing Points:
155\begin{itemize}
156\item
157Introduce dynamic memory allocation with brief background.
158\item
159Scope of the thesis.
160\item
161Importance of memory allocation and micro-benchmark suite.
162\item
163Research problem.
164\item
165Research objectives.
166\item
167The vision behind cfa-malloc.
168\item
169An outline of the thesis.
170\end{itemize}
171
172\noindent
173====================
174
175\section{Introduction}
176Dynamic memory allocation and management is one of the core features of C. It gives the programmer the freedom to allocate, free, use, and manage dynamic memory directly. The programmer is not given complete control of dynamic memory management; instead, the memory allocator provides an interface that is used to allocate/free dynamic memory for the application's use.
177
178A memory allocator is a layer between the programmer and the system. The allocator gets dynamic memory from the system in the heap/mmap area of application storage and manages it for the programmer's use.
179
180GNU C Library (FIX ME: cite this) provides an interchangeable memory allocator that can be replaced with a custom memory allocator that supports required features and fulfills an application's custom needs. It also allows others to innovate in memory allocation and design their own memory allocator. GNU C Library has set guidelines that should be followed when designing a stand-alone memory allocator. GNU C Library requires new memory allocators to have at least the following set of functions in their allocator's interface:
181
182\begin{itemize}
183\item
184malloc: it allocates and returns a chunk of dynamic memory of requested size (FIX ME: cite man page).
185\item
186calloc: it allocates and returns an array in dynamic memory of requested size (FIX ME: cite man page).
187\item
188realloc: it reallocates and returns an already allocated chunk of dynamic memory to a new size (FIX ME: cite man page).
189\item
190free: it frees an already allocated piece of dynamic memory (FIX ME: cite man page).
191\end{itemize}
192
193In addition to the above functions, GNU C Library also provides some more functions to increase the usability of the dynamic memory allocator. Most stand-alone allocators also provide all or some of the following additional functions.
194
195\begin{itemize}
196\item
197aligned\_alloc
198\item
199malloc\_usable\_size
200\item
201memalign
202\item
203posix\_memalign
204\item
205pvalloc
206\item
207valloc
208\end{itemize}
209
210With the rise of concurrent applications, memory allocators should be able to fulfill dynamic memory requests from multiple threads in parallel without causing contention on shared resources. There needs to be a set of standard benchmarks that can be used to evaluate an allocator's performance in different scenarios.
211
212\section{Research Objectives}
213Our research objective in this thesis is to:
214
215\begin{itemize}
216\item
217Design a lightweight concurrent memory allocator with added features and usability that are currently not present in the other memory allocators.
218\item
219Design a suite of benchmarks to evaluate multiple aspects of a memory allocator.
220\end{itemize}
221
222\section{An outline of the thesis}
223LAST FIX ME: add outline at the end
224\end{comment}