Changes in / [11f92fac:de8a0a4b]


Ignore:
Files:
1 deleted
4 edited

Legend:

Unmodified
Added
Removed
  • doc/bibliography/pl.bib

    r11f92fac rde8a0a4b  
    1 
    21% Conventions: uncross-referenced entries appear first, then
    32%    cross-referenced entries.  In both groups, entries are sorted by their
     
    39313930
    39323931@article{Boehm88,
    3933     keywords    = {conservative garbage collection, C},
     3932    keywords    = {conservative garbage collection, C, storage management, debugging},
    39343933    contributer = {gjditchfield@plg},
    39353934    author      = {Hans-Juergen Boehm and Mark Weiser},
  • doc/papers/llheap/Paper.tex

    r11f92fac rde8a0a4b  
    187187\author[1]{Peter A. Buhr*}
    188188\author[2]{Bryan Chan}
     189\author[3]{Dave Dice}
    189190\authormark{ZULFIQAR \textsc{et al.}}
    190191
    191192\address[1]{\orgdiv{Cheriton School of Computer Science}, \orgname{University of Waterloo}, \orgaddress{\state{Waterloo, ON}, \country{Canada}}}
    192193\address[2]{\orgdiv{Huawei Compiler Lab}, \orgname{Huawei}, \orgaddress{\state{Markham, ON}, \country{Canada}}}
     194\address[3]{\orgdiv{Oracle Labs}, \orgname{Oracle}, \orgaddress{\state{Burlington, MA}, \country{USA}}}
     195
    193196
    194197\corres{*Peter A. Buhr, Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada. \email{pabuhr{\char`\@}uwaterloo.ca}}
     
    201204llheap extends the feature set of existing C allocation by remembering zero-filled (\lstinline{calloc}) and aligned properties (\lstinline{memalign}) in an allocation.
    202205These properties can be queried, allowing programmers to write safer programs by preserving these properties in future allocations.
    203 As well, \lstinline{realloc} preserves these properties when adjusting storage size, again increasing future allocation safety.
     206As well, \lstinline{realloc}/\lstinline{reallocarray} preserve these properties when adjusting storage size, again increasing future allocation safety.
    204207llheap also extends the C allocation API with \lstinline{aalloc}, \lstinline{amemalign}, \lstinline{cmemalign}, \lstinline{resize}, and extended \lstinline{realloc}, providing orthogonal access to allocation features;
    205208hence, programmers do have to code missing combinations.
     
    226229\section{Introduction}
    227230
    228 Memory management services a series of program allocation/deallocation requests and attempts to satisfy them from a variable-sized block of memory, while minimizing total memory usage.
    229 A general-purpose dynamic-allocation algorithm cannot anticipate allocation requests so its time and space performance is rarely optimal.
    230 However, allocators take advantage of regular allocation patterns in typical programs to produce excellent results, both in time and space (similar to LRU paging).
    231 Allocators use a number of similar techniques, but each optimizes specific allocation patterns.
    232 Nevertheless, allocators are a series of compromises, occasionally with some static or dynamic tuning parameters to optimize specific program-request patterns.
     231Memory management services a series of program allocation/deallocation requests and attempts to satisfy them from a variable-sized block(s) of memory, while minimizing total memory usage.
     232A general-purpose dynamic-allocation algorithm cannot anticipate allocation requests so its time and space performance is rarely optimal (bin packing).
     233However, allocators take advantage of allocation patterns in typical programs (heuristics) to produce excellent results, both in time and space (similar to LRU paging).
     234Allocators use similar techniques, but each optimizes specific allocation patterns.
     235Nevertheless, allocators are a series of compromises, occasionally with some static or dynamic tuning parameters to optimize specific request patterns.
    233236
    234237
     
    283286\begin{enumerate}[leftmargin=*,itemsep=0pt]
    284287\item
    285 Implementation of a new stand-alone concurrent low-latency memory-allocator ($\approx$1,200 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions for the concurrent languages \uC~\cite{uC++} and \CFA~\cite{Moss18,Delisle21} using user-level threads running on multiple kernel threads (M:N threading).
     288Implementation of a new stand-alone concurrent low-latency memory-allocator ($\approx$1,400 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions for the concurrent languages \uC~\cite{uC++} and \CFA~\cite{Moss18,Delisle21} using user-level threads running on multiple kernel threads (M:N threading).
    286289
    287290\item
     
    289292
    290293\item
    291 Use the preserved zero fill and alignment as \emph{sticky} properties for @realloc@ to zero-fill and align when storage is extended or copied.
     294Use the preserved zero fill and alignment as \emph{sticky} properties for @realloc@ and @reallocarray@ to zero-fill and align when storage is extended or copied.
    292295Without this extension, it is unsafe to @realloc@ storage these allocations if the properties are not preserved when copying.
    293296This silent problem is unintuitive to programmers and difficult to locate because it is transient.
     
    295298\item
    296299Provide additional heap operations to make allocation properties orthogonally accessible.
    297 \begin{itemize}[topsep=2pt,itemsep=2pt,parsep=0pt]
    298 \item
    299 @aalloc( dim, elemSize )@ same as @calloc@ except memory is \emph{not} zero filled.
    300 \item
    301 @amemalign( alignment, dim, elemSize )@ same as @aalloc@ with memory alignment.
    302 \item
    303 @cmemalign( alignment, dim, elemSize )@ same as @calloc@ with memory alignment.
     300\begin{itemize}[topsep=0pt,itemsep=0pt,parsep=0pt]
     301\item
     302@aalloc( dimension, elemSize )@ same as @calloc@ except memory is \emph{not} zero filled, which is significantly faster than @calloc@.
     303\item
     304@amemalign( alignment, dimension, elemSize )@ same as @aalloc@ with memory alignment.
     305\item
     306@cmemalign( alignment, dimension, elemSize )@ same as @calloc@ with memory alignment.
    304307\item
    305308@resize( oaddr, size )@ re-purpose an old allocation for a new type \emph{without} preserving fill or alignment.
    306309\item
    307 @resize( oaddr, alignment, size )@ re-purpose an old allocation with new alignment but \emph{without} preserving fill.
    308 \item
    309 @realloc( oaddr, alignment, size )@ same as @realloc@ but adding or changing alignment.
     310@aligned_resize( oaddr, alignment, size )@ re-purpose an old allocation with new alignment but \emph{without} preserving fill.
     311\item
     312@aligned_realloc( oaddr, alignment, size )@ same as @realloc@ but adding or changing alignment.
     313\item
     314@aligned_reallocarray( oaddr, alignment, dimension, elemSize )@ same as @reallocarray@ but adding or changing alignment.
    310315\end{itemize}
    311316
    312317\item
    313318Provide additional query operations to access information about an allocation:
    314 \begin{itemize}[topsep=3pt,itemsep=2pt,parsep=0pt]
     319\begin{itemize}[topsep=0pt,itemsep=0pt,parsep=0pt]
    315320\item
    316321@malloc_alignment( addr )@ returns the alignment of the allocation.
    317322If the allocation is not aligned or @addr@ is @NULL@, the minimal alignment is returned.
    318323\item
    319 @malloc_zero_fill( addr )@ returns a boolean result indicating if the memory is allocated with zero fill, e.g., by @calloc@/@cmemalign@.
     324@malloc_zero_fill( addr )@ returns a boolean result indicating if the memory is allocated with zero fill, \eg by @calloc@/@cmemalign@.
    320325\item
    321326@malloc_size( addr )@ returns the size of the memory allocation.
    322327\item
    323 @malloc_usable_size( addr )@ returns the usable (total) size of the memory, i.e., the bin size containing the allocation, where @malloc_size( addr )@ $\le$ @malloc_usable_size( addr )@.
     328@malloc_usable_size( addr )@ returns the usable (total) size of the memory, \ie the bin size containing the allocation, where @malloc_size( addr )@ $\le$ @malloc_usable_size( addr )@.
    324329\end{itemize}
    325330
    326331\item
    327332Provide optional extensive, fast, and contention-free allocation statistics to understand allocation behaviour, accessed by:
    328 \begin{itemize}[topsep=3pt,itemsep=2pt,parsep=0pt]
     333\begin{itemize}[topsep=0pt,itemsep=0pt,parsep=0pt]
    329334\item
    330335@malloc_stats()@ print memory-allocation statistics on the file-descriptor set by @malloc_stats_fd@ (default @stderr@).
     
    359364The management goals are to make allocation/deallocation operations as fast as possible while densely packing objects to make efficient use of memory.
    360365Since objects in C/\CC cannot be moved to aid the packing process, only adjacent free storage can be \newterm{coalesced} into larger free areas.
    361 The allocator grows or shrinks the dynamic-allocation zone to obtain storage for objects and reduce memory usage via operating-system calls, such as @mmap@ or @sbrk@ in UNIX.
     366The allocator grows or shrinks the dynamic-allocation zone to obtain storage for objects and reduce memory usage via OS calls, such as @mmap@ or @sbrk@ in UNIX.
    362367
    363368
     
    984989That is, rather than requesting new storage for a single object, an entire buffer is requested from which multiple objects are allocated later.
    985990Any heap may use an allocation buffer, resulting in allocation from the buffer before requesting objects (containers) from the global heap or OS, respectively.
    986 The allocation buffer reduces contention and the number of global/operating-system calls.
     991The allocation buffer reduces contention and the number of global/OS calls.
    987992For coalescing, a buffer is split into smaller objects by allocations, and recomposed into larger buffer areas during deallocations.
    988993
     
    10211026
    10221027
    1023 \section{Allocator}
    1024 \label{c:Allocator}
    1025 
    1026 This section presents a new stand-alone concurrent low-latency memory-allocator ($\approx$1,200 lines of code), called llheap (low-latency heap), for C/\CC programs using kernel threads (1:1 threading), and specialized versions of the allocator for the programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading).
    1027 The new allocator fulfills the GNU C Library allocator API~\cite{GNUallocAPI}.
    1028 
    1029 
    1030 \subsection{llheap}
    1031 
    1032 The primary design objective for llheap is low-latency across all allocator calls independent of application access-patterns and/or number of threads, \ie very seldom does the allocator have a delay during an allocator call.
     1028\section{llheap}
     1029
     1030This section presents our new stand-alone, concurrent, low-latency memory-allocator, called llheap (low-latency heap), fulfilling the GNU C Library allocator API~\cite{GNUallocAPI} for C/\CC programs using kernel threads (1:1 threading), with specialized versions for the programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading).
     1031The primary design objective for llheap is low-latency across all allocator calls independent of application access-patterns and/or number of threads, \ie very seldom does the allocator delay during an allocator call.
    10331032Excluded from the low-latency objective are (large) allocations requiring initialization, \eg zero fill, and/or data copying, which are outside the allocator's purview.
    10341033A direct consequence of this objective is very simple or no storage coalescing;
    10351034hence, llheap's design is willing to use more storage to lower latency.
    1036 This objective is apropos because systems research and industrial applications are striving for low latency and computers have huge amounts of RAM memory.
     1035This objective is apropos because systems research and industrial applications are striving for low latency and modern computers have huge amounts of RAM memory.
    10371036Finally, llheap's performance should be comparable with the current best allocators, both in space and time (see performance comparison in Section~\ref{c:Performance}).
    10381037
    1039 % The objective of llheap's new design was to fulfill following requirements:
    1040 % \begin{itemize}
    1041 % \item It should be concurrent and thread-safe for multi-threaded programs.
    1042 % \item It should avoid global locks, on resources shared across all threads, as much as possible.
    1043 % \item It's performance (FIX ME: cite performance benchmarks) should be comparable to the commonly used allocators (FIX ME: cite common allocators).
    1044 % \item It should be a lightweight memory allocator.
    1045 % \end{itemize}
    1046 
    1047 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    10481038
    10491039\subsection{Design Choices}
    10501040
    1051 % Some of the rejected designs are discussed because they show the path to the final design (see discussion in Section~\ref{s:MultipleHeaps}).
    1052 % Note, a few simple tests for a design choice were compared with the current best allocators to determine the viability of a design.
    1053 
    1054 
    1055 % \paragraph{T:1 model}
    1056 % Figure~\ref{f:T1SharedBuckets} shows one heap accessed by multiple kernel threads (KTs) using a bucket array, where smaller bucket sizes are shared among N KTs.
    1057 % This design leverages the fact that usually the allocation requests are less than 1024 bytes and there are only a few different request sizes.
    1058 % When KTs $\le$ N, the common bucket sizes are uncontented;
    1059 % when KTs $>$ N, the free buckets are contented and latency increases significantly.
    1060 % In all cases, a KT must acquire/release a lock, contented or uncontented, along the fast allocation path because a bucket is shared.
    1061 % Therefore, while threads are contending for a small number of buckets sizes, the buckets are distributed among them to reduce contention, which lowers latency;
    1062 % however, picking N is workload specific.
    1063 %
    1064 % \begin{figure}
    1065 % \centering
    1066 % \input{AllocDS1}
    1067 % \caption{T:1 with Shared Buckets}
    1068 % \label{f:T1SharedBuckets}
    1069 % \end{figure}
    1070 %
    1071 % Problems:
    1072 % \begin{itemize}
    1073 % \item
    1074 % Need to know when a KT is created/destroyed to assign/unassign a shared bucket-number from the memory allocator.
    1075 % \item
    1076 % When no thread is assigned a bucket number, its free storage is unavailable.
    1077 % \item
    1078 % All KTs contend for the global-pool lock for initial allocations, before free-lists get populated.
    1079 % \end{itemize}
    1080 % Tests showed having locks along the allocation fast-path produced a significant increase in allocation costs and any contention among KTs produces a significant spike in latency.
    1081 
    1082 % \paragraph{T:H model}
    1083 % Figure~\ref{f:THSharedHeaps} shows a fixed number of heaps (N), each a local free pool, where the heaps are sharded (distributed) across the KTs.
    1084 % A KT can point directly to its assigned heap or indirectly through the corresponding heap bucket.
    1085 % When KT $\le$ N, the heaps might be uncontented;
    1086 % when KTs $>$ N, the heaps are contented.
    1087 % In all cases, a KT must acquire/release a lock, contented or uncontented along the fast allocation path because a heap is shared.
    1088 % By increasing N, this approach reduces contention but increases storage (time versus space);
    1089 % however, picking N is workload specific.
    1090 %
    1091 % \begin{figure}
    1092 % \centering
    1093 % \input{AllocDS2}
    1094 % \caption{T:H with Shared Heaps}
    1095 % \label{f:THSharedHeaps}
    1096 % \end{figure}
    1097 %
    1098 % Problems:
    1099 % \begin{itemize}
    1100 % \item
    1101 % Need to know when a KT is created/destroyed to assign/unassign a heap from the memory allocator.
    1102 % \item
    1103 % When no thread is assigned to a heap, its free storage is unavailable.
    1104 % \item
    1105 % Ownership issues arise (see Section~\ref{s:Ownership}).
    1106 % \item
    1107 % All KTs contend for the local/global-pool lock for initial allocations, before free-lists get populated.
    1108 % \end{itemize}
    1109 % Tests showed having locks along the allocation fast-path produced a significant increase in allocation costs and any contention among KTs produces a significant spike in latency.
    1110 
    1111 % \paragraph{T:H model, H = number of CPUs}
    1112 % This design is the T:H model but H is set to the number of CPUs on the computer or the number restricted to an application, \eg via @taskset@.
    1113 % (See Figure~\ref{f:THSharedHeaps} but with a heap bucket per CPU.)
    1114 % Hence, each CPU logically has its own private heap and local pool.
    1115 % A memory operation is serviced from the heap associated with the CPU executing the operation.
    1116 % This approach removes fastpath locking and contention, regardless of the number of KTs mapped across the CPUs, because only one KT is running on each CPU at a time (modulo operations on the global pool and ownership).
    1117 % This approach is essentially an M:N approach where M is the number if KTs and N is the number of CPUs.
    1118 %
    1119 % Problems:
    1120 % \begin{itemize}
    1121 % \item
    1122 % Need to know when a CPU is added/removed from the @taskset@.
    1123 % \item
    1124 % Need a fast way to determine the CPU a KT is executing on to access the appropriate heap.
    1125 % \item
    1126 % Need to prevent preemption during a dynamic memory operation because of the \newterm{serially-reusable problem}.
    1127 % \begin{quote}
    1128 % A sequence of code that is guaranteed to run to completion before being invoked to accept another input is called serially-reusable code.~\cite{SeriallyReusable}\label{p:SeriallyReusable}
    1129 % \end{quote}
    1130 % If a KT is preempted during an allocation operation, the OS can schedule another KT on the same CPU, which can begin an allocation operation before the previous operation associated with this CPU has completed, invalidating heap correctness.
    1131 % Note, the serially-reusable problem can occur in sequential programs with preemption, if the signal handler calls the preempted function, unless the function is serially reusable.
    1132 % Essentially, the serially-reusable problem is a race condition on an unprotected critical subsection, where the OS is providing the second thread via the signal handler.
    1133 %
    1134 % Library @librseq@~\cite{librseq} was used to perform a fast determination of the CPU and to ensure all memory operations complete on one CPU using @librseq@'s restartable sequences, which restart the critical subsection after undoing its writes, if the critical subsection is preempted.
    1135 % \end{itemize}
    1136 % Tests showed that @librseq@ can determine the particular CPU quickly but setting up the restartable critical-subsection along the allocation fast-path produced a significant increase in allocation costs.
    1137 % Also, the number of undoable writes in @librseq@ is limited and restartable sequences cannot deal with user-level thread (UT) migration across KTs.
    1138 % For example, UT$_1$ is executing a memory operation by KT$_1$ on CPU$_1$ and a time-slice preemption occurs.
    1139 % The signal handler context switches UT$_1$ onto the user-level ready-queue and starts running UT$_2$ on KT$_1$, which immediately calls a memory operation.
    1140 % Since KT$_1$ is still executing on CPU$_1$, @librseq@ takes no action because it assumes KT$_1$ is still executing the same critical subsection.
    1141 % Then UT$_1$ is scheduled onto KT$_2$ by the user-level scheduler, and its memory operation continues in parallel with UT$_2$ using references into the heap associated with CPU$_1$, which corrupts CPU$_1$'s heap.
    1142 % If @librseq@ had an @rseq_abort@ which:
    1143 % \begin{enumerate}
    1144 % \item
    1145 % Marked the current restartable critical-subsection as cancelled so it restarts when attempting to commit.
    1146 % \item
    1147 % Do nothing if there is no current restartable critical subsection in progress.
    1148 % \end{enumerate}
    1149 % Then @rseq_abort@ could be called on the backside of a  user-level context-switching.
    1150 % A feature similar to this idea might exist for hardware transactional-memory.
    1151 % A significant effort was made to make this approach work but its complexity, lack of robustness, and performance costs resulted in its rejection.
    1152 
    1153 % \subsubsection{Allocation Fastpath}
    1154 % \label{s:AllocationFastpath}
    1155 
    1156 llheap's design was reviewed and changed multiple times during its development, with the final choices are discussed here.
    1157 (See~\cite{Zulfiqar22} for a discussion of alternate choices and reasons for rejecting them.)
    1158 All designs were analyzed for the allocation/free \newterm{fastpath}, \ie when an allocation can immediately return free storage or returned storage is not coalesced.
    1159 The heap model chosen is 1:1, which is the T:H model with T = H, where there is one thread-local heap for each KT.
     1041llheap's design was reviewed and changed multiple times during its development, with the final choices discussed here.
     1042All designs focused on the allocation/free \newterm{fastpath}, \ie the shortest code path for the most common operations, \eg when an allocation can immediately return free storage or returned storage is not coalesced.
     1043The model chosen is 1:1, so there is one thread-local heap for each KT.
    11601044(See Figure~\ref{f:THSharedHeaps} but with a heap bucket per KT and no bucket or local-pool lock.)
    11611045Hence, immediately after a KT starts, its heap is created and just before a KT terminates, its heap is (logically) deleted.
    1162 Heaps are uncontended for a KTs memory operations as every KT has its own thread-local heap, modulo operations on the global pool and ownership.
     1046Therefore, heaps are uncontended for a KTs memory operations as every KT has its own thread-local heap, modulo operations on the global pool and ownership.
    11631047
    11641048Problems:
     
    12051089For the T:1 and T:H models, locking must exist along the allocation fastpath because the buckets or heaps might be shared by multiple threads, even when KTs $\le$ N.
    12061090For the T:H=CPU and 1:1 models, locking is eliminated along the allocation fastpath.
    1207 However, T:H=CPU has poor operating-system support to determine the CPU id (heap id) and prevent the serially-reusable problem for KTs.
     1091However, T:H=CPU has poor OS support to determine the CPU id (heap id) and prevent the serially-reusable problem for KTs.
    12081092More OS support is required to make this model viable, but there is still the serially-reusable problem with user-level threading.
    1209 So the 1:1 model had no atomic actions along the fastpath and no special operating-system support requirements.
     1093So the 1:1 model had no atomic actions along the fastpath and no special OS support requirements.
    12101094The 1:1 model still has the serially-reusable problem with user-level threading, which is addressed in Section~\ref{s:UserlevelThreadingSupport}, and the greatest potential for heap blowup for certain allocation patterns.
    12111095
     
    12411125A primary goal of llheap is low latency, hence the name low-latency heap (llheap).
    12421126Two forms of latency are internal and external.
    1243 Internal latency is the time to perform an allocation, while external latency is time to obtain/return storage from/to the OS.
     1127Internal latency is the time to perform an allocation, while external latency is time to obtain or return storage from or to the OS.
    12441128Ideally latency is $O(1)$ with a small constant.
    12451129
    1246 To obtain $O(1)$ internal latency means no searching on the allocation fastpath and largely prohibits coalescing, which leads to external fragmentation.
    1247 The mitigating factor is that most programs have well behaved allocation patterns, where the majority of allocation operations can be $O(1)$, and heap blowup does not occur without coalescing (although the allocation footprint may be slightly larger).
    1248 
    1249 To obtain $O(1)$ external latency means obtaining one large storage area from the OS and subdividing it across all program allocations, which requires a good guess at the program storage high-watermark and potential large external fragmentation.
    1250 Excluding real-time operating-systems, operating-system operations are unbounded, and hence some external latency is unavoidable.
    1251 The mitigating factor is that operating-system calls can often be reduced if a programmer has a sense of the storage high-watermark and the allocator is capable of using this information (see @malloc_expansion@ \pageref{p:malloc_expansion}).
    1252 Furthermore, while operating-system calls are unbounded, many are now reasonably fast, so their latency is tolerable and infrequent.
     1130$O(1)$ internal latency means no open searching on the allocation fastpath, which largely prohibits coalescing.
     1131The mitigating factor is that most programs have a small, fixed, allocation pattern, where the majority of allocation operations can be $O(1)$ and heap blowup does not occur without coalescing (although the allocation footprint may be slightly larger).
     1132Modern computers have large memories so a slight increase in program footprint is not a problem.
     1133
     1134$O(1)$ external latency means obtaining one large storage area from the OS and subdividing it across all program allocations, which requires a good guess at the program storage high-watermark and potential large external fragmentation.
     1135Excluding real-time OSs, OS operations are unbounded, and hence some external latency is unavoidable.
     1136The mitigating factor is that OS calls can often be reduced if a programmer has a sense of the storage high-watermark and the allocator is capable of using this information (see @malloc_expansion@ \pageref{p:malloc_expansion}).
     1137Furthermore, while OS calls are unbounded, many are now reasonably fast, so their latency is tolerable because it occurs infrequently.
    12531138
    12541139
     
    13921277\subsubsection{Alignment}
    13931278
    1394 Most dynamic memory allocations have a minimum storage alignment for the contained object(s).
    1395 Often the minimum memory alignment, M, is the bus width (32 or 64-bit) or the largest register (double, long double) or largest atomic instruction (DCAS) or vector data (MMMX).
    1396 In general, the minimum storage alignment is 8/16-byte boundary on 32/64-bit computers.
    1397 For consistency, the object header is normally aligned at this same boundary.
    1398 Larger alignments must be a power of 2, such as page alignment (4/8K).
    1399 Any alignment request, N, $\le$ the minimum alignment is handled as a normal allocation with minimal alignment.
    1400 
    1401 For alignments greater than the minimum, the obvious approach for aligning to address @A@ is: compute the next address that is a multiple of @N@ after the current end of the heap, @E@, plus room for the header before @A@ and the size of the allocation after @A@, moving the end of the heap to @E'@.
    1402 \begin{center}
    1403 \input{Alignment1}
    1404 \end{center}
    1405 The storage between @E@ and @H@ is chained onto the appropriate free list for future allocations.
    1406 The same approach is used for sufficiently large free blocks, where @E@ is the start of the free block, and any unused storage before @H@ or after the allocated object becomes free storage.
    1407 In this approach, the aligned address @A@ is the same as the allocated storage address @P@, \ie @P@ $=$ @A@ for all allocation routines, which simplifies deallocation.
    1408 However, if there are a large number of aligned requests, this approach leads to memory fragmentation from the small free areas around the aligned object.
    1409 As well, it does not work for large allocations, where many memory allocators switch from program @sbrk@ to operating-system @mmap@.
    1410 The reason is that @mmap@ only starts on a page boundary, and it is difficult to reuse the storage before the alignment boundary for other requests.
    1411 Finally, this approach is incompatible with allocator designs that funnel allocation requests through @malloc@ as it directly manipulates management information within the allocator to optimize the space/time of a request.
    1412 
    1413 Instead, llheap alignment is accomplished by making a \emph{pessimistic} allocation request for sufficient storage to ensure that \emph{both} the alignment and size request are satisfied, \eg:
     1279Allocators have a different minimum storage alignment from the hardware's basic types.
     1280Often the minimum allocator alignment, $M$, is the bus width (32 or 64-bit), the largest register (double, long double), largest atomic instruction (DCAS), or vector data (MMMX).
     1281The reason for this larger requirement is the lack of knowledge about the data type occupying the allocation.
     1282Hence, an allocator assumes the worst-case scenario for the start of data and the compiler correctly aligns items within this data because it knows their types.
     1283Often the minimum storage alignment is an 8/16-byte boundary on a 32/64-bit computer.
     1284Alignments larger than $M$ are normally a power of 2, such as page alignment (4/8K).
     1285Any alignment less than $M$ is raised to the minimal alignment.
     1286
     1287llheap aligns its header at the $M$ boundary and its size is $M$;
     1288hence, data following the header is aligned at $M$.
     1289This pattern means there is no minimal alignment computation along the allocation fastpath, \ie new storage and reused storage is always correctly aligned.
     1290An alignment $N$ greater than $M$ is accomplished with a \emph{pessimistic} request for storage that ensures \emph{both} the alignment and size request are satisfied, \eg:
    14141291\begin{center}
    14151292\input{Alignment2}
    14161293\end{center}
    1417 The amount of storage necessary is @alignment - M + size@, which ensures there is an address, @A@, after the storage returned from @malloc@, @P@, that is a multiple of @alignment@ followed by sufficient storage for the data object.
    1418 The approach is pessimistic because if @P@ already has the correct alignment @N@, the initial allocation has already requested sufficient space to move to the next multiple of @N@.
    1419 For this special case, there is @alignment - M@ bytes of unused storage after the data object, which subsequently can be used by @realloc@.
    1420 
    1421 Note, the address returned is @A@, which is subsequently returned to @free@.
    1422 However, to correctly free the allocated object, the value @P@ must be computable, since that is the value generated by @malloc@ and returned within @memalign@.
    1423 Hence, there must be a mechanism to detect when @P@ $\neq$ @A@ and how to compute @P@ from @A@.
    1424 
    1425 The llheap approach uses two headers:
    1426 the \emph{original} header associated with a memory allocation from @malloc@, and a \emph{fake} header within this storage before the alignment boundary @A@, which is returned from @memalign@, e.g.:
     1294The amount of storage necessary is $alignment - M + size$, which ensures there is an address, $A$, after the storage returned from @malloc@, $P$, that is a multiple of $alignment$ followed by sufficient storage for the data object.
     1295The approach is pessimistic if $P$ happens to have the correct alignment $N$, and the initial allocation has requested sufficient space to move to the next multiple of $N$.
     1296In this case, there is $alignment - M$ bytes of unused storage after the data object, which could be used by @realloc@.
     1297Note, the address returned by the allocation is $A$, which is subsequently returned to @free@.
     1298To correctly free the object, the value $P$ must be computable from $A$, since that is the actual start of the allocation, from which $H$ can be computed $P - M$.
     1299Hence, there must be a mechanism to detect when $P$ $\neq$ $A$ and then compute $P$ from $A$.
     1300
     1301To detect and perform this computation, llheap uses two headers:
     1302the \emph{original} header $H$ associated with the allocation, and a \emph{fake} header $F$ within this storage before the alignment boundary $A$, e.g.:
    14271303\begin{center}
    14281304\input{Alignment2Impl}
    14291305\end{center}
    1430 Since @malloc@ has a minimum alignment of @M@, @P@ $\neq$ @A@ only holds for alignments greater than @M@.
    1431 When @P@ $\neq$ @A@, the minimum distance between @P@ and @A@ is @M@ bytes, due to the pessimistic storage-allocation.
    1432 Therefore, there is always room for an @M@-byte fake header before @A@.
    1433 
    1434 The fake header must supply an indicator to distinguish it from a normal header and the location of address @P@ generated by @malloc@.
     1306Since every allocation is aligned at $M$, $P$ $\neq$ $A$ only holds for alignments greater than $M$.
     1307When $P$ $\neq$ $A$, the minimum distance between $P$ and $A$ is $M$ bytes, due to the pessimistic storage-allocation.
     1308Therefore, there is always room for an $M$-byte fake header before $A$.
     1309The fake header must supply an indicator to distinguish it from a normal header and the location of address $P$ generated by the allocation.
    14351310This information is encoded as an offset from A to P and the initialize alignment (discussed in Section~\ref{s:ReallocStickyProperties}).
    14361311To distinguish a fake header from a normal header, the least-significant bit of the alignment is used because the offset participates in multiple calculations, while the alignment is just remembered data.
     
    14431318\label{s:ReallocStickyProperties}
    14441319
    1445 The allocation routine @realloc@ provides a memory-management pattern for shrinking/enlarging an existing allocation, while maintaining some or all of the object data, rather than performing the following steps manually.
     1320The allocation routine @realloc@ provides a memory-management pattern for shrinking/enlarging an existing allocation, while maintaining some or all of the object data.
     1321The realloc pattern is simpler than the suboptimal manually steps.
    14461322\begin{flushleft}
    14471323\begin{tabular}{ll}
     
    14551331&
    14561332\begin{lstlisting}
    1457 T * naddr = (T *)malloc( newSize ); $\C[2.4in]{// new storage}$
     1333T * naddr = (T *)malloc( newSize ); $\C[2in]{// new storage}$
    14581334memcpy( naddr, addr, oldSize );  $\C{// copy old bytes}$
    14591335free( addr );                           $\C{// free old storage}$
     
    14621338\end{tabular}
    14631339\end{flushleft}
    1464 The realloc pattern leverages available storage at the end of an allocation due to bucket sizes, possibly eliminating a new allocation and copying.
    1465 This pattern is not used enough to reduce storage management costs.
    1466 In fact, if @oaddr@ is @nullptr@, @realloc@ does a @malloc@, so even the initial @malloc@ can be a @realloc@ for consistency in the allocation pattern.
    1467 
    1468 The hidden problem for this pattern is the effect of zero fill and alignment with respect to reallocation.
    1469 Are these properties transient or persistent (``sticky'')?
    1470 For example, when memory is initially allocated by @calloc@ or @memalign@ with zero fill or alignment properties, respectively, what happens when those allocations are given to @realloc@ to change size?
    1471 That is, if @realloc@ logically extends storage into unused bucket space or allocates new storage to satisfy a size change, are initial allocation properties preserved?
    1472 Currently, allocation properties are not preserved, so subsequent use of @realloc@ storage may cause inefficient execution or errors due to lack of zero fill or alignment.
    1473 This silent problem is unintuitive to programmers and difficult to locate because it is transient.
    1474 To prevent these problems, llheap preserves initial allocation properties for the lifetime of an allocation and the semantics of @realloc@ are augmented to preserve these properties, with additional query routines.
    1475 This change makes the realloc pattern efficient and safe.
     1340The manual steps are suboptimal because there may be sufficient internal fragmentation at the end of the allocation due to bucket sizes.
     1341If this storage is large enough, it eliminates a new allocation and copying.
     1342Alternatively, if the storage is made smaller, there may be a reasonable crossover point, where just increasing the internal fragmentation eliminates a new allocation and copying.
     1343This pattern should be used more frequently to reduce storage management costs.
     1344In fact, if @oaddr@ is @nullptr@, @realloc@ does a @malloc( newSize)@, and if @newSize@ is 0, @realloc@ does a @free( oaddr )@, so all allocation/deallocation can be done with @realloc@.
     1345
     1346The hidden problem with this pattern is the effect of zero fill and alignment with respect to reallocation.
     1347For safety, we argue these properties should be persistent (``sticky'') and not transient.
     1348For example, when memory is initially allocated by @calloc@ or @memalign@ with zero fill or alignment properties, any subsequent reallocations of this storage must preserve these properties.
     1349Currently, allocation properties are not preserved nor is it possible to query an allocation to maintain these properties manually.
     1350Hence, subsequent use of @realloc@ storage that assumes any initially properties may cause errors.
     1351This silent problem is unintuitive to programmers, can cause catastrophic failure, and is difficult to debug because it is transient.
     1352To prevent these problems, llheap preserves initial allocation properties within an allocation, allowing them to be queried, and the semantics of @realloc@ preserve these properties on any storage change.
     1353As a result, the realloc pattern is efficient and safe.
    14761354
    14771355
    14781356\subsubsection{Header}
    14791357
    1480 To preserve allocation properties requires storing additional information with an allocation,
    1481 The best available option is the header, where Figure~\ref{f:llheapNormalHeader} shows the llheap storage layout.
    1482 The header has two data field sized appropriately for 32/64-bit alignment requirements.
    1483 The first field is a union of three values:
     1358To preserve allocation properties requires storing additional information about an allocation.
     1359Figure~\ref{f:llheapHeader} shows llheap captures this information in the header, which has two fields (left/right) sized appropriately for 32/64-bit alignment requirements.
     1360
     1361\begin{figure}
     1362\centering
     1363\input{Header}
     1364\caption{llheap Header}
     1365\label{f:llheapHeader}
     1366\end{figure}
     1367
     1368The left field is a union of three values:
    14841369\begin{description}
    14851370\item[bucket pointer]
    1486 is for allocated storage and points back to the bucket associated with this storage requests (see Figure~\ref{f:llheapStructure} for the fields accessible in a bucket).
     1371is for deallocated of heap storage and points back to the bucket associated with this storage requests (see Figure~\ref{f:llheapStructure} for the fields accessible in a bucket).
    14871372\item[mapped size]
    1488 is for mapped storage and is the storage size for use in unmapping.
     1373is for deallocation of mapped storage and is the storage size for unmapping.
    14891374\item[next free block]
    1490 is for free storage and is an intrusive pointer chaining same-size free blocks onto a bucket's free stack.
     1375is for freed storage and is an intrusive pointer chaining same-size free blocks onto a bucket's stack of free objects.
    14911376\end{description}
    1492 The second field remembers the request size versus the allocation (bucket) size, \eg request 42 bytes which is rounded up to 64 bytes.
     1377The low-order 3-bits of this field are unused for any stored values as these values are at least 8-byte aligned.
     1378The 3 unused bits are used to represent mapped allocation, zero filled, and alignment, respectively.
     1379Note, the zero-filled/mapped bits are only used in the normal header and the alignment bit in the fake header.
     1380This implementation allows a fast test if any of the lower 3-bits are on (@&@ and compare).
     1381If no bits are on, it implies a basic allocation, which is handled quickly in the fastpath for allocation and free;
     1382otherwise, the bits are analysed and appropriate actions are taken for the complex cases.
     1383
     1384The right field remembers the request size versus the allocation (bucket) size, \eg request of 42 bytes is rounded up to 64 bytes.
    14931385Since programmers think in request sizes rather than allocation sizes, the request size allows better generation of statistics or errors and also helps in memory management.
    14941386
    1495 \begin{figure}
    1496 \centering
    1497 \input{Header}
    1498 \caption{llheap Normal Header}
    1499 \label{f:llheapNormalHeader}
    1500 \end{figure}
    1501 
    1502 The low-order 3-bits of the first field are \emph{unused} for any stored values as these values are 16-byte aligned by default, whereas the second field may use all of its bits.
    1503 The 3 unused bits are used to represent mapped allocation, zero filled, and alignment, respectively.
    1504 Note, the alignment bit is not used in the normal header and the zero-filled/mapped bits are not used in the fake header.
    1505 This implementation allows a fast test if any of the lower 3-bits are on (@&@ and compare).
    1506 If no bits are on, it implies a basic allocation, which is handled quickly;
    1507 otherwise, the bits are analysed and appropriate actions are taken for the complex cases.
    1508 Since most allocations are basic, they will take significantly less time as the memory operations will be done along the allocation and free fastpath.
    1509 
    15101387
    15111388\subsection{Statistics and Debugging}
    15121389
    1513 llheap can be built to accumulate fast and largely contention-free allocation statistics to help understand allocation behaviour.
     1390llheap can be built to accumulate fast and largely contention-free allocation statistics to help understand dynamic-memory behaviour.
    15141391Incrementing statistic counters must appear on the allocation fastpath.
    15151392As noted, any atomic operation along the fastpath produces a significant increase in allocation costs.
     
    17411618
    17421619\medskip\noindent
    1743 \lstinline{void * aalloc( size_t dim, size_t elemSize )}
     1620\lstinline{void * aalloc( size_t dimension, size_t elemSize )}
    17441621extends @calloc@ for allocating a dynamic array of objects with total size @dim@ $\times$ @elemSize@ but \emph{without} zero-filling the memory.
    17451622@aalloc@ is significantly faster than @calloc@, which is the only alternative given by the standard memory-allocation routines for array allocation.
     
    17531630
    17541631\medskip\noindent
    1755 \lstinline{void * amemalign( size_t alignment, size_t dim, size_t elemSize )}
     1632\lstinline{void * amemalign( size_t alignment, size_t dimension, size_t elemSize )}
    17561633extends @aalloc@ and @memalign@ for allocating a dynamic array of objects with the starting address on the @alignment@ boundary.
    17571634Sets sticky alignment property.
     
    17591636
    17601637\medskip\noindent
    1761 \lstinline{void * cmemalign( size_t alignment, size_t dim, size_t elemSize )}
     1638\lstinline{void * cmemalign( size_t alignment, size_t dimension, size_t elemSize )}
    17621639extends @amemalign@ with zero fill and has the same usage as @amemalign@.
    17631640Sets sticky zero-fill and alignment property.
     
    18811758
    18821759\medskip\noindent
    1883 \lstinline{T * alloc( ... )} or \lstinline{T * alloc( size_t dim, ... )}
     1760\lstinline{T * alloc( ... )} or \lstinline{T * alloc( size_t dimension, ... )}
    18841761is overloaded with a variable number of specific allocation operations, or an integer dimension parameter followed by a variable number of specific allocation operations.
    18851762These allocation operations can be passed as named arguments when calling the \lstinline{alloc} routine.
  • doc/theses/fangren_yu_MMath/.gitignore

    r11f92fac rde8a0a4b  
    1 # Intermediate Results:
    2 build/
    3 
    4 # Final Files:
     1# generated by latex
     2build/*
    53*.pdf
    64*.ps
    75
    8 # The Makefile here is not generated.
    9 !Makefile
     6# generated by xfig
     7*.bak
  • doc/theses/fangren_yu_MMath/intro.tex

    r11f92fac rde8a0a4b  
    66\begin{cfa}
    77T sum( T a[$\,$], size_t size ) {
    8         @T@ total = { @0@ };  // size, 0 for type T
     8        @T@ total = { @0@ };  $\C[1.75in]{// size, 0 for type T}$
    99        for ( size_t i = 0; i < size; i += 1 )
    10                 total @+=@ a@[@i@]@; // + and subscript for T
     10                total @+=@ a@[@i@]@; $\C{// + and subscript for T}\CRT$
    1111        return total;
    1212}
     
    2222All computers have multiple types because computer architects optimize the hardware around a few basic types with well defined (mathematical) operations: boolean, integral, floating-point, and occasionally strings.
    2323A programming language and its compiler present ways to declare types that ultimately map into the ones provided by the underlying hardware.
    24 These language types are thrust upon programmers, with their syntactic and semantic rules and restrictions.
    25 These rules are used to transform a language expression to a hardware expression.
     24These language types are thrust upon programmers with their syntactic/semantic rules and restrictions.
     25These rules are then used to transform a language expression to a hardware expression.
    2626Modern programming-languages allow user-defined types and generalize across multiple types using polymorphism.
    2727Type systems can be static, where each variable has a fixed type during execution and an expression's type is determined once at compile time, or dynamic, where each variable can change type during execution and so an expression's type is reconstructed on each evaluation.
     
    3131\section{Overloading}
    3232
    33 Overloading allows programmers to use the most meaningful names without fear of name clashes within a program or from external sources, like include files.
    3433\begin{quote}
    3534There are only two hard things in Computer Science: cache invalidation and \emph{naming things}. --- Phil Karlton
    3635\end{quote}
     36Overloading allows programmers to use the most meaningful names without fear of name clashes within a program or from external sources, like include files.
    3737Experience from \CC and \CFA developers is that the type system implicitly and correctly disambiguates the majority of overloaded names, \ie it is rare to get an incorrect selection or ambiguity, even among hundreds of overloaded (variables and) functions.
    3838In many cases, a programmer has no idea there are name clashes, as they are silently resolved, simplifying the development process.
    39 Depending on the language, any ambiguous cases are resolved using some form of qualification and/or casting.
     39Depending on the language, any ambiguous cases are resolved explicitly using some form of qualification and/or cast.
    4040
    4141
     
    4545Like \CC, \CFA maps operators to named functions and allows these operators to be overloaded with user-defined types.
    4646The syntax for operator names uses the @'?'@ character to denote a parameter, \eg left and right unary operators: @?++@ and @++?@, and binary operators @?+?@ and @?<=?@.
    47 Here, a user-defined type is extended with an addition operation with the same syntax as builtin types.
     47Here, a user-defined type is extended with an addition operation with the same syntax as a builtin type.
    4848\begin{cfa}
    4949struct S { int i, j };
     
    5555The type system examines each call site and selects the best matching overloaded function based on the number and types of arguments.
    5656If there are mixed-mode operands, @2 + 3.5@, the type system attempts (safe) conversions, like in C/\CC, converting the argument type(s) to the parameter type(s).
    57 Conversions are necessary because the hardware rarely supports mix-mode operations, so both operands must be the same type.
    58 Note, without implicit conversions, programmers must write an exponential number of functions covering all possible exact-match cases among all possible types.
     57Conversions are necessary because the hardware rarely supports mix-mode operations, so both operands must be converted to a common type.
     58Like overloading, the majority of mixed-mode conversions are silently resolved, simplifying the development process.
     59Without implicit conversions, programmers must write an exponential number of functions covering all possible exact-match cases among all possible types.
    5960This approach does not match with programmer intuition and expectation, regardless of any \emph{safety} issues resulting from converted values.
     61Depending on the language, mix-mode conversions can be explicitly controlled using some form of cast.
    6062
    6163
     
    8183double d = f( 3 );              $\C{// select (2)}\CRT$
    8284\end{cfa}
    83 Alternatively, if the type system looks at the return type, there is an exact match for each call, which again matches with programmer intuition and expectation.
    84 This capability can be taken to the extreme, where there are no function parameters.
     85Alternatively, if the type system uses the return type, there is an exact match for each call, which again matches with programmer intuition and expectation.
     86This capability can be taken to the extreme, where the only differentiating factor is the return type.
    8587\begin{cfa}
    8688int random( void );             $\C[2in]{// (1); overloaded on return type}$
     
    9092\end{cfa}
    9193Again, there is an exact match for each call.
    92 If there is no exact match, a set of minimal, safe conversions can be added to find a best match, as for operator overloading.
     94As for operator overloading, if there is no exact match, a set of minimal, an implicit conversion can be added to find a best match.
     95\begin{cfa}
     96short int = random();   $\C[2in]{// select (1), unsafe}$
     97long double = random(); $\C{// select (2), safe}\CRT$
     98\end{cfa}
    9399
    94100
     
    96102
    97103Unlike most programming languages, \CFA has variable overloading within a scope, along with shadow overloading in nested scopes.
    98 (Shadow overloading is also possible for functions, if a language supports nested function declarations, \eg \CC named, nested, lambda functions.)
     104Shadow overloading is also possible for functions, in languages supporting nested-function declarations, \eg \CC named, nested, lambda functions.
    99105\begin{cfa}
    100106void foo( double d );
     
    109115\end{cfa}
    110116It is interesting that shadow overloading is considered a normal programming-language feature with only slight software-engineering problems.
    111 However, variable overloading within a scope is often considered extremely dangerous, without any evidence to corroborate this claim.
     117However, variable overloading within a scope is considered extremely dangerous, without any evidence to corroborate this claim.
    112118In contrast, function overloading in \CC occurs silently within the global scope from @#include@ files all the time without problems.
    113119
    114 In \CFA, the type system simply treats overloaded variables as an overloaded function returning a value with no parameters.
    115 Hence, no significant effort is required to support this feature by leveraging the return type to disambiguate as variables have no parameters.
     120In \CFA, the type system simply treats an overloaded variable as an overloaded function returning a value with no parameters.
     121Hence, no effort is required to support this feature as it is available for differentiating among overloaded functions with no parameters.
    116122\begin{cfa}
    117123int MAX = 2147483647;   $\C[2in]{// (1); overloaded on return type}$
     
    125131The result is a significant reduction in names to access typed constants.
    126132
    127 As an aside, C has a separate namespace for type and variables allowing overloading between the namespaces, using @struct@ (qualification) to disambiguate.
     133As an aside, C has a separate namespace for types and variables allowing overloading between the namespaces, using @struct@ (qualification) to disambiguate.
    128134\begin{cfa}
    129135void S() {
     
    133139}
    134140\end{cfa}
     141Here the name @S@ is an aggregate type and field, and a variable and parameter of type @S@.
    135142
    136143
     
    145152for ( ; x; --x )   =>    for ( ; x @!= 0@; x @-= 1@ )
    146153\end{cfa}
    147 To generalize this feature, both constants are given types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly work with the special @0@ and @1@ contexts.
     154To generalize this feature, both constants are given types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly work within the special @0@ and @1@ contexts.
    148155The types @zero_t@ and @one_t@ have special builtin implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work.
    149156\begin{cfa}
     
    176183\end{cfa}
    177184For type-only, the programmer specifies the initial type, which remains fixed for the variable's lifetime in statically typed languages.
    178 For type-and-initialization, the specified and initialization types may not agree.
     185For type-and-initialization, the specified and initialization types may not agree requiring an implicit/explicit conversion.
    179186For initialization-only, the compiler may select the type by melding programmer and context information.
    180187When the compiler participates in type selection, it is called \newterm{type inferencing}.
    181 Note, type inferencing is different from type conversion: type inferencing \emph{discovers} a variable's type before setting its value, whereas conversion has two typed values and performs a (possibly lossy) action to convert one value to the type of the other variable.
     188Note, type inferencing is different from type conversion: type inferencing \emph{discovers} a variable's type before setting its value, whereas conversion has two typed variables and performs a (possibly lossy) value conversion from one type to the other.
    182189Finally, for assignment, the current variable and expression types may not agree.
    183190Discovering a variable or function type is complex and has limitations.
    184 The following covers these issues, and why some schemes are not amenable with the \CFA type system.
     191The following covers these issues, and why this scheme is not amenable with the \CFA type system.
    185192
    186193One of the first and powerful type-inferencing system is Hindley--Milner~\cite{Damas82}.
     
    203210\end{cfa}
    204211In both overloads of @f@, the type system works from the constant initializations inwards and/or outwards to determine the types of all variables and functions.
    205 Note, like template meta programming, there could be a new function generated for the second @f@ depending on the types of the arguments, assuming these types are meaningful in the body of @f@.
     212Like template meta-programming, there can be a new function generated for the second @f@ depending on the types of the arguments, assuming these types are meaningful in the body of @f@.
    206213Inferring type constraints, by analysing the body of @f@ is possible, and these constraints must be satisfied at each call site by the argument types;
    207214in this case, parametric polymorphism can allow separate compilation.
     
    246253This issue is exaggerated with \CC templates, where type names are 100s of characters long, resulting in unreadable error messages.
    247254\item
    248 Ensuring the type of secondary variables, match a primary variable(s).
     255Ensuring the type of secondary variables, match a primary variable.
    249256\begin{cfa}
    250257int x; $\C{// primary variable}$
     
    252259\end{cfa}
    253260If the type of @x@ changes, the type of the secondary variables correspondingly updates.
     261There can be strong software-engineering reasons for binding the types of these variables.
    254262\end{itemize}
    255263Note, the use of @typeof@ is more restrictive, and possibly safer, than general type-inferencing.
     
    269277
    270278A restriction is the conundrum in type inferencing of when to \emph{brand} a type.
    271 That is, when is the type of the variable/function more important than the type of its initialization expression.
    272 For example, if a change is made in an initialization expression, it can cause cascading type changes and/or errors.
    273 At some point, a variable's type needs to remain constant and the initializing expression needs to be modified or in error when it changes.
     279That is, when is the type of the variable/function more important than the type of its initialization expression(s).
     280For example, if a change is made in an initialization expression, it can cascade type changes producing many other changes and/or errors.
     281At some point, a variable's type needs to remain constant and the initializing expression needs to be modified or be in error when it changes.
    274282Often type-inferencing systems allow restricting (\newterm{branding}) a variable or function type, so the complier can report a mismatch with the constant initialization.
    275283\begin{cfa}
     
    283291In Haskell, it is common for programmers to brand (type) function parameters.
    284292
    285 A confusion is large blocks of code where all declarations are @auto@, as is now common in \CC.
     293A confusion is blocks of code where all declarations are @auto@, as is now common in \CC.
    286294As a result, understanding and changing the code becomes almost impossible.
    287295Types provide important clues as to the behaviour of the code, and correspondingly to correctly change or add new code.
     
    299307In this situation, having the type name or its short alias is essential.
    300308
    301 The \CFA's type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression.
     309\CFA's type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression.
    302310Type inferencing defeats this goal because there is no left-hand type.
    303311Fundamentally, type inferencing tries to magic away variable types from the programmer.
     
    308316The entire area of Computer-Science data-structures is obsessed with time and space, and that obsession should continue into regular programming.
    309317Understanding space and time issues is an essential part of the programming craft.
    310 Given @typedef@ and @typeof@ in \CFA, and the strong desire to use the left-hand type in resolution, implicit type-inferencing is unsupported.
    311 Should a significant need arise, this feature can be revisited.
     318Given @typedef@ and @typeof@ in \CFA, and the strong desire to use the left-hand type in resolution, the decision was made not to support implicit type-inferencing in the type system.
     319Should a significant need arise, this decision can be revisited.
    312320
    313321
     
    334342
    335343To constrain polymorphic types, \CFA uses \newterm{type assertions}~\cite[pp.~37-44]{Alphard} to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable.
    336 For example, the function @twice@ works for any type @T@ with a matching addition operator.
     344Here, the function @twice@ works for any type @T@ with a matching addition operator.
    337345\begin{cfa}
    338346forall( T @| { T ?+?(T, T); }@ ) T twice( T x ) { return x @+@ x; }
    339347int val = twice( twice( 3 ) );  $\C{// val == 12}$
    340348\end{cfa}
    341 For example. parametric polymorphism and assertions occurs in existing type-unsafe (@void *@) C @qsort@ to sort an array.
     349Parametric polymorphism and assertions occur in existing type-unsafe (@void *@) C functions, like @qsort@ for sorting an array of unknown values.
    342350\begin{cfa}
    343351void qsort( void * base, size_t nmemb, size_t size, int (*cmp)( const void *, const void * ) );
     
    386394}
    387395// select type and size from left-hand side
    388 int * ip = malloc();  double * dp = malloc();  $@$[aligned(64)] struct S {...} * sp = malloc();
     396int * ip = malloc();  double * dp = malloc();  [[aligned(64)]] struct S {...} * sp = malloc();
    389397\end{cfa}
    390398The @sized@ assertion passes size and alignment as a data object has no implicit assertions.
    391399Both assertions are used in @malloc@ via @sizeof@ and @_Alignof@.
    392 
    393 These mechanism are used to construct type-safe wrapper-libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions.
    394 Hence, existing C legacy code is leveraged as much as possible;
     400In practise, this polymorphic @malloc@ is unwrapped by the C compiler and the @if@ statement is elided producing a type-safe call to @malloc@ or @memalign@.
     401
     402This mechanism is used to construct type-safe wrapper-libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions.
     403Here, existing C legacy code is leveraged as much as possible;
    395404other programming languages must build supporting libraries from scratch, even in \CC.
    396405
     
    422431\end{tabular}
    423432\end{cquote}
    424 Traits are simply flatten at the use point, as if written in full by the programmer, where traits often contain overlapping assertions, \eg operator @+@.
     433Traits are implemented by flatten them at use points, as if written in full by the programmer.
     434Flattening often results in overlapping assertions, \eg operator @+@.
    425435Hence, trait names play no part in type equivalence.
    426 Note, the type @T@ is an object type, and hence, has the implicit internal trait @otype@.
     436In the example, type @T@ is an object type, and hence, has the implicit internal trait @otype@.
    427437\begin{cfa}
    428438trait otype( T & | sized(T) ) {
     
    433443};
    434444\end{cfa}
    435 The implicit routines are used by the @sumable@ operator @?+=?@ for the right side of @?+=?@ and return.
     445These implicit routines are used by the @sumable@ operator @?+=?@ for the right side of @?+=?@ and return.
    436446
    437447If the array type is not a builtin type, an extra type parameter and assertions are required, like subscripting.
     
    445455\begin{enumerate}[leftmargin=*]
    446456\item
    447 Write bespoke data structures for each context they are needed.
     457Write bespoke data structures for each context.
    448458While this approach is flexible and supports integration with the C type checker and tooling, it is tedious and error prone, especially for more complex data structures.
    449459\item
     
    452462\item
    453463Use preprocessor macros, similar to \CC @templates@, to generate code that is both generic and type checked, but errors may be difficult to interpret.
    454 Furthermore, writing and using preprocessor macros is difficult and inflexible.
     464Furthermore, writing and using complex preprocessor macros is difficult and inflexible.
    455465\end{enumerate}
    456466
    457467\CC, Java, and other languages use \newterm{generic types} to produce type-safe abstract data-types.
    458468\CFA generic types integrate efficiently and naturally with the existing polymorphic functions, while retaining backward compatibility with C and providing separate compilation.
    459 However, for known concrete parameters, the generic-type definition can be inlined, like \CC templates.
     469For concrete parameters, the generic-type definition can be inlined, like \CC templates, if its definition appears in a header file (\eg @static inline@).
    460470
    461471A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration and instantiated using a parenthesized list of types after the type name.
     
    484494\end{cquote}
    485495\CFA generic types are \newterm{fixed} or \newterm{dynamic} sized.
    486 Fixed-size types have a fixed memory layout regardless of type parameters, whereas dynamic types vary in memory layout depending on their type parameters.
     496Fixed-size types have a fixed memory layout regardless of type parameters, whereas dynamic types vary in memory layout depending on the type parameters.
    487497For example, the type variable @T *@ is fixed size and is represented by @void *@ in code generation;
    488498whereas, the type variable @T@ is dynamic and set at the point of instantiation.
    489499The difference between fixed and dynamic is the complexity and cost of field access.
    490500For fixed, field offsets are computed (known) at compile time and embedded as displacements in instructions.
    491 For dynamic, field offsets are computed at compile time at the call site, stored in an array of offset values, passed as a polymorphic parameter, and added to the structure address for each field dereference within a polymorphic routine.
     501For dynamic, field offsets are compile-time computed at the call site, stored in an array of offset values, passed as a polymorphic parameter, and added to the structure address for each field dereference within a polymorphic routine.
    492502See~\cite[\S~3.2]{Moss19} for complete implementation details.
    493503
     
    517527\section{Contributions}
    518528
     529The \CFA compiler performance and type capability have been greatly improved through my development work.
    519530\begin{enumerate}
    520 \item The \CFA compiler performance and capability have been greatly improved through recent development. The compilation time of various \CFA library units and test programs have been reduced from the order of minutes down to 10-20 seconds, which made it possible to develop and test more complicated \CFA programs that utilize sophisticated type system features. The details of compiler optimization work are covered in a previous technical report.
    521 \item The thesis presents a systematic review of the new features that have been added to the \CFA language and its type system. Some of the more recent inclusions to \CFA such as tuples and generic structure types were not well tested when they were being developed, due to the limitation of compiler performance. Several issues coming from the interactions of various language features are identified and discussed in this thesis; some of them are now fully resolved, while others are given temporary fixes and need to be reworked in the future.
    522 \item Finally, this thesis provides constructive ideas of fixing the outstanding issues in \CFA language design and implementation, and gives a path for future improvements to \CFA language and compiler.
     531\item
     532The compilation time of various \CFA library units and test programs has been reduced by an order of magnitude, from minutes to seconds \see{\VRef[Table]{t:SelectedFileByCompilerBuild}}, which made it possible to develop and test more complicated \CFA programs that utilize sophisticated type system features.
     533The details of compiler optimization work are covered in a previous technical report~\cite{Yu20}, which essentially forms part of this thesis.
     534\item
     535The thesis presents a systematic review of the new features added to the \CFA language and its type system.
     536Some of the more recent inclusions to \CFA, such as tuples and generic structure types, were not well tested during development due to the limitation of compiler performance.
     537Several issues coming from the interactions of various language features are identified and discussed in this thesis;
     538some of them I have resolved, while others are given temporary fixes and need to be reworked in the future.
     539\item
     540Finally, this thesis provides constructive ideas for fixing a number of high-level issues in the \CFA language design and implementation, and gives a path for future improvements to the language and compiler.
    523541\end{enumerate}
    524542
Note: See TracChangeset for help on using the changeset viewer.