Changes in / [49510db:c699602]


Ignore:
Files:
1 added
5 edited

Legend:

Unmodified
Added
Removed
  • doc/bibliography/pl.bib

    r49510db rc699602  
     1
    12% Conventions: uncross-referenced entries appear first, then
    23%    cross-referenced entries.  In both groups, entries are sorted by their
     
    39303931
    39313932@article{Boehm88,
    3932     keywords    = {conservative garbage collection, C, storage management, debugging},
     3933    keywords    = {conservative garbage collection, C},
    39333934    contributer = {gjditchfield@plg},
    39343935    author      = {Hans-Juergen Boehm and Mark Weiser},
  • doc/papers/llheap/Paper.tex

    r49510db rc699602  
    187187\author[1]{Peter A. Buhr*}
    188188\author[2]{Bryan Chan}
    189 \author[3]{Dave Dice}
    190189\authormark{ZULFIQAR \textsc{et al.}}
    191190
    192191\address[1]{\orgdiv{Cheriton School of Computer Science}, \orgname{University of Waterloo}, \orgaddress{\state{Waterloo, ON}, \country{Canada}}}
    193192\address[2]{\orgdiv{Huawei Compiler Lab}, \orgname{Huawei}, \orgaddress{\state{Markham, ON}, \country{Canada}}}
    194 \address[3]{\orgdiv{Oracle Labs}, \orgname{Oracle}, \orgaddress{\state{Burlington, MA}, \country{USA}}}
    195 
    196193
    197194\corres{*Peter A. Buhr, Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada. \email{pabuhr{\char`\@}uwaterloo.ca}}
     
    204201llheap extends the feature set of existing C allocation by remembering zero-filled (\lstinline{calloc}) and aligned properties (\lstinline{memalign}) in an allocation.
    205202These properties can be queried, allowing programmers to write safer programs by preserving these properties in future allocations.
    206 As well, \lstinline{realloc}/\lstinline{reallocarray} preserve these properties when adjusting storage size, again increasing future allocation safety.
     203As well, \lstinline{realloc} preserves these properties when adjusting storage size, again increasing future allocation safety.
    207204llheap also extends the C allocation API with \lstinline{aalloc}, \lstinline{amemalign}, \lstinline{cmemalign}, \lstinline{resize}, and extended \lstinline{realloc}, providing orthogonal access to allocation features;
    208205hence, programmers do have to code missing combinations.
     
    229226\section{Introduction}
    230227
    231 Memory management services a series of program allocation/deallocation requests and attempts to satisfy them from a variable-sized block(s) of memory, while minimizing total memory usage.
    232 A general-purpose dynamic-allocation algorithm cannot anticipate allocation requests so its time and space performance is rarely optimal (bin packing).
    233 However, allocators take advantage of allocation patterns in typical programs (heuristics) to produce excellent results, both in time and space (similar to LRU paging).
    234 Allocators use similar techniques, but each optimizes specific allocation patterns.
    235 Nevertheless, allocators are a series of compromises, occasionally with some static or dynamic tuning parameters to optimize specific request patterns.
     228Memory management services a series of program allocation/deallocation requests and attempts to satisfy them from a variable-sized block of memory, while minimizing total memory usage.
     229A general-purpose dynamic-allocation algorithm cannot anticipate allocation requests so its time and space performance is rarely optimal.
     230However, allocators take advantage of regular allocation patterns in typical programs to produce excellent results, both in time and space (similar to LRU paging).
     231Allocators use a number of similar techniques, but each optimizes specific allocation patterns.
     232Nevertheless, allocators are a series of compromises, occasionally with some static or dynamic tuning parameters to optimize specific program-request patterns.
    236233
    237234
     
    286283\begin{enumerate}[leftmargin=*,itemsep=0pt]
    287284\item
    288 Implementation of a new stand-alone concurrent low-latency memory-allocator ($\approx$1,400 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions for the concurrent languages \uC~\cite{uC++} and \CFA~\cite{Moss18,Delisle21} using user-level threads running on multiple kernel threads (M:N threading).
     285Implementation of a new stand-alone concurrent low-latency memory-allocator ($\approx$1,200 lines of code) for C/\CC programs using kernel threads (1:1 threading), and specialized versions for the concurrent languages \uC~\cite{uC++} and \CFA~\cite{Moss18,Delisle21} using user-level threads running on multiple kernel threads (M:N threading).
    289286
    290287\item
     
    292289
    293290\item
    294 Use the preserved zero fill and alignment as \emph{sticky} properties for @realloc@ and @reallocarray@ to zero-fill and align when storage is extended or copied.
     291Use the preserved zero fill and alignment as \emph{sticky} properties for @realloc@ to zero-fill and align when storage is extended or copied.
    295292Without this extension, it is unsafe to @realloc@ storage these allocations if the properties are not preserved when copying.
    296293This silent problem is unintuitive to programmers and difficult to locate because it is transient.
     
    298295\item
    299296Provide additional heap operations to make allocation properties orthogonally accessible.
    300 \begin{itemize}[topsep=0pt,itemsep=0pt,parsep=0pt]
    301 \item
    302 @aalloc( dimension, elemSize )@ same as @calloc@ except memory is \emph{not} zero filled, which is significantly faster than @calloc@.
    303 \item
    304 @amemalign( alignment, dimension, elemSize )@ same as @aalloc@ with memory alignment.
    305 \item
    306 @cmemalign( alignment, dimension, elemSize )@ same as @calloc@ with memory alignment.
     297\begin{itemize}[topsep=2pt,itemsep=2pt,parsep=0pt]
     298\item
     299@aalloc( dim, elemSize )@ same as @calloc@ except memory is \emph{not} zero filled.
     300\item
     301@amemalign( alignment, dim, elemSize )@ same as @aalloc@ with memory alignment.
     302\item
     303@cmemalign( alignment, dim, elemSize )@ same as @calloc@ with memory alignment.
    307304\item
    308305@resize( oaddr, size )@ re-purpose an old allocation for a new type \emph{without} preserving fill or alignment.
    309306\item
    310 @aligned_resize( oaddr, alignment, size )@ re-purpose an old allocation with new alignment but \emph{without} preserving fill.
    311 \item
    312 @aligned_realloc( oaddr, alignment, size )@ same as @realloc@ but adding or changing alignment.
    313 \item
    314 @aligned_reallocarray( oaddr, alignment, dimension, elemSize )@ same as @reallocarray@ but adding or changing alignment.
     307@resize( oaddr, alignment, size )@ re-purpose an old allocation with new alignment but \emph{without} preserving fill.
     308\item
     309@realloc( oaddr, alignment, size )@ same as @realloc@ but adding or changing alignment.
    315310\end{itemize}
    316311
    317312\item
    318313Provide additional query operations to access information about an allocation:
    319 \begin{itemize}[topsep=0pt,itemsep=0pt,parsep=0pt]
     314\begin{itemize}[topsep=3pt,itemsep=2pt,parsep=0pt]
    320315\item
    321316@malloc_alignment( addr )@ returns the alignment of the allocation.
    322317If the allocation is not aligned or @addr@ is @NULL@, the minimal alignment is returned.
    323318\item
    324 @malloc_zero_fill( addr )@ returns a boolean result indicating if the memory is allocated with zero fill, \eg by @calloc@/@cmemalign@.
     319@malloc_zero_fill( addr )@ returns a boolean result indicating if the memory is allocated with zero fill, e.g., by @calloc@/@cmemalign@.
    325320\item
    326321@malloc_size( addr )@ returns the size of the memory allocation.
    327322\item
    328 @malloc_usable_size( addr )@ returns the usable (total) size of the memory, \ie the bin size containing the allocation, where @malloc_size( addr )@ $\le$ @malloc_usable_size( addr )@.
     323@malloc_usable_size( addr )@ returns the usable (total) size of the memory, i.e., the bin size containing the allocation, where @malloc_size( addr )@ $\le$ @malloc_usable_size( addr )@.
    329324\end{itemize}
    330325
    331326\item
    332327Provide optional extensive, fast, and contention-free allocation statistics to understand allocation behaviour, accessed by:
    333 \begin{itemize}[topsep=0pt,itemsep=0pt,parsep=0pt]
     328\begin{itemize}[topsep=3pt,itemsep=2pt,parsep=0pt]
    334329\item
    335330@malloc_stats()@ print memory-allocation statistics on the file-descriptor set by @malloc_stats_fd@ (default @stderr@).
     
    364359The management goals are to make allocation/deallocation operations as fast as possible while densely packing objects to make efficient use of memory.
    365360Since objects in C/\CC cannot be moved to aid the packing process, only adjacent free storage can be \newterm{coalesced} into larger free areas.
    366 The allocator grows or shrinks the dynamic-allocation zone to obtain storage for objects and reduce memory usage via OS calls, such as @mmap@ or @sbrk@ in UNIX.
     361The allocator grows or shrinks the dynamic-allocation zone to obtain storage for objects and reduce memory usage via operating-system calls, such as @mmap@ or @sbrk@ in UNIX.
    367362
    368363
     
    989984That is, rather than requesting new storage for a single object, an entire buffer is requested from which multiple objects are allocated later.
    990985Any heap may use an allocation buffer, resulting in allocation from the buffer before requesting objects (containers) from the global heap or OS, respectively.
    991 The allocation buffer reduces contention and the number of global/OS calls.
     986The allocation buffer reduces contention and the number of global/operating-system calls.
    992987For coalescing, a buffer is split into smaller objects by allocations, and recomposed into larger buffer areas during deallocations.
    993988
     
    10261021
    10271022
    1028 \section{llheap}
    1029 
    1030 This section presents our new stand-alone, concurrent, low-latency memory-allocator, called llheap (low-latency heap), fulfilling the GNU C Library allocator API~\cite{GNUallocAPI} for C/\CC programs using kernel threads (1:1 threading), with specialized versions for the programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading).
    1031 The primary design objective for llheap is low-latency across all allocator calls independent of application access-patterns and/or number of threads, \ie very seldom does the allocator delay during an allocator call.
     1023\section{Allocator}
     1024\label{c:Allocator}
     1025
     1026This section presents a new stand-alone concurrent low-latency memory-allocator ($\approx$1,200 lines of code), called llheap (low-latency heap), for C/\CC programs using kernel threads (1:1 threading), and specialized versions of the allocator for the programming languages \uC and \CFA using user-level threads running over multiple kernel threads (M:N threading).
     1027The new allocator fulfills the GNU C Library allocator API~\cite{GNUallocAPI}.
     1028
     1029
     1030\subsection{llheap}
     1031
     1032The primary design objective for llheap is low-latency across all allocator calls independent of application access-patterns and/or number of threads, \ie very seldom does the allocator have a delay during an allocator call.
    10321033Excluded from the low-latency objective are (large) allocations requiring initialization, \eg zero fill, and/or data copying, which are outside the allocator's purview.
    10331034A direct consequence of this objective is very simple or no storage coalescing;
    10341035hence, llheap's design is willing to use more storage to lower latency.
    1035 This objective is apropos because systems research and industrial applications are striving for low latency and modern computers have huge amounts of RAM memory.
     1036This objective is apropos because systems research and industrial applications are striving for low latency and computers have huge amounts of RAM memory.
    10361037Finally, llheap's performance should be comparable with the current best allocators, both in space and time (see performance comparison in Section~\ref{c:Performance}).
    10371038
     1039% The objective of llheap's new design was to fulfill following requirements:
     1040% \begin{itemize}
     1041% \item It should be concurrent and thread-safe for multi-threaded programs.
     1042% \item It should avoid global locks, on resources shared across all threads, as much as possible.
     1043% \item It's performance (FIX ME: cite performance benchmarks) should be comparable to the commonly used allocators (FIX ME: cite common allocators).
     1044% \item It should be a lightweight memory allocator.
     1045% \end{itemize}
     1046
     1047%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    10381048
    10391049\subsection{Design Choices}
    10401050
    1041 llheap's design was reviewed and changed multiple times during its development, with the final choices discussed here.
    1042 All designs focused on the allocation/free \newterm{fastpath}, \ie the shortest code path for the most common operations, \eg when an allocation can immediately return free storage or returned storage is not coalesced.
    1043 The model chosen is 1:1, so there is one thread-local heap for each KT.
     1051% Some of the rejected designs are discussed because they show the path to the final design (see discussion in Section~\ref{s:MultipleHeaps}).
     1052% Note, a few simple tests for a design choice were compared with the current best allocators to determine the viability of a design.
     1053
     1054
     1055% \paragraph{T:1 model}
     1056% Figure~\ref{f:T1SharedBuckets} shows one heap accessed by multiple kernel threads (KTs) using a bucket array, where smaller bucket sizes are shared among N KTs.
     1057% This design leverages the fact that usually the allocation requests are less than 1024 bytes and there are only a few different request sizes.
     1058% When KTs $\le$ N, the common bucket sizes are uncontented;
     1059% when KTs $>$ N, the free buckets are contented and latency increases significantly.
     1060% In all cases, a KT must acquire/release a lock, contented or uncontented, along the fast allocation path because a bucket is shared.
     1061% Therefore, while threads are contending for a small number of buckets sizes, the buckets are distributed among them to reduce contention, which lowers latency;
     1062% however, picking N is workload specific.
     1063%
     1064% \begin{figure}
     1065% \centering
     1066% \input{AllocDS1}
     1067% \caption{T:1 with Shared Buckets}
     1068% \label{f:T1SharedBuckets}
     1069% \end{figure}
     1070%
     1071% Problems:
     1072% \begin{itemize}
     1073% \item
     1074% Need to know when a KT is created/destroyed to assign/unassign a shared bucket-number from the memory allocator.
     1075% \item
     1076% When no thread is assigned a bucket number, its free storage is unavailable.
     1077% \item
     1078% All KTs contend for the global-pool lock for initial allocations, before free-lists get populated.
     1079% \end{itemize}
     1080% Tests showed having locks along the allocation fast-path produced a significant increase in allocation costs and any contention among KTs produces a significant spike in latency.
     1081
     1082% \paragraph{T:H model}
     1083% Figure~\ref{f:THSharedHeaps} shows a fixed number of heaps (N), each a local free pool, where the heaps are sharded (distributed) across the KTs.
     1084% A KT can point directly to its assigned heap or indirectly through the corresponding heap bucket.
     1085% When KT $\le$ N, the heaps might be uncontented;
     1086% when KTs $>$ N, the heaps are contented.
     1087% In all cases, a KT must acquire/release a lock, contented or uncontented along the fast allocation path because a heap is shared.
     1088% By increasing N, this approach reduces contention but increases storage (time versus space);
     1089% however, picking N is workload specific.
     1090%
     1091% \begin{figure}
     1092% \centering
     1093% \input{AllocDS2}
     1094% \caption{T:H with Shared Heaps}
     1095% \label{f:THSharedHeaps}
     1096% \end{figure}
     1097%
     1098% Problems:
     1099% \begin{itemize}
     1100% \item
     1101% Need to know when a KT is created/destroyed to assign/unassign a heap from the memory allocator.
     1102% \item
     1103% When no thread is assigned to a heap, its free storage is unavailable.
     1104% \item
     1105% Ownership issues arise (see Section~\ref{s:Ownership}).
     1106% \item
     1107% All KTs contend for the local/global-pool lock for initial allocations, before free-lists get populated.
     1108% \end{itemize}
     1109% Tests showed having locks along the allocation fast-path produced a significant increase in allocation costs and any contention among KTs produces a significant spike in latency.
     1110
     1111% \paragraph{T:H model, H = number of CPUs}
     1112% This design is the T:H model but H is set to the number of CPUs on the computer or the number restricted to an application, \eg via @taskset@.
     1113% (See Figure~\ref{f:THSharedHeaps} but with a heap bucket per CPU.)
     1114% Hence, each CPU logically has its own private heap and local pool.
     1115% A memory operation is serviced from the heap associated with the CPU executing the operation.
     1116% This approach removes fastpath locking and contention, regardless of the number of KTs mapped across the CPUs, because only one KT is running on each CPU at a time (modulo operations on the global pool and ownership).
     1117% This approach is essentially an M:N approach where M is the number if KTs and N is the number of CPUs.
     1118%
     1119% Problems:
     1120% \begin{itemize}
     1121% \item
     1122% Need to know when a CPU is added/removed from the @taskset@.
     1123% \item
     1124% Need a fast way to determine the CPU a KT is executing on to access the appropriate heap.
     1125% \item
     1126% Need to prevent preemption during a dynamic memory operation because of the \newterm{serially-reusable problem}.
     1127% \begin{quote}
     1128% A sequence of code that is guaranteed to run to completion before being invoked to accept another input is called serially-reusable code.~\cite{SeriallyReusable}\label{p:SeriallyReusable}
     1129% \end{quote}
     1130% If a KT is preempted during an allocation operation, the OS can schedule another KT on the same CPU, which can begin an allocation operation before the previous operation associated with this CPU has completed, invalidating heap correctness.
     1131% Note, the serially-reusable problem can occur in sequential programs with preemption, if the signal handler calls the preempted function, unless the function is serially reusable.
     1132% Essentially, the serially-reusable problem is a race condition on an unprotected critical subsection, where the OS is providing the second thread via the signal handler.
     1133%
     1134% Library @librseq@~\cite{librseq} was used to perform a fast determination of the CPU and to ensure all memory operations complete on one CPU using @librseq@'s restartable sequences, which restart the critical subsection after undoing its writes, if the critical subsection is preempted.
     1135% \end{itemize}
     1136% Tests showed that @librseq@ can determine the particular CPU quickly but setting up the restartable critical-subsection along the allocation fast-path produced a significant increase in allocation costs.
     1137% Also, the number of undoable writes in @librseq@ is limited and restartable sequences cannot deal with user-level thread (UT) migration across KTs.
     1138% For example, UT$_1$ is executing a memory operation by KT$_1$ on CPU$_1$ and a time-slice preemption occurs.
     1139% The signal handler context switches UT$_1$ onto the user-level ready-queue and starts running UT$_2$ on KT$_1$, which immediately calls a memory operation.
     1140% Since KT$_1$ is still executing on CPU$_1$, @librseq@ takes no action because it assumes KT$_1$ is still executing the same critical subsection.
     1141% Then UT$_1$ is scheduled onto KT$_2$ by the user-level scheduler, and its memory operation continues in parallel with UT$_2$ using references into the heap associated with CPU$_1$, which corrupts CPU$_1$'s heap.
     1142% If @librseq@ had an @rseq_abort@ which:
     1143% \begin{enumerate}
     1144% \item
     1145% Marked the current restartable critical-subsection as cancelled so it restarts when attempting to commit.
     1146% \item
     1147% Do nothing if there is no current restartable critical subsection in progress.
     1148% \end{enumerate}
     1149% Then @rseq_abort@ could be called on the backside of a  user-level context-switching.
     1150% A feature similar to this idea might exist for hardware transactional-memory.
     1151% A significant effort was made to make this approach work but its complexity, lack of robustness, and performance costs resulted in its rejection.
     1152
     1153% \subsubsection{Allocation Fastpath}
     1154% \label{s:AllocationFastpath}
     1155
     1156llheap's design was reviewed and changed multiple times during its development, with the final choices are discussed here.
     1157(See~\cite{Zulfiqar22} for a discussion of alternate choices and reasons for rejecting them.)
     1158All designs were analyzed for the allocation/free \newterm{fastpath}, \ie when an allocation can immediately return free storage or returned storage is not coalesced.
     1159The heap model chosen is 1:1, which is the T:H model with T = H, where there is one thread-local heap for each KT.
    10441160(See Figure~\ref{f:THSharedHeaps} but with a heap bucket per KT and no bucket or local-pool lock.)
    10451161Hence, immediately after a KT starts, its heap is created and just before a KT terminates, its heap is (logically) deleted.
    1046 Therefore, heaps are uncontended for a KTs memory operations as every KT has its own thread-local heap, modulo operations on the global pool and ownership.
     1162Heaps are uncontended for a KTs memory operations as every KT has its own thread-local heap, modulo operations on the global pool and ownership.
    10471163
    10481164Problems:
     
    10891205For the T:1 and T:H models, locking must exist along the allocation fastpath because the buckets or heaps might be shared by multiple threads, even when KTs $\le$ N.
    10901206For the T:H=CPU and 1:1 models, locking is eliminated along the allocation fastpath.
    1091 However, T:H=CPU has poor OS support to determine the CPU id (heap id) and prevent the serially-reusable problem for KTs.
     1207However, T:H=CPU has poor operating-system support to determine the CPU id (heap id) and prevent the serially-reusable problem for KTs.
    10921208More OS support is required to make this model viable, but there is still the serially-reusable problem with user-level threading.
    1093 So the 1:1 model had no atomic actions along the fastpath and no special OS support requirements.
     1209So the 1:1 model had no atomic actions along the fastpath and no special operating-system support requirements.
    10941210The 1:1 model still has the serially-reusable problem with user-level threading, which is addressed in Section~\ref{s:UserlevelThreadingSupport}, and the greatest potential for heap blowup for certain allocation patterns.
    10951211
     
    11251241A primary goal of llheap is low latency, hence the name low-latency heap (llheap).
    11261242Two forms of latency are internal and external.
    1127 Internal latency is the time to perform an allocation, while external latency is time to obtain or return storage from or to the OS.
     1243Internal latency is the time to perform an allocation, while external latency is time to obtain/return storage from/to the OS.
    11281244Ideally latency is $O(1)$ with a small constant.
    11291245
    1130 $O(1)$ internal latency means no open searching on the allocation fastpath, which largely prohibits coalescing.
    1131 The mitigating factor is that most programs have a small, fixed, allocation pattern, where the majority of allocation operations can be $O(1)$ and heap blowup does not occur without coalescing (although the allocation footprint may be slightly larger).
    1132 Modern computers have large memories so a slight increase in program footprint is not a problem.
    1133 
    1134 $O(1)$ external latency means obtaining one large storage area from the OS and subdividing it across all program allocations, which requires a good guess at the program storage high-watermark and potential large external fragmentation.
    1135 Excluding real-time OSs, OS operations are unbounded, and hence some external latency is unavoidable.
    1136 The mitigating factor is that OS calls can often be reduced if a programmer has a sense of the storage high-watermark and the allocator is capable of using this information (see @malloc_expansion@ \pageref{p:malloc_expansion}).
    1137 Furthermore, while OS calls are unbounded, many are now reasonably fast, so their latency is tolerable because it occurs infrequently.
     1246To obtain $O(1)$ internal latency means no searching on the allocation fastpath and largely prohibits coalescing, which leads to external fragmentation.
     1247The mitigating factor is that most programs have well behaved allocation patterns, where the majority of allocation operations can be $O(1)$, and heap blowup does not occur without coalescing (although the allocation footprint may be slightly larger).
     1248
     1249To obtain $O(1)$ external latency means obtaining one large storage area from the OS and subdividing it across all program allocations, which requires a good guess at the program storage high-watermark and potential large external fragmentation.
     1250Excluding real-time operating-systems, operating-system operations are unbounded, and hence some external latency is unavoidable.
     1251The mitigating factor is that operating-system calls can often be reduced if a programmer has a sense of the storage high-watermark and the allocator is capable of using this information (see @malloc_expansion@ \pageref{p:malloc_expansion}).
     1252Furthermore, while operating-system calls are unbounded, many are now reasonably fast, so their latency is tolerable and infrequent.
    11381253
    11391254
     
    12771392\subsubsection{Alignment}
    12781393
    1279 Allocators have a different minimum storage alignment from the hardware's basic types.
    1280 Often the minimum allocator alignment, $M$, is the bus width (32 or 64-bit), the largest register (double, long double), largest atomic instruction (DCAS), or vector data (MMMX).
    1281 The reason for this larger requirement is the lack of knowledge about the data type occupying the allocation.
    1282 Hence, an allocator assumes the worst-case scenario for the start of data and the compiler correctly aligns items within this data because it knows their types.
    1283 Often the minimum storage alignment is an 8/16-byte boundary on a 32/64-bit computer.
    1284 Alignments larger than $M$ are normally a power of 2, such as page alignment (4/8K).
    1285 Any alignment less than $M$ is raised to the minimal alignment.
    1286 
    1287 llheap aligns its header at the $M$ boundary and its size is $M$;
    1288 hence, data following the header is aligned at $M$.
    1289 This pattern means there is no minimal alignment computation along the allocation fastpath, \ie new storage and reused storage is always correctly aligned.
    1290 An alignment $N$ greater than $M$ is accomplished with a \emph{pessimistic} request for storage that ensures \emph{both} the alignment and size request are satisfied, \eg:
     1394Most dynamic memory allocations have a minimum storage alignment for the contained object(s).
     1395Often the minimum memory alignment, M, is the bus width (32 or 64-bit) or the largest register (double, long double) or largest atomic instruction (DCAS) or vector data (MMMX).
     1396In general, the minimum storage alignment is 8/16-byte boundary on 32/64-bit computers.
     1397For consistency, the object header is normally aligned at this same boundary.
     1398Larger alignments must be a power of 2, such as page alignment (4/8K).
     1399Any alignment request, N, $\le$ the minimum alignment is handled as a normal allocation with minimal alignment.
     1400
     1401For alignments greater than the minimum, the obvious approach for aligning to address @A@ is: compute the next address that is a multiple of @N@ after the current end of the heap, @E@, plus room for the header before @A@ and the size of the allocation after @A@, moving the end of the heap to @E'@.
     1402\begin{center}
     1403\input{Alignment1}
     1404\end{center}
     1405The storage between @E@ and @H@ is chained onto the appropriate free list for future allocations.
     1406The same approach is used for sufficiently large free blocks, where @E@ is the start of the free block, and any unused storage before @H@ or after the allocated object becomes free storage.
     1407In this approach, the aligned address @A@ is the same as the allocated storage address @P@, \ie @P@ $=$ @A@ for all allocation routines, which simplifies deallocation.
     1408However, if there are a large number of aligned requests, this approach leads to memory fragmentation from the small free areas around the aligned object.
     1409As well, it does not work for large allocations, where many memory allocators switch from program @sbrk@ to operating-system @mmap@.
     1410The reason is that @mmap@ only starts on a page boundary, and it is difficult to reuse the storage before the alignment boundary for other requests.
     1411Finally, this approach is incompatible with allocator designs that funnel allocation requests through @malloc@ as it directly manipulates management information within the allocator to optimize the space/time of a request.
     1412
     1413Instead, llheap alignment is accomplished by making a \emph{pessimistic} allocation request for sufficient storage to ensure that \emph{both} the alignment and size request are satisfied, \eg:
    12911414\begin{center}
    12921415\input{Alignment2}
    12931416\end{center}
    1294 The amount of storage necessary is $alignment - M + size$, which ensures there is an address, $A$, after the storage returned from @malloc@, $P$, that is a multiple of $alignment$ followed by sufficient storage for the data object.
    1295 The approach is pessimistic if $P$ happens to have the correct alignment $N$, and the initial allocation has requested sufficient space to move to the next multiple of $N$.
    1296 In this case, there is $alignment - M$ bytes of unused storage after the data object, which could be used by @realloc@.
    1297 Note, the address returned by the allocation is $A$, which is subsequently returned to @free@.
    1298 To correctly free the object, the value $P$ must be computable from $A$, since that is the actual start of the allocation, from which $H$ can be computed $P - M$.
    1299 Hence, there must be a mechanism to detect when $P$ $\neq$ $A$ and then compute $P$ from $A$.
    1300 
    1301 To detect and perform this computation, llheap uses two headers:
    1302 the \emph{original} header $H$ associated with the allocation, and a \emph{fake} header $F$ within this storage before the alignment boundary $A$, e.g.:
     1417The amount of storage necessary is @alignment - M + size@, which ensures there is an address, @A@, after the storage returned from @malloc@, @P@, that is a multiple of @alignment@ followed by sufficient storage for the data object.
     1418The approach is pessimistic because if @P@ already has the correct alignment @N@, the initial allocation has already requested sufficient space to move to the next multiple of @N@.
     1419For this special case, there is @alignment - M@ bytes of unused storage after the data object, which subsequently can be used by @realloc@.
     1420
     1421Note, the address returned is @A@, which is subsequently returned to @free@.
     1422However, to correctly free the allocated object, the value @P@ must be computable, since that is the value generated by @malloc@ and returned within @memalign@.
     1423Hence, there must be a mechanism to detect when @P@ $\neq$ @A@ and how to compute @P@ from @A@.
     1424
     1425The llheap approach uses two headers:
     1426the \emph{original} header associated with a memory allocation from @malloc@, and a \emph{fake} header within this storage before the alignment boundary @A@, which is returned from @memalign@, e.g.:
    13031427\begin{center}
    13041428\input{Alignment2Impl}
    13051429\end{center}
    1306 Since every allocation is aligned at $M$, $P$ $\neq$ $A$ only holds for alignments greater than $M$.
    1307 When $P$ $\neq$ $A$, the minimum distance between $P$ and $A$ is $M$ bytes, due to the pessimistic storage-allocation.
    1308 Therefore, there is always room for an $M$-byte fake header before $A$.
    1309 The fake header must supply an indicator to distinguish it from a normal header and the location of address $P$ generated by the allocation.
     1430Since @malloc@ has a minimum alignment of @M@, @P@ $\neq$ @A@ only holds for alignments greater than @M@.
     1431When @P@ $\neq$ @A@, the minimum distance between @P@ and @A@ is @M@ bytes, due to the pessimistic storage-allocation.
     1432Therefore, there is always room for an @M@-byte fake header before @A@.
     1433
     1434The fake header must supply an indicator to distinguish it from a normal header and the location of address @P@ generated by @malloc@.
    13101435This information is encoded as an offset from A to P and the initialize alignment (discussed in Section~\ref{s:ReallocStickyProperties}).
    13111436To distinguish a fake header from a normal header, the least-significant bit of the alignment is used because the offset participates in multiple calculations, while the alignment is just remembered data.
     
    13181443\label{s:ReallocStickyProperties}
    13191444
    1320 The allocation routine @realloc@ provides a memory-management pattern for shrinking/enlarging an existing allocation, while maintaining some or all of the object data.
    1321 The realloc pattern is simpler than the suboptimal manually steps.
     1445The allocation routine @realloc@ provides a memory-management pattern for shrinking/enlarging an existing allocation, while maintaining some or all of the object data, rather than performing the following steps manually.
    13221446\begin{flushleft}
    13231447\begin{tabular}{ll}
     
    13311455&
    13321456\begin{lstlisting}
    1333 T * naddr = (T *)malloc( newSize ); $\C[2in]{// new storage}$
     1457T * naddr = (T *)malloc( newSize ); $\C[2.4in]{// new storage}$
    13341458memcpy( naddr, addr, oldSize );  $\C{// copy old bytes}$
    13351459free( addr );                           $\C{// free old storage}$
     
    13381462\end{tabular}
    13391463\end{flushleft}
    1340 The manual steps are suboptimal because there may be sufficient internal fragmentation at the end of the allocation due to bucket sizes.
    1341 If this storage is large enough, it eliminates a new allocation and copying.
    1342 Alternatively, if the storage is made smaller, there may be a reasonable crossover point, where just increasing the internal fragmentation eliminates a new allocation and copying.
    1343 This pattern should be used more frequently to reduce storage management costs.
    1344 In fact, if @oaddr@ is @nullptr@, @realloc@ does a @malloc( newSize)@, and if @newSize@ is 0, @realloc@ does a @free( oaddr )@, so all allocation/deallocation can be done with @realloc@.
    1345 
    1346 The hidden problem with this pattern is the effect of zero fill and alignment with respect to reallocation.
    1347 For safety, we argue these properties should be persistent (``sticky'') and not transient.
    1348 For example, when memory is initially allocated by @calloc@ or @memalign@ with zero fill or alignment properties, any subsequent reallocations of this storage must preserve these properties.
    1349 Currently, allocation properties are not preserved nor is it possible to query an allocation to maintain these properties manually.
    1350 Hence, subsequent use of @realloc@ storage that assumes any initially properties may cause errors.
    1351 This silent problem is unintuitive to programmers, can cause catastrophic failure, and is difficult to debug because it is transient.
    1352 To prevent these problems, llheap preserves initial allocation properties within an allocation, allowing them to be queried, and the semantics of @realloc@ preserve these properties on any storage change.
    1353 As a result, the realloc pattern is efficient and safe.
     1464The realloc pattern leverages available storage at the end of an allocation due to bucket sizes, possibly eliminating a new allocation and copying.
     1465This pattern is not used enough to reduce storage management costs.
     1466In fact, if @oaddr@ is @nullptr@, @realloc@ does a @malloc@, so even the initial @malloc@ can be a @realloc@ for consistency in the allocation pattern.
     1467
     1468The hidden problem for this pattern is the effect of zero fill and alignment with respect to reallocation.
     1469Are these properties transient or persistent (``sticky'')?
     1470For example, when memory is initially allocated by @calloc@ or @memalign@ with zero fill or alignment properties, respectively, what happens when those allocations are given to @realloc@ to change size?
     1471That is, if @realloc@ logically extends storage into unused bucket space or allocates new storage to satisfy a size change, are initial allocation properties preserved?
     1472Currently, allocation properties are not preserved, so subsequent use of @realloc@ storage may cause inefficient execution or errors due to lack of zero fill or alignment.
     1473This silent problem is unintuitive to programmers and difficult to locate because it is transient.
     1474To prevent these problems, llheap preserves initial allocation properties for the lifetime of an allocation and the semantics of @realloc@ are augmented to preserve these properties, with additional query routines.
     1475This change makes the realloc pattern efficient and safe.
    13541476
    13551477
    13561478\subsubsection{Header}
    13571479
    1358 To preserve allocation properties requires storing additional information about an allocation.
    1359 Figure~\ref{f:llheapHeader} shows llheap captures this information in the header, which has two fields (left/right) sized appropriately for 32/64-bit alignment requirements.
    1360 
    1361 \begin{figure}
    1362 \centering
    1363 \input{Header}
    1364 \caption{llheap Header}
    1365 \label{f:llheapHeader}
    1366 \end{figure}
    1367 
    1368 The left field is a union of three values:
     1480To preserve allocation properties requires storing additional information with an allocation,
     1481The best available option is the header, where Figure~\ref{f:llheapNormalHeader} shows the llheap storage layout.
     1482The header has two data field sized appropriately for 32/64-bit alignment requirements.
     1483The first field is a union of three values:
    13691484\begin{description}
    13701485\item[bucket pointer]
    1371 is for deallocated of heap storage and points back to the bucket associated with this storage requests (see Figure~\ref{f:llheapStructure} for the fields accessible in a bucket).
     1486is for allocated storage and points back to the bucket associated with this storage requests (see Figure~\ref{f:llheapStructure} for the fields accessible in a bucket).
    13721487\item[mapped size]
    1373 is for deallocation of mapped storage and is the storage size for unmapping.
     1488is for mapped storage and is the storage size for use in unmapping.
    13741489\item[next free block]
    1375 is for freed storage and is an intrusive pointer chaining same-size free blocks onto a bucket's stack of free objects.
     1490is for free storage and is an intrusive pointer chaining same-size free blocks onto a bucket's free stack.
    13761491\end{description}
    1377 The low-order 3-bits of this field are unused for any stored values as these values are at least 8-byte aligned.
     1492The second field remembers the request size versus the allocation (bucket) size, \eg request 42 bytes which is rounded up to 64 bytes.
     1493Since programmers think in request sizes rather than allocation sizes, the request size allows better generation of statistics or errors and also helps in memory management.
     1494
     1495\begin{figure}
     1496\centering
     1497\input{Header}
     1498\caption{llheap Normal Header}
     1499\label{f:llheapNormalHeader}
     1500\end{figure}
     1501
     1502The low-order 3-bits of the first field are \emph{unused} for any stored values as these values are 16-byte aligned by default, whereas the second field may use all of its bits.
    13781503The 3 unused bits are used to represent mapped allocation, zero filled, and alignment, respectively.
    1379 Note, the zero-filled/mapped bits are only used in the normal header and the alignment bit in the fake header.
     1504Note, the alignment bit is not used in the normal header and the zero-filled/mapped bits are not used in the fake header.
    13801505This implementation allows a fast test if any of the lower 3-bits are on (@&@ and compare).
    1381 If no bits are on, it implies a basic allocation, which is handled quickly in the fastpath for allocation and free;
     1506If no bits are on, it implies a basic allocation, which is handled quickly;
    13821507otherwise, the bits are analysed and appropriate actions are taken for the complex cases.
    1383 
    1384 The right field remembers the request size versus the allocation (bucket) size, \eg request of 42 bytes is rounded up to 64 bytes.
    1385 Since programmers think in request sizes rather than allocation sizes, the request size allows better generation of statistics or errors and also helps in memory management.
     1508Since most allocations are basic, they will take significantly less time as the memory operations will be done along the allocation and free fastpath.
    13861509
    13871510
    13881511\subsection{Statistics and Debugging}
    13891512
    1390 llheap can be built to accumulate fast and largely contention-free allocation statistics to help understand dynamic-memory behaviour.
     1513llheap can be built to accumulate fast and largely contention-free allocation statistics to help understand allocation behaviour.
    13911514Incrementing statistic counters must appear on the allocation fastpath.
    13921515As noted, any atomic operation along the fastpath produces a significant increase in allocation costs.
     
    16181741
    16191742\medskip\noindent
    1620 \lstinline{void * aalloc( size_t dimension, size_t elemSize )}
     1743\lstinline{void * aalloc( size_t dim, size_t elemSize )}
    16211744extends @calloc@ for allocating a dynamic array of objects with total size @dim@ $\times$ @elemSize@ but \emph{without} zero-filling the memory.
    16221745@aalloc@ is significantly faster than @calloc@, which is the only alternative given by the standard memory-allocation routines for array allocation.
     
    16301753
    16311754\medskip\noindent
    1632 \lstinline{void * amemalign( size_t alignment, size_t dimension, size_t elemSize )}
     1755\lstinline{void * amemalign( size_t alignment, size_t dim, size_t elemSize )}
    16331756extends @aalloc@ and @memalign@ for allocating a dynamic array of objects with the starting address on the @alignment@ boundary.
    16341757Sets sticky alignment property.
     
    16361759
    16371760\medskip\noindent
    1638 \lstinline{void * cmemalign( size_t alignment, size_t dimension, size_t elemSize )}
     1761\lstinline{void * cmemalign( size_t alignment, size_t dim, size_t elemSize )}
    16391762extends @amemalign@ with zero fill and has the same usage as @amemalign@.
    16401763Sets sticky zero-fill and alignment property.
     
    17581881
    17591882\medskip\noindent
    1760 \lstinline{T * alloc( ... )} or \lstinline{T * alloc( size_t dimension, ... )}
     1883\lstinline{T * alloc( ... )} or \lstinline{T * alloc( size_t dim, ... )}
    17611884is overloaded with a variable number of specific allocation operations, or an integer dimension parameter followed by a variable number of specific allocation operations.
    17621885These allocation operations can be passed as named arguments when calling the \lstinline{alloc} routine.
  • doc/theses/fangren_yu_MMath/.gitignore

    r49510db rc699602  
    1 # generated by latex
    2 build/*
     1# Intermediate Results:
     2build/
     3
     4# Final Files:
    35*.pdf
    46*.ps
    57
    6 # generated by xfig
    7 *.bak
     8# The Makefile here is not generated.
     9!Makefile
  • doc/theses/fangren_yu_MMath/content1.tex

    r49510db rc699602  
    22\label{c:content1}
    33
    4 This chapter discusses \CFA features introduced over time by multiple people and their interactions with the type system.
     4This chapter discusses \CFA feature introduced over time by multiple people and their interactions with the type system.
    55
    66
     
    1919Java has mutable references but no pointers.
    2020\CC has mutable pointers but immutable references;
    21 here, references match with functional programming.
    22 However, the consequence is asymmetric semantics between pointer and reference.
     21hence, references match with functional programming.
     22However, the consequence is asymmetry semantics between the pointer and reference.
    2323\CFA adopts a uniform policy between pointers and references where mutability is a separate property made at the declaration.
    2424
     
    6464The call applies an implicit dereference once to @x@ so the call is typed @f( int & )@ with @T = int@, rather than with @T = int &@.
    6565
    66 As for a pointer type, a reference type may have qualifiers, where @const@ is most common.
     66As for a pointer type, a reference type may have qualifiers, where @const@ is most interesting.
    6767\begin{cfa}
    6868int x = 3; $\C{// mutable}$
     
    113113In the initial \CFA reference design, the goal was to make the reference type a \emph{real} data type \vs a restricted \CC reference, which is mostly used for choosing the argument-passing method, \ie by-value or by-reference.
    114114However, there is an inherent ambiguity for auto-dereferencing: every argument expression involving a reference variable can potentially mean passing the reference's value or address.
    115 For example, in
    116 \begin{cfa}
    117 int & x;
    118 forall( T ) void foo( T );
    119 forall( T ) void bar( T & );
    120 foo( x ); $\C{// means pass by value}$
    121 bar( x ); $\C{// means pass by reference}$
    122 \end{cfa}
    123 the call to @foo@ must pass @x@ by value, implying auto-dereference, while the call to @bar@ must pass @x@ by reference, implying no auto-dereference.
    124115Without any restrictions, this ambiguity limits the behaviour of reference types in \CFA polymorphic functions, where a type @T@ can bind to a reference or non-reference type.
    125116This ambiguity prevents the type system treating reference types the same way as other types, even if type variables could be bound to reference types.
     
    159150Even if the object trait can be made optional, the current type system often misbehaves by adding undesirable auto-dereference on the referenced-to value rather than the reference variable itself, as intended.
    160151Some tweaks are necessary to accommodate reference types in polymorphic contexts and it is unclear what can or cannot be achieved.
    161 Currently, there are contexts where the \CFA programmer is forced to use a pointer type, giving up the benefits of auto-dereference operations and better syntax with reference types.
     152Currently, there are contexts where \CFA programmer is forced to use a pointer type, giving up the benefits of auto-dereference operations and better syntax with reference types.
    162153
    163154
     
    171162\begin{tabular}{@{}l@{\hspace{20pt}}l@{}}
    172163\begin{cfa}
    173 int foo( int &p1, int &p2 );  // in/out parameters
     164
     165int foo( int &p2, int &p3 );  // in/out parameters
    174166int x, y = 3, z = 4;
    175 x = foo( y, z );  // return 3 values: 1 out, 2 in/out
    176 \end{cfa}
    177 &
    178 \begin{cfa}
    179 struct Ret { int x, y, z; } ret;
    180 Ret foo( int p1, int p2 );  // return structure
    181 ret = foo( 3, 4 );  // return 3 values: 3 out
     167x = foo( y, z );  // return 3 values
     168\end{cfa}
     169&
     170\begin{cfa}
     171struct Ret { int x, y, z; };
     172Ret foo( int p2, int p3 );  // multiple return values
     173Ret ret = { .y = 3, .z = 4 };
     174ret = foo( ret.y, ret.z );  // return 3 values
    182175\end{cfa}
    183176\end{tabular}
    184177\end{cquote}
    185 Like Go, K-W C allows direct return of multiple values into a tuple.
    186 \begin{cfa}
    187 @[int, int, int]@ foo( int p1, int p2 );
    188 @[x, y, z]@ = foo( 3, 4 );  // return 3 values into a tuple
     178K-W C allows direct return of multiple values into a tuple.
     179\begin{cfa}
     180@[int, int, int]@ foo( int p2, int p3 );
     181@[x, y, z]@ = foo( y, z );  // return 3 values into a tuple
    189182\end{cfa}
    190183Along with making returning multiple values a first-class feature, tuples were extended to simplify a number of other common context that normally require multiple statements and/or additional declarations, all of which reduces coding time and errors.
     
    212205bar( @foo@( 3 ), @foo@( 3 ) );
    213206\end{cfa}
    214 The type resolver only has the tuple return types to resolve the call to @bar@ as the @foo@ parameters are identical.
    215 The resultion involves unifying the flattened @foo@ return values with @bar@'s parameter list.
     207The type resolver only has the tuple return types to resolve the call to @bar@ as the @foo@ parameters are identical, which involves unifying the flattened @foo@ return values with @bar@'s parameter list.
    216208However, no combination of @foo@s is an exact match with @bar@'s parameters;
    217209thus, the resolver applies C conversions to obtain a best match.
    218210The resulting minimal cost expression is @bar( foo@$_1$@( 3 ), foo@$_2$@( 3 ) )@, where the two possible coversions are (@int@, {\color{red}@int@}, @double@) to (@int@, {\color{red}@double@}, @double@) with a safe (widening) conversion from @int@ to @double@ versus ({\color{red}@double@}, {\color{red}@int@}, {\color{red}@int@}) to ({\color{red}@int@}, {\color{red}@double@}, {\color{red}@double@}) with one unsafe (narrowing) conversion from @double@ to @int@ and two safe conversions from @int@ to @double@.
    219 Go provides a simplified mechanism where only one tuple returning function call is allowed and there are no implicit type conversions.
    220 \begin{lstlisting}[language=Go]
    221 func foo( int ) ( int, int, int ) { return 3, 7, 8 }
    222 func bar( int, int, int ) { ... } // types must match
    223 bar( foo( 3 ) ) // only one tuple returning call
    224 \end{lstlisting}
    225 Hence, programers cannot take advantage of the full power of tuples but type match is straightforward.
     211The programming language Go provides a similar but simplier tuple mechanism, as it does not have overloaded functions.
    226212
    227213K-W C also supported tuple variables, but with a strong distinction between tuples and tuple values/variables.
     
    319305\end{cfa}
    320306\VRef[Figure]{f:AlternateTupleImplementation} shows the two implementation approaches.
    321 In the left approach, the return statement is rewritten to pack the return values into a structure, which is returned by value, and the structure fields are individually assigned to the left-hand side of the assignment.
     307In the left approach, the return statement is rewritten to pack the return values into a structure, which is returned by value, and the structure fields are indiviually assigned to the left-hand side of the assignment.
    322308In the right approach, the return statement is rewritten as direct assignments into the passed-in argument addresses.
    323 The upside of the right implementation is consistence and no copying.
     309The right imlementation looks more concise and saves unnecessary copying.
    324310The downside is indirection within @gives_two@ to access values, unless values get hoisted into registers for some period of time, which is common.
    325311
     
    328314\setlength{\tabcolsep}{20pt}
    329315\begin{tabular}{@{}ll@{}}
    330 Till K-W C implementation & Esteves \CFA implementation \\
     316Till K-W C implementation & Rodolfo \CFA implementation \\
    331317\begin{cfa}
    332318struct _tuple2 { int _0; int _1; }
     
    357343
    358344Interestingly, in the third implementation of \CFA tuples by Robert Schluntz~\cite[\S~3]{Schluntz17}, the MVR functions revert back to structure based, where it remains in the current version of \CFA.
    359 The reason for the reversion is a uniform approach for tuple values/variables making tuples first-class types in \CFA, \ie allow tuples with corresponding tuple variables.
    360 This reversion was possible, because in parallel with Schluntz's work, generic types were added independently by Moss~\cite{Moss19}, and the tuple variables leveraged the same implementation techniques as for generic variables~\cite[\S~3.7]{Schluntz17}.
    361 For example, these two tuples:
    362 \begin{cfa}
    363 [double, double] x;
    364 [int, double, int] y;
    365 \end{cfa}
    366 are transformed internally into two generic structures:
    367 \begin{cfa}
    368 forall( T0 &, & T1 | sized( T0 ) | sized( T1 ) )
    369 struct _tuple2_ {
    370         T0 field_0 ;   T1 field_1 ;
    371 };
    372 forall( T0 &, T1 &, T2 & | sized( T0 ) | sized( T1 ) | sized( T2 ) )
    373 struct _tuple3_ {
    374         T0 field_0 ;   T1 field_1 ;   T2 field_2 ;
    375 };
    376 \end{cfa}
    377 and the declarations become instances of these generic structure types:
    378 \begin{cfa}
    379 _tuple2_( double, double ) x;
    380 _tuple3_( int, double, int ) y;
    381 \end{cfa}
    382 Now types @_tuple2_@ and @_tuple3_@ are available for any further 2 or 3 tuple-types in the translation unit, simplifying internal code transformations by memoizing a small set of tuple structures.
    383 Ultimately, these generic types are lowered to specific C structures during code generation.
    384 Scala, like \CC, provides tuple types through a library using this structural expansion, \eg Scala provides tuple sizes 1 through 22 via hand-coded generic data-structures.
     345The reason for the reversion was to have a uniform approach for tuple values/variables making tuples first-class types in \CFA, \ie allow tuples with corresponding tuple variables.
     346This extension was possible, because in parallel with Schluntz's work, generic types were added independently by Moss~\cite{Moss19}, and the tuple variables leveraged the same implementation techniques as the generic variables.
     347\PAB{I'm not sure about the connection here. Do you have an example of what you mean?}
    385348
    386349However, after experience gained building the \CFA runtime system, making tuple-types first-class seems to add little benefit.
     
    398361Furthermore, since operator overloading in \CFA is implemented by treating operators as overloadable functions, tuple types are very rarely used in a structured way.
    399362When a tuple-type expression appears in a function call (except assignment expressions, which are handled differently by mass- or multiple-assignment expansions), it is always flattened, and the tuple structure of function parameter is not considered a part of the function signatures.
    400 For example, these two prototypes for @foo@:
     363For example,
    401364\begin{cfa}
    402365void f( int, int );
     
    404367f( 3, 4 );  // ambiguous call
    405368\end{cfa}
    406 have the same signature (a function taking two @int@s and returning nothing), and therefore invalid overloads.
     369the two prototypes for @foo@ have the same signature (a function taking two @int@s and returning nothing), and therefore invalid overloads.
    407370Note, the ambiguity error occurs at the call rather than at the second declaration of @f@, because it is possible to have multiple equivalent prototype definitions of a function.
    408371Furthermore, ordinary polymorphic type-parameters are not allowed to have tuple types.
     
    422385Therefore, tuple types are never present in any fixed-argument function calls, because of the flattening.
    423386
    424 \begin{comment}
    425 Date: Mon, 13 Jan 2025 10:09:06 -0500
    426 Subject: Re: structure / tuple
    427 To: "Peter A. Buhr" <pabuhr@uwaterloo.ca>
    428 CC: Andrew Beach <ajbeach@uwaterloo.ca>,
    429         Michael Brooks <mlbrooks@uwaterloo.ca>,
    430         Fangren Yu <f37yu@uwaterloo.ca>, Jiada Liang <j82liang@uwaterloo.ca>,
    431         Alvin Zhang <alvin.zhang@uwaterloo.ca>,
    432         Kyoung Seo <lseo@plg.uwaterloo.ca>
    433 From: Gregor Richards <gregor.richards@uwaterloo.ca>
    434 
    435 Languages support tuples to abbreviate syntax where the meaning of several
    436 values is obvious from context, such as returns from functions, or where the
    437 effort of creating a dedicated type is not worth the reward of using that type
    438 in exactly one location. The positions always have meanings which could be
    439 given names, and are only not given names for brevity. Whether that brevity is
    440 a good idea or not is the programmer's problem to deal with. I don't think
    441 there's any pragmatic value to tuples beyond brevity. (From a theoretical
    442 perspective, having the empty tuple is useful for type-theoretical reasons, and
    443 tuples are usually easier to reason about than structures, but that only
    444 applies to theoretical reasoning, not to actual programming.)
    445 
    446 Your distinction unstructured tuples could just as well be made for structs as
    447 well, if you had named arguments (or named returns?).  Personally, I think that
    448 having these be a syntactic distinction is a mistake. Other languages return
    449 fully codified tuples, and if you immediately destructure them, even the most
    450 naive optimizer will manage to never create an actual tuple in memory. In my
    451 opinion, since tuples are for brevity, they should always be declared with your
    452 "unstructured" syntax, and it's up to the optimizer to realize when you've
    453 never stored them. But, you live closer to the metal in CFA than most
    454 languages, so perhaps communicating that intent is of sufficient value.
    455 
    456 The only value of tuples beyond that is to make it possible for annoying
    457 students to use std::pair in place of ever creating their own class hierarchy
    458 or naming things. Then again, I hear that that is one of the hard problems in
    459 computer science.
    460 
    461 With valediction,
    462   - Gregor Richards
    463 
    464 On 1/13/25 09:11, Peter A. Buhr wrote:
    465 > The CFA team has been discussing the difference between a structure and
    466 > tuple.  Basically, a structure has named fields and a tuple has anonymous
    467 > fields. As a result, structure access uses field names and tuple access uses
    468 > position.
    469 >
    470 >    struct S { int i, j, k ; };
    471 >    S s;
    472 >    s.i; s.j; // field access
    473 >
    474 >    tuple T { int, int };
    475 >    T t;
    476 >    t.0; t.1; // position access, zero origin
    477 >    t[0]; t[1]; // alternate access
    478 >
    479 > Hence the difference is small.
    480 >
    481 > In CFA, we differentiate between unstructured and structured tuples. An
    482 > unstructured tuple is a lexical grouping of potentially disjoint variables.
    483 >
    484 >    [ int, int, int ] f();
    485 >    void g( int, int, int );
    486 >    x, y, z = f(); // Go unstructured tuple, flatten tuple
    487 >    g( foo() ); // flatten tuple
    488 >
    489 > Here, the tuple returned from f is flattened into disjoint variables.  A
    490 > structured tuple is like above and has contiguous memory.
    491 >
    492 > CFA has fancy unstructured stuff like
    493 >
    494 >    s.[i,k] += 1; // add 1 to each field
    495 >    t.[1,0] = 1; // don't think this works but could
    496 >
    497 > which is just an unstructured tuple access (sugar).
    498 >
    499 > What is your opinion of structures and tuples since the difference is
    500 > small. Why do many languages support both features? Are we missing some
    501 > important aspect of tuples that differentiates them from structures?  Is CFA
    502 > unique in having both unstructured and structured tuples?
    503 \end{comment}
    504 
    505387Finally, a type-safe variadic argument signature was added by Robert Schluntz~\cite[\S~4.1.2]{Schluntz17} using @forall@ and a new tuple parameter-type, denoted by the keyword @ttype@ in Schluntz's implementation, but changed to the ellipsis syntax similar to \CC's template parameter pack.
    506388For C variadics, \eg @va_list@, the number and types of the arguments must be conveyed in some way, \eg @printf@ uses a format string indicating the number and types of the arguments.
    507 \begin{cfa}
    508 int printf( const char * format, ${\color{red}\LARGE ...}$ );  // variadic list of variables to print
    509 \end{cfa}
    510389\VRef[Figure]{f:CVariadicMaxFunction} shows an $N$ argument @maxd@ function using the C untyped @va_list@ interface.
    511390In the example, the first argument is the number of following arguments, and the following arguments are assumed to be @double@;
     
    517396\begin{cfa}
    518397double maxd( int @count@, @...@ ) { // ellipse parameter
    519         double max = 0;
    520         va_list args;
    521         va_start( args, count );
    522         for ( int i = 0; i < count; i += 1 ) {
    523                 double num = va_arg( args, double );
    524                 if ( num > max ) max = num;
    525         }
    526         va_end(args);
    527         return max;
     398    double max = 0;
     399    va_list args;
     400    va_start( args, count );
     401    for ( int i = 0; i < count; i += 1 ) {
     402        double num = va_arg( args, double );
     403        if ( num > max ) max = num;
     404    }
     405    va_end(args);
     406    return max;
    528407}
    529408printf( "%g\n", maxd( @4@, 25.0, 27.3, 26.9, 25.7 ) );
     
    533412\end{figure}
    534413
    535 There are two common patterns for using variadic functions in \CFA.
     414There are two common patterns for using the variadic functions in \CFA.
    536415\begin{enumerate}[leftmargin=*]
    537416\item
     
    551430Structural recursion for processing the argument-pack values one at a time, \eg:
    552431\begin{cfa}
    553 forall( T | { int ?<?( T, T ); } )
    554 T max( T v1, T v2 ) { return v1 < v2 ? v2 : v1; }
     432forall( T | { int ?>?( T, T ); } )
     433T max( T v1, T v2 ) { return v1 > v2 ? v1 : v2; }
    555434$\vspace{-10pt}$
    556435forall( T, TT ... | { T max( T, T ); T max( TT ); } )
    557436T max( T arg, TT args ) { return max( arg, max( args ) ); }
    558437\end{cfa}
    559 The first non-recursive @max@ function is the polymorphic base-case for the recursion, \ie, find the maximum of two identically typed values with a less-than (@<@) operator.
    560 The second recursive @max@ function takes two parameters, @T@ and the @TT@ tuple pack, handling all argument lengths greater than two.
     438The first non-recursive @max@ function is the polymorphic base-case for the recursion, \ie, find the maximum of two identically typed values with a greater-than (@>@) operator.
     439The second recursive @max@ function takes two parameters, a @T@ and a @TT@ tuple pack, handling all argument lengths greater than two.
    561440The recursive function computes the maximum for the first argument and the maximum value of the rest of the tuple pack.
    562441The call of @max@ with one argument is the recursive call, where the tuple pack is converted into two arguments by taking the first value (lisp @car@) from the tuple pack as the first argument (flattening) and the remaining pack becomes the second argument (lisp @cdr@).
     
    573452And because \CFA compiles polymorphic functions versus template expansion, many wrapper functions are generated to implement both user-defined generic-types and polymorphism with variadics.
    574453Fortunately, the only permitted operations on polymorphic function parameters are given by the list of assertion (trait) functions.
    575 Nevertheless, this small set of functions eventually needs to be called with flattened tuple arguments.
     454Nevertheless, this small set of functions eventually need to be called with flattened tuple arguments.
    576455Unfortunately, packing the variadic arguments into a rigid @struct@ type and generating all the required wrapper functions is significant work and largely wasted because most are never called.
    577456Interested readers can refer to pages 77-80 of Robert Schluntz's thesis to see how verbose the translator output is to implement a simple variadic call with 3 arguments.
    578457As the number of arguments increases, \eg a call with 5 arguments, the translator generates a concrete @struct@ types for a 4-tuple and a 3-tuple along with all the polymorphic type data for them.
    579458An alternative approach is to put the variadic arguments into an array, along with an offset array to retrieve each individual argument.
    580 This method is similar to how the C @va_list@ object is used (and how \CFA accesses polymorphic fields in a generic type), but the \CFA variadics generate the required type information to guarantee type safety (like the @printf@ format string).
    581 For example, given the following heterogeneous, variadic, typed @print@ and usage:
     459This method is similar to how the C @va_list@ object is used (and how \CFA accesses polymorphic fields in a generic type), but the \CFA variadics generate the required type information to guarantee type safety.
     460For example, given the following heterogeneous, variadic, typed @print@ and usage.
    582461\begin{cquote}
    583462\begin{tabular}{@{}ll@{}}
     
    608487}
    609488\end{cfa}
    610 where the fixed-arg polymorphism for @T@ can be handled by the standard @void *@-based \CFA polymorphic calling conventions, and the type information can be deduced at the call site.
     489where the fixed-arg polymorphism for @T@ can be handled by the standard @void *@-based \CFA polymorphic calling conventions, and the type information can all be deduced at the call site.
    611490Note, the variadic @print@ supports heterogeneous types because the polymorphic @T@ is not returned (unlike variadic @max@), so there is no cascade of type relationships.
    612491
    613492Turning tuples into first-class values in \CFA does have a few benefits, namely allowing pointers to tuples and arrays of tuples to exist.
    614 However, it seems unlikely that these types have realistic use cases that cannot be achieved with structures.
     493However, it seems unlikely that these types have realistic use cases that cannot be achieved without them.
    615494And having a pointer-to-tuple type potentially forbids the simple offset-array implementation of variadic polymorphism.
    616495For example, in the case where a type assertion requests the pointer type @TT *@ in the above example, it forces the tuple type to be a @struct@, and thus incurring a high cost.
    617496My conclusion is that tuples should not be structured (first-class), rather they should be unstructured.
    618 This agrees with Rodolfo's original description:
     497This agrees with Rodolfo's original describes
    619498\begin{quote}
    620499As such, their [tuples] use does not enforce a particular memory layout, and in particular, does not guarantee that the components of a tuple occupy a contiguous region of memory.~\cite[pp.~74--75]{Esteves04}
     
    630509However, this forces the programer to use a tuple variable and possibly a tuple type to support a constructor, when they actually want separate variables with separate constructors.
    631510And as stated previously, type variables (structured tuples) are rare in general \CFA programming so far.
    632 To address this issue, while retaining the ability to leverage constructors, I proposed the following new tuple-like declaration syntax.
     511To address this issue, while retaining the ability to leverage constructors, the following new tuple-like declaration syntax is proposed.
    633512\begin{cfa}
    634513[ int x, int y ] = gives_two();
     
    642521\end{cfa}
    643522and the implementation performs as much copy elision as possible.
    644 Currently, this new declaration form is parsed by \CFA, showing its syntax is viable, but it is unimplemented because of downstream resolver issues.
    645523
    646524
     
    648526\label{s:inlineSubstructure}
    649527
    650 As mentioned, C allows an anonymous aggregate type (@struct@ or @union@) to be embedded (nested) within another one \see{\VRef[Figure]{f:Nesting}}, \eg a tagged union.
     528As mentioned \see{\VRef[Figure]{f:Nesting}}, C allows an anonymous aggregate type (@struct@ or @union@) to be embedded (nested) within another one, \eg a tagged union.
    651529\begin{cfa}
    652530struct S {
    653531        unsigned int tag;
    654         union { // anonymous nested aggregate
     532        union { $\C{// anonymous nested aggregate}$
    655533                int x;  double y;  char z;
    656534        };
    657535} s;
    658536\end{cfa}
    659 Here, the @union@ combines its field into a common block of storage, and because there is no variable-name overloading in C, all of the union field names must be unique.
    660 Furthermore, because the union is unnamed, these field-names are hoisted into the @struct@, giving direct access, \eg @s.x@;
    661 hence, the union field names must be unique with the structure field names.
    662 The same semantics applies to a nested anonymous @struct@:
     537The @union@ field-names are hoisted into the @struct@, so there is direct access, \eg @s.x@;
     538hence, field names must be unique.
     539For a nested anonymous @struct@, both field names and values are hoisted.
    663540\begin{cquote}
    664541\begin{tabular}{@{}l@{\hspace{35pt}}l@{}}
     
    679556\end{tabular}
    680557\end{cquote}
    681 However, unlike the union which provides storage sharing, there is no semantic difference between the nested anonymous structure and its rewritten counterpart.
    682 Hence, the nested anonymous structure provides no useful capability.
    683 
    684 Nested \emph{named} aggregates are allowed in C but there is no qualification operator, like the \CC type operator `@::@', to access an inner type.
    685 \emph{To compensate for the missing type operator, all named nested aggregates are hoisted to global scope, regardless of the nesting depth, and type usages within the nested type are replaced with global type name.}
    686 Hoisting nested types can result in name collisions among types at the global level, which defeats the purpose of nesting the type.
    687 \VRef[Figure]{f:NestedNamedAggregate} shows the nested type @T@ is hoisted to the global scope and the declaration rewrites within structure @S@.
    688 Hence, the possible accesses are:
    689 \begin{cfa}
    690 struct S s;
    691 s.i = 1;
    692 s.t.i = 2;
    693 s.w = (struct T){ 7, 8 };
    694 struct T x = { 5, 6 }; // use (un)nested type name
    695 s.t = (struct T){ 2, 3 };
    696 \end{cfa}
    697 where @T@ is used without qualification even though it is nested in @S@.
    698 It is for these reasons that nested types are not used in C, and if used, are extremely confusing.
    699 
    700 \begin{figure}
     558
     559As an aside, C nested \emph{named} aggregates behave in a (mysterious) way because the nesting is allowed but there is no ability to use qualification to access an inner type, like the \CC type operator `@::@'.
     560\emph{In fact, all named nested aggregates are hoisted to global scope, regardless of the nesting depth.}
    701561\begin{cquote}
    702562\begin{tabular}{@{}l@{\hspace{35pt}}l@{}}
     
    704564\begin{cfa}
    705565struct S {
    706         @struct T@ {
     566        struct T {
    707567                int i, j;
    708         } t; // warning without declaration
    709         struct T w;
    710         int k;
    711 };
    712 
    713 \end{cfa}
    714 &
    715 \begin{cfa}
    716 @struct T@ {
     568        };
     569        struct U {
     570                int k, l;
     571        };
     572};
     573\end{cfa}
     574&
     575\begin{cfa}
     576struct T {
    717577        int i, j;
    718578};
     579struct U {
     580        int k, l;
     581};
    719582struct S {
    720         @struct T t@;
    721         struct T w;
    722         int k;
    723583};
    724584\end{cfa}
    725585\end{tabular}
    726586\end{cquote}
    727 \caption{Nested Named Aggregate}
    728 \label{f:NestedNamedAggregate}
    729 \end{figure}
    730 
     587Hence, the possible accesses are:
     588\begin{cfa}
     589struct S s; // s cannot access any fields
     590struct T t;  t.i;  t.j;
     591struct U u;  u.k;  u.l;
     592\end{cfa}
     593and the hoisted type names can clash with global type names.
    731594For good reasons, \CC chose to change this semantics:
    732595\begin{cquote}
     
    741604\hfill ISO/IEC 14882:1998 (\CC Programming Language Standard)~\cite[C.1.2.3.3]{ANSI98:C++}
    742605\end{cquote}
     606However, there is no syntax to access from a variable through a type to a field.
     607\begin{cfa}
     608struct S s;  @s::T@.i;  @s::U@.k;
     609\end{cfa}
    743610\CFA chose to adopt the \CC non-compatible change for nested types, since \CC's change has already forced certain coding changes in C libraries that must be parsed by \CC.
    744611\CFA also added the ability to access from a variable through a type to a field.
    745612\begin{cfa}
    746 struct S s;  @s.i@;  @s.T@.i;
    747 \end{cfa}
    748 See the use case for this feature at the end of this section.
     613struct S s;  @s.T@.i;  @s.U@.k;
     614\end{cfa}
    749615
    750616% https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html
    751617
    752 A polymorphic extension to nested aggregates appears in the Plan-9 C dialect, used in the Bell Labs' Plan-9 research operating-system.
     618A polymorphic extension to nested aggregates appears in the Plan-9 C dialect, used in the Bell Labs' Plan-9 research operating system.
    753619The feature is called \newterm{unnamed substructures}~\cite[\S~3.3]{Thompson90new}, which continues to be supported by @gcc@ and @clang@ using the extension (@-fplan9-extensions@).
    754 The goal is to provided the same effect as a nested aggregate with the aggregate type defined elsewhere, which requires it be named.
     620The goal is to provided the same effect of the nested aggregate with the aggregate type defined elsewhere, which requires it be named.
    755621\begin{cfa}
    756622union U {  $\C{// unnested named}$
     
    767633\end{cfa}
    768634Note, the position of the substructure is normally unimportant, unless there is some form of memory or @union@ overlay.
    769 Like an anonymous nested type, a named Plan-9 nested type has its field names hoisted into @struct S@, so there is direct access, \eg @s.x@ and @s.i@.
     635Like an anonymous nested type, a named nested Plan-9 type has its field names hoisted into @struct S@, so there is direct access, \eg @s.x@ and @s.i@.
    770636Hence, the field names must be unique, unlike \CC nested types, but the type names are at a nested scope level, unlike type nesting in C.
    771637In addition, a pointer to a structure is automatically converted to a pointer to an anonymous field for assignments and function calls, providing containment inheritance with implicit subtyping, \ie @U@ $\subset$ @S@ and @W@ $\subset$ @S@, \eg:
     
    823689However, the Plan-9 semantics allow implicit conversions from the outer type to the inner type, which means the \CFA type resolver must take this information into account.
    824690Therefore, the \CFA resolver must implement the Plan-9 features and insert necessary type conversions into the translated code output.
    825 In the current version of \CFA, this is the only kind of implicit type conversion other than the standard C arithmetic conversions.
     691In the current version of \CFA, this is the only kind of implicit type conversion other than the standard C conversions.
    826692
    827693Plan-9 polymorphism can result in duplicate field names.
     
    848714and again the expression @d.x@ is ambiguous.
    849715While \CC has no direct syntax to disambiguate @x@, \ie @d.B.x@ or @d.C.x@, it is possible with casts, @((B)d).x@ or @((C)d).x@.
    850 Like \CC, \CFA compiles the Plan-9 version and provides direct qualification and casts to disambiguate @x@.
     716Like \CC, \CFA compiles the Plan-9 version and provides direct syntax and casts to disambiguate @x@.
    851717While ambiguous definitions are allowed, duplicate field names is poor practice and should be avoided if possible.
    852 However, when a programmer does not control all code, this problem can occur and a naming workaround must exist.
     718However, when a programmer does not control all code, this problem can occur and a naming workaround should exist.
  • doc/theses/fangren_yu_MMath/intro.tex

    r49510db rc699602  
    66\begin{cfa}
    77T sum( T a[$\,$], size_t size ) {
    8         @T@ total = { @0@ };  $\C[1.75in]{// size, 0 for type T}$
     8        @T@ total = { @0@ };  // size, 0 for type T
    99        for ( size_t i = 0; i < size; i += 1 )
    10                 total @+=@ a@[@i@]@; $\C{// + and subscript for T}\CRT$
     10                total @+=@ a@[@i@]@; // + and subscript for T
    1111        return total;
    1212}
     
    2222All computers have multiple types because computer architects optimize the hardware around a few basic types with well defined (mathematical) operations: boolean, integral, floating-point, and occasionally strings.
    2323A programming language and its compiler present ways to declare types that ultimately map into the ones provided by the underlying hardware.
    24 These language types are thrust upon programmers with their syntactic/semantic rules and restrictions.
    25 These rules are then used to transform a language expression to a hardware expression.
     24These language types are thrust upon programmers, with their syntactic and semantic rules and restrictions.
     25These rules are used to transform a language expression to a hardware expression.
    2626Modern programming-languages allow user-defined types and generalize across multiple types using polymorphism.
    2727Type systems can be static, where each variable has a fixed type during execution and an expression's type is determined once at compile time, or dynamic, where each variable can change type during execution and so an expression's type is reconstructed on each evaluation.
     
    3131\section{Overloading}
    3232
     33Overloading allows programmers to use the most meaningful names without fear of name clashes within a program or from external sources, like include files.
    3334\begin{quote}
    3435There are only two hard things in Computer Science: cache invalidation and \emph{naming things}. --- Phil Karlton
    3536\end{quote}
    36 Overloading allows programmers to use the most meaningful names without fear of name clashes within a program or from external sources, like include files.
    3737Experience from \CC and \CFA developers is that the type system implicitly and correctly disambiguates the majority of overloaded names, \ie it is rare to get an incorrect selection or ambiguity, even among hundreds of overloaded (variables and) functions.
    3838In many cases, a programmer has no idea there are name clashes, as they are silently resolved, simplifying the development process.
    39 Depending on the language, any ambiguous cases are resolved explicitly using some form of qualification and/or cast.
     39Depending on the language, any ambiguous cases are resolved using some form of qualification and/or casting.
    4040
    4141
     
    4545Like \CC, \CFA maps operators to named functions and allows these operators to be overloaded with user-defined types.
    4646The syntax for operator names uses the @'?'@ character to denote a parameter, \eg left and right unary operators: @?++@ and @++?@, and binary operators @?+?@ and @?<=?@.
    47 Here, a user-defined type is extended with an addition operation with the same syntax as a builtin type.
     47Here, a user-defined type is extended with an addition operation with the same syntax as builtin types.
    4848\begin{cfa}
    4949struct S { int i, j };
     
    5555The type system examines each call site and selects the best matching overloaded function based on the number and types of arguments.
    5656If there are mixed-mode operands, @2 + 3.5@, the type system attempts (safe) conversions, like in C/\CC, converting the argument type(s) to the parameter type(s).
    57 Conversions are necessary because the hardware rarely supports mix-mode operations, so both operands must be converted to a common type.
    58 Like overloading, the majority of mixed-mode conversions are silently resolved, simplifying the development process.
    59 Without implicit conversions, programmers must write an exponential number of functions covering all possible exact-match cases among all possible types.
     57Conversions are necessary because the hardware rarely supports mix-mode operations, so both operands must be the same type.
     58Note, without implicit conversions, programmers must write an exponential number of functions covering all possible exact-match cases among all possible types.
    6059This approach does not match with programmer intuition and expectation, regardless of any \emph{safety} issues resulting from converted values.
    61 Depending on the language, mix-mode conversions can be explicitly controlled using some form of cast.
    6260
    6361
     
    8381double d = f( 3 );              $\C{// select (2)}\CRT$
    8482\end{cfa}
    85 Alternatively, if the type system uses the return type, there is an exact match for each call, which again matches with programmer intuition and expectation.
    86 This capability can be taken to the extreme, where the only differentiating factor is the return type.
     83Alternatively, if the type system looks at the return type, there is an exact match for each call, which again matches with programmer intuition and expectation.
     84This capability can be taken to the extreme, where there are no function parameters.
    8785\begin{cfa}
    8886int random( void );             $\C[2in]{// (1); overloaded on return type}$
     
    9290\end{cfa}
    9391Again, there is an exact match for each call.
    94 As for operator overloading, if there is no exact match, a set of minimal, an implicit conversion can be added to find a best match.
    95 \begin{cfa}
    96 short int = random();   $\C[2in]{// select (1), unsafe}$
    97 long double = random(); $\C{// select (2), safe}\CRT$
    98 \end{cfa}
     92If there is no exact match, a set of minimal, safe conversions can be added to find a best match, as for operator overloading.
    9993
    10094
     
    10296
    10397Unlike most programming languages, \CFA has variable overloading within a scope, along with shadow overloading in nested scopes.
    104 Shadow overloading is also possible for functions, in languages supporting nested-function declarations, \eg \CC named, nested, lambda functions.
     98(Shadow overloading is also possible for functions, if a language supports nested function declarations, \eg \CC named, nested, lambda functions.)
    10599\begin{cfa}
    106100void foo( double d );
     
    115109\end{cfa}
    116110It is interesting that shadow overloading is considered a normal programming-language feature with only slight software-engineering problems.
    117 However, variable overloading within a scope is considered extremely dangerous, without any evidence to corroborate this claim.
     111However, variable overloading within a scope is often considered extremely dangerous, without any evidence to corroborate this claim.
    118112In contrast, function overloading in \CC occurs silently within the global scope from @#include@ files all the time without problems.
    119113
    120 In \CFA, the type system simply treats an overloaded variable as an overloaded function returning a value with no parameters.
    121 Hence, no effort is required to support this feature as it is available for differentiating among overloaded functions with no parameters.
     114In \CFA, the type system simply treats overloaded variables as an overloaded function returning a value with no parameters.
     115Hence, no significant effort is required to support this feature by leveraging the return type to disambiguate as variables have no parameters.
    122116\begin{cfa}
    123117int MAX = 2147483647;   $\C[2in]{// (1); overloaded on return type}$
     
    131125The result is a significant reduction in names to access typed constants.
    132126
    133 As an aside, C has a separate namespace for types and variables allowing overloading between the namespaces, using @struct@ (qualification) to disambiguate.
     127As an aside, C has a separate namespace for type and variables allowing overloading between the namespaces, using @struct@ (qualification) to disambiguate.
    134128\begin{cfa}
    135129void S() {
     
    139133}
    140134\end{cfa}
    141 Here the name @S@ is an aggregate type and field, and a variable and parameter of type @S@.
    142135
    143136
     
    152145for ( ; x; --x )   =>    for ( ; x @!= 0@; x @-= 1@ )
    153146\end{cfa}
    154 To generalize this feature, both constants are given types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly work within the special @0@ and @1@ contexts.
     147To generalize this feature, both constants are given types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly work with the special @0@ and @1@ contexts.
    155148The types @zero_t@ and @one_t@ have special builtin implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work.
    156149\begin{cfa}
     
    183176\end{cfa}
    184177For type-only, the programmer specifies the initial type, which remains fixed for the variable's lifetime in statically typed languages.
    185 For type-and-initialization, the specified and initialization types may not agree requiring an implicit/explicit conversion.
     178For type-and-initialization, the specified and initialization types may not agree.
    186179For initialization-only, the compiler may select the type by melding programmer and context information.
    187180When the compiler participates in type selection, it is called \newterm{type inferencing}.
    188 Note, type inferencing is different from type conversion: type inferencing \emph{discovers} a variable's type before setting its value, whereas conversion has two typed variables and performs a (possibly lossy) value conversion from one type to the other.
     181Note, type inferencing is different from type conversion: type inferencing \emph{discovers} a variable's type before setting its value, whereas conversion has two typed values and performs a (possibly lossy) action to convert one value to the type of the other variable.
    189182Finally, for assignment, the current variable and expression types may not agree.
    190183Discovering a variable or function type is complex and has limitations.
    191 The following covers these issues, and why this scheme is not amenable with the \CFA type system.
     184The following covers these issues, and why some schemes are not amenable with the \CFA type system.
    192185
    193186One of the first and powerful type-inferencing system is Hindley--Milner~\cite{Damas82}.
     
    210203\end{cfa}
    211204In both overloads of @f@, the type system works from the constant initializations inwards and/or outwards to determine the types of all variables and functions.
    212 Like template meta-programming, there can be a new function generated for the second @f@ depending on the types of the arguments, assuming these types are meaningful in the body of @f@.
     205Note, like template meta programming, there could be a new function generated for the second @f@ depending on the types of the arguments, assuming these types are meaningful in the body of @f@.
    213206Inferring type constraints, by analysing the body of @f@ is possible, and these constraints must be satisfied at each call site by the argument types;
    214207in this case, parametric polymorphism can allow separate compilation.
     
    253246This issue is exaggerated with \CC templates, where type names are 100s of characters long, resulting in unreadable error messages.
    254247\item
    255 Ensuring the type of secondary variables, match a primary variable.
     248Ensuring the type of secondary variables, match a primary variable(s).
    256249\begin{cfa}
    257250int x; $\C{// primary variable}$
     
    259252\end{cfa}
    260253If the type of @x@ changes, the type of the secondary variables correspondingly updates.
    261 There can be strong software-engineering reasons for binding the types of these variables.
    262254\end{itemize}
    263255Note, the use of @typeof@ is more restrictive, and possibly safer, than general type-inferencing.
     
    277269
    278270A restriction is the conundrum in type inferencing of when to \emph{brand} a type.
    279 That is, when is the type of the variable/function more important than the type of its initialization expression(s).
    280 For example, if a change is made in an initialization expression, it can cascade type changes producing many other changes and/or errors.
    281 At some point, a variable's type needs to remain constant and the initializing expression needs to be modified or be in error when it changes.
     271That is, when is the type of the variable/function more important than the type of its initialization expression.
     272For example, if a change is made in an initialization expression, it can cause cascading type changes and/or errors.
     273At some point, a variable's type needs to remain constant and the initializing expression needs to be modified or in error when it changes.
    282274Often type-inferencing systems allow restricting (\newterm{branding}) a variable or function type, so the complier can report a mismatch with the constant initialization.
    283275\begin{cfa}
     
    291283In Haskell, it is common for programmers to brand (type) function parameters.
    292284
    293 A confusion is blocks of code where all declarations are @auto@, as is now common in \CC.
     285A confusion is large blocks of code where all declarations are @auto@, as is now common in \CC.
    294286As a result, understanding and changing the code becomes almost impossible.
    295287Types provide important clues as to the behaviour of the code, and correspondingly to correctly change or add new code.
     
    307299In this situation, having the type name or its short alias is essential.
    308300
    309 \CFA's type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression.
     301The \CFA's type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression.
    310302Type inferencing defeats this goal because there is no left-hand type.
    311303Fundamentally, type inferencing tries to magic away variable types from the programmer.
     
    316308The entire area of Computer-Science data-structures is obsessed with time and space, and that obsession should continue into regular programming.
    317309Understanding space and time issues is an essential part of the programming craft.
    318 Given @typedef@ and @typeof@ in \CFA, and the strong desire to use the left-hand type in resolution, the decision was made not to support implicit type-inferencing in the type system.
    319 Should a significant need arise, this decision can be revisited.
     310Given @typedef@ and @typeof@ in \CFA, and the strong desire to use the left-hand type in resolution, implicit type-inferencing is unsupported.
     311Should a significant need arise, this feature can be revisited.
    320312
    321313
     
    342334
    343335To constrain polymorphic types, \CFA uses \newterm{type assertions}~\cite[pp.~37-44]{Alphard} to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable.
    344 Here, the function @twice@ works for any type @T@ with a matching addition operator.
     336For example, the function @twice@ works for any type @T@ with a matching addition operator.
    345337\begin{cfa}
    346338forall( T @| { T ?+?(T, T); }@ ) T twice( T x ) { return x @+@ x; }
    347339int val = twice( twice( 3 ) );  $\C{// val == 12}$
    348340\end{cfa}
    349 Parametric polymorphism and assertions occur in existing type-unsafe (@void *@) C functions, like @qsort@ for sorting an array of unknown values.
     341For example. parametric polymorphism and assertions occurs in existing type-unsafe (@void *@) C @qsort@ to sort an array.
    350342\begin{cfa}
    351343void qsort( void * base, size_t nmemb, size_t size, int (*cmp)( const void *, const void * ) );
     
    394386}
    395387// select type and size from left-hand side
    396 int * ip = malloc();  double * dp = malloc();  [[aligned(64)]] struct S {...} * sp = malloc();
     388int * ip = malloc();  double * dp = malloc();  $@$[aligned(64)] struct S {...} * sp = malloc();
    397389\end{cfa}
    398390The @sized@ assertion passes size and alignment as a data object has no implicit assertions.
    399391Both assertions are used in @malloc@ via @sizeof@ and @_Alignof@.
    400 In practise, this polymorphic @malloc@ is unwrapped by the C compiler and the @if@ statement is elided producing a type-safe call to @malloc@ or @memalign@.
    401 
    402 This mechanism is used to construct type-safe wrapper-libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions.
    403 Here, existing C legacy code is leveraged as much as possible;
     392
     393These mechanism are used to construct type-safe wrapper-libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions.
     394Hence, existing C legacy code is leveraged as much as possible;
    404395other programming languages must build supporting libraries from scratch, even in \CC.
    405396
     
    431422\end{tabular}
    432423\end{cquote}
    433 Traits are implemented by flatten them at use points, as if written in full by the programmer.
    434 Flattening often results in overlapping assertions, \eg operator @+@.
     424Traits are simply flatten at the use point, as if written in full by the programmer, where traits often contain overlapping assertions, \eg operator @+@.
    435425Hence, trait names play no part in type equivalence.
    436 In the example, type @T@ is an object type, and hence, has the implicit internal trait @otype@.
     426Note, the type @T@ is an object type, and hence, has the implicit internal trait @otype@.
    437427\begin{cfa}
    438428trait otype( T & | sized(T) ) {
     
    443433};
    444434\end{cfa}
    445 These implicit routines are used by the @sumable@ operator @?+=?@ for the right side of @?+=?@ and return.
     435The implicit routines are used by the @sumable@ operator @?+=?@ for the right side of @?+=?@ and return.
    446436
    447437If the array type is not a builtin type, an extra type parameter and assertions are required, like subscripting.
     
    455445\begin{enumerate}[leftmargin=*]
    456446\item
    457 Write bespoke data structures for each context.
     447Write bespoke data structures for each context they are needed.
    458448While this approach is flexible and supports integration with the C type checker and tooling, it is tedious and error prone, especially for more complex data structures.
    459449\item
     
    462452\item
    463453Use preprocessor macros, similar to \CC @templates@, to generate code that is both generic and type checked, but errors may be difficult to interpret.
    464 Furthermore, writing and using complex preprocessor macros is difficult and inflexible.
     454Furthermore, writing and using preprocessor macros is difficult and inflexible.
    465455\end{enumerate}
    466456
    467457\CC, Java, and other languages use \newterm{generic types} to produce type-safe abstract data-types.
    468458\CFA generic types integrate efficiently and naturally with the existing polymorphic functions, while retaining backward compatibility with C and providing separate compilation.
    469 For concrete parameters, the generic-type definition can be inlined, like \CC templates, if its definition appears in a header file (\eg @static inline@).
     459However, for known concrete parameters, the generic-type definition can be inlined, like \CC templates.
    470460
    471461A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration and instantiated using a parenthesized list of types after the type name.
     
    494484\end{cquote}
    495485\CFA generic types are \newterm{fixed} or \newterm{dynamic} sized.
    496 Fixed-size types have a fixed memory layout regardless of type parameters, whereas dynamic types vary in memory layout depending on the type parameters.
     486Fixed-size types have a fixed memory layout regardless of type parameters, whereas dynamic types vary in memory layout depending on their type parameters.
    497487For example, the type variable @T *@ is fixed size and is represented by @void *@ in code generation;
    498488whereas, the type variable @T@ is dynamic and set at the point of instantiation.
    499489The difference between fixed and dynamic is the complexity and cost of field access.
    500490For fixed, field offsets are computed (known) at compile time and embedded as displacements in instructions.
    501 For dynamic, field offsets are compile-time computed at the call site, stored in an array of offset values, passed as a polymorphic parameter, and added to the structure address for each field dereference within a polymorphic routine.
     491For dynamic, field offsets are computed at compile time at the call site, stored in an array of offset values, passed as a polymorphic parameter, and added to the structure address for each field dereference within a polymorphic routine.
    502492See~\cite[\S~3.2]{Moss19} for complete implementation details.
    503493
     
    527517\section{Contributions}
    528518
    529 The \CFA compiler performance and type capability have been greatly improved through my development work.
    530519\begin{enumerate}
    531 \item
    532 The compilation time of various \CFA library units and test programs has been reduced by an order of magnitude, from minutes to seconds \see{\VRef[Table]{t:SelectedFileByCompilerBuild}}, which made it possible to develop and test more complicated \CFA programs that utilize sophisticated type system features.
    533 The details of compiler optimization work are covered in a previous technical report~\cite{Yu20}, which essentially forms part of this thesis.
    534 \item
    535 The thesis presents a systematic review of the new features added to the \CFA language and its type system.
    536 Some of the more recent inclusions to \CFA, such as tuples and generic structure types, were not well tested during development due to the limitation of compiler performance.
    537 Several issues coming from the interactions of various language features are identified and discussed in this thesis;
    538 some of them I have resolved, while others are given temporary fixes and need to be reworked in the future.
    539 \item
    540 Finally, this thesis provides constructive ideas for fixing a number of high-level issues in the \CFA language design and implementation, and gives a path for future improvements to the language and compiler.
     520\item The \CFA compiler performance and capability have been greatly improved through recent development. The compilation time of various \CFA library units and test programs have been reduced from the order of minutes down to 10-20 seconds, which made it possible to develop and test more complicated \CFA programs that utilize sophisticated type system features. The details of compiler optimization work are covered in a previous technical report.
     521\item The thesis presents a systematic review of the new features that have been added to the \CFA language and its type system. Some of the more recent inclusions to \CFA such as tuples and generic structure types were not well tested when they were being developed, due to the limitation of compiler performance. Several issues coming from the interactions of various language features are identified and discussed in this thesis; some of them are now fully resolved, while others are given temporary fixes and need to be reworked in the future.
     522\item Finally, this thesis provides constructive ideas of fixing the outstanding issues in \CFA language design and implementation, and gives a path for future improvements to \CFA language and compiler.
    541523\end{enumerate}
    542524
Note: See TracChangeset for help on using the changeset viewer.