Context Navigation

← Previous Change
Next Change →

Changeset ce02877 for doc

Timestamp:

Jul 29, 2024, 1:32:51 PM (7 months ago)

Author:

JiadaL <j82liang@…>

Branches:

Children:

Parents:

38e20a80 (diff), 1661ad7 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.

Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

Location:

Files:

: 7 edited

theses/mike_brooks_MMath/array.tex (modified) (25 diffs)
theses/mike_brooks_MMath/programs/hello-accordion.cfa (modified) (5 diffs)
theses/mike_brooks_MMath/programs/hello-array.cfa (modified) (2 diffs)
theses/mike_brooks_MMath/programs/hello-md.cfa (modified) (1 diff)
theses/mike_brooks_MMath/uw-ethesis-frontpgs.tex (modified) (1 diff)
theses/mike_brooks_MMath/uw-ethesis.tex (modified) (1 diff)
user/user.tex (modified) (7 diffs)

Legend:

: Unmodified
: Added
: Removed

doc/theses/mike_brooks_MMath/array.tex

-                      r38e20a80
+                      rce02877
 Arrays in C are possible the single most misunderstood and incorrectly used features in the language, resulting in the largest proportion of runtime errors and security violations.
+This chapter describes the new \CFA language and library features that introduce a length-checked array-type to the \CFA standard library~\cite{Cforall}, \eg:
+\begin{cfa}
+@array( float, 99 )@ x;                                 $\C{// x contains 99 floats}$
+This chapter describes the new \CFA language and library features that introduce a length-checked array-type to the \CFA standard library~\cite{Cforall}.
+Specifically, a new \CFA array is declared:
+\begin{cfa}
+@array( float, 99 )@ x;                                 $\C[2.75in]{// x contains 99 floats}$
+\end{cfa}
+using generic type @array@ with arguments @float@ and @99@.
+A function @f@ is declared with an @array@ parameter of length @42@.
+\begin{cfa}
 void f( @array( float, 42 )@ & p ) {}   $\C{// p accepts 42 floats}$
 f( x );                                                                 $\C{// statically rejected: types are different, 99 != 42}$
+forall( T, [N] )
+void g( @array( T, N )@ & p, int i ) {
+test2.cfa:3:1 error: Invalid application of existing declaration(s) in expression.
+Applying untyped:  Name: f ... to:  Name: x
+\end{cfa}
+The call @f( x )@ is invalid because the @array@ lengths @99@ and @42@ do not match.
+Next, function @g@ introduces a @forall@ prefix on type parameter @T@ and arbitrary \emph{dimension parameter} @N@, the new feature that represents a count of elements managed by the type system.
+\begin{cfa}
+forall( T, @[N]@ )
+void g( array( T, @N@ ) & p, int i ) {
         T elem = p[i];                                          $\C{// dynamically checked: requires 0 <= i < N}$
+}
 g( x, 0 );                                                              $\C{// T is float, N is 99, dynamic subscript check succeeds}$
+g( x, 1000 );                                                   $\C{// T is float, N is 99, dynamic subscript check fails}$
+\end{cfa}
+This example declares variable @x@, with generic type @array@ using arguments @float@ and @99@.
+Function @f@ is declared with an @array@ parameter of length @42@.
+The call @f( x )@ is invalid because the @array@ lengths @99@ and @42@ do not match.
+Next, function @g@ introduces a @forall@ prefix on type parameter @T@ and arbitrary \emph{dimension parameter} @N@, the new feature that represents a count of elements managed by the type system.
+g( x, 1000 );                                                   $\C{// T is float, N is 99, dynamic subscript check fails}\CRT$
+Cforall Runtime error: subscript 1000 exceeds dimension range [0,99) $for$ array 0x555555558020.
+\end{cfa}
 The call @g( x, 0 )@ is valid because @g@ accepts any length of array, where the type system infers @float@ for @T@ and length @99@ for @N@.
 Inferring values for @T@ and @N@ is implicit without programmer involvement.
 …
 forall( [N] )
 void declDemo() {
         float x1[N];                            $\C{// built-in type ("C array")}$
         array(float, N) x2;                     $\C{// type from library}$
+        float x1[N];                                            $\C{// built-in type ("C array")}$
+        array(float, N) x2;                                     $\C{// type from library}$
+}
 \end{cfa}
 Both of the locally-declared array variables, @x1@ and @x2@, have 42 elements, each element being a @float@.
 The two variables have identical size and layout; they both encapsulate 42-float, stack \vs heap allocations with no additional ``bookkeeping'' allocations or headers.
 Providing this explicit generic approach required a significant extension to the \CFA type system to support a full-feature, safe, efficient (space and time) array-type, which forms the foundation for more complex array forms in \CFA.
+Providing this explicit generic approach requires a significant extension to the \CFA type system to support a full-feature, safe, efficient (space and time) array-type, which forms the foundation for more complex array forms in \CFA.
 Admittedly, the @array@ library type (type for @x2@) is syntactically different from its C counterpart.
 A future goal (TODO xref) is to provide a built-in array type with syntax approaching C's (type for @x1@);
 then, the library @array@ type can be removed giving \CFA a largely uniform array type.
 At present, the built-in array is only partially supported, so the generic @array@ is used exclusively in the discussion;
+At present, the C syntax @array@ is only partially supported, so the generic @array@ is used exclusively in the discussion;
 feature support and C compatibility are revisited in Section ? TODO.
 Offering an @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (gcc) programs can be compiled by \CFA with only a small number of changes, similar to \CC (g++).
+Offering the @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (@gcc@) programs can be compiled by \CFA with only a small number of changes, similar to \CC (@g++@).
 However, a few compatibility-breaking changes to the behaviour of the C array are necessary, both as an implementation convenience and to fix C's lax treatment of arrays.
 Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity introduced by the C array TODO xref.
 My contributions are:
+My contributions in this chapter are:
 \begin{enumerate}
 \item A type system enhancement that lets polymorphic functions and generic types be parameterized by a numeric value: @forall( [N] )@.
 …
 \end{figure}
 \VRef[Figure]{f:fHarness} shows a harness that uses the @f@ function illustrating how dynamic values are fed into the @array@ type.
 Here, the dimension of the @x@, @y@, and @result@ arrays is specified from a command-line value and these arrays are allocated on the stack.
+\VRef[Figure]{f:fHarness} shows a harness that uses function @f@ to illustrate how dynamic values are fed into the @array@ type.
+Here, the dimension of arrays @x@, @y@, and @result@ is specified from a command-line value, @dim@, and these arrays are allocated on the stack.
 Then the @x@ array is initialized with decreasing values, and the @y@ array with amounts offset by constant @0.005@, giving relative differences within tolerance initially and diverging for later values.
 The program main is run (see figure bottom) with inputs @5@ and @7@ for sequence lengths.
 The loops follow the familiar pattern of using the variable @n@ to iterate through the arrays.
 Most importantly, the type system implicitly captures @n@ at the call of @f@ and makes it available throughout @f@ as @N@.
 The example shows @n@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@, @N@ adapting in the same way at @f@'s loop bound, and a pass-thru use of @n@ at @f@'s declaration of @ret@.
+The loops follow the familiar pattern of using the variable @dim@ to iterate through the arrays.
+Most importantly, the type system implicitly captures @dim@ at the call of @f@ and makes it available throughout @f@ as @N@.
+The example shows @dim@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@, @N@ adapting in the same way at @f@'s loop bound, and a pass-thru use of @dim@ at @f@'s declaration of @ret@.
 Except for the lifetime-management issue of @result@, \ie explicit @free@, this program has eliminated both the syntactic and semantic problems associated with C arrays and their usage.
 These benefits cannot be underestimated.
 …
 \CC has a (mistaken) belief that references are not objects, but pointers are objects.
 In the \CC example, the arrays fall back on C arrays, which have a duality with references with respect to automatic dereferencing.
 The \CFA array is a contiguous object with an address, which can stored as a reference or pointer.
+The \CFA array is a contiguous object with an address, which can be stored as a reference or pointer.
 \item
 C/\CC arrays cannot be copied, while \CFA arrays can be copied, making them a first-class object (although array copy is often avoided for efficiency).
 …
 @template< typename T, size_t N >@
 void copy( T ret[N], T x[N] ) {
+void copy( T ret[@N@], T x[@N@] ) {
         for ( int i = 0; i < N; i += 1 ) ret[i] = x[i];
+}
 …
 int main() {
         @forall( T, [N] )@   // nested function
         void copy( array( T, N ) & ret, array( T, N ) & x ) {
+        void copy( array( T, @N@ ) & ret, array( T, @N@ ) & x ) {
                 for ( i; 10 ) ret[i] = x[i];
+        }
 …
 Continuing the discussion of \VRef[Figure]{f:fHarness}, the example has @f@ expecting two arrays of the same length.
 A compile-time error occurs when attempting to call @f@ with arrays of differing lengths.
+As stated previous, a compile-time error occurs when attempting to call @f@ with arrays of differing lengths.
 % removing leading whitespace
 \lstinput[tabsize=1]{52-53}{hello-array.cfa}
 \lstinput[tabsize=1,aboveskip=0pt]{62-64}{hello-array.cfa}
 As is common practice in C, the programmer is free to cast, \ie to assert knowledge not shared with the type system.
+C allows casting to assert knowledge not shared with the type system.
 \lstinput{70-74}{hello-array.cfa}
 …
 \lstinput{10-15}{hello-accordion.cfa}
 This structure's layout has the starting offset of @municipalities@ varying in @NprovTerty@, and the offset of @total_pt@ and @total_mun@ varying in both generic parameters.
 For a function that operates on a @CanadaPop@ structure, the type system handles this variation transparently.
+For a function that operates on a @CanPop@ structure, the type system handles this variation transparently.
 \lstinput{40-45}{hello-accordion.cfa}
+\VRef[Figure]{f:checkHarness} shows program results where different offset values being used.
+The output values show that @summarize@ and its caller agree on both the offsets (where the callee starts reading @cost_contribs@ and where the callee writes @total_cost@).
+Yet the call site just says, ``pass the request.''
+\VRef[Figure]{f:checkHarness} shows the @CanPop@ harness and results with different array sizes, if the municipalities changed after a census.
 \begin{figure}
 \lstinput{60-68}{hello-accordion.cfa}
 \lstinput{70-72}{hello-accordion.cfa}
+\lstinput{70-75}{hello-accordion.cfa}
 \caption{\lstinline{check} Harness}
 \label{f:checkHarness}
 …
 In general, storage layout is hidden by subscripting, and only appears when passing arrays among different programming languages or accessing specific hardware.
 \VRef[Figure]{f:FixedVariable} shows two C90 approaches for manipulating contiguous arrays.
+\VRef[Figure]{f:FixedVariable} shows two C90 approaches for manipulating a contiguous matrix.
 Note, C90 does not support VLAs.
 The fixed-dimension approach uses the type system;
+The fixed-dimension approach (left) uses the type system;
 however, it requires all dimensions except the first to be specified at compile time, \eg @m[][6]@, allowing all subscripting stride calculations to be generated with constants.
 Hence, every matrix passed to @fp1@ must have exactly 6 columns but the row size can vary.
 The variable-dimension approach ignores (violates) the type system, \ie argument and parameters types do not match, and manually performs pointer arithmetic for subscripting in the macro @sub@.
+The variable-dimension approach (right) ignores (violates) the type system, \ie argument and parameters types do not match, and subscripting is performed manually using pointer arithmetic in the macro @sub@.
 \begin{figure}
 …
         ...  printf( "%d ", @sub( m, r, c )@ );  ...
+}
 int vm1[4][4], vm2[6][8]; // no VLA
+int vm1[@4@][@4@], vm2[@6@][@8@]; // no VLA
 // initialize matrixes
 fp2( 4, 4, vm1 );
 …
 The language decides if the matrix and array-of-array are laid out the same or differently.
 For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (triangular matrix).
 Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though the two array forms could be differentiated at the subscript level, \eg @m[1,2]@ \vs @aa[1][2]@.
+Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though a language could differentiated between the two forms using subscript syntax, \eg @m[1,2]@ \vs @aa[1][2]@.
 Nevertheless, controlling memory layout can make a difference in what operations are allowed and in performance (caching/NUMA effects).
 …
 The focus of this work is on the contiguous multidimensional arrays in C.
 The reason is that programmers are often forced to use the more complex array-of-array form when a contiguous array would be simpler, faster, and safer.
 Nevertheless, the C array-of-array form continues to be useful for special circumstances.
+Nevertheless, the C array-of-array form is still important for special circumstances.
 \VRef[Figure]{f:ContiguousNon-contiguous} shows the extensions made in C99 for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays.}
 …
 While this contiguous-array capability is a step forward, it is still the programmer's responsibility to manually manage the number of dimensions and their sizes, both at the function definition and call sites.
 That is, the array does not automatically carry its structure and sizes for use in computing subscripts.
 While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row correctly.
+While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row correctly with a correctly bounded loop index.
 Specifically, there is no requirement that the rows are the same length, like a poem with different length lines.
 …
 this model has no awareness of dimensions just the ability to access memory at a distance from a reference point (base-displacement addressing), \eg @x + 23@ or @x[23}@ $\Rightarrow$ 23rd element from the start of @x@.
 A programmer must manually build any notion of dimensions using other tools;
 hence, this style is not offering multidimensional arrays \see{\VRef[Figure]{f:FixedVariable}}.
+hence, this style is not offering multidimensional arrays \see{\VRef[Figure]{f:FixedVariable} right example}.
 \end{enumerate}
 …
 A C/\CFA array interface includes the resulting memory layout.
 The defining requirement of a type-2 system is the ability to slice a column from a column-finest matrix.
 The required memory shape of such a slice is set, before any discussion of implementation.
+The required memory shape of such a slice is fixed, before any discussion of implementation.
 The implementation presented here is how the \CFA array library wrangles the C type system, to make it do memory steps that are consistent with this layout.
 TODO: do I have/need a presentation of just this layout, just the semantics of -[all]?
 …
 Beyond what C's array type offers, the new array brings direct support for working with a noncontiguous array slice, allowing a program to work with dimension subscripts given in a non-physical order.
 The following examples use an @array( float, 5, 7) m@, loaded with values incremented by $0.1$, when stepping across the length-7 finely-strided column dimension, and stepping across the length-5 coarsely-strided row dimension.
 \par\noindent
+The following examples use the matrix declaration @array( float, 5, 7 ) m@, loaded with values incremented by $0.1$, when stepping across the length-7 finely-strided column dimension, and stepping across the length-5 coarsely-strided row dimension.
+\par
 \mbox{\lstinput{121-126}{hello-md.cfa}}
 \par\noindent
 …
 Specifically, declaring the parameter @r@ with type @array@ means that @r@ is contiguous, which is unnecessarily restrictive.
 That is, @r@ need only be of a container type that offers a subscript operator (of type @ptrdiff_t@ $\rightarrow$ @float@) with managed length @N@.
 The new-array library provides the trait @ix@, so-defined.
+The new-array library provides the trait @ar@, so-defined.
 With it, the original declaration can be generalized with the same body.
 \lstinput{43-44}{hello-md.cfa}
 \lstinput[aboveskip=0pt]{145-145}{hello-md.cfa}
 The nontrivial slicing in this example now allows passing a \emph{noncontiguous} slice to @print1d@, where the new-array library provides a ``subscript by all'' operation for this purpose.
 In a multi-dimensional subscript operation, any dimension given as @all@ is a placeholder, \ie ``not yet subscripted by a value'', waiting for such a value, implementing the @ix@ trait.
+In a multi-dimensional subscript operation, any dimension given as @all@ is a placeholder, \ie ``not yet subscripted by a value'', waiting for such a value, implementing the @ar@ trait.
 \lstinput{150-151}{hello-md.cfa}
 …
 In both cases, value 2 selects from the coarser dimension (rows of @x@),
 while the value 3 selects from the finer dimension (columns of @x@).
 The figure illustrates the value of each subexpression, comparing how numeric subscripting proceeds from @x@, vs from @x[all]@.
+The figure illustrates the value of each subexpression, comparing how numeric subscripting proceeds from @x@, \vs from @x[all]@.
 Proceeding from @x@ gives the numeric indices as coarse then fine, while proceeding from @x[all]@ gives them fine then coarse.
 These two starting expressions, which are the example's only multidimensional subexpressions
 (those that received zero numeric indices so far), are illustrated with vertical steps where a \emph{first} numeric index would select.
 The figure's presentation offers an intuition answering, What is an atomic element of @x[all]@?
 From there, @x[all]@ itself is simply a two-dimensional array, in the strict C sense, of these strange building blocks.
+The figure's presentation offers an intuition answering to: What is an atomic element of @x[all]@?
+From there, @x[all]@ itself is simply a two-dimensional array, in the strict C sense, of these building blocks.
 An atom (like the bottommost value, @x[all][3][2]@), is the contained value (in the square box)
 and a lie about its size (the wedge above it, growing upward).
+An array of these atoms (like the intermediate @x[all][3]@) is just a contiguous arrangement of them,
+done according to their size, as announced.  Call such an array a column.
+A column is almost ready to be arranged into a matrix; it is the \emph{contained value} of the next-level building block,
+but another lie about size is required.
+At first, an atom needed to be arranged as if it were bigger,
+but now a column needs to be arranged as if it is smaller (the wedge above it, shrinking upward).
+An array of these atoms (like the intermediate @x[all][3]@) is just a contiguous arrangement of them, done according to their size;
+call such an array a column.
+A column is almost ready to be arranged into a matrix;
+it is the \emph{contained value} of the next-level building block, but another lie about size is required.
+At first, an atom needs to be arranged as if it were bigger, but now a column needs to be arranged as if it is smaller (the wedge above it, shrinking upward).
 These lying columns, arranged contiguously according to their size (as announced) form the matrix @x[all]@.
+Because @x[all]@ takes indices, first for the fine stride, then for the coarse stride,
+it achieves the requirement of representing the transpose of @x@.
+Yet every time the programmer presents an index, a mere C-array subscript is achieving the offset calculation.
+Because @x[all]@ takes indices, first for the fine stride, then for the coarse stride, it achieves the requirement of representing the transpose of @x@.
+Yet every time the programmer presents an index, a C-array subscript is achieving the offset calculation.
 In the @x[all]@ case, after the finely strided subscript is done (column 3 is selected),
 …
 compared with where analogous rows appear when the row-level option is presented for @x@.
 These size lies create an appearance of overlap.
 For example, in @x[all]@, the shaded band touches atoms 2.0, 2.1, 2.2, 2.3, 1.4, 1.5 and 1.6.
+\PAB{I don't understand this paragraph: These size lies create an appearance of overlap.
+For example, in \lstinline{x[all]}, the shaded band touches atoms 2.0, 2.1, 2.2, 2.3, 1.4, 1.5 and 1.6.
 But only the atom 2.3 is storing its value there.
 The rest are lying about (conflicting) claims on this location, but never exercising these alleged claims.
+The rest are lying about (conflicting) claims on this location, but never exercising these alleged claims.}
 Lying is implemented as casting.
 …
 This structure uses one type in its internal field declaration and offers a different type as the return of its subscript operator.
 The field within is a plain-C array of the fictional type, which is 7 floats long for @x[all][3][2]@ and 1 float long for @x[all][3]@.
 The subscript operator presents what's really inside, by casting to the type below the wedge of lie.
+The subscript operator presents what is really inside, by casting to the type below the wedge of the lie.
 %  Does x[all] have to lie too?  The picture currently glosses over how it it advertises a size of 7 floats.  I'm leaving that as an edge case benignly misrepresented in the picture.  Edge cases only have to be handled right in the code.
+Casting, overlapping and lying are unsafe.
+The mission here is to implement a style-2 feature that the type system helps the programmer use safely.
+The offered style-2 system is allowed to be internally unsafe,
+just as C's implementation of a style-3 system (upon a style-4 system) is unsafe within,
+even when the programmer is using it without casts or pointer arithmetic.
+Having a style-2 system relieves the programmer from resorting to unsafe pointer arithmetic when working with noncontiguous slices.
+The choice to implement this style-2 system upon C's style-3 arrays, rather than its style-4 pointer arithmetic,
+reduces the attack surface of unsafe code.
+My casting is unsafe, but I do not do any pointer arithmetic.
+When a programmer works in the common-case style-3 subset (in the no-@[all]@ top of Figure~\ref{fig:subscr-all}),
+my casts are identities, and the C compiler is doing its usual displacement calculations.
+If I had implemented my system upon style-4 pointer arithmetic,
+then this common case would be circumventing C's battle-hardened displacement calculations in favour of my own.
+\noindent END: Paste looking for a home
+Casting, overlapping, and lying are unsafe.
+The mission is to implement a style-1 feature in the type system for safe use by a programmer.
+The offered style-1 system is allowed to be internally unsafe,
+just as C's implementation of a style-2 system (upon a style-3 system) is unsafe within, even when the programmer is using it without casts or pointer arithmetic.
+Having a style-1 system relieves the programmer from resorting to unsafe pointer arithmetic when working with noncontiguous slices.
+% PAB: repeat from previous paragraph.
+% The choice to implement this style-1 system upon C's style-2 arrays, rather than its style-3 pointer arithmetic, reduces the attack surface of unsafe code.
+% My casting is unsafe, but I do not do any pointer arithmetic.
+% When a programmer works in the common-case style-2 subset (in the no-@[all]@ top of Figure~\ref{fig:subscr-all}), my casts are identities, and the C compiler is doing its usual displacement calculations.
+% If I had implemented my system upon style-3 pointer arithmetic, then this common case would be circumventing C's battle-hardened displacement calculations in favour of my own.
+% \noindent END: Paste looking for a home
 The new-array library defines types and operations that ensure proper elements are accessed soundly in spite of the overlapping.
+The private @arpk@ structure (array with explicit packing) is generic over these two types (and more): the contained element, what it is masquerading as.
+This structure's public interface is the @array(...)@ construction macro and the two subscript operators.
+Construction by @array@ initializes the masquerading-as type information to be equal to the contained-element information.
+Subscripting by @all@ rearranges the order of masquerading-as types to achieve, in general, nontrivial striding.
+Subscripting by a number consumes the masquerading-as size of the contained element type, does normal array stepping according to that size, and returns there element found there, in unmasked form.
+The @arpk@ structure and its @-[i]@ operator are thus defined as:
+\begin{cfa}
+forall( ztype(N),                       $\C{// length of current dimension}$
+        dtype(S) | sized(S),    $\C{// masquerading-as}$
+        dtype E_im,                             $\C{// immediate element, often another array}$
+        dtype E_base                    $\C{// base element, e.g. float, never array}$
+The @arpk@ structure and its @-[i]@ operator are defined as:
+\begin{cfa}
+forall(
+        [N],                                    $\C{// length of current dimension}$
+        S & | sized(S),                 $\C{// masquerading-as}$
+        Timmed &,                               $\C{// immediate element, often another array}$
+        Tbase &                                 $\C{// base element, e.g. float, never array}$
 ) { // distribute forall to each element
         struct arpk {
                 S strides[N];           $\C{// so that sizeof(this) is N of S}$
         };
+        // expose E_im, stride by S
+        E_im & ?[?]( arpk(N, S, E_im, E_base) & a, ptrdiff_t i ) {
+                return (E_im &) a.strides[i];
+        // expose Timmed, stride by S
+        static inline Timmed & ?[?]( arpk( N, S, Timmed, Tbase ) & a, long int i ) {
+                subcheck( a, i, 0, N );
+                return (Timmed &)a.strides[i];
+        }
+}
 \end{cfa}
+The private @arpk@ structure (array with explicit packing) is generic over four types: dimension length, masquerading-as, ...
+This structure's public interface is hidden behind the @array(...)@ macro and the subscript operator.
+Construction by @array@ initializes the masquerading-as type information to be equal to the contained-element information.
+Subscripting by @all@ rearranges the order of masquerading-as types to achieve, in general, nontrivial striding.
+Subscripting by a number consumes the masquerading-as size of the contained element type, does normal array stepping according to that size, and returns there element found there, in unmasked form.
 An instantiation of the @arpk@ generic is given by the @array(E_base, N0, N1, ...)@ expansion, which is @arpk( N0, Rec, Rec, E_base )@, where @Rec@ is @array(E_base, N1, ...)@.
 …
 This section provides a demonstration of the effect.
 The experiment compares the \CFA array system with the padded-room system [TODO:xref] most typically exemplified by Java arrays, but also reflected in the C++ pattern where restricted vector usage models a checked array.
+The experiment compares the \CFA array system with the padded-room system [TODO:xref] most typically exemplified by Java arrays, but also reflected in the \CC pattern where restricted vector usage models a checked array.
 The essential feature of this padded-room system is the one-to-one correspondence between array instances and the symbolic bounds on which dynamic checks are based.
 The experiment compares with the C++ version to keep access to generated assembly code simple.
 As a control case, a simple loop (with no reused dimension sizes) is seen to get the same optimization treatment in both the \CFA and C++ versions.
+The experiment compares with the \CC version to keep access to generated assembly code simple.
+As a control case, a simple loop (with no reused dimension sizes) is seen to get the same optimization treatment in both the \CFA and \CC versions.
 When the programmer treats the array's bound correctly (making the subscript ``obviously fine''), no dynamic bound check is observed in the program's optimized assembly code.
 But when the bounds are adjusted, such that the subscript is possibly invalid, the bound check appears in the optimized assembly, ready to catch an occurrence the mistake.
 …
 \section{Comparison with other arrays}
+\subsection{Rust}
 \CFA's array is the first lightweight application of dependently-typed bound tracking to an extension of C.
 Other extensions of C that apply dependently-typed bound tracking are heavyweight, in that the bound tracking is part of a linearly typed ownership system that further helps guarantee statically the validity of every pointer deference.
+Other extensions of C that apply dependently-typed bound tracking are heavyweight, in that the bound tracking is part of a linearly-typed ownership-system, which further helps guarantee statically the validity of every pointer deference.
 These systems, therefore, ask the programmer to convince the type checker that every pointer dereference is valid.
 \CFA imposes the lighter-weight obligation, with the more limited guarantee, that initially-declared bounds are respected thereafter.
 …
 The \CFA array, applied to accordion structures [TOD: cross-reference] \emph{implies} the necessary pointer arithmetic, generated automatically, and not appearing at all in a user's program.
+\subsection{Safety in a padded room}
+Java's array [TODO:cite] is a straightforward example of assuring safety against undefined behaviour, at a cost of expressiveness for more applied properties.
+Consider the array parameter declarations in:
+\subsection{Java}
+Java arrays are arrays-of-arrays because all objects are references \see{\VRef{toc:mdimpl}}.
+For each array, Java implicitly storages the array dimension in a descriptor, supporting array length, subscript checking, and allowing dynamically-sized array-parameter declarations.
+\begin{cquote}
 \begin{tabular}{rl}
 C      &  @void f( size_t n, size_t m, float x[n][m] );@ \\
 Java   &  @void f( float[][] a );@
+Java   &  @void f( float x[][] );@
 \end{tabular}
+Java's safety against undefined behaviour assures the callee that, if @x@ is non-null, then @a.length@ is a valid access (say, evaluating to the number $\ell$) and if @i@ is in $[0, \ell)$ then @x[i]@ is a valid access.
+If a value of @i@ outside this range is used, a runtime error is guaranteed.
+In these respects, C offers no guarantees at all.
+Notably, the suggestion that @n@ is the intended size of the first dimension of @x@ is documentation only.
+Indeed, many might prefer the technically equivalent declarations @float x[][m]@ or @float (*a)[m]@ as emphasizing the ``no guarantees'' nature of an infrequently used language feature, over using the opportunity to explain a programmer intention.
+Moreover, even if @x[0][0]@ is valid for the purpose intended, C's basic infamous feature is the possibility of an @i@, such that @x[i][0]@ is not valid for the same purpose, and yet, its evaluation does not produce an error.
+Java's lack of expressiveness for more applied properties means these outcomes are possible:
+\begin{itemize}
+\item @x[0][17]@ and @x[2][17]@ are valid accesses, yet @x[1][17]@ is a runtime error, because @x[1]@ is a null pointer
+\item the same observation, now because @x[1]@ refers to an array of length 5
+\item execution times vary, because the @float@ values within @x@ are sometimes stored nearly contiguously, and other times, not at all
+\end{itemize}
+C's array has none of these limitations, nor do any of the ``array language'' comparators discussed in this section.
+This Java level of safety and expressiveness is also exemplified in the C family, with the commonly given advice [TODO:cite example], for C++ programmers to use @std::vector@ in place of the C++ language's array, which is essentially the C array.
+The advice is that, while a vector is also more powerful (and quirky) than an array, its capabilities include options to preallocate with an upfront size, to use an available bound-checked accessor (@a.at(i)@ in place of @x[i]@), to avoid using @push_back@, and to use a vector of vectors.
+Used with these restrictions, out-of-bound accesses are stopped, and in-bound accesses never exercise the vector's ability to grow, which is to say, they never make the program slow to reallocate and copy, and they never invalidate the program's other references to the contained values.
+Allowing this scheme the same referential integrity assumption that \CFA enjoys [TODO:xref], this scheme matches Java's safety and expressiveness exactly.
+[TODO: decide about going deeper; some of the Java expressiveness concerns have mitigations, up to even more tradeoffs.]
+\end{cquote}
+However, in the C prototype, the parameters @n@ and @m@  are documentation only as the intended size of the first and second dimension of @x@.
+\VRef[Figure]{f:JavaVsCTriangularMatrix} compares a triangular matrix (array-of-arrays) in dynamically safe Java to unsafe C.
+Each dynamically sized row in Java stores its dimension, while C requires the programmer to manage these sizes explicitly (@rlnth@).
+All subscripting is Java has bounds checking, while C has none.
+Both Java and C require explicit null checking, otherwise there is a runtime failure.
+\begin{figure}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{ll@{}}
+\begin{java}
+int m[][] = {  // triangular matrix
+        new int [4],
+        new int [3],
+        new int [2],
+        new int [1],
+        null
+};
+for ( int r = 0; r < m.length; r += 1 ) {
+        if ( m[r] == null ) continue;
+        for ( int c = 0; c < m[r].length; c += 1 ) {
+                m[r][c] = c + r; // subscript checking
+        }
+}
+for ( int r = 0; r < m.length; r += 1 ) {
+        if ( m[r] == null ) {
+                System.out.println( "null row" );
+                continue;
+        }
+        for ( int c = 0; c < m[r].length; c += 1 ) {
+                System.out.print( m[r][c] + " " );
+        }
+        System.out.println();
+}
+\end{java}
+&
+\begin{cfa}
+int * m[5] = {  // triangular matrix
+        calloc( 4, sizeof(int) ),
+        calloc( 3, sizeof(int) ),
+        calloc( 2, sizeof(int) ),
+        calloc( 1, sizeof(int) ),
+        NULL
+};
+int rlnth = 4;
+for ( int r = 0; r < 5; r += 1 ) {
+        if ( m[r] == NULL ) continue;
+        for ( int c = 0; c < rlnth; c += 1 ) {
+                m[r][c] = c + r; // no subscript checking
+        }
+        rlnth -= 1;
+}
+rlnth = 4;
+for ( int r = 0; r < 5; r += 1 ) {
+        if ( m[r] == NULL ) {
+                printf( "null row\n" );
+                continue;
+        }
+        for ( int c = 0; c < rlnth; c += 1 ) {
+                printf( "%d ", m[r][c] );
+        }
+        printf( "\n" );
+        rlnth -= 1;
+}
+\end{cfa}
+\end{tabular}
+\caption{Java (left) \vs C (right) Triangular Matrix}
+\label{f:JavaVsCTriangularMatrix}
+\end{figure}
+The downside of the arrays-of-arrays approach is performance due to pointer chasing versus pointer arithmetic for a contiguous arrays.
+Furthermore, there is the cost of managing the implicit array descriptor.
+It is unlikely that a JIT can dynamically rewrite an arrays-of-arrays form into a contiguous form.
+\subsection{\CC}
+Because C arrays are difficult and dangerous, the mantra for \CC programmers is to use @std::vector@ in place of the C array.
+While the vector size can grow and shrink dynamically, \vs a fixed-size dynamic size with VLAs, the cost of this extra feature is mitigated by preallocating the maximum size (like the VLA) at the declaration (one dynamic call) to avoid using @push_back@.
+\begin{c++}
+vector< vector< int > > m( 5, vector<int>(8) ); // initialize size of 5 x 8 with 6 dynamic allocations
+\end{c++}
+Multidimensional arrays are arrays-of-arrays with associated costs.
+Each @vector@ array has an array descriptor contain the dimension, which allows bound checked using @x.at(i)@ in place of @x[i]@.
+Used with these restrictions, out-of-bound accesses are caught, and in-bound accesses never exercise the vector's ability to grow, preventing costly reallocate and copy, and never invalidate references to contained values.
+This scheme matches Java's safety and expressiveness exactly, but with the inherent costs.
 \subsection{Levels of dependently typed arrays}
 …
 it can also do these other cool checks, but watch how I can mess with its conservativeness and termination
 Two current, state-of-the-art array languages, Dex\cite{arr:dex:long} and Futhark\cite{arr:futhark:tytheory}, offer offer novel contributions concerning similar, restricted dependent types for tracking array length.
+Two current, state-of-the-art array languages, Dex\cite{arr:dex:long} and Futhark\cite{arr:futhark:tytheory}, offer novel contributions concerning similar, restricted dependent types for tracking array length.
 Unlike \CFA, both are garbage-collected functional languages.
 Because they are garbage-collected, referential integrity is built-in, meaning that the heavyweight analysis, that \CFA aims to avoid, is unnecessary.
 …
 [TODO: introduce Ada in the comparators]
 In Ada and Dex, an array is conceived as a function whose domain must satisfy only certain structural assumptions, while in C, C++, Java, Futhark and \CFA today, the domain is a prefix of the natural numbers.
+In Ada and Dex, an array is conceived as a function whose domain must satisfy only certain structural assumptions, while in C, \CC, Java, Futhark and \CFA today, the domain is a prefix of the natural numbers.
 The generality has obvious aesthetic benefits for programmers working on scheduling resources to weekdays, and for programmers who prefer to count from an initial number of their own choosing.

doc/theses/mike_brooks_MMath/programs/hello-accordion.cfa

-                      r38e20a80
+                      rce02877
 forall( T, @[NprovTerty]@, @[Nmunicipalities]@ )
 struct CanadaPop {
+struct CanPop {
         array( T, @NprovTerty@ ) provTerty; $\C{// nested VLA}$
         array( T, @Nmunicipalities@ ) municipalities; $\C{// nested VLA}$
 …
 forall( T, [NprovTerty], [Nmunicipalities] )
         void ?{}( T &, CanadaPop( T, NprovTerty, Nmunicipalities ) & this ) {}
+        void ?{}( T &, CanPop( T, NprovTerty, Nmunicipalities ) & this ) {}
 forall( T &, [NprovTerty], [Nmunicipalities] )
         void ^?{}( CanadaPop( T, NprovTerty, Nmunicipalities ) & this ) {}
+        void ^?{}( CanPop( T, NprovTerty, Nmunicipalities ) & this ) {}
 …
 forall( T, [NprovTerty], [Nmunicipalities] )
 void check( CanadaPop( T, NprovTerty, Nmunicipalities ) & pop ) with( pop ) {
+void check( CanPop( T, NprovTerty, Nmunicipalities ) & pop ) with( pop ) {
         total_pt = total_mun = 0;
         for ( i; NprovTerty ) total_pt += provTerty[i];
 …
 int main( int argc, char * argv[] ) {
         const int npt = ato( argv[1] ), nmun = ato( argv[2] );
         @CanadaPop( int, npt, nmun ) pop;@
+        @CanPop( int, npt, nmun ) pop;@
         // read in population numbers
         @check( pop );@
 …
 Total province/territory: 36,991,981
 Total municipalities: 36,991,981
+$\$$ ./a.out  13  3654
+Total province/territory: 36,991,981
+Total municipalities: 36,991,981
 */

doc/theses/mike_brooks_MMath/programs/hello-array.cfa

-                      r38e20a80
+                      rce02877
 forall( [@N@] )                                                         $\C{// array dimension}$
 array( bool, @N@) & f( array( float, @N@ ) & x, array( float, @N@ ) & y ) {
+array( bool, @N@ ) & f( array( float, @N@ ) & x, array( float, @N@ ) & y ) {
         array( bool, @N@ ) & ret = *@alloc@();  $\C{// sizeof ret  used by alloc}$
         for ( i; @N@ ) {
 …
 int main( int argc, char * argv[] ) {
         const int @n@ = ato( argv[1] );                 $\C{// deduce conversion type}$
         array( float, @n@ ) x, y;                               $\C{// VLAs}$
         for ( i; n ) {                                                  $\C{// initialize arrays}$
+        const int @dim@ = ato( argv[1] );               $\C{// deduce conversion type}$
+        array( float, @dim@ ) x, y;                             $\C{// VLAs}$
+        for ( i; dim ) {                                                $\C{// initialize arrays}$
                 x[i] = 3.14 / (i + 1);
                 y[i] = x[i] + 0.005 ;
+        }
         array( bool, @n@ ) & result = @f( x, y )@; $\C{// call}$
+        array( bool, @dim@ ) & result = @f( x, y )@; $\C{// call}$
         sout | "result: " | nonl;                               $\C{// print result}$
         for ( i; n )
+        for ( i; dim )
                 sout | result[i] | nonl;
         sout | nl;

doc/theses/mike_brooks_MMath/programs/hello-md.cfa

r38e20a80	rce02877
138	138
139	139
140		print1d_cstyle( ~~m[ 2 ]~~ ); $\C{// row 2: 2.0 2.1 2.2 2.3 2.4 2.5 2.6}$
	140	print1d_cstyle( @m[ 2 ]@ ); $\C{// row 2: 2.0 2.1 2.2 2.3 2.4 2.5 2.6}$
141	141
142	142

doc/theses/mike_brooks_MMath/uw-ethesis-frontpgs.tex

r38e20a80	rce02877
140	140
141	141	I would like to thank all the little people who made this thesis possible.
	142
	143	Finally, a special thank you to Huawei Canada for funding this work.
142	144	\cleardoublepage
143	145	\phantomsection % allows hyperref to link to the correct page

doc/theses/mike_brooks_MMath/uw-ethesis.tex

r38e20a80	rce02877
105	105	\lstnewenvironment{c++}[1][]{\lstset{language=[GNU]C++,escapechar=\$,moredelim=**[is][\color{red}]{@}{@},}\lstset{#1}}{}
106	106	\lstnewenvironment{pascal}[1][]{\lstset{language=pascal,escapechar=\$,moredelim=**[is][\color{red}]{@}{@},}\lstset{#1}}{}
	107	\lstnewenvironment{java}[1][]{\lstset{language=java,escapechar=\$,moredelim=**[is][\color{red}]{@}{@},}\lstset{#1}}{}
107	108	\lstset{inputpath={programs}}
108	109

doc/user/user.tex

-                      r38e20a80
+                      rce02877
 %% Created On       : Wed Apr  6 14:53:29 2016
 %% Last Modified By : Peter A. Buhr
 %% Last Modified On : Tue Jul  9 10:43:40 2024
 %% Update Count     : 6887
+%% Last Modified On : Fri Jul 26 06:56:11 2024
+%% Update Count     : 6955
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 …
 and implicitly opened \emph{after} a function-body open, to give them higher priority:
 \begin{cfa}
 void f( S & s, char ®c® ) with ( s ) ®with( §\emph{\R{params}}§ )® { // syntax not allowed, illustration only
+void f( S & s, char ®c® ) with ( s ) ®with( §\emph{\R{params}}§ )® { // syntax disallowed, illustration only
         s.c = ®c;®  i = 3;  d = 5.5;
+}
 …
 for example, the following is incorrect:
 \begin{cfa}
 * [ int x ] f () fp; §\C{// routine name "f" is not allowed}§
 \end{cfa}
 \section{Named and Default Arguments}
 Named\index{named arguments}\index{arguments!named} and default\index{default arguments}\index{arguments!default} arguments~\cite{Hardgrave76}\footnote{
+* [ int x ] f () fp; §\C{// routine name "f" is disallowed}§
+\end{cfa}
+\section{Default and Named Parameter}
+Default\index{default parameter}\index{parameter!default} and named\index{named parameter}\index{parameter!named} parameters~\cite{Hardgrave76}\footnote{
 Francez~\cite{Francez77} proposed a further extension to the named-parameter passing style, which specifies what type of communication (by value, by reference, by name) the argument is passed to the routine.}
 are two mechanisms to simplify routine call.
+Both mechanisms are discussed with respect to \CFA.
+\begin{description}
+\item[Named (or Keyword) Arguments:]
+provide the ability to specify an argument to a routine call using the parameter name rather than the position of the parameter.
+For example, given the routine:
+\begin{cfa}
+void p( int x, int y, int z ) {...}
+\end{cfa}
+a positional call is:
+\begin{cfa}
+p( 4, 7, 3 );
+\end{cfa}
+whereas a named (keyword) call may be:
+\begin{cfa}
+p( z : 3, x : 4, y : 7 );  §\C{// rewrite \(\Rightarrow\) p( 4, 7, 3 )}§
+\end{cfa}
+Here the order of the arguments is unimportant, and the names of the parameters are used to associate argument values with the corresponding parameters.
+The compiler rewrites a named call into a positional call.
+The advantages of named parameters are:
+\begin{itemize}
+\item
+Remembering the names of the parameters may be easier than the order in the routine definition.
+\item
+Parameter names provide documentation at the call site (assuming the names are descriptive).
+\item
+Changes can be made to the order or number of parameters without affecting the call (although the call must still be recompiled).
+\end{itemize}
+Unfortunately, named arguments do not work in C-style programming-languages because a routine prototype is not required to specify parameter names, nor do the names in the prototype have to match with the actual definition.
+For example, the following routine prototypes and definition are all valid.
+\begin{cfa}
+void p( int, int, int ); §\C{// equivalent prototypes}§
+void p( int x, int y, int z );
+void p( int y, int x, int z );
+void p( int z, int y, int x );
+void p( int q, int r, int s ) {} §\C{// match with this definition}§
+\end{cfa}
+Forcing matching parameter names in routine prototypes with corresponding routine definitions is possible, but goes against a strong tradition in C programming.
+Alternatively, prototype definitions can be eliminated by using a two-pass compilation, and implicitly creating header files for exports.
+The former is easy to do, while the latter is more complex.
+Furthermore, named arguments do not work well in a \CFA-style programming-languages because they potentially introduces a new criteria for type matching.
+For example, it is technically possible to disambiguate between these two overloaded definitions of ©f© based on named arguments at the call site:
+\begin{cfa}
+int f( int i, int j );
+int f( int x, double y );
+f( j : 3, i : 4 ); §\C{// 1st f}§
+f( x : 7, y : 8.1 ); §\C{// 2nd f}§
+f( 4, 5 );  §\C{// ambiguous call}§
+\end{cfa}
+However, named arguments compound routine resolution in conjunction with conversions:
+\begin{cfa}
+f( i : 3, 5.7 ); §\C{// ambiguous call ?}§
+\end{cfa}
+Depending on the cost associated with named arguments, this call could be resolvable or ambiguous.
+Adding named argument into the routine resolution algorithm does not seem worth the complexity.
+Therefore, \CFA does \emph{not} attempt to support named arguments.
+\item[Default Arguments]
+provide the ability to associate a default value with a parameter so it can be optionally specified in the argument list.
+For example, given the routine:
+\begin{cfa}
+void p( int x = 1, int y = 2, int z = 3 ) {...}
+\end{cfa}
+the allowable positional calls are:
+\begin{cfa}
+p(); §\C{// rewrite \(\Rightarrow\) p( 1, 2, 3 )}§
+p( 4 ); §\C{// rewrite \(\Rightarrow\) p( 4, 2, 3 )}§
+p( 4, 4 ); §\C{// rewrite \(\Rightarrow\) p( 4, 4, 3 )}§
+p( 4, 4, 4 ); §\C{// rewrite \(\Rightarrow\) p( 4, 4, 4 )}§
+// empty arguments
+p(  , 4, 4 ); §\C{// rewrite \(\Rightarrow\) p( 1, 4, 4 )}§
+p( 4,  , 4 ); §\C{// rewrite \(\Rightarrow\) p( 4, 2, 4 )}§
+p( 4, 4,   ); §\C{// rewrite \(\Rightarrow\) p( 4, 4, 3 )}§
+p( 4,  ,   ); §\C{// rewrite \(\Rightarrow\) p( 4, 2, 3 )}§
+p(  , 4,   ); §\C{// rewrite \(\Rightarrow\) p( 1, 4, 3 )}§
+p(  ,  , 4 ); §\C{// rewrite \(\Rightarrow\) p( 1, 2, 4 )}§
+p(  ,  ,   ); §\C{// rewrite \(\Rightarrow\) p( 1, 2, 3 )}§
+\end{cfa}
+\subsection{Default}
+A default parameter associates a default value with a parameter so it can be optionally specified in the argument list.
+For example, given the routine prototype:
+\begin{cfa}
+void f( int x ®= 1®, int y ®= 2®, int z ®= 3® );
+\end{cfa}
+allowable calls are:
+\begin{cquote}
+\setlength{\tabcolsep}{0.75in}
+\begin{tabular}{@{}ll@{}}
+\textbf{positional arguments} & \textbf{empty arguments} \\
+\begin{cfa}
+f();                    §\C[0.75in]{// rewrite \(\Rightarrow\) f( 1, 2, 3 )}§
+f( 4 );                 §\C{// rewrite \(\Rightarrow\) f( 4, 2, 3 )}§
+f( 4, 4 );              §\C{// rewrite \(\Rightarrow\) f( 4, 4, 3 )}§
+f( 4, 4, 4 );   §\C{// rewrite \(\Rightarrow\) f( 4, 4, 4 )}\CRT§
+\end{cfa}
+&
+\begin{cfa}
+f( ®?®, 4, 4 );         §\C[1.0in]{// rewrite \(\Rightarrow\) f( 1, 4, 4 )}§
+f( 4, ®?®, 4 );         §\C{// rewrite \(\Rightarrow\) f( 4, 2, 4 )}§
+f( 4, 4, ®?® );         §\C{// rewrite \(\Rightarrow\) f( 4, 4, 3 )}§
+f( 4, ®?®, ®?® );       §\C{// rewrite \(\Rightarrow\) f( 4, 2, 3 )}§
+f( ®?®, 4, ®?® );       §\C{// rewrite \(\Rightarrow\) f( 1, 4, 3 )}§
+f( ®?®, ®?®, 4 );       §\C{// rewrite \(\Rightarrow\) f( 1, 2, 4 )}§
+f( ®?®, ®?®, ®?® );     §\C{// rewrite \(\Rightarrow\) f( 1, 2, 3 )}\CRT§
+\end{cfa}
+\end{tabular}
+\end{cquote}
+where the ©?© selects the default value as the argument.
 Here the missing arguments are inserted from the default values in the parameter list.
 The compiler rewrites missing default values into explicit positional arguments.
 …
 \item
 Routines with a large number of parameters are often very generalized, giving a programmer a number of different options on how a computation is performed.
 For many of these kinds of routines, there are standard or default settings that work for the majority of computations.
+For many of these routines, there are standard or default settings that work for the majority of computations.
 Without default values for parameters, a programmer is forced to specify these common values all the time, resulting in long argument lists that are error prone.
 \item
 …
 Instead, a default value is used, which may not be the programmer's intent.
 Default values may only appear in a prototype versus definition context:
 \begin{cfa}
 void p( int x, int y = 2, int z = 3 ); §\C{// prototype: allowed}§
 void p( int, int = 2, int = 3 ); §\C{// prototype: allowed}§
 void p( int x, int y = 2, int z = 3 ) {} §\C{// definition: not allowed}§
+Default parameters may only appear in a prototype versus definition context:
+\begin{cfa}
+void f( int x, int y = 2, int z = 3 );  §\C{// prototype: allowed}§
+void f( int, int = 2, int = 3 );                §\C{// prototype: allowed}§
+void f( int x, int y = 2, int z = 3 ) ®{}® §\C{// definition: disallowed}§
 \end{cfa}
 The reason for this restriction is to allow separate compilation.
+Multiple prototypes with different default values is an error.
+\end{description}
+Ellipse (``...'') arguments present problems when used with default arguments.
+The conflict occurs because both named and ellipse arguments must appear after positional arguments, giving two possibilities:
+\begin{cfa}
+p( /* positional */, ... , /* named */ );
+p( /* positional */, /* named */, ... );
+\end{cfa}
+While it is possible to implement both approaches, the first possibly is more complex than the second, \eg:
+\begin{cfa}
+p( int x, int y, int z, ... );
+p( 1, 4, 5, 6, z : 3, y : 2 ); §\C{// assume p( /* positional */, ... , /* named */ );}§
+p( 1, z : 3, y : 2, 4, 5, 6 ); §\C{// assume p( /* positional */, /* named */, ... );}§
+\end{cfa}
+In the first call, it is necessary for the programmer to conceptually rewrite the call, changing named arguments into positional, before knowing where the ellipse arguments begin.
+Hence, this approach seems significantly more difficult, and hence, confusing and error prone.
+In the second call, the named arguments separate the positional and ellipse arguments, making it trivial to read the call.
+The problem is exacerbated with default arguments, \eg:
+\begin{cfa}
+void p( int x, int y = 2, int z = 3... );
+p( 1, 4, 5, 6, z : 3 ); §\C{// assume p( /* positional */, ... , /* named */ );}§
+p( 1, z : 3, 4, 5, 6 ); §\C{// assume p( /* positional */, /* named */, ... );}§
+\end{cfa}
+The first call is an error because arguments 4 and 5 are actually positional not ellipse arguments;
+therefore, argument 5 subsequently conflicts with the named argument z : 3.
+In the second call, the default value for y is implicitly inserted after argument 1 and the named arguments separate the positional and ellipse arguments, making it trivial to read the call.
+For these reasons, \CFA requires named arguments before ellipse arguments.
+Finally, while ellipse arguments are needed for a small set of existing C routines, like ©printf©, the extended \CFA type system largely eliminates the need for ellipse arguments \see{\VRef{s:Overloading}}, making much of this discussion moot.
+Multiple prototypes with different default values is undefined.
 Default arguments and overloading \see{\VRef{s:Overloading}} are complementary.
 …
 \multicolumn{1}{c@{\hspace{3em}}}{\textbf{default arguments}}   & \multicolumn{1}{c}{\textbf{overloading}}      \\
 \begin{cfa}
 void p( int x, int y = 2, int z = 3 ) {...}
+void f( int x, int y = 2, int z = 3 ) {...}
 …
+&
 \begin{cfa}
 void p( int x, int y, int z ) {...}
 void p( int x ) { p( x, 2, 3 ); }
 void p( int x, int y ) { p( x, y, 3 ); }
+void f( int x, int y, int z ) {...}
+void f( int x ) { f( x, 2, 3 ); }
+void f( int x, int y ) { f( x, y, 3 ); }
 \end{cfa}
 \end{tabular}
 \end{cquote}
 the number of required overloaded routines is linear in the number of default values, which is unacceptable growth.
+In general, overloading should only be used over default arguments if the body of the routine is significantly different.
+Furthermore, overloading cannot handle accessing default arguments in the middle of a positional list, via a missing argument, such as:
+\begin{cfa}
+p( 1, /* default */, 5 ); §\C{// rewrite \(\Rightarrow\) p( 1, 2, 5 )}§
+\end{cfa}
+Given the \CFA restrictions above, both named and default arguments are backwards compatible.
+\Index*[C++]{\CC{}} only supports default arguments;
+\Index*{Ada} supports both named and default arguments.
+In general, overloading is used over default parameters, if the body of the routine is significantly different.
+Furthermore, overloading cannot handle accessing default arguments in the middle of a positional list.
+\begin{cfa}
+f( 1, ®?®, 5 );                                                 §\C{// rewrite \(\Rightarrow\) f( 1, 2, 5 )}§
+\end{cfa}
+\subsection{Named (or Keyword)}
+A named (keyword) parameter provides the ability to specify an argument to a routine call using the parameter name rather than the position of the parameter.
+For example, given the routine prototype:
+\begin{cfa}
+void f( int ®?®x, int ®?®y, int ®?®z );
+\end{cfa}
+allowable calls are:
+\begin{cfa}
+f( ?x = 3, ?y = 4, ?z = 5 );                    §\C{// rewrite \(\Rightarrow\) f( 3, 4, 5 )}§
+f( ?y = 4, ?z = 5, ?x = 3 );                    §\C{// rewrite \(\Rightarrow\) f( 3, 4, 5 )}§
+f( ?z = 5, ?x = 3, ?y = 4 );                    §\C{// rewrite \(\Rightarrow\) f( 3, 4, 5 )}§
+f( ?x = 3, ?z = 5, ?y = 4 );                    §\C{// rewrite \(\Rightarrow\) f( 3, 4, 5 )}§
+\end{cfa}
+Here the ordering of the the parameters and arguments is unimportant, and the names of the parameters are used to associate argument values with the corresponding parameters.
+The compiler rewrites a named call into a positional call.
+Note, the syntax ©?x = 3© is necessary for the argument, because ©x = 3© has an existing meaning, \ie assign ©3© to ©x© and pass the value of ©x©.
+The advantages of named parameters are:
+\begin{itemize}
+\item
+Remembering the names of the parameters may be easier than the order in the routine definition.
+\item
+Parameter names provide documentation at the call site (assuming the names are descriptive).
+\item
+Changes can be made to the order or number of parameters without affecting the call (although the call must still be recompiled).
+\end{itemize}
+Named parameters may only appear in a prototype versus definition context:
+\begin{cfa}
+void f( int  x, int ?y, int ?z );               §\C{// prototype: allowed}§
+void f( int ?x, int , int ?z );                 §\C{// prototype: allowed}§
+void f( int x, int ?y, int ?z ) ®{}®    §\C{// definition: disallowed}§
+\end{cfa}
+The reason for this restriction is to allow separate compilation.
+Multiple prototypes with different positional parameter names is an error.
+The named parameter is not part of type resolution;
+only the type of the expression assigned to the named parameter affects type resolution.
+\begin{cfa}
+int f( int ?i, int ?j );
+int f( int ?i, double ?j );
+f( ?j = 3, ?i = 4 );                                    §\C{// 1st f}§
+f( ?i = 7, ?j = 8.1 );                                  §\C{// 2nd f}§
+\end{cfa}
+\subsection{Mixed Default/Named}
+Default and named parameters can be intermixed and named parameters can have a default value.
+For example, given the routine prototype:
+\begin{cfa}
+void f( int x, int y ®= 1®, int ®?®z ®= 2® );
+\end{cfa}
+allowable calls are:
+\begin{cfa}
+f( 3 );                                                                 §\C{// rewrite \(\Rightarrow\) f( 3, 1, 2 )}§
+f( 3, 4 );                                                              §\C{// rewrite \(\Rightarrow\) f( 3, 4, 2 )}§
+f( 3, ?z = 5 );                                                 §\C{// rewrite \(\Rightarrow\) f( 3, 1, 5 )}§
+f( 3, 4, ?z = 5 );                                              §\C{// rewrite \(\Rightarrow\) f( 3, 4, 5 )}§
+f( ?z = 5, 3 );                                                 §\C{// rewrite \(\Rightarrow\) f( 3, 1, 5 )}§
+f( 3, ?z = 5, 4 );                                              §\C{// rewrite \(\Rightarrow\) f( 3, 4, 5 )}§
+\end{cfa}
+Finally, the ellipse (``...'') parameter must appear after positional and named parameters in a routine prototype.
+\begin{cfa}
+void f( int i = 1, int ?j = 2, ®...® );
+\end{cfa}
+\CFA named and default arguments are backwards compatible with C.
+\Index*[C++]{\CC{}} only supports default parameters;
+\Index*{Ada} supports both named and default parameters.

Note: See TracChangeset for help on using the changeset viewer.

Download in other formats: