Ignore:
Timestamp:
Jul 29, 2024, 9:49:07 AM (4 months ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
b67d7a5b
Parents:
b8e047a
Message:

proofread array chapter and update example programs

Location:
doc/theses/mike_brooks_MMath
Files:
4 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/array.tex

    rb8e047a rcaa3e2c  
    66
    77Arrays in C are possible the single most misunderstood and incorrectly used features in the language, resulting in the largest proportion of runtime errors and security violations.
    8 This chapter describes the new \CFA language and library features that introduce a length-checked array-type to the \CFA standard library~\cite{Cforall}, \eg:
    9 \begin{cfa}
    10 @array( float, 99 )@ x;                                 $\C{// x contains 99 floats}$
     8This chapter describes the new \CFA language and library features that introduce a length-checked array-type to the \CFA standard library~\cite{Cforall}.
     9
     10Specifically, a new \CFA array is declared:
     11\begin{cfa}
     12@array( float, 99 )@ x;                                 $\C[2.75in]{// x contains 99 floats}$
     13\end{cfa}
     14using generic type @array@ with arguments @float@ and @99@.
     15A function @f@ is declared with an @array@ parameter of length @42@.
     16\begin{cfa}
    1117void f( @array( float, 42 )@ & p ) {}   $\C{// p accepts 42 floats}$
    1218f( x );                                                                 $\C{// statically rejected: types are different, 99 != 42}$
    1319
    14 forall( T, [N] )
    15 void g( @array( T, N )@ & p, int i ) {
     20test2.cfa:3:1 error: Invalid application of existing declaration(s) in expression.
     21Applying untyped:  Name: f ... to:  Name: x
     22\end{cfa}
     23The call @f( x )@ is invalid because the @array@ lengths @99@ and @42@ do not match.
     24
     25Next, function @g@ introduces a @forall@ prefix on type parameter @T@ and arbitrary \emph{dimension parameter} @N@, the new feature that represents a count of elements managed by the type system.
     26\begin{cfa}
     27forall( T, @[N]@ )
     28void g( array( T, @N@ ) & p, int i ) {
    1629        T elem = p[i];                                          $\C{// dynamically checked: requires 0 <= i < N}$
    1730}
    1831g( x, 0 );                                                              $\C{// T is float, N is 99, dynamic subscript check succeeds}$
    19 g( x, 1000 );                                                   $\C{// T is float, N is 99, dynamic subscript check fails}$
    20 \end{cfa}
    21 This example declares variable @x@, with generic type @array@ using arguments @float@ and @99@.
    22 Function @f@ is declared with an @array@ parameter of length @42@.
    23 The call @f( x )@ is invalid because the @array@ lengths @99@ and @42@ do not match.
    24 Next, function @g@ introduces a @forall@ prefix on type parameter @T@ and arbitrary \emph{dimension parameter} @N@, the new feature that represents a count of elements managed by the type system.
     32g( x, 1000 );                                                   $\C{// T is float, N is 99, dynamic subscript check fails}\CRT$
     33
     34Cforall Runtime error: subscript 1000 exceeds dimension range [0,99) $for$ array 0x555555558020.
     35\end{cfa}
    2536The call @g( x, 0 )@ is valid because @g@ accepts any length of array, where the type system infers @float@ for @T@ and length @99@ for @N@.
    2637Inferring values for @T@ and @N@ is implicit without programmer involvement.
     
    3546forall( [N] )
    3647void declDemo() {
    37         float x1[N];                            $\C{// built-in type ("C array")}$
    38         array(float, N) x2;                     $\C{// type from library}$
     48        float x1[N];                                            $\C{// built-in type ("C array")}$
     49        array(float, N) x2;                                     $\C{// type from library}$
    3950}
    4051\end{cfa}
    4152Both of the locally-declared array variables, @x1@ and @x2@, have 42 elements, each element being a @float@.
    4253The two variables have identical size and layout; they both encapsulate 42-float, stack \vs heap allocations with no additional ``bookkeeping'' allocations or headers.
    43 Providing this explicit generic approach required a significant extension to the \CFA type system to support a full-feature, safe, efficient (space and time) array-type, which forms the foundation for more complex array forms in \CFA.
     54Providing this explicit generic approach requires a significant extension to the \CFA type system to support a full-feature, safe, efficient (space and time) array-type, which forms the foundation for more complex array forms in \CFA.
    4455
    4556Admittedly, the @array@ library type (type for @x2@) is syntactically different from its C counterpart.
    4657A future goal (TODO xref) is to provide a built-in array type with syntax approaching C's (type for @x1@);
    4758then, the library @array@ type can be removed giving \CFA a largely uniform array type.
    48 At present, the built-in array is only partially supported, so the generic @array@ is used exclusively in the discussion;
     59At present, the C syntax @array@ is only partially supported, so the generic @array@ is used exclusively in the discussion;
    4960feature support and C compatibility are revisited in Section ? TODO.
    5061
    51 Offering an @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (gcc) programs can be compiled by \CFA with only a small number of changes, similar to \CC (g++).
     62Offering the @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (@gcc@) programs can be compiled by \CFA with only a small number of changes, similar to \CC (@g++@).
    5263However, a few compatibility-breaking changes to the behaviour of the C array are necessary, both as an implementation convenience and to fix C's lax treatment of arrays.
    5364Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity introduced by the C array TODO xref.
    5465
    55 My contributions are:
     66My contributions in this chapter are:
    5667\begin{enumerate}
    5768\item A type system enhancement that lets polymorphic functions and generic types be parameterized by a numeric value: @forall( [N] )@.
     
    100111\end{figure}
    101112
    102 \VRef[Figure]{f:fHarness} shows a harness that uses the @f@ function illustrating how dynamic values are fed into the @array@ type.
    103 Here, the dimension of the @x@, @y@, and @result@ arrays is specified from a command-line value and these arrays are allocated on the stack.
     113\VRef[Figure]{f:fHarness} shows a harness that uses function @f@ to illustrate how dynamic values are fed into the @array@ type.
     114Here, the dimension of arrays @x@, @y@, and @result@ is specified from a command-line value, @dim@, and these arrays are allocated on the stack.
    104115Then the @x@ array is initialized with decreasing values, and the @y@ array with amounts offset by constant @0.005@, giving relative differences within tolerance initially and diverging for later values.
    105116The program main is run (see figure bottom) with inputs @5@ and @7@ for sequence lengths.
    106 The loops follow the familiar pattern of using the variable @n@ to iterate through the arrays.
    107 Most importantly, the type system implicitly captures @n@ at the call of @f@ and makes it available throughout @f@ as @N@.
    108 The example shows @n@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@, @N@ adapting in the same way at @f@'s loop bound, and a pass-thru use of @n@ at @f@'s declaration of @ret@.
     117The loops follow the familiar pattern of using the variable @dim@ to iterate through the arrays.
     118Most importantly, the type system implicitly captures @dim@ at the call of @f@ and makes it available throughout @f@ as @N@.
     119The example shows @dim@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@, @N@ adapting in the same way at @f@'s loop bound, and a pass-thru use of @dim@ at @f@'s declaration of @ret@.
    109120Except for the lifetime-management issue of @result@, \ie explicit @free@, this program has eliminated both the syntactic and semantic problems associated with C arrays and their usage.
    110121These benefits cannot be underestimated.
     
    141152\CC has a (mistaken) belief that references are not objects, but pointers are objects.
    142153In the \CC example, the arrays fall back on C arrays, which have a duality with references with respect to automatic dereferencing.
    143 The \CFA array is a contiguous object with an address, which can stored as a reference or pointer.
     154The \CFA array is a contiguous object with an address, which can be stored as a reference or pointer.
    144155\item
    145156C/\CC arrays cannot be copied, while \CFA arrays can be copied, making them a first-class object (although array copy is often avoided for efficiency).
     
    151162
    152163@template< typename T, size_t N >@
    153 void copy( T ret[N], T x[N] ) {
     164void copy( T ret[@N@], T x[@N@] ) {
    154165        for ( int i = 0; i < N; i += 1 ) ret[i] = x[i];
    155166}
     
    167178int main() {
    168179        @forall( T, [N] )@   // nested function
    169         void copy( array( T, N ) & ret, array( T, N ) & x ) {
     180        void copy( array( T, @N@ ) & ret, array( T, @N@ ) & x ) {
    170181                for ( i; 10 ) ret[i] = x[i];
    171182        }
     
    185196
    186197Continuing the discussion of \VRef[Figure]{f:fHarness}, the example has @f@ expecting two arrays of the same length.
    187 A compile-time error occurs when attempting to call @f@ with arrays of differing lengths.
     198As stated previous, a compile-time error occurs when attempting to call @f@ with arrays of differing lengths.
    188199% removing leading whitespace
    189200\lstinput[tabsize=1]{52-53}{hello-array.cfa}
    190201\lstinput[tabsize=1,aboveskip=0pt]{62-64}{hello-array.cfa}
    191 As is common practice in C, the programmer is free to cast, \ie to assert knowledge not shared with the type system.
     202C allows casting to assert knowledge not shared with the type system.
    192203\lstinput{70-74}{hello-array.cfa}
    193204
     
    197208\lstinput{10-15}{hello-accordion.cfa}
    198209This structure's layout has the starting offset of @municipalities@ varying in @NprovTerty@, and the offset of @total_pt@ and @total_mun@ varying in both generic parameters.
    199 For a function that operates on a @CanadaPop@ structure, the type system handles this variation transparently.
     210For a function that operates on a @CanPop@ structure, the type system handles this variation transparently.
    200211\lstinput{40-45}{hello-accordion.cfa}
    201 \VRef[Figure]{f:checkHarness} shows program results where different offset values being used.
    202 The output values show that @summarize@ and its caller agree on both the offsets (where the callee starts reading @cost_contribs@ and where the callee writes @total_cost@).
    203 Yet the call site just says, ``pass the request.''
     212\VRef[Figure]{f:checkHarness} shows the @CanPop@ harness and results with different array sizes, if the municipalities changed after a census.
    204213
    205214\begin{figure}
    206215\lstinput{60-68}{hello-accordion.cfa}
    207 \lstinput{70-72}{hello-accordion.cfa}
     216\lstinput{70-75}{hello-accordion.cfa}
    208217\caption{\lstinline{check} Harness}
    209218\label{f:checkHarness}
     
    232241In general, storage layout is hidden by subscripting, and only appears when passing arrays among different programming languages or accessing specific hardware.
    233242
    234 \VRef[Figure]{f:FixedVariable} shows two C90 approaches for manipulating contiguous arrays.
     243\VRef[Figure]{f:FixedVariable} shows two C90 approaches for manipulating a contiguous matrix.
    235244Note, C90 does not support VLAs.
    236 The fixed-dimension approach uses the type system;
     245The fixed-dimension approach (left) uses the type system;
    237246however, it requires all dimensions except the first to be specified at compile time, \eg @m[][6]@, allowing all subscripting stride calculations to be generated with constants.
    238247Hence, every matrix passed to @fp1@ must have exactly 6 columns but the row size can vary.
    239 The variable-dimension approach ignores (violates) the type system, \ie argument and parameters types do not match, and manually performs pointer arithmetic for subscripting in the macro @sub@.
     248The variable-dimension approach (right) ignores (violates) the type system, \ie argument and parameters types do not match, and subscripting is performed manually using pointer arithmetic in the macro @sub@.
    240249
    241250\begin{figure}
     
    258267        ...  printf( "%d ", @sub( m, r, c )@ );  ...
    259268}
    260 int vm1[4][4], vm2[6][8]; // no VLA
     269int vm1[@4@][@4@], vm2[@6@][@8@]; // no VLA
    261270// initialize matrixes
    262271fp2( 4, 4, vm1 );
     
    290299The language decides if the matrix and array-of-array are laid out the same or differently.
    291300For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (triangular matrix).
    292 Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though the two array forms could be differentiated at the subscript level, \eg @m[1,2]@ \vs @aa[1][2]@.
     301Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though a language could differentiated between the two forms using subscript syntax, \eg @m[1,2]@ \vs @aa[1][2]@.
    293302Nevertheless, controlling memory layout can make a difference in what operations are allowed and in performance (caching/NUMA effects).
    294303
     
    301310The focus of this work is on the contiguous multidimensional arrays in C.
    302311The reason is that programmers are often forced to use the more complex array-of-array form when a contiguous array would be simpler, faster, and safer.
    303 Nevertheless, the C array-of-array form continues to be useful for special circumstances.
     312Nevertheless, the C array-of-array form is still important for special circumstances.
    304313
    305314\VRef[Figure]{f:ContiguousNon-contiguous} shows the extensions made in C99 for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays.}
     
    313322While this contiguous-array capability is a step forward, it is still the programmer's responsibility to manually manage the number of dimensions and their sizes, both at the function definition and call sites.
    314323That is, the array does not automatically carry its structure and sizes for use in computing subscripts.
    315 While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row correctly.
     324While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row correctly with a correctly bounded loop index.
    316325Specifically, there is no requirement that the rows are the same length, like a poem with different length lines.
    317326
     
    365374this model has no awareness of dimensions just the ability to access memory at a distance from a reference point (base-displacement addressing), \eg @x + 23@ or @x[23}@ $\Rightarrow$ 23rd element from the start of @x@.
    366375A programmer must manually build any notion of dimensions using other tools;
    367 hence, this style is not offering multidimensional arrays \see{\VRef[Figure]{f:FixedVariable}}.
     376hence, this style is not offering multidimensional arrays \see{\VRef[Figure]{f:FixedVariable} right example}.
    368377\end{enumerate}
    369378
     
    381390A C/\CFA array interface includes the resulting memory layout.
    382391The defining requirement of a type-2 system is the ability to slice a column from a column-finest matrix.
    383 The required memory shape of such a slice is set, before any discussion of implementation.
     392The required memory shape of such a slice is fixed, before any discussion of implementation.
    384393The implementation presented here is how the \CFA array library wrangles the C type system, to make it do memory steps that are consistent with this layout.
    385394TODO: do I have/need a presentation of just this layout, just the semantics of -[all]?
     
    389398Beyond what C's array type offers, the new array brings direct support for working with a noncontiguous array slice, allowing a program to work with dimension subscripts given in a non-physical order.
    390399
    391 The following examples use an @array( float, 5, 7) m@, loaded with values incremented by $0.1$, when stepping across the length-7 finely-strided column dimension, and stepping across the length-5 coarsely-strided row dimension.
    392 \par\noindent
     400The following examples use the matrix declaration @array( float, 5, 7 ) m@, loaded with values incremented by $0.1$, when stepping across the length-7 finely-strided column dimension, and stepping across the length-5 coarsely-strided row dimension.
     401\par
    393402\mbox{\lstinput{121-126}{hello-md.cfa}}
    394403\par\noindent
     
    405414Specifically, declaring the parameter @r@ with type @array@ means that @r@ is contiguous, which is unnecessarily restrictive.
    406415That is, @r@ need only be of a container type that offers a subscript operator (of type @ptrdiff_t@ $\rightarrow$ @float@) with managed length @N@.
    407 The new-array library provides the trait @ix@, so-defined.
     416The new-array library provides the trait @ar@, so-defined.
    408417With it, the original declaration can be generalized with the same body.
    409418\lstinput{43-44}{hello-md.cfa}
    410419\lstinput[aboveskip=0pt]{145-145}{hello-md.cfa}
    411420The nontrivial slicing in this example now allows passing a \emph{noncontiguous} slice to @print1d@, where the new-array library provides a ``subscript by all'' operation for this purpose.
    412 In a multi-dimensional subscript operation, any dimension given as @all@ is a placeholder, \ie ``not yet subscripted by a value'', waiting for such a value, implementing the @ix@ trait.
     421In a multi-dimensional subscript operation, any dimension given as @all@ is a placeholder, \ie ``not yet subscripted by a value'', waiting for such a value, implementing the @ar@ trait.
    413422\lstinput{150-151}{hello-md.cfa}
    414423
     
    471480In both cases, value 2 selects from the coarser dimension (rows of @x@),
    472481while the value 3 selects from the finer dimension (columns of @x@).
    473 The figure illustrates the value of each subexpression, comparing how numeric subscripting proceeds from @x@, vs from @x[all]@.
     482The figure illustrates the value of each subexpression, comparing how numeric subscripting proceeds from @x@, \vs from @x[all]@.
    474483Proceeding from @x@ gives the numeric indices as coarse then fine, while proceeding from @x[all]@ gives them fine then coarse.
    475484These two starting expressions, which are the example's only multidimensional subexpressions
    476485(those that received zero numeric indices so far), are illustrated with vertical steps where a \emph{first} numeric index would select.
    477486
    478 The figure's presentation offers an intuition answering, What is an atomic element of @x[all]@?
    479 From there, @x[all]@ itself is simply a two-dimensional array, in the strict C sense, of these strange building blocks.
     487The figure's presentation offers an intuition answering to: What is an atomic element of @x[all]@?
     488From there, @x[all]@ itself is simply a two-dimensional array, in the strict C sense, of these building blocks.
    480489An atom (like the bottommost value, @x[all][3][2]@), is the contained value (in the square box)
    481490and a lie about its size (the wedge above it, growing upward).
    482 An array of these atoms (like the intermediate @x[all][3]@) is just a contiguous arrangement of them,
    483 done according to their size, as announced.  Call such an array a column.
    484 A column is almost ready to be arranged into a matrix; it is the \emph{contained value} of the next-level building block,
    485 but another lie about size is required.
    486 At first, an atom needed to be arranged as if it were bigger,
    487 but now a column needs to be arranged as if it is smaller (the wedge above it, shrinking upward).
     491An array of these atoms (like the intermediate @x[all][3]@) is just a contiguous arrangement of them, done according to their size;
     492call such an array a column.
     493A column is almost ready to be arranged into a matrix;
     494it is the \emph{contained value} of the next-level building block, but another lie about size is required.
     495At first, an atom needs to be arranged as if it were bigger, but now a column needs to be arranged as if it is smaller (the wedge above it, shrinking upward).
    488496These lying columns, arranged contiguously according to their size (as announced) form the matrix @x[all]@.
    489 Because @x[all]@ takes indices, first for the fine stride, then for the coarse stride,
    490 it achieves the requirement of representing the transpose of @x@.
    491 Yet every time the programmer presents an index, a mere C-array subscript is achieving the offset calculation.
     497Because @x[all]@ takes indices, first for the fine stride, then for the coarse stride, it achieves the requirement of representing the transpose of @x@.
     498Yet every time the programmer presents an index, a C-array subscript is achieving the offset calculation.
    492499
    493500In the @x[all]@ case, after the finely strided subscript is done (column 3 is selected),
     
    495502compared with where analogous rows appear when the row-level option is presented for @x@.
    496503
    497 These size lies create an appearance of overlap.
    498 For example, in @x[all]@, the shaded band touches atoms 2.0, 2.1, 2.2, 2.3, 1.4, 1.5 and 1.6.
     504\PAB{I don't understand this paragraph: These size lies create an appearance of overlap.
     505For example, in \lstinline{x[all]}, the shaded band touches atoms 2.0, 2.1, 2.2, 2.3, 1.4, 1.5 and 1.6.
    499506But only the atom 2.3 is storing its value there.
    500 The rest are lying about (conflicting) claims on this location, but never exercising these alleged claims.
     507The rest are lying about (conflicting) claims on this location, but never exercising these alleged claims.}
    501508
    502509Lying is implemented as casting.
     
    504511This structure uses one type in its internal field declaration and offers a different type as the return of its subscript operator.
    505512The field within is a plain-C array of the fictional type, which is 7 floats long for @x[all][3][2]@ and 1 float long for @x[all][3]@.
    506 The subscript operator presents what's really inside, by casting to the type below the wedge of lie.
     513The subscript operator presents what is really inside, by casting to the type below the wedge of the lie.
    507514
    508515%  Does x[all] have to lie too?  The picture currently glosses over how it it advertises a size of 7 floats.  I'm leaving that as an edge case benignly misrepresented in the picture.  Edge cases only have to be handled right in the code.
    509516
    510 Casting, overlapping and lying are unsafe.
    511 The mission here is to implement a style-2 feature that the type system helps the programmer use safely.
    512 The offered style-2 system is allowed to be internally unsafe,
    513 just as C's implementation of a style-3 system (upon a style-4 system) is unsafe within,
    514 even when the programmer is using it without casts or pointer arithmetic.
    515 Having a style-2 system relieves the programmer from resorting to unsafe pointer arithmetic when working with noncontiguous slices.
    516 
    517 The choice to implement this style-2 system upon C's style-3 arrays, rather than its style-4 pointer arithmetic,
    518 reduces the attack surface of unsafe code.
    519 My casting is unsafe, but I do not do any pointer arithmetic.
    520 When a programmer works in the common-case style-3 subset (in the no-@[all]@ top of Figure~\ref{fig:subscr-all}),
    521 my casts are identities, and the C compiler is doing its usual displacement calculations.
    522 If I had implemented my system upon style-4 pointer arithmetic,
    523 then this common case would be circumventing C's battle-hardened displacement calculations in favour of my own.
    524 
    525 \noindent END: Paste looking for a home
     517Casting, overlapping, and lying are unsafe.
     518The mission is to implement a style-1 feature in the type system for safe use by a programmer.
     519The offered style-1 system is allowed to be internally unsafe,
     520just as C's implementation of a style-2 system (upon a style-3 system) is unsafe within, even when the programmer is using it without casts or pointer arithmetic.
     521Having a style-1 system relieves the programmer from resorting to unsafe pointer arithmetic when working with noncontiguous slices.
     522
     523% PAB: repeat from previous paragraph.
     524% The choice to implement this style-1 system upon C's style-2 arrays, rather than its style-3 pointer arithmetic, reduces the attack surface of unsafe code.
     525% My casting is unsafe, but I do not do any pointer arithmetic.
     526% When a programmer works in the common-case style-2 subset (in the no-@[all]@ top of Figure~\ref{fig:subscr-all}), my casts are identities, and the C compiler is doing its usual displacement calculations.
     527% If I had implemented my system upon style-3 pointer arithmetic, then this common case would be circumventing C's battle-hardened displacement calculations in favour of my own.
     528
     529% \noindent END: Paste looking for a home
    526530
    527531The new-array library defines types and operations that ensure proper elements are accessed soundly in spite of the overlapping.
    528 The private @arpk@ structure (array with explicit packing) is generic over these two types (and more): the contained element, what it is masquerading as.
    529 This structure's public interface is the @array(...)@ construction macro and the two subscript operators.
    530 Construction by @array@ initializes the masquerading-as type information to be equal to the contained-element information.
    531 Subscripting by @all@ rearranges the order of masquerading-as types to achieve, in general, nontrivial striding.
    532 Subscripting by a number consumes the masquerading-as size of the contained element type, does normal array stepping according to that size, and returns there element found there, in unmasked form.
    533 
    534 The @arpk@ structure and its @-[i]@ operator are thus defined as:
    535 \begin{cfa}
    536 forall( ztype(N),                       $\C{// length of current dimension}$
    537         dtype(S) | sized(S),    $\C{// masquerading-as}$
    538         dtype E_im,                             $\C{// immediate element, often another array}$
    539         dtype E_base                    $\C{// base element, e.g. float, never array}$
     532The @arpk@ structure and its @-[i]@ operator are defined as:
     533\begin{cfa}
     534forall(
     535        [N],                                    $\C{// length of current dimension}$
     536        S & | sized(S),                 $\C{// masquerading-as}$
     537        Timmed &,                               $\C{// immediate element, often another array}$
     538        Tbase &                                 $\C{// base element, e.g. float, never array}$
    540539) { // distribute forall to each element
    541540        struct arpk {
    542541                S strides[N];           $\C{// so that sizeof(this) is N of S}$
    543542        };
    544         // expose E_im, stride by S
    545         E_im & ?[?]( arpk(N, S, E_im, E_base) & a, ptrdiff_t i ) {
    546                 return (E_im &) a.strides[i];
     543        // expose Timmed, stride by S
     544        static inline Timmed & ?[?]( arpk( N, S, Timmed, Tbase ) & a, long int i ) {
     545                subcheck( a, i, 0, N );
     546                return (Timmed &)a.strides[i];
    547547        }
    548548}
    549549\end{cfa}
     550The private @arpk@ structure (array with explicit packing) is generic over four types: dimension length, masquerading-as, ...
     551This structure's public interface is hidden behind the @array(...)@ macro and the subscript operator.
     552Construction by @array@ initializes the masquerading-as type information to be equal to the contained-element information.
     553Subscripting by @all@ rearranges the order of masquerading-as types to achieve, in general, nontrivial striding.
     554Subscripting by a number consumes the masquerading-as size of the contained element type, does normal array stepping according to that size, and returns there element found there, in unmasked form.
    550555
    551556An instantiation of the @arpk@ generic is given by the @array(E_base, N0, N1, ...)@ expansion, which is @arpk( N0, Rec, Rec, E_base )@, where @Rec@ is @array(E_base, N1, ...)@.
     
    568573This section provides a demonstration of the effect.
    569574
    570 The experiment compares the \CFA array system with the padded-room system [TODO:xref] most typically exemplified by Java arrays, but also reflected in the C++ pattern where restricted vector usage models a checked array.
     575The experiment compares the \CFA array system with the padded-room system [TODO:xref] most typically exemplified by Java arrays, but also reflected in the \CC pattern where restricted vector usage models a checked array.
    571576The essential feature of this padded-room system is the one-to-one correspondence between array instances and the symbolic bounds on which dynamic checks are based.
    572 The experiment compares with the C++ version to keep access to generated assembly code simple.
    573 
    574 As a control case, a simple loop (with no reused dimension sizes) is seen to get the same optimization treatment in both the \CFA and C++ versions.
     577The experiment compares with the \CC version to keep access to generated assembly code simple.
     578
     579As a control case, a simple loop (with no reused dimension sizes) is seen to get the same optimization treatment in both the \CFA and \CC versions.
    575580When the programmer treats the array's bound correctly (making the subscript ``obviously fine''), no dynamic bound check is observed in the program's optimized assembly code.
    576581But when the bounds are adjusted, such that the subscript is possibly invalid, the bound check appears in the optimized assembly, ready to catch an occurrence the mistake.
     
    589594\section{Comparison with other arrays}
    590595
     596
     597\subsection{Rust}
     598
    591599\CFA's array is the first lightweight application of dependently-typed bound tracking to an extension of C.
    592 Other extensions of C that apply dependently-typed bound tracking are heavyweight, in that the bound tracking is part of a linearly typed ownership system that further helps guarantee statically the validity of every pointer deference.
     600Other extensions of C that apply dependently-typed bound tracking are heavyweight, in that the bound tracking is part of a linearly-typed ownership-system, which further helps guarantee statically the validity of every pointer deference.
    593601These systems, therefore, ask the programmer to convince the type checker that every pointer dereference is valid.
    594602\CFA imposes the lighter-weight obligation, with the more limited guarantee, that initially-declared bounds are respected thereafter.
     
    598606The \CFA array, applied to accordion structures [TOD: cross-reference] \emph{implies} the necessary pointer arithmetic, generated automatically, and not appearing at all in a user's program.
    599607
    600 \subsection{Safety in a padded room}
    601 
    602 Java's array [TODO:cite] is a straightforward example of assuring safety against undefined behaviour, at a cost of expressiveness for more applied properties.
    603 Consider the array parameter declarations in:
    604 
     608
     609\subsection{Java}
     610
     611Java arrays are arrays-of-arrays because all objects are references \see{\VRef{toc:mdimpl}}.
     612For each array, Java implicitly storages the array dimension in a descriptor, supporting array length, subscript checking, and allowing dynamically-sized array-parameter declarations.
     613\begin{cquote}
    605614\begin{tabular}{rl}
    606615C      &  @void f( size_t n, size_t m, float x[n][m] );@ \\
    607 Java   &  @void f( float[][] a );@
     616Java   &  @void f( float x[][] );@
    608617\end{tabular}
    609 
    610 Java's safety against undefined behaviour assures the callee that, if @x@ is non-null, then @a.length@ is a valid access (say, evaluating to the number $\ell$) and if @i@ is in $[0, \ell)$ then @x[i]@ is a valid access.
    611 If a value of @i@ outside this range is used, a runtime error is guaranteed.
    612 In these respects, C offers no guarantees at all.
    613 Notably, the suggestion that @n@ is the intended size of the first dimension of @x@ is documentation only.
    614 Indeed, many might prefer the technically equivalent declarations @float x[][m]@ or @float (*a)[m]@ as emphasizing the ``no guarantees'' nature of an infrequently used language feature, over using the opportunity to explain a programmer intention.
    615 Moreover, even if @x[0][0]@ is valid for the purpose intended, C's basic infamous feature is the possibility of an @i@, such that @x[i][0]@ is not valid for the same purpose, and yet, its evaluation does not produce an error.
    616 
    617 Java's lack of expressiveness for more applied properties means these outcomes are possible:
    618 \begin{itemize}
    619 \item @x[0][17]@ and @x[2][17]@ are valid accesses, yet @x[1][17]@ is a runtime error, because @x[1]@ is a null pointer
    620 \item the same observation, now because @x[1]@ refers to an array of length 5
    621 \item execution times vary, because the @float@ values within @x@ are sometimes stored nearly contiguously, and other times, not at all
    622 \end{itemize}
    623 C's array has none of these limitations, nor do any of the ``array language'' comparators discussed in this section.
    624 
    625 This Java level of safety and expressiveness is also exemplified in the C family, with the commonly given advice [TODO:cite example], for C++ programmers to use @std::vector@ in place of the C++ language's array, which is essentially the C array.
    626 The advice is that, while a vector is also more powerful (and quirky) than an array, its capabilities include options to preallocate with an upfront size, to use an available bound-checked accessor (@a.at(i)@ in place of @x[i]@), to avoid using @push_back@, and to use a vector of vectors.
    627 Used with these restrictions, out-of-bound accesses are stopped, and in-bound accesses never exercise the vector's ability to grow, which is to say, they never make the program slow to reallocate and copy, and they never invalidate the program's other references to the contained values.
    628 Allowing this scheme the same referential integrity assumption that \CFA enjoys [TODO:xref], this scheme matches Java's safety and expressiveness exactly.
    629 [TODO: decide about going deeper; some of the Java expressiveness concerns have mitigations, up to even more tradeoffs.]
     618\end{cquote}
     619However, in the C prototype, the parameters @n@ and @m@  are documentation only as the intended size of the first and second dimension of @x@.
     620\VRef[Figure]{f:JavaVsCTriangularMatrix} compares a triangular matrix (array-of-arrays) in dynamically safe Java to unsafe C.
     621Each dynamically sized row in Java stores its dimension, while C requires the programmer to manage these sizes explicitly (@rlnth@).
     622All subscripting is Java has bounds checking, while C has none.
     623Both Java and C require explicit null checking, otherwise there is a runtime failure.
     624
     625\begin{figure}
     626\setlength{\tabcolsep}{15pt}
     627\begin{tabular}{ll@{}}
     628\begin{java}
     629int m[][] = {  // triangular matrix
     630        new int [4],
     631        new int [3],
     632        new int [2],
     633        new int [1],
     634        null
     635};
     636
     637for ( int r = 0; r < m.length; r += 1 ) {
     638        if ( m[r] == null ) continue;
     639        for ( int c = 0; c < m[r].length; c += 1 ) {
     640                m[r][c] = c + r; // subscript checking
     641        }
     642
     643}
     644
     645for ( int r = 0; r < m.length; r += 1 ) {
     646        if ( m[r] == null ) {
     647                System.out.println( "null row" );
     648                continue;
     649        }
     650        for ( int c = 0; c < m[r].length; c += 1 ) {
     651                System.out.print( m[r][c] + " " );
     652        }
     653        System.out.println();
     654
     655}
     656\end{java}
     657&
     658\begin{cfa}
     659int * m[5] = {  // triangular matrix
     660        calloc( 4, sizeof(int) ),
     661        calloc( 3, sizeof(int) ),
     662        calloc( 2, sizeof(int) ),
     663        calloc( 1, sizeof(int) ),
     664        NULL
     665};
     666int rlnth = 4;
     667for ( int r = 0; r < 5; r += 1 ) {
     668        if ( m[r] == NULL ) continue;
     669        for ( int c = 0; c < rlnth; c += 1 ) {
     670                m[r][c] = c + r; // no subscript checking
     671        }
     672        rlnth -= 1;
     673}
     674rlnth = 4;
     675for ( int r = 0; r < 5; r += 1 ) {
     676        if ( m[r] == NULL ) {
     677                printf( "null row\n" );
     678                continue;
     679        }
     680        for ( int c = 0; c < rlnth; c += 1 ) {
     681                printf( "%d ", m[r][c] );
     682        }
     683        printf( "\n" );
     684        rlnth -= 1;
     685}
     686\end{cfa}
     687\end{tabular}
     688\caption{Java (left) \vs C (right) Triangular Matrix}
     689\label{f:JavaVsCTriangularMatrix}
     690\end{figure}
     691
     692The downside of the arrays-of-arrays approach is performance due to pointer chasing versus pointer arithmetic for a contiguous arrays.
     693Furthermore, there is the cost of managing the implicit array descriptor.
     694It is unlikely that a JIT can dynamically rewrite an arrays-of-arrays form into a contiguous form.
     695
     696
     697\subsection{\CC}
     698
     699Because C arrays are difficult and dangerous, the mantra for \CC programmers is to use @std::vector@ in place of the C array.
     700While the vector size can grow and shrink dynamically, \vs a fixed-size dynamic size with VLAs, the cost of this extra feature is mitigated by preallocating the maximum size (like the VLA) at the declaration (one dynamic call) to avoid using @push_back@.
     701\begin{c++}
     702vector< vector< int > > m( 5, vector<int>(8) ); // initialize size of 5 x 8 with 6 dynamic allocations
     703\end{c++}
     704Multidimensional arrays are arrays-of-arrays with associated costs.
     705Each @vector@ array has an array descriptor contain the dimension, which allows bound checked using @x.at(i)@ in place of @x[i]@.
     706Used with these restrictions, out-of-bound accesses are caught, and in-bound accesses never exercise the vector's ability to grow, preventing costly reallocate and copy, and never invalidate references to contained values.
     707This scheme matches Java's safety and expressiveness exactly, but with the inherent costs.
     708
    630709
    631710\subsection{Levels of dependently typed arrays}
     
    655734it can also do these other cool checks, but watch how I can mess with its conservativeness and termination
    656735
    657 Two current, state-of-the-art array languages, Dex\cite{arr:dex:long} and Futhark\cite{arr:futhark:tytheory}, offer offer novel contributions concerning similar, restricted dependent types for tracking array length.
     736Two current, state-of-the-art array languages, Dex\cite{arr:dex:long} and Futhark\cite{arr:futhark:tytheory}, offer novel contributions concerning similar, restricted dependent types for tracking array length.
    658737Unlike \CFA, both are garbage-collected functional languages.
    659738Because they are garbage-collected, referential integrity is built-in, meaning that the heavyweight analysis, that \CFA aims to avoid, is unnecessary.
     
    727806[TODO: introduce Ada in the comparators]
    728807
    729 In Ada and Dex, an array is conceived as a function whose domain must satisfy only certain structural assumptions, while in C, C++, Java, Futhark and \CFA today, the domain is a prefix of the natural numbers.
     808In Ada and Dex, an array is conceived as a function whose domain must satisfy only certain structural assumptions, while in C, \CC, Java, Futhark and \CFA today, the domain is a prefix of the natural numbers.
    730809The generality has obvious aesthetic benefits for programmers working on scheduling resources to weekdays, and for programmers who prefer to count from an initial number of their own choosing.
    731810
  • doc/theses/mike_brooks_MMath/programs/hello-accordion.cfa

    rb8e047a rcaa3e2c  
    99
    1010forall( T, @[NprovTerty]@, @[Nmunicipalities]@ )
    11 struct CanadaPop {
     11struct CanPop {
    1212        array( T, @NprovTerty@ ) provTerty; $\C{// nested VLA}$
    1313        array( T, @Nmunicipalities@ ) municipalities; $\C{// nested VLA}$
     
    1919
    2020forall( T, [NprovTerty], [Nmunicipalities] )
    21         void ?{}( T &, CanadaPop( T, NprovTerty, Nmunicipalities ) & this ) {}
     21        void ?{}( T &, CanPop( T, NprovTerty, Nmunicipalities ) & this ) {}
    2222
    2323forall( T &, [NprovTerty], [Nmunicipalities] )
    24         void ^?{}( CanadaPop( T, NprovTerty, Nmunicipalities ) & this ) {}
     24        void ^?{}( CanPop( T, NprovTerty, Nmunicipalities ) & this ) {}
    2525
    2626
     
    3939
    4040forall( T, [NprovTerty], [Nmunicipalities] )
    41 void check( CanadaPop( T, NprovTerty, Nmunicipalities ) & pop ) with( pop ) {
     41void check( CanPop( T, NprovTerty, Nmunicipalities ) & pop ) with( pop ) {
    4242        total_pt = total_mun = 0;
    4343        for ( i; NprovTerty ) total_pt += provTerty[i];
     
    6060int main( int argc, char * argv[] ) {
    6161        const int npt = ato( argv[1] ), nmun = ato( argv[2] );
    62         @CanadaPop( int, npt, nmun ) pop;@
     62        @CanPop( int, npt, nmun ) pop;@
    6363        // read in population numbers
    6464        @check( pop );@
     
    7171Total province/territory: 36,991,981
    7272Total municipalities: 36,991,981
     73$\$$ ./a.out  13  3654
     74Total province/territory: 36,991,981
     75Total municipalities: 36,991,981
    7376*/
    7477
  • doc/theses/mike_brooks_MMath/programs/hello-array.cfa

    rb8e047a rcaa3e2c  
    99
    1010forall( [@N@] )                                                         $\C{// array dimension}$
    11 array( bool, @N@) & f( array( float, @N@ ) & x, array( float, @N@ ) & y ) {
     11array( bool, @N@ ) & f( array( float, @N@ ) & x, array( float, @N@ ) & y ) {
    1212        array( bool, @N@ ) & ret = *@alloc@();  $\C{// sizeof ret  used by alloc}$
    1313        for ( i; @N@ ) {
     
    2929
    3030int main( int argc, char * argv[] ) {
    31         const int @n@ = ato( argv[1] );                 $\C{// deduce conversion type}$
    32         array( float, @n@ ) x, y;                               $\C{// VLAs}$
    33         for ( i; n ) {                                                  $\C{// initialize arrays}$
     31        const int @dim@ = ato( argv[1] );               $\C{// deduce conversion type}$
     32        array( float, @dim@ ) x, y;                             $\C{// VLAs}$
     33        for ( i; dim ) {                                                $\C{// initialize arrays}$
    3434                x[i] = 3.14 / (i + 1);
    3535                y[i] = x[i] + 0.005 ;
    3636        }
    37         array( bool, @n@ ) & result = @f( x, y )@; $\C{// call}$
     37        array( bool, @dim@ ) & result = @f( x, y )@; $\C{// call}$
    3838        sout | "result: " | nonl;                               $\C{// print result}$
    39         for ( i; n )
     39        for ( i; dim )
    4040                sout | result[i] | nonl;
    4141        sout | nl;
  • doc/theses/mike_brooks_MMath/programs/hello-md.cfa

    rb8e047a rcaa3e2c  
    138138
    139139
    140 print1d_cstyle( m[ 2 ] );  $\C{// row 2:  2.0  2.1  2.2  2.3  2.4  2.5  2.6}$
     140print1d_cstyle( @m[ 2 ]@ );  $\C{// row 2:  2.0  2.1  2.2  2.3  2.4  2.5  2.6}$
    141141
    142142
Note: See TracChangeset for help on using the changeset viewer.