Ignore:
Timestamp:
Jul 8, 2024, 9:01:16 AM (4 months ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
e26a842
Parents:
1e110bf
Message:

proofreading array chapter

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/array.tex

    r1e110bf r5a553e2  
    22\label{c:Array}
    33
     4
    45\section{Introduction}
    56
    6 This chapter describes my contribution of language and library features that provide a length-checked array type, as in:
    7 
    8 \begin{lstlisting}
    9 array(float, 99) x;    // x contains 99 floats
    10 
    11 void f( array(float, 42) & a ) {}
    12 f(x);                  // statically rejected: types are different
     7Arrays in C are possible the single most misunderstood and incorrectly used features in the language, resulting in the largest proportion of runtime errors and security violations.
     8This chapter describes the new \CFA language and library features that introduce a length-checked array-type to the \CFA standard library~\cite{Cforall}, \eg:
     9\begin{cfa}
     10@array( float, 99 )@ x;                                 $\C{// x contains 99 floats}$
     11void f( @array( float, 42 )@ & p ) {}   $\C{// p accepts 42 floats}$
     12f( x );                                                                 $\C{// statically rejected: types are different, 99 != 42}$
    1313
    1414forall( T, [N] )
    15 void g( array(T, N) & a, int i ) {
    16         T elem = a[i];     // dynamically checked: requires 0 <= i < N
    17 }
    18 g(x, 0);               // T is float, N is 99, succeeds
    19 g(x, 1000);            // T is float, N is 99, dynamic check fails
    20 \end{lstlisting}
    21 
    22 This example first declares @x@ a variable, whose type is an instantiation of the generic type named @array@, with arguments @float@ and @99@.
    23 Next, it declares @f@ as a function that expects a length-42 array; the type system rejects the call's attempt to pass @x@ to @f@, because the lengths do not match.
    24 Next, the @forall@ annotation on function @g@ introduces @T@ as a familiar type parameter and @N@ as a \emph{dimension} parameter, a new feature that represents a count of elements, as managed by the type system.
    25 Because @g@ accepts any length of array; the type system accepts the calls' passing @x@ to @g@, inferring that this length is 99.
    26 Just as the caller's code does not need to explain that @T@ is @float@, the safe capture and communication of the value @99@ occurs without programmer involvement.
    27 In the case of the second call (which passes the value 1000 for @i@), within the body of @g@, the attempt to subscript @a@ by @i@ fails with a runtime error, since $@i@ \nless @N@$.
    28 
    29 The type @array@, as seen above, comes from my additions to the \CFA standard library.
    30 It is very similar to the built-in array type, which \CFA inherits from C.
     15void g( @array( T, N )@ & p, int i ) {
     16        T elem = p[i];                                          $\C{// dynamically checked: requires 0 <= i < N}$
     17}
     18g( x, 0 );                                                              $\C{// T is float, N is 99, dynamic subscript check succeeds}$
     19g( x, 1000 );                                                   $\C{// T is float, N is 99, dynamic subscript check fails}$
     20\end{cfa}
     21This example declares variable @x@, with generic type @array@ using arguments @float@ and @99@.
     22Function @f@ is declared with an @array@ parameter of length @42@.
     23The call @f( x )@ is invalid because the @array@ lengths @99@ and @42@ do not match.
     24Next, function @g@ introduces a @forall@ prefix on type parameter @T@ and arbitrary \emph{dimension parameter} @N@, the new feature that represents a count of elements managed by the type system.
     25The call @g( x, 0 )@ is valid because @g@ accepts any length of array, where the type system infers @float@ for @T@ and length @99@ for @N@.
     26Inferring values for @T@ and @N@ is implicit without programmer involvement.
     27Furthermore, the runtime subscript @x[0]@ (parameter @i@ is @0@) in @g@ is valid because @0@ is in the dimension range $[0,99)$ of argument @x@.
     28The call @g( x, 1000 )@ is also valid;
     29however, the runtime subscript @x[1000]@ is invalid (generates a subscript error) because @1000@ is outside the dimension range $[0,99)$ of argument @x@.
     30
     31The generic @array@ type is similar to the C array type, which \CFA inherits from C.
    3132Its runtime characteristics are often identical, and some features are available in both.
    32 
    33 \begin{lstlisting}
     33For example, assume a caller can instantiates @N@ with 42 in the following (details to follow).
     34\begin{cfa}
    3435forall( [N] )
    3536void declDemo() {
    36         float a1[N];         // built-in type ("C array")
    37         array(float, N) a2;  // type from library
    38 }
    39 \end{lstlisting}
    40 
    41 If a caller instantiates @N@ with 42, then both locally-declared array variables, @a1@ and @a2@, become arrays of 42 elements, each element being a @float@.
    42 The two variables have identical size and layout; they both encapsulate 42-float stack allocations, no heap allocations, and no further "bookkeeping" allocations/header.
    43 Having the @array@ library type (that of @a2@) is a tactical measure, an early implementation that offers full feature support.
    44 A future goal (TODO xref) is to port all of its features into the built-in array type (that of @a1@); then, the library type could be removed, and \CFA would have only one array type.
    45 In present state, the built-in array has partial support for the new features.
    46 The fully-featured library type is used exclusively in introductory examples; feature support and C compatibility are revisited in sec TODO.
    47 
    48 Offering the @array@ type, as a distinct alternative from the the C array, is consistent with \CFA's extension philosophy (TODO xref background) to date.
    49 A few compatibility-breaking changes to the behaviour of the C array were also made, both as an implementation convenience, and as justified fixes to C's lax treatment.
    50 
    51 The @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features.
    52 A clean slate was an important starting point because it meant not having to deal with every inherited complexity introduced in TODO xref background-array.
    53 
    54 
    55 My contributions are
    56 \begin{itemize}
    57 \item a type system enhancement that lets polymorphic functions and generic types be parameterized by a numeric value: @forall( [N] )@
     37        float x1[N];                            $\C{// built-in type ("C array")}$
     38        array(float, N) x2;                     $\C{// type from library}$
     39}
     40\end{cfa}
     41Both of the locally-declared array variables, @x1@ and @x2@, have 42 elements, each element being a @float@.
     42The two variables have identical size and layout; they both encapsulate 42-float, stack \vs heap allocations with no additional ``bookkeeping'' allocations or headers.
     43Providing this explicit generic approach required a significant extension to the \CFA type system to support a full-feature, safe, efficient (space and time) array-type, which forms the foundation for more complex array forms in \CFA.
     44
     45Admittedly, the @array@ library type (type for @x2@) is syntactically different from its C counterpart.
     46A future goal (TODO xref) is to provide a built-in array type with syntax approaching C's (type for @x1@);
     47then, the library @array@ type can be removed giving \CFA a largely uniform array type.
     48At present, the built-in array is only partially supported, so the generic @array@ is used exclusively in the discussion;
     49feature support and C compatibility are revisited in Section ? TODO.
     50
     51Offering an @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (gcc) programs can be compiled by \CFA with only a small number of changes, similar to \CC (g++).
     52However, a few compatibility-breaking changes to the behaviour of the C array are necessary, both as an implementation convenience and to fix C's lax treatment of arrays.
     53Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity introduced by the C array TODO xref.
     54
     55My contributions are:
     56\begin{enumerate}
     57\item A type system enhancement that lets polymorphic functions and generic types be parameterized by a numeric value: @forall( [N] )@.
     58\item Provide a length-checked array-type in the \CFA standard library, where the array's length is statically managed and dynamically valued.
     59\item Provide argument/parameter passing safety for arrays and subscript safety.
    5860\item TODO: general parking...
    59 \item identify specific abilities brought by @array@
    60 \item Where there is a gap concerning this feature's readiness for prime-time, identification of specific workable improvements that are likely to close the gap
    61 \end{itemize}
     61\item Identify the interesting specific abilities available by the new @array@ type.
     62\item Where there is a gap concerning this feature's readiness for prime-time, identification of specific workable improvements that are likely to close the gap.
     63\end{enumerate}
    6264
    6365
     
    6870
    6971
    70 \section{Features Added}
    71 
    72 The present work adds a type @array@ to the \CFA standard library~\cite{Cforall}.
    73 
    74 This array's length is statically managed and dynamically valued.
    75 This static management achieves argument safety and suggests a path to subscript safety as future work (TODO: cross reference).
    76 
    77 This section presents motivating examples of the new array type's usage and follows up with definitions of the notations that appear.
    78 
    79 The core of the new array management is tracking all array lengths in the type system.
    80 Dynamically valued lengths are represented using type variables.
    81 The stratification of type variables preceding object declarations makes a length referenceable everywhere that it is needed.
     72\section{Features added}
     73
     74This section presents motivating examples for the new array type, demonstrating the syntax and semantics of the generic @array@.
     75As stated, the core capability of the new array is tracking all dimensions in the type system, where dynamic dimensions are represented using type variables.
     76
     77The definition of type variables preceding object declarations makes the array dimension lexically referenceable where it is needed.
    8278For example, a declaration can share one length, @N@, among a pair of parameters and the return.
    8379\lstinput{10-17}{hello-array.cfa}
    8480Here, the function @f@ does a pointwise comparison, checking if each pair of numbers is within half a percent of each other, returning the answers in a newly allocated @bool@ array.
    85 
    86 The array type uses the parameterized length information in its @sizeof@ determination, illustrated in the example's call to @alloc@.
    87 That call requests an allocation of type @array(bool, N)@, which the type system deduces from the left-hand side of the initialization, into the return type of the @alloc@ call.
    88 Preexisting \CFA behaviour is leveraged here, both in the return-type-only polymorphism, and the @sized(T)@-aware standard-library @alloc@ routine.
    89 The new @array@ type plugs into this behaviour by implementing the @sized@/@sizeof@ assertion to have the intuitive meaning.
    90 As a result, this design avoids an opportunity for programmer error by making the size/length communication to a called routine implicit, compared with C's @calloc@ (or the low-level \CFA analog @aalloc@), which take an explicit length parameter not managed by the type system.
    91 
    92 \VRef[Figure]{f:fHarness} shows the harness to use the @f@ function illustrating how dynamic values are fed into the system.
    93 Here, the @a@ array is loaded with decreasing values, and the @b@ array with amounts off by a constant, giving relative differences within tolerance at first and out of tolerance later.
    94 The program main is run with two different inputs of sequence length.
     81The dynamic allocation of the @ret@ array by @alloc@ uses the parameterized dimension information in its implicit @_Alignof@ and @sizeof@ determinations, and casting the return type.
     82\begin{cfa}
     83static inline forall( T & | sized(T) )
     84T * alloc( size_t dim ) {
     85        if ( _Alignof(T) <= libAlign() ) return (T *)aalloc( dim, sizeof(T) ); // calloc without zero fill
     86        else return (T *)amemalign( _Alignof(T), dim, sizeof(T) ); // array memalign
     87}
     88\end{cfa}
     89Here, the type system deduces from the left-hand side of the assignment the type @array(bool, N)@, and forwards it as the type variable @T@ for the generic @alloc@ function, making it available in its body.
     90Hence, preexisting \CFA behaviour is leveraged here, both in the return-type polymorphism, and the @sized(T)@-aware standard-library @alloc@ routine.
     91This example illustrates how the new @array@ type plugs into existing \CFA behaviour by implementing necessary @sized@ assertions needed by other types.
     92(@sized@ implies a concrete \vs abstract type with a compile-time size.)
     93As a result, there is significant programming safety by making the size accessible and implicit, compared with C's @calloc@ and non-array supporting @memalign@, which take an explicit length parameter not managed by the type system.
    9594
    9695\begin{figure}
    97 \lstinput{30-49}{hello-array.cfa}
     96\lstinput{30-43}{hello-array.cfa}
     97\lstinput{45-48}{hello-array.cfa}
    9898\caption{\lstinline{f} Harness}
    9999\label{f:fHarness}
    100100\end{figure}
    101101
    102 The loops in the program main follow the more familiar pattern of using the ordinary variable @n@ to convey the length.
    103 The type system implicitly captures this value at the call site (@main@ calling @f@) and makes it available within the callee (@f@'s loop bound).
    104 
    105 The two parts of the example show @n@ adapting a variable into a type-system managed length (at @main@'s declarations of @a@, @b@, and @result@), @N@ adapting in the opposite direction (at @f@'s loop bound), and a pass-thru use of a managed length (at @f@'s declaration of @ret@).
    106 
    107 The @forall( ...[N] )@ participates in the user-relevant declaration of the name @N@, which becomes usable in parameter/return declarations and in the function @b@.
    108 The present form is chosen to parallel the existing @forall@ forms:
    109 \begin{cfa}
    110 forall( @[N]@ ) ... // array kind
    111 forall( & T  ) ...  // reference kind (dtype)
    112 forall( T  ) ...    // value kind (otype)
    113 \end{cfa}
    114 
    115 The notation @array(thing, N)@ is a single-dimensional case, giving a generic type instance.
     102\VRef[Figure]{f:fHarness} shows a harness that uses the @f@ function illustrating how dynamic values are fed into the @array@ type.
     103Here, the dimension of the @x@, @y@, and @result@ arrays is specified from a command-line value and these arrays are allocated on the stack.
     104Then the @x@ array is initialized with decreasing values, and the @y@ array with amounts offset by constant @0.005@, giving relative differences within tolerance initially and diverging for later values.
     105The program main is run (see figure bottom) with inputs @5@ and @7@ for sequence lengths.
     106The loops follow the familiar pattern of using the variable @n@ to iterate through the arrays.
     107Most importantly, the type system implicitly captures @n@ at the call of @f@ and makes it available throughout @f@ as @N@.
     108The example shows @n@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@, @N@ adapting in the same way at @f@'s loop bound, and a pass-thru use of @n@ at @f@'s declaration of @ret@.
     109Except for the lifetime-management issue of @result@, \ie explicit @free@, this program has eliminated both the syntactic and semantic problems associated with C arrays and their usage.
     110These benefits cannot be underestimated.
     111
     112In general, the @forall( ..., [N] )@ participates in the user-relevant declaration of the name @N@, which becomes usable in parameter/return declarations and within a function.
     113The syntactic form is chosen to parallel other @forall@ forms:
     114\begin{cfa}
     115forall( @[N]@ ) ...     $\C[1.5in]{// array kind}$
     116forall( T & ) ...       $\C{// reference kind (dtype)}$
     117forall( T ) ...         $\C{// value kind (otype)}\CRT$
     118\end{cfa}
     119% The notation @array(thing, N)@ is a single-dimensional case, giving a generic type instance.
    116120In summary:
    117121\begin{itemize}
    118122\item
    119 @[N]@ -- within a forall, declares the type variable @N@ to be a managed length
    120 \item
    121 $e$ -- a type representing the value of $e$ as a managed length, where $e$ is a @size_t@-typed expression
    122 \item
    123 N -- an expression of type @size_t@, whose value is the managed length @N@
    124 \item
    125 @array( thing, N0, N1, ... )@ -- a type wrapping $\prod_i N_i$ adjacent occurrences of @thing@ objects
     123@[N]@ within a forall declares the type variable @N@ to be a managed length.
     124\item
     125The type of @N@ within code is @size_t@.
     126\item
     127The value of @N@ within code is the acquired length derived from the usage site, \ie generic declaration or function call.
     128\item
     129@array( thing, N0, N1, ... )@ is a multi-dimensional type wrapping $\prod_i N_i$ adjacent occurrences of @thing@ objects.
    126130\end{itemize}
    127 Unsigned integers have a special status in this type system.
    128 Unlike how C++ allows
     131
     132\VRef[Figure]{f:TemplateVsGenericType} shows @N@ is not the same as a @size_t@ declaration in a \CC \lstinline[language=C++]{template}.
     133\begin{enumerate}[leftmargin=*]
     134\item
     135The \CC template @N@ is a compile-time value, while the \CFA @N@ is a runtime value.
     136\item
     137The \CC template @N@ must be passed explicitly at the call, unless @N@ has a default value, even when \CC can deduct the type of @T@.
     138The \CFA @N@ is part of the array type and passed implicitly at the call.
     139\item
     140\CC cannot have an array of references, but can have an array of pointers.
     141\CC has a (mistaken) belief that references are not objects, but pointers are objects.
     142In the \CC example, the arrays fall back on C arrays, which have a duality with references with respect to automatic dereferencing.
     143The \CFA array is a contiguous object with an address, which can stored as a reference or pointer.
     144\item
     145C/\CC arrays cannot be copied, while \CFA arrays can be copied, making them a first-class object (although array copy is often avoided for efficiency).
     146\end{enumerate}
     147
     148\begin{figure}
     149\begin{tabular}{@{}l@{\hspace{20pt}}l@{}}
    129150\begin{c++}
    130 template< size_t N, char * msg, typename T >... // declarations
     151
     152@template< typename T, size_t N >@
     153void copy( T ret[N], T x[N] ) {
     154        for ( int i = 0; i < N; i += 1 ) ret[i] = x[i];
     155}
     156int main() {
     157        int ret[10], x[10];
     158        for ( int i = 0; i < 10; i += 1 ) x[i] = i;
     159        @copy<int, 10 >( ret, x );@
     160        for ( int i = 0; i < 10; i += 1 )
     161                cout << ret[i] << ' ';
     162        cout << endl;
     163}
    131164\end{c++}
    132 \CFA does not accommodate values of any user-provided type.
    133 TODO: discuss connection with dependent types.
    134 An example of a type error demonstrates argument safety.
    135 The running example has @f@ expecting two arrays of the same length.
    136 A compile-time error occurs when attempting to call @f@ with arrays whose lengths may differ.
    137 \begin{cfa}
    138 forall( [M], [N] )
    139 void bad( array(float, M) &a, array(float, N) &b ) {
    140         f( a, a ); // ok
    141         f( b, b ); // ok
    142         f( a, b ); // error
    143 }
    144 \end{cfa}
    145 %\lstinput{60-65}{hello-array.cfa}
    146 As is common practice in C, the programmer is free to cast, to assert knowledge not shared with the type system.
    147 \begin{cfa}
    148 forall( [M], [N] )
    149 void bad_fixed( array(float, M) & a, array(float, N) & b ) {
    150         if ( M == N ) {
    151             f( a, (array(float, M) &)b ); // cast b to matching type
     165&
     166\begin{cfa}
     167int main() {
     168        @forall( T, [N] )@   // nested function
     169        void copy( array( T, N ) & ret, array( T, N ) & x ) {
     170                for ( i; 10 ) ret[i] = x[i];
    152171        }
    153 }
    154 \end{cfa}
    155 %\lstinput{70-75}{hello-array.cfa}
    156 
    157 Argument safety and the associated implicit communication of array length work with \CFA's generic types too.
    158 \CFA allows aggregate types to be generalized with multiple type parameters, including parameterized element type, so can it be defined over a parameterized length.
    159 Doing so gives a refinement of C's ``flexible array member'' pattern, that allows nesting structures with array members anywhere within other structures.
    160 \lstinput{10-16}{hello-accordion.cfa}
    161 This structure's layout has the starting offset of @cost_contribs@ varying in @Nclients@, and the offset of @total_cost@ varying in both generic parameters.
    162 For a function that operates on a @request@ structure, the type system handles this variation transparently.
    163 \lstinput{40-47}{hello-accordion.cfa}
    164 In the example, different runs of the program result in different offset values being used.
    165 \lstinput{60-76}{hello-accordion.cfa}
     172
     173        array( int, 10 ) ret, x;
     174        for ( i; 10 ) x[i] = i;
     175        @copy( ret,  x );@
     176        for ( i; 10 )
     177                sout | ret[i] | nonl;
     178        sout | nl;
     179}
     180\end{cfa}
     181\end{tabular}
     182\caption{\CC \lstinline[language=C++]{template} \vs \CFA generic type }
     183\label{f:TemplateVsGenericType}
     184\end{figure}
     185
     186Continuing the discussion of \VRef[Figure]{f:fHarness}, the example has @f@ expecting two arrays of the same length.
     187A compile-time error occurs when attempting to call @f@ with arrays of differing lengths.
     188% removing leading whitespace
     189\lstinput[tabsize=1]{52-53}{hello-array.cfa}
     190\lstinput[tabsize=1,aboveskip=0pt]{62-64}{hello-array.cfa}
     191As is common practice in C, the programmer is free to cast, \ie to assert knowledge not shared with the type system.
     192\lstinput{70-74}{hello-array.cfa}
     193
     194Orthogonally, the new @array@ type works with \CFA's generic types, providing argument safety and the associated implicit communication of array length.
     195Specifically, \CFA allows aggregate types to be generalized with multiple type parameters, including parameterized element types and lengths.
     196Doing so gives a refinement of C's ``flexible array member'' pattern, allowing nesting structures with array members anywhere within other structures.
     197\lstinput{10-15}{hello-accordion.cfa}
     198This structure's layout has the starting offset of @municipalities@ varying in @NprovTerty@, and the offset of @total_pt@ and @total_mun@ varying in both generic parameters.
     199For a function that operates on a @CanadaPop@ structure, the type system handles this variation transparently.
     200\lstinput{40-45}{hello-accordion.cfa}
     201\VRef[Figure]{f:checkHarness} shows program results where different offset values being used.
    166202The output values show that @summarize@ and its caller agree on both the offsets (where the callee starts reading @cost_contribs@ and where the callee writes @total_cost@).
    167 Yet the call site still says just, ``pass the request.''
    168 
    169 
    170 \section{Multidimensional implementation}
     203Yet the call site just says, ``pass the request.''
     204
     205\begin{figure}
     206\lstinput{60-68}{hello-accordion.cfa}
     207\lstinput{70-72}{hello-accordion.cfa}
     208\caption{\lstinline{check} Harness}
     209\label{f:checkHarness}
     210\end{figure}
     211
     212
     213\section{Multidimensional Arrays}
    171214\label{toc:mdimpl}
    172215
    173 TODO: introduce multidimensional array feature and approaches
    174 
    175 The new \CFA standard library @array@ datatype supports multidimensional uses more richly than the C array.
    176 The new array's multidimensional interface and implementation, follows an array-of-arrays setup, meaning, like C's @float[n][m]@ type, one contiguous object, with coarsely-strided dimensions directly wrapping finely-strided dimensions.
    177 This setup is in contrast with the pattern of array of pointers to other allocations representing a sub-array.
    178 Beyond what C's type offers, the new array brings direct support for working with a noncontiguous array slice, allowing a program to work with dimension subscripts given in a non-physical order.
    179 C and C++ require a programmer with such a need to manage pointer/offset arithmetic manually.
    180 
    181 Examples are shown using a $5 \times 7$ float array, @a@, loaded with increments of $0.1$ when stepping across the length-7 finely-strided dimension shown on columns, and with increments of $1.0$ when stepping across the length-5 coarsely-strided dimension shown on rows.
    182 %\lstinput{120-126}{hello-md.cfa}
    183 The memory layout of @a@ has strictly increasing numbers along its 35 contiguous positions.
     216% TODO: introduce multidimensional array feature and approaches
     217
     218When working with arrays, \eg linear algebra, array dimensions are referred to as ``rows'' and ``columns'' for a matrix, adding ``planes'' for a cube.
     219(There is little terminology for higher dimensional arrays.)
     220For example, an acrostic poem\footnote{A type of poetry where the first, last or other letters in a line spell out a particular word or phrase in a vertical column.}
     221can be treated as a grid of characters, where the rows are the text and the columns are the embedded keyword(s).
     222Within a poem, there is the concept of a \newterm{slice}, \eg a row is a slice for the poem text, a column is a slice for a keyword.
     223In general, the dimensioning and subscripting for multidimensional arrays has two syntactic forms: @m[r,c]@ or @m[r][c]@.
     224
     225Commonly, an array, matrix, or cube, is visualized (especially in mathematics) as a contiguous row, rectangle, or block.
     226This conceptualization is reenforced by subscript ordering, \eg $m_{r,c}$ for a matrix and $c_{p,r,c}$ for a cube.
     227Few programming languages differ from the mathematical subscript ordering.
     228However, computer memory is flat, and hence, array forms are structured in memory as appropriate for the runtime system.
     229The closest representation to the conceptual visualization is for an array object to be contiguous, and the language structures this memory using pointer arithmetic to access the values using various subscripts.
     230This approach still has degrees of layout freedom, such as row or column major order, \ie juxtaposed rows or columns in memory, even when the subscript order remains fixed.
     231For example, programming languages like MATLAB, Fortran, Julia and R store matrices in column-major order since they are commonly used for processing column-vectors in tabular data sets but retain row-major subscripting.
     232In general, storage layout is hidden by subscripting, and only appears when passing arrays among different programming languages or accessing specific hardware.
     233
     234\VRef[Figure]{f:FixedVariable} shows two C90 approaches for manipulating contiguous arrays.
     235Note, C90 does not support VLAs.
     236The fixed-dimension approach uses the type system;
     237however, it requires all dimensions except the first to be specified at compile time, \eg @m[][6]@, allowing all subscripting stride calculations to be generated with constants.
     238Hence, every matrix passed to @fp1@ must have exactly 6 columns but the row size can vary.
     239The variable-dimension approach ignores (violates) the type system, \ie argument and parameters types do not match, and manually performs pointer arithmetic for subscripting in the macro @sub@.
     240
     241\begin{figure}
     242\begin{tabular}{@{}l@{\hspace{40pt}}l@{}}
     243\multicolumn{1}{c}{\textbf{Fixed Dimension}} & \multicolumn{1}{c}{\textbf{Variable Dimension}} \\
     244\begin{cfa}
     245
     246void fp1( int rows, int m[][@6@] ) {
     247        ...  printf( "%d ", @m[r][c]@ );  ...
     248}
     249int fm1[4][@6@], fm2[6][@6@]; // no VLA
     250// initialize matrixes
     251fp1( 4, fm1 ); // implicit 6 columns
     252fp1( 6, fm2 );
     253\end{cfa}
     254&
     255\begin{cfa}
     256#define sub( m, r, c ) *(m + r * sizeof( m[0] ) + c)
     257void fp2( int rows, int cols, int *m ) {
     258        ...  printf( "%d ", @sub( m, r, c )@ );  ...
     259}
     260int vm1[4][4], vm2[6][8]; // no VLA
     261// initialize matrixes
     262fp2( 4, 4, vm1 );
     263fp2( 6, 8, vm2 );
     264\end{cfa}
     265\end{tabular}
     266\caption{C90 Fixed \vs Variable Contiguous Matrix Styles}
     267\label{f:FixedVariable}
     268\end{figure}
     269
     270Many languages allow multidimensional arrays-of-arrays, \eg in Pascal or \CC.
     271\begin{cquote}
     272\begin{tabular}{@{}ll@{}}
     273\begin{pascal}
     274var m : array[0..4, 0..4] of Integer;  (* matrix *)
     275type AT = array[0..4] of Integer;  (* array type *)
     276type MT = array[0..4] of AT;  (* array of array type *)
     277var aa : MT;  (* array of array variable *)
     278m@[1][2]@ := 1;   aa@[1][2]@ := 1 (* same subscripting *)
     279\end{pascal}
     280&
     281\begin{c++}
     282int m[5][5];
     283
     284typedef vector< vector<int> > MT;
     285MT vm( 5, vector<int>( 5 ) );
     286m@[1][2]@ = 1;  aa@[1][2]@ = 1;
     287\end{c++}
     288\end{tabular}
     289\end{cquote}
     290The language decides if the matrix and array-of-array are laid out the same or differently.
     291For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (triangular matrix).
     292Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though the two array forms could be differentiated at the subscript level, \eg @m[1,2]@ \vs @aa[1][2]@.
     293Nevertheless, controlling memory layout can make a difference in what operations are allowed and in performance (caching/NUMA effects).
     294
     295C also provides non-contiguous arrays-of-arrays.
     296\begin{cfa}
     297int m[5][5];                                                    $\C{// contiguous}$
     298int * aa[5];                                                    $\C{// non-contiguous}$
     299\end{cfa}
     300both with different memory layout using the same subscripting, and both with different degrees of issues.
     301The focus of this work is on the contiguous multidimensional arrays in C.
     302The reason is that programmers are often forced to use the more complex array-of-array form when a contiguous array would be simpler, faster, and safer.
     303Nevertheless, the C array-of-array form continues to be useful for special circumstances.
     304
     305\VRef[Figure]{f:ContiguousNon-contiguous} shows the extensions made in C99 for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays.}
     306First, VLAs are supported.
     307Second, for contiguous arrays, C99 conjoins one or more of the parameters as a downstream dimension(s), \eg @cols@, implicitly using this parameter to compute the row stride of @m@.
     308If the declaration of @fc@ is changed to:
     309\begin{cfa}
     310void fc( int rows, int cols, int m[@rows@][@cols@] ) ...
     311\end{cfa}
     312it is possible for C to perform bound checking across all subscripting, but it does not.
     313While this contiguous-array capability is a step forward, it is still the programmer's responsibility to manually manage the number of dimensions and their sizes, both at the function definition and call sites.
     314That is, the array does not automatically carry its structure and sizes for use in computing subscripts.
     315While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row correctly.
     316Specifically, there is no requirement that the rows are the same length, like a poem with different length lines.
     317
     318\begin{figure}
     319\begin{tabular}{@{}ll@{}}
     320\multicolumn{1}{c}{\textbf{Contiguous}} & \multicolumn{1}{c}{\textbf{ Non-contiguous}} \\
     321\begin{cfa}
     322void fc( int rows, @int cols@, int m[ /* rows */ ][@cols@] ) {
     323        ...  printf( "%d ", @m[r][c]@ );  ...
     324}
     325int m@[5][5]@;
     326for ( int r = 0; r < 5; r += 1 ) {
     327
     328        for ( int c = 0; c < 5; c += 1 )
     329                m[r][c] = r + c;
     330}
     331fc( 5, 5, m );
     332\end{cfa}
     333&
     334\begin{cfa}
     335void faa( int rows, int cols, int * m[ @/* cols */@ ] ) {
     336        ...  printf( "%d ", @m[r][c]@ );  ...
     337}
     338int @* aa[5]@;  // row pointers
     339for ( int r = 0; r < 5; r += 1 ) {
     340        @aa[r] = malloc( 5 * sizeof(int) );@ // create rows
     341        for ( int c = 0; c < 5; c += 1 )
     342                aa[r][c] = r + c;
     343}
     344faa( 5, 5, aa );
     345\end{cfa}
     346\end{tabular}
     347\caption{C99 Contiguous \vs Non-contiguous Matrix Styles}
     348\label{f:ContiguousNon-contiguous}
     349\end{figure}
     350
     351
     352\subsection{Multidimensional array implementation}
     353
     354A multidimensional array implementation has three relevant levels of abstraction, from highest to lowest, where the array occupies \emph{contiguous memory}.
     355\begin{enumerate}
     356\item
     357Flexible-stride memory:
     358this model has complete independence between subscripting ordering and memory layout, offering the ability to slice by (provide an index for) any dimension, \eg slice a plane, row, or column, \eg @c[3][*][*]@, @c[3][4][*]@, @c[3][*][5]@.
     359\item
     360Fixed-stride memory:
     361this model binds the first subscript and the first memory layout dimension, offering the ability to slice by (provide an index for) only the coarsest dimension, @m[row][*]@ or @c[plane][*][*]@, \eg slice only by row (2D) or plane (3D).
     362After which, subscripting and memory layout are independent.
     363\item
     364Explicit-displacement memory:
     365this model has no awareness of dimensions just the ability to access memory at a distance from a reference point (base-displacement addressing), \eg @x + 23@ or @x[23}@ $\Rightarrow$ 23rd element from the start of @x@.
     366A programmer must manually build any notion of dimensions using other tools;
     367hence, this style is not offering multidimensional arrays \see{\VRef[Figure]{f:FixedVariable}}.
     368\end{enumerate}
     369
     370There is some debate as to whether the abstraction ordering goes $\{1, 2\} < 3$, rather than my numerically-ordering.
     371That is, styles 1 and 2 are at the same abstraction level, with 3 offering a limited set of functionality.
     372I chose to build the \CFA style-1 array upon a style-2 abstraction.
     373(Justification of the decision follows, after the description of the design.)
     374
     375Style 3 is the inevitable target of any array implementation.
     376The hardware offers this model to the C compiler, with bytes as the unit of displacement.
     377C offers this model to its programmer as pointer arithmetic, with arbitrary sizes as the unit.
     378Casting a multidimensional array as a single-dimensional array/pointer, then using @x[i]@ syntax to access its elements, is still a form of pointer arithmetic.
     379
     380Now stepping into the implementation of \CFA's new type-1 multidimensional arrays in terms of C's existing type-2 multidimensional arrays, it helps to clarify that even the interface is quite low-level.
     381A C/\CFA array interface includes the resulting memory layout.
     382The defining requirement of a type-2 system is the ability to slice a column from a column-finest matrix.
     383The required memory shape of such a slice is set, before any discussion of implementation.
     384The implementation presented here is how the \CFA array library wrangles the C type system, to make it do memory steps that are consistent with this layout.
     385TODO: do I have/need a presentation of just this layout, just the semantics of -[all]?
     386
     387The new \CFA standard library @array@ datatype supports richer multidimensional features than C.
     388The new array implementation follows C's contiguous approach, \ie @float [r][c]@, with one contiguous object subscripted by coarsely-strided dimensions directly wrapping finely-strided dimensions.
     389Beyond what C's array type offers, the new array brings direct support for working with a noncontiguous array slice, allowing a program to work with dimension subscripts given in a non-physical order.
     390
     391The following examples use an @array( float, 5, 7) m@, loaded with values incremented by $0.1$, when stepping across the length-7 finely-strided column dimension, and stepping across the length-5 coarsely-strided row dimension.
     392\par\noindent
     393\mbox{\lstinput{121-126}{hello-md.cfa}}
     394\par\noindent
     395The memory layout is 35 contiguous elements with strictly increasing addresses.
    184396
    185397A trivial form of slicing extracts a contiguous inner array, within an array-of-arrays.
    186 Like with the C array, a lesser-dimensional array reference can be bound to the result of subscripting a greater-dimensional array, by a prefix of its dimensions.
    187 This action first subscripts away the most coarsely strided dimensions, leaving a result that expects to be be subscripted by the more finely strided dimensions.
    188 \lstinput{60-66}{hello-md.cfa}
     398As for the C array, a lesser-dimensional array reference can be bound to the result of subscripting a greater-dimensional array by a prefix of its dimensions, \eg @m[2]@, giving the third row.
     399This action first subscripts away the most coarsely strided dimensions, leaving a result that expects to be subscripted by the more finely strided dimensions, \eg @m[2][3]@, giving the value @2.3@.
     400The following is an example slicing a row.
     401\lstinput{60-64}{hello-md.cfa}
    189402\lstinput[aboveskip=0pt]{140-140}{hello-md.cfa}
    190403
    191 This function declaration is asserting too much knowledge about its parameter @c@, for it to be usable for printing either a row slice or a column slice.
    192 Specifically, declaring the parameter @c@ with type @array@ means that @c@ is contiguous.
    193 However, the function does not use this fact.
    194 For the function to do its job, @c@ need only be of a container type that offers a subscript operator (of type @ptrdiff_t@ $\rightarrow$ @float@), with managed length @N@.
     404However, function @print1d@ is asserting too much knowledge about its parameter @r@ for printing either a row slice or a column slice.
     405Specifically, declaring the parameter @r@ with type @array@ means that @r@ is contiguous, which is unnecessarily restrictive.
     406That is, @r@ need only be of a container type that offers a subscript operator (of type @ptrdiff_t@ $\rightarrow$ @float@) with managed length @N@.
    195407The new-array library provides the trait @ix@, so-defined.
    196 With it, the original declaration can be generalized, while still implemented with the same body, to the latter declaration:
    197 \lstinput{40-44}{hello-md.cfa}
     408With it, the original declaration can be generalized with the same body.
     409\lstinput{43-44}{hello-md.cfa}
    198410\lstinput[aboveskip=0pt]{145-145}{hello-md.cfa}
    199 
    200 Nontrivial slicing, in this example, means passing a noncontiguous slice to @print1d@.
    201 The new-array library provides a ``subscript by all'' operation for this purpose.
    202 In a multi-dimensional subscript operation, any dimension given as @all@ is left ``not yet subscripted by a value,'' implementing the @ix@ trait, waiting for such a value.
     411The nontrivial slicing in this example now allows passing a \emph{noncontiguous} slice to @print1d@, where the new-array library provides a ``subscript by all'' operation for this purpose.
     412In a multi-dimensional subscript operation, any dimension given as @all@ is a placeholder, \ie ``not yet subscripted by a value'', waiting for such a value, implementing the @ix@ trait.
    203413\lstinput{150-151}{hello-md.cfa}
    204414
    205 The example has shown that @a[2]@ and @a[[2, all]]@ both refer to the same, ``2.*'' slice.
    206 Indeed, the various @print1d@ calls under discussion access the entry with value 2.3 as @a[2][3]@, @a[[2,all]][3]@, and @a[[all,3]][2]@.
    207 This design preserves (and extends) C array semantics by defining @a[[i,j]]@ to be @a[i][j]@ for numeric subscripts, but also for ``subscripting by all''.
     415The example shows @x[2]@ and @x[[2, all]]@ both refer to the same, ``2.*'' slice.
     416Indeed, the various @print1d@ calls under discussion access the entry with value @2.3@ as @x[2][3]@, @x[[2,all]][3]@, and @x[[all,3]][2]@.
     417This design preserves (and extends) C array semantics by defining @x[[i,j]]@ to be @x[i][j]@ for numeric subscripts, but also for ``subscripting by all''.
    208418That is:
    209 
    210 \begin{tabular}{cccccl}
    211 @a[[2,all]][3]@  &  $=$  &  @a[2][all][3]@  & $=$  &  @a[2][3]@  & (here, @all@ is redundant)  \\
    212 @a[[all,3]][2]@  &  $=$  &  @a[all][3][2]@  & $=$  &  @a[2][3]@  & (here, @all@ is effective)
     419\begin{cquote}
     420\begin{tabular}{@{}cccccl@{}}
     421@x[[2,all]][3]@ & $\equiv$      & @x[2][all][3]@  & $\equiv$    & @x[2][3]@  & (here, @all@ is redundant)  \\
     422@x[[all,3]][2]@ & $\equiv$      & @x[all][3][2]@  & $\equiv$    & @x[2][3]@  & (here, @all@ is effective)
    213423\end{tabular}
    214 
    215 Narrating progress through each of the @-[-][-][-]@ expressions gives, firstly, a definition of @-[all]@, and secondly, a generalization of C's @-[i]@.
    216 
    217 \noindent Where @all@ is redundant:
    218 
    219 \begin{tabular}{ll}
    220 @a@  & 2-dimensional, want subscripts for coarse then fine \\
    221 @a[2]@  & 1-dimensional, want subscript for fine; lock coarse = 2 \\
    222 @a[2][all]@  & 1-dimensional, want subscript for fine \\
    223 @a[2][all][3]@  & 0-dimensional; lock fine = 3
     424\end{cquote}
     425
     426Narrating progress through each of the @-[-][-][-]@\footnote{
     427The first ``\lstinline{-}'' is a variable expression and the remaining ``\lstinline{-}'' are subscript expressions.}
     428expressions gives, firstly, a definition of @-[all]@, and secondly, a generalization of C's @-[i]@.
     429Where @all@ is redundant:
     430\begin{cquote}
     431\begin{tabular}{@{}ll@{}}
     432@x@  & 2-dimensional, want subscripts for coarse then fine \\
     433@x[2]@  & 1-dimensional, want subscript for fine; lock coarse == 2 \\
     434@x[2][all]@  & 1-dimensional, want subscript for fine \\
     435@x[2][all][3]@  & 0-dimensional; lock fine == 3
    224436\end{tabular}
    225 
    226 \noindent Where @all@ is effective:
    227 
    228 \begin{tabular}{ll}
    229 @a@  & 2-dimensional, want subscripts for coarse then fine \\
    230 @a[all]@  & 2-dimensional, want subscripts for fine then coarse \\
    231 @a[all][3]@  & 1-dimensional, want subscript for coarse; lock fine = 3 \\
    232 @a[all][3][2]@  & 0-dimensional; lock coarse = 2
     437\end{cquote}
     438Where @all@ is effective:
     439\begin{cquote}
     440\begin{tabular}{@{}ll@{}}
     441@x@  & 2-dimensional, want subscripts for coarse then fine \\
     442@x[all]@  & 2-dimensional, want subscripts for fine then coarse \\
     443@x[all][3]@  & 1-dimensional, want subscript for coarse; lock fine == 3 \\
     444@x[all][3][2]@  & 0-dimensional; lock coarse == 2
    233445\end{tabular}
    234 
     446\end{cquote}
    235447The semantics of @-[all]@ is to dequeue from the front of the ``want subscripts'' list and re-enqueue at its back.
     448For example, in a two dimensional matrix, this semantics conceptually transposes the matrix by reversing the subscripts.
    236449The semantics of @-[i]@ is to dequeue from the front of the ``want subscripts'' list and lock its value to be @i@.
    237450
    238 Contiguous arrays, and slices of them, are all realized by the same underlying parameterized type.
    239 It includes stride information in its metatdata.
    240 The @-[all]@ operation is a conversion from a reference to one instantiation, to a reference to another instantiation.
     451Contiguous arrays, and slices of them, are all represented by the same underlying parameterized type, which includes stride information in its metatdata.
     452\PAB{Do not understand this sentence: The \lstinline{-[all]} operation is a conversion from a reference to one instantiation to a reference to another instantiation.}
    241453The running example's @all@-effective step, stated more concretely, is:
    242 
    243 \begin{tabular}{ll}
    244 @a@       & : 5 of ( 7 of float each spaced 1 float apart ) each spaced 7 floats apart \\
    245 @a[all]@  & : 7 of ( 5 of float each spaced 7 floats apart ) each spaced 1 float apart
     454\begin{cquote}
     455\begin{tabular}{@{}ll@{}}
     456@x@       & : 5 of ( 7 of @float@ each spaced 1 @float@ apart ) each spaced 7 @floats@ apart \\
     457@x[all]@  & : 7 of ( 5 of @float@ each spaced 7 @float@s apart ) each spaced 1 @float@ apart
    246458\end{tabular}
     459\end{cquote}
    247460
    248461\begin{figure}
    249462\includegraphics{measuring-like-layout}
    250 \caption{Visualization of subscripting, by numeric value, and by \lstinline[language=CFA]{all}.
    251         Here \lstinline[language=CFA]{x} has type \lstinline[language=CFA]{array( float, 5, 7 )}, understood as 5 rows by 7 columns.
    252         The horizontal layout represents contiguous memory addresses while the vertical layout uses artistic license.
    253         The vertical shaded band highlights the location of the targeted element, 2.3.
    254         Any such vertical contains various interpretations of a single address.}
     463\caption{Visualization of subscripting by value and by \lstinline[language=CFA]{all}, for \lstinline{x} of type \lstinline{array( float, 5, 7 )} understood as 5 rows by 7 columns.
     464The horizontal layout represents contiguous memory addresses while the vertical layout is conceptual.
     465The vertical shaded band highlights the location of the targeted element, 2.3.
     466Any such vertical slice contains various interpretations of a single address.}
    255467\label{fig:subscr-all}
    256468\end{figure}
    257 
    258 \noindent BEGIN: Paste looking for a home
    259 
    260 The world of multidimensional array implementation has, or abuts, four relevant levels of abstraction, highest to lowest:
    261 
    262 1, purpose:
    263 If you are doing linear algebra, you might call its dimensions, "column" and "row."
    264 If you are treating an acrostic poem as a grid of characters, you might say,
    265 the direction of reading the phrases vs the direction of reading the keyword.
    266 
    267 2, flexible-stride memory:
    268 assuming, from here on, a need to see/use contiguous memory,
    269 this model offers the ability to slice by (provide an index for) any dimension
    270 
    271 3, fixed-stride memory:
    272 this model offers the ability to slice by (provide an index for) only the coarsest dimension
    273 
    274 4, explicit-displacement memory:
    275 no awareness of dimensions, so no distinguishing them; just the ability to access memory at a distance from a reference point
    276 
    277 C offers style-3 arrays.  Fortran, Matlab and APL offer style-2 arrays.
    278 Offering style-2 implies offering style-3 as a sub-case.
    279 My CFA arrays are style-2.
    280 
    281 Some debate is reasonable as to whether the abstraction actually goes $ 1 < \{2, 3\} < 4 $,
    282 rather than my numerically-ordered chain.
    283 According to the diamond view, styles 2 and 3 are at the same abstraction level, just with 3 offering a more limited set of functionality.
    284 The chain view reflects the design decision I made in my mission to offer a style-2 abstraction;
    285 I chose to build it upon a style-3 abstraction.
    286 (Justification of the decision follows, after the description of the design.)
    287 
    288 The following discussion first dispenses with API styles 1 and 4, then elaborates on my work with styles 2 and 3.
    289 
    290 Style 1 is not a concern of array implementations.
    291 It concerns documentation and identifier choices of the purpose-specific API.
    292 If one is offering a matrix-multiply function, one must specify which dimension(s) is/are being summed over
    293 (or rely on the familiar convention of these being the first argument's rows and second argument's columns).
    294 Some libraries offer a style-1 abstraction that is not directly backed by a single array
    295 (e.g. make quadrants contiguous, as may help cache coherence during a parallel matrix multiply),
    296 but such designs are out of scope for a discussion on arrays; they are applications of several arrays.
    297 I typically include style-1 language with examples to help guide intuition.
    298 
    299 It is often said that C has row-major arrays while Fortran has column-major arrays.
    300 This comparison brings an unhelpful pollution of style-1 thinking into issues of array implementation.
    301 Unfortunately, ``-major'' has two senses: the program's order of presenting indices and the array's layout in memory.
    302 (The program's order could be either lexical, as in @x[1,2,3]@ subscripting, or runtime, as in the @x[1][2][3]@ version.)
    303 Style 2 is concerned with introducing a nontrivial relationship between program order and memory order,
    304 while style 3 sees program order identical with memory order.
    305 Both C and (the style-3 subset of) Fortran actually use the same relationship here:
    306 an earlier subscript in program order controls coarser steps in memory.
    307 The job of a layer-2/3 system is to implement program-ordered subscripting according to a defined memory layout.
    308 C and Fortran do not use opposite orders in doing this job.
    309 Fortran is only ``backward'' in its layer-1 conventions for reading/writing and linear algebra.
    310 Fortran subscripts as $m(c,r)$.  When I use style-1 language, I am following the C/mathematical convention of $m(r,c)$.
    311 
    312 Style 4 is the inevitable target of any array implementation.
    313 The hardware offers this model to the C compiler, with bytes as the unit of displacement.
    314 C offers this model to its programmer as pointer arithmetic, with arbitrary sizes as the unit.
    315 I consider casting a multidimensional array as a single-dimensional array/pointer,
    316 then using @x[i]@ syntax to access its elements, to be a form of pointer arithmetic.
    317 But style 4 is not offering arrays.
    318 
    319 Now stepping into the implementation
    320 of CFA's new type-3 multidimensional arrays in terms of C's existing type-2 multidimensional arrays,
    321 it helps to clarify that even the interface is quite low-level.
    322 A C/CFA array interface includes the resulting memory layout.
    323 The defining requirement of a type-3 system is the ability to slice a column from a column-finest matrix.
    324 The required memory shape of such a slice is set, before any discussion of implementation.
    325 The implementation presented here is how the CFA array library wrangles the C type system,
    326 to make it do memory steps that are consistent with this layout.
    327 TODO: do I have/need a presentation of just this layout, just the semantics of -[all]?
    328469
    329470Figure~\ref{fig:subscr-all} shows one element (in the shaded band) accessed two different ways: as @x[2][3]@ and as @x[all][3][2]@.
     
    365506The subscript operator presents what's really inside, by casting to the type below the wedge of lie.
    366507
    367 %  Does x[all] have to lie too?  The picture currently glosses over how it it advertizes a size of 7 floats.  I'm leaving that as an edge case benignly misrepresented in the picture.  Edge cases only have to be handled right in the code.
     508%  Does x[all] have to lie too?  The picture currently glosses over how it it advertises a size of 7 floats.  I'm leaving that as an edge case benignly misrepresented in the picture.  Edge cases only have to be handled right in the code.
    368509
    369510Casting, overlapping and lying are unsafe.
     
    392533
    393534The @arpk@ structure and its @-[i]@ operator are thus defined as:
    394 \begin{lstlisting}
    395 forall( ztype(N),               // length of current dimension
    396         dtype(S) | sized(S),    // masquerading-as
    397         dtype E_im,             // immediate element, often another array
    398         dtype E_base            // base element, e.g. float, never array
    399  ) {
    400 struct arpk {
    401         S strides[N];           // so that sizeof(this) is N of S
    402 };
    403 
    404 // expose E_im, stride by S
    405 E_im & ?[?]( arpk(N, S, E_im, E_base) & a, ptrdiff_t i ) {
    406         return (E_im &) a.strides[i];
    407 }
    408 }
    409 \end{lstlisting}
     535\begin{cfa}
     536forall( ztype(N),                       $\C{// length of current dimension}$
     537        dtype(S) | sized(S),    $\C{// masquerading-as}$
     538        dtype E_im,                             $\C{// immediate element, often another array}$
     539        dtype E_base                    $\C{// base element, e.g. float, never array}$
     540) { // distribute forall to each element
     541        struct arpk {
     542                S strides[N];           $\C{// so that sizeof(this) is N of S}$
     543        };
     544        // expose E_im, stride by S
     545        E_im & ?[?]( arpk(N, S, E_im, E_base) & a, ptrdiff_t i ) {
     546                return (E_im &) a.strides[i];
     547        }
     548}
     549\end{cfa}
    410550
    411551An instantiation of the @arpk@ generic is given by the @array(E_base, N0, N1, ...)@ expansion, which is @arpk( N0, Rec, Rec, E_base )@, where @Rec@ is @array(E_base, N1, ...)@.
     
    464604
    465605\begin{tabular}{rl}
    466 C      &  @void f( size_t n, size_t m, float a[n][m] );@ \\
     606C      &  @void f( size_t n, size_t m, float x[n][m] );@ \\
    467607Java   &  @void f( float[][] a );@
    468608\end{tabular}
    469609
    470 Java's safety against undefined behaviour assures the callee that, if @a@ is non-null, then @a.length@ is a valid access (say, evaluating to the number $\ell$) and if @i@ is in $[0, \ell)$ then @a[i]@ is a valid access.
     610Java's safety against undefined behaviour assures the callee that, if @x@ is non-null, then @a.length@ is a valid access (say, evaluating to the number $\ell$) and if @i@ is in $[0, \ell)$ then @x[i]@ is a valid access.
    471611If a value of @i@ outside this range is used, a runtime error is guaranteed.
    472612In these respects, C offers no guarantees at all.
    473 Notably, the suggestion that @n@ is the intended size of the first dimension of @a@ is documentation only.
    474 Indeed, many might prefer the technically equivalent declarations @float a[][m]@ or @float (*a)[m]@ as emphasizing the ``no guarantees'' nature of an infrequently used language feature, over using the opportunity to explain a programmer intention.
    475 Moreover, even if @a[0][0]@ is valid for the purpose intended, C's basic infamous feature is the possibility of an @i@, such that @a[i][0]@ is not valid for the same purpose, and yet, its evaluation does not produce an error.
     613Notably, the suggestion that @n@ is the intended size of the first dimension of @x@ is documentation only.
     614Indeed, many might prefer the technically equivalent declarations @float x[][m]@ or @float (*a)[m]@ as emphasizing the ``no guarantees'' nature of an infrequently used language feature, over using the opportunity to explain a programmer intention.
     615Moreover, even if @x[0][0]@ is valid for the purpose intended, C's basic infamous feature is the possibility of an @i@, such that @x[i][0]@ is not valid for the same purpose, and yet, its evaluation does not produce an error.
    476616
    477617Java's lack of expressiveness for more applied properties means these outcomes are possible:
    478618\begin{itemize}
    479 \item @a[0][17]@ and @a[2][17]@ are valid accesses, yet @a[1][17]@ is a runtime error, because @a[1]@ is a null pointer
    480 \item the same observation, now because @a[1]@ refers to an array of length 5
    481 \item execution times vary, because the @float@ values within @a@ are sometimes stored nearly contiguously, and other times, not at all
     619\item @x[0][17]@ and @x[2][17]@ are valid accesses, yet @x[1][17]@ is a runtime error, because @x[1]@ is a null pointer
     620\item the same observation, now because @x[1]@ refers to an array of length 5
     621\item execution times vary, because the @float@ values within @x@ are sometimes stored nearly contiguously, and other times, not at all
    482622\end{itemize}
    483623C's array has none of these limitations, nor do any of the ``array language'' comparators discussed in this section.
    484624
    485625This Java level of safety and expressiveness is also exemplified in the C family, with the commonly given advice [TODO:cite example], for C++ programmers to use @std::vector@ in place of the C++ language's array, which is essentially the C array.
    486 The advice is that, while a vector is also more powerful (and quirky) than an array, its capabilities include options to preallocate with an upfront size, to use an available bound-checked accessor (@a.at(i)@ in place of @a[i]@), to avoid using @push_back@, and to use a vector of vectors.
     626The advice is that, while a vector is also more powerful (and quirky) than an array, its capabilities include options to preallocate with an upfront size, to use an available bound-checked accessor (@a.at(i)@ in place of @x[i]@), to avoid using @push_back@, and to use a vector of vectors.
    487627Used with these restrictions, out-of-bound accesses are stopped, and in-bound accesses never exercise the vector's ability to grow, which is to say, they never make the program slow to reallocate and copy, and they never invalidate the program's other references to the contained values.
    488628Allowing this scheme the same referential integrity assumption that \CFA enjoys [TODO:xref], this scheme matches Java's safety and expressiveness exactly.
     
    532672Futhark restricts these expressions syntactically to variables and constants, while a full-strength dependent system does not.
    533673
    534 CFA's hybrid presentation, @forall( [N] )@, has @N@ belonging to the type system, yet has no instances.
     674\CFA's hybrid presentation, @forall( [N] )@, has @N@ belonging to the type system, yet has no instances.
    535675Belonging to the type system means it is inferred at a call site and communicated implicitly, like in Dex and unlike in Futhark.
    536676Having no instances means there is no type for a variable @i@ that constrains @i@ to be in the range for @N@, unlike Dex, [TODO: verify], but like Futhark.
     
    539679
    540680
    541 \section{Future Work}
     681\section{Future work}
    542682
    543683\subsection{Declaration syntax}
     
    553693I will not discuss the core of this project, which has a tall mission already, to improve type safety, maintain appropriate C compatibility and offer more flexibility about storage use.
    554694It also has a candidate stretch goal, to adapt \CFA's @forall@ generic system to communicate generalized enumerations:
    555 \begin{lstlisting}
     695\begin{cfa}
    556696forall( T | is_enum(T) )
    557697void show_in_context( T val ) {
     
    565705enum weekday { mon, tue, wed = 500, thu, fri };
    566706show_in_context( wed );
    567 \end{lstlisting}
     707\end{cfa}
    568708with output
    569 \begin{lstlisting}
     709\begin{cfa}
    570710mon
    571711tue < ready
     
    573713thu
    574714fri
    575 \end{lstlisting}
     715\end{cfa}
    576716The details in this presentation aren't meant to be taken too precisely as suggestions for how it should look in \CFA.
    577717But the example shows these abilities:
     
    599739The structural assumptions required for the domain of an array in Dex are given by the trait (there, ``interface'') @Ix@, which says that the parameter @n@ is a type (which could take an argument like @weekday@) that provides two-way conversion with the integers and a report on the number of values.
    600740Dex's @Ix@ is analogous the @is_enum@ proposed for \CFA above.
    601 \begin{lstlisting}
     741\begin{cfa}
    602742interface Ix n
    603  get_size n : Unit -> Int
    604  ordinal : n -> Int
    605  unsafe_from_ordinal n : Int -> n
    606 \end{lstlisting}
     743get_size n : Unit -> Int
     744ordinal : n -> Int
     745unsafe_from_ordinal n : Int -> n
     746\end{cfa}
    607747
    608748Dex uses this foundation of a trait (as an array type's domain) to achieve polymorphism over shapes.
     
    616756In the photograph instantiation, it's the tuple-type of $ \langle \mathrm{img\_wd}, \mathrm{img\_ht} \rangle $.
    617757This use of a tuple-as-index is made possible by the built-in rule for implementing @Ix@ on a pair, given @Ix@ implementations for its elements
    618 \begin{lstlisting}
     758\begin{cfa}
    619759instance {a b} [Ix a, Ix b] Ix (a & b)
    620  get_size = \(). size a * size b
    621  ordinal = \(i, j). (ordinal i * size b) + ordinal j
    622  unsafe_from_ordinal = \o.
     760get_size = \(). size a * size b
     761ordinal = \(i, j). (ordinal i * size b) + ordinal j
     762unsafe_from_ordinal = \o.
    623763bs = size b
    624764(unsafe_from_ordinal a (idiv o bs), unsafe_from_ordinal b (rem o bs))
    625 \end{lstlisting}
     765\end{cfa}
    626766and by a user-provided adapter expression at the call site that shows how to indexing with a tuple is backed by indexing each dimension at a time
    627 \begin{lstlisting}
     767\begin{cfa}
    628768img_trans :: (img_wd,img_ht)=>Real
    629769img_trans.(i,j) = img.i.j
    630770result = pairwise img_trans
    631 \end{lstlisting}
     771\end{cfa}
    632772[TODO: cite as simplification of example from https://openreview.net/pdf?id=rJxd7vsWPS section 4]
    633773
     
    661801void * malloc( size_t );
    662802// C, user
    663 struct tm * el1 = malloc(      sizeof(struct tm) );
     803struct tm * el1 = malloc( sizeof(struct tm) );
    664804struct tm * ar1 = malloc( 10 * sizeof(struct tm) );
    665805
Note: See TracChangeset for help on using the changeset viewer.