Changeset 5a553e2 for doc/theses/mike_brooks_MMath/array.tex
- Timestamp:
- Jul 8, 2024, 9:01:16 AM (4 months ago)
- Branches:
- master
- Children:
- e26a842
- Parents:
- 1e110bf
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/array.tex
r1e110bf r5a553e2 2 2 \label{c:Array} 3 3 4 4 5 \section{Introduction} 5 6 6 This chapter describes my contribution of language and library features that provide a length-checked array type, as in: 7 8 \begin{lstlisting} 9 array(float, 99) x; // x contains 99 floats 10 11 void f( array(float, 42) & a ) {} 12 f(x); // statically rejected: types are different 7 Arrays in C are possible the single most misunderstood and incorrectly used features in the language, resulting in the largest proportion of runtime errors and security violations. 8 This chapter describes the new \CFA language and library features that introduce a length-checked array-type to the \CFA standard library~\cite{Cforall}, \eg: 9 \begin{cfa} 10 @array( float, 99 )@ x; $\C{// x contains 99 floats}$ 11 void f( @array( float, 42 )@ & p ) {} $\C{// p accepts 42 floats}$ 12 f( x ); $\C{// statically rejected: types are different, 99 != 42}$ 13 13 14 14 forall( T, [N] ) 15 void g( array(T, N) & a, int i ) { 16 T elem = a[i]; // dynamically checked: requires 0 <= i < N 17 } 18 g(x, 0); // T is float, N is 99, succeeds 19 g(x, 1000); // T is float, N is 99, dynamic check fails 20 \end{lstlisting} 21 22 This example first declares @x@ a variable, whose type is an instantiation of the generic type named @array@, with arguments @float@ and @99@. 23 Next, it declares @f@ as a function that expects a length-42 array; the type system rejects the call's attempt to pass @x@ to @f@, because the lengths do not match. 24 Next, the @forall@ annotation on function @g@ introduces @T@ as a familiar type parameter and @N@ as a \emph{dimension} parameter, a new feature that represents a count of elements, as managed by the type system. 25 Because @g@ accepts any length of array; the type system accepts the calls' passing @x@ to @g@, inferring that this length is 99. 26 Just as the caller's code does not need to explain that @T@ is @float@, the safe capture and communication of the value @99@ occurs without programmer involvement. 27 In the case of the second call (which passes the value 1000 for @i@), within the body of @g@, the attempt to subscript @a@ by @i@ fails with a runtime error, since $@i@ \nless @N@$. 28 29 The type @array@, as seen above, comes from my additions to the \CFA standard library. 30 It is very similar to the built-in array type, which \CFA inherits from C. 15 void g( @array( T, N )@ & p, int i ) { 16 T elem = p[i]; $\C{// dynamically checked: requires 0 <= i < N}$ 17 } 18 g( x, 0 ); $\C{// T is float, N is 99, dynamic subscript check succeeds}$ 19 g( x, 1000 ); $\C{// T is float, N is 99, dynamic subscript check fails}$ 20 \end{cfa} 21 This example declares variable @x@, with generic type @array@ using arguments @float@ and @99@. 22 Function @f@ is declared with an @array@ parameter of length @42@. 23 The call @f( x )@ is invalid because the @array@ lengths @99@ and @42@ do not match. 24 Next, function @g@ introduces a @forall@ prefix on type parameter @T@ and arbitrary \emph{dimension parameter} @N@, the new feature that represents a count of elements managed by the type system. 25 The call @g( x, 0 )@ is valid because @g@ accepts any length of array, where the type system infers @float@ for @T@ and length @99@ for @N@. 26 Inferring values for @T@ and @N@ is implicit without programmer involvement. 27 Furthermore, the runtime subscript @x[0]@ (parameter @i@ is @0@) in @g@ is valid because @0@ is in the dimension range $[0,99)$ of argument @x@. 28 The call @g( x, 1000 )@ is also valid; 29 however, the runtime subscript @x[1000]@ is invalid (generates a subscript error) because @1000@ is outside the dimension range $[0,99)$ of argument @x@. 30 31 The generic @array@ type is similar to the C array type, which \CFA inherits from C. 31 32 Its runtime characteristics are often identical, and some features are available in both. 32 33 \begin{ lstlisting}33 For example, assume a caller can instantiates @N@ with 42 in the following (details to follow). 34 \begin{cfa} 34 35 forall( [N] ) 35 36 void declDemo() { 36 float a1[N]; // built-in type ("C array") 37 array(float, N) a2; // type from library 38 } 39 \end{lstlisting} 40 41 If a caller instantiates @N@ with 42, then both locally-declared array variables, @a1@ and @a2@, become arrays of 42 elements, each element being a @float@. 42 The two variables have identical size and layout; they both encapsulate 42-float stack allocations, no heap allocations, and no further "bookkeeping" allocations/header. 43 Having the @array@ library type (that of @a2@) is a tactical measure, an early implementation that offers full feature support. 44 A future goal (TODO xref) is to port all of its features into the built-in array type (that of @a1@); then, the library type could be removed, and \CFA would have only one array type. 45 In present state, the built-in array has partial support for the new features. 46 The fully-featured library type is used exclusively in introductory examples; feature support and C compatibility are revisited in sec TODO. 47 48 Offering the @array@ type, as a distinct alternative from the the C array, is consistent with \CFA's extension philosophy (TODO xref background) to date. 49 A few compatibility-breaking changes to the behaviour of the C array were also made, both as an implementation convenience, and as justified fixes to C's lax treatment. 50 51 The @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features. 52 A clean slate was an important starting point because it meant not having to deal with every inherited complexity introduced in TODO xref background-array. 53 54 55 My contributions are 56 \begin{itemize} 57 \item a type system enhancement that lets polymorphic functions and generic types be parameterized by a numeric value: @forall( [N] )@ 37 float x1[N]; $\C{// built-in type ("C array")}$ 38 array(float, N) x2; $\C{// type from library}$ 39 } 40 \end{cfa} 41 Both of the locally-declared array variables, @x1@ and @x2@, have 42 elements, each element being a @float@. 42 The two variables have identical size and layout; they both encapsulate 42-float, stack \vs heap allocations with no additional ``bookkeeping'' allocations or headers. 43 Providing this explicit generic approach required a significant extension to the \CFA type system to support a full-feature, safe, efficient (space and time) array-type, which forms the foundation for more complex array forms in \CFA. 44 45 Admittedly, the @array@ library type (type for @x2@) is syntactically different from its C counterpart. 46 A future goal (TODO xref) is to provide a built-in array type with syntax approaching C's (type for @x1@); 47 then, the library @array@ type can be removed giving \CFA a largely uniform array type. 48 At present, the built-in array is only partially supported, so the generic @array@ is used exclusively in the discussion; 49 feature support and C compatibility are revisited in Section ? TODO. 50 51 Offering an @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (gcc) programs can be compiled by \CFA with only a small number of changes, similar to \CC (g++). 52 However, a few compatibility-breaking changes to the behaviour of the C array are necessary, both as an implementation convenience and to fix C's lax treatment of arrays. 53 Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity introduced by the C array TODO xref. 54 55 My contributions are: 56 \begin{enumerate} 57 \item A type system enhancement that lets polymorphic functions and generic types be parameterized by a numeric value: @forall( [N] )@. 58 \item Provide a length-checked array-type in the \CFA standard library, where the array's length is statically managed and dynamically valued. 59 \item Provide argument/parameter passing safety for arrays and subscript safety. 58 60 \item TODO: general parking... 59 \item identify specific abilities brought by @array@60 \item Where there is a gap concerning this feature's readiness for prime-time, identification of specific workable improvements that are likely to close the gap 61 \end{ itemize}61 \item Identify the interesting specific abilities available by the new @array@ type. 62 \item Where there is a gap concerning this feature's readiness for prime-time, identification of specific workable improvements that are likely to close the gap. 63 \end{enumerate} 62 64 63 65 … … 68 70 69 71 70 \section{Features Added} 71 72 The present work adds a type @array@ to the \CFA standard library~\cite{Cforall}. 73 74 This array's length is statically managed and dynamically valued. 75 This static management achieves argument safety and suggests a path to subscript safety as future work (TODO: cross reference). 76 77 This section presents motivating examples of the new array type's usage and follows up with definitions of the notations that appear. 78 79 The core of the new array management is tracking all array lengths in the type system. 80 Dynamically valued lengths are represented using type variables. 81 The stratification of type variables preceding object declarations makes a length referenceable everywhere that it is needed. 72 \section{Features added} 73 74 This section presents motivating examples for the new array type, demonstrating the syntax and semantics of the generic @array@. 75 As stated, the core capability of the new array is tracking all dimensions in the type system, where dynamic dimensions are represented using type variables. 76 77 The definition of type variables preceding object declarations makes the array dimension lexically referenceable where it is needed. 82 78 For example, a declaration can share one length, @N@, among a pair of parameters and the return. 83 79 \lstinput{10-17}{hello-array.cfa} 84 80 Here, the function @f@ does a pointwise comparison, checking if each pair of numbers is within half a percent of each other, returning the answers in a newly allocated @bool@ array. 85 86 The array type uses the parameterized length information in its @sizeof@ determination, illustrated in the example's call to @alloc@. 87 That call requests an allocation of type @array(bool, N)@, which the type system deduces from the left-hand side of the initialization, into the return type of the @alloc@ call. 88 Preexisting \CFA behaviour is leveraged here, both in the return-type-only polymorphism, and the @sized(T)@-aware standard-library @alloc@ routine. 89 The new @array@ type plugs into this behaviour by implementing the @sized@/@sizeof@ assertion to have the intuitive meaning. 90 As a result, this design avoids an opportunity for programmer error by making the size/length communication to a called routine implicit, compared with C's @calloc@ (or the low-level \CFA analog @aalloc@), which take an explicit length parameter not managed by the type system. 91 92 \VRef[Figure]{f:fHarness} shows the harness to use the @f@ function illustrating how dynamic values are fed into the system. 93 Here, the @a@ array is loaded with decreasing values, and the @b@ array with amounts off by a constant, giving relative differences within tolerance at first and out of tolerance later. 94 The program main is run with two different inputs of sequence length. 81 The dynamic allocation of the @ret@ array by @alloc@ uses the parameterized dimension information in its implicit @_Alignof@ and @sizeof@ determinations, and casting the return type. 82 \begin{cfa} 83 static inline forall( T & | sized(T) ) 84 T * alloc( size_t dim ) { 85 if ( _Alignof(T) <= libAlign() ) return (T *)aalloc( dim, sizeof(T) ); // calloc without zero fill 86 else return (T *)amemalign( _Alignof(T), dim, sizeof(T) ); // array memalign 87 } 88 \end{cfa} 89 Here, the type system deduces from the left-hand side of the assignment the type @array(bool, N)@, and forwards it as the type variable @T@ for the generic @alloc@ function, making it available in its body. 90 Hence, preexisting \CFA behaviour is leveraged here, both in the return-type polymorphism, and the @sized(T)@-aware standard-library @alloc@ routine. 91 This example illustrates how the new @array@ type plugs into existing \CFA behaviour by implementing necessary @sized@ assertions needed by other types. 92 (@sized@ implies a concrete \vs abstract type with a compile-time size.) 93 As a result, there is significant programming safety by making the size accessible and implicit, compared with C's @calloc@ and non-array supporting @memalign@, which take an explicit length parameter not managed by the type system. 95 94 96 95 \begin{figure} 97 \lstinput{30-49}{hello-array.cfa} 96 \lstinput{30-43}{hello-array.cfa} 97 \lstinput{45-48}{hello-array.cfa} 98 98 \caption{\lstinline{f} Harness} 99 99 \label{f:fHarness} 100 100 \end{figure} 101 101 102 The loops in the program main follow the more familiar pattern of using the ordinary variable @n@ to convey the length. 103 The type system implicitly captures this value at the call site (@main@ calling @f@) and makes it available within the callee (@f@'s loop bound). 104 105 The two parts of the example show @n@ adapting a variable into a type-system managed length (at @main@'s declarations of @a@, @b@, and @result@), @N@ adapting in the opposite direction (at @f@'s loop bound), and a pass-thru use of a managed length (at @f@'s declaration of @ret@). 106 107 The @forall( ...[N] )@ participates in the user-relevant declaration of the name @N@, which becomes usable in parameter/return declarations and in the function @b@. 108 The present form is chosen to parallel the existing @forall@ forms: 109 \begin{cfa} 110 forall( @[N]@ ) ... // array kind 111 forall( & T ) ... // reference kind (dtype) 112 forall( T ) ... // value kind (otype) 113 \end{cfa} 114 115 The notation @array(thing, N)@ is a single-dimensional case, giving a generic type instance. 102 \VRef[Figure]{f:fHarness} shows a harness that uses the @f@ function illustrating how dynamic values are fed into the @array@ type. 103 Here, the dimension of the @x@, @y@, and @result@ arrays is specified from a command-line value and these arrays are allocated on the stack. 104 Then the @x@ array is initialized with decreasing values, and the @y@ array with amounts offset by constant @0.005@, giving relative differences within tolerance initially and diverging for later values. 105 The program main is run (see figure bottom) with inputs @5@ and @7@ for sequence lengths. 106 The loops follow the familiar pattern of using the variable @n@ to iterate through the arrays. 107 Most importantly, the type system implicitly captures @n@ at the call of @f@ and makes it available throughout @f@ as @N@. 108 The example shows @n@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@, @N@ adapting in the same way at @f@'s loop bound, and a pass-thru use of @n@ at @f@'s declaration of @ret@. 109 Except for the lifetime-management issue of @result@, \ie explicit @free@, this program has eliminated both the syntactic and semantic problems associated with C arrays and their usage. 110 These benefits cannot be underestimated. 111 112 In general, the @forall( ..., [N] )@ participates in the user-relevant declaration of the name @N@, which becomes usable in parameter/return declarations and within a function. 113 The syntactic form is chosen to parallel other @forall@ forms: 114 \begin{cfa} 115 forall( @[N]@ ) ... $\C[1.5in]{// array kind}$ 116 forall( T & ) ... $\C{// reference kind (dtype)}$ 117 forall( T ) ... $\C{// value kind (otype)}\CRT$ 118 \end{cfa} 119 % The notation @array(thing, N)@ is a single-dimensional case, giving a generic type instance. 116 120 In summary: 117 121 \begin{itemize} 118 122 \item 119 @[N]@ -- within a forall, declares the type variable @N@ to be a managed length120 \item 121 $e$ -- a type representing the value of $e$ as a managed length, where $e$ is a @size_t@-typed expression 122 \item 123 N -- an expression of type @size_t@, whose value is the managed length @N@ 124 \item 125 @array( thing, N0, N1, ... )@ -- a type wrapping $\prod_i N_i$ adjacent occurrences of @thing@ objects123 @[N]@ within a forall declares the type variable @N@ to be a managed length. 124 \item 125 The type of @N@ within code is @size_t@. 126 \item 127 The value of @N@ within code is the acquired length derived from the usage site, \ie generic declaration or function call. 128 \item 129 @array( thing, N0, N1, ... )@ is a multi-dimensional type wrapping $\prod_i N_i$ adjacent occurrences of @thing@ objects. 126 130 \end{itemize} 127 Unsigned integers have a special status in this type system. 128 Unlike how C++ allows 131 132 \VRef[Figure]{f:TemplateVsGenericType} shows @N@ is not the same as a @size_t@ declaration in a \CC \lstinline[language=C++]{template}. 133 \begin{enumerate}[leftmargin=*] 134 \item 135 The \CC template @N@ is a compile-time value, while the \CFA @N@ is a runtime value. 136 \item 137 The \CC template @N@ must be passed explicitly at the call, unless @N@ has a default value, even when \CC can deduct the type of @T@. 138 The \CFA @N@ is part of the array type and passed implicitly at the call. 139 \item 140 \CC cannot have an array of references, but can have an array of pointers. 141 \CC has a (mistaken) belief that references are not objects, but pointers are objects. 142 In the \CC example, the arrays fall back on C arrays, which have a duality with references with respect to automatic dereferencing. 143 The \CFA array is a contiguous object with an address, which can stored as a reference or pointer. 144 \item 145 C/\CC arrays cannot be copied, while \CFA arrays can be copied, making them a first-class object (although array copy is often avoided for efficiency). 146 \end{enumerate} 147 148 \begin{figure} 149 \begin{tabular}{@{}l@{\hspace{20pt}}l@{}} 129 150 \begin{c++} 130 template< size_t N, char * msg, typename T >... // declarations 151 152 @template< typename T, size_t N >@ 153 void copy( T ret[N], T x[N] ) { 154 for ( int i = 0; i < N; i += 1 ) ret[i] = x[i]; 155 } 156 int main() { 157 int ret[10], x[10]; 158 for ( int i = 0; i < 10; i += 1 ) x[i] = i; 159 @copy<int, 10 >( ret, x );@ 160 for ( int i = 0; i < 10; i += 1 ) 161 cout << ret[i] << ' '; 162 cout << endl; 163 } 131 164 \end{c++} 132 \CFA does not accommodate values of any user-provided type. 133 TODO: discuss connection with dependent types. 134 An example of a type error demonstrates argument safety. 135 The running example has @f@ expecting two arrays of the same length. 136 A compile-time error occurs when attempting to call @f@ with arrays whose lengths may differ. 137 \begin{cfa} 138 forall( [M], [N] ) 139 void bad( array(float, M) &a, array(float, N) &b ) { 140 f( a, a ); // ok 141 f( b, b ); // ok 142 f( a, b ); // error 143 } 144 \end{cfa} 145 %\lstinput{60-65}{hello-array.cfa} 146 As is common practice in C, the programmer is free to cast, to assert knowledge not shared with the type system. 147 \begin{cfa} 148 forall( [M], [N] ) 149 void bad_fixed( array(float, M) & a, array(float, N) & b ) { 150 if ( M == N ) { 151 f( a, (array(float, M) &)b ); // cast b to matching type 165 & 166 \begin{cfa} 167 int main() { 168 @forall( T, [N] )@ // nested function 169 void copy( array( T, N ) & ret, array( T, N ) & x ) { 170 for ( i; 10 ) ret[i] = x[i]; 152 171 } 153 } 154 \end{cfa} 155 %\lstinput{70-75}{hello-array.cfa} 156 157 Argument safety and the associated implicit communication of array length work with \CFA's generic types too. 158 \CFA allows aggregate types to be generalized with multiple type parameters, including parameterized element type, so can it be defined over a parameterized length. 159 Doing so gives a refinement of C's ``flexible array member'' pattern, that allows nesting structures with array members anywhere within other structures. 160 \lstinput{10-16}{hello-accordion.cfa} 161 This structure's layout has the starting offset of @cost_contribs@ varying in @Nclients@, and the offset of @total_cost@ varying in both generic parameters. 162 For a function that operates on a @request@ structure, the type system handles this variation transparently. 163 \lstinput{40-47}{hello-accordion.cfa} 164 In the example, different runs of the program result in different offset values being used. 165 \lstinput{60-76}{hello-accordion.cfa} 172 173 array( int, 10 ) ret, x; 174 for ( i; 10 ) x[i] = i; 175 @copy( ret, x );@ 176 for ( i; 10 ) 177 sout | ret[i] | nonl; 178 sout | nl; 179 } 180 \end{cfa} 181 \end{tabular} 182 \caption{\CC \lstinline[language=C++]{template} \vs \CFA generic type } 183 \label{f:TemplateVsGenericType} 184 \end{figure} 185 186 Continuing the discussion of \VRef[Figure]{f:fHarness}, the example has @f@ expecting two arrays of the same length. 187 A compile-time error occurs when attempting to call @f@ with arrays of differing lengths. 188 % removing leading whitespace 189 \lstinput[tabsize=1]{52-53}{hello-array.cfa} 190 \lstinput[tabsize=1,aboveskip=0pt]{62-64}{hello-array.cfa} 191 As is common practice in C, the programmer is free to cast, \ie to assert knowledge not shared with the type system. 192 \lstinput{70-74}{hello-array.cfa} 193 194 Orthogonally, the new @array@ type works with \CFA's generic types, providing argument safety and the associated implicit communication of array length. 195 Specifically, \CFA allows aggregate types to be generalized with multiple type parameters, including parameterized element types and lengths. 196 Doing so gives a refinement of C's ``flexible array member'' pattern, allowing nesting structures with array members anywhere within other structures. 197 \lstinput{10-15}{hello-accordion.cfa} 198 This structure's layout has the starting offset of @municipalities@ varying in @NprovTerty@, and the offset of @total_pt@ and @total_mun@ varying in both generic parameters. 199 For a function that operates on a @CanadaPop@ structure, the type system handles this variation transparently. 200 \lstinput{40-45}{hello-accordion.cfa} 201 \VRef[Figure]{f:checkHarness} shows program results where different offset values being used. 166 202 The output values show that @summarize@ and its caller agree on both the offsets (where the callee starts reading @cost_contribs@ and where the callee writes @total_cost@). 167 Yet the call site still says just, ``pass the request.'' 168 169 170 \section{Multidimensional implementation} 203 Yet the call site just says, ``pass the request.'' 204 205 \begin{figure} 206 \lstinput{60-68}{hello-accordion.cfa} 207 \lstinput{70-72}{hello-accordion.cfa} 208 \caption{\lstinline{check} Harness} 209 \label{f:checkHarness} 210 \end{figure} 211 212 213 \section{Multidimensional Arrays} 171 214 \label{toc:mdimpl} 172 215 173 TODO: introduce multidimensional array feature and approaches 174 175 The new \CFA standard library @array@ datatype supports multidimensional uses more richly than the C array. 176 The new array's multidimensional interface and implementation, follows an array-of-arrays setup, meaning, like C's @float[n][m]@ type, one contiguous object, with coarsely-strided dimensions directly wrapping finely-strided dimensions. 177 This setup is in contrast with the pattern of array of pointers to other allocations representing a sub-array. 178 Beyond what C's type offers, the new array brings direct support for working with a noncontiguous array slice, allowing a program to work with dimension subscripts given in a non-physical order. 179 C and C++ require a programmer with such a need to manage pointer/offset arithmetic manually. 180 181 Examples are shown using a $5 \times 7$ float array, @a@, loaded with increments of $0.1$ when stepping across the length-7 finely-strided dimension shown on columns, and with increments of $1.0$ when stepping across the length-5 coarsely-strided dimension shown on rows. 182 %\lstinput{120-126}{hello-md.cfa} 183 The memory layout of @a@ has strictly increasing numbers along its 35 contiguous positions. 216 % TODO: introduce multidimensional array feature and approaches 217 218 When working with arrays, \eg linear algebra, array dimensions are referred to as ``rows'' and ``columns'' for a matrix, adding ``planes'' for a cube. 219 (There is little terminology for higher dimensional arrays.) 220 For example, an acrostic poem\footnote{A type of poetry where the first, last or other letters in a line spell out a particular word or phrase in a vertical column.} 221 can be treated as a grid of characters, where the rows are the text and the columns are the embedded keyword(s). 222 Within a poem, there is the concept of a \newterm{slice}, \eg a row is a slice for the poem text, a column is a slice for a keyword. 223 In general, the dimensioning and subscripting for multidimensional arrays has two syntactic forms: @m[r,c]@ or @m[r][c]@. 224 225 Commonly, an array, matrix, or cube, is visualized (especially in mathematics) as a contiguous row, rectangle, or block. 226 This conceptualization is reenforced by subscript ordering, \eg $m_{r,c}$ for a matrix and $c_{p,r,c}$ for a cube. 227 Few programming languages differ from the mathematical subscript ordering. 228 However, computer memory is flat, and hence, array forms are structured in memory as appropriate for the runtime system. 229 The closest representation to the conceptual visualization is for an array object to be contiguous, and the language structures this memory using pointer arithmetic to access the values using various subscripts. 230 This approach still has degrees of layout freedom, such as row or column major order, \ie juxtaposed rows or columns in memory, even when the subscript order remains fixed. 231 For example, programming languages like MATLAB, Fortran, Julia and R store matrices in column-major order since they are commonly used for processing column-vectors in tabular data sets but retain row-major subscripting. 232 In general, storage layout is hidden by subscripting, and only appears when passing arrays among different programming languages or accessing specific hardware. 233 234 \VRef[Figure]{f:FixedVariable} shows two C90 approaches for manipulating contiguous arrays. 235 Note, C90 does not support VLAs. 236 The fixed-dimension approach uses the type system; 237 however, it requires all dimensions except the first to be specified at compile time, \eg @m[][6]@, allowing all subscripting stride calculations to be generated with constants. 238 Hence, every matrix passed to @fp1@ must have exactly 6 columns but the row size can vary. 239 The variable-dimension approach ignores (violates) the type system, \ie argument and parameters types do not match, and manually performs pointer arithmetic for subscripting in the macro @sub@. 240 241 \begin{figure} 242 \begin{tabular}{@{}l@{\hspace{40pt}}l@{}} 243 \multicolumn{1}{c}{\textbf{Fixed Dimension}} & \multicolumn{1}{c}{\textbf{Variable Dimension}} \\ 244 \begin{cfa} 245 246 void fp1( int rows, int m[][@6@] ) { 247 ... printf( "%d ", @m[r][c]@ ); ... 248 } 249 int fm1[4][@6@], fm2[6][@6@]; // no VLA 250 // initialize matrixes 251 fp1( 4, fm1 ); // implicit 6 columns 252 fp1( 6, fm2 ); 253 \end{cfa} 254 & 255 \begin{cfa} 256 #define sub( m, r, c ) *(m + r * sizeof( m[0] ) + c) 257 void fp2( int rows, int cols, int *m ) { 258 ... printf( "%d ", @sub( m, r, c )@ ); ... 259 } 260 int vm1[4][4], vm2[6][8]; // no VLA 261 // initialize matrixes 262 fp2( 4, 4, vm1 ); 263 fp2( 6, 8, vm2 ); 264 \end{cfa} 265 \end{tabular} 266 \caption{C90 Fixed \vs Variable Contiguous Matrix Styles} 267 \label{f:FixedVariable} 268 \end{figure} 269 270 Many languages allow multidimensional arrays-of-arrays, \eg in Pascal or \CC. 271 \begin{cquote} 272 \begin{tabular}{@{}ll@{}} 273 \begin{pascal} 274 var m : array[0..4, 0..4] of Integer; (* matrix *) 275 type AT = array[0..4] of Integer; (* array type *) 276 type MT = array[0..4] of AT; (* array of array type *) 277 var aa : MT; (* array of array variable *) 278 m@[1][2]@ := 1; aa@[1][2]@ := 1 (* same subscripting *) 279 \end{pascal} 280 & 281 \begin{c++} 282 int m[5][5]; 283 284 typedef vector< vector<int> > MT; 285 MT vm( 5, vector<int>( 5 ) ); 286 m@[1][2]@ = 1; aa@[1][2]@ = 1; 287 \end{c++} 288 \end{tabular} 289 \end{cquote} 290 The language decides if the matrix and array-of-array are laid out the same or differently. 291 For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (triangular matrix). 292 Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though the two array forms could be differentiated at the subscript level, \eg @m[1,2]@ \vs @aa[1][2]@. 293 Nevertheless, controlling memory layout can make a difference in what operations are allowed and in performance (caching/NUMA effects). 294 295 C also provides non-contiguous arrays-of-arrays. 296 \begin{cfa} 297 int m[5][5]; $\C{// contiguous}$ 298 int * aa[5]; $\C{// non-contiguous}$ 299 \end{cfa} 300 both with different memory layout using the same subscripting, and both with different degrees of issues. 301 The focus of this work is on the contiguous multidimensional arrays in C. 302 The reason is that programmers are often forced to use the more complex array-of-array form when a contiguous array would be simpler, faster, and safer. 303 Nevertheless, the C array-of-array form continues to be useful for special circumstances. 304 305 \VRef[Figure]{f:ContiguousNon-contiguous} shows the extensions made in C99 for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays.} 306 First, VLAs are supported. 307 Second, for contiguous arrays, C99 conjoins one or more of the parameters as a downstream dimension(s), \eg @cols@, implicitly using this parameter to compute the row stride of @m@. 308 If the declaration of @fc@ is changed to: 309 \begin{cfa} 310 void fc( int rows, int cols, int m[@rows@][@cols@] ) ... 311 \end{cfa} 312 it is possible for C to perform bound checking across all subscripting, but it does not. 313 While this contiguous-array capability is a step forward, it is still the programmer's responsibility to manually manage the number of dimensions and their sizes, both at the function definition and call sites. 314 That is, the array does not automatically carry its structure and sizes for use in computing subscripts. 315 While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row correctly. 316 Specifically, there is no requirement that the rows are the same length, like a poem with different length lines. 317 318 \begin{figure} 319 \begin{tabular}{@{}ll@{}} 320 \multicolumn{1}{c}{\textbf{Contiguous}} & \multicolumn{1}{c}{\textbf{ Non-contiguous}} \\ 321 \begin{cfa} 322 void fc( int rows, @int cols@, int m[ /* rows */ ][@cols@] ) { 323 ... printf( "%d ", @m[r][c]@ ); ... 324 } 325 int m@[5][5]@; 326 for ( int r = 0; r < 5; r += 1 ) { 327 328 for ( int c = 0; c < 5; c += 1 ) 329 m[r][c] = r + c; 330 } 331 fc( 5, 5, m ); 332 \end{cfa} 333 & 334 \begin{cfa} 335 void faa( int rows, int cols, int * m[ @/* cols */@ ] ) { 336 ... printf( "%d ", @m[r][c]@ ); ... 337 } 338 int @* aa[5]@; // row pointers 339 for ( int r = 0; r < 5; r += 1 ) { 340 @aa[r] = malloc( 5 * sizeof(int) );@ // create rows 341 for ( int c = 0; c < 5; c += 1 ) 342 aa[r][c] = r + c; 343 } 344 faa( 5, 5, aa ); 345 \end{cfa} 346 \end{tabular} 347 \caption{C99 Contiguous \vs Non-contiguous Matrix Styles} 348 \label{f:ContiguousNon-contiguous} 349 \end{figure} 350 351 352 \subsection{Multidimensional array implementation} 353 354 A multidimensional array implementation has three relevant levels of abstraction, from highest to lowest, where the array occupies \emph{contiguous memory}. 355 \begin{enumerate} 356 \item 357 Flexible-stride memory: 358 this model has complete independence between subscripting ordering and memory layout, offering the ability to slice by (provide an index for) any dimension, \eg slice a plane, row, or column, \eg @c[3][*][*]@, @c[3][4][*]@, @c[3][*][5]@. 359 \item 360 Fixed-stride memory: 361 this model binds the first subscript and the first memory layout dimension, offering the ability to slice by (provide an index for) only the coarsest dimension, @m[row][*]@ or @c[plane][*][*]@, \eg slice only by row (2D) or plane (3D). 362 After which, subscripting and memory layout are independent. 363 \item 364 Explicit-displacement memory: 365 this model has no awareness of dimensions just the ability to access memory at a distance from a reference point (base-displacement addressing), \eg @x + 23@ or @x[23}@ $\Rightarrow$ 23rd element from the start of @x@. 366 A programmer must manually build any notion of dimensions using other tools; 367 hence, this style is not offering multidimensional arrays \see{\VRef[Figure]{f:FixedVariable}}. 368 \end{enumerate} 369 370 There is some debate as to whether the abstraction ordering goes $\{1, 2\} < 3$, rather than my numerically-ordering. 371 That is, styles 1 and 2 are at the same abstraction level, with 3 offering a limited set of functionality. 372 I chose to build the \CFA style-1 array upon a style-2 abstraction. 373 (Justification of the decision follows, after the description of the design.) 374 375 Style 3 is the inevitable target of any array implementation. 376 The hardware offers this model to the C compiler, with bytes as the unit of displacement. 377 C offers this model to its programmer as pointer arithmetic, with arbitrary sizes as the unit. 378 Casting a multidimensional array as a single-dimensional array/pointer, then using @x[i]@ syntax to access its elements, is still a form of pointer arithmetic. 379 380 Now stepping into the implementation of \CFA's new type-1 multidimensional arrays in terms of C's existing type-2 multidimensional arrays, it helps to clarify that even the interface is quite low-level. 381 A C/\CFA array interface includes the resulting memory layout. 382 The defining requirement of a type-2 system is the ability to slice a column from a column-finest matrix. 383 The required memory shape of such a slice is set, before any discussion of implementation. 384 The implementation presented here is how the \CFA array library wrangles the C type system, to make it do memory steps that are consistent with this layout. 385 TODO: do I have/need a presentation of just this layout, just the semantics of -[all]? 386 387 The new \CFA standard library @array@ datatype supports richer multidimensional features than C. 388 The new array implementation follows C's contiguous approach, \ie @float [r][c]@, with one contiguous object subscripted by coarsely-strided dimensions directly wrapping finely-strided dimensions. 389 Beyond what C's array type offers, the new array brings direct support for working with a noncontiguous array slice, allowing a program to work with dimension subscripts given in a non-physical order. 390 391 The following examples use an @array( float, 5, 7) m@, loaded with values incremented by $0.1$, when stepping across the length-7 finely-strided column dimension, and stepping across the length-5 coarsely-strided row dimension. 392 \par\noindent 393 \mbox{\lstinput{121-126}{hello-md.cfa}} 394 \par\noindent 395 The memory layout is 35 contiguous elements with strictly increasing addresses. 184 396 185 397 A trivial form of slicing extracts a contiguous inner array, within an array-of-arrays. 186 Like with the C array, a lesser-dimensional array reference can be bound to the result of subscripting a greater-dimensional array, by a prefix of its dimensions. 187 This action first subscripts away the most coarsely strided dimensions, leaving a result that expects to be be subscripted by the more finely strided dimensions. 188 \lstinput{60-66}{hello-md.cfa} 398 As for the C array, a lesser-dimensional array reference can be bound to the result of subscripting a greater-dimensional array by a prefix of its dimensions, \eg @m[2]@, giving the third row. 399 This action first subscripts away the most coarsely strided dimensions, leaving a result that expects to be subscripted by the more finely strided dimensions, \eg @m[2][3]@, giving the value @2.3@. 400 The following is an example slicing a row. 401 \lstinput{60-64}{hello-md.cfa} 189 402 \lstinput[aboveskip=0pt]{140-140}{hello-md.cfa} 190 403 191 This function declaration is asserting too much knowledge about its parameter @c@, for it to be usable for printing either a row slice or a column slice. 192 Specifically, declaring the parameter @c@ with type @array@ means that @c@ is contiguous. 193 However, the function does not use this fact. 194 For the function to do its job, @c@ need only be of a container type that offers a subscript operator (of type @ptrdiff_t@ $\rightarrow$ @float@), with managed length @N@. 404 However, function @print1d@ is asserting too much knowledge about its parameter @r@ for printing either a row slice or a column slice. 405 Specifically, declaring the parameter @r@ with type @array@ means that @r@ is contiguous, which is unnecessarily restrictive. 406 That is, @r@ need only be of a container type that offers a subscript operator (of type @ptrdiff_t@ $\rightarrow$ @float@) with managed length @N@. 195 407 The new-array library provides the trait @ix@, so-defined. 196 With it, the original declaration can be generalized , while still implemented with the same body, to the latter declaration:197 \lstinput{4 0-44}{hello-md.cfa}408 With it, the original declaration can be generalized with the same body. 409 \lstinput{43-44}{hello-md.cfa} 198 410 \lstinput[aboveskip=0pt]{145-145}{hello-md.cfa} 199 200 Nontrivial slicing, in this example, means passing a noncontiguous slice to @print1d@. 201 The new-array library provides a ``subscript by all'' operation for this purpose. 202 In a multi-dimensional subscript operation, any dimension given as @all@ is left ``not yet subscripted by a value,'' implementing the @ix@ trait, waiting for such a value. 411 The nontrivial slicing in this example now allows passing a \emph{noncontiguous} slice to @print1d@, where the new-array library provides a ``subscript by all'' operation for this purpose. 412 In a multi-dimensional subscript operation, any dimension given as @all@ is a placeholder, \ie ``not yet subscripted by a value'', waiting for such a value, implementing the @ix@ trait. 203 413 \lstinput{150-151}{hello-md.cfa} 204 414 205 The example has shown that @a[2]@ and @a[[2, all]]@ both refer to the same, ``2.*'' slice.206 Indeed, the various @print1d@ calls under discussion access the entry with value 2.3 as @a[2][3]@, @a[[2,all]][3]@, and @a[[all,3]][2]@.207 This design preserves (and extends) C array semantics by defining @ a[[i,j]]@ to be @a[i][j]@ for numeric subscripts, but also for ``subscripting by all''.415 The example shows @x[2]@ and @x[[2, all]]@ both refer to the same, ``2.*'' slice. 416 Indeed, the various @print1d@ calls under discussion access the entry with value @2.3@ as @x[2][3]@, @x[[2,all]][3]@, and @x[[all,3]][2]@. 417 This design preserves (and extends) C array semantics by defining @x[[i,j]]@ to be @x[i][j]@ for numeric subscripts, but also for ``subscripting by all''. 208 418 That is: 209 210 \begin{tabular}{ cccccl}211 @ a[[2,all]][3]@ & $=$ & @a[2][all][3]@ & $=$ & @a[2][3]@ & (here, @all@ is redundant) \\212 @ a[[all,3]][2]@ & $=$ & @a[all][3][2]@ & $=$ & @a[2][3]@ & (here, @all@ is effective)419 \begin{cquote} 420 \begin{tabular}{@{}cccccl@{}} 421 @x[[2,all]][3]@ & $\equiv$ & @x[2][all][3]@ & $\equiv$ & @x[2][3]@ & (here, @all@ is redundant) \\ 422 @x[[all,3]][2]@ & $\equiv$ & @x[all][3][2]@ & $\equiv$ & @x[2][3]@ & (here, @all@ is effective) 213 423 \end{tabular} 214 215 Narrating progress through each of the @-[-][-][-]@ expressions gives, firstly, a definition of @-[all]@, and secondly, a generalization of C's @-[i]@. 216 217 \noindent Where @all@ is redundant: 218 219 \begin{tabular}{ll} 220 @a@ & 2-dimensional, want subscripts for coarse then fine \\ 221 @a[2]@ & 1-dimensional, want subscript for fine; lock coarse = 2 \\ 222 @a[2][all]@ & 1-dimensional, want subscript for fine \\ 223 @a[2][all][3]@ & 0-dimensional; lock fine = 3 424 \end{cquote} 425 426 Narrating progress through each of the @-[-][-][-]@\footnote{ 427 The first ``\lstinline{-}'' is a variable expression and the remaining ``\lstinline{-}'' are subscript expressions.} 428 expressions gives, firstly, a definition of @-[all]@, and secondly, a generalization of C's @-[i]@. 429 Where @all@ is redundant: 430 \begin{cquote} 431 \begin{tabular}{@{}ll@{}} 432 @x@ & 2-dimensional, want subscripts for coarse then fine \\ 433 @x[2]@ & 1-dimensional, want subscript for fine; lock coarse == 2 \\ 434 @x[2][all]@ & 1-dimensional, want subscript for fine \\ 435 @x[2][all][3]@ & 0-dimensional; lock fine == 3 224 436 \end{tabular} 225 226 \noindentWhere @all@ is effective:227 228 \begin{tabular}{ ll}229 @ a@ & 2-dimensional, want subscripts for coarse then fine \\230 @ a[all]@ & 2-dimensional, want subscripts for fine then coarse \\231 @ a[all][3]@ & 1-dimensional, want subscript for coarse; lock fine= 3 \\232 @ a[all][3][2]@ & 0-dimensional; lock coarse= 2437 \end{cquote} 438 Where @all@ is effective: 439 \begin{cquote} 440 \begin{tabular}{@{}ll@{}} 441 @x@ & 2-dimensional, want subscripts for coarse then fine \\ 442 @x[all]@ & 2-dimensional, want subscripts for fine then coarse \\ 443 @x[all][3]@ & 1-dimensional, want subscript for coarse; lock fine == 3 \\ 444 @x[all][3][2]@ & 0-dimensional; lock coarse == 2 233 445 \end{tabular} 234 446 \end{cquote} 235 447 The semantics of @-[all]@ is to dequeue from the front of the ``want subscripts'' list and re-enqueue at its back. 448 For example, in a two dimensional matrix, this semantics conceptually transposes the matrix by reversing the subscripts. 236 449 The semantics of @-[i]@ is to dequeue from the front of the ``want subscripts'' list and lock its value to be @i@. 237 450 238 Contiguous arrays, and slices of them, are all realized by the same underlying parameterized type. 239 It includes stride information in its metatdata. 240 The @-[all]@ operation is a conversion from a reference to one instantiation, to a reference to another instantiation. 451 Contiguous arrays, and slices of them, are all represented by the same underlying parameterized type, which includes stride information in its metatdata. 452 \PAB{Do not understand this sentence: The \lstinline{-[all]} operation is a conversion from a reference to one instantiation to a reference to another instantiation.} 241 453 The running example's @all@-effective step, stated more concretely, is: 242 243 \begin{tabular}{ ll}244 @ a@ & : 5 of ( 7 of float each spaced 1 float apart ) each spaced 7 floatsapart \\245 @ a[all]@ & : 7 of ( 5 of float each spaced 7 floats apart ) each spaced 1 floatapart454 \begin{cquote} 455 \begin{tabular}{@{}ll@{}} 456 @x@ & : 5 of ( 7 of @float@ each spaced 1 @float@ apart ) each spaced 7 @floats@ apart \\ 457 @x[all]@ & : 7 of ( 5 of @float@ each spaced 7 @float@s apart ) each spaced 1 @float@ apart 246 458 \end{tabular} 459 \end{cquote} 247 460 248 461 \begin{figure} 249 462 \includegraphics{measuring-like-layout} 250 \caption{Visualization of subscripting, by numeric value, and by \lstinline[language=CFA]{all}. 251 Here \lstinline[language=CFA]{x} has type \lstinline[language=CFA]{array( float, 5, 7 )}, understood as 5 rows by 7 columns. 252 The horizontal layout represents contiguous memory addresses while the vertical layout uses artistic license. 253 The vertical shaded band highlights the location of the targeted element, 2.3. 254 Any such vertical contains various interpretations of a single address.} 463 \caption{Visualization of subscripting by value and by \lstinline[language=CFA]{all}, for \lstinline{x} of type \lstinline{array( float, 5, 7 )} understood as 5 rows by 7 columns. 464 The horizontal layout represents contiguous memory addresses while the vertical layout is conceptual. 465 The vertical shaded band highlights the location of the targeted element, 2.3. 466 Any such vertical slice contains various interpretations of a single address.} 255 467 \label{fig:subscr-all} 256 468 \end{figure} 257 258 \noindent BEGIN: Paste looking for a home259 260 The world of multidimensional array implementation has, or abuts, four relevant levels of abstraction, highest to lowest:261 262 1, purpose:263 If you are doing linear algebra, you might call its dimensions, "column" and "row."264 If you are treating an acrostic poem as a grid of characters, you might say,265 the direction of reading the phrases vs the direction of reading the keyword.266 267 2, flexible-stride memory:268 assuming, from here on, a need to see/use contiguous memory,269 this model offers the ability to slice by (provide an index for) any dimension270 271 3, fixed-stride memory:272 this model offers the ability to slice by (provide an index for) only the coarsest dimension273 274 4, explicit-displacement memory:275 no awareness of dimensions, so no distinguishing them; just the ability to access memory at a distance from a reference point276 277 C offers style-3 arrays. Fortran, Matlab and APL offer style-2 arrays.278 Offering style-2 implies offering style-3 as a sub-case.279 My CFA arrays are style-2.280 281 Some debate is reasonable as to whether the abstraction actually goes $ 1 < \{2, 3\} < 4 $,282 rather than my numerically-ordered chain.283 According to the diamond view, styles 2 and 3 are at the same abstraction level, just with 3 offering a more limited set of functionality.284 The chain view reflects the design decision I made in my mission to offer a style-2 abstraction;285 I chose to build it upon a style-3 abstraction.286 (Justification of the decision follows, after the description of the design.)287 288 The following discussion first dispenses with API styles 1 and 4, then elaborates on my work with styles 2 and 3.289 290 Style 1 is not a concern of array implementations.291 It concerns documentation and identifier choices of the purpose-specific API.292 If one is offering a matrix-multiply function, one must specify which dimension(s) is/are being summed over293 (or rely on the familiar convention of these being the first argument's rows and second argument's columns).294 Some libraries offer a style-1 abstraction that is not directly backed by a single array295 (e.g. make quadrants contiguous, as may help cache coherence during a parallel matrix multiply),296 but such designs are out of scope for a discussion on arrays; they are applications of several arrays.297 I typically include style-1 language with examples to help guide intuition.298 299 It is often said that C has row-major arrays while Fortran has column-major arrays.300 This comparison brings an unhelpful pollution of style-1 thinking into issues of array implementation.301 Unfortunately, ``-major'' has two senses: the program's order of presenting indices and the array's layout in memory.302 (The program's order could be either lexical, as in @x[1,2,3]@ subscripting, or runtime, as in the @x[1][2][3]@ version.)303 Style 2 is concerned with introducing a nontrivial relationship between program order and memory order,304 while style 3 sees program order identical with memory order.305 Both C and (the style-3 subset of) Fortran actually use the same relationship here:306 an earlier subscript in program order controls coarser steps in memory.307 The job of a layer-2/3 system is to implement program-ordered subscripting according to a defined memory layout.308 C and Fortran do not use opposite orders in doing this job.309 Fortran is only ``backward'' in its layer-1 conventions for reading/writing and linear algebra.310 Fortran subscripts as $m(c,r)$. When I use style-1 language, I am following the C/mathematical convention of $m(r,c)$.311 312 Style 4 is the inevitable target of any array implementation.313 The hardware offers this model to the C compiler, with bytes as the unit of displacement.314 C offers this model to its programmer as pointer arithmetic, with arbitrary sizes as the unit.315 I consider casting a multidimensional array as a single-dimensional array/pointer,316 then using @x[i]@ syntax to access its elements, to be a form of pointer arithmetic.317 But style 4 is not offering arrays.318 319 Now stepping into the implementation320 of CFA's new type-3 multidimensional arrays in terms of C's existing type-2 multidimensional arrays,321 it helps to clarify that even the interface is quite low-level.322 A C/CFA array interface includes the resulting memory layout.323 The defining requirement of a type-3 system is the ability to slice a column from a column-finest matrix.324 The required memory shape of such a slice is set, before any discussion of implementation.325 The implementation presented here is how the CFA array library wrangles the C type system,326 to make it do memory steps that are consistent with this layout.327 TODO: do I have/need a presentation of just this layout, just the semantics of -[all]?328 469 329 470 Figure~\ref{fig:subscr-all} shows one element (in the shaded band) accessed two different ways: as @x[2][3]@ and as @x[all][3][2]@. … … 365 506 The subscript operator presents what's really inside, by casting to the type below the wedge of lie. 366 507 367 % Does x[all] have to lie too? The picture currently glosses over how it it adverti zes a size of 7 floats. I'm leaving that as an edge case benignly misrepresented in the picture. Edge cases only have to be handled right in the code.508 % Does x[all] have to lie too? The picture currently glosses over how it it advertises a size of 7 floats. I'm leaving that as an edge case benignly misrepresented in the picture. Edge cases only have to be handled right in the code. 368 509 369 510 Casting, overlapping and lying are unsafe. … … 392 533 393 534 The @arpk@ structure and its @-[i]@ operator are thus defined as: 394 \begin{lstlisting} 395 forall( ztype(N), // length of current dimension 396 dtype(S) | sized(S), // masquerading-as 397 dtype E_im, // immediate element, often another array 398 dtype E_base // base element, e.g. float, never array 399 ) { 400 struct arpk { 401 S strides[N]; // so that sizeof(this) is N of S 402 }; 403 404 // expose E_im, stride by S 405 E_im & ?[?]( arpk(N, S, E_im, E_base) & a, ptrdiff_t i ) { 406 return (E_im &) a.strides[i]; 407 } 408 } 409 \end{lstlisting} 535 \begin{cfa} 536 forall( ztype(N), $\C{// length of current dimension}$ 537 dtype(S) | sized(S), $\C{// masquerading-as}$ 538 dtype E_im, $\C{// immediate element, often another array}$ 539 dtype E_base $\C{// base element, e.g. float, never array}$ 540 ) { // distribute forall to each element 541 struct arpk { 542 S strides[N]; $\C{// so that sizeof(this) is N of S}$ 543 }; 544 // expose E_im, stride by S 545 E_im & ?[?]( arpk(N, S, E_im, E_base) & a, ptrdiff_t i ) { 546 return (E_im &) a.strides[i]; 547 } 548 } 549 \end{cfa} 410 550 411 551 An instantiation of the @arpk@ generic is given by the @array(E_base, N0, N1, ...)@ expansion, which is @arpk( N0, Rec, Rec, E_base )@, where @Rec@ is @array(E_base, N1, ...)@. … … 464 604 465 605 \begin{tabular}{rl} 466 C & @void f( size_t n, size_t m, float a[n][m] );@ \\606 C & @void f( size_t n, size_t m, float x[n][m] );@ \\ 467 607 Java & @void f( float[][] a );@ 468 608 \end{tabular} 469 609 470 Java's safety against undefined behaviour assures the callee that, if @ a@ is non-null, then @a.length@ is a valid access (say, evaluating to the number $\ell$) and if @i@ is in $[0, \ell)$ then @a[i]@ is a valid access.610 Java's safety against undefined behaviour assures the callee that, if @x@ is non-null, then @a.length@ is a valid access (say, evaluating to the number $\ell$) and if @i@ is in $[0, \ell)$ then @x[i]@ is a valid access. 471 611 If a value of @i@ outside this range is used, a runtime error is guaranteed. 472 612 In these respects, C offers no guarantees at all. 473 Notably, the suggestion that @n@ is the intended size of the first dimension of @ a@ is documentation only.474 Indeed, many might prefer the technically equivalent declarations @float a[][m]@ or @float (*a)[m]@ as emphasizing the ``no guarantees'' nature of an infrequently used language feature, over using the opportunity to explain a programmer intention.475 Moreover, even if @ a[0][0]@ is valid for the purpose intended, C's basic infamous feature is the possibility of an @i@, such that @a[i][0]@ is not valid for the same purpose, and yet, its evaluation does not produce an error.613 Notably, the suggestion that @n@ is the intended size of the first dimension of @x@ is documentation only. 614 Indeed, many might prefer the technically equivalent declarations @float x[][m]@ or @float (*a)[m]@ as emphasizing the ``no guarantees'' nature of an infrequently used language feature, over using the opportunity to explain a programmer intention. 615 Moreover, even if @x[0][0]@ is valid for the purpose intended, C's basic infamous feature is the possibility of an @i@, such that @x[i][0]@ is not valid for the same purpose, and yet, its evaluation does not produce an error. 476 616 477 617 Java's lack of expressiveness for more applied properties means these outcomes are possible: 478 618 \begin{itemize} 479 \item @ a[0][17]@ and @a[2][17]@ are valid accesses, yet @a[1][17]@ is a runtime error, because @a[1]@ is a null pointer480 \item the same observation, now because @ a[1]@ refers to an array of length 5481 \item execution times vary, because the @float@ values within @ a@ are sometimes stored nearly contiguously, and other times, not at all619 \item @x[0][17]@ and @x[2][17]@ are valid accesses, yet @x[1][17]@ is a runtime error, because @x[1]@ is a null pointer 620 \item the same observation, now because @x[1]@ refers to an array of length 5 621 \item execution times vary, because the @float@ values within @x@ are sometimes stored nearly contiguously, and other times, not at all 482 622 \end{itemize} 483 623 C's array has none of these limitations, nor do any of the ``array language'' comparators discussed in this section. 484 624 485 625 This Java level of safety and expressiveness is also exemplified in the C family, with the commonly given advice [TODO:cite example], for C++ programmers to use @std::vector@ in place of the C++ language's array, which is essentially the C array. 486 The advice is that, while a vector is also more powerful (and quirky) than an array, its capabilities include options to preallocate with an upfront size, to use an available bound-checked accessor (@a.at(i)@ in place of @ a[i]@), to avoid using @push_back@, and to use a vector of vectors.626 The advice is that, while a vector is also more powerful (and quirky) than an array, its capabilities include options to preallocate with an upfront size, to use an available bound-checked accessor (@a.at(i)@ in place of @x[i]@), to avoid using @push_back@, and to use a vector of vectors. 487 627 Used with these restrictions, out-of-bound accesses are stopped, and in-bound accesses never exercise the vector's ability to grow, which is to say, they never make the program slow to reallocate and copy, and they never invalidate the program's other references to the contained values. 488 628 Allowing this scheme the same referential integrity assumption that \CFA enjoys [TODO:xref], this scheme matches Java's safety and expressiveness exactly. … … 532 672 Futhark restricts these expressions syntactically to variables and constants, while a full-strength dependent system does not. 533 673 534 CFA's hybrid presentation, @forall( [N] )@, has @N@ belonging to the type system, yet has no instances.674 \CFA's hybrid presentation, @forall( [N] )@, has @N@ belonging to the type system, yet has no instances. 535 675 Belonging to the type system means it is inferred at a call site and communicated implicitly, like in Dex and unlike in Futhark. 536 676 Having no instances means there is no type for a variable @i@ that constrains @i@ to be in the range for @N@, unlike Dex, [TODO: verify], but like Futhark. … … 539 679 540 680 541 \section{Future Work}681 \section{Future work} 542 682 543 683 \subsection{Declaration syntax} … … 553 693 I will not discuss the core of this project, which has a tall mission already, to improve type safety, maintain appropriate C compatibility and offer more flexibility about storage use. 554 694 It also has a candidate stretch goal, to adapt \CFA's @forall@ generic system to communicate generalized enumerations: 555 \begin{ lstlisting}695 \begin{cfa} 556 696 forall( T | is_enum(T) ) 557 697 void show_in_context( T val ) { … … 565 705 enum weekday { mon, tue, wed = 500, thu, fri }; 566 706 show_in_context( wed ); 567 \end{ lstlisting}707 \end{cfa} 568 708 with output 569 \begin{ lstlisting}709 \begin{cfa} 570 710 mon 571 711 tue < ready … … 573 713 thu 574 714 fri 575 \end{ lstlisting}715 \end{cfa} 576 716 The details in this presentation aren't meant to be taken too precisely as suggestions for how it should look in \CFA. 577 717 But the example shows these abilities: … … 599 739 The structural assumptions required for the domain of an array in Dex are given by the trait (there, ``interface'') @Ix@, which says that the parameter @n@ is a type (which could take an argument like @weekday@) that provides two-way conversion with the integers and a report on the number of values. 600 740 Dex's @Ix@ is analogous the @is_enum@ proposed for \CFA above. 601 \begin{ lstlisting}741 \begin{cfa} 602 742 interface Ix n 603 604 605 606 \end{ lstlisting}743 get_size n : Unit -> Int 744 ordinal : n -> Int 745 unsafe_from_ordinal n : Int -> n 746 \end{cfa} 607 747 608 748 Dex uses this foundation of a trait (as an array type's domain) to achieve polymorphism over shapes. … … 616 756 In the photograph instantiation, it's the tuple-type of $ \langle \mathrm{img\_wd}, \mathrm{img\_ht} \rangle $. 617 757 This use of a tuple-as-index is made possible by the built-in rule for implementing @Ix@ on a pair, given @Ix@ implementations for its elements 618 \begin{ lstlisting}758 \begin{cfa} 619 759 instance {a b} [Ix a, Ix b] Ix (a & b) 620 621 622 760 get_size = \(). size a * size b 761 ordinal = \(i, j). (ordinal i * size b) + ordinal j 762 unsafe_from_ordinal = \o. 623 763 bs = size b 624 764 (unsafe_from_ordinal a (idiv o bs), unsafe_from_ordinal b (rem o bs)) 625 \end{ lstlisting}765 \end{cfa} 626 766 and by a user-provided adapter expression at the call site that shows how to indexing with a tuple is backed by indexing each dimension at a time 627 \begin{ lstlisting}767 \begin{cfa} 628 768 img_trans :: (img_wd,img_ht)=>Real 629 769 img_trans.(i,j) = img.i.j 630 770 result = pairwise img_trans 631 \end{ lstlisting}771 \end{cfa} 632 772 [TODO: cite as simplification of example from https://openreview.net/pdf?id=rJxd7vsWPS section 4] 633 773 … … 661 801 void * malloc( size_t ); 662 802 // C, user 663 struct tm * el1 = malloc( 803 struct tm * el1 = malloc( sizeof(struct tm) ); 664 804 struct tm * ar1 = malloc( 10 * sizeof(struct tm) ); 665 805
Note: See TracChangeset
for help on using the changeset viewer.