Changeset caa3e2c
 Timestamp:
 Jul 29, 2024, 9:49:07 AM (3 months ago)
 Branches:
 master
 Children:
 b67d7a5b
 Parents:
 b8e047a
 Location:
 doc/theses/mike_brooks_MMath
 Files:

 4 edited
Legend:
 Unmodified
 Added
 Removed

doc/theses/mike_brooks_MMath/array.tex
rb8e047a rcaa3e2c 6 6 7 7 Arrays in C are possible the single most misunderstood and incorrectly used features in the language, resulting in the largest proportion of runtime errors and security violations. 8 This chapter describes the new \CFA language and library features that introduce a lengthchecked arraytype to the \CFA standard library~\cite{Cforall}, \eg: 9 \begin{cfa} 10 @array( float, 99 )@ x; $\C{// x contains 99 floats}$ 8 This chapter describes the new \CFA language and library features that introduce a lengthchecked arraytype to the \CFA standard library~\cite{Cforall}. 9 10 Specifically, a new \CFA array is declared: 11 \begin{cfa} 12 @array( float, 99 )@ x; $\C[2.75in]{// x contains 99 floats}$ 13 \end{cfa} 14 using generic type @array@ with arguments @float@ and @99@. 15 A function @f@ is declared with an @array@ parameter of length @42@. 16 \begin{cfa} 11 17 void f( @array( float, 42 )@ & p ) {} $\C{// p accepts 42 floats}$ 12 18 f( x ); $\C{// statically rejected: types are different, 99 != 42}$ 13 19 14 forall( T, [N] ) 15 void g( @array( T, N )@ & p, int i ) { 20 test2.cfa:3:1 error: Invalid application of existing declaration(s) in expression. 21 Applying untyped: Name: f ... to: Name: x 22 \end{cfa} 23 The call @f( x )@ is invalid because the @array@ lengths @99@ and @42@ do not match. 24 25 Next, function @g@ introduces a @forall@ prefix on type parameter @T@ and arbitrary \emph{dimension parameter} @N@, the new feature that represents a count of elements managed by the type system. 26 \begin{cfa} 27 forall( T, @[N]@ ) 28 void g( array( T, @N@ ) & p, int i ) { 16 29 T elem = p[i]; $\C{// dynamically checked: requires 0 <= i < N}$ 17 30 } 18 31 g( x, 0 ); $\C{// T is float, N is 99, dynamic subscript check succeeds}$ 19 g( x, 1000 ); $\C{// T is float, N is 99, dynamic subscript check fails}$ 20 \end{cfa} 21 This example declares variable @x@, with generic type @array@ using arguments @float@ and @99@. 22 Function @f@ is declared with an @array@ parameter of length @42@. 23 The call @f( x )@ is invalid because the @array@ lengths @99@ and @42@ do not match. 24 Next, function @g@ introduces a @forall@ prefix on type parameter @T@ and arbitrary \emph{dimension parameter} @N@, the new feature that represents a count of elements managed by the type system. 32 g( x, 1000 ); $\C{// T is float, N is 99, dynamic subscript check fails}\CRT$ 33 34 Cforall Runtime error: subscript 1000 exceeds dimension range [0,99) $for$ array 0x555555558020. 35 \end{cfa} 25 36 The call @g( x, 0 )@ is valid because @g@ accepts any length of array, where the type system infers @float@ for @T@ and length @99@ for @N@. 26 37 Inferring values for @T@ and @N@ is implicit without programmer involvement. … … 35 46 forall( [N] ) 36 47 void declDemo() { 37 float x1[N]; $\C{// builtin type ("C array")}$38 array(float, N) x2; $\C{// type from library}$48 float x1[N]; $\C{// builtin type ("C array")}$ 49 array(float, N) x2; $\C{// type from library}$ 39 50 } 40 51 \end{cfa} 41 52 Both of the locallydeclared array variables, @x1@ and @x2@, have 42 elements, each element being a @float@. 42 53 The two variables have identical size and layout; they both encapsulate 42float, stack \vs heap allocations with no additional ``bookkeeping'' allocations or headers. 43 Providing this explicit generic approach require da significant extension to the \CFA type system to support a fullfeature, safe, efficient (space and time) arraytype, which forms the foundation for more complex array forms in \CFA.54 Providing this explicit generic approach requires a significant extension to the \CFA type system to support a fullfeature, safe, efficient (space and time) arraytype, which forms the foundation for more complex array forms in \CFA. 44 55 45 56 Admittedly, the @array@ library type (type for @x2@) is syntactically different from its C counterpart. 46 57 A future goal (TODO xref) is to provide a builtin array type with syntax approaching C's (type for @x1@); 47 58 then, the library @array@ type can be removed giving \CFA a largely uniform array type. 48 At present, the builtin arrayis only partially supported, so the generic @array@ is used exclusively in the discussion;59 At present, the C syntax @array@ is only partially supported, so the generic @array@ is used exclusively in the discussion; 49 60 feature support and C compatibility are revisited in Section ? TODO. 50 61 51 Offering an @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (gcc) programs can be compiled by \CFA with only a small number of changes, similar to \CC (g++).62 Offering the @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (@gcc@) programs can be compiled by \CFA with only a small number of changes, similar to \CC (@g++@). 52 63 However, a few compatibilitybreaking changes to the behaviour of the C array are necessary, both as an implementation convenience and to fix C's lax treatment of arrays. 53 64 Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity introduced by the C array TODO xref. 54 65 55 My contributions are:66 My contributions in this chapter are: 56 67 \begin{enumerate} 57 68 \item A type system enhancement that lets polymorphic functions and generic types be parameterized by a numeric value: @forall( [N] )@. … … 100 111 \end{figure} 101 112 102 \VRef[Figure]{f:fHarness} shows a harness that uses the @f@ function illustratinghow dynamic values are fed into the @array@ type.103 Here, the dimension of the @x@, @y@, and @result@ arrays is specified from a commandline valueand these arrays are allocated on the stack.113 \VRef[Figure]{f:fHarness} shows a harness that uses function @f@ to illustrate how dynamic values are fed into the @array@ type. 114 Here, the dimension of arrays @x@, @y@, and @result@ is specified from a commandline value, @dim@, and these arrays are allocated on the stack. 104 115 Then the @x@ array is initialized with decreasing values, and the @y@ array with amounts offset by constant @0.005@, giving relative differences within tolerance initially and diverging for later values. 105 116 The program main is run (see figure bottom) with inputs @5@ and @7@ for sequence lengths. 106 The loops follow the familiar pattern of using the variable @ n@ to iterate through the arrays.107 Most importantly, the type system implicitly captures @ n@ at the call of @f@ and makes it available throughout @f@ as @N@.108 The example shows @ n@ adapting into a typesystem managed length at the declarations of @x@, @y@, and @result@, @N@ adapting in the same way at @f@'s loop bound, and a passthru use of @n@ at @f@'s declaration of @ret@.117 The loops follow the familiar pattern of using the variable @dim@ to iterate through the arrays. 118 Most importantly, the type system implicitly captures @dim@ at the call of @f@ and makes it available throughout @f@ as @N@. 119 The example shows @dim@ adapting into a typesystem managed length at the declarations of @x@, @y@, and @result@, @N@ adapting in the same way at @f@'s loop bound, and a passthru use of @dim@ at @f@'s declaration of @ret@. 109 120 Except for the lifetimemanagement issue of @result@, \ie explicit @free@, this program has eliminated both the syntactic and semantic problems associated with C arrays and their usage. 110 121 These benefits cannot be underestimated. … … 141 152 \CC has a (mistaken) belief that references are not objects, but pointers are objects. 142 153 In the \CC example, the arrays fall back on C arrays, which have a duality with references with respect to automatic dereferencing. 143 The \CFA array is a contiguous object with an address, which can stored as a reference or pointer.154 The \CFA array is a contiguous object with an address, which can be stored as a reference or pointer. 144 155 \item 145 156 C/\CC arrays cannot be copied, while \CFA arrays can be copied, making them a firstclass object (although array copy is often avoided for efficiency). … … 151 162 152 163 @template< typename T, size_t N >@ 153 void copy( T ret[ N], T x[N] ) {164 void copy( T ret[@N@], T x[@N@] ) { 154 165 for ( int i = 0; i < N; i += 1 ) ret[i] = x[i]; 155 166 } … … 167 178 int main() { 168 179 @forall( T, [N] )@ // nested function 169 void copy( array( T, N ) & ret, array( T, N) & x ) {180 void copy( array( T, @N@ ) & ret, array( T, @N@ ) & x ) { 170 181 for ( i; 10 ) ret[i] = x[i]; 171 182 } … … 185 196 186 197 Continuing the discussion of \VRef[Figure]{f:fHarness}, the example has @f@ expecting two arrays of the same length. 187 A compiletime error occurs when attempting to call @f@ with arrays of differing lengths.198 As stated previous, a compiletime error occurs when attempting to call @f@ with arrays of differing lengths. 188 199 % removing leading whitespace 189 200 \lstinput[tabsize=1]{5253}{helloarray.cfa} 190 201 \lstinput[tabsize=1,aboveskip=0pt]{6264}{helloarray.cfa} 191 As is common practice in C, the programmer is free to cast, \ieto assert knowledge not shared with the type system.202 C allows casting to assert knowledge not shared with the type system. 192 203 \lstinput{7074}{helloarray.cfa} 193 204 … … 197 208 \lstinput{1015}{helloaccordion.cfa} 198 209 This structure's layout has the starting offset of @municipalities@ varying in @NprovTerty@, and the offset of @total_pt@ and @total_mun@ varying in both generic parameters. 199 For a function that operates on a @Can adaPop@ structure, the type system handles this variation transparently.210 For a function that operates on a @CanPop@ structure, the type system handles this variation transparently. 200 211 \lstinput{4045}{helloaccordion.cfa} 201 \VRef[Figure]{f:checkHarness} shows program results where different offset values being used. 202 The output values show that @summarize@ and its caller agree on both the offsets (where the callee starts reading @cost_contribs@ and where the callee writes @total_cost@). 203 Yet the call site just says, ``pass the request.'' 212 \VRef[Figure]{f:checkHarness} shows the @CanPop@ harness and results with different array sizes, if the municipalities changed after a census. 204 213 205 214 \begin{figure} 206 215 \lstinput{6068}{helloaccordion.cfa} 207 \lstinput{707 2}{helloaccordion.cfa}216 \lstinput{7075}{helloaccordion.cfa} 208 217 \caption{\lstinline{check} Harness} 209 218 \label{f:checkHarness} … … 232 241 In general, storage layout is hidden by subscripting, and only appears when passing arrays among different programming languages or accessing specific hardware. 233 242 234 \VRef[Figure]{f:FixedVariable} shows two C90 approaches for manipulating contiguous arrays.243 \VRef[Figure]{f:FixedVariable} shows two C90 approaches for manipulating a contiguous matrix. 235 244 Note, C90 does not support VLAs. 236 The fixeddimension approach uses the type system;245 The fixeddimension approach (left) uses the type system; 237 246 however, it requires all dimensions except the first to be specified at compile time, \eg @m[][6]@, allowing all subscripting stride calculations to be generated with constants. 238 247 Hence, every matrix passed to @fp1@ must have exactly 6 columns but the row size can vary. 239 The variabledimension approach ignores (violates) the type system, \ie argument and parameters types do not match, and manually performs pointer arithmetic for subscriptingin the macro @sub@.248 The variabledimension approach (right) ignores (violates) the type system, \ie argument and parameters types do not match, and subscripting is performed manually using pointer arithmetic in the macro @sub@. 240 249 241 250 \begin{figure} … … 258 267 ... printf( "%d ", @sub( m, r, c )@ ); ... 259 268 } 260 int vm1[ 4][4], vm2[6][8]; // no VLA269 int vm1[@4@][@4@], vm2[@6@][@8@]; // no VLA 261 270 // initialize matrixes 262 271 fp2( 4, 4, vm1 ); … … 290 299 The language decides if the matrix and arrayofarray are laid out the same or differently. 291 300 For example, an arrayofarray may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (triangular matrix). 292 Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though the two array forms could be differentiated at the subscript level, \eg @m[1,2]@ \vs @aa[1][2]@.301 Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though a language could differentiated between the two forms using subscript syntax, \eg @m[1,2]@ \vs @aa[1][2]@. 293 302 Nevertheless, controlling memory layout can make a difference in what operations are allowed and in performance (caching/NUMA effects). 294 303 … … 301 310 The focus of this work is on the contiguous multidimensional arrays in C. 302 311 The reason is that programmers are often forced to use the more complex arrayofarray form when a contiguous array would be simpler, faster, and safer. 303 Nevertheless, the C arrayofarray form continues to be usefulfor special circumstances.312 Nevertheless, the C arrayofarray form is still important for special circumstances. 304 313 305 314 \VRef[Figure]{f:ContiguousNoncontiguous} shows the extensions made in C99 for manipulating contiguous \vs noncontiguous arrays.\footnote{C90 also supported noncontiguous arrays.} … … 313 322 While this contiguousarray capability is a step forward, it is still the programmer's responsibility to manually manage the number of dimensions and their sizes, both at the function definition and call sites. 314 323 That is, the array does not automatically carry its structure and sizes for use in computing subscripts. 315 While the noncontiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknownsized array of row pointers, and it relies on the programmer to traverse the columns in a row correctly .324 While the noncontiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknownsized array of row pointers, and it relies on the programmer to traverse the columns in a row correctly with a correctly bounded loop index. 316 325 Specifically, there is no requirement that the rows are the same length, like a poem with different length lines. 317 326 … … 365 374 this model has no awareness of dimensions just the ability to access memory at a distance from a reference point (basedisplacement addressing), \eg @x + 23@ or @x[23}@ $\Rightarrow$ 23rd element from the start of @x@. 366 375 A programmer must manually build any notion of dimensions using other tools; 367 hence, this style is not offering multidimensional arrays \see{\VRef[Figure]{f:FixedVariable} }.376 hence, this style is not offering multidimensional arrays \see{\VRef[Figure]{f:FixedVariable} right example}. 368 377 \end{enumerate} 369 378 … … 381 390 A C/\CFA array interface includes the resulting memory layout. 382 391 The defining requirement of a type2 system is the ability to slice a column from a columnfinest matrix. 383 The required memory shape of such a slice is set, before any discussion of implementation.392 The required memory shape of such a slice is fixed, before any discussion of implementation. 384 393 The implementation presented here is how the \CFA array library wrangles the C type system, to make it do memory steps that are consistent with this layout. 385 394 TODO: do I have/need a presentation of just this layout, just the semantics of [all]? … … 389 398 Beyond what C's array type offers, the new array brings direct support for working with a noncontiguous array slice, allowing a program to work with dimension subscripts given in a nonphysical order. 390 399 391 The following examples use an @array( float, 5, 7) m@, loaded with values incremented by $0.1$, when stepping across the length7 finelystrided column dimension, and stepping across the length5 coarselystrided row dimension.392 \par \noindent400 The following examples use the matrix declaration @array( float, 5, 7 ) m@, loaded with values incremented by $0.1$, when stepping across the length7 finelystrided column dimension, and stepping across the length5 coarselystrided row dimension. 401 \par 393 402 \mbox{\lstinput{121126}{hellomd.cfa}} 394 403 \par\noindent … … 405 414 Specifically, declaring the parameter @r@ with type @array@ means that @r@ is contiguous, which is unnecessarily restrictive. 406 415 That is, @r@ need only be of a container type that offers a subscript operator (of type @ptrdiff_t@ $\rightarrow$ @float@) with managed length @N@. 407 The newarray library provides the trait @ ix@, sodefined.416 The newarray library provides the trait @ar@, sodefined. 408 417 With it, the original declaration can be generalized with the same body. 409 418 \lstinput{4344}{hellomd.cfa} 410 419 \lstinput[aboveskip=0pt]{145145}{hellomd.cfa} 411 420 The nontrivial slicing in this example now allows passing a \emph{noncontiguous} slice to @print1d@, where the newarray library provides a ``subscript by all'' operation for this purpose. 412 In a multidimensional subscript operation, any dimension given as @all@ is a placeholder, \ie ``not yet subscripted by a value'', waiting for such a value, implementing the @ ix@ trait.421 In a multidimensional subscript operation, any dimension given as @all@ is a placeholder, \ie ``not yet subscripted by a value'', waiting for such a value, implementing the @ar@ trait. 413 422 \lstinput{150151}{hellomd.cfa} 414 423 … … 471 480 In both cases, value 2 selects from the coarser dimension (rows of @x@), 472 481 while the value 3 selects from the finer dimension (columns of @x@). 473 The figure illustrates the value of each subexpression, comparing how numeric subscripting proceeds from @x@, vs from @x[all]@.482 The figure illustrates the value of each subexpression, comparing how numeric subscripting proceeds from @x@, \vs from @x[all]@. 474 483 Proceeding from @x@ gives the numeric indices as coarse then fine, while proceeding from @x[all]@ gives them fine then coarse. 475 484 These two starting expressions, which are the example's only multidimensional subexpressions 476 485 (those that received zero numeric indices so far), are illustrated with vertical steps where a \emph{first} numeric index would select. 477 486 478 The figure's presentation offers an intuition answering ,What is an atomic element of @x[all]@?479 From there, @x[all]@ itself is simply a twodimensional array, in the strict C sense, of these strangebuilding blocks.487 The figure's presentation offers an intuition answering to: What is an atomic element of @x[all]@? 488 From there, @x[all]@ itself is simply a twodimensional array, in the strict C sense, of these building blocks. 480 489 An atom (like the bottommost value, @x[all][3][2]@), is the contained value (in the square box) 481 490 and a lie about its size (the wedge above it, growing upward). 482 An array of these atoms (like the intermediate @x[all][3]@) is just a contiguous arrangement of them, 483 done according to their size, as announced. Call such an array a column. 484 A column is almost ready to be arranged into a matrix; it is the \emph{contained value} of the nextlevel building block, 485 but another lie about size is required. 486 At first, an atom needed to be arranged as if it were bigger, 487 but now a column needs to be arranged as if it is smaller (the wedge above it, shrinking upward). 491 An array of these atoms (like the intermediate @x[all][3]@) is just a contiguous arrangement of them, done according to their size; 492 call such an array a column. 493 A column is almost ready to be arranged into a matrix; 494 it is the \emph{contained value} of the nextlevel building block, but another lie about size is required. 495 At first, an atom needs to be arranged as if it were bigger, but now a column needs to be arranged as if it is smaller (the wedge above it, shrinking upward). 488 496 These lying columns, arranged contiguously according to their size (as announced) form the matrix @x[all]@. 489 Because @x[all]@ takes indices, first for the fine stride, then for the coarse stride, 490 it achieves the requirement of representing the transpose of @x@. 491 Yet every time the programmer presents an index, a mere Carray subscript is achieving the offset calculation. 497 Because @x[all]@ takes indices, first for the fine stride, then for the coarse stride, it achieves the requirement of representing the transpose of @x@. 498 Yet every time the programmer presents an index, a Carray subscript is achieving the offset calculation. 492 499 493 500 In the @x[all]@ case, after the finely strided subscript is done (column 3 is selected), … … 495 502 compared with where analogous rows appear when the rowlevel option is presented for @x@. 496 503 497 These size lies create an appearance of overlap.498 For example, in @x[all]@, the shaded band touches atoms 2.0, 2.1, 2.2, 2.3, 1.4, 1.5 and 1.6.504 \PAB{I don't understand this paragraph: These size lies create an appearance of overlap. 505 For example, in \lstinline{x[all]}, the shaded band touches atoms 2.0, 2.1, 2.2, 2.3, 1.4, 1.5 and 1.6. 499 506 But only the atom 2.3 is storing its value there. 500 The rest are lying about (conflicting) claims on this location, but never exercising these alleged claims. 507 The rest are lying about (conflicting) claims on this location, but never exercising these alleged claims.} 501 508 502 509 Lying is implemented as casting. … … 504 511 This structure uses one type in its internal field declaration and offers a different type as the return of its subscript operator. 505 512 The field within is a plainC array of the fictional type, which is 7 floats long for @x[all][3][2]@ and 1 float long for @x[all][3]@. 506 The subscript operator presents what 's really inside, by casting to the type below the wedge oflie.513 The subscript operator presents what is really inside, by casting to the type below the wedge of the lie. 507 514 508 515 % Does x[all] have to lie too? The picture currently glosses over how it it advertises a size of 7 floats. I'm leaving that as an edge case benignly misrepresented in the picture. Edge cases only have to be handled right in the code. 509 516 510 Casting, overlapping and lying are unsafe. 511 The mission here is to implement a style2 feature that the type system helps the programmer use safely. 512 The offered style2 system is allowed to be internally unsafe, 513 just as C's implementation of a style3 system (upon a style4 system) is unsafe within, 514 even when the programmer is using it without casts or pointer arithmetic. 515 Having a style2 system relieves the programmer from resorting to unsafe pointer arithmetic when working with noncontiguous slices. 516 517 The choice to implement this style2 system upon C's style3 arrays, rather than its style4 pointer arithmetic, 518 reduces the attack surface of unsafe code. 519 My casting is unsafe, but I do not do any pointer arithmetic. 520 When a programmer works in the commoncase style3 subset (in the no@[all]@ top of Figure~\ref{fig:subscrall}), 521 my casts are identities, and the C compiler is doing its usual displacement calculations. 522 If I had implemented my system upon style4 pointer arithmetic, 523 then this common case would be circumventing C's battlehardened displacement calculations in favour of my own. 524 525 \noindent END: Paste looking for a home 517 Casting, overlapping, and lying are unsafe. 518 The mission is to implement a style1 feature in the type system for safe use by a programmer. 519 The offered style1 system is allowed to be internally unsafe, 520 just as C's implementation of a style2 system (upon a style3 system) is unsafe within, even when the programmer is using it without casts or pointer arithmetic. 521 Having a style1 system relieves the programmer from resorting to unsafe pointer arithmetic when working with noncontiguous slices. 522 523 % PAB: repeat from previous paragraph. 524 % The choice to implement this style1 system upon C's style2 arrays, rather than its style3 pointer arithmetic, reduces the attack surface of unsafe code. 525 % My casting is unsafe, but I do not do any pointer arithmetic. 526 % When a programmer works in the commoncase style2 subset (in the no@[all]@ top of Figure~\ref{fig:subscrall}), my casts are identities, and the C compiler is doing its usual displacement calculations. 527 % If I had implemented my system upon style3 pointer arithmetic, then this common case would be circumventing C's battlehardened displacement calculations in favour of my own. 528 529 % \noindent END: Paste looking for a home 526 530 527 531 The newarray library defines types and operations that ensure proper elements are accessed soundly in spite of the overlapping. 528 The private @arpk@ structure (array with explicit packing) is generic over these two types (and more): the contained element, what it is masquerading as. 529 This structure's public interface is the @array(...)@ construction macro and the two subscript operators. 530 Construction by @array@ initializes the masqueradingas type information to be equal to the containedelement information. 531 Subscripting by @all@ rearranges the order of masqueradingas types to achieve, in general, nontrivial striding. 532 Subscripting by a number consumes the masqueradingas size of the contained element type, does normal array stepping according to that size, and returns there element found there, in unmasked form. 533 534 The @arpk@ structure and its @[i]@ operator are thus defined as: 535 \begin{cfa} 536 forall( ztype(N), $\C{// length of current dimension}$ 537 dtype(S)  sized(S), $\C{// masqueradingas}$ 538 dtype E_im, $\C{// immediate element, often another array}$ 539 dtype E_base $\C{// base element, e.g. float, never array}$ 532 The @arpk@ structure and its @[i]@ operator are defined as: 533 \begin{cfa} 534 forall( 535 [N], $\C{// length of current dimension}$ 536 S &  sized(S), $\C{// masqueradingas}$ 537 Timmed &, $\C{// immediate element, often another array}$ 538 Tbase & $\C{// base element, e.g. float, never array}$ 540 539 ) { // distribute forall to each element 541 540 struct arpk { 542 541 S strides[N]; $\C{// so that sizeof(this) is N of S}$ 543 542 }; 544 // expose E_im, stride by S 545 E_im & ?[?]( arpk(N, S, E_im, E_base) & a, ptrdiff_t i ) { 546 return (E_im &) a.strides[i]; 543 // expose Timmed, stride by S 544 static inline Timmed & ?[?]( arpk( N, S, Timmed, Tbase ) & a, long int i ) { 545 subcheck( a, i, 0, N ); 546 return (Timmed &)a.strides[i]; 547 547 } 548 548 } 549 549 \end{cfa} 550 The private @arpk@ structure (array with explicit packing) is generic over four types: dimension length, masqueradingas, ... 551 This structure's public interface is hidden behind the @array(...)@ macro and the subscript operator. 552 Construction by @array@ initializes the masqueradingas type information to be equal to the containedelement information. 553 Subscripting by @all@ rearranges the order of masqueradingas types to achieve, in general, nontrivial striding. 554 Subscripting by a number consumes the masqueradingas size of the contained element type, does normal array stepping according to that size, and returns there element found there, in unmasked form. 550 555 551 556 An instantiation of the @arpk@ generic is given by the @array(E_base, N0, N1, ...)@ expansion, which is @arpk( N0, Rec, Rec, E_base )@, where @Rec@ is @array(E_base, N1, ...)@. … … 568 573 This section provides a demonstration of the effect. 569 574 570 The experiment compares the \CFA array system with the paddedroom system [TODO:xref] most typically exemplified by Java arrays, but also reflected in the C++pattern where restricted vector usage models a checked array.575 The experiment compares the \CFA array system with the paddedroom system [TODO:xref] most typically exemplified by Java arrays, but also reflected in the \CC pattern where restricted vector usage models a checked array. 571 576 The essential feature of this paddedroom system is the onetoone correspondence between array instances and the symbolic bounds on which dynamic checks are based. 572 The experiment compares with the C++version to keep access to generated assembly code simple.573 574 As a control case, a simple loop (with no reused dimension sizes) is seen to get the same optimization treatment in both the \CFA and C++versions.577 The experiment compares with the \CC version to keep access to generated assembly code simple. 578 579 As a control case, a simple loop (with no reused dimension sizes) is seen to get the same optimization treatment in both the \CFA and \CC versions. 575 580 When the programmer treats the array's bound correctly (making the subscript ``obviously fine''), no dynamic bound check is observed in the program's optimized assembly code. 576 581 But when the bounds are adjusted, such that the subscript is possibly invalid, the bound check appears in the optimized assembly, ready to catch an occurrence the mistake. … … 589 594 \section{Comparison with other arrays} 590 595 596 597 \subsection{Rust} 598 591 599 \CFA's array is the first lightweight application of dependentlytyped bound tracking to an extension of C. 592 Other extensions of C that apply dependentlytyped bound tracking are heavyweight, in that the bound tracking is part of a linearly typed ownership system thatfurther helps guarantee statically the validity of every pointer deference.600 Other extensions of C that apply dependentlytyped bound tracking are heavyweight, in that the bound tracking is part of a linearlytyped ownershipsystem, which further helps guarantee statically the validity of every pointer deference. 593 601 These systems, therefore, ask the programmer to convince the type checker that every pointer dereference is valid. 594 602 \CFA imposes the lighterweight obligation, with the more limited guarantee, that initiallydeclared bounds are respected thereafter. … … 598 606 The \CFA array, applied to accordion structures [TOD: crossreference] \emph{implies} the necessary pointer arithmetic, generated automatically, and not appearing at all in a user's program. 599 607 600 \subsection{Safety in a padded room} 601 602 Java's array [TODO:cite] is a straightforward example of assuring safety against undefined behaviour, at a cost of expressiveness for more applied properties. 603 Consider the array parameter declarations in: 604 608 609 \subsection{Java} 610 611 Java arrays are arraysofarrays because all objects are references \see{\VRef{toc:mdimpl}}. 612 For each array, Java implicitly storages the array dimension in a descriptor, supporting array length, subscript checking, and allowing dynamicallysized arrayparameter declarations. 613 \begin{cquote} 605 614 \begin{tabular}{rl} 606 615 C & @void f( size_t n, size_t m, float x[n][m] );@ \\ 607 Java & @void f( float [][] a);@616 Java & @void f( float x[][] );@ 608 617 \end{tabular} 609 610 Java's safety against undefined behaviour assures the callee that, if @x@ is nonnull, then @a.length@ is a valid access (say, evaluating to the number $\ell$) and if @i@ is in $[0, \ell)$ then @x[i]@ is a valid access. 611 If a value of @i@ outside this range is used, a runtime error is guaranteed. 612 In these respects, C offers no guarantees at all. 613 Notably, the suggestion that @n@ is the intended size of the first dimension of @x@ is documentation only. 614 Indeed, many might prefer the technically equivalent declarations @float x[][m]@ or @float (*a)[m]@ as emphasizing the ``no guarantees'' nature of an infrequently used language feature, over using the opportunity to explain a programmer intention. 615 Moreover, even if @x[0][0]@ is valid for the purpose intended, C's basic infamous feature is the possibility of an @i@, such that @x[i][0]@ is not valid for the same purpose, and yet, its evaluation does not produce an error. 616 617 Java's lack of expressiveness for more applied properties means these outcomes are possible: 618 \begin{itemize} 619 \item @x[0][17]@ and @x[2][17]@ are valid accesses, yet @x[1][17]@ is a runtime error, because @x[1]@ is a null pointer 620 \item the same observation, now because @x[1]@ refers to an array of length 5 621 \item execution times vary, because the @float@ values within @x@ are sometimes stored nearly contiguously, and other times, not at all 622 \end{itemize} 623 C's array has none of these limitations, nor do any of the ``array language'' comparators discussed in this section. 624 625 This Java level of safety and expressiveness is also exemplified in the C family, with the commonly given advice [TODO:cite example], for C++ programmers to use @std::vector@ in place of the C++ language's array, which is essentially the C array. 626 The advice is that, while a vector is also more powerful (and quirky) than an array, its capabilities include options to preallocate with an upfront size, to use an available boundchecked accessor (@a.at(i)@ in place of @x[i]@), to avoid using @push_back@, and to use a vector of vectors. 627 Used with these restrictions, outofbound accesses are stopped, and inbound accesses never exercise the vector's ability to grow, which is to say, they never make the program slow to reallocate and copy, and they never invalidate the program's other references to the contained values. 628 Allowing this scheme the same referential integrity assumption that \CFA enjoys [TODO:xref], this scheme matches Java's safety and expressiveness exactly. 629 [TODO: decide about going deeper; some of the Java expressiveness concerns have mitigations, up to even more tradeoffs.] 618 \end{cquote} 619 However, in the C prototype, the parameters @n@ and @m@ are documentation only as the intended size of the first and second dimension of @x@. 620 \VRef[Figure]{f:JavaVsCTriangularMatrix} compares a triangular matrix (arrayofarrays) in dynamically safe Java to unsafe C. 621 Each dynamically sized row in Java stores its dimension, while C requires the programmer to manage these sizes explicitly (@rlnth@). 622 All subscripting is Java has bounds checking, while C has none. 623 Both Java and C require explicit null checking, otherwise there is a runtime failure. 624 625 \begin{figure} 626 \setlength{\tabcolsep}{15pt} 627 \begin{tabular}{ll@{}} 628 \begin{java} 629 int m[][] = { // triangular matrix 630 new int [4], 631 new int [3], 632 new int [2], 633 new int [1], 634 null 635 }; 636 637 for ( int r = 0; r < m.length; r += 1 ) { 638 if ( m[r] == null ) continue; 639 for ( int c = 0; c < m[r].length; c += 1 ) { 640 m[r][c] = c + r; // subscript checking 641 } 642 643 } 644 645 for ( int r = 0; r < m.length; r += 1 ) { 646 if ( m[r] == null ) { 647 System.out.println( "null row" ); 648 continue; 649 } 650 for ( int c = 0; c < m[r].length; c += 1 ) { 651 System.out.print( m[r][c] + " " ); 652 } 653 System.out.println(); 654 655 } 656 \end{java} 657 & 658 \begin{cfa} 659 int * m[5] = { // triangular matrix 660 calloc( 4, sizeof(int) ), 661 calloc( 3, sizeof(int) ), 662 calloc( 2, sizeof(int) ), 663 calloc( 1, sizeof(int) ), 664 NULL 665 }; 666 int rlnth = 4; 667 for ( int r = 0; r < 5; r += 1 ) { 668 if ( m[r] == NULL ) continue; 669 for ( int c = 0; c < rlnth; c += 1 ) { 670 m[r][c] = c + r; // no subscript checking 671 } 672 rlnth = 1; 673 } 674 rlnth = 4; 675 for ( int r = 0; r < 5; r += 1 ) { 676 if ( m[r] == NULL ) { 677 printf( "null row\n" ); 678 continue; 679 } 680 for ( int c = 0; c < rlnth; c += 1 ) { 681 printf( "%d ", m[r][c] ); 682 } 683 printf( "\n" ); 684 rlnth = 1; 685 } 686 \end{cfa} 687 \end{tabular} 688 \caption{Java (left) \vs C (right) Triangular Matrix} 689 \label{f:JavaVsCTriangularMatrix} 690 \end{figure} 691 692 The downside of the arraysofarrays approach is performance due to pointer chasing versus pointer arithmetic for a contiguous arrays. 693 Furthermore, there is the cost of managing the implicit array descriptor. 694 It is unlikely that a JIT can dynamically rewrite an arraysofarrays form into a contiguous form. 695 696 697 \subsection{\CC} 698 699 Because C arrays are difficult and dangerous, the mantra for \CC programmers is to use @std::vector@ in place of the C array. 700 While the vector size can grow and shrink dynamically, \vs a fixedsize dynamic size with VLAs, the cost of this extra feature is mitigated by preallocating the maximum size (like the VLA) at the declaration (one dynamic call) to avoid using @push_back@. 701 \begin{c++} 702 vector< vector< int > > m( 5, vector<int>(8) ); // initialize size of 5 x 8 with 6 dynamic allocations 703 \end{c++} 704 Multidimensional arrays are arraysofarrays with associated costs. 705 Each @vector@ array has an array descriptor contain the dimension, which allows bound checked using @x.at(i)@ in place of @x[i]@. 706 Used with these restrictions, outofbound accesses are caught, and inbound accesses never exercise the vector's ability to grow, preventing costly reallocate and copy, and never invalidate references to contained values. 707 This scheme matches Java's safety and expressiveness exactly, but with the inherent costs. 708 630 709 631 710 \subsection{Levels of dependently typed arrays} … … 655 734 it can also do these other cool checks, but watch how I can mess with its conservativeness and termination 656 735 657 Two current, stateoftheart array languages, Dex\cite{arr:dex:long} and Futhark\cite{arr:futhark:tytheory}, offer offernovel contributions concerning similar, restricted dependent types for tracking array length.736 Two current, stateoftheart array languages, Dex\cite{arr:dex:long} and Futhark\cite{arr:futhark:tytheory}, offer novel contributions concerning similar, restricted dependent types for tracking array length. 658 737 Unlike \CFA, both are garbagecollected functional languages. 659 738 Because they are garbagecollected, referential integrity is builtin, meaning that the heavyweight analysis, that \CFA aims to avoid, is unnecessary. … … 727 806 [TODO: introduce Ada in the comparators] 728 807 729 In Ada and Dex, an array is conceived as a function whose domain must satisfy only certain structural assumptions, while in C, C++, Java, Futhark and \CFA today, the domain is a prefix of the natural numbers.808 In Ada and Dex, an array is conceived as a function whose domain must satisfy only certain structural assumptions, while in C, \CC, Java, Futhark and \CFA today, the domain is a prefix of the natural numbers. 730 809 The generality has obvious aesthetic benefits for programmers working on scheduling resources to weekdays, and for programmers who prefer to count from an initial number of their own choosing. 731 810 
doc/theses/mike_brooks_MMath/programs/helloaccordion.cfa
rb8e047a rcaa3e2c 9 9 10 10 forall( T, @[NprovTerty]@, @[Nmunicipalities]@ ) 11 struct Can adaPop {11 struct CanPop { 12 12 array( T, @NprovTerty@ ) provTerty; $\C{// nested VLA}$ 13 13 array( T, @Nmunicipalities@ ) municipalities; $\C{// nested VLA}$ … … 19 19 20 20 forall( T, [NprovTerty], [Nmunicipalities] ) 21 void ?{}( T &, Can adaPop( T, NprovTerty, Nmunicipalities ) & this ) {}21 void ?{}( T &, CanPop( T, NprovTerty, Nmunicipalities ) & this ) {} 22 22 23 23 forall( T &, [NprovTerty], [Nmunicipalities] ) 24 void ^?{}( Can adaPop( T, NprovTerty, Nmunicipalities ) & this ) {}24 void ^?{}( CanPop( T, NprovTerty, Nmunicipalities ) & this ) {} 25 25 26 26 … … 39 39 40 40 forall( T, [NprovTerty], [Nmunicipalities] ) 41 void check( Can adaPop( T, NprovTerty, Nmunicipalities ) & pop ) with( pop ) {41 void check( CanPop( T, NprovTerty, Nmunicipalities ) & pop ) with( pop ) { 42 42 total_pt = total_mun = 0; 43 43 for ( i; NprovTerty ) total_pt += provTerty[i]; … … 60 60 int main( int argc, char * argv[] ) { 61 61 const int npt = ato( argv[1] ), nmun = ato( argv[2] ); 62 @Can adaPop( int, npt, nmun ) pop;@62 @CanPop( int, npt, nmun ) pop;@ 63 63 // read in population numbers 64 64 @check( pop );@ … … 71 71 Total province/territory: 36,991,981 72 72 Total municipalities: 36,991,981 73 $\$$ ./a.out 13 3654 74 Total province/territory: 36,991,981 75 Total municipalities: 36,991,981 73 76 */ 74 77 
doc/theses/mike_brooks_MMath/programs/helloarray.cfa
rb8e047a rcaa3e2c 9 9 10 10 forall( [@N@] ) $\C{// array dimension}$ 11 array( bool, @N@ ) & f( array( float, @N@ ) & x, array( float, @N@ ) & y ) {11 array( bool, @N@ ) & f( array( float, @N@ ) & x, array( float, @N@ ) & y ) { 12 12 array( bool, @N@ ) & ret = *@alloc@(); $\C{// sizeof ret used by alloc}$ 13 13 for ( i; @N@ ) { … … 29 29 30 30 int main( int argc, char * argv[] ) { 31 const int @ n@ = ato( argv[1] );$\C{// deduce conversion type}$32 array( float, @ n@ ) x, y; $\C{// VLAs}$33 for ( i; n ) {$\C{// initialize arrays}$31 const int @dim@ = ato( argv[1] ); $\C{// deduce conversion type}$ 32 array( float, @dim@ ) x, y; $\C{// VLAs}$ 33 for ( i; dim ) { $\C{// initialize arrays}$ 34 34 x[i] = 3.14 / (i + 1); 35 35 y[i] = x[i] + 0.005 ; 36 36 } 37 array( bool, @ n@ ) & result = @f( x, y )@; $\C{// call}$37 array( bool, @dim@ ) & result = @f( x, y )@; $\C{// call}$ 38 38 sout  "result: "  nonl; $\C{// print result}$ 39 for ( i; n)39 for ( i; dim ) 40 40 sout  result[i]  nonl; 41 41 sout  nl; 
doc/theses/mike_brooks_MMath/programs/hellomd.cfa
rb8e047a rcaa3e2c 138 138 139 139 140 print1d_cstyle( m[ 2 ]); $\C{// row 2: 2.0 2.1 2.2 2.3 2.4 2.5 2.6}$140 print1d_cstyle( @m[ 2 ]@ ); $\C{// row 2: 2.0 2.1 2.2 2.3 2.4 2.5 2.6}$ 141 141 142 142
Note: See TracChangeset
for help on using the changeset viewer.