- Timestamp:
- Jul 29, 2024, 1:32:51 PM (7 months ago)
- Branches:
- master
- Children:
- f3d2a4f
- Parents:
- 38e20a80 (diff), 1661ad7 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)
links above to see all the changes relative to each parent. - Location:
- doc
- Files:
-
- 7 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/array.tex
r38e20a80 rce02877 6 6 7 7 Arrays in C are possible the single most misunderstood and incorrectly used features in the language, resulting in the largest proportion of runtime errors and security violations. 8 This chapter describes the new \CFA language and library features that introduce a length-checked array-type to the \CFA standard library~\cite{Cforall}, \eg: 9 \begin{cfa} 10 @array( float, 99 )@ x; $\C{// x contains 99 floats}$ 8 This chapter describes the new \CFA language and library features that introduce a length-checked array-type to the \CFA standard library~\cite{Cforall}. 9 10 Specifically, a new \CFA array is declared: 11 \begin{cfa} 12 @array( float, 99 )@ x; $\C[2.75in]{// x contains 99 floats}$ 13 \end{cfa} 14 using generic type @array@ with arguments @float@ and @99@. 15 A function @f@ is declared with an @array@ parameter of length @42@. 16 \begin{cfa} 11 17 void f( @array( float, 42 )@ & p ) {} $\C{// p accepts 42 floats}$ 12 18 f( x ); $\C{// statically rejected: types are different, 99 != 42}$ 13 19 14 forall( T, [N] ) 15 void g( @array( T, N )@ & p, int i ) { 20 test2.cfa:3:1 error: Invalid application of existing declaration(s) in expression. 21 Applying untyped: Name: f ... to: Name: x 22 \end{cfa} 23 The call @f( x )@ is invalid because the @array@ lengths @99@ and @42@ do not match. 24 25 Next, function @g@ introduces a @forall@ prefix on type parameter @T@ and arbitrary \emph{dimension parameter} @N@, the new feature that represents a count of elements managed by the type system. 26 \begin{cfa} 27 forall( T, @[N]@ ) 28 void g( array( T, @N@ ) & p, int i ) { 16 29 T elem = p[i]; $\C{// dynamically checked: requires 0 <= i < N}$ 17 30 } 18 31 g( x, 0 ); $\C{// T is float, N is 99, dynamic subscript check succeeds}$ 19 g( x, 1000 ); $\C{// T is float, N is 99, dynamic subscript check fails}$ 20 \end{cfa} 21 This example declares variable @x@, with generic type @array@ using arguments @float@ and @99@. 22 Function @f@ is declared with an @array@ parameter of length @42@. 23 The call @f( x )@ is invalid because the @array@ lengths @99@ and @42@ do not match. 24 Next, function @g@ introduces a @forall@ prefix on type parameter @T@ and arbitrary \emph{dimension parameter} @N@, the new feature that represents a count of elements managed by the type system. 32 g( x, 1000 ); $\C{// T is float, N is 99, dynamic subscript check fails}\CRT$ 33 34 Cforall Runtime error: subscript 1000 exceeds dimension range [0,99) $for$ array 0x555555558020. 35 \end{cfa} 25 36 The call @g( x, 0 )@ is valid because @g@ accepts any length of array, where the type system infers @float@ for @T@ and length @99@ for @N@. 26 37 Inferring values for @T@ and @N@ is implicit without programmer involvement. … … 35 46 forall( [N] ) 36 47 void declDemo() { 37 float x1[N]; $\C{// built-in type ("C array")}$38 array(float, N) x2; $\C{// type from library}$48 float x1[N]; $\C{// built-in type ("C array")}$ 49 array(float, N) x2; $\C{// type from library}$ 39 50 } 40 51 \end{cfa} 41 52 Both of the locally-declared array variables, @x1@ and @x2@, have 42 elements, each element being a @float@. 42 53 The two variables have identical size and layout; they both encapsulate 42-float, stack \vs heap allocations with no additional ``bookkeeping'' allocations or headers. 43 Providing this explicit generic approach require da significant extension to the \CFA type system to support a full-feature, safe, efficient (space and time) array-type, which forms the foundation for more complex array forms in \CFA.54 Providing this explicit generic approach requires a significant extension to the \CFA type system to support a full-feature, safe, efficient (space and time) array-type, which forms the foundation for more complex array forms in \CFA. 44 55 45 56 Admittedly, the @array@ library type (type for @x2@) is syntactically different from its C counterpart. 46 57 A future goal (TODO xref) is to provide a built-in array type with syntax approaching C's (type for @x1@); 47 58 then, the library @array@ type can be removed giving \CFA a largely uniform array type. 48 At present, the built-in arrayis only partially supported, so the generic @array@ is used exclusively in the discussion;59 At present, the C syntax @array@ is only partially supported, so the generic @array@ is used exclusively in the discussion; 49 60 feature support and C compatibility are revisited in Section ? TODO. 50 61 51 Offering an @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (gcc) programs can be compiled by \CFA with only a small number of changes, similar to \CC (g++).62 Offering the @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (@gcc@) programs can be compiled by \CFA with only a small number of changes, similar to \CC (@g++@). 52 63 However, a few compatibility-breaking changes to the behaviour of the C array are necessary, both as an implementation convenience and to fix C's lax treatment of arrays. 53 64 Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity introduced by the C array TODO xref. 54 65 55 My contributions are:66 My contributions in this chapter are: 56 67 \begin{enumerate} 57 68 \item A type system enhancement that lets polymorphic functions and generic types be parameterized by a numeric value: @forall( [N] )@. … … 100 111 \end{figure} 101 112 102 \VRef[Figure]{f:fHarness} shows a harness that uses the @f@ function illustratinghow dynamic values are fed into the @array@ type.103 Here, the dimension of the @x@, @y@, and @result@ arrays is specified from a command-line valueand these arrays are allocated on the stack.113 \VRef[Figure]{f:fHarness} shows a harness that uses function @f@ to illustrate how dynamic values are fed into the @array@ type. 114 Here, the dimension of arrays @x@, @y@, and @result@ is specified from a command-line value, @dim@, and these arrays are allocated on the stack. 104 115 Then the @x@ array is initialized with decreasing values, and the @y@ array with amounts offset by constant @0.005@, giving relative differences within tolerance initially and diverging for later values. 105 116 The program main is run (see figure bottom) with inputs @5@ and @7@ for sequence lengths. 106 The loops follow the familiar pattern of using the variable @ n@ to iterate through the arrays.107 Most importantly, the type system implicitly captures @ n@ at the call of @f@ and makes it available throughout @f@ as @N@.108 The example shows @ n@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@, @N@ adapting in the same way at @f@'s loop bound, and a pass-thru use of @n@ at @f@'s declaration of @ret@.117 The loops follow the familiar pattern of using the variable @dim@ to iterate through the arrays. 118 Most importantly, the type system implicitly captures @dim@ at the call of @f@ and makes it available throughout @f@ as @N@. 119 The example shows @dim@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@, @N@ adapting in the same way at @f@'s loop bound, and a pass-thru use of @dim@ at @f@'s declaration of @ret@. 109 120 Except for the lifetime-management issue of @result@, \ie explicit @free@, this program has eliminated both the syntactic and semantic problems associated with C arrays and their usage. 110 121 These benefits cannot be underestimated. … … 141 152 \CC has a (mistaken) belief that references are not objects, but pointers are objects. 142 153 In the \CC example, the arrays fall back on C arrays, which have a duality with references with respect to automatic dereferencing. 143 The \CFA array is a contiguous object with an address, which can stored as a reference or pointer.154 The \CFA array is a contiguous object with an address, which can be stored as a reference or pointer. 144 155 \item 145 156 C/\CC arrays cannot be copied, while \CFA arrays can be copied, making them a first-class object (although array copy is often avoided for efficiency). … … 151 162 152 163 @template< typename T, size_t N >@ 153 void copy( T ret[ N], T x[N] ) {164 void copy( T ret[@N@], T x[@N@] ) { 154 165 for ( int i = 0; i < N; i += 1 ) ret[i] = x[i]; 155 166 } … … 167 178 int main() { 168 179 @forall( T, [N] )@ // nested function 169 void copy( array( T, N ) & ret, array( T, N) & x ) {180 void copy( array( T, @N@ ) & ret, array( T, @N@ ) & x ) { 170 181 for ( i; 10 ) ret[i] = x[i]; 171 182 } … … 185 196 186 197 Continuing the discussion of \VRef[Figure]{f:fHarness}, the example has @f@ expecting two arrays of the same length. 187 A compile-time error occurs when attempting to call @f@ with arrays of differing lengths.198 As stated previous, a compile-time error occurs when attempting to call @f@ with arrays of differing lengths. 188 199 % removing leading whitespace 189 200 \lstinput[tabsize=1]{52-53}{hello-array.cfa} 190 201 \lstinput[tabsize=1,aboveskip=0pt]{62-64}{hello-array.cfa} 191 As is common practice in C, the programmer is free to cast, \ieto assert knowledge not shared with the type system.202 C allows casting to assert knowledge not shared with the type system. 192 203 \lstinput{70-74}{hello-array.cfa} 193 204 … … 197 208 \lstinput{10-15}{hello-accordion.cfa} 198 209 This structure's layout has the starting offset of @municipalities@ varying in @NprovTerty@, and the offset of @total_pt@ and @total_mun@ varying in both generic parameters. 199 For a function that operates on a @Can adaPop@ structure, the type system handles this variation transparently.210 For a function that operates on a @CanPop@ structure, the type system handles this variation transparently. 200 211 \lstinput{40-45}{hello-accordion.cfa} 201 \VRef[Figure]{f:checkHarness} shows program results where different offset values being used. 202 The output values show that @summarize@ and its caller agree on both the offsets (where the callee starts reading @cost_contribs@ and where the callee writes @total_cost@). 203 Yet the call site just says, ``pass the request.'' 212 \VRef[Figure]{f:checkHarness} shows the @CanPop@ harness and results with different array sizes, if the municipalities changed after a census. 204 213 205 214 \begin{figure} 206 215 \lstinput{60-68}{hello-accordion.cfa} 207 \lstinput{70-7 2}{hello-accordion.cfa}216 \lstinput{70-75}{hello-accordion.cfa} 208 217 \caption{\lstinline{check} Harness} 209 218 \label{f:checkHarness} … … 232 241 In general, storage layout is hidden by subscripting, and only appears when passing arrays among different programming languages or accessing specific hardware. 233 242 234 \VRef[Figure]{f:FixedVariable} shows two C90 approaches for manipulating contiguous arrays.243 \VRef[Figure]{f:FixedVariable} shows two C90 approaches for manipulating a contiguous matrix. 235 244 Note, C90 does not support VLAs. 236 The fixed-dimension approach uses the type system;245 The fixed-dimension approach (left) uses the type system; 237 246 however, it requires all dimensions except the first to be specified at compile time, \eg @m[][6]@, allowing all subscripting stride calculations to be generated with constants. 238 247 Hence, every matrix passed to @fp1@ must have exactly 6 columns but the row size can vary. 239 The variable-dimension approach ignores (violates) the type system, \ie argument and parameters types do not match, and manually performs pointer arithmetic for subscriptingin the macro @sub@.248 The variable-dimension approach (right) ignores (violates) the type system, \ie argument and parameters types do not match, and subscripting is performed manually using pointer arithmetic in the macro @sub@. 240 249 241 250 \begin{figure} … … 258 267 ... printf( "%d ", @sub( m, r, c )@ ); ... 259 268 } 260 int vm1[ 4][4], vm2[6][8]; // no VLA269 int vm1[@4@][@4@], vm2[@6@][@8@]; // no VLA 261 270 // initialize matrixes 262 271 fp2( 4, 4, vm1 ); … … 290 299 The language decides if the matrix and array-of-array are laid out the same or differently. 291 300 For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (triangular matrix). 292 Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though the two array forms could be differentiated at the subscript level, \eg @m[1,2]@ \vs @aa[1][2]@.301 Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though a language could differentiated between the two forms using subscript syntax, \eg @m[1,2]@ \vs @aa[1][2]@. 293 302 Nevertheless, controlling memory layout can make a difference in what operations are allowed and in performance (caching/NUMA effects). 294 303 … … 301 310 The focus of this work is on the contiguous multidimensional arrays in C. 302 311 The reason is that programmers are often forced to use the more complex array-of-array form when a contiguous array would be simpler, faster, and safer. 303 Nevertheless, the C array-of-array form continues to be usefulfor special circumstances.312 Nevertheless, the C array-of-array form is still important for special circumstances. 304 313 305 314 \VRef[Figure]{f:ContiguousNon-contiguous} shows the extensions made in C99 for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays.} … … 313 322 While this contiguous-array capability is a step forward, it is still the programmer's responsibility to manually manage the number of dimensions and their sizes, both at the function definition and call sites. 314 323 That is, the array does not automatically carry its structure and sizes for use in computing subscripts. 315 While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row correctly .324 While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row correctly with a correctly bounded loop index. 316 325 Specifically, there is no requirement that the rows are the same length, like a poem with different length lines. 317 326 … … 365 374 this model has no awareness of dimensions just the ability to access memory at a distance from a reference point (base-displacement addressing), \eg @x + 23@ or @x[23}@ $\Rightarrow$ 23rd element from the start of @x@. 366 375 A programmer must manually build any notion of dimensions using other tools; 367 hence, this style is not offering multidimensional arrays \see{\VRef[Figure]{f:FixedVariable} }.376 hence, this style is not offering multidimensional arrays \see{\VRef[Figure]{f:FixedVariable} right example}. 368 377 \end{enumerate} 369 378 … … 381 390 A C/\CFA array interface includes the resulting memory layout. 382 391 The defining requirement of a type-2 system is the ability to slice a column from a column-finest matrix. 383 The required memory shape of such a slice is set, before any discussion of implementation.392 The required memory shape of such a slice is fixed, before any discussion of implementation. 384 393 The implementation presented here is how the \CFA array library wrangles the C type system, to make it do memory steps that are consistent with this layout. 385 394 TODO: do I have/need a presentation of just this layout, just the semantics of -[all]? … … 389 398 Beyond what C's array type offers, the new array brings direct support for working with a noncontiguous array slice, allowing a program to work with dimension subscripts given in a non-physical order. 390 399 391 The following examples use an @array( float, 5, 7) m@, loaded with values incremented by $0.1$, when stepping across the length-7 finely-strided column dimension, and stepping across the length-5 coarsely-strided row dimension.392 \par \noindent400 The following examples use the matrix declaration @array( float, 5, 7 ) m@, loaded with values incremented by $0.1$, when stepping across the length-7 finely-strided column dimension, and stepping across the length-5 coarsely-strided row dimension. 401 \par 393 402 \mbox{\lstinput{121-126}{hello-md.cfa}} 394 403 \par\noindent … … 405 414 Specifically, declaring the parameter @r@ with type @array@ means that @r@ is contiguous, which is unnecessarily restrictive. 406 415 That is, @r@ need only be of a container type that offers a subscript operator (of type @ptrdiff_t@ $\rightarrow$ @float@) with managed length @N@. 407 The new-array library provides the trait @ ix@, so-defined.416 The new-array library provides the trait @ar@, so-defined. 408 417 With it, the original declaration can be generalized with the same body. 409 418 \lstinput{43-44}{hello-md.cfa} 410 419 \lstinput[aboveskip=0pt]{145-145}{hello-md.cfa} 411 420 The nontrivial slicing in this example now allows passing a \emph{noncontiguous} slice to @print1d@, where the new-array library provides a ``subscript by all'' operation for this purpose. 412 In a multi-dimensional subscript operation, any dimension given as @all@ is a placeholder, \ie ``not yet subscripted by a value'', waiting for such a value, implementing the @ ix@ trait.421 In a multi-dimensional subscript operation, any dimension given as @all@ is a placeholder, \ie ``not yet subscripted by a value'', waiting for such a value, implementing the @ar@ trait. 413 422 \lstinput{150-151}{hello-md.cfa} 414 423 … … 471 480 In both cases, value 2 selects from the coarser dimension (rows of @x@), 472 481 while the value 3 selects from the finer dimension (columns of @x@). 473 The figure illustrates the value of each subexpression, comparing how numeric subscripting proceeds from @x@, vs from @x[all]@.482 The figure illustrates the value of each subexpression, comparing how numeric subscripting proceeds from @x@, \vs from @x[all]@. 474 483 Proceeding from @x@ gives the numeric indices as coarse then fine, while proceeding from @x[all]@ gives them fine then coarse. 475 484 These two starting expressions, which are the example's only multidimensional subexpressions 476 485 (those that received zero numeric indices so far), are illustrated with vertical steps where a \emph{first} numeric index would select. 477 486 478 The figure's presentation offers an intuition answering ,What is an atomic element of @x[all]@?479 From there, @x[all]@ itself is simply a two-dimensional array, in the strict C sense, of these strangebuilding blocks.487 The figure's presentation offers an intuition answering to: What is an atomic element of @x[all]@? 488 From there, @x[all]@ itself is simply a two-dimensional array, in the strict C sense, of these building blocks. 480 489 An atom (like the bottommost value, @x[all][3][2]@), is the contained value (in the square box) 481 490 and a lie about its size (the wedge above it, growing upward). 482 An array of these atoms (like the intermediate @x[all][3]@) is just a contiguous arrangement of them, 483 done according to their size, as announced. Call such an array a column. 484 A column is almost ready to be arranged into a matrix; it is the \emph{contained value} of the next-level building block, 485 but another lie about size is required. 486 At first, an atom needed to be arranged as if it were bigger, 487 but now a column needs to be arranged as if it is smaller (the wedge above it, shrinking upward). 491 An array of these atoms (like the intermediate @x[all][3]@) is just a contiguous arrangement of them, done according to their size; 492 call such an array a column. 493 A column is almost ready to be arranged into a matrix; 494 it is the \emph{contained value} of the next-level building block, but another lie about size is required. 495 At first, an atom needs to be arranged as if it were bigger, but now a column needs to be arranged as if it is smaller (the wedge above it, shrinking upward). 488 496 These lying columns, arranged contiguously according to their size (as announced) form the matrix @x[all]@. 489 Because @x[all]@ takes indices, first for the fine stride, then for the coarse stride, 490 it achieves the requirement of representing the transpose of @x@. 491 Yet every time the programmer presents an index, a mere C-array subscript is achieving the offset calculation. 497 Because @x[all]@ takes indices, first for the fine stride, then for the coarse stride, it achieves the requirement of representing the transpose of @x@. 498 Yet every time the programmer presents an index, a C-array subscript is achieving the offset calculation. 492 499 493 500 In the @x[all]@ case, after the finely strided subscript is done (column 3 is selected), … … 495 502 compared with where analogous rows appear when the row-level option is presented for @x@. 496 503 497 These size lies create an appearance of overlap.498 For example, in @x[all]@, the shaded band touches atoms 2.0, 2.1, 2.2, 2.3, 1.4, 1.5 and 1.6.504 \PAB{I don't understand this paragraph: These size lies create an appearance of overlap. 505 For example, in \lstinline{x[all]}, the shaded band touches atoms 2.0, 2.1, 2.2, 2.3, 1.4, 1.5 and 1.6. 499 506 But only the atom 2.3 is storing its value there. 500 The rest are lying about (conflicting) claims on this location, but never exercising these alleged claims. 507 The rest are lying about (conflicting) claims on this location, but never exercising these alleged claims.} 501 508 502 509 Lying is implemented as casting. … … 504 511 This structure uses one type in its internal field declaration and offers a different type as the return of its subscript operator. 505 512 The field within is a plain-C array of the fictional type, which is 7 floats long for @x[all][3][2]@ and 1 float long for @x[all][3]@. 506 The subscript operator presents what 's really inside, by casting to the type below the wedge oflie.513 The subscript operator presents what is really inside, by casting to the type below the wedge of the lie. 507 514 508 515 % Does x[all] have to lie too? The picture currently glosses over how it it advertises a size of 7 floats. I'm leaving that as an edge case benignly misrepresented in the picture. Edge cases only have to be handled right in the code. 509 516 510 Casting, overlapping and lying are unsafe. 511 The mission here is to implement a style-2 feature that the type system helps the programmer use safely. 512 The offered style-2 system is allowed to be internally unsafe, 513 just as C's implementation of a style-3 system (upon a style-4 system) is unsafe within, 514 even when the programmer is using it without casts or pointer arithmetic. 515 Having a style-2 system relieves the programmer from resorting to unsafe pointer arithmetic when working with noncontiguous slices. 516 517 The choice to implement this style-2 system upon C's style-3 arrays, rather than its style-4 pointer arithmetic, 518 reduces the attack surface of unsafe code. 519 My casting is unsafe, but I do not do any pointer arithmetic. 520 When a programmer works in the common-case style-3 subset (in the no-@[all]@ top of Figure~\ref{fig:subscr-all}), 521 my casts are identities, and the C compiler is doing its usual displacement calculations. 522 If I had implemented my system upon style-4 pointer arithmetic, 523 then this common case would be circumventing C's battle-hardened displacement calculations in favour of my own. 524 525 \noindent END: Paste looking for a home 517 Casting, overlapping, and lying are unsafe. 518 The mission is to implement a style-1 feature in the type system for safe use by a programmer. 519 The offered style-1 system is allowed to be internally unsafe, 520 just as C's implementation of a style-2 system (upon a style-3 system) is unsafe within, even when the programmer is using it without casts or pointer arithmetic. 521 Having a style-1 system relieves the programmer from resorting to unsafe pointer arithmetic when working with noncontiguous slices. 522 523 % PAB: repeat from previous paragraph. 524 % The choice to implement this style-1 system upon C's style-2 arrays, rather than its style-3 pointer arithmetic, reduces the attack surface of unsafe code. 525 % My casting is unsafe, but I do not do any pointer arithmetic. 526 % When a programmer works in the common-case style-2 subset (in the no-@[all]@ top of Figure~\ref{fig:subscr-all}), my casts are identities, and the C compiler is doing its usual displacement calculations. 527 % If I had implemented my system upon style-3 pointer arithmetic, then this common case would be circumventing C's battle-hardened displacement calculations in favour of my own. 528 529 % \noindent END: Paste looking for a home 526 530 527 531 The new-array library defines types and operations that ensure proper elements are accessed soundly in spite of the overlapping. 528 The private @arpk@ structure (array with explicit packing) is generic over these two types (and more): the contained element, what it is masquerading as. 529 This structure's public interface is the @array(...)@ construction macro and the two subscript operators. 530 Construction by @array@ initializes the masquerading-as type information to be equal to the contained-element information. 531 Subscripting by @all@ rearranges the order of masquerading-as types to achieve, in general, nontrivial striding. 532 Subscripting by a number consumes the masquerading-as size of the contained element type, does normal array stepping according to that size, and returns there element found there, in unmasked form. 533 534 The @arpk@ structure and its @-[i]@ operator are thus defined as: 535 \begin{cfa} 536 forall( ztype(N), $\C{// length of current dimension}$ 537 dtype(S) | sized(S), $\C{// masquerading-as}$ 538 dtype E_im, $\C{// immediate element, often another array}$ 539 dtype E_base $\C{// base element, e.g. float, never array}$ 532 The @arpk@ structure and its @-[i]@ operator are defined as: 533 \begin{cfa} 534 forall( 535 [N], $\C{// length of current dimension}$ 536 S & | sized(S), $\C{// masquerading-as}$ 537 Timmed &, $\C{// immediate element, often another array}$ 538 Tbase & $\C{// base element, e.g. float, never array}$ 540 539 ) { // distribute forall to each element 541 540 struct arpk { 542 541 S strides[N]; $\C{// so that sizeof(this) is N of S}$ 543 542 }; 544 // expose E_im, stride by S 545 E_im & ?[?]( arpk(N, S, E_im, E_base) & a, ptrdiff_t i ) { 546 return (E_im &) a.strides[i]; 543 // expose Timmed, stride by S 544 static inline Timmed & ?[?]( arpk( N, S, Timmed, Tbase ) & a, long int i ) { 545 subcheck( a, i, 0, N ); 546 return (Timmed &)a.strides[i]; 547 547 } 548 548 } 549 549 \end{cfa} 550 The private @arpk@ structure (array with explicit packing) is generic over four types: dimension length, masquerading-as, ... 551 This structure's public interface is hidden behind the @array(...)@ macro and the subscript operator. 552 Construction by @array@ initializes the masquerading-as type information to be equal to the contained-element information. 553 Subscripting by @all@ rearranges the order of masquerading-as types to achieve, in general, nontrivial striding. 554 Subscripting by a number consumes the masquerading-as size of the contained element type, does normal array stepping according to that size, and returns there element found there, in unmasked form. 550 555 551 556 An instantiation of the @arpk@ generic is given by the @array(E_base, N0, N1, ...)@ expansion, which is @arpk( N0, Rec, Rec, E_base )@, where @Rec@ is @array(E_base, N1, ...)@. … … 568 573 This section provides a demonstration of the effect. 569 574 570 The experiment compares the \CFA array system with the padded-room system [TODO:xref] most typically exemplified by Java arrays, but also reflected in the C++pattern where restricted vector usage models a checked array.575 The experiment compares the \CFA array system with the padded-room system [TODO:xref] most typically exemplified by Java arrays, but also reflected in the \CC pattern where restricted vector usage models a checked array. 571 576 The essential feature of this padded-room system is the one-to-one correspondence between array instances and the symbolic bounds on which dynamic checks are based. 572 The experiment compares with the C++version to keep access to generated assembly code simple.573 574 As a control case, a simple loop (with no reused dimension sizes) is seen to get the same optimization treatment in both the \CFA and C++versions.577 The experiment compares with the \CC version to keep access to generated assembly code simple. 578 579 As a control case, a simple loop (with no reused dimension sizes) is seen to get the same optimization treatment in both the \CFA and \CC versions. 575 580 When the programmer treats the array's bound correctly (making the subscript ``obviously fine''), no dynamic bound check is observed in the program's optimized assembly code. 576 581 But when the bounds are adjusted, such that the subscript is possibly invalid, the bound check appears in the optimized assembly, ready to catch an occurrence the mistake. … … 589 594 \section{Comparison with other arrays} 590 595 596 597 \subsection{Rust} 598 591 599 \CFA's array is the first lightweight application of dependently-typed bound tracking to an extension of C. 592 Other extensions of C that apply dependently-typed bound tracking are heavyweight, in that the bound tracking is part of a linearly typed ownership system thatfurther helps guarantee statically the validity of every pointer deference.600 Other extensions of C that apply dependently-typed bound tracking are heavyweight, in that the bound tracking is part of a linearly-typed ownership-system, which further helps guarantee statically the validity of every pointer deference. 593 601 These systems, therefore, ask the programmer to convince the type checker that every pointer dereference is valid. 594 602 \CFA imposes the lighter-weight obligation, with the more limited guarantee, that initially-declared bounds are respected thereafter. … … 598 606 The \CFA array, applied to accordion structures [TOD: cross-reference] \emph{implies} the necessary pointer arithmetic, generated automatically, and not appearing at all in a user's program. 599 607 600 \subsection{Safety in a padded room} 601 602 Java's array [TODO:cite] is a straightforward example of assuring safety against undefined behaviour, at a cost of expressiveness for more applied properties. 603 Consider the array parameter declarations in: 604 608 609 \subsection{Java} 610 611 Java arrays are arrays-of-arrays because all objects are references \see{\VRef{toc:mdimpl}}. 612 For each array, Java implicitly storages the array dimension in a descriptor, supporting array length, subscript checking, and allowing dynamically-sized array-parameter declarations. 613 \begin{cquote} 605 614 \begin{tabular}{rl} 606 615 C & @void f( size_t n, size_t m, float x[n][m] );@ \\ 607 Java & @void f( float [][] a);@616 Java & @void f( float x[][] );@ 608 617 \end{tabular} 609 610 Java's safety against undefined behaviour assures the callee that, if @x@ is non-null, then @a.length@ is a valid access (say, evaluating to the number $\ell$) and if @i@ is in $[0, \ell)$ then @x[i]@ is a valid access. 611 If a value of @i@ outside this range is used, a runtime error is guaranteed. 612 In these respects, C offers no guarantees at all. 613 Notably, the suggestion that @n@ is the intended size of the first dimension of @x@ is documentation only. 614 Indeed, many might prefer the technically equivalent declarations @float x[][m]@ or @float (*a)[m]@ as emphasizing the ``no guarantees'' nature of an infrequently used language feature, over using the opportunity to explain a programmer intention. 615 Moreover, even if @x[0][0]@ is valid for the purpose intended, C's basic infamous feature is the possibility of an @i@, such that @x[i][0]@ is not valid for the same purpose, and yet, its evaluation does not produce an error. 616 617 Java's lack of expressiveness for more applied properties means these outcomes are possible: 618 \begin{itemize} 619 \item @x[0][17]@ and @x[2][17]@ are valid accesses, yet @x[1][17]@ is a runtime error, because @x[1]@ is a null pointer 620 \item the same observation, now because @x[1]@ refers to an array of length 5 621 \item execution times vary, because the @float@ values within @x@ are sometimes stored nearly contiguously, and other times, not at all 622 \end{itemize} 623 C's array has none of these limitations, nor do any of the ``array language'' comparators discussed in this section. 624 625 This Java level of safety and expressiveness is also exemplified in the C family, with the commonly given advice [TODO:cite example], for C++ programmers to use @std::vector@ in place of the C++ language's array, which is essentially the C array. 626 The advice is that, while a vector is also more powerful (and quirky) than an array, its capabilities include options to preallocate with an upfront size, to use an available bound-checked accessor (@a.at(i)@ in place of @x[i]@), to avoid using @push_back@, and to use a vector of vectors. 627 Used with these restrictions, out-of-bound accesses are stopped, and in-bound accesses never exercise the vector's ability to grow, which is to say, they never make the program slow to reallocate and copy, and they never invalidate the program's other references to the contained values. 628 Allowing this scheme the same referential integrity assumption that \CFA enjoys [TODO:xref], this scheme matches Java's safety and expressiveness exactly. 629 [TODO: decide about going deeper; some of the Java expressiveness concerns have mitigations, up to even more tradeoffs.] 618 \end{cquote} 619 However, in the C prototype, the parameters @n@ and @m@ are documentation only as the intended size of the first and second dimension of @x@. 620 \VRef[Figure]{f:JavaVsCTriangularMatrix} compares a triangular matrix (array-of-arrays) in dynamically safe Java to unsafe C. 621 Each dynamically sized row in Java stores its dimension, while C requires the programmer to manage these sizes explicitly (@rlnth@). 622 All subscripting is Java has bounds checking, while C has none. 623 Both Java and C require explicit null checking, otherwise there is a runtime failure. 624 625 \begin{figure} 626 \setlength{\tabcolsep}{15pt} 627 \begin{tabular}{ll@{}} 628 \begin{java} 629 int m[][] = { // triangular matrix 630 new int [4], 631 new int [3], 632 new int [2], 633 new int [1], 634 null 635 }; 636 637 for ( int r = 0; r < m.length; r += 1 ) { 638 if ( m[r] == null ) continue; 639 for ( int c = 0; c < m[r].length; c += 1 ) { 640 m[r][c] = c + r; // subscript checking 641 } 642 643 } 644 645 for ( int r = 0; r < m.length; r += 1 ) { 646 if ( m[r] == null ) { 647 System.out.println( "null row" ); 648 continue; 649 } 650 for ( int c = 0; c < m[r].length; c += 1 ) { 651 System.out.print( m[r][c] + " " ); 652 } 653 System.out.println(); 654 655 } 656 \end{java} 657 & 658 \begin{cfa} 659 int * m[5] = { // triangular matrix 660 calloc( 4, sizeof(int) ), 661 calloc( 3, sizeof(int) ), 662 calloc( 2, sizeof(int) ), 663 calloc( 1, sizeof(int) ), 664 NULL 665 }; 666 int rlnth = 4; 667 for ( int r = 0; r < 5; r += 1 ) { 668 if ( m[r] == NULL ) continue; 669 for ( int c = 0; c < rlnth; c += 1 ) { 670 m[r][c] = c + r; // no subscript checking 671 } 672 rlnth -= 1; 673 } 674 rlnth = 4; 675 for ( int r = 0; r < 5; r += 1 ) { 676 if ( m[r] == NULL ) { 677 printf( "null row\n" ); 678 continue; 679 } 680 for ( int c = 0; c < rlnth; c += 1 ) { 681 printf( "%d ", m[r][c] ); 682 } 683 printf( "\n" ); 684 rlnth -= 1; 685 } 686 \end{cfa} 687 \end{tabular} 688 \caption{Java (left) \vs C (right) Triangular Matrix} 689 \label{f:JavaVsCTriangularMatrix} 690 \end{figure} 691 692 The downside of the arrays-of-arrays approach is performance due to pointer chasing versus pointer arithmetic for a contiguous arrays. 693 Furthermore, there is the cost of managing the implicit array descriptor. 694 It is unlikely that a JIT can dynamically rewrite an arrays-of-arrays form into a contiguous form. 695 696 697 \subsection{\CC} 698 699 Because C arrays are difficult and dangerous, the mantra for \CC programmers is to use @std::vector@ in place of the C array. 700 While the vector size can grow and shrink dynamically, \vs a fixed-size dynamic size with VLAs, the cost of this extra feature is mitigated by preallocating the maximum size (like the VLA) at the declaration (one dynamic call) to avoid using @push_back@. 701 \begin{c++} 702 vector< vector< int > > m( 5, vector<int>(8) ); // initialize size of 5 x 8 with 6 dynamic allocations 703 \end{c++} 704 Multidimensional arrays are arrays-of-arrays with associated costs. 705 Each @vector@ array has an array descriptor contain the dimension, which allows bound checked using @x.at(i)@ in place of @x[i]@. 706 Used with these restrictions, out-of-bound accesses are caught, and in-bound accesses never exercise the vector's ability to grow, preventing costly reallocate and copy, and never invalidate references to contained values. 707 This scheme matches Java's safety and expressiveness exactly, but with the inherent costs. 708 630 709 631 710 \subsection{Levels of dependently typed arrays} … … 655 734 it can also do these other cool checks, but watch how I can mess with its conservativeness and termination 656 735 657 Two current, state-of-the-art array languages, Dex\cite{arr:dex:long} and Futhark\cite{arr:futhark:tytheory}, offer offernovel contributions concerning similar, restricted dependent types for tracking array length.736 Two current, state-of-the-art array languages, Dex\cite{arr:dex:long} and Futhark\cite{arr:futhark:tytheory}, offer novel contributions concerning similar, restricted dependent types for tracking array length. 658 737 Unlike \CFA, both are garbage-collected functional languages. 659 738 Because they are garbage-collected, referential integrity is built-in, meaning that the heavyweight analysis, that \CFA aims to avoid, is unnecessary. … … 727 806 [TODO: introduce Ada in the comparators] 728 807 729 In Ada and Dex, an array is conceived as a function whose domain must satisfy only certain structural assumptions, while in C, C++, Java, Futhark and \CFA today, the domain is a prefix of the natural numbers.808 In Ada and Dex, an array is conceived as a function whose domain must satisfy only certain structural assumptions, while in C, \CC, Java, Futhark and \CFA today, the domain is a prefix of the natural numbers. 730 809 The generality has obvious aesthetic benefits for programmers working on scheduling resources to weekdays, and for programmers who prefer to count from an initial number of their own choosing. 731 810 -
doc/theses/mike_brooks_MMath/programs/hello-accordion.cfa
r38e20a80 rce02877 9 9 10 10 forall( T, @[NprovTerty]@, @[Nmunicipalities]@ ) 11 struct Can adaPop {11 struct CanPop { 12 12 array( T, @NprovTerty@ ) provTerty; $\C{// nested VLA}$ 13 13 array( T, @Nmunicipalities@ ) municipalities; $\C{// nested VLA}$ … … 19 19 20 20 forall( T, [NprovTerty], [Nmunicipalities] ) 21 void ?{}( T &, Can adaPop( T, NprovTerty, Nmunicipalities ) & this ) {}21 void ?{}( T &, CanPop( T, NprovTerty, Nmunicipalities ) & this ) {} 22 22 23 23 forall( T &, [NprovTerty], [Nmunicipalities] ) 24 void ^?{}( Can adaPop( T, NprovTerty, Nmunicipalities ) & this ) {}24 void ^?{}( CanPop( T, NprovTerty, Nmunicipalities ) & this ) {} 25 25 26 26 … … 39 39 40 40 forall( T, [NprovTerty], [Nmunicipalities] ) 41 void check( Can adaPop( T, NprovTerty, Nmunicipalities ) & pop ) with( pop ) {41 void check( CanPop( T, NprovTerty, Nmunicipalities ) & pop ) with( pop ) { 42 42 total_pt = total_mun = 0; 43 43 for ( i; NprovTerty ) total_pt += provTerty[i]; … … 60 60 int main( int argc, char * argv[] ) { 61 61 const int npt = ato( argv[1] ), nmun = ato( argv[2] ); 62 @Can adaPop( int, npt, nmun ) pop;@62 @CanPop( int, npt, nmun ) pop;@ 63 63 // read in population numbers 64 64 @check( pop );@ … … 71 71 Total province/territory: 36,991,981 72 72 Total municipalities: 36,991,981 73 $\$$ ./a.out 13 3654 74 Total province/territory: 36,991,981 75 Total municipalities: 36,991,981 73 76 */ 74 77 -
doc/theses/mike_brooks_MMath/programs/hello-array.cfa
r38e20a80 rce02877 9 9 10 10 forall( [@N@] ) $\C{// array dimension}$ 11 array( bool, @N@ ) & f( array( float, @N@ ) & x, array( float, @N@ ) & y ) {11 array( bool, @N@ ) & f( array( float, @N@ ) & x, array( float, @N@ ) & y ) { 12 12 array( bool, @N@ ) & ret = *@alloc@(); $\C{// sizeof ret used by alloc}$ 13 13 for ( i; @N@ ) { … … 29 29 30 30 int main( int argc, char * argv[] ) { 31 const int @ n@ = ato( argv[1] );$\C{// deduce conversion type}$32 array( float, @ n@ ) x, y; $\C{// VLAs}$33 for ( i; n ) {$\C{// initialize arrays}$31 const int @dim@ = ato( argv[1] ); $\C{// deduce conversion type}$ 32 array( float, @dim@ ) x, y; $\C{// VLAs}$ 33 for ( i; dim ) { $\C{// initialize arrays}$ 34 34 x[i] = 3.14 / (i + 1); 35 35 y[i] = x[i] + 0.005 ; 36 36 } 37 array( bool, @ n@ ) & result = @f( x, y )@; $\C{// call}$37 array( bool, @dim@ ) & result = @f( x, y )@; $\C{// call}$ 38 38 sout | "result: " | nonl; $\C{// print result}$ 39 for ( i; n)39 for ( i; dim ) 40 40 sout | result[i] | nonl; 41 41 sout | nl; -
doc/theses/mike_brooks_MMath/programs/hello-md.cfa
r38e20a80 rce02877 138 138 139 139 140 print1d_cstyle( m[ 2 ]); $\C{// row 2: 2.0 2.1 2.2 2.3 2.4 2.5 2.6}$140 print1d_cstyle( @m[ 2 ]@ ); $\C{// row 2: 2.0 2.1 2.2 2.3 2.4 2.5 2.6}$ 141 141 142 142 -
doc/theses/mike_brooks_MMath/uw-ethesis-frontpgs.tex
r38e20a80 rce02877 140 140 141 141 I would like to thank all the little people who made this thesis possible. 142 143 Finally, a special thank you to Huawei Canada for funding this work. 142 144 \cleardoublepage 143 145 \phantomsection % allows hyperref to link to the correct page -
doc/theses/mike_brooks_MMath/uw-ethesis.tex
r38e20a80 rce02877 105 105 \lstnewenvironment{c++}[1][]{\lstset{language=[GNU]C++,escapechar=\$,moredelim=**[is][\color{red}]{@}{@},}\lstset{#1}}{} 106 106 \lstnewenvironment{pascal}[1][]{\lstset{language=pascal,escapechar=\$,moredelim=**[is][\color{red}]{@}{@},}\lstset{#1}}{} 107 \lstnewenvironment{java}[1][]{\lstset{language=java,escapechar=\$,moredelim=**[is][\color{red}]{@}{@},}\lstset{#1}}{} 107 108 \lstset{inputpath={programs}} 108 109 -
doc/user/user.tex
r38e20a80 rce02877 11 11 %% Created On : Wed Apr 6 14:53:29 2016 12 12 %% Last Modified By : Peter A. Buhr 13 %% Last Modified On : Tue Jul 9 10:43:40202414 %% Update Count : 6 88713 %% Last Modified On : Fri Jul 26 06:56:11 2024 14 %% Update Count : 6955 15 15 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 16 16 … … 1598 1598 and implicitly opened \emph{after} a function-body open, to give them higher priority: 1599 1599 \begin{cfa} 1600 void f( S & s, char ®c® ) with ( s ) ®with( §\emph{\R{params}}§ )® { // syntax notallowed, illustration only1600 void f( S & s, char ®c® ) with ( s ) ®with( §\emph{\R{params}}§ )® { // syntax disallowed, illustration only 1601 1601 s.c = ®c;® i = 3; d = 5.5; 1602 1602 } … … 3313 3313 for example, the following is incorrect: 3314 3314 \begin{cfa} 3315 * [ int x ] f () fp; §\C{// routine name "f" is notallowed}§3316 \end{cfa} 3317 3318 3319 \section{ Named and Default Arguments}3320 3321 Named\index{named arguments}\index{arguments!named} and default\index{default arguments}\index{arguments!default} arguments~\cite{Hardgrave76}\footnote{3315 * [ int x ] f () fp; §\C{// routine name "f" is disallowed}§ 3316 \end{cfa} 3317 3318 3319 \section{Default and Named Parameter} 3320 3321 Default\index{default parameter}\index{parameter!default} and named\index{named parameter}\index{parameter!named} parameters~\cite{Hardgrave76}\footnote{ 3322 3322 Francez~\cite{Francez77} proposed a further extension to the named-parameter passing style, which specifies what type of communication (by value, by reference, by name) the argument is passed to the routine.} 3323 3323 are two mechanisms to simplify routine call. 3324 Both mechanisms are discussed with respect to \CFA. 3325 \begin{description} 3326 \item[Named (or Keyword) Arguments:] 3327 provide the ability to specify an argument to a routine call using the parameter name rather than the position of the parameter. 3328 For example, given the routine: 3329 \begin{cfa} 3330 void p( int x, int y, int z ) {...} 3331 \end{cfa} 3332 a positional call is: 3333 \begin{cfa} 3334 p( 4, 7, 3 ); 3335 \end{cfa} 3336 whereas a named (keyword) call may be: 3337 \begin{cfa} 3338 p( z : 3, x : 4, y : 7 ); §\C{// rewrite \(\Rightarrow\) p( 4, 7, 3 )}§ 3339 \end{cfa} 3340 Here the order of the arguments is unimportant, and the names of the parameters are used to associate argument values with the corresponding parameters. 3341 The compiler rewrites a named call into a positional call. 3342 The advantages of named parameters are: 3343 \begin{itemize} 3344 \item 3345 Remembering the names of the parameters may be easier than the order in the routine definition. 3346 \item 3347 Parameter names provide documentation at the call site (assuming the names are descriptive). 3348 \item 3349 Changes can be made to the order or number of parameters without affecting the call (although the call must still be recompiled). 3350 \end{itemize} 3351 3352 Unfortunately, named arguments do not work in C-style programming-languages because a routine prototype is not required to specify parameter names, nor do the names in the prototype have to match with the actual definition. 3353 For example, the following routine prototypes and definition are all valid. 3354 \begin{cfa} 3355 void p( int, int, int ); §\C{// equivalent prototypes}§ 3356 void p( int x, int y, int z ); 3357 void p( int y, int x, int z ); 3358 void p( int z, int y, int x ); 3359 void p( int q, int r, int s ) {} §\C{// match with this definition}§ 3360 \end{cfa} 3361 Forcing matching parameter names in routine prototypes with corresponding routine definitions is possible, but goes against a strong tradition in C programming. 3362 Alternatively, prototype definitions can be eliminated by using a two-pass compilation, and implicitly creating header files for exports. 3363 The former is easy to do, while the latter is more complex. 3364 3365 Furthermore, named arguments do not work well in a \CFA-style programming-languages because they potentially introduces a new criteria for type matching. 3366 For example, it is technically possible to disambiguate between these two overloaded definitions of ©f© based on named arguments at the call site: 3367 \begin{cfa} 3368 int f( int i, int j ); 3369 int f( int x, double y ); 3370 3371 f( j : 3, i : 4 ); §\C{// 1st f}§ 3372 f( x : 7, y : 8.1 ); §\C{// 2nd f}§ 3373 f( 4, 5 ); §\C{// ambiguous call}§ 3374 \end{cfa} 3375 However, named arguments compound routine resolution in conjunction with conversions: 3376 \begin{cfa} 3377 f( i : 3, 5.7 ); §\C{// ambiguous call ?}§ 3378 \end{cfa} 3379 Depending on the cost associated with named arguments, this call could be resolvable or ambiguous. 3380 Adding named argument into the routine resolution algorithm does not seem worth the complexity. 3381 Therefore, \CFA does \emph{not} attempt to support named arguments. 3382 3383 \item[Default Arguments] 3384 provide the ability to associate a default value with a parameter so it can be optionally specified in the argument list. 3385 For example, given the routine: 3386 \begin{cfa} 3387 void p( int x = 1, int y = 2, int z = 3 ) {...} 3388 \end{cfa} 3389 the allowable positional calls are: 3390 \begin{cfa} 3391 p(); §\C{// rewrite \(\Rightarrow\) p( 1, 2, 3 )}§ 3392 p( 4 ); §\C{// rewrite \(\Rightarrow\) p( 4, 2, 3 )}§ 3393 p( 4, 4 ); §\C{// rewrite \(\Rightarrow\) p( 4, 4, 3 )}§ 3394 p( 4, 4, 4 ); §\C{// rewrite \(\Rightarrow\) p( 4, 4, 4 )}§ 3395 // empty arguments 3396 p( , 4, 4 ); §\C{// rewrite \(\Rightarrow\) p( 1, 4, 4 )}§ 3397 p( 4, , 4 ); §\C{// rewrite \(\Rightarrow\) p( 4, 2, 4 )}§ 3398 p( 4, 4, ); §\C{// rewrite \(\Rightarrow\) p( 4, 4, 3 )}§ 3399 p( 4, , ); §\C{// rewrite \(\Rightarrow\) p( 4, 2, 3 )}§ 3400 p( , 4, ); §\C{// rewrite \(\Rightarrow\) p( 1, 4, 3 )}§ 3401 p( , , 4 ); §\C{// rewrite \(\Rightarrow\) p( 1, 2, 4 )}§ 3402 p( , , ); §\C{// rewrite \(\Rightarrow\) p( 1, 2, 3 )}§ 3403 \end{cfa} 3324 3325 3326 \subsection{Default} 3327 3328 A default parameter associates a default value with a parameter so it can be optionally specified in the argument list. 3329 For example, given the routine prototype: 3330 \begin{cfa} 3331 void f( int x ®= 1®, int y ®= 2®, int z ®= 3® ); 3332 \end{cfa} 3333 allowable calls are: 3334 \begin{cquote} 3335 \setlength{\tabcolsep}{0.75in} 3336 \begin{tabular}{@{}ll@{}} 3337 \textbf{positional arguments} & \textbf{empty arguments} \\ 3338 \begin{cfa} 3339 f(); §\C[0.75in]{// rewrite \(\Rightarrow\) f( 1, 2, 3 )}§ 3340 f( 4 ); §\C{// rewrite \(\Rightarrow\) f( 4, 2, 3 )}§ 3341 f( 4, 4 ); §\C{// rewrite \(\Rightarrow\) f( 4, 4, 3 )}§ 3342 f( 4, 4, 4 ); §\C{// rewrite \(\Rightarrow\) f( 4, 4, 4 )}\CRT§ 3343 3344 3345 3346 \end{cfa} 3347 & 3348 \begin{cfa} 3349 f( ®?®, 4, 4 ); §\C[1.0in]{// rewrite \(\Rightarrow\) f( 1, 4, 4 )}§ 3350 f( 4, ®?®, 4 ); §\C{// rewrite \(\Rightarrow\) f( 4, 2, 4 )}§ 3351 f( 4, 4, ®?® ); §\C{// rewrite \(\Rightarrow\) f( 4, 4, 3 )}§ 3352 f( 4, ®?®, ®?® ); §\C{// rewrite \(\Rightarrow\) f( 4, 2, 3 )}§ 3353 f( ®?®, 4, ®?® ); §\C{// rewrite \(\Rightarrow\) f( 1, 4, 3 )}§ 3354 f( ®?®, ®?®, 4 ); §\C{// rewrite \(\Rightarrow\) f( 1, 2, 4 )}§ 3355 f( ®?®, ®?®, ®?® ); §\C{// rewrite \(\Rightarrow\) f( 1, 2, 3 )}\CRT§ 3356 \end{cfa} 3357 \end{tabular} 3358 \end{cquote} 3359 where the ©?© selects the default value as the argument. 3404 3360 Here the missing arguments are inserted from the default values in the parameter list. 3405 3361 The compiler rewrites missing default values into explicit positional arguments. … … 3408 3364 \item 3409 3365 Routines with a large number of parameters are often very generalized, giving a programmer a number of different options on how a computation is performed. 3410 For many of these kinds ofroutines, there are standard or default settings that work for the majority of computations.3366 For many of these routines, there are standard or default settings that work for the majority of computations. 3411 3367 Without default values for parameters, a programmer is forced to specify these common values all the time, resulting in long argument lists that are error prone. 3412 3368 \item … … 3422 3378 Instead, a default value is used, which may not be the programmer's intent. 3423 3379 3424 Default values may only appear in a prototype versus definition context:3425 \begin{cfa} 3426 void p( int x, int y = 2, int z = 3 );§\C{// prototype: allowed}§3427 void p( int, int = 2, int = 3 );§\C{// prototype: allowed}§3428 void p( int x, int y = 2, int z = 3 ) {} §\C{// definition: notallowed}§3380 Default parameters may only appear in a prototype versus definition context: 3381 \begin{cfa} 3382 void f( int x, int y = 2, int z = 3 ); §\C{// prototype: allowed}§ 3383 void f( int, int = 2, int = 3 ); §\C{// prototype: allowed}§ 3384 void f( int x, int y = 2, int z = 3 ) ®{}® §\C{// definition: disallowed}§ 3429 3385 \end{cfa} 3430 3386 The reason for this restriction is to allow separate compilation. 3431 Multiple prototypes with different default values is an error. 3432 \end{description} 3433 3434 Ellipse (``...'') arguments present problems when used with default arguments. 3435 The conflict occurs because both named and ellipse arguments must appear after positional arguments, giving two possibilities: 3436 \begin{cfa} 3437 p( /* positional */, ... , /* named */ ); 3438 p( /* positional */, /* named */, ... ); 3439 \end{cfa} 3440 While it is possible to implement both approaches, the first possibly is more complex than the second, \eg: 3441 \begin{cfa} 3442 p( int x, int y, int z, ... ); 3443 p( 1, 4, 5, 6, z : 3, y : 2 ); §\C{// assume p( /* positional */, ... , /* named */ );}§ 3444 p( 1, z : 3, y : 2, 4, 5, 6 ); §\C{// assume p( /* positional */, /* named */, ... );}§ 3445 \end{cfa} 3446 In the first call, it is necessary for the programmer to conceptually rewrite the call, changing named arguments into positional, before knowing where the ellipse arguments begin. 3447 Hence, this approach seems significantly more difficult, and hence, confusing and error prone. 3448 In the second call, the named arguments separate the positional and ellipse arguments, making it trivial to read the call. 3449 3450 The problem is exacerbated with default arguments, \eg: 3451 \begin{cfa} 3452 void p( int x, int y = 2, int z = 3... ); 3453 p( 1, 4, 5, 6, z : 3 ); §\C{// assume p( /* positional */, ... , /* named */ );}§ 3454 p( 1, z : 3, 4, 5, 6 ); §\C{// assume p( /* positional */, /* named */, ... );}§ 3455 \end{cfa} 3456 The first call is an error because arguments 4 and 5 are actually positional not ellipse arguments; 3457 therefore, argument 5 subsequently conflicts with the named argument z : 3. 3458 In the second call, the default value for y is implicitly inserted after argument 1 and the named arguments separate the positional and ellipse arguments, making it trivial to read the call. 3459 For these reasons, \CFA requires named arguments before ellipse arguments. 3460 Finally, while ellipse arguments are needed for a small set of existing C routines, like ©printf©, the extended \CFA type system largely eliminates the need for ellipse arguments \see{\VRef{s:Overloading}}, making much of this discussion moot. 3387 Multiple prototypes with different default values is undefined. 3461 3388 3462 3389 Default arguments and overloading \see{\VRef{s:Overloading}} are complementary. … … 3466 3393 \multicolumn{1}{c@{\hspace{3em}}}{\textbf{default arguments}} & \multicolumn{1}{c}{\textbf{overloading}} \\ 3467 3394 \begin{cfa} 3468 void p( int x, int y = 2, int z = 3 ) {...}3395 void f( int x, int y = 2, int z = 3 ) {...} 3469 3396 3470 3397 … … 3472 3399 & 3473 3400 \begin{cfa} 3474 void p( int x, int y, int z ) {...}3475 void p( int x ) { p( x, 2, 3 ); }3476 void p( int x, int y ) { p( x, y, 3 ); }3401 void f( int x, int y, int z ) {...} 3402 void f( int x ) { f( x, 2, 3 ); } 3403 void f( int x, int y ) { f( x, y, 3 ); } 3477 3404 \end{cfa} 3478 3405 \end{tabular} 3479 3406 \end{cquote} 3480 3407 the number of required overloaded routines is linear in the number of default values, which is unacceptable growth. 3481 In general, overloading should only be used over default arguments if the body of the routine is significantly different. 3482 Furthermore, overloading cannot handle accessing default arguments in the middle of a positional list, via a missing argument, such as: 3483 \begin{cfa} 3484 p( 1, /* default */, 5 ); §\C{// rewrite \(\Rightarrow\) p( 1, 2, 5 )}§ 3485 \end{cfa} 3486 3487 Given the \CFA restrictions above, both named and default arguments are backwards compatible. 3488 \Index*[C++]{\CC{}} only supports default arguments; 3489 \Index*{Ada} supports both named and default arguments. 3408 In general, overloading is used over default parameters, if the body of the routine is significantly different. 3409 Furthermore, overloading cannot handle accessing default arguments in the middle of a positional list. 3410 \begin{cfa} 3411 f( 1, ®?®, 5 ); §\C{// rewrite \(\Rightarrow\) f( 1, 2, 5 )}§ 3412 \end{cfa} 3413 3414 3415 \subsection{Named (or Keyword)} 3416 3417 A named (keyword) parameter provides the ability to specify an argument to a routine call using the parameter name rather than the position of the parameter. 3418 For example, given the routine prototype: 3419 \begin{cfa} 3420 void f( int ®?®x, int ®?®y, int ®?®z ); 3421 \end{cfa} 3422 allowable calls are: 3423 \begin{cfa} 3424 f( ?x = 3, ?y = 4, ?z = 5 ); §\C{// rewrite \(\Rightarrow\) f( 3, 4, 5 )}§ 3425 f( ?y = 4, ?z = 5, ?x = 3 ); §\C{// rewrite \(\Rightarrow\) f( 3, 4, 5 )}§ 3426 f( ?z = 5, ?x = 3, ?y = 4 ); §\C{// rewrite \(\Rightarrow\) f( 3, 4, 5 )}§ 3427 f( ?x = 3, ?z = 5, ?y = 4 ); §\C{// rewrite \(\Rightarrow\) f( 3, 4, 5 )}§ 3428 \end{cfa} 3429 Here the ordering of the the parameters and arguments is unimportant, and the names of the parameters are used to associate argument values with the corresponding parameters. 3430 The compiler rewrites a named call into a positional call. 3431 Note, the syntax ©?x = 3© is necessary for the argument, because ©x = 3© has an existing meaning, \ie assign ©3© to ©x© and pass the value of ©x©. 3432 The advantages of named parameters are: 3433 \begin{itemize} 3434 \item 3435 Remembering the names of the parameters may be easier than the order in the routine definition. 3436 \item 3437 Parameter names provide documentation at the call site (assuming the names are descriptive). 3438 \item 3439 Changes can be made to the order or number of parameters without affecting the call (although the call must still be recompiled). 3440 \end{itemize} 3441 3442 Named parameters may only appear in a prototype versus definition context: 3443 \begin{cfa} 3444 void f( int x, int ?y, int ?z ); §\C{// prototype: allowed}§ 3445 void f( int ?x, int , int ?z ); §\C{// prototype: allowed}§ 3446 void f( int x, int ?y, int ?z ) ®{}® §\C{// definition: disallowed}§ 3447 \end{cfa} 3448 The reason for this restriction is to allow separate compilation. 3449 Multiple prototypes with different positional parameter names is an error. 3450 3451 The named parameter is not part of type resolution; 3452 only the type of the expression assigned to the named parameter affects type resolution. 3453 \begin{cfa} 3454 int f( int ?i, int ?j ); 3455 int f( int ?i, double ?j ); 3456 f( ?j = 3, ?i = 4 ); §\C{// 1st f}§ 3457 f( ?i = 7, ?j = 8.1 ); §\C{// 2nd f}§ 3458 \end{cfa} 3459 3460 3461 \subsection{Mixed Default/Named} 3462 3463 Default and named parameters can be intermixed and named parameters can have a default value. 3464 For example, given the routine prototype: 3465 \begin{cfa} 3466 void f( int x, int y ®= 1®, int ®?®z ®= 2® ); 3467 \end{cfa} 3468 allowable calls are: 3469 \begin{cfa} 3470 f( 3 ); §\C{// rewrite \(\Rightarrow\) f( 3, 1, 2 )}§ 3471 f( 3, 4 ); §\C{// rewrite \(\Rightarrow\) f( 3, 4, 2 )}§ 3472 f( 3, ?z = 5 ); §\C{// rewrite \(\Rightarrow\) f( 3, 1, 5 )}§ 3473 f( 3, 4, ?z = 5 ); §\C{// rewrite \(\Rightarrow\) f( 3, 4, 5 )}§ 3474 f( ?z = 5, 3 ); §\C{// rewrite \(\Rightarrow\) f( 3, 1, 5 )}§ 3475 f( 3, ?z = 5, 4 ); §\C{// rewrite \(\Rightarrow\) f( 3, 4, 5 )}§ 3476 \end{cfa} 3477 Finally, the ellipse (``...'') parameter must appear after positional and named parameters in a routine prototype. 3478 \begin{cfa} 3479 void f( int i = 1, int ?j = 2, ®...® ); 3480 \end{cfa} 3481 3482 \CFA named and default arguments are backwards compatible with C. 3483 \Index*[C++]{\CC{}} only supports default parameters; 3484 \Index*{Ada} supports both named and default parameters. 3490 3485 3491 3486
Note: See TracChangeset
for help on using the changeset viewer.