Changeset 10a9479d for doc/theses/mike_brooks_MMath/array.tex
- Timestamp: Nov 23, 2024, 8:28:37 PM
- Branches: master
- Children: 956b389
- Parents: b006c51e (diff), de7b7a5 (diff)
Note: this is a merge changeset; the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
- File: 1 edited
doc/theses/mike_brooks_MMath/array.tex
rb006c51e r10a9479d 2 2 \label{c:Array} 3 3 4 Arrays in C are possibly the single most misunderstood and incorrectly used feature in the language, resulting in the largest proportion of runtime errors and security violations. 5 This chapter describes the new \CFA language and library features that introduce a length-checked array type, @array@, to the \CFA standard library~\cite{Cforall}. 6 7 Offering the @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (@gcc@) programs can be compiled by \CFA with only a small number of changes, similar to \CC (@g++@). 8 However, a few compatibility-breaking changes to the behaviour of the C array are necessary, both as an implementation convenience and to fix C's lax treatment of arrays. 9 Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity of the C array. 10 4 11 5 12 \section{Introduction} 6 13 \label{s:ArrayIntro} 7 14 8 Arrays in C are possibly the single most misunderstood and incorrectly used feature in the language, resulting in the largest proportion of runtime errors and security violations. 9 This chapter describes the new \CFA language and library features that introduce a length-checked array type to the \CFA standard library~\cite{Cforall}. 10 11 Specifically, a new \CFA array is declared by instantiating the generic @array@ type, 12 much like instantiating any other standard-library generic type (such as @dlist@), 15 The new \CFA array is declared by instantiating the generic @array@ type, 16 much like instantiating any other standard-library generic type (such as \CC @vector@), 13 17 though using a new style of generic parameter. 14 18 \begin{cfa} … … 16 20 \end{cfa} 17 21 Here, the arguments to the @array@ type are @float@ (element type) and @99@ (length). 18 When this type is used as a function parameter, the type-system requires that a call's argument matches, down to the length.22 When this type is used as a function parameter, the type-system requires that a call's argument is a perfect match. 19 23 \begin{cfa} 20 24 void f( @array( float, 42 )@ & p ) {} $\C{// p accepts 42 floats}$ 21 f( x ); $\C{// statically rejected: type s are different, 99 != 42}$25 f( x ); $\C{// statically rejected: type lengths are different, 99 != 42}$ 22 26 23 27 test2.cfa:3:1 error: Invalid application of existing declaration(s) in expression. … … 25 29 \end{cfa} 26 30 Here, the function @f@'s parameter @p@ is declared with length 42. 27 The call @f( x )@, with the argument being the previously-declared object, is invalid, because the @array@ lengths @99@ and @42@ do not match. 28 29 A function declaration can be polymorphic over these @array@ arguments by using the @forall@ declaration prefix. 30 This function @g@'s takes arbitrary type parameter @T@ (familiar) and \emph{dimension parameter} @N@ (new). 31 A dimension paramter represents a to-be-determined count of elements, managed by the type system. 31 However, the call @f( x )@ is invalid, because @x@'s length is @99@, which does not match @42@. 32 33 A function declaration can be polymorphic over these @array@ arguments by using the \CFA @forall@ declaration prefix. 32 34 \begin{cfa} 33 35 forall( T, @[N]@ ) … … 40 42 Cforall Runtime error: subscript 1000 exceeds dimension range [0,99) $for$ array 0x555555558020. 
41 43 \end{cfa} 44 Function @g@ takes an arbitrary type parameter @T@ and a \emph{dimension parameter} @N@. 45 A dimension parameter represents a to-be-determined count of elements, managed by the type system. 42 46 The call @g( x, 0 )@ is valid because @g@ accepts any length of array, where the type system infers @float@ for @T@ and length @99@ for @N@. 43 Inferring values for @T@ and @N@ is implicit , without programmer involvement.47 Inferring values for @T@ and @N@ is implicit. 44 48 Furthermore, in this case, the runtime subscript @x[0]@ (parameter @i@ being @0@) in @g@ is valid because 0 is in the dimension range $[0,99)$ of argument @x@. 45 The call @g( x, 1000 )@ is also accepted through compile time;49 However, the call @g( x, 1000 )@ is also accepted through compile time; 46 50 however, this case's subscript, @x[1000]@, generates an error, because @1000@ is outside the dimension range $[0,99)$ of argument @x@. 51 In general, the @forall( ..., [N] )@ participates in the user-relevant declaration of the name @N@, which becomes usable in parameter/return declarations and within a function. 52 The syntactic form is chosen to parallel other @forall@ forms: 53 \begin{cfa} 54 forall( @[N]@ ) ... $\C[1.5in]{// dimension}$ 55 forall( T ) ... $\C{// value datatype (formerly, "otype")}$ 56 forall( T & ) ... $\C{// opaque datatype (formerly, "dtype")}\CRT$ 57 \end{cfa} 58 % The notation @array(thing, N)@ is a single-dimensional case, giving a generic type instance. 47 59 48 60 The generic @array@ type is comparable to the C array type, which \CFA inherits from C. 49 61 Their runtime characteristics are often identical, and some features are available in both. 50 For example, assume a caller instantiates @N@ with 42 (discussion about how to follow) in:62 For example, assume a caller has an argument that instantiates @N@ with 42. 51 63 \begin{cfa} 52 64 forall( [N] ) 53 void declDemo( ) {65 void declDemo( ... ) { 54 66 float x1[N]; $\C{// built-in type ("C array")}$ 55 67 array(float, N) x2; $\C{// type from library}$ … … 59 71 The two variables have identical size and layout; they both encapsulate 42-float stack allocations, with no additional ``bookkeeping'' allocations or headers. 60 72 Providing this explicit generic approach requires a significant extension to the \CFA type system to support a full-feature, safe, efficient (space and time) array-type, which forms the foundation for more complex array forms in \CFA. 61 62 Admittedly, the @array@ library type (type for @x2@) is syntactically different from its C counterpart. 63 A future goal (TODO xref) is to provide the new features upon a built-in type whose syntax approaches C's (declaration style of @x1@). 73 In all following discussion, ``C array'' means the types like that of @x@ and ``\CFA array'' means the standard-library @array@ type (instantiations), like the type of @x2@. 74 75 Admittedly, the @array@ library type for @x2@ is syntactically different from its C counterpart. 76 A future goal (TODO xref) is to provide the new @array@ features with syntax approaching C's (declaration style of @x1@). 64 77 Then, the library @array@ type could be removed, giving \CFA a largely uniform array type. 65 At present, the C-syntax array gets partial support for the new features, so the generic @array@ is used exclusively when introducing features;78 At present, the C-syntax @array@ is only partially supported, so the generic @array@ is used exclusively in the thesis; 66 79 feature support and C compatibility are revisited in Section ? TODO. 
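To make the claimed layout equivalence concrete, a minimal sketch (not part of the thesis listings) extends @declDemo@ with an illustrative @assert@ and subscripts; as before, assume a caller instantiates @N@ with 42:
\begin{cfa}
forall( [N] )
void declDemo( ... ) {
	float x1[N];                          $\C{// C array}$
	array(float, N) x2;                   $\C{// library array}$
	assert( sizeof(x1) == sizeof(x2) );   $\C{// both are N-float stack objects, no headers}$
	x1[3] = 3.14;  x2[3] = 3.14;          $\C{// identical subscript syntax}$
}
\end{cfa}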
67 68 Offering the @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (@gcc@) programs can be compiled by \CFA with only a small number of changes, similar to \CC (@g++@).69 However, a few compatibility-breaking changes to the behaviour of the C array are necessary, both as an implementation convenience and to fix C's lax treatment of arrays.70 Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity of the C array.71 72 In all discussion following, ``C array'' means the types like that of @x@ and ``\CFA array'' means the standard-library @array@ type (instantiations), like the type of @x2@.73 80 74 81 My contributions in this chapter are: … … 83 90 84 91 85 \section{Definitions and design considerations} 86 87 88 \subsection{Dependent typing} 89 90 91 92 General dependent typing allows the type system to encode arbitrary predicates (e.g. behavioural specifications for functions), 92 \section{Dependent typing} 93 94 General dependent typing allows the type system to encode arbitrary predicates (\eg behavioural specifications for functions), 93 95 which is an anti-goal for my work. 94 96 Firstly, this application is strongly associated with pure functional languages, … … 101 103 102 104 103 104 105 \section{Features added} 105 106 … … 109 110 By declaring type variables at the front of object declarations, an array dimension is lexically referenceable where it is needed. 110 111 For example, a declaration can share one length, @N@, among a pair of parameters and the return, 111 meaning that it requires both input arrays to be of the same length, and guarantees that the result with beof that length as well.112 meaning that it requires both input arrays to be of the same length, and guarantees that the result is of that length as well. 112 113 \lstinput{10-17}{hello-array.cfa} 113 This function @f@ does a pointwise comparison of its two input arrays, checking if each pair of numbers is within half a percent of each other, returning the answers in a newly allocated @bool@ array. 114 The dynamic allocation of the @ret@ array by preexisting @alloc@ uses the parameterized dimension information implicitly within its @sizeof@ determination, and casts the return type. 115 Note that alloc only sees one whole type for its @T@ (which is @f@'s @array(bool, N)@); this type's size is a computation based on @N@. 116 \begin{cfa} 117 // simplification 118 static inline forall( T & | sized(T) ) 114 Function @f@ does a pointwise comparison of its two input arrays, checking if each pair of numbers is within half a percent of each other, returning the answers in a newly allocated @bool@ array. 115 The dynamic allocation of the @ret@ array, by the library @alloc@ function, 116 \begin{cfa} 117 forall( T & | sized(T) ) 119 118 T * alloc() { 120 return (T *)malloc( sizeof(T) ); 121 } 122 \end{cfa} 123 This example illustrates how the new @array@ type plugs into existing \CFA behaviour by implementing necessary @sized@ assertions needed by other types. 124 (@sized@ implies a concrete \vs abstract type with a runtime-available size, exposed as @sizeof@.) 119 return @(T *)@malloc( @sizeof(T)@ ); 120 } 121 \end{cfa} 122 uses the parameterized dimension information implicitly within its @sizeof@ determination, and casts the return type. 
123 Note that @alloc@ only sees one whole type for its @T@ (which is @f@'s @array(bool, N)@); this type's size is a computation based on @N@. 124 This example illustrates how the new @array@ type plugs into existing \CFA behaviour by implementing necessary \emph{sized} assertions needed by other types. 125 (\emph{sized} implies a concrete \vs abstract type with a runtime-available size, exposed as @sizeof@.) 125 126 As a result, there is significant programming safety by making the size accessible and implicit, compared with C's @calloc@ and non-array supporting @memalign@, which take an explicit length parameter not managed by the type system. 126 127 … … 142 143 The result is a significant improvement in safety and usability. 143 144 144 In general, the @forall( ..., [N] )@ participates in the user-relevant declaration of the name @N@, which becomes usable in parameter/return declarations and within a function.145 The syntactic form is chosen to parallel other @forall@ forms:146 \begin{cfa}147 forall( @[N]@ ) ... $\C[1.5in]{// dimension}$148 forall( T & ) ... $\C{// opaque datatype (formerly, "dtype")}$149 forall( T ) ... $\C{// value datatype (formerly, "otype")}\CRT$150 \end{cfa}151 % The notation @array(thing, N)@ is a single-dimensional case, giving a generic type instance.152 145 In summary: 153 146 \begin{itemize} … … 168 161 % agreed, though already said 169 162 \item 170 \CC does not allow a template function to be nested, while \CFA le sts its polymorphic functions to be nested.163 \CC does not allow a template function to be nested, while \CFA lets its polymorphic functions to be nested. 171 164 % why is this important? 172 165 \item … … 227 220 \end{cfa} 228 221 \end{tabular} 229 \caption{\lstinline{N}-style param ters, for \CC template \vs \CFA generic type }222 \caption{\lstinline{N}-style parameters, for \CC template \vs \CFA generic type } 230 223 \label{f:TemplateVsGenericType} 231 224 \end{figure} 232 225 233 226 Just as the first example in \VRef[Section]{s:ArrayIntro} shows a compile-time rejection of a length mismatch, 234 so are length mismatches stopped when they inv love dimension parameters.227 so are length mismatches stopped when they involve dimension parameters. 235 228 While \VRef[Figure]{f:fHarness} shows successfully calling a function @f@ expecting two arrays of the same length, 236 229 \begin{cfa} 237 230 array( bool, N ) & f( array( float, N ) &, array( float, N ) & ); 238 231 \end{cfa} 239 a static rejection occurs when attempting to call @f@ with arrays of potentiallydiffering lengths.232 a static rejection occurs when attempting to call @f@ with arrays of differing lengths. 240 233 \lstinput[tabsize=1]{70-74}{hello-array.cfa} 241 234 When the argument lengths themselves are statically unknown, … … 252 245 Orthogonally, the \CFA array type works within generic \emph{types}, \ie @forall@-on-@struct@. 253 246 The same argument safety and the associated implicit communication of array length occurs. 254 Preexisting \CFA allowed aggregate types to be generalized with type parameters, enabling parameterizing forelement types.247 Preexisting \CFA allowed aggregate types to be generalized with type parameters, enabling parameterizing of element types. 255 248 Now, \CFA also allows parameterizing them by length. 256 249 Doing so gives a refinement of C's ``flexible array member'' pattern[TODO: cite ARM 6.7.2.1 pp18]\cite{arr:gnu-flex-mbr}. … … 259 252 This flexibility, in turn, allows for multiple array members. 
260 253 \lstinput{10-15}{hello-accordion.cfa} 261 This structure's layout has the starting offset of @studentIds@ varying according to the generic parameter @C@, and the offset of @preferences@ varying according to both generic parameters. 262 263 The school example has the data structure capturing many students' course-preference forms. 264 It has course- and student-level metadata (their respective display names) and a position-based preferecens' matrix. 265 The input files in \VRef[Figure]{f:checkHarness} give example data. 266 267 When a function operates on a @School@ structure, the type system handles its memory layout transparently. 268 \lstinput{30-37}{hello-accordion.cfa} 269 In the running example, this @getPref@ function answers, 270 for the student at position @sIx@, what is the position of its @pref@\textsuperscript{th}-favoured class? 271 272 \VRef[Figure]{f:checkHarness} shows the @School@ harness and results with different array sizes. 273 This example program prints the courses in each student's preferred order, all using the looked-up display names. 274 Note the declaration of the @school@ variable. 254 The structure has course- and student-level metatdata (their respective field names) and a position-based preferences' matrix. 255 Its layout has the starting offset of @studentIds@ varying according to the generic parameter @C@, and the offset of @preferences@ varying according to both generic parameters. 256 257 \VRef[Figure]{f:checkHarness} shows a program main using @School@ and results with different array sizes. 258 The @school@ variable holds many students' course-preference forms. 275 259 It is on the stack and its initialization does not use any casting or size arithmetic. 276 260 Both of these points are impossible with a C flexible array member. … … 280 264 \end{cfa} 281 265 This ability to avoid casting and size arithmetic improves safety and usability over C flexible array members. 282 266 Finally, inputs and outputs are given at the bottom for different sized schools. 267 The example program prints the courses in each student's preferred order, all using the looked-up display names. 283 268 284 269 \begin{figure} 285 % super hack to get this to line up 286 \begin{tabular}{@{}ll@{\hspace{25pt}}l@{}} 287 \begin{tabular}{@{}p{3.25in}@{}} 270 \begin{cquote} 288 271 \lstinput{50-55}{hello-accordion.cfa} 289 \vspace*{-3pt}290 272 \lstinput{90-98}{hello-accordion.cfa} 291 \end{tabular} 292 & 293 \raisebox{0.32\totalheight}{% 294 }% 295 & 296 \end{tabular} 297 298 TODO: Get Peter's layout help 299 300 \$ cat school1 301 273 \ \\ 274 @$ cat school1@ 302 275 \lstinput{}{school1} 303 276 304 \$ ./a.out < school1 305 277 @$ ./a.out < school1@ 306 278 \lstinput{}{school1.out} 307 279 308 \$ cat school2 309 280 @$ cat school2@ 310 281 \lstinput{}{school2} 311 282 312 \$ ./a.out < school2 313 283 @$ ./a.out < school2@ 314 284 \lstinput{}{school2.out} 285 \end{cquote} 315 286 316 287 \caption{\lstinline{School} harness, input and output} 317 288 \label{f:checkHarness} 318 289 \end{figure} 290 291 When a function operates on a @School@ structure, the type system handles its memory layout transparently. 292 \lstinput{30-37}{hello-accordion.cfa} 293 In the example, this @getPref@ function answers, for the student at position @is@, what is the position of its @pref@\textsuperscript{th}-favoured class? 
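For reference, since the @School@ definition itself lives in the external listing @hello-accordion.cfa@, the following rough sketch shows the pattern it follows; the field names, the @Name@ element type, and the nested-@array@ preference matrix are assumptions for illustration only:
\begin{cfa}
forall( [C], [S] )                           $\C{// dimension parameters: courses, students}$
struct School {
	array( Name, C ) courseNames;            $\C{// course-level metadata}$
	array( Name, S ) studentIds;             $\C{// starting offset depends on C}$
	array( array( int, C ), S ) preferences; $\C{// offset depends on both C and S}$
};
\end{cfa}
A declaration of this shape is what gives the member offsets their dependence on the generic parameters, as described above.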
319 294 320 295 … … 333 308 But simplifications close enough for the present discussion are: 334 309 \begin{cfa} 335 336 337 338 339 340 struct array_1d{341 342 343 \end{cfa} 344 Th is structure pattern, plus a subscript operator, is all that @array@ provides.310 forall( [N] ) 311 struct array_1d_float { 312 float items[N]; 313 }; 314 forall( T, [N] ) 315 struct array_1d_T { 316 T items[N]; 317 }; 318 \end{cfa} 319 These two structure patterns, plus a subscript operator, is all that @array@ provides. 345 320 346 321 My main work is letting a programmer define 347 such a struct re (one whose type is parameterized by @[N]@)322 such a structure (one whose type is parameterized by @[N]@) 348 323 and functions that operate on it (these being similarly parameterized). 349 324 … … 351 326 \begin{itemize} 352 327 \item 353 The resolver, providingvalues for a used declaration's type-system variables,328 Resolver provided values for a used declaration's type-system variables, 354 329 gathered from type information in scope at the usage site. 355 356 330 \item 357 331 The box pass, encoding information about type parameters 358 332 into ``extra'' regular parameters/arguments on declarations and calls. 359 333 Notably, it conveys the size of a type @foo@ as a @__sizeof_foo@ parameter, 360 and rewrites the @sizeof(foo)@ expression as @__sizeof_foo@, i.e.a use of the parameter.334 and rewrites the @sizeof(foo)@ expression as @__sizeof_foo@, \ie a use of the parameter. 361 335 \end{itemize} 362 336 … … 364 338 This work is detailed in \VRef[Section]{s:ArrayTypingC}. 365 339 However, the resolution--boxing scheme, in its preexisting state, was already equipped to work on (desugared) dimension parameters. 366 The discussion following explains the desugaring and how correctlowered code results.367 368 A n even simpler structure, and a toy function on it, demonstrate what's needed for the encoding.369 \begin{cfa} 370 forall( [@N@] ) { // [1] 371 372 void f( thing(@N@) ) { sout | @N@; } // [2], [3]373 374 375 thing( @10@ ) x; f(x); // prints 10, [4]376 thing( 100 ) y; f(y); // prints 100377 378 340 The following discussion explains the desugaring and how correctly lowered code results. 341 342 A simpler structure, and a toy function on it, demonstrate what is needed for the encoding. 343 \begin{cfa} 344 forall( [@N@] ) { $\C{// [1]}$ 345 struct thing {}; 346 void f( thing(@N@) ) { sout | @N@; } $\C{// [2], [3]}$ 347 } 348 int main() { 349 thing( @10@ ) x; f( x ); $\C{// prints 10, [4]}$ 350 thing( 100 ) y; f( y ); $\C{// prints 100}$ 351 return 0; 352 } 379 353 \end{cfa} 380 354 This example has: … … 389 363 A value like 10 being used as an argument to the parameter @N@. 390 364 \end{enumerate} 391 The chosen solution being to encode the value @N@ \emph{as a type},items 1 and 2 are immediately available for free.365 The chosen solution is to encode the value @N@ \emph{as a type}, so items 1 and 2 are immediately available for free. 392 366 Item 3 needs a way to recover the encoded value from a (valid) type (and to reject invalid types occurring here). 393 367 Item 4 needs a way to produce a type that encodes the given value. … … 400 374 \item 401 375 Given a dimension expression $e$, produce type @char[@$e$@]@ to represent it. 402 If $e$ evaluates to $n$ then the en doded type has size $n$.376 If $e$ evaluates to $n$ then the encoded type has size $n$. 403 377 \item 404 378 Given a type $T$ (produced by these rules), recover the value that it represents with the expression @sizeof(@$T$@)@. 
… … 410 384 The running example is lowered to: 411 385 \begin{cfa} 412 forall( @N*@ ) { // [1] 413 414 void f( thing(@N@) ) { sout | @sizeof(N)@; } // [2], [3]415 416 417 thing( char[@10@] ) x; f(x); // prints 10, [4]418 thing( char[100] ) y; f(y); // prints 100419 420 386 forall( @N *@ ) { $\C{// [1]}$ 387 struct thing {}; 388 void f( thing(@N@) ) { sout | @sizeof(N)@; } $\C{// [2], [3]}$ 389 } 390 int main() { 391 thing( char[@10@] ) x; f( x ); $\C{// prints 10, [4]}$ 392 thing( char[100] ) y; f( y ); $\C{// prints 100}$ 393 return 0; 394 } 421 395 \end{cfa} 422 396 Observe: … … 430 404 The @sout...@ expression (being an application of the @?|?@ operator) has a second argument that is an ordinary expression. 431 405 \item 432 The type of variable @x@ is another @thing(-)@ type; the argument to the generic @thing@ is a type (array type).406 The type of variable @x@ is another @thing(-)@ type; the argument to the generic @thing@ is a type (array type of bytes, @char@). 433 407 \end{enumerate} 434 408 435 409 From this point, preexisting \CFA compilation takes over lowering it the rest of the way to C. 436 Inspecting the result shows what the above translation achieves. 437 A form that shows only the relevant changes of the box pass (as informed by the resolver), leaving the rest unadulterated, is: 438 \begin{cfa} 439 // [1] 440 void f( size_t __sizeof_N, @void *@ ) { sout | @__sizeof_N@; } // [2], [3] 441 int main() { 442 struct __conc_thing_10 {} x; f(@10@, &x); // prints 10, [4] 443 struct __conc_thing_100 {} y; f(@100@, &y); // prints 100 444 return 0; 445 } 410 Here the result shows only the relevant changes of the box pass (as informed by the resolver), leaving the rest unadulterated: 411 \begin{cfa} 412 // [1] 413 void f( size_t __sizeof_N, @void *@ ) { sout | @__sizeof_N@; } $\C{// [2], [3]}$ 414 int main() { 415 struct __conc_thing_10 {} x; f( @10@, &x ); $\C{// prints 10, [4]}$ 416 struct __conc_thing_100 {} y; f( @100@, &y ); $\C{// prints 100}$ 417 return 0; 418 } 446 419 \end{cfa} 447 420 Observe: … … 452 425 The type @thing(N)@ is (replaced by @void *@, but thereby effectively) gone. 453 426 \item 454 The @sout...@ expression (being an application of the @?|?@ operator) has a second argument that is a regular variable (parameter) usage.427 The @sout...@ expression (being an application of the @?|?@ operator) has a regular variable (parameter) usage for its second argument. 455 428 \item 456 429 Information about the particular @thing@ instantiation (value 10) has moved, from the type, to a regular function-call argument. 457 430 \end{enumerate} 458 At the end of the desugaring and downstream processing, the original C idiom of ``pass both a pointer and a length parameter'' has been reconstructed.459 In the programmer-written form, only the thingis passed.431 At the end of the desugaring and downstream processing, the original C idiom of ``pass both a length parameter and a pointer'' has been reconstructed. 432 In the programmer-written form, only the @thing@ is passed. 460 433 The compiler's action produces the more complex form, which if handwritten, would be error-prone. 461 434 462 Back at the very front end, the parsing changes, AST schema extensions, and validation rules for enabling the sugared user input are:435 Back at the compiler front end, the parsing changes AST schema extensions and validation rules for enabling the sugared user input. 
463 436 \begin{itemize} 464 437 \item … … 467 440 Have the new brand of type-variable, \emph{Dimension}, in the AST form of a type-variable, to represent one parsed from @[-]@. 468 441 \item 469 Allow a type variable to occur in expression position. Validate (after parsing) that only dimension-branded type variables are used here.442 Allow a type variable to occur in an expression. Validate (after parsing) that only dimension-branded type variables are used here. 470 443 \item 471 444 Allow an expression to occur in type-argument position. Brand the resulting type argument as a dimension. … … 473 446 Validate (after parsing), on a generic-type usage, \eg the type part of the declaration 474 447 \begin{cfa} 475 @array_1d( foo, bar ) x;@448 array_1d( foo, bar ) x; 476 449 \end{cfa} 450 \vspace*{-10pt} 477 451 that the brands on the generic arguments match the brands of the declared type variables. 478 452 Here, that @foo@ is a type and @bar@ is a dimension. … … 488 462 from one party who knows it, to another who is willing to work with any given length. 489 463 For scenarios where the concern is a mishandled length, 490 the interaction is between two parties who both claim to know (something about) it. 491 Such a scenario occurs in this pure C fragment, wich today's C compilers accept: 492 \begin{cfa} 493 int n = @42@; 494 float x[n]; 495 float (*xp)[@999@] = &x; 496 (*xp)[@500@]; // in "bound"? 497 \end{cfa} 498 464 the interaction is between two parties who both claim to know something about it. 465 Such a scenario occurs in this pure C fragment, which today's C compilers accept: 466 \begin{cfa} 467 int n = @42@; 468 float x[n]; 469 float (*xp)[@999@] = &x; 470 (*xp)[@500@]; $\C{// in "bound"?}$ 471 \end{cfa} 499 472 Here, the array @x@ has length 42, while a pointer to it (@xp@) claims length 999. 500 473 So, while the subscript of @xp@ at position 500 is out of bound of its referent @x@, … … 505 478 The \CFA new-array rejects the analogous case: 506 479 \begin{cfa} 507 int n = @42@; 508 array(float, n) x; 509 array(float, 999) * xp = x; // static rejection here 510 (*xp)[@500@]; // runtime check vs len 999 511 \end{cfa} 512 513 % TODO: kill the vertical whitespace around these lists 514 % nothing from https://stackoverflow.com/questions/1061112/eliminate-space-before-beginitemize is working 515 516 The way the \CFA array is implemented, 517 the type analysis of this \CFA case reduces to a case similar to the earlier C version. 480 int n = @42@; 481 array(float, n) x; 482 array(float, 999) * xp = x; $\C{// static rejection here}$ 483 (*xp)[@500@]; $\C{// runtime check vs len 999}$ 484 \end{cfa} 485 The way the \CFA array is implemented, the type analysis of this case reduces to a case similar to the earlier C version. 518 486 The \CFA compiler's compatibility analysis proceeds as: 519 \begin{itemize}[ noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]487 \begin{itemize}[parsep=0pt] 520 488 \item 521 489 Is @array(float, 999)@ type-compatible with @array(float, n)@? 522 490 \item 523 Is @arrayX(float, char[999])@ type-compatible with @arrayX(float, char[n])@? 524 \footnote{Here, \lstinline{arrayX} represents the type that results491 Is @arrayX(float, char[999])@ type-compatible with @arrayX(float, char[n])@?\footnote{ 492 Here, \lstinline{arrayX} represents the type that results 525 493 from desugaring the \lstinline{array} type 526 494 into a type whose generic parameters are all types. … … 531 499 Is @char[999]@ type-compatible with @char[n]@? 
532 500 \end{itemize} 533 534 I chose to achieve the necessary rejection of the \CFA case 535 by adding a rejection of the corresponding C case. 536 537 This decision is not backward compatible. 501 To achieve the necessary \CFA rejections meant rejecting the corresponding C case, which is not backward compatible. 538 502 There are two complementary mitigations for this incompatibility. 539 503 … … 542 506 This situation might arise if @n@ were known to be 999, 543 507 rather than 42, as in the introductory examples. 544 The programmer can add a cast , as in:545 \begin{cfa} 546 xp = ( float (*)[999] ) &x;547 \end{cfa} 548 This addition causes \CFA to accept, be acause now, the programmer has accepted blame.508 The programmer can add a cast in the \CFA code. 509 \begin{cfa} 510 xp = @(float (*)[999])@ &x; 511 \end{cfa} 512 This addition causes \CFA to accept, because now, the programmer has accepted blame. 549 513 This addition is benign in plain C, because the cast is valid, just unnecessary there. 550 514 Moreover, the addition can even be seen as appropriate ``eye candy,'' … … 556 520 Second, the incompatibility only affects types like pointer-to-array, 557 521 which are are infrequently used in C. 558 The more common C idiom for aliasing an array is to use the pointer-to-first-element type, 559 which does not participate in the \CFA array's length checking. 560 \footnote{Notably, the desugaring of the \lstinline{array@} type, 561 avoids letting any \lstinline{-[-]} type decay, 562 in order to preserve the length information that powers runtime bound checking.} 563 Therefore, the frequency of needing to upgrade wild C code (as discussed in the first mitigation) 522 The more common C idiom for aliasing an array is to use a pointer-to-first-element type, 523 which does not participate in the \CFA array's length checking.\footnote{ 524 Notably, the desugaring of the \lstinline{array} type avoids letting any \lstinline{-[-]} type decay, 525 in order to preserve the length information that powers runtime bound-checking.} 526 Therefore, the frequency of needing to upgrade legacy C code (as discussed in the first mitigation) 564 527 is anticipated to be low. 565 528 566 529 Because the incompatibility represents a low cost to a \CFA onboarding effort 567 530 (with a plausible side benefit of linting the original code for a missing annotation), 568 I elected not to add special measuresto retain the compatibility.531 no special measures were added to retain the compatibility. 569 532 It would be possible to flag occurrences of @-[-]@ types that come from @array@ desugaring, 570 533 treating those with stricter \CFA rules, while treating others with classic C rules. … … 573 536 574 537 Having allowed that both the initial C example's check 575 \begin{itemize} [noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]538 \begin{itemize} 576 539 \item 577 540 Is @float[999]@ type-compatible with @float[n]@? 578 541 \end{itemize} 579 and the second \CFA ex mple's induced check580 \begin{itemize} [noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]542 and the second \CFA example's induced check 543 \begin{itemize} 581 544 \item 582 545 Is @char[999]@ type-compatible with @char[n]@? … … 587 550 To implement the new \CFA rules, I took the syntactic recursion a step further, obtaining, 588 551 in both cases: 589 \begin{itemize} [noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]552 \begin{itemize} 590 553 \item 591 Is @999@ TBD-compatible with @n@?554 Is @999@ compatible with @n@? 
592 555 \end{itemize} 593 This compatibility question applies to a pair of expressions, where the earlier oneswere to types.556 This compatibility question applies to a pair of expressions, where the earlier implementation were to types. 594 557 Such an expression-compatibility question is a new addition to the \CFA compiler. 595 These questions only arise in the context of dimension expressions on (C) array types.558 Note, these questions only arise in the context of dimension expressions on (C) array types. 596 559 597 560 TODO: ensure these compiler implementation matters are treated under \CFA compiler background: … … 600 563 GenPoly. 601 564 602 The relevant technical component of the \CFA compiler is, 603 within the type resolver, the type unification procedure. 565 The relevant technical component of the \CFA compiler is the type unification procedure within the type resolver. 604 566 I added rules for continuing this unification into expressions that occur within types. 605 567 It is still fundamentally doing \emph{type} unification … … 607 569 and not participating in binding any variables that stand in for expression fragments 608 570 (for there is no such sort of variable in \CFA's analysis.) 609 610 571 An unfortunate fact about the \CFA compiler's preexisting implementation is that 611 572 type unification suffers from two forms of duplication. … … 613 574 The first duplication has (many of) the unification rules stated twice. 614 575 As a result, my additions for dimension expressions are stated twice. 615 The extra statement of the rules occurs in the GenPolymodule,576 The extra statement of the rules occurs in the @GenPoly@ module, 616 577 where concrete types like @array(int, 5)@\footnote{ 617 578 Again, the presentation is simplified 618 by leaving the \lstinline{array} macro unexpanded }579 by leaving the \lstinline{array} macro unexpanded.} 619 580 are lowered into corresponding C types @struct __conc_array_1234@ (the suffix being a generated index). 620 In this case, the struct's definition gives fields that hardcode the argument values of @float@ and @5@. 621 The next time an @array(-,-)@ concrete instance is encountered, 622 is the previous @struct __conc_array_1234@ suitable for it? 623 Yes, for another occurrance of @array(int, 5)@; 581 In this case, the struct's definition contains fields that hardcode the argument values of @float@ and @5@. 582 The next time an @array(-,-)@ concrete instance is encountered, it checks if the previous @struct __conc_array_1234@ is suitable for it. 583 Yes, for another occurrence of @array(int, 5)@; 624 584 no, for either @array(rational(int), 5)@ or @array(int, 42)@. 625 585 By the last example, this phase must ``reject'' … … 630 590 In the program 631 591 \begin{cfa} 632 void f( double );633 forall( T & ) void f( T & );634 635 636 f(x);637 638 \end{cfa} 639 when resolving the function call, the first unification stage640 compares the type s @T@, of the parameter,with @array( float, n + 1 )@, of the argument.592 void @f@( double ); 593 forall( T & ) void @f@( T & ); 594 void g( int n ) { 595 array( float, n + 1 ) x; 596 f(x); // overloaded 597 } 598 \end{cfa} 599 when resolving the function call, @g@, the first unification stage 600 compares the type @T@ of the parameter with @array( float, n + 1 )@, of the argument. 641 601 TODO: finish. 642 602 … … 647 607 TODO: Deal with directionality, that I'm doing exact-match, no ``at least as long as,'' no subtyping. 648 608 Should it be an earlier scoping principle? 
Feels like it should matter in more places than here.} 649 So, a ``yes'' answer must represent a guarantee that both expressions willevaluate the650 same result, while a ``no'' can tolerate ``they might, but we're not sure ,'609 So, a ``yes'' answer must represent a guarantee that both expressions evaluate the 610 same result, while a ``no'' can tolerate ``they might, but we're not sure'', 651 611 provided that practical recourses are available 652 to let programmers express theirbetter knowledge.653 The specific rule-set that I offer withthe current release is, in fact, extremely conservative.612 to let programmers express better knowledge. 613 The new rule-set in the current release is, in fact, extremely conservative. 654 614 I chose to keep things simple, 655 and allow real future needs do drive adding additional complexity, 656 within the framework that I laid out. 615 and allow future needs to drive adding additional complexity, within the new framework. 657 616 658 617 For starters, the original motivating example's rejection … … 662 621 Rather, the analysis assumes a variable's value can be anything, 663 622 and so there can be no guarantee that its value is 999. 664 So, a variable useand a literal can never match.623 So, a variable and a literal can never match. 665 624 666 625 Two occurrences of the same literal value are obviously a fine match. 667 For two occurrences of the same varia lbe, more information is needed.626 For two occurrences of the same variable, more information is needed. 668 627 For example, this one is fine 669 628 \begin{cfa} 670 671 672 float (*xp)[n] = x;// accept673 629 void f( const int n ) { 630 float x[n]; 631 float (*xp)[n] = x; // accept 632 } 674 633 \end{cfa} 675 634 while this one is not: 676 635 \begin{cfa} 677 678 679 680 681 float (*xp)[n] = x;// reject682 636 void f() { 637 int n = 42; 638 float x[n]; 639 n = 999; 640 float (*xp)[n] = x; // reject 641 } 683 642 \end{cfa} 684 643 Furthermore, the fact that the first example sees @n@ as @const@ 685 is not actually a sufficent basis.644 is not actually sufficient. 686 645 In this example, @f@'s length expression's declaration is as @const@ as it can be, 687 646 yet its value still changes between the two invocations: 688 \begin{cfa} 689 // compile unit 1 690 void g(); 691 void f( const int & const nr ) { 692 float x[nr]; 693 g(); 694 float (*xp)[nr] = x; // reject 695 } 696 // compile unit 2 697 static int n = 42; 698 void g() { 699 n = 99; 700 } 701 void f( const int & ); 702 int main () { 703 f(n); 704 return 0; 705 } 706 \end{cfa} 707 The issue in this last case is, 708 just because you aren't able to change something doesn't mean someone else can't. 709 710 My rule set also respects a feature of the C tradition. 711 In spite of the several limitations of the C rules 647 \begin{cquote} 648 \setlength{\tabcolsep}{15pt} 649 \begin{tabular}{@{}ll@{}} 650 \begin{cfa} 651 // compile unit 1 652 void g(); 653 void f( const int & const nr ) { 654 float x[nr]; 655 g(); // change n 656 @float (*xp)[nr] = x;@ // reject 657 } 658 \end{cfa} 659 & 660 \begin{cfa} 661 // compile unit 2 662 static int n = 42; 663 void g() { 664 n = 99; 665 } 666 667 f( n ); 668 \end{cfa} 669 \end{tabular} 670 \end{cquote} 671 The issue here is that knowledge needed to make a correct decision is hidden by separate compilation. 672 Even within a translation unit, static analysis might not be able to provide all the information. 
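Aliasing gives a single-translation-unit sketch of the same problem (hypothetical code, consistent with the classification given below):
\begin{cfa}
void h( int & alias, const int & nr ) {   $\C{// both parameters may refer to the same int}$
	float x[nr];
	alias = 99;                           $\C{// possibly changes nr's referent}$
	float (*xp)[nr] = &x;                 $\C{// reject: nr is not provably stable}$
}
\end{cfa}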
673 674 My rule set also respects a traditional C feature: In spite of the several limitations of the C rules 712 675 accepting cases that produce different values, there are a few mismatches that C stops. 713 C is quite precise when working with two static values :714 \begin{cfa} 715 716 717 float (*xp1)[42] = &x;// accept718 float (*xp2)[999] = &x;// reject676 C is quite precise when working with two static values. 677 \begin{cfa} 678 enum { fortytwo = 42 }; 679 float x[fortytwo]; 680 float (*xp1)[42] = &x; // accept 681 float (*xp2)[999] = &x; // reject 719 682 \end{cfa} 720 683 My \CFA rules agree with C's on these cases. 721 684 722 Myrules classify expressions into three groups:685 In summary, the new rules classify expressions into three groups: 723 686 \begin{description} 724 687 \item[Statically Evaluable] 725 688 Expressions for which a specific value can be calculated (conservatively) 726 689 at compile-time. 727 A preexisting \CFA compiler module defines which expressions qualify,690 A preexisting \CFA compiler module defines which literals, enumerators, and expressions qualify, 728 691 and evaluates them. 729 Includes literals and enumeration values.730 692 \item[Dynamic but Stable] 731 The value of a variable declared as @const@. 732 Includes a @const@ parameter. 693 The value of a variable declared as @const@, including a @const@ parameter. 733 694 \item[Potentially Unstable] 734 695 The catch-all category. Notable examples include: 735 any function-call result (@float x[foo()];@),736 the particular function-call result that is a pointer dereference (@void f(const int * n) { float x[*n]; }@), and696 any function-call result, @float x[foo()];@, 697 the particular function-call result that is a pointer dereference, @void f(const int * n)@ @{ float x[*n]; }@, and 737 698 any use of a reference-typed variable. 738 699 \end{description} 739 740 My \CFA rules are: 700 Within these groups, my \CFA rules are: 741 701 \begin{itemize} 742 702 \item … … 744 704 Notably, this rule allows a literal to match with an enumeration value, based on the value. 745 705 \item 746 Accept a Dynamic but Stable pair, if both expressions are written out the same, e.g. refers tosame variable declaration.706 Accept a Dynamic but Stable pair, if both expressions are written out the same, \eg refers to the same variable declaration. 747 707 \item 748 708 Otherwise, reject. 749 Notably, reject all pairs from the Potentially Unstable group. 750 Notably, reject all pairs that cross groups. 709 Notably, reject all pairs from the Potentially Unstable group and all pairs that cross groups. 
751 710 \end{itemize} 752 753 711 The traditional C rules are: 754 712 \begin{itemize} … … 759 717 \end{itemize} 760 718 761 762 \newcommand{\falsealarm}{{\color{orange}\small{*}}}763 \newcommand{\allowmisuse}{{\color{red}\textbf{!}}}764 \newcommand{\cmark}{\ding{51}} % from pifont765 \newcommand{\xmark}{\ding{55}}766 719 \begin{figure} 720 \newcommand{\falsealarm}{{\color{blue}\small{*}}} 721 \newcommand{\allowmisuse}{{\color{red}\textbf{!}}} 722 \newcommand{\cmark}{\ding{51}} % from pifont 723 \newcommand{\xmark}{\ding{55}} 724 767 725 \begin{tabular}{@{}l@{\hspace{16pt}}c@{\hspace{8pt}}c@{\hspace{16pt}}c@{\hspace{8pt}}c@{\hspace{16pt}}c} 768 726 & \multicolumn{2}{c}{\underline{Values Equal}} … … 778 736 \end{tabular} 779 737 780 \ vspace{12pt}781 \noindent\textbf{Legend :}782 \begin{itemize} 738 \medskip 739 \noindent\textbf{Legend} 740 \begin{itemize}[leftmargin=*] 783 741 \item 784 742 Each row gives the treatment of a test harness of the form 785 743 \begin{cfa} 786 787 744 float x[ expr1 ]; 745 float (*xp)[ expr2 ] = &x; 788 746 \end{cfa} 789 where \lstinline{expr1} and \lstinline{expr2} are metavariables varying according to the row's Case. 747 \vspace*{-10pt} 748 where \lstinline{expr1} and \lstinline{expr2} are meta-variables varying according to the row's Case. 790 749 Each row's claim applies to other harnesses too, including, 791 750 \begin{itemize} 792 751 \item 793 calling a function with a param ter like \lstinline{x} and an argument of the \lstinline{xp} type,752 calling a function with a parameter like \lstinline{x} and an argument of the \lstinline{xp} type, 794 753 \item 795 754 assignment in place of initialization, … … 801 760 \item 802 761 Each case's claim is symmetric (swapping \lstinline{expr1} with \lstinline{expr2} has no effect), 803 even though most test harnesses are asym etric.762 even though most test harnesses are asymmetric. 804 763 \item 805 764 The table treats symbolic identity (Same/Different on rows) 806 apart from value eq ality (Equal/Unequal on columns).765 apart from value equality (Equal/Unequal on columns). 807 766 \begin{itemize} 808 767 \item … … 819 778 while every Accept under Values Unequal is an allowed misuse (\allowmisuse). 820 779 \end{itemize} 821 \caption{Case comparison for array type compatibility, given pairs of dimension expressions. 822 TODO: get Peter's LaTeX help on overall appearance, probably including column spacing/centering and bullet indentation.}780 781 \caption{Case comparison for array type compatibility, given pairs of dimension expressions.} 823 782 \label{f:DimexprRuleCompare} 824 783 \end{figure} … … 826 785 827 786 Figure~\ref{f:DimexprRuleCompare} gives a case-by-case comparison of the consequences of these rule sets. 828 It demonstrates that the \CFA false alarms occur in the same cases as C treats unsafe ly.829 It also shows that C-incompatibilities only occur in cases that C treats unsafe ly.787 It demonstrates that the \CFA false alarms occur in the same cases as C treats unsafe. 788 It also shows that C-incompatibilities only occur in cases that C treats unsafe. 830 789 831 790 … … 837 796 whose reuses are rejected by the blunt current-state rules: 838 797 \begin{cfa} 839 840 841 float (*xp)[nr] = & x;// reject: nr varying (no references)842 843 float (*yp)[nv + 1] = & y; // reject: ?+? unpredicable (no functions)844 798 void f( int & nr, const int nv ) { 799 float x[nr]; 800 float (*xp)[nr] = &x; // reject: nr varying (no references) 801 float y[nv + 1]; 802 float (*yp)[nv + 1] = &y; // reject: ?+? 
unpredictable (no functions) 803 } 845 804 \end{cfa} 846 805 Yet, both dimension expressions are reused safely. 847 (The @nr@ reference is never written, not volatile806 The @nr@ reference is never written, not volatile 848 807 and control does not leave the function between the uses. 849 The name @?+?@ resolves to a function that is quite predictable. )850 The programmer here can add the constant declarations:851 \begin{cfa} 852 853 854 855 float (*xp)[nx] = & x; // acept856 857 858 float (*yp)[ny] = & y;// accept859 808 The name @?+?@ resolves to a function that is quite predictable. 809 Here, the programmer can add the constant declarations (cast does not work): 810 \begin{cfa} 811 void f( int & nr, const int nv ) { 812 @const int nx@ = nr; 813 float x[nx]; 814 float (*xp)[nx] = & x; // accept 815 @const int ny@ = nv + 1; 816 float y[ny]; 817 float (*yp)[ny] = & y; // accept 818 } 860 819 \end{cfa} 861 820 The result is the originally intended semantics, … … 863 822 864 823 The snapshotting trick is also used by the translation, though to achieve a different outcome. 865 Rather obviously, every array must be subscriptable, even a biz zarre one:866 \begin{cfa} 867 868 824 Rather obviously, every array must be subscriptable, even a bizarre one: 825 \begin{cfa} 826 array( float, rand(10) ) x; 827 x[0]; // 10% chance of bound-check failure 869 828 \end{cfa} 870 829 Less obvious is that the mechanism of subscripting is a function call, … … 874 833 Adjusting the example to make the function's use of length more explicit: 875 834 \begin{cfa} 876 877 878 879 835 forall ( T * ) 836 void f( T * x ) { sout | sizeof(*x); } 837 float x[ rand(10) ]; 838 f( x ); 880 839 \end{cfa} 881 840 Considering that the partly translated function declaration is, loosely, 882 841 \begin{cfa} 883 void f( size_t __sizeof_T, void * x ) { sout | __sizeof_T; } 884 \end{cfa} 885 the translated call must not go like: 886 \begin{cfa} 887 float x[ rand(10) ]; 888 f( rand(10), &x ); 889 \end{cfa} 890 Rather, its actual translation is like: 891 \begin{cfa} 892 size_t __dim_x = rand(10); 893 float x[ __dim_x ]; 894 f( __dim_x, &x ); 895 \end{cfa} 896 The occurrence of this dimension hoisting during translation was present in the preexisting \CFA compiler. 897 But its cases were buggy, particularly with determining, ``Can hoisting be skipped here?'' 898 For skipping this hoisting is clearly desirable in some cases, 899 not the least of which is when the programmer has already done so manually. 900 My work includes getting these cases right, harmonized with the accept/reject criteria, and tested. 901 902 842 void f( size_t __sizeof_T, void * x ) { sout | __sizeof_T; } 843 \end{cfa} 844 the translation must call the dimension argument twice: 845 \begin{cfa} 846 float x[ rand(10) ]; 847 f( rand(10), &x ); 848 \end{cfa} 849 Rather, the translation is: 850 \begin{cfa} 851 size_t __dim_x = rand(10); 852 float x[ __dim_x ]; 853 f( __dim_x, &x ); 854 \end{cfa} 855 The occurrence of this dimension hoisting during translation was in the preexisting \CFA compiler. 856 But its cases were buggy, particularly with determining, ``Can hoisting the expression be skipped here?'', for skipping this hoisting is clearly desirable in some cases. 857 For example, when the programmer has already done so manually. \PAB{I don't know what this means.} 858 In the new implementation, these cases are correct, harmonized with the accept/reject criteria. 
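One such case, the programmer having hoisted the dimension manually, can be sketched as follows (hypothetical code, not from the listings); here the compiler's own hoisted temporary would add nothing:
\begin{cfa}
const size_t n = rand( 10 );   $\C{// programmer's own snapshot of the dimension}$
float x[n];
f( x );                        $\C{// lowered call can pass n directly; no hoisted temporary}$
\end{cfa}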
903 859 904 860 TODO: Discuss the interaction of this dimension hoisting with the challenge of extra unification for cost calculation 905 861 906 862 907 \section{Multidimensional Arrays} 908 \label{toc:mdimpl} 909 910 % TODO: introduce multidimensional array feature and approaches 911 912 When working with arrays, \eg linear algebra, array dimensions are referred to as ``rows'' and ``columns'' for a matrix, adding ``planes'' for a cube. 913 (There is little terminology for higher dimensional arrays.) 914 For example, an acrostic poem\footnote{A type of poetry where the first, last or other letters in a line spell out a particular word or phrase in a vertical column.} 915 can be treated as a grid of characters, where the rows are the text and the columns are the embedded keyword(s). 916 Within a poem, there is the concept of a \newterm{slice}, \eg a row is a slice for the poem text, a column is a slice for a keyword. 917 In general, the dimensioning and subscripting for multidimensional arrays has two syntactic forms: @m[r,c]@ or @m[r][c]@. 918 919 Commonly, an array, matrix, or cube, is visualized (especially in mathematics) as a contiguous row, rectangle, or block. 920 This conceptualization is reenforced by subscript ordering, \eg $m_{r,c}$ for a matrix and $c_{p,r,c}$ for a cube. 921 Few programming languages differ from the mathematical subscript ordering. 922 However, computer memory is flat, and hence, array forms are structured in memory as appropriate for the runtime system. 923 The closest representation to the conceptual visualization is for an array object to be contiguous, and the language structures this memory using pointer arithmetic to access the values using various subscripts. 924 This approach still has degrees of layout freedom, such as row or column major order, \ie juxtaposed rows or columns in memory, even when the subscript order remains fixed. 925 For example, programming languages like MATLAB, Fortran, Julia and R store matrices in column-major order since they are commonly used for processing column-vectors in tabular data sets but retain row-major subscripting. 926 In general, storage layout is hidden by subscripting, and only appears when passing arrays among different programming languages or accessing specific hardware. 927 928 \VRef[Figure]{f:FixedVariable} shows two C90 approaches for manipulating a contiguous matrix. 929 Note, C90 does not support VLAs. 930 The fixed-dimension approach (left) uses the type system; 931 however, it requires all dimensions except the first to be specified at compile time, \eg @m[][6]@, allowing all subscripting stride calculations to be generated with constants. 932 Hence, every matrix passed to @fp1@ must have exactly 6 columns but the row size can vary. 933 The variable-dimension approach (right) ignores (violates) the type system, \ie argument and parameters types do not match, and subscripting is performed manually using pointer arithmetic in the macro @sub@. 934 935 \begin{figure} 936 \begin{tabular}{@{}l@{\hspace{40pt}}l@{}} 937 \multicolumn{1}{c}{\textbf{Fixed Dimension}} & \multicolumn{1}{c}{\textbf{Variable Dimension}} \\ 938 \begin{cfa} 939 940 void fp1( int rows, int m[][@6@] ) { 941 ... printf( "%d ", @m[r][c]@ ); ... 942 } 943 int fm1[4][@6@], fm2[6][@6@]; // no VLA 944 // initialize matrixes 945 fp1( 4, fm1 ); // implicit 6 columns 946 fp1( 6, fm2 ); 947 \end{cfa} 948 & 949 \begin{cfa} 950 #define sub( m, r, c ) *(m + r * sizeof( m[0] ) + c) 951 void fp2( int rows, int cols, int *m ) { 952 ... 
printf( "%d ", @sub( m, r, c )@ ); ... 953 } 954 int vm1[@4@][@4@], vm2[@6@][@8@]; // no VLA 955 // initialize matrixes 956 fp2( 4, 4, vm1 ); 957 fp2( 6, 8, vm2 ); 958 \end{cfa} 959 \end{tabular} 960 \caption{C90 Fixed \vs Variable Contiguous Matrix Styles} 961 \label{f:FixedVariable} 962 \end{figure} 963 964 Many languages allow multidimensional arrays-of-arrays, \eg in Pascal or \CC. 965 \begin{cquote} 966 \begin{tabular}{@{}ll@{}} 967 \begin{pascal} 968 var m : array[0..4, 0..4] of Integer; (* matrix *) 969 type AT = array[0..4] of Integer; (* array type *) 970 type MT = array[0..4] of AT; (* array of array type *) 971 var aa : MT; (* array of array variable *) 972 m@[1][2]@ := 1; aa@[1][2]@ := 1 (* same subscripting *) 973 \end{pascal} 974 & 975 \begin{c++} 976 int m[5][5]; 977 978 typedef vector< vector<int> > MT; 979 MT vm( 5, vector<int>( 5 ) ); 980 m@[1][2]@ = 1; aa@[1][2]@ = 1; 981 \end{c++} 982 \end{tabular} 983 \end{cquote} 984 The language decides if the matrix and array-of-array are laid out the same or differently. 985 For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (triangular matrix). 986 Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though a language could differentiated between the two forms using subscript syntax, \eg @m[1,2]@ \vs @aa[1][2]@. 987 Nevertheless, controlling memory layout can make a difference in what operations are allowed and in performance (caching/NUMA effects). 988 989 C also provides non-contiguous arrays-of-arrays. 990 \begin{cfa} 991 int m[5][5]; $\C{// contiguous}$ 992 int * aa[5]; $\C{// non-contiguous}$ 993 \end{cfa} 994 both with different memory layout using the same subscripting, and both with different degrees of issues. 995 The focus of this work is on the contiguous multidimensional arrays in C. 996 The reason is that programmers are often forced to use the more complex array-of-array form when a contiguous array would be simpler, faster, and safer. 997 Nevertheless, the C array-of-array form is still important for special circumstances. 998 999 \VRef[Figure]{f:ContiguousNon-contiguous} shows the extensions made in C99 for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays.} 1000 First, VLAs are supported. 1001 Second, for contiguous arrays, C99 conjoins one or more of the parameters as a downstream dimension(s), \eg @cols@, implicitly using this parameter to compute the row stride of @m@. 1002 If the declaration of @fc@ is changed to: 1003 \begin{cfa} 1004 void fc( int rows, int cols, int m[@rows@][@cols@] ) ... 1005 \end{cfa} 1006 it is possible for C to perform bound checking across all subscripting, but it does not. 1007 While this contiguous-array capability is a step forward, it is still the programmer's responsibility to manually manage the number of dimensions and their sizes, both at the function definition and call sites. 1008 That is, the array does not automatically carry its structure and sizes for use in computing subscripts. 1009 While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row correctly with a correctly bounded loop index. 1010 Specifically, there is no requirement that the rows are the same length, like a poem with different length lines. 
1011 1012 \begin{figure} 1013 \begin{tabular}{@{}ll@{}} 1014 \multicolumn{1}{c}{\textbf{Contiguous}} & \multicolumn{1}{c}{\textbf{ Non-contiguous}} \\ 1015 \begin{cfa} 1016 void fc( int rows, @int cols@, int m[ /* rows */ ][@cols@] ) { 1017 ... printf( "%d ", @m[r][c]@ ); ... 1018 } 1019 int m@[5][5]@; 1020 for ( int r = 0; r < 5; r += 1 ) { 1021 1022 for ( int c = 0; c < 5; c += 1 ) 1023 m[r][c] = r + c; 1024 } 1025 fc( 5, 5, m ); 1026 \end{cfa} 1027 & 1028 \begin{cfa} 1029 void faa( int rows, int cols, int * m[ @/* cols */@ ] ) { 1030 ... printf( "%d ", @m[r][c]@ ); ... 1031 } 1032 int @* aa[5]@; // row pointers 1033 for ( int r = 0; r < 5; r += 1 ) { 1034 @aa[r] = malloc( 5 * sizeof(int) );@ // create rows 1035 for ( int c = 0; c < 5; c += 1 ) 1036 aa[r][c] = r + c; 1037 } 1038 faa( 5, 5, aa ); 1039 \end{cfa} 1040 \end{tabular} 1041 \caption{C99 Contiguous \vs Non-contiguous Matrix Styles} 1042 \label{f:ContiguousNon-contiguous} 1043 \end{figure} 1044 1045 1046 \subsection{Multidimensional array implementation} 863 \section{Multidimensional array implementation} 1047 864 \label{s:ArrayMdImpl} 1048 865 … … 1221 1038 S & | sized(S), $\C{// masquerading-as}$ 1222 1039 Timmed &, $\C{// immediate element, often another array}$ 1223 Tbase & $\C{// base element, e.g.float, never array}$1040 Tbase & $\C{// base element, \eg float, never array}$ 1224 1041 ) { // distribute forall to each element 1225 1042 struct arpk { … … 1274 1091 1275 1092 1276 1277 1093 \section{Array lifecycle} 1094 1095 An array is an aggregate, like a structure; 1096 both are containers wrapping subordinate objects. 1097 Any arbitrary object type, like @string@, can be an array element or structure member. 1098 A consequence is that the lifetime of the container must match with its subordinate objects: all elements and members must be initialized/uninitialized implicitly as part of the container's allocation/deallocation. 1099 Modern programming languages implicitly perform these operations via a type's constructor and destructor. 1100 Therefore, \CFA must assure that an array's subordinate objects' lifetime operations are called. 1101 1102 Preexisting \CFA mechanisms achieve this requirement, but with poor performance. 1103 Furthermore, advanced array users need an exception to the basic mechanism, which does not occur with other aggregates. 1104 Hence, arrays introduce subleties in supporting an element's lifecycle. 1105 1106 The preexisting \CFA support for contained-element lifecycle is based on recursive occurrences of the object-type (@otype@) pseudo-trait. 1107 A type is an @otype@, if it provides a default (parameterless) constructor, copy constructor, assignment operator, and destructor (like \CC). 1108 When declaring a structure with @otype@ members, the compiler implicitly generates implementations of the four @otype@ functions for the outer structure. 1109 Then the generated default constructor for the outer structure calls the default constructor for each member, and the other @otype@ functions work similarly. 1110 For a member that is a C array, these calls occur in a loop for each array element, which even works for VLAs. 1111 This logic works the same, whether the member is a concrete type (that happens to be an @otype@) or if the member is a polymorphic type asserted to be an @otype@ (which is implicit in the syntax, @forall(T)@). 
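Roughly, and as a sketch only (the compiler's internal representation differs), an @otype@ parameter can be read as a @forall@ clause carrying the four lifecycle assertions explicitly:
\begin{cfa}
forall( T )                                 $\C{// otype: lifecycle functions asserted implicitly}$
// reads, approximately, as
forall( T * | { void ?{}( T & );            $\C{// default constructor}$
                void ?{}( T &, T );         $\C{// copy constructor}$
                T ?=?( T &, T );            $\C{// assignment}$
                void ^?{}( T & ); } )       $\C{// destructor}$
\end{cfa}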
1112 The \CFA array has the simplified form (similar to one seen before): 1113 \begin{cfa} 1114 forall( T * ) // non-otype element, no lifecycle functions 1115 // forall( T ) // otype element, lifecycle functions asserted 1116 struct array5 { 1117 T __items[ 5 ]; 1118 }; 1119 \end{cfa} 1120 Being a structure with a C-array member, using the otype-form declaration for @T@ causes @array5(float)@ to implement @otype@ too. 1121 1122 But this @otype@-recursion pattern has a performance issue. 1123 For example, in a cube of @float@: 1124 \begin{cfa} 1125 array5( array5( array5( float ) ) ) 1126 \end{cfa} 1127 the first few steps of the compiler's work to find the lifecycle functions, under the @otype@-recursion pattern, are shown in \VRef[Figure]{f:OtypeRecursionBlowup}. 1128 All the work needed for the full @float@-cube would have 256 leaves. 1129 1130 %array5(T) offers 1131 %1 parameterless ctor, which asks for T to have 1132 % 1 parameterless ctor 1133 % 2 copy ctor 1134 % 3 asgt 1135 % 4 dtor 1136 %2 copy ctor, which asks for T to have 1137 % 1 parameterless ctor 1138 % 2 copy ctor 1139 % 3 asgt 1140 % 4 dtor 1141 %3 asgt, which asks for T to have 1142 % 1 parameterless ctor 1143 % 2 copy ctor 1144 % 3 asgt 1145 % 4 dtor 1146 %4 dtor, which asks for T to have 1147 % 1 parameterless ctor 1148 % 2 copy ctor 1149 % 3 asgt 1150 % 4 dtor 1151 1152 \begin{figure} 1153 \centering 1154 \setlength{\tabcolsep}{15pt} 1155 \begin{tabular}{@{}lll@{}} 1156 \begin{cfa}[deletekeywords={default}] 1157 float offers 1158 1 default ctor 1159 2 copy ctor 1160 3 asgt 1161 4 dtor 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 \end{cfa} 1187 & 1188 \begin{cfa}[deletekeywords={default}] 1189 array5(float) has 1190 1 default ctor, using float's 1191 1 default ctor 1192 2 copy ctor 1193 3 asgt 1194 4 dtor 1195 2 copy ctor, using float's 1196 1 default ctor 1197 2 copy ctor 1198 3 asgt 1199 4 dtor 1200 3 asgt, using float's 1201 1 default ctor 1202 2 copy ctor 1203 3 asgt 1204 4 dtor 1205 4 dtor, using float's 1206 1 default ctor 1207 2 copy ctor 1208 3 asgt 1209 4 dtor 1210 1211 1212 1213 1214 1215 1216 1217 1218 \end{cfa} 1219 & 1220 \begin{cfa}[deletekeywords={default}] 1221 array5(array5(float)) has 1222 1 default ctor, using array5(float)'s 1223 1 default ctor, using float's 1224 1 default ctor 1225 2 copy ctor 1226 3 asgt 1227 4 dtor 1228 2 copy ctor, using float's 1229 1 default ctor 1230 2 copy ctor 1231 3 asgt 1232 4 dtor 1233 3 asgt, using float's 1234 1 default ctor 1235 2 copy ctor 1236 3 asgt 1237 4 dtor 1238 4 dtor, using float's 1239 1 default ctor 1240 2 copy ctor 1241 3 asgt 1242 4 dtor 1243 2 copy ctor, using array5(float)'s 1244 ... 4 children, 16 leaves 1245 3 asgt, using array5(float)'s 1246 ... 4 children, 16 leaves 1247 4 dtor, using array5(float)'s 1248 ... 4 children, 16 leaves 1249 (64 leaves) 1250 \end{cfa} 1251 \end{tabular} 1252 \caption{Exponential thunk generation under the otype-recursion pattern. 1253 Each time that one type's function (\eg ctor) uses another type's function, the \CFA compiler generates a thunk, to capture the used function's dependencies, presented according to the using function's need. 
1254 So, each non-leaf line represents a generated thunk and every line represents a search request for the resolver to find a satisfying function.} 
1255 \label{f:OtypeRecursionBlowup} 
1256 \end{figure} 
1257 
1258 Hence, the @otype@-recursion pattern requests a number of helper functions, and generates a number of thunks, that grow exponentially with the number of dimensions. 
1259 Anecdotal experience with this solution found the resulting compile times annoyingly slow at three dimensions, and unusable at four. 
1260 
1261 The issue is that the otype-recursion pattern uses more assertions than needed. 
1262 Consider how @array5(float)@'s default constructor receives all four lifecycle assertions about the element type, @float@. 
1263 It only needs @float@'s default constructor; 
1264 the full set of operations is never used. 
1265 Current work by the \CFA team aims to improve this situation. 
1266 Therefore, a workaround is needed for now. 
1267 
1268 The workaround is to provide a default constructor, copy constructor, and destructor, each defined with lean, bespoke assertions: 
1269 \begin{cquote} 
1270 \begin{tabular}{@{}l@{\hspace{0.5in}}l@{}} 
1271 \begin{cfa} 
1272 // autogenerated for otype-recursion: 
1273 forall( T ) 
1274 void ?{}( array5(T) & this ) { 
1275 for (i; 5) { ( this[i] ){}; } 
1276 } 
1277 forall( T ) 
1278 void ?{}( array5(T) & this, array5(T) & src ) { 
1279 for (i; 5) { ( this[i] ){ src[i] }; } 
1280 } 
1281 forall( T ) 
1282 void ^?{}( array5(T) & this ) { 
1283 for (i; 5) { ^( this[i] ){}; } 
1284 } 
1285 \end{cfa} 
1286 & 
1287 \begin{cfa} 
1288 // lean, bespoke: 
1289 forall( T* | { void @?{}( T & )@; } ) 
1290 void ?{}( array5(T) & this ) { 
1291 for (i; 5) { ( this[i] ){}; } 
1292 } 
1293 forall( T* | { void @?{}( T &, T )@; } ) 
1294 void ?{}( array5(T) & this, array5(T) & src ) { 
1295 for (i; 5) { ( this[i] ){ src[i] }; } 
1296 } 
1297 forall( T* | { void @?{}( T & )@; } ) 
1298 void ^?{}( array5(T) & this ) { 
1299 for (i; 5) { ^( this[i] ){}; } 
1300 } 
1301 \end{cfa} 
1302 \end{tabular} 
1303 \end{cquote} 
1304 Moreover, the assignment operator is skipped, to avoid hitting a lingering growth case. 
1305 Skipping assignment is tolerable because array assignment is not a common operation. 
1306 With this solution, the critical lifecycle functions are available, with no growth in thunk creation. 
1307 
1308 Finally, the intuition that a programmer using an array always wants the elements' default constructor called \emph{automatically} is simplistic. 
1309 Arrays exist to store different values at each position. 
1310 So, array initialization needs to let the programmer provide different constructor arguments to each element. 
1311 \begin{cfa} 
1312 thread worker { int id; }; 
1313 void ?{}( worker & ) = void; // remove default constructor 
1314 void ?{}( worker &, int id ); 
1315 array( worker, 5 ) ws = @{}@; // rejected; but desire is for no initialization yet 
1316 for (i; 5) (ws[i]){ @i@ }; // explicitly initialize each thread, giving id 
1317 \end{cfa} 
1318 Note the use of the \CFA explicit constructor call, analogous to \CC's placement-@new@. 
1319 This call occurs where initialization is desired, not at the declaration of @ws@. 
1320 The attempt to initialize from nothing (equivalent to dropping @= {}@ altogether) is invalid because the @worker@ type removes the default constructor. 
1321 The @worker@ type is designed this way to work with the threading system. 
1322 A thread type forks a thread at the end of each constructor and joins with it at the start of each destructor.
1323 But a @worker@ cannot begin its forked-thread work without knowing its @id@. 
1324 Therefore, there is a conflict between the implicit actions of the builtin @thread@ type and a user's desire to defer these actions. 
1325 
1326 Another \CFA feature may, at first, seem viable for initializing the array @ws@, though on closer inspection, it is not. 
1327 C initialization, \lstinline|array(worker, 5) ws @= {};|, ignores all \CFA lifecycle management and uses C empty initialization. 
1328 This option does achieve the desired semantics on the construction side. 
1329 But on the destruction side, the desired semantics is for the implicit destructor calls to continue, to keep the join operation tied to lexical scope. 
1330 C initialization disables \emph{all} implicit lifecycle management, but the goal is to disable only the implicit construction. 
1331 
1332 To fix this problem, I enhanced the \CFA standard library to provide the missing semantics, available in either form: 
1333 \begin{cfa} 
1334 array( @uninit@(worker), 5 ) ws1; 
1335 array( worker, 5 ) ws2 = { @delay_init@ }; 
1336 \end{cfa} 
1337 Both cause the @ws@-construction-time implicit call chain to stop before reaching a @worker@ constructor, while leaving the implicit destruction calls intact. 
1338 Two forms are available, to parallel the development of this feature in \uCpp. 
1339 Originally, \uCpp offered only the @ws1@ form, using the class-template @uNoCtor@, equivalent to \CFA's @uninit@. 
1340 More recently, \uCpp was extended with the declaration macro @uArray@, with usage similar to the @ws2@ case. 
1341 Based on experience piloting @uArray@ as a replacement for @uNoCtor@, it might be possible to remove the first option. 
1342 
1343 % note to Mike, I have more fragments on some details available in push24\fragments\uNoCtor.txt 
1278 1344 
1279 1345 \section{Comparison with other arrays}