- Timestamp:
- Oct 31, 2024, 3:20:05 PM (12 months ago)
- Branches:
- master
- Children:
- ad9f593
- Parents:
- 292d6cf6
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/array.tex
r292d6cf6 rb7921d8 317 317 \label{f:checkHarness} 318 318 \end{figure} 319 320 321 \section{Dimension Parameter Implementation} 322 323 The core of the preexisting \CFA compiler already had the ``heavy equipment'' needed 324 to provide the feature set just reviewed (up to bugs in cases not yet exercised). 325 To apply this equipment in tracking array lengths, I encoded a dimension (array's length) as a type. 326 The type in question does not describe any data that the program actually uses at runtime. 327 It simply carries information through intermediate stages of \CFA-to-C lowering. 328 And this type takes a form such that, once \emph{it} gets lowered, the result does the right thing. 329 330 Furthermore, the @array@ type itself is really ``icing on the cake.'' 331 Presenting its full version is deferred until \VRef[Section]{s:ArrayMdImpl} 332 (where added complexity needed for multiple dimensions is considered). 333 But simplifications close enough for the present discussion are: 334 \begin{cfa} 335 forall( [N] ) 336 struct array_1d_float { 337 float items[N]; 338 }; 339 forall( T, [N] ) 340 struct array_1d { 341 T items[N]; 342 }; 343 \end{cfa} 344 This structure pattern, plus a subscript operator, is all that @array@ provides. 345 346 My main work is letting a programmer define 347 such a structre (one whose type is parameterized by @[N]@) 348 and functions that operate on it (these being similarly parameterized). 349 350 The repurposed heavy equipment is 351 \begin{itemize} 352 \item 353 The resolver, providing values for a used declaration's type-system variables, 354 gathered from type information in scope at the usage site. 355 356 \item 357 The box pass, encoding information about type parameters 358 into ``extra'' regular parameters/arguments on declarations and calls. 359 Notably, it conveys the size of a type @foo@ as a @__sizeof_foo@ parameter, 360 and rewrites the @sizeof(foo)@ expression as @__sizeof_foo@, i.e. a use of the parameter. 361 \end{itemize} 362 363 The rules for resolution had to be restricted slightly, in order to achieve important refusal cases. 364 This work is detailed in \VRef[Section]{s:ArrayTypingC}. 365 However, the resolution--boxing scheme, in its preexisting state, was already equipped to work on (desugared) dimension parameters. 366 The discussion following explains the desugaring and how correct lowered code results. 367 368 An even simpler structure, and a toy function on it, demonstrate what's needed for the encoding. 369 \begin{cfa} 370 forall( [@N@] ) { // [1] 371 struct thing {}; 372 void f( thing(@N@) ) { sout | @N@; } // [2], [3] 373 } 374 int main() { 375 thing( @10@ ) x; f(x); // prints 10, [4] 376 thing( 100 ) y; f(y); // prints 100 377 return 0; 378 } 379 \end{cfa} 380 This example has: 381 \begin{enumerate} 382 \item 383 The symbol @N@ being declared as a type variable (a variable of the type system). 384 \item 385 The symbol @N@ being used to parameterize a type. 386 \item 387 The symbol @N@ being used as an expression (value). 388 \item 389 A value like 10 being used as an argument to the parameter @N@. 390 \end{enumerate} 391 The chosen solution being to encode the value @N@ \emph{as a type}, items 1 and 2 are immediately available for free. 392 Item 3 needs a way to recover the encoded value from a (valid) type (and to reject invalid types occurring here). 393 Item 4 needs a way to produce a type that encodes the given value. 394 395 Because the box pass handles a type's size as its main datum, the encoding is chosen to use it. 396 The production and recovery are then straightforward. 397 \begin{itemize} 398 \item 399 The value $n$ is encoded as a type whose size is $n$. 400 \item 401 Given a dimension expression $e$, produce type @char[@$e$@]@ to represent it. 402 If $e$ evaluates to $n$ then the endoded type has size $n$. 403 \item 404 Given a type $T$ (produced by these rules), recover the value that it represents with the expression @sizeof(@$T$@)@. 405 If $T$ has size $n$ then the recovery expression evaluates to $n$. 406 \end{itemize} 407 408 This desugaring is applied in a translation step before the resolver. 409 The ``validate'' pass hosts it, along with several other canonicalizing and desugaring transformations (the pass's name notwithstanding). 410 The running example is lowered to: 411 \begin{cfa} 412 forall( @N*@ ) { // [1] 413 struct thing {}; 414 void f( thing(@N@) ) { sout | @sizeof(N)@; } // [2], [3] 415 } 416 int main() { 417 thing( char[@10@] ) x; f(x); // prints 10, [4] 418 thing( char[100] ) y; f(y); // prints 100 419 return 0; 420 } 421 \end{cfa} 422 Observe: 423 \begin{enumerate} 424 \item 425 @N@ is now declared to be a type. 426 It is declared to be \emph{sized} (by the @*@), meaning that the box pass shall do its @sizeof(N)@--@__sizeof_N@ extra parameter and expression translation. 427 \item 428 @thing(N)@ is a type; the argument to the generic @thing@ is a type (type variable). 429 \item 430 The @sout...@ expression (being an application of the @?|?@ operator) has a second argument that is an ordinary expression. 431 \item 432 The type of variable @x@ is another @thing(-)@ type; the argument to the generic @thing@ is a type (array type). 433 \end{enumerate} 434 435 From this point, preexisting \CFA compilation takes over lowering it the rest of the way to C. 436 Inspecting the result shows what the above translation achieves. 437 A form that shows only the relevant changes of the box pass (as informed by the resolver), leaving the rest unadulterated, is: 438 \begin{cfa} 439 // [1] 440 void f( size_t __sizeof_N, @void *@ ) { sout | @__sizeof_N@; } // [2], [3] 441 int main() { 442 struct __conc_thing_10 {} x; f(@10@, &x); // prints 10, [4] 443 struct __conc_thing_100 {} y; f(@100@, &y); // prints 100 444 return 0; 445 } 446 \end{cfa} 447 Observe: 448 \begin{enumerate} 449 \item 450 The type parameter @N@ is gone. 451 \item 452 The type @thing(N)@ is (replaced by @void *@, but thereby effectively) gone. 453 \item 454 The @sout...@ expression (being an application of the @?|?@ operator) has a second argument that is a regular variable (parameter) usage. 455 \item 456 Information about the particular @thing@ instantiation (value 10) has moved, from the type, to a regular function-call argument. 457 \end{enumerate} 458 At the end of the desugaring and downstream processing, the original C idiom of ``pass both a pointer and a length parameter'' has been reconstructed. 459 In the programmer-written form, only the thing is passed. 460 The compiler's action produces the more complex form, which if handwritten, would be error-prone. 461 462 Back at the very front end, the parsing changes, AST schema extensions, and validation rules for enabling the sugared user input are: 463 \begin{itemize} 464 \item 465 Recognize the form @[N]@ as a type-variable declaration within a @forall@. 466 \item 467 Have the new brand of type-variable, \emph{Dimension}, in the AST form of a type-variable, to represent one parsed from @[-]@. 468 \item 469 Allow a type variable to occur in expression position. Validate (after parsing) that only dimension-branded type variables are used here. 470 \item 471 Allow an expression to occur in type-argument position. Brand the resulting type argument as a dimension. 472 \item 473 Validate (after parsing), on a generic-type usage, \eg the type part of the declaration 474 \begin{cfa} 475 @array_1d( foo, bar ) x;@ 476 \end{cfa} 477 that the brands on the generic arguments match the brands of the declared type variables. 478 Here, that @foo@ is a type and @bar@ is a dimension. 479 \end{itemize} 319 480 320 481 … … 743 904 TODO: Discuss the interaction of this dimension hoisting with the challenge of extra unification for cost calculation 744 905 906 745 907 \section{Multidimensional Arrays} 746 908 \label{toc:mdimpl} … … 883 1045 884 1046 \subsection{Multidimensional array implementation} 1047 \label{s:ArrayMdImpl} 885 1048 886 1049 A multidimensional array implementation has three relevant levels of abstraction, from highest to lowest, where the array occupies \emph{contiguous memory}.
Note:
See TracChangeset
for help on using the changeset viewer.