Changeset 8d764d4f for doc/theses/mike_brooks_MMath
- Timestamp:
- Mar 22, 2026, 9:31:11 PM (4 days ago)
- Branches:
- master
- Children:
- 98da9e8
- Parents:
- f97e7be
- git-author:
- Peter A. Buhr <pabuhr@…> (03/22/26 21:27:13)
- git-committer:
- Peter A. Buhr <pabuhr@…> (03/22/26 21:31:11)
- Location:
- doc/theses/mike_brooks_MMath
- Files:
-
- 2 edited
-
array.tex (modified) (32 diffs)
-
programs/hello-accordion.cfa (modified) (1 diff)
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/array.tex
rf97e7be r8d764d4f 8 8 However, a few compatibility-breaking changes to the behaviour of the C array are necessary, both as an implementation convenience and to fix C's lax treatment of arrays. 9 9 Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity of the C array. 10 Future work is to sugar the @array@ syntax to look like C arrays \see{\VRef{f:FutureWork}}. 10 11 11 12 … … 79 80 \item Provide a dimension/subscript-checked array-type in the \CFA standard library, where the array's length is statically managed and dynamically valued. 80 81 \item Provide argument/parameter passing safety for arrays and subscript safety. 81 \item Identify the interesting specificabilities available by the new @array@ type.82 \item Identify the interesting abilities available by the new @array@ type. 82 83 \item Where there is a gap concerning this feature's readiness for prime-time, identification of specific workable improvements that are likely to close the gap. 83 84 \end{enumerate} … … 135 136 Except for the lifetime-management issue of @result@, \ie explicit @free@, this program has eliminated both the syntactic and semantic problems associated with C arrays and their usage. 136 137 The result is a significant improvement in safety and usability. 137 138 138 In summary: 139 139 \begin{itemize}[leftmargin=*] … … 145 145 The value of an @N@-expression is the acquired length, derived from the usage site, \ie generic declaration or function call. 146 146 \item 147 @array( thing, N0, N1, ... )@ is a multi-dimensional type wrapping $\prod_i N_i$ adjacent occurrences of @thing@-typed objects.147 @array( T, N0, N1, ... )@ is a multi-dimensional type wrapping $\prod_i N_i$ contiguous @T@-typed objects. 148 148 \end{itemize} 149 149 … … 181 181 182 182 \begin{figure} 183 \setlength{\tabcolsep}{3pt} 183 184 \begin{tabular}{@{}ll@{}} 184 185 \begin{c++} … … 189 190 } 190 191 int main() { 191 const size_t n = 10; // must be constant192 int ret[n], x[n]; 192 const size_t n = 10; $\C[1.9in]{// must be constant}$ 193 int ret[n], x[n]; $\C{// static size}$ 193 194 for ( int i = 0; i < n; i += 1 ) x[i] = i; 194 195 @copy<int, n >( ret, x );@ 195 for ( int i = 0; i < n; i += 1 ) 196 cout << ret[i] << ' '; 196 for ( int i = 0; i < n; i += 1 ) cout << ret[i] << ' '; 197 197 cout << endl; 198 198 } … … 201 201 \begin{cfa} 202 202 int main() { 203 @forall( T, [N] )@ // nested function203 @forall( T, [N] )@ $\C{// nested function}$ 204 204 void copy( array( T, @N@ ) & ret, array( T, @N@ ) & x ) { 205 205 for ( i; N ) ret[i] = x[i]; … … 207 207 size_t n; 208 208 sin | n; 209 array( int, n ) ret, x; 210 for ( i; n ) x[i] = i; 211 @copy( ret, x );@ 212 for ( i; n ) 213 sout | ret[i] | nonl; 209 array( int, n ) ret, x; $\C{// dynamic-fixed size}$ 210 for ( i; n ) x[i] = i; $\C{// initialize}$ 211 @copy( ret, x );@ $\C{// alternative ret = x}$ 212 for ( i; n ) sout | ret[i] | nonl; $\C{// print}\CRT$ 214 213 sout | nl; 215 214 } … … 260 259 \end{cfa} 261 260 This ability to avoid casting and size arithmetic improves safety and usability over C flexible array members. 261 The example program prints the courses in each student's preferred order, all using the looked-up display names. 262 When a function operates on a @School@ structure, the type system handles its memory layout transparently. 263 In the example, function @getPref@ returns, for the student at position @is@, what is the position of their @pref@\textsuperscript{th}-favoured class. 262 264 Finally, inputs and outputs are given on the right for different sized schools. 263 The example program prints the courses in each student's preferred order, all using the looked-up display names.264 265 265 266 \begin{figure} 266 267 \begin{lrbox}{\myboxA} 267 268 \begin{tabular}{@{}l@{}} 269 \lstinput{30-38}{hello-accordion.cfa} \\ 268 270 \lstinput{50-55}{hello-accordion.cfa} \\ 269 271 \lstinput{90-98}{hello-accordion.cfa} … … 295 297 \end{figure} 296 298 297 When a function operates on a @School@ structure, the type system handles its memory layout transparently. 298 \lstinput{30-36}{hello-accordion.cfa} 299 In the example, function @getPref@ returns, for the student at position @is@, what is the position of their @pref@\textsuperscript{th}-favoured class? 300 301 302 \section{Dimension Parameter Implementation} 299 300 \section{Single Dimension Array Implementation} 303 301 304 302 The core of the preexisting \CFA compiler already has the ``heavy equipment'' needed to provide the feature set just reviewed (up to bugs in cases not yet exercised). … … 309 307 This new dimension type, encoding the array dimension within it, is the first limited \newterm{dependent type}~\cite{DependentType} added to the \CFA type-system and is used at appropriate points during type resolution. 310 308 311 Furthermore, the @array@ type itself is really ``icing on the cake.'' 312 Presenting its full version is deferred until \VRef[Section]{s:ArrayMdImpl} 313 (where added complexity needed for multiple dimensions is considered). 314 But simplifications close enough for the present discussion are: 309 % Furthermore, the @array@ type itself is really ``icing on the cake.'' 310 % Presenting its full version is deferred until \VRef[Section]{s:ArrayMdImpl} (where added complexity needed for multiple dimensions is considered). 311 The new array implementation is broken into single and multiple dimensions \see{\VRef[Section]{s:ArrayMdImpl}}. 312 Details of the @array@ macros are sprinkles among the implementation discussion. 313 The single dimension implementation begins with a simple example without @array@ macros. 315 314 \begin{cfa} 316 315 forall( [N] ) 317 316 struct array_1d_float { 318 float items[N]; 317 float items[N]; $\C[2in]{// monomorphic type}$ 319 318 }; 320 319 forall( T, [N] ) 321 320 struct array_1d_T { 322 T items[N]; 321 T items[N]; $\C{// polymorphic type}\CRT$ 323 322 }; 324 323 \end{cfa} … … 428 427 At the end of the desugaring and downstream processing, the original C idiom of ``pass both a length parameter and a pointer'' has been reconstructed. 429 428 In the programmer-written form, only the @thing@ is passed. 430 The compiler's action produces the more complex form, which if handwritten, would beerror-prone.429 The compiler's action produces the more complex form, which if handwritten, is error-prone. 431 430 432 431 At the compiler front end, the parsing changes AST schema extensions and validation rules for enabling the sugared user input. … … 484 483 array( float, @n@ ) x; 485 484 array( float, @999@ ) * xp = &x; $\C{// static rejection here}$ 486 (*xp)[@500@]; $\C{// runtime check passes}$485 (*xp)[@500@]; $\C{// runtime check would pass if program compiled}$ 487 486 \end{cfa} 488 487 The way the \CFA array is implemented, the type analysis for this case reduces to a case similar to the earlier C version. … … 541 540 extern float g(); 542 541 void f() { 543 float x[@n@] = { g() };542 float x[@n@] = { @g()@ }; // side-effect 544 543 float (*xp)[@n@] = x; // reject 545 544 } … … 594 593 \item[Potentially Unstable] 595 594 The catch-all category. Notable examples include: 596 any function-call result, @float x[foo()]@, the particular function-call result that is a pointer dereference, @void f(const int * n)@ @{ float x[*n]; }@, and any use of a reference-type dvariable.595 any function-call result, @float x[foo()]@, the particular function-call result that is a pointer dereference, @void f(const int * n)@ @{ float x[*n]; }@, and any use of a reference-type variable. 597 596 \end{description} 598 597 Within these groups, my \CFA rules are: … … 756 755 The conservatism of the new rule set can leave a programmer requiring a recourse, when needing to use a dimension expression whose stability argument is more subtle than current-state analysis. 757 756 This recourse is to declare an explicit constant for the dimension value. 758 Consider these two dimension expressions, whose uses arerejected by the blunt current-state rules:757 Consider these two dimension expressions, rejected by the blunt current-state rules: 759 758 \begin{cfa} 760 759 void f( int @&@ nr, @const@ int nv ) { 761 760 float x[@nr@]; 762 float (*xp)[@nr@] = &x; // reject: nr varying (no references)761 float (*xp)[@nr@] = &x; $\C[2.5in]{// reject: nr varying (no references)}$ 763 762 float y[@nv + 1@]; 764 float (*yp)[@nv + 1@] = &y; // reject: ?+? unpredictable (no functions)763 float (*yp)[@nv + 1@] = &y; $\C{// reject: ?+? unpredictable (no functions)}\CRT$ 765 764 } 766 765 \end{cfa} … … 773 772 @const int nx@ = nr; 774 773 float x[nx]; 775 float (*xp)[nx] = & x; // accept774 float (*xp)[nx] = & x; $\C[2.5in]{// accept}$ 776 775 @const int ny@ = nv + 1; 777 776 float y[ny]; 778 float (*yp)[ny] = & y; // accept777 float (*yp)[ny] = & y; $\C{// accept}\CRT$ 779 778 } 780 779 \end{cfa} … … 785 784 Rather obviously, every array must be subscriptable, even a bizarre one: 786 785 \begin{cfa} 787 array( float, @rand(10)@ ) x; 788 x[@0@]; // 10% chance of bound-check failure786 array( float, @rand(10)@ ) x; $\C[2.5in]{// x[0] => no elements}$ 787 x[@0@]; $\C{// 10\% chance of bound-check failure}\CRT$ 789 788 \end{cfa} 790 789 Less obvious is that the mechanism of subscripting is a function call, which must communicate length accurately. … … 855 854 The new \CFA standard-library @array@-datatype supports richer multidimensional features than C. 856 855 The new array implementation follows C's contiguous approach, \ie @float [r][c]@, with one contiguous object subscripted by coarsely-strided dimensions directly wrapping finely-strided dimensions. 857 Beyond what C's array type offers, the new array brings direct support for working with a noncontiguousarray slice, allowing a program to work with dimension subscripts given in a non-physical order.856 Beyond what C's array type offers, the new array brings direct support for working with a \emph{noncontiguous} array slice, allowing a program to work with dimension subscripts given in a non-physical order. 858 857 859 858 The following examples use the matrix declaration @array( float, 5, 7 ) m@, loaded with values incremented by $0.1$, when stepping across the length-7 finely-strided column dimension, and stepping across the length-5 coarsely-strided row dimension. … … 870 869 \lstinput[aboveskip=0pt]{140-140}{hello-md.cfa} 871 870 872 However, function @print1d @ is asserting too much knowledge about its parameter @r@ for printing either a row slice or a column slice.871 However, function @print1d_cstyle@ is asserting too much knowledge about its parameter @r@ for printing either a row slice or a column slice. 873 872 Specifically, declaring the parameter @r@ with type @array@ means that @r@ is contiguous, which is unnecessarily restrictive. 874 873 That is, @r@ need only be of a container type that offers a subscript operator (of type @ptrdiff_t@ $\rightarrow$ @float@) with managed length @N@. … … 942 941 Proceeding from @x@ gives the numeric indices as coarse then fine, while proceeding from @x[all]@ gives them fine then coarse. 943 942 These two starting expressions, which are the example's only multidimensional subexpressions 944 (those that received zero numeric indices so far), are illustrated with vertical steps where a \emph{first} numeric index would select. 945 946 The figure's presentation offers an intuition answer to: What is an atomic element of @x[all]@? 947 From there, @x[all]@ itself is simply a two-dimensional array, in the strict C sense, of these building blocks. 948 An atom (like the bottommost value, @x[all][3][2]@), is the contained value (in the square box) 949 and a lie about its size (the left diagonal above it, growing upward). 943 (those that received zero numeric indices so far), are illustrated with vertical steps where a \emph{first} numeric index selects. 944 945 The figure's presentation offers an intuitive answer to: What is an atomic element of @x[all]@? 946 \begin{enumerate}[leftmargin=*] 947 \item 948 @x[all]@ itself is simply a two-dimensional array, in the strict C sense, of these building blocks. 949 \item 950 950 An array of these atoms (like the intermediate @x[all][3]@) is just a contiguous arrangement of them, done according to their size; 951 951 call such an array a column. 952 952 A column is almost ready to be arranged into a matrix; 953 953 it is the \emph{contained value} of the next-level building block, but another lie about size is required. 954 \item 955 An atom (like the bottommost value, @x[all][3][2]@), is the contained value (in the square box) and a lie about its size (the left diagonal above it, growing upward). 956 \end{enumerate} 954 957 At first, an atom needs to be arranged as if it is bigger, but now a column needs to be arranged as if it is smaller (the left diagonal above it, shrinking upward). 955 958 These lying columns, arranged contiguously according to their size (as announced) form the matrix @x[all]@. … … 1028 1031 At runtime, the \CFA array is exactly a C array. 1029 1032 \CFA array subscripting is protected with runtime bound checks. 1030 The array dependent-typing provides information to the C optimizer, allowing it remove many of the bound checks.1033 The array dependent-typing provides information to the C optimizer, allowing it to remove many of the bound checks. 1031 1034 This section provides a demonstration of these effects. 1032 1035 1033 The experiment compares the \CFA array system with the simple-safety system most typically exemplified by Java arrays (\VRef[Section]{JavaCompare}), but also reflected in the \CC pattern where restricted vector usage models a checked array (\VRef[Section]{CppCompare}). 1034 The essential feature of this simple system is the one-to-one correspondence between array instances and the symbolic bounds on which dynamic checks are based. 1035 The experiment uses the \CC version to simplify access to generated assembly code. 1036 While ``\CC'' labels a participant, it is really the simple-safety system (of @vector@ with @.at@) whose limitations are being explained, and not limitations of \CC optimization. 1037 1038 As a control case, a simple loop (with no reused dimension sizes) is seen to get the same optimization treatment in both the \CFA and \CC versions. 1039 Firstly, when the programmer treats the array's bound correctly (making the subscript ``obviously fine''), no dynamic bound check is observed in the program's optimized assembly code. 1040 But when the bounds are adjusted, such that the subscript is possibly invalid, the bound check appears in the optimized assembly, ready to catch a mistake. 1036 The experiment compares the \CFA array system with the simple-safety system most typically exemplified by Java arrays (\VRef[Section]{JavaCompare}), and reflected in \CC vector using member @.at@ for subscripting (\VRef[Section]{CppCompare}). 1037 The essential feature is the one-to-one correspondence between array instances and the symbolic bounds on which dynamic checks are based. 1038 Hence, it is the implicit \CFA checked subscripting and explicit \CC @vector@ @.at@ mechanisms being explained and not limitations of \CC optimization. 1039 1040 As a control case, a simple loop receives the same optimization treatment in both the \CFA and \CC versions. 1041 When the surrounding code makes it clear subscripting is always in bound, no dynamic bound check is observed in the program's optimized assembly code. 1042 But when the environment is adjusted, such that the subscript is possibly invalid, the bound check appears in the optimized assembly, ready to catch a mistake. 1041 1043 1042 1044 \VRef[Figure]{f:ovhd-ctl} gives a control-case example of summing values in an array, where each column shows the program in languages C (a,~d,~g), \CFA (b,~e,~h), and \CC (c,~f,~i). … … 1099 1101 1100 1102 When dimension sizes get reused, \CFA has an advantage over \CC-vector at getting simply written programs well optimized. 1101 The example case of \VRef[Figures]{f:ovhd-treat-src} and \VRef[]{f:ovhd-treat-asm} is a simple matrix multiplication over a row-major encoding.1103 \VRef[Figures]{f:ovhd-treat-src} shows a simple matrix multiplication over a row-major encoding. 1102 1104 Simple means coding directly to the intuition of the mathematical definition without trying to optimize for memory layout. 1103 1105 In the assembly code of \VRef[Figure]{f:ovhd-treat-asm}, the looping pattern of \VRef[Figure]{f:ovhd-ctl} (d, e, f), ``Skip ahead on zero; loop back for next,'' occurs with three nesting levels. … … 1145 1147 Or more technically, guaranteeing @N@ as the basis for the imposed bound check \emph{of every row.} 1146 1148 As well, the \CC type does not impose the mathematically familiar constraint of $(M \times P) (P \times N) \rightarrow (M \times N)$. 1147 Even assuming away the first issue, \eg that in @lhs@ ,all minor/cell counts agree, the data format allows the @rhs@ major/row count to disagree.1149 Even assuming away the first issue, \eg that in @lhs@ all minor/cell counts agree, the data format allows the @rhs@ major/row count to disagree. 1148 1150 Or, the possibility that the row count @res.size()@ disagrees with the row count @lhs.size()@ illustrates this bound-mistake type in isolation. 1149 1151 The \CFA solution guards against this possibility by keeping length information separate from the array data, and therefore eligible for sharing. … … 1380 1382 \begin{cfa} 1381 1383 thread worker { int id; }; 1382 void ?{}( worker & ) = void; // remove default constructor1384 void ?{}( worker & ) = void; $\C[2.5in]{// remove default constructor}$ 1383 1385 void ?{}( worker &, int id ); 1384 array( worker, 5 ) ws = @{}@; // rejected; but desire is for no initialization yet1385 for ( i; 5 ) (ws[i]){ @i@ }; // explicitly initialize each thread, giving id1386 array( worker, 5 ) ws = @{}@; $\C{// rejected; but desire is for no initialization yet}$ 1387 for ( i; 5 ) (ws[i]){ @i@ }; $\C{// explicit (placement) initialization for each thread, giving id}\CRT$ 1386 1388 \end{cfa} 1387 1389 Note the use of the \CFA explicit constructor call, analogous to \CC's placement-@new@. … … 1393 1395 Therefore, there is a conflict between the implicit actions of the builtin @thread@ type and a user's desire to defer these actions. 1394 1396 1395 Another \CFA feature for providing C backwards compatibility, at first seem viable for initializing the array @ws@, though on closer inspection is not.1397 Another \CFA feature for providing C backwards compatibility, at first seems viable for initializing the array @ws@, though on closer inspection is not. 1396 1398 C initialization without a constructor is specified with \lstinline|@=|, \eg \lstinline|array(worker, 5) ws @= {}| ignores all \CFA lifecycle management and uses C empty initialization. 1397 1399 This option does achieve the desired semantics on the construction side. … … 1424 1426 let fa: [f64; 5] = [1.2, fval, 5.6, 7.3, 9.1]; $\C{// immutable, individual initialization}$ 1425 1427 \end{rust} 1426 Initialization is required even if thearray is subsequently initialized.1428 Initialization is required even if an array is subsequently initialized. 1427 1429 Rust arrays are not VLAs, but the compile-time length can be queried using member @len()@. 1428 1430 Arrays can be assigned and passed to parameters by value or reference, if and only if, the type and dimension match, meaning a different function is needed for each array size. … … 1548 1550 1549 1551 \begin{figure} 1550 \ setlength{\tabcolsep}{15pt}1551 \begin{ tabular}{ll@{}}1552 \centering 1553 \begin{lrbox}{\myboxA} 1552 1554 \begin{java} 1553 1555 int m[][] = { // triangular matrix … … 1579 1581 } 1580 1582 \end{java} 1581 & 1583 \end{lrbox} 1584 1585 \begin{lrbox}{\myboxB} 1582 1586 \begin{cfa} 1583 1587 int * m[5] = { // triangular matrix … … 1609 1613 } 1610 1614 \end{cfa} 1611 \end{tabular} 1612 \caption{Java (left) \vs C (right) Triangular Matrix} 1615 \end{lrbox} 1616 1617 \subfloat[Java]{\label{f:JavaTriangularMatrix}\usebox\myboxA} 1618 \hspace{6pt} 1619 \vrule 1620 \hspace{6pt} 1621 \subfloat[C]{\label{f:CTriangularMatrix}\usebox\myboxB} 1622 \hspace{6pt} 1623 1624 \caption{Triangular Matrix} 1613 1625 \label{f:JavaVsCTriangularMatrix} 1614 1626 \end{figure} … … 1639 1651 But none makes an allocation with a dynamically given fixed size less awkward than the vector arrangement just described. 1640 1652 1641 \CC~26 adds @std::inplace_vector@ , which provides an interesting vector--array hybrid,\footnote{1642 Like a vector, it lets a user construct elements in a loop, rather than imposing uniform construction.1643 Yet it preserves \lstinline{std::array}'s ability to be entirely stack-allocated, by avoiding an auxiliary elements' allocation.1644 } but does not changethe fundamental limit of \lstinline{std::array}, that the length, being a template parameter, must be a static value.1653 \CC~26 adds @std::inplace_vector@ providing an interesting vector--array hybrid. 1654 Like a vector, it lets a user construct elements in a loop, rather than imposing uniform construction. 1655 Yet it preserves \lstinline{std::array}'s ability to be entirely stack-allocated, by avoiding an auxiliary elements' allocation. 1656 However, it preserves the fundamental limit of \lstinline{std::array}, that the length, being a template parameter, must be a static value. 1645 1657 1646 1658 \CC~20's @std::span@ is a view that unifies true arrays, vectors, static sizes and dynamic sizes, under a common API that adds bounds checking. … … 1772 1784 1773 1785 \section{Future Work} 1786 \label{f:FutureWork} 1774 1787 1775 1788 % \subsection{Array Syntax} -
doc/theses/mike_brooks_MMath/programs/hello-accordion.cfa
rf97e7be r8d764d4f 29 29 30 30 forall( [C], [S] ) 31 int getPref( @School( C, S ) & school@, int is, int pref ) { 31 int getPref( @School( C, S ) & school@, 32 int is, int pref ) { 32 33 for ( ic; C ) { 33 if ( pref == @school.preferences@[ic][is] ) return ic; $\C{// offset calculation implicit}$ 34 if ( pref == @school.preferences@[ic][is] ) 35 return ic; // offset calculation implicit 34 36 } 35 37 assert( false ); // must find a match 36 38 } 37 38 39 39 40 40
Note:
See TracChangeset
for help on using the changeset viewer.