Changeset 8d764d4f


Timestamp:
Mar 22, 2026, 9:31:11 PM
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
98da9e8
Parents:
f97e7be
git-author:
Peter A. Buhr <pabuhr@…> (03/22/26 21:27:13)
git-committer:
Peter A. Buhr <pabuhr@…> (03/22/26 21:31:11)
Message:

use subfloat for figure programs, inline footnote

Location:
doc/theses/mike_brooks_MMath
Files:
2 edited

  • doc/theses/mike_brooks_MMath/array.tex

    rf97e7be r8d764d4f  
    88However, a few compatibility-breaking changes to the behaviour of the C array are necessary, both as an implementation convenience and to fix C's lax treatment of arrays.
    99Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity of the C array.
     10Future work is to sugar the @array@ syntax to look like C arrays \see{\VRef{f:FutureWork}}.
    1011
    1112
     
    7980\item Provide a dimension/subscript-checked array-type in the \CFA standard library, where the array's length is statically managed and dynamically valued.
    8081\item Provide argument/parameter passing safety for arrays and subscript safety.
    81 \item Identify the interesting specific abilities available by the new @array@ type.
     82\item Identify the interesting abilities made available by the new @array@ type.
    8283\item Where there is a gap concerning this feature's readiness for prime time, identify specific workable improvements likely to close the gap.
    8384\end{enumerate}
     
    135136Except for the lifetime-management issue of @result@, \ie explicit @free@, this program has eliminated both the syntactic and semantic problems associated with C arrays and their usage.
    136137The result is a significant improvement in safety and usability.
    137 
    138138In summary:
    139139\begin{itemize}[leftmargin=*]
     
    145145The value of an @N@-expression is the acquired length, derived from the usage site, \ie generic declaration or function call.
    146146\item
    147 @array( thing, N0, N1, ... )@ is a multi-dimensional type wrapping $\prod_i N_i$ adjacent occurrences of @thing@-typed objects.
     147@array( T, N0, N1, ... )@ is a multi-dimensional type wrapping $\prod_i N_i$ contiguous @T@-typed objects.
    148148\end{itemize}
    149149
     
    181181
    182182\begin{figure}
     183\setlength{\tabcolsep}{3pt}
    183184\begin{tabular}{@{}ll@{}}
    184185\begin{c++}
     
    189190}
    190191int main() {
    191         const size_t  n = 10;   // must be constant
    192         int ret[n], x[n];
     192        const size_t  n = 10;   $\C[1.9in]{// must be constant}$
     193        int ret[n], x[n];               $\C{// static size}$
    193194        for ( int i = 0; i < n; i += 1 ) x[i] = i;
    194195        @copy<int, n >( ret, x );@
    195         for ( int i = 0; i < n; i += 1 )
    196                 cout << ret[i] << ' ';
     196        for ( int i = 0; i < n; i += 1 ) cout << ret[i] << ' ';
    197197        cout << endl;
    198198}
     
    201201\begin{cfa}
    202202int main() {
    203         @forall( T, [N] )@              // nested function
     203        @forall( T, [N] )@              $\C{// nested function}$
    204204        void copy( array( T, @N@ ) & ret, array( T, @N@ ) & x ) {
    205205                for ( i; N ) ret[i] = x[i];
     
    207207        size_t  n;
    208208        sin | n;
    209         array( int, n ) ret, x;
    210         for ( i; n ) x[i] = i;
    211         @copy( ret,  x );@
    212         for ( i; n )
    213                 sout | ret[i] | nonl;
     209        array( int, n ) ret, x; $\C{// dynamic-fixed size}$
     210        for ( i; n ) x[i] = i;  $\C{// initialize}$
     211        @copy( ret,  x );@              $\C{// alternative ret = x}$
     212        for ( i; n ) sout | ret[i] | nonl; $\C{// print}\CRT$
    214213        sout | nl;
    215214}
     
    260259\end{cfa}
    261260This ability to avoid casting and size arithmetic improves safety and usability over C flexible array members.
     261The example program prints the courses in each student's preferred order, all using the looked-up display names.
     262When a function operates on a @School@ structure, the type system handles its memory layout transparently.
     263In the example, function @getPref@ returns the position of the @pref@\textsuperscript{th}-favoured class for the student at position @is@.
    262264Finally, inputs and outputs are given on the right for different sized schools.
    263 The example program prints the courses in each student's preferred order, all using the looked-up display names.
    264265
    265266\begin{figure}
    266267\begin{lrbox}{\myboxA}
    267268\begin{tabular}{@{}l@{}}
     269\lstinput{30-38}{hello-accordion.cfa} \\
    268270\lstinput{50-55}{hello-accordion.cfa} \\
    269271\lstinput{90-98}{hello-accordion.cfa}
     
    295297\end{figure}
    296298
    297 When a function operates on a @School@ structure, the type system handles its memory layout transparently.
    298 \lstinput{30-36}{hello-accordion.cfa}
    299 In the example, function @getPref@ returns, for the student at position @is@, what is the position of their @pref@\textsuperscript{th}-favoured class?
    300 
    301 
    302 \section{Dimension Parameter Implementation}
     299
     300\section{Single Dimension Array Implementation}
    303301
    304302The core of the preexisting \CFA compiler already has the ``heavy equipment'' needed to provide the feature set just reviewed (up to bugs in cases not yet exercised).
     
    309307This new dimension type, encoding the array dimension within it, is the first limited \newterm{dependent type}~\cite{DependentType} added to the \CFA type-system and is used at appropriate points during type resolution.
    310308
    311 Furthermore, the @array@ type itself is really ``icing on the cake.''
    312 Presenting its full version is deferred until \VRef[Section]{s:ArrayMdImpl}
    313 (where added complexity needed for multiple dimensions is considered).
    314 But simplifications close enough for the present discussion are:
     309% Furthermore, the @array@ type itself is really ``icing on the cake.''
     310% Presenting its full version is deferred until \VRef[Section]{s:ArrayMdImpl} (where added complexity needed for multiple dimensions is considered).
     311The new array implementation is split into single and multiple dimensions \see{\VRef[Section]{s:ArrayMdImpl}}.
     312Details of the @array@ macros are sprinkled throughout the implementation discussion.
     313The single-dimension implementation begins with a simple example without @array@ macros.
    315314\begin{cfa}
    316315forall( [N] )
    317316struct array_1d_float {
    318         float items[N];
     317        float items[N];                 $\C[2in]{// monomorphic type}$
    319318};
    320319forall( T, [N] )
    321320struct array_1d_T {
    322         T items[N];
     321        T items[N];                             $\C{// polymorphic type}\CRT$
    323322};
    324323\end{cfa}
     
    428427At the end of the desugaring and downstream processing, the original C idiom of ``pass both a length parameter and a pointer'' has been reconstructed.
    429428In the programmer-written form, only the @thing@ is passed.
    430 The compiler's action produces the more complex form, which if handwritten, would be error-prone.
     429The compiler's action produces the more complex form, which, if handwritten, is error-prone.
    431430
    432431At the compiler front end, the parsing changes include AST-schema extensions and validation rules enabling the sugared user input.
     
    484483array( float, @n@ ) x;
    485484array( float, @999@ ) * xp = &x; $\C{// static rejection here}$
    486 (*xp)[@500@]; $\C{// runtime check passes}$
     485(*xp)[@500@]; $\C{// runtime check would pass if program compiled}$
    487486\end{cfa}
    488487The way the \CFA array is implemented, the type analysis for this case reduces to a case similar to the earlier C version.
     
    541540extern float g();
    542541void f() {
    543         float x[@n@] = { g() };
     542        float x[@n@] = { @g()@ };                // side-effect
    544543        float (*xp)[@n@] = x;                   // reject
    545544}
     
    594593\item[Potentially Unstable]
    595594        The catch-all category.  Notable examples include:
    596         any function-call result, @float x[foo()]@, the particular function-call result that is a pointer dereference, @void f(const int * n)@ @{ float x[*n]; }@, and any use of a reference-typed variable.
     595        any function-call result, @float x[foo()]@, the particular function-call result that is a pointer dereference, @void f(const int * n)@ @{ float x[*n]; }@, and any use of a reference-type variable.
    597596\end{description}
    598597Within these groups, my \CFA rules are:
     
    756755The conservatism of the new rule set can leave a programmer needing a recourse when using a dimension expression whose stability argument is more subtle than the current-state analysis.
    757756This recourse is to declare an explicit constant for the dimension value.
    758 Consider these two dimension expressions, whose uses are rejected by the blunt current-state rules:
     757Consider these two dimension expressions, rejected by the blunt current-state rules:
    759758\begin{cfa}
    760759void f( int @&@ nr, @const@ int nv ) {
    761760        float x[@nr@];
    762         float (*xp)[@nr@] = &x;                 // reject: nr varying (no references)
     761        float (*xp)[@nr@] = &x;                 $\C[2.5in]{// reject: nr varying (no references)}$
    763762        float y[@nv + 1@];
    764         float (*yp)[@nv + 1@] = &y;             // reject: ?+? unpredictable (no functions)
     763        float (*yp)[@nv + 1@] = &y;             $\C{// reject: ?+? unpredictable (no functions)}\CRT$
    765764}
    766765\end{cfa}
     
    773772        @const int nx@ = nr;
    774773        float x[nx];
    775         float (*xp)[nx] = & x;                  // accept
     774        float (*xp)[nx] = & x;                  $\C[2.5in]{// accept}$
    776775        @const int ny@ = nv + 1;
    777776        float y[ny];
    778         float (*yp)[ny] = & y;                  // accept
     777        float (*yp)[ny] = & y;                  $\C{// accept}\CRT$
    779778}
    780779\end{cfa}
     
    785784Rather obviously, every array must be subscriptable, even a bizarre one:
    786785\begin{cfa}
    787 array( float, @rand(10)@ ) x;
    788 x[@0@];  // 10% chance of bound-check failure
     786array( float, @rand(10)@ ) x;           $\C[2.5in]{// random length 0--9, possibly no elements}$
     787x[@0@];                                                         $\C{// 10\% chance of bound-check failure}\CRT$
    789788\end{cfa}
    790789Less obvious is that the mechanism of subscripting is a function call, which must communicate length accurately.
     
    855854The new \CFA standard-library @array@-datatype supports richer multidimensional features than C.
    856855The new array implementation follows C's contiguous approach, \ie @float [r][c]@, with one contiguous object subscripted by coarsely-strided dimensions directly wrapping finely-strided dimensions.
    857 Beyond what C's array type offers, the new array brings direct support for working with a noncontiguous array slice, allowing a program to work with dimension subscripts given in a non-physical order.
     856Beyond what C's array type offers, the new array brings direct support for working with a \emph{noncontiguous} array slice, allowing a program to work with dimension subscripts given in a non-physical order.
    858857
    859858The following examples use the matrix declaration @array( float, 5, 7 ) m@, loaded with values incremented by $0.1$, when stepping across the length-7 finely-strided column dimension, and stepping across the length-5 coarsely-strided row dimension.
     
    870869\lstinput[aboveskip=0pt]{140-140}{hello-md.cfa}
    871870
    872 However, function @print1d@ is asserting too much knowledge about its parameter @r@ for printing either a row slice or a column slice.
     871However, function @print1d_cstyle@ is asserting too much knowledge about its parameter @r@ for printing either a row slice or a column slice.
    873872Specifically, declaring the parameter @r@ with type @array@ means that @r@ is contiguous, which is unnecessarily restrictive.
    874873That is, @r@ need only be of a container type that offers a subscript operator (of type @ptrdiff_t@ $\rightarrow$ @float@) with managed length @N@.
     
    942941Proceeding from @x@ gives the numeric indices as coarse then fine, while proceeding from @x[all]@ gives them fine then coarse.
    943942These two starting expressions, which are the example's only multidimensional subexpressions
    944 (those that received zero numeric indices so far), are illustrated with vertical steps where a \emph{first} numeric index would select.
    945 
    946 The figure's presentation offers an intuition answer to: What is an atomic element of @x[all]@?
    947 From there, @x[all]@ itself is simply a two-dimensional array, in the strict C sense, of these building blocks.
    948 An atom (like the bottommost value, @x[all][3][2]@), is the contained value (in the square box)
    949 and a lie about its size (the left diagonal above it, growing upward).
     943(those that received zero numeric indices so far), are illustrated with vertical steps where a \emph{first} numeric index selects.
     944
     945The figure's presentation offers an intuitive answer to: What is an atomic element of @x[all]@?
     946\begin{enumerate}[leftmargin=*]
     947\item
     948@x[all]@ itself is simply a two-dimensional array, in the strict C sense, of these building blocks.
     949\item
    950950An array of these atoms (like the intermediate @x[all][3]@) is just a contiguous arrangement of them, done according to their size;
    951951call such an array a column.
    952952A column is almost ready to be arranged into a matrix;
    953953it is the \emph{contained value} of the next-level building block, but another lie about size is required.
     954\item
     955An atom (like the bottommost value, @x[all][3][2]@) is the contained value (in the square box) and a lie about its size (the left diagonal above it, growing upward).
     956\end{enumerate}
    954957At first, an atom needs to be arranged as if it is bigger, but now a column needs to be arranged as if it is smaller (the left diagonal above it, shrinking upward).
    955958These lying columns, arranged contiguously according to their size (as announced) form the matrix @x[all]@.
     
    10281031At runtime, the \CFA array is exactly a C array.
    10291032\CFA array subscripting is protected with runtime bound checks.
    1030 The array dependent-typing provides information to the C optimizer, allowing it remove many of the bound checks.
     1033The array dependent-typing provides information to the C optimizer, allowing it to remove many of the bound checks.
    10311034This section provides a demonstration of these effects.
    10321035
    1033 The experiment compares the \CFA array system with the simple-safety system most typically exemplified by Java arrays (\VRef[Section]{JavaCompare}), but also reflected in the \CC pattern where restricted vector usage models a checked array (\VRef[Section]{CppCompare}).
    1034 The essential feature of this simple system is the one-to-one correspondence between array instances and the symbolic bounds on which dynamic checks are based.
    1035 The experiment uses the \CC version to simplify access to generated assembly code.
    1036 While ``\CC'' labels a participant, it is really the simple-safety system (of @vector@ with @.at@) whose limitations are being explained, and not limitations of \CC optimization.
    1037 
    1038 As a control case, a simple loop (with no reused dimension sizes) is seen to get the same optimization treatment in both the \CFA and \CC versions.
    1039 Firstly, when the programmer treats the array's bound correctly (making the subscript ``obviously fine''), no dynamic bound check is observed in the program's optimized assembly code.
    1040 But when the bounds are adjusted, such that the subscript is possibly invalid, the bound check appears in the optimized assembly, ready to catch a mistake.
     1036The experiment compares the \CFA array system with the simple-safety system most typically exemplified by Java arrays (\VRef[Section]{JavaCompare}), and reflected in the \CC @vector@ using member @.at@ for subscripting (\VRef[Section]{CppCompare}).
     1037The essential feature is the one-to-one correspondence between array instances and the symbolic bounds on which dynamic checks are based.
     1038Hence, it is the implicit \CFA checked subscripting and explicit \CC @vector@ @.at@ mechanisms being evaluated, not limitations of \CC optimization.
     1039
     1040As a control case, a simple loop receives the same optimization treatment in both the \CFA and \CC versions.
     1041When the surrounding code makes it clear subscripting is always within bounds, no dynamic bound check is observed in the program's optimized assembly code.
     1042But when the environment is adjusted, such that the subscript is possibly invalid, the bound check appears in the optimized assembly, ready to catch a mistake.
    10411043
    10421044\VRef[Figure]{f:ovhd-ctl} gives a control-case example of summing values in an array, where each column shows the program in languages C (a,~d,~g), \CFA (b,~e,~h), and \CC (c,~f,~i).
     
    10991101
    11001102When dimension sizes get reused, \CFA has an advantage over \CC-vector at getting simply written programs well optimized.
    1101 The example case of \VRef[Figures]{f:ovhd-treat-src} and \VRef[]{f:ovhd-treat-asm} is a simple matrix multiplication over a row-major encoding.
     1103\VRef[Figure]{f:ovhd-treat-src} shows a simple matrix multiplication over a row-major encoding.
    11021104Simple means coding directly to the intuition of the mathematical definition without trying to optimize for memory layout.
    11031105In the assembly code of \VRef[Figure]{f:ovhd-treat-asm}, the looping pattern of \VRef[Figure]{f:ovhd-ctl} (d, e, f), ``Skip ahead on zero; loop back for next,'' occurs with three nesting levels.
     
    11451147Or more technically, guaranteeing @N@ as the basis for the imposed bound check \emph{of every row.}
    11461148As well, the \CC type does not impose the mathematically familiar constraint of $(M \times P) (P \times N) \rightarrow (M \times N)$.
    1147 Even assuming away the first issue, \eg that in @lhs@, all minor/cell counts agree, the data format allows the @rhs@ major/row count to disagree.
     1149Even assuming away the first issue, \eg that in @lhs@ all minor/cell counts agree, the data format allows the @rhs@ major/row count to disagree.
    11481150Or, the possibility that the row count @res.size()@ disagrees with the row count @lhs.size()@ illustrates this bound-mistake type in isolation.
    11491151The \CFA solution guards against this possibility by keeping length information separate from the array data, and therefore eligible for sharing.
     
    13801382\begin{cfa}
    13811383thread worker { int id; };
    1382 void ?{}( worker & ) = void; // remove default constructor
     1384void ?{}( worker & ) = void; $\C[2.5in]{// remove default constructor}$
    13831385void ?{}( worker &, int id );
    1384 array( worker, 5 ) ws = @{}@; // rejected; but desire is for no initialization yet
    1385 for ( i; 5 ) (ws[i]){ @i@ }; // explicitly initialize each thread, giving id
     1386array( worker, 5 ) ws = @{}@; $\C{// rejected; but desire is for no initialization yet}$
     1387for ( i; 5 ) (ws[i]){ @i@ }; $\C{// explicit (placement) initialization for each thread, giving id}\CRT$
    13861388\end{cfa}
    13871389Note the use of the \CFA explicit constructor call, analogous to \CC's placement-@new@.
     
    13931395Therefore, there is a conflict between the implicit actions of the builtin @thread@ type and a user's desire to defer these actions.
    13941396
    1395 Another \CFA feature for providing C backwards compatibility, at first seem viable for initializing the array @ws@, though on closer inspection is not.
     1397Another \CFA feature for providing C backwards compatibility, at first seems viable for initializing the array @ws@, though on closer inspection is not.
    13961398C initialization without a constructor is specified with \lstinline|@=|, \eg \lstinline|array(worker, 5) ws @= {}| ignores all \CFA lifecycle management and uses C empty initialization.
    13971399This option does achieve the desired semantics on the construction side.
     
    14241426let fa: [f64; 5] = [1.2, fval, 5.6, 7.3, 9.1];  $\C{// immutable, individual initialization}$
    14251427\end{rust}
    1426 Initialization is required even if the array is subsequently initialized.
     1428Initialization is required at declaration, even if the array is subsequently overwritten.
    14271429Rust arrays are not VLAs, but the compile-time length can be queried using member @len()@.
    14281430Arrays can be assigned and passed to parameters by value or reference, if and only if the type and dimension match, meaning a different function is needed for each array size.
     
    15481550
    15491551\begin{figure}
    1550 \setlength{\tabcolsep}{15pt}
    1551 \begin{tabular}{ll@{}}
     1552\centering
     1553\begin{lrbox}{\myboxA}
    15521554\begin{java}
    15531555int m[][] = {  // triangular matrix
     
    15791581}
    15801582\end{java}
    1581 &
     1583\end{lrbox}
     1584
     1585\begin{lrbox}{\myboxB}
    15821586\begin{cfa}
    15831587int * m[5] = {  // triangular matrix
     
    16091613}
    16101614\end{cfa}
    1611 \end{tabular}
    1612 \caption{Java (left) \vs C (right) Triangular Matrix}
     1615\end{lrbox}
     1616
     1617\subfloat[Java]{\label{f:JavaTriangularMatrix}\usebox\myboxA}
     1618\hspace{6pt}
     1619\vrule
     1620\hspace{6pt}
     1621\subfloat[C]{\label{f:CTriangularMatrix}\usebox\myboxB}
     1622\hspace{6pt}
     1623
     1624\caption{Triangular Matrix}
    16131625\label{f:JavaVsCTriangularMatrix}
    16141626\end{figure}
     
    16391651But none makes an allocation with a dynamically given fixed size less awkward than the vector arrangement just described.
    16401652
    1641 \CC~26 adds @std::inplace_vector@, which provides an interesting vector--array hybrid,\footnote{
    1642   Like a vector, it lets a user construct elements in a loop, rather than imposing uniform construction.
    1643   Yet it preserves \lstinline{std::array}'s ability to be entirely stack-allocated, by avoiding an auxiliary elements' allocation.
    1644 } but does not change the fundamental limit of \lstinline{std::array}, that the length, being a template parameter, must be a static value.
     1653\CC~26 adds @std::inplace_vector@, providing an interesting vector--array hybrid.
     1654Like a vector, it lets a user construct elements in a loop, rather than imposing uniform construction.
     1655Yet it preserves \lstinline{std::array}'s ability to be entirely stack-allocated, by avoiding a separate heap allocation for the elements.
     1656However, it retains the fundamental limit of \lstinline{std::array}: the length, being a template parameter, must be a static value.
    16451657
    16461658\CC~20's @std::span@ is a view that unifies true arrays, vectors, static sizes and dynamic sizes, under a common API that adds bounds checking.
     
    17721784
    17731785\section{Future Work}
     1786\label{f:FutureWork}
    17741787
    17751788% \subsection{Array Syntax}
  • doc/theses/mike_brooks_MMath/programs/hello-accordion.cfa

    rf97e7be r8d764d4f  
    2929
    3030forall( [C], [S] )
    31 int getPref( @School( C, S ) & school@, int is, int pref ) {
     31int getPref( @School( C, S ) & school@,
     32                                 int is, int pref ) {
    3233        for ( ic; C ) {
    33                 if ( pref == @school.preferences@[ic][is] ) return ic; $\C{// offset calculation implicit}$
     34                if ( pref == @school.preferences@[ic][is] )
     35                        return ic; // offset calculation implicit
    3436        }
    3537        assert( false );        // must find a match
    3638}
    37 
    38 
    3939
    4040