Context Navigation

← Previous Changeset
Next Changeset →

Changeset 1037b4c

Timestamp:

Apr 27, 2026, 2:03:17 AM (2 days ago)

Author:

Michael Brooks <mlbrooks@…>

Branches:

Children:

Parents:

Message:

proof-read array chapter

Location:

doc/theses/mike_brooks_MMath

Files:

: 3 edited

array.tex (modified) (25 diffs)
programs/hello-accordion.cfa (modified) (1 diff)
programs/hello-md.cfa (modified) (1 diff)

Legend:

: Unmodified
: Added
: Removed

doc/theses/mike_brooks_MMath/array.tex

-              r037dc57
+              r1037b4c
 \end{cfa}
 Here, the arguments to the @array@ type are @float@ (element type) and @99@ (dimension).
 When this type is used as a function parameter, the type-system requires the argument is a perfect match.
+When this type is used as a function parameter, the type system requires the argument to be a perfect match.
 \begin{cfa}
 void f( @array( float, 42 )@ & p ) {}   $\C{// p accepts 42 floats}$
 …
+}
 \end{cfa}
 Both of the stack declared array variables, @x1@ and @x2@, have 42 elements, each element being a @float@.
+Both of the variables, @x1@ and @x2@, name 42 contiguous, stack-allocated @float@ values.
 The two variables have identical size and layout, with no additional ``bookkeeping'' allocations or headers.
 The C array, @x1@, has no subscript checking, while \CFA array, @x2@, does.
 Providing this explicit generic approach requires a significant extension to the \CFA type system to support a full-feature, safe, efficient (space and time) array-type, which forms the foundation for more complex array forms in \CFA.
+This explicit, generic approach requires a significant extension to the \CFA type system to support a full-feature, safe, efficient (space and time) array type, which forms the foundation for more complex array forms in \CFA.
 In all following discussion, ``C array'' means types like @x1@ and ``\CFA array'' means types like @x2@.
 A future goal is to provide the new @array@ features with syntax approaching C's (declaration style of @x1@).
 Then, the library @array@ type could be removed, giving \CFA a largely uniform array type.
 At present, the C-syntax @array@ is only partially supported, so the generic @array@ is used exclusively in the thesis.
+At present, new-array features are only partially supported through C-style syntax, so the generic @array@ is used exclusively in the thesis.
 My contributions in this chapter are:
 \begin{enumerate}[leftmargin=*]
 \item A type system enhancement that lets polymorphic functions and generic types be parameterized by a numeric value: @forall( [N] )@.
 \item Provide a dimension/subscript-checked array-type in the \CFA standard library, where the array's length is statically managed and dynamically valued.
+\item Provide a dimension/subscript-checked array type in the \CFA standard library, where the array's length is statically managed and dynamically valued.
 \item Provide argument/parameter passing safety for arrays and subscript safety.
 \item Identify the interesting abilities available by the new @array@ type.
-\item Where there is a gap concerning this feature's readiness for prime-time, identification of specific workable improvements that are likely to close the gap.
 \end{enumerate}
 …
 \begin{figure}
+\begin{cfa}
+forall( [N] ) array( bool, N ) & f( array( float, N ) &, array( float, N ) & );
+\end{cfa}
 \lstinput{30-43}{hello-array.cfa}
 \lstinput{45-48}{hello-array.cfa}
 \caption{\lstinline{f} Example}
+\caption{Example of calling a dimension-generic function.}
 \label{f:fExample}
 \end{figure}
 …
 \item
 The value of an @N@-expression is the acquired length, derived from the usage site, \ie generic declaration or function call.
+\begin{comment} % was not discussed yet
 \item
 @array( T, N0, N1, ... )@ is a multi-dimensional type wrapping $\prod_i N_i$ contiguous @T@-typed objects.
+\end{comment}
 \end{itemize}
+\VRef[Figure]{f:TemplateVsGenericType} shows @N@ is not the same as a @size_t@ declaration in a \CC \lstinline[language=C++]{template}.\footnote{
+The \CFA program requires a snapshot declaration for \lstinline{n} to compile, as described at the end of \Vref{s:ArrayTypingC}.}
+\VRef[Figure]{f:TemplateVsGenericType} shows @N@ is not the same as a @size_t@ declaration in a \CC \lstinline[language=C++]{template}.  It illustrates these differences:
 \begin{enumerate}[leftmargin=*]
 \item
 …
 \item
 \label{p:DimensionPassing}
 The \CC template @N@ must be passed explicitly at the call, unless @N@ has a default value, even when \CC can deduct the type of @T@.
+The \CC template @N@ must be passed explicitly at the call, unless @N@ has a default value, even when \CC can deduce the type of @T@.
 The \CFA @N@ is part of the array type and passed implicitly at the call.
 % fixed by comparing to std::array
 % mycode/arrr/thesis-examples/check-peter/cs-cpp.cpp, v2
+\begin{comment} % does not appear in the example; I think we've found CFA suffering a similar "belief"
 \item
 \CC cannot have an array of references, but can have an array of @const@ pointers.
 …
 In the \CC example, the arrays fall back on C arrays, which have a duality with references with respect to automatic dereferencing.
 The \CFA array is a contiguous object with an address, which can be stored as a reference or pointer.
+\end{comment}
 % not really about forall-N vs template-N
 % any better CFA support is how Rob left references, not what Mike did to arrays
 …
         for ( int i = 0; i < N; i += 1 ) ret[i] = x[i];
+}
 int main() {
         const size_t  n = 10;   $\C[1.9in]{// must be constant}$
         int ret[n], x[n];               $\C{// static size}$
+void demo() {
+        const size_t  n = 10;
+        int ret[n], x[n];               $\C[1.9in]{// static size}$
         for ( int i = 0; i < n; i += 1 ) x[i] = i;
         @copy<int, n >( ret, x );@
         for ( int i = 0; i < n; i += 1 ) cout << ret[i] << ' ';
         cout << endl;
+}
+int main() {
+        demo();
+}
 \end{c++}
 …
                 for ( i; N ) ret[i] = x[i];
+        }
+        size_t  n;
+        sin | n;
+        array( int, n ) ret, x; $\C{// dynamic-fixed size}$
+        for ( i; n ) x[i] = i;  $\C{// initialize}$
+        @copy( ret,  x );@              $\C{// alternative ret = x}$
+        for ( i; n ) sout | ret[i] | nonl; $\C{// print}\CRT$
+        sout | nl;
+        void demo( const size_t n ) {
+                array( int, n ) ret, x; $\C{// dynamic-fixed size}$
+                for ( i; n ) x[i] = i;  $\C{// initialize}$
+                @copy( ret,  x );@              $\C{// alternative ret = x}$
+                for ( i; n ) sout | ret[i] | nonl;
+                sout | nl;
+        }
+        size_t  n;   sin | n;   $\C{// obtain size}\CRT$
+        demo( n );
+}
 \end{cfa}
 …
 \CFA allows length-parameterized array members to be nested at arbitrary locations, with intervening member declarations.
 \lstinput{10-15}{hello-accordion.cfa}
 The structure has course- and student-level metatdata (their respective field names) and a position-based preferences' matrix.
 Its layout has the starting offset of @studentIds@ varying according to the generic parameter @C@, and the offset of @preferences@ varying according to both generic parameters.
+The structure has course- and student-level metadata (their respective field names) and a position-based preferences' matrix.
+The structure's layout is dynamic; the starting offset of @student_ids@ varies according to the generic parameter @C@; the offset of @preferences@ varies by both.
 \VRef[Figure]{f:checkExample} shows a program main using @School@ and results with different array sizes.
 …
 \section{Single Dimension Array Implementation}
+The core of the preexisting \CFA compiler already has the ``heavy equipment'' needed to provide the feature set just reviewed (up to bugs in cases not yet exercised).
+\label{s:1d-impl}
+The core of the preexisting \CFA compiler already has the ``heavy equipment'' needed to provide the feature set just reviewed.
 To apply this equipment in tracking array lengths, I encoded a dimension (array's length) as a type.
 The type in question does not describe any data that the program actually uses at runtime.
 …
 % Furthermore, the @array@ type itself is really ``icing on the cake.''
 % Presenting its full version is deferred until \VRef[Section]{s:ArrayMdImpl} (where added complexity needed for multiple dimensions is considered).
+The new array implementation is broken into single and multiple dimensions \see{\VRef[Section]{s:ArrayMdImpl}}.
+Details of the @array@ macros are sprinkles among the implementation discussion.
+The single dimension implementation begins with a simple example without @array@ macros.
+Discussion of the new array implementation is broken into single- and multi-dimensional features.
+The multidimensional feature set of \VRef[Section]{s:ArrayMdImpl} is a toolkit for managing repeated application of the single-dimensional features.
+Details of the @array@ type are introduced as needed.
+The single-dimension discussions begins with a simple example stripped of the @array@ type.
 \begin{cfa}
 forall( [N] )
 struct array_1d_float {
         float items[N];                 $\C[2in]{// monomorphic type}$
+        float items[N];                 $\C[2in]{// float elements }$
 };
 forall( T, [N] )
 struct array_1d_T {
         T items[N];                             $\C{// polymorphic type}\CRT$
+struct array_1d {
+        T items[N];                             $\C{// any-typed elements}\CRT$
 };
 \end{cfa}
 These two structure patterns, plus a subscript operator, is all that @array@ provides.
 My main work is letting a programmer define such a structure (one whose type is parameterized by @[N]@) and functions that operate on it (these being similarly parameterized).
+My main work is letting a programmer define such a structure (one whose type is parameterized by @[N]@) and define functions that operate on it (these being parameterized similarly).
 The repurposed heavy equipment is
 \begin{itemize}[leftmargin=*]
 \item
         Resolver provided values for a declaration's type-system variables, gathered from type information in scope at the usage site.
 \item
         The box pass, encoding information about type parameters into ``extra'' regular parameters and arguments on declarations and calls.
+        The ``resolver'' compiler pass, which provides argument values for a declaration's type-system parameters, gathered from type information in scope at the usage site.
+\item
+        The ``box'' compiler pass, which encodes information about type parameters into ``extra'' regular parameters on declarations and and arguments on calls.
         Notably, it conveys the size of a type @foo@ as a @__sizeof_foo@ parameter, and rewrites the @sizeof(foo)@ expression as @__sizeof_foo@, \ie a use of the parameter.
 \end{itemize}
 …
 This work is detailed in \VRef[Section]{s:ArrayTypingC}.
 However, the resolution--boxing scheme, in its preexisting state, is equipped to work on (desugared) dimension parameters.
 The following discussion explains the desugaring and how correctly lowered code results.
+The following discussion explains the desugaring and why its lowered code is correct.
 A simpler structure, and a toy function on it, demonstrate what is needed for the encoding.
 …
 The compiler's action produces the more complex form, which if handwritten, is error-prone.
 At the compiler front end, the parsing changes AST schema extensions and validation rules for enabling the sugared user input.
+At the \CFA compiler's front end, changes to the parser are AST schema extensions and validation rules for enabling the sugared user input.
 \begin{itemize}[leftmargin=*]
 \item
         Recognize the form @[N]@ as a type-variable declaration within a @forall@.
 \item
         Have the new brand of type-variable, \emph{Dimension}, in the AST form of a type-variable, to represent one parsed from @[-]@.
+        Recognize the syntax @forall( ... [N] ... )@ as a type-variable declaration.
+\item
+        Where the AST's representation of a type variable encodes its \newterm{brand}, previously including full-object types @T@ and opaque datatypes @T&@, define new brand, \newterm{Dimension}.  Apply this brand to the new syntax.
 \item
         Allow a type variable to occur in an expression.  Validate (after parsing) that only dimension-branded type-variables are used here.
 …
 \end{itemize}
 \VRef[Figure]{f:DimexprRuleCompare} gives a case-by-case comparison of the consequences of these rule sets.
 It demonstrates that the \CFA false alarms occur in the same cases as C treats unsafe.
 It also shows that C-incompatibilities only occur in cases that C treats unsafe.
+It demonstrates that the \CFA false alarms occur in the same cases as C treats unsafely.
+It also shows that C-incompatibilities only occur in cases that C treats unsafely.
 \begin{figure}
 …
 \end{comment}
+These rules being conservative is the reason that \VRef{s:1d-impl} calls the \CFA array type only \emph{limited} dependent.
+In programming languages featuring general dependent typing, considerably more effort is invested in being able to conclude that two type-parameterizing expressions always compute the same value.
+While conservatism is inescapable, these languages use good approximations of eventual runtime behaviour to enable conclusions about more dynamic behaviours, such as a program never popping from an empty stack.
 The conservatism of the new rule set can leave a programmer requiring a recourse, when needing to use a dimension expression whose stability argument is more subtle than current-state analysis.
 This recourse is to declare an explicit constant for the dimension value.
 …
 The @nr@ reference is never written, no implicit code (load) between declarations, and control does not leave the function between the uses.
 As well, the build-in @?+?@ function is predictable.
+To make these cases work, the programmer must add the follow constant declarations (cast does not work):
+To make these cases work, the programmer adds these constant declarations:
 \begin{cfa}
 void f( int & nr, const int nv ) {
 …
 The result is the originally intended semantics,
 achieved by adding a superfluous ``snapshot it as of now'' directive.
+Note that casting does not help with this issue, because it does not express information about values at different points in time.
 The snapshot trick is also used by the \CFA translation, though to achieve a different outcome.
 …
 \end{cfa}
 Dimension hoisting already existed in the \CFA compiler.
 However, it was buggy, particularly with determining, ``Can hoisting the expression be skipped here?'', for skipping this hoisting is clearly desirable in some cases.
 For example, when a programmer has already hoisted to perform an optimization to prelude duplicate code (expression) and/or expression evaluation.
+However, it was buggy, particularly with determining, ``Can hoisting the expression be skipped here?'' for skipping this hoisting is clearly desirable in some cases.
+For example, when a programmer has already done manual hoisting.
 In the new implementation, these cases are correct, harmonized with the accept/reject criteria.
 …
 Fixed-stride memory:
 this model binds the first subscript and the first memory layout dimension, offering the ability to slice by (provide an index for) only the coarsest dimension, @m[row][*]@ or @c[plane][*][*]@, \eg slice only by row (2D) or plane (3D).
-After which, subscripting and memory layout are independent.
 \item
 Explicit-displacement memory:
 this model has no awareness of dimensions just the ability to access memory at a distance from a reference point (base-displacement addressing), \eg @x + 23@ or @x[23}@ $\Rightarrow$ 23rd element from the start of @x@.
+this model has no awareness of dimensions just the ability to access memory at a distance from a reference point (base-displacement addressing), \eg @x + 23@ or @x[23]@ $\Rightarrow$ 23rd element after the start of @x@.
 A programmer must manually build any notion of dimensions using other tools;
 hence, this style is not offering multidimensional arrays \see{\VRef[Figure]{f:FixedVariable} right example}.
 \end{enumerate}
 There is some debate as to whether the abstraction point ordering above goes $\{1, 2\} < 3$, rather than my numerically-ordering.
+There is some debate as to whether the abstraction point ordering above goes $\{1, 2\} < 3$, rather than my numeric ordering.
 That is, styles 1 and 2 are at the same abstraction level, with 3 offering a limited set of functionality.
 I chose to build the \CFA style-1 array upon a style-2 abstraction.
+(Justification of the decision follows, after the description of the design.)
+Justification of the decision follows, after the description of the design.
 Style 3 is the inevitable target of any array implementation.
 …
 Casting a multidimensional array as a single-dimensional array/pointer, then using @x[i]@ syntax to access its elements, is still a form of pointer arithmetic.
 To step into the implementation of \CFA's new type-1 multidimensional arrays in terms of C's existing type-2 multidimensional arrays, it helps to clarify that the interface is low-level, \ie a C/\CFA array interface includes the resulting memory layout.
 Specifically, the defining requirement of a type-2 system is the ability to slice a column from a column-finest matrix.
+To step into the implementation of \CFA's new type-1 multidimensional arrays in terms of C's existing type-2 multidimensional arrays, it helps to clarify that the interface is low-level, \ie a C/\CFA array interface includes memory-layout detail.
+Specifically, the defining requirement of a type-1 system is the ability to slice a column from a column-finest matrix.
 Hence, the required memory shape of such a slice is fixed, before any discussion of implementation.
 The implementation presented here is how the \CFA array-library wrangles the C type system to make it do memory steps that are consistent with this layout while not affecting legacy C programs.
 …
 Beyond what C's array type offers, the new array brings direct support for working with a \emph{noncontiguous} array slice, allowing a program to work with dimension subscripts given in a non-physical order.
 The following examples use the matrix declaration @array( float, 5, 7 ) m@, loaded with values incremented by $0.1$, when stepping across the length-7 finely-strided column dimension, and stepping across the length-5 coarsely-strided row dimension.
+The following examples use the matrix declaration @array( float, 5, 7 ) m@, loaded with values incremented by $0.1$, when stepping across the length-7 finely-strided column dimension, and incremented by $1$, when stepping across the length-5 coarsely-strided row dimension.
 \par
 \mbox{\lstinput{121-126}{hello-md.cfa}}
 …
 \lstinput{150-151}{hello-md.cfa}
 The example shows @x[2]@ and @x[[2, all]]@ both refer to the same, ``2.*'' slice.
 Indeed, the various @print1d@ calls under discussion access the entry with value @2.3@ as @x[2][3]@, @x[[2,all]][3]@, and @x[[all,3]][2]@.
 This design preserves (and extends) C array semantics by defining @x[[i,j]]@ to be @x[i][j]@ for numeric subscripts, but also for ``subscripting by all''.
+The example shows @x[2]@ and @x[2, all]@ both refer to the same, ``2.*'' slice.
+Indeed, the various @print1d@ calls under discussion access the entry with value @2.3@ as @x[2][3]@, @x[2,all][3]@, and @x[all,3][2]@.
+This design preserves (and extends) C array semantics by defining @x[i,j]@ to be @x[i][j]@ for numeric subscripts, but also for ``subscripting by all''.
 That is:
 \begin{cquote}
 \begin{tabular}{@{}cccccl@{}}
 @x[[2,all]][3]@ & $\equiv$      & @x[2][all][3]@  & $\equiv$    & @x[2][3]@  & (here, @all@ is redundant)  \\
 @x[[all,3]][2]@ & $\equiv$      & @x[all][3][2]@  & $\equiv$    & @x[2][3]@  & (here, @all@ is effective)
+@x[2,all][3]@   & $\equiv$      & @x[2][all][3]@  & $\equiv$    & @x[2][3]@  & (here, @all@ is redundant)  \\
+@x[all,3][2]@   & $\equiv$      & @x[all][3][2]@  & $\equiv$    & @x[2][3]@  & (here, @all@ is effective)
 \end{tabular}
 \end{cquote}
 …
 The semantics of @-[i]@ is to dequeue from the front of the ``want subscripts'' list and lock its value to be @i@.
 Contiguous arrays, and slices of them, are all represented by the same underlying parameterized type, which includes stride information in its metatdata.
+Contiguous arrays, and slices of them, are all represented by the same underlying parameterized type, which includes stride information in its metadata.
 The \lstinline{-[all]} operation takes subscripts, \eg @x[2][all]@, @x[all][3]@, \etc, and converts (transposes) from the base reference @x[all]@ to a specific reference of the appropriate form.
 The running example's @all@-effective step, stated more concretely, is:
 …
 \end{figure}
 \VRef[Figure]{fig:subscr-all} shows one element (in the white band) accessed two different ways: as @x[2][3]@ and as @x[all][3][2]@.
+\VRef[Figure]{fig:subscr-all} shows one element (in the shaded band) accessed two different ways: as @x[2][3]@ and as @x[all][3][2]@.
 In both cases, subscript 2 selects from the coarser dimension (rows of @x@),
 while subscript 3 selects from the finer dimension (columns of @x@).
 …
 Java arrays are references, so multi-dimension arrays are arrays-of-arrays \see{\VRef{toc:mdimpl}}.
 For each array, Java implicitly storages the array dimension in a descriptor, supporting array length, subscript checking, and allowing dynamically-sized array-parameter declarations.
+For each array, Java implicitly stores the array dimension in a descriptor, supporting array length, subscript checking, and allowing dynamically-sized array-parameter declarations.
 \begin{cquote}
 \begin{tabular}{rl}

doc/theses/mike_brooks_MMath/programs/hello-accordion.cfa

r037dc57	r1037b4c
12	12	@array@( string, C ) course_codes; $\C{// nested VLAs}$
13	13	@array@( string, S ) student_ids;
14		@array@( int, C, S ) preferences; $\C{// m~~ultidimensional~~}$
	14	@array@( int, C, S ) preferences; $\C{// matrix, C x S}$
15	15	};
16	16

doc/theses/mike_brooks_MMath/programs/hello-md.cfa

-              r037dc57
+              r1037b4c
 fill( m );
 /*
 r/c   0     1     2     3     4     5     6
   0.0  0.1  0.2  @0.3@  0.4  0.5  0.6
   1.0  1.1  1.2  @1.3@  1.4  1.5  1.6
   @2.0  2.1  2.2  2.3  2.4  2.5  2.6@
   3.0  3.1  3.2  @3.3@  3.4  3.5  3.6
   4.0  4.1  4.2  @4.3@  4.4  4.5  4.6
+r\c     0      1      2      3      4      5      6
+    0.0   0.1   0.2   @0.3@   0.4   0.5   0.6
+    1.0   1.1   1.2   @1.3@   1.4   1.5   1.6
+    @2.0   2.1   2.2   2.3   2.4   2.5   2.6@
+    3.0   3.1   3.2   @3.3@   3.4   3.5   3.6
+    4.0   4.1   4.2   @4.3@   4.4   4.5   4.6
 */

Note: See TracChangeset for help on using the changeset viewer.

Download in other formats: