Ignore:
Timestamp:
Oct 31, 2024, 3:20:05 PM (3 months ago)
Author:
Michael Brooks <mlbrooks@…>
Branches:
master
Children:
ad9f593
Parents:
292d6cf6
Message:

Thesis, array: Add section Dimension Parameter Implementation

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/array.tex

    r292d6cf6 rb7921d8  
    317317\label{f:checkHarness}
    318318\end{figure}
     319
     320
     321\section{Dimension Parameter Implementation}
     322
     323The core of the preexisting \CFA compiler already had the ``heavy equipment'' needed
     324to provide the feature set just reviewed (up to bugs in cases not yet exercised).
     325To apply this equipment in tracking array lengths, I encoded a dimension (array's length) as a type.
     326The type in question does not describe any data that the program actually uses at runtime.
     327It simply carries information through intermediate stages of \CFA-to-C lowering.
     328And this type takes a form such that, once \emph{it} gets lowered, the result does the right thing.
     329
     330Furthermore, the @array@ type itself is really ``icing on the cake.''
     331Presenting its full version is deferred until \VRef[Section]{s:ArrayMdImpl}
     332(where added complexity needed for multiple dimensions is considered).
     333But simplifications close enough for the present discussion are:
     334\begin{cfa}
     335        forall( [N] )
     336        struct array_1d_float {
     337                float items[N];
     338        };
     339        forall( T, [N] )
     340        struct array_1d {
     341                T items[N];
     342        };
     343\end{cfa}
     344This structure pattern, plus a subscript operator, is all that @array@ provides.
     345
     346My main work is letting a programmer define
     347such a structre (one whose type is parameterized by @[N]@)
     348and functions that operate on it (these being similarly parameterized).
     349
     350The repurposed heavy equipment is
     351\begin{itemize}
     352\item
     353        The resolver, providing values for a used declaration's type-system variables,
     354        gathered from type information in scope at the usage site.
     355
     356\item
     357        The box pass, encoding information about type parameters
     358        into ``extra'' regular parameters/arguments on declarations and calls.
     359        Notably, it conveys the size of a type @foo@ as a @__sizeof_foo@ parameter,
     360        and rewrites the @sizeof(foo)@ expression as @__sizeof_foo@, i.e. a use of the parameter.
     361\end{itemize}
     362
     363The rules for resolution had to be restricted slightly, in order to achieve important refusal cases.
     364This work is detailed in \VRef[Section]{s:ArrayTypingC}.
     365However, the resolution--boxing scheme, in its preexisting state, was already equipped to work on (desugared) dimension parameters.
     366The discussion following explains the desugaring and how correct lowered code results.
     367
     368An even simpler structure, and a toy function on it, demonstrate what's needed for the encoding.
     369\begin{cfa}
     370        forall( [@N@] ) { // [1]
     371                struct thing {};
     372                void f( thing(@N@) ) { sout | @N@; } // [2], [3]
     373        }
     374        int main() {
     375                thing( @10@ ) x;  f(x);  // prints 10, [4]
     376                thing( 100 ) y;  f(y);  // prints 100
     377                return 0;
     378        }
     379\end{cfa}
     380This example has:
     381\begin{enumerate}
     382\item
     383        The symbol @N@ being declared as a type variable (a variable of the type system).
     384\item
     385        The symbol @N@ being used to parameterize a type.
     386\item
     387        The symbol @N@ being used as an expression (value).
     388\item
     389        A value like 10 being used as an argument to the parameter @N@.
     390\end{enumerate}
     391The chosen solution being to encode the value @N@ \emph{as a type}, items 1 and 2 are immediately available for free.
     392Item 3 needs a way to recover the encoded value from a (valid) type (and to reject invalid types occurring here).
     393Item 4 needs a way to produce a type that encodes the given value.
     394
     395Because the box pass handles a type's size as its main datum, the encoding is chosen to use it.
     396The production and recovery are then straightforward.
     397\begin{itemize}
     398\item
     399        The value $n$ is encoded as a type whose size is $n$.
     400\item
     401        Given a dimension expression $e$, produce type @char[@$e$@]@ to represent it.
     402        If $e$ evaluates to $n$ then the endoded type has size $n$.
     403\item
     404        Given a type $T$ (produced by these rules), recover the value that it represents with the expression @sizeof(@$T$@)@.
     405        If $T$ has size $n$ then the recovery expression evaluates to $n$.
     406\end{itemize}
     407
     408This desugaring is applied in a translation step before the resolver.
     409The ``validate'' pass hosts it, along with several other canonicalizing and desugaring transformations (the pass's name notwithstanding).
     410The running example is lowered to:
     411\begin{cfa}
     412        forall( @N*@ ) { // [1]
     413                struct thing {};
     414                void f( thing(@N@) ) { sout | @sizeof(N)@; } // [2], [3]
     415        }
     416        int main() {
     417                thing( char[@10@] ) x;  f(x);  // prints 10, [4]
     418                thing( char[100] ) y;  f(y);  // prints 100
     419                return 0;
     420        }
     421\end{cfa}
     422Observe:
     423\begin{enumerate}
     424\item
     425        @N@ is now declared to be a type.
     426        It is declared to be \emph{sized} (by the @*@), meaning that the box pass shall do its @sizeof(N)@--@__sizeof_N@ extra parameter and expression translation.
     427\item
     428        @thing(N)@ is a type; the argument to the generic @thing@ is a type (type variable).
     429\item
     430        The @sout...@ expression (being an application of the @?|?@ operator) has a second argument that is an ordinary expression.
     431\item
     432        The type of variable @x@ is another @thing(-)@ type;  the argument to the generic @thing@ is a type (array type).
     433\end{enumerate}
     434
     435From this point, preexisting \CFA compilation takes over lowering it the rest of the way to C.
     436Inspecting the result shows what the above translation achieves.
     437A form that shows only the relevant changes of the box pass (as informed by the resolver), leaving the rest unadulterated, is:
     438\begin{cfa}
     439        // [1]
     440        void f( size_t __sizeof_N, @void *@ ) { sout | @__sizeof_N@; } // [2], [3]
     441        int main() {
     442                struct __conc_thing_10 {} x;  f(@10@, &x);  // prints 10, [4]
     443                struct __conc_thing_100 {} y;  f(@100@, &y);  // prints 100
     444                return 0;
     445        }
     446\end{cfa}
     447Observe:
     448\begin{enumerate}
     449\item
     450        The type parameter @N@ is gone.
     451\item
     452        The type @thing(N)@ is (replaced by @void *@, but thereby effectively) gone.
     453\item
     454        The @sout...@ expression (being an application of the @?|?@ operator) has a second argument that is a regular variable (parameter) usage.
     455\item
     456        Information about the particular @thing@ instantiation (value 10) has moved, from the type, to a regular function-call argument.
     457\end{enumerate}
     458At the end of the desugaring and downstream processing, the original C idiom of ``pass both a pointer and a length parameter'' has been reconstructed.
     459In the programmer-written form, only the thing is passed.
     460The compiler's action produces the more complex form, which if handwritten, would be error-prone.
     461
     462Back at the very front end, the parsing changes, AST schema extensions, and validation rules for enabling the sugared user input are:
     463\begin{itemize}
     464\item
     465        Recognize the form @[N]@ as a type-variable declaration within a @forall@.
     466\item
     467        Have the new brand of type-variable, \emph{Dimension}, in the AST form of a type-variable, to represent one parsed from @[-]@.
     468\item
     469        Allow a type variable to occur in expression position.  Validate (after parsing) that only dimension-branded type variables are used here.
     470\item
     471        Allow an expression to occur in type-argument position.  Brand the resulting type argument as a dimension.
     472\item
     473        Validate (after parsing), on a generic-type usage, \eg the type part of the declaration
     474        \begin{cfa}
     475                @array_1d( foo, bar ) x;@
     476        \end{cfa}
     477        that the brands on the generic arguments match the brands of the declared type variables.
     478        Here, that @foo@ is a type and @bar@ is a dimension.
     479\end{itemize}
    319480
    320481
     
    743904TODO: Discuss the interaction of this dimension hoisting with the challenge of extra unification for cost calculation
    744905
     906
    745907\section{Multidimensional Arrays}
    746908\label{toc:mdimpl}
     
    8831045
    8841046\subsection{Multidimensional array implementation}
     1047\label{s:ArrayMdImpl}
    8851048
    8861049A multidimensional array implementation has three relevant levels of abstraction, from highest to lowest, where the array occupies \emph{contiguous memory}.
Note: See TracChangeset for help on using the changeset viewer.