Context Navigation

-              r3c5fdb5
+              r9a35b43
 \section{Zero Overhead}
+\label{s:zero-overhead}
 At runtime, the \CFA array is exactly a C array.
 …
 \label{f:FutureWork}
+% \subsection{Array Syntax}
+The ultimate goal is to replace all C arrays with \CFA arrays, establishing a higher level of safety in C programs.
+To that end, this section surveys possible next steps that could support this shift.
+\subsection{Array Syntax}
 An important goal is to recast \CFA-style @array(...)@ syntax into C-style array syntax.
 …
 \end{cfa}
 However, there is an ambiguity for a single dimension array, where the syntax for old and new arrays are the same, @int ar[10]@.
 The solution is to use a terminating comma to denote a \CFA-style single-dimension array.
+A solution is to use a terminating comma to denote a \CFA-style single-dimension array.
 \begin{cfa}
 int ar[2$\Huge\color{red},$];  // single dimension new array
 …
 The extra comma in the dimension is only mildly annoying, and acts as eye-candy differentiating old and new arrays.
 Hence, \CFA can repurpose the comma expression in this context for a list of dimensions.
 The ultimate goal is to replace all C arrays with \CFA arrays, establishing a higher level of safety in C programs, and eliminating the need for the terminating comma.
 With respect to the subscript expression, the comma expression is allowed.
 However, a comma expression in this context is rare, and is most commonly a (silent) mistake: subscripting a matrix with @m[i, j]@ instead of @m[i][j]@ selects the @j@th row not the @i, j@ element.
+Should the C/\CFA array distinction disappear in the future the need for the terminating comma woudl too.
+With respect to subscript expression, C allows a comma expression here.
+However, a comma expression in this context is rare, and is most commonly a (silent) mistake: subscripting a C matrix with @m[i, j]@ instead of @m[i][j]@ selects the @j@th row not the @i, j@ element.
 It is still possible to write @m[(i, j)]@ in the new syntax to achieve the equivalent of the old @m[i, j]@.
 Internally, the compiler must de-sugar @[i, j, k]@ into @[i][j][k]@ to match with three calls to subscript operators.
+Internally, the compiler must desugar @[i, j, k]@ into @[i][j][k]@ to match with three calls to subscript operators.
 Note, there is no ambiguity for subscripting a single dimensional array, as the subscript operator selects the correct form from the array type.
 Currently, @array@ supports the old and new subscript syntax \see{\VRef[Figure]{f:ovhd-treat-src}}, including combinations of new and old, @arr[1, 2][3]@.
 …
+\subsection{Error Messages}
+Length was a first foray into non-type polymorphic parameters.
+In one respect, this work's solution is less general than \CC's non-type template parameters\footnote{
+        In another respect, use of dynamic values, the \CFA solution can do more.
+}, which can accept strings, functions and values of many more types.
+A @forall( [N] )@ parameter is @size_t@, only.
+To generalize from \CFA's present solution, to \CC-equivalent type coverage, would be a significant change.
+The \CFA compiler's resolver, the most complex piece, would need to be upgraded to understand these non-type parameters directly.
+Today's solution leverages a special relationship between types and @size_t@: the two-way mapping afforded by @char[n]@ and @sizeof(T)@.
+This mapping allows for desugaring user input into the language that the pre-existing resolver understands, as detailed in \VRef[Section]{s:1d-impl}.
+The downside of this desugaring, as \VRef[Section]{s:ll-user-integration} admonishes, is that it does not provide a means for reporting a user-caused error, from its discovery in the resolver, to the user's attention, without including clutter from the desugaring.
+The \CFA team generally recognizes substantial room for improvement in even general, resolver-alone, errors.
+I fully acknowledge that the current state of errors, from new-array usage mistakes, makes the new-array system hard to learn.
+\subsection{Retire Pointer Arithmetic}
+\CFA has a tremendously flexible core machinery, in which facts about C are expressed.
+Even the fact that there is a @?+?@ operator for a pair of @int@s is not baked into the compiler proper; it is expressed, as a \CFA source-code declaration, by an implicit pseudo-include, in the \newterm{prelude}.
+More controversial facts are stated there too, notably, that every pointer type has arithmetic and subscript operators.
+As \VRef[Section]{s:ArraysDecay} explains, this fact is a necessary component in how a \emph{C array} is indexed.
+But this fact is unused when dereferencing a \CFA @array@, as the subscript operator, in this case, is defined directly on @array@, and not via an intermediate pointer type.
+I conducted a successful proof of concept for the technical elements that underpin this vision.
+\begin{itemize}
+        \item
+        Demote pointer-arithmetic operators from permanent prelude members to header-includable.
+        \item
+        Include them when in a C-compatibility mode, but not when in an enhanced-safety mode.
+        \item
+        Let a development team who maintains a compile unit that they believe they have onboarded to use only new \CFA arrays, set it to build in enhanced-safety mode.
+        \item
+        When a developer falls back on habit and forgetfully uses a C array in the onboarded module, the compiler's error serves as a reminder and a protector of the onboarded code quality.
+\end{itemize}
+This notion should be explored beyond the toy examples it has seen to date.
+\subsection{Defer Size Tracking}
+The current length checking and communication works well in the style of example illustrated in this thesis.
+The characteristic of this style is:
+\begin{itemize}
+        \item
+        Alice holds and array and knows its size.
+        \item
+        Bob offers a service that can apply to an array of any size.
+        \item
+        Alice calls Bob, passing her array; Bob (implicitly) learns the array's (correct) size.
+\end{itemize}
+This interaction is straightforward universal polymorphism, where Bob is parameterized by size.
+And compared with the following alternative, it is much more common.
+But size knowledge could just as well flow the other way:
+\begin{itemize}
+        \item
+        Bob offers a service that provides access to an array of a specific size known to Bob.
+        \item
+        Alice is content to work on an array of any size.
+        \item
+        Alice calls Bob, and receives his array as a return; Alice (implicitly) learns the array's (correct) size.
+\end{itemize}
+This interaction is the dual notion of existential polymorphism, working upon size information.
+Work\cite{arr:futhark:tytheory} accompanying the array-centric language Futhark has detailed its theory.
+An existential size helps cope with unnecessary repetition that occurs as program size increases and more intermediaries separate an array producer and consumer.
+Consider the @School@ structure of \VRef[Figure]{DAM-declaration}, and imagine passing one through a chain of intermediaries.
+While the producer and consumer have every reason to know the school's sizes, the intermediaries do not.
+Yet, they too would have to be parameterized by the numbers of courses and students, so that they could have them in scope at the point where they name the school's type.
+The desired solution is to write the size down, somewhere in relation to the school on which producer and consumer agree, but where it need not bother the intermediaries.
+\begin{cfa}
+struct SchoolRef {
+        size_t classes;
+        size_t students;
+        School(classes, students) * school;
+};
+\end{cfa}
+This declaration is wishful; at the @school@ member declaration site, the names @classes@ and @students@ are not actually in scope.
+But if it worked, @SchoolRef@ would be doing the intermediaries a tremendous service.
+It is not a polymorphic type.
+The intermediaries no longer need size information in scope to pass one along.
+@SchoolRef@ stops the polymorphism of @School@ from flowing farther outward.
+The final consumer need only open the box, discover what's inside, and proceed.
+\begin{cfa}
+forall( [C], [S] ) static void process( School(C, S) * );
+void receive( SchoolRef r ) {
+        with( r ) {
+                classes; students; // are in scope
+                School( classes, students ) * s2 = school; // we can state the type
+                process( s2 ); // we can communicate the type
+        }
+}
+\end{cfa}
+\subsection{Guarantee Zero Overhead}
+\VRef[Section]{s:zero-overhead} demonstrates how \CFA does well at skipping unnecessary runtime bound checks.
+Nevertheless, the scheme presented puts a programmer's hopes in the hands of the optimizer.
+The programmer writes code in which it appears that a bound check ought to be unnecessary, but feedback about success or failure there is indirect.
+Dex\cite{arr:dex:long} is a contemporary array-centric language that follows in the Ada array tradition\cite{arr:ada:learn} of designating a type to be used for subscripting an array.
+While \CFA has its new @[N]@ living in the type system, this @N@ is used to represent an array's size.
+An indexing type would be the type of @i@, in @ar[i]@.
+\begin{cfa}
+const size_t n = ask_the_user();
+typedef range(0, n) till_n_t; // does not exist in CFA yet
+till_n_t x = 42; // static reject
+till_n_t y = check(42); // static accept, dynamic check
+void f( array(int, n) & ar, till_n_t i ) {
+        sout | ar[ i ];  // bound check unnecessary
+}
+\end{cfa}
+In today's \CFA solution, any such @i@ is @size_t@.
+\begin{cfa}
+void f2( array(int, n) & ar, size_t i ) {
+        sout | ar[ i ];  // bound check necessary
+}
+\end{cfa}
+With a constrained index type, bound checks shift from being implicit at the time of an array access, to explicit at the time of a conversion.
+A user then has the option to import only the index-typed subscript operator.
+In such a setup, no implicit bound check can occur.
+Bound checks occur exactly where the user places them.
+If the user succeeds at needing no explicit bound checks, then the user is assured that there are no bound checks.
+Prior \CFA work\cite{Liang24} has touched upon this topic.
+Therein, a user's enumeration type is adapted to serve as the domain of an array.
+However, this work progressed as far as inferring an array size from the enumeration type, \eg that @int happy_hours[ weekday_t ]@ should be a declaration with length seven.
+It did not address forbidding @happy_hours[ 42 ]@.
+Both enumerations, and the above hand-waived @range(0, n)@ are valid indexing types.
+Supporting indexing types would put the user more in control of a program's performance and failure modes.
 \begin{comment}
 \subsection{Range Slicing}
 …
-\subsection{Retire Pointer Arithmetic}

Note: See TracChangeset for help on using the changeset viewer.

Context Navigation

Changeset 9a35b43 for doc/theses/mike_brooks_MMath/array.tex

Legend:

doc/theses/mike_brooks_MMath/array.tex

Download in other formats: