Ignore:
Timestamp:
Apr 28, 2026, 4:50:54 AM (19 hours ago)
Author:
Michael Brooks <mlbrooks@…>
Branches:
master
Children:
bf8112b
Parents:
3c5fdb5
Message:

add more array future work

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/array.tex

    r3c5fdb5 r9a35b43  
    10841084
    10851085\section{Zero Overhead}
     1086\label{s:zero-overhead}
    10861087
    10871088At runtime, the \CFA array is exactly a C array.
     
    18451846\label{f:FutureWork}
    18461847
    1847 % \subsection{Array Syntax}
     1848The ultimate goal is to replace all C arrays with \CFA arrays, establishing a higher level of safety in C programs.
     1849To that end, this section surveys possible next steps that could support this shift.
     1850
     1851\subsection{Array Syntax}
    18481852
    18491853An important goal is to recast \CFA-style @array(...)@ syntax into C-style array syntax.
     
    18661870\end{cfa}
    18671871However, there is an ambiguity for a single dimension array, where the syntax for old and new arrays are the same, @int ar[10]@.
    1868 The solution is to use a terminating comma to denote a \CFA-style single-dimension array.
     1872A solution is to use a terminating comma to denote a \CFA-style single-dimension array.
    18691873\begin{cfa}
    18701874int ar[2$\Huge\color{red},$];  // single dimension new array
     
    18731877The extra comma in the dimension is only mildly annoying, and acts as eye-candy differentiating old and new arrays.
    18741878Hence, \CFA can repurpose the comma expression in this context for a list of dimensions.
    1875 The ultimate goal is to replace all C arrays with \CFA arrays, establishing a higher level of safety in C programs, and eliminating the need for the terminating comma.
    1876 With respect to the subscript expression, the comma expression is allowed.
    1877 However, a comma expression in this context is rare, and is most commonly a (silent) mistake: subscripting a matrix with @m[i, j]@ instead of @m[i][j]@ selects the @j@th row not the @i, j@ element.
     1879Should the C/\CFA array distinction disappear in the future the need for the terminating comma woudl too.
     1880With respect to subscript expression, C allows a comma expression here.
     1881However, a comma expression in this context is rare, and is most commonly a (silent) mistake: subscripting a C matrix with @m[i, j]@ instead of @m[i][j]@ selects the @j@th row not the @i, j@ element.
    18781882It is still possible to write @m[(i, j)]@ in the new syntax to achieve the equivalent of the old @m[i, j]@.
    1879 Internally, the compiler must de-sugar @[i, j, k]@ into @[i][j][k]@ to match with three calls to subscript operators.
     1883Internally, the compiler must desugar @[i, j, k]@ into @[i][j][k]@ to match with three calls to subscript operators.
    18801884Note, there is no ambiguity for subscripting a single dimensional array, as the subscript operator selects the correct form from the array type.
    18811885Currently, @array@ supports the old and new subscript syntax \see{\VRef[Figure]{f:ovhd-treat-src}}, including combinations of new and old, @arr[1, 2][3]@.
     
    18831887
    18841888
     1889
     1890\subsection{Error Messages}
     1891
     1892Length was a first foray into non-type polymorphic parameters.
     1893In one respect, this work's solution is less general than \CC's non-type template parameters\footnote{
     1894        In another respect, use of dynamic values, the \CFA solution can do more.
     1895}, which can accept strings, functions and values of many more types.
     1896A @forall( [N] )@ parameter is @size_t@, only.
     1897
     1898To generalize from \CFA's present solution, to \CC-equivalent type coverage, would be a significant change.
     1899The \CFA compiler's resolver, the most complex piece, would need to be upgraded to understand these non-type parameters directly.
     1900
     1901Today's solution leverages a special relationship between types and @size_t@: the two-way mapping afforded by @char[n]@ and @sizeof(T)@.
     1902This mapping allows for desugaring user input into the language that the pre-existing resolver understands, as detailed in \VRef[Section]{s:1d-impl}.
     1903
     1904The downside of this desugaring, as \VRef[Section]{s:ll-user-integration} admonishes, is that it does not provide a means for reporting a user-caused error, from its discovery in the resolver, to the user's attention, without including clutter from the desugaring.
     1905The \CFA team generally recognizes substantial room for improvement in even general, resolver-alone, errors.
     1906I fully acknowledge that the current state of errors, from new-array usage mistakes, makes the new-array system hard to learn.
     1907
     1908
     1909\subsection{Retire Pointer Arithmetic}
     1910
     1911\CFA has a tremendously flexible core machinery, in which facts about C are expressed.
     1912Even the fact that there is a @?+?@ operator for a pair of @int@s is not baked into the compiler proper; it is expressed, as a \CFA source-code declaration, by an implicit pseudo-include, in the \newterm{prelude}.
     1913More controversial facts are stated there too, notably, that every pointer type has arithmetic and subscript operators.
     1914As \VRef[Section]{s:ArraysDecay} explains, this fact is a necessary component in how a \emph{C array} is indexed.
     1915But this fact is unused when dereferencing a \CFA @array@, as the subscript operator, in this case, is defined directly on @array@, and not via an intermediate pointer type.
     1916
     1917I conducted a successful proof of concept for the technical elements that underpin this vision.
     1918\begin{itemize}
     1919        \item
     1920        Demote pointer-arithmetic operators from permanent prelude members to header-includable.
     1921        \item
     1922        Include them when in a C-compatibility mode, but not when in an enhanced-safety mode.
     1923        \item
     1924        Let a development team who maintains a compile unit that they believe they have onboarded to use only new \CFA arrays, set it to build in enhanced-safety mode.
     1925        \item
     1926        When a developer falls back on habit and forgetfully uses a C array in the onboarded module, the compiler's error serves as a reminder and a protector of the onboarded code quality.
     1927\end{itemize}
     1928
     1929This notion should be explored beyond the toy examples it has seen to date.
     1930
     1931
     1932\subsection{Defer Size Tracking}
     1933
     1934The current length checking and communication works well in the style of example illustrated in this thesis.
     1935The characteristic of this style is:
     1936\begin{itemize}
     1937        \item
     1938        Alice holds and array and knows its size.
     1939        \item
     1940        Bob offers a service that can apply to an array of any size.
     1941        \item
     1942        Alice calls Bob, passing her array; Bob (implicitly) learns the array's (correct) size.
     1943\end{itemize}
     1944This interaction is straightforward universal polymorphism, where Bob is parameterized by size.
     1945And compared with the following alternative, it is much more common.
     1946
     1947But size knowledge could just as well flow the other way:
     1948\begin{itemize}
     1949        \item
     1950        Bob offers a service that provides access to an array of a specific size known to Bob.
     1951        \item
     1952        Alice is content to work on an array of any size.
     1953        \item
     1954        Alice calls Bob, and receives his array as a return; Alice (implicitly) learns the array's (correct) size.
     1955\end{itemize}
     1956This interaction is the dual notion of existential polymorphism, working upon size information.
     1957Work\cite{arr:futhark:tytheory} accompanying the array-centric language Futhark has detailed its theory.
     1958
     1959An existential size helps cope with unnecessary repetition that occurs as program size increases and more intermediaries separate an array producer and consumer.
     1960Consider the @School@ structure of \VRef[Figure]{DAM-declaration}, and imagine passing one through a chain of intermediaries.
     1961While the producer and consumer have every reason to know the school's sizes, the intermediaries do not.
     1962Yet, they too would have to be parameterized by the numbers of courses and students, so that they could have them in scope at the point where they name the school's type.
     1963
     1964The desired solution is to write the size down, somewhere in relation to the school on which producer and consumer agree, but where it need not bother the intermediaries.
     1965\begin{cfa}
     1966struct SchoolRef {
     1967        size_t classes;
     1968        size_t students;
     1969        School(classes, students) * school;
     1970};
     1971\end{cfa}
     1972This declaration is wishful; at the @school@ member declaration site, the names @classes@ and @students@ are not actually in scope.
     1973But if it worked, @SchoolRef@ would be doing the intermediaries a tremendous service.
     1974It is not a polymorphic type.
     1975The intermediaries no longer need size information in scope to pass one along.
     1976@SchoolRef@ stops the polymorphism of @School@ from flowing farther outward.
     1977The final consumer need only open the box, discover what's inside, and proceed.
     1978\begin{cfa}
     1979forall( [C], [S] ) static void process( School(C, S) * );
     1980void receive( SchoolRef r ) {
     1981        with( r ) {
     1982                classes; students; // are in scope
     1983                School( classes, students ) * s2 = school; // we can state the type
     1984                process( s2 ); // we can communicate the type
     1985        }
     1986}
     1987\end{cfa}
     1988
     1989
     1990\subsection{Guarantee Zero Overhead}
     1991
     1992\VRef[Section]{s:zero-overhead} demonstrates how \CFA does well at skipping unnecessary runtime bound checks.
     1993Nevertheless, the scheme presented puts a programmer's hopes in the hands of the optimizer.
     1994The programmer writes code in which it appears that a bound check ought to be unnecessary, but feedback about success or failure there is indirect.
     1995
     1996Dex\cite{arr:dex:long} is a contemporary array-centric language that follows in the Ada array tradition\cite{arr:ada:learn} of designating a type to be used for subscripting an array.
     1997While \CFA has its new @[N]@ living in the type system, this @N@ is used to represent an array's size.
     1998An indexing type would be the type of @i@, in @ar[i]@.
     1999\begin{cfa}
     2000const size_t n = ask_the_user();
     2001typedef range(0, n) till_n_t; // does not exist in CFA yet
     2002till_n_t x = 42; // static reject
     2003till_n_t y = check(42); // static accept, dynamic check
     2004void f( array(int, n) & ar, till_n_t i ) {
     2005        sout | ar[ i ];  // bound check unnecessary
     2006}
     2007\end{cfa}
     2008In today's \CFA solution, any such @i@ is @size_t@.
     2009\begin{cfa}
     2010void f2( array(int, n) & ar, size_t i ) {
     2011        sout | ar[ i ];  // bound check necessary
     2012}
     2013\end{cfa}
     2014
     2015With a constrained index type, bound checks shift from being implicit at the time of an array access, to explicit at the time of a conversion.
     2016A user then has the option to import only the index-typed subscript operator.
     2017In such a setup, no implicit bound check can occur.
     2018Bound checks occur exactly where the user places them.
     2019If the user succeeds at needing no explicit bound checks, then the user is assured that there are no bound checks.
     2020
     2021Prior \CFA work\cite{Liang24} has touched upon this topic.
     2022Therein, a user's enumeration type is adapted to serve as the domain of an array.
     2023However, this work progressed as far as inferring an array size from the enumeration type, \eg that @int happy_hours[ weekday_t ]@ should be a declaration with length seven.
     2024It did not address forbidding @happy_hours[ 42 ]@.
     2025
     2026Both enumerations, and the above hand-waived @range(0, n)@ are valid indexing types.
     2027Supporting indexing types would put the user more in control of a program's performance and failure modes.
     2028
    18852029\begin{comment}
    18862030\subsection{Range Slicing}
     
    18892033
    18902034
    1891 \subsection{Retire Pointer Arithmetic}
    18922035
    18932036
Note: See TracChangeset for help on using the changeset viewer.