Changeset 9a35b43 for doc/theses/mike_brooks_MMath/array.tex
- Timestamp:
- Apr 28, 2026, 4:50:54 AM (19 hours ago)
- Branches:
- master
- Children:
- bf8112b
- Parents:
- 3c5fdb5
- File:
-
- 1 edited
-
doc/theses/mike_brooks_MMath/array.tex (modified) (6 diffs)
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/array.tex
r3c5fdb5 r9a35b43 1084 1084 1085 1085 \section{Zero Overhead} 1086 \label{s:zero-overhead} 1086 1087 1087 1088 At runtime, the \CFA array is exactly a C array. … … 1845 1846 \label{f:FutureWork} 1846 1847 1847 % \subsection{Array Syntax} 1848 The ultimate goal is to replace all C arrays with \CFA arrays, establishing a higher level of safety in C programs. 1849 To that end, this section surveys possible next steps that could support this shift. 1850 1851 \subsection{Array Syntax} 1848 1852 1849 1853 An important goal is to recast \CFA-style @array(...)@ syntax into C-style array syntax. … … 1866 1870 \end{cfa} 1867 1871 However, there is an ambiguity for a single dimension array, where the syntax for old and new arrays are the same, @int ar[10]@. 1868 Thesolution is to use a terminating comma to denote a \CFA-style single-dimension array.1872 A solution is to use a terminating comma to denote a \CFA-style single-dimension array. 1869 1873 \begin{cfa} 1870 1874 int ar[2$\Huge\color{red},$]; // single dimension new array … … 1873 1877 The extra comma in the dimension is only mildly annoying, and acts as eye-candy differentiating old and new arrays. 1874 1878 Hence, \CFA can repurpose the comma expression in this context for a list of dimensions. 1875 The ultimate goal is to replace all C arrays with \CFA arrays, establishing a higher level of safety in C programs, and eliminating the need for the terminating comma.1876 With respect to the subscript expression, the comma expression is allowed.1877 However, a comma expression in this context is rare, and is most commonly a (silent) mistake: subscripting a matrix with @m[i, j]@ instead of @m[i][j]@ selects the @j@th row not the @i, j@ element.1879 Should the C/\CFA array distinction disappear in the future the need for the terminating comma woudl too. 1880 With respect to subscript expression, C allows a comma expression here. 1881 However, a comma expression in this context is rare, and is most commonly a (silent) mistake: subscripting a C matrix with @m[i, j]@ instead of @m[i][j]@ selects the @j@th row not the @i, j@ element. 1878 1882 It is still possible to write @m[(i, j)]@ in the new syntax to achieve the equivalent of the old @m[i, j]@. 1879 Internally, the compiler must de -sugar @[i, j, k]@ into @[i][j][k]@ to match with three calls to subscript operators.1883 Internally, the compiler must desugar @[i, j, k]@ into @[i][j][k]@ to match with three calls to subscript operators. 1880 1884 Note, there is no ambiguity for subscripting a single dimensional array, as the subscript operator selects the correct form from the array type. 1881 1885 Currently, @array@ supports the old and new subscript syntax \see{\VRef[Figure]{f:ovhd-treat-src}}, including combinations of new and old, @arr[1, 2][3]@. … … 1883 1887 1884 1888 1889 1890 \subsection{Error Messages} 1891 1892 Length was a first foray into non-type polymorphic parameters. 1893 In one respect, this work's solution is less general than \CC's non-type template parameters\footnote{ 1894 In another respect, use of dynamic values, the \CFA solution can do more. 1895 }, which can accept strings, functions and values of many more types. 1896 A @forall( [N] )@ parameter is @size_t@, only. 1897 1898 To generalize from \CFA's present solution, to \CC-equivalent type coverage, would be a significant change. 1899 The \CFA compiler's resolver, the most complex piece, would need to be upgraded to understand these non-type parameters directly. 1900 1901 Today's solution leverages a special relationship between types and @size_t@: the two-way mapping afforded by @char[n]@ and @sizeof(T)@. 1902 This mapping allows for desugaring user input into the language that the pre-existing resolver understands, as detailed in \VRef[Section]{s:1d-impl}. 1903 1904 The downside of this desugaring, as \VRef[Section]{s:ll-user-integration} admonishes, is that it does not provide a means for reporting a user-caused error, from its discovery in the resolver, to the user's attention, without including clutter from the desugaring. 1905 The \CFA team generally recognizes substantial room for improvement in even general, resolver-alone, errors. 1906 I fully acknowledge that the current state of errors, from new-array usage mistakes, makes the new-array system hard to learn. 1907 1908 1909 \subsection{Retire Pointer Arithmetic} 1910 1911 \CFA has a tremendously flexible core machinery, in which facts about C are expressed. 1912 Even the fact that there is a @?+?@ operator for a pair of @int@s is not baked into the compiler proper; it is expressed, as a \CFA source-code declaration, by an implicit pseudo-include, in the \newterm{prelude}. 1913 More controversial facts are stated there too, notably, that every pointer type has arithmetic and subscript operators. 1914 As \VRef[Section]{s:ArraysDecay} explains, this fact is a necessary component in how a \emph{C array} is indexed. 1915 But this fact is unused when dereferencing a \CFA @array@, as the subscript operator, in this case, is defined directly on @array@, and not via an intermediate pointer type. 1916 1917 I conducted a successful proof of concept for the technical elements that underpin this vision. 1918 \begin{itemize} 1919 \item 1920 Demote pointer-arithmetic operators from permanent prelude members to header-includable. 1921 \item 1922 Include them when in a C-compatibility mode, but not when in an enhanced-safety mode. 1923 \item 1924 Let a development team who maintains a compile unit that they believe they have onboarded to use only new \CFA arrays, set it to build in enhanced-safety mode. 1925 \item 1926 When a developer falls back on habit and forgetfully uses a C array in the onboarded module, the compiler's error serves as a reminder and a protector of the onboarded code quality. 1927 \end{itemize} 1928 1929 This notion should be explored beyond the toy examples it has seen to date. 1930 1931 1932 \subsection{Defer Size Tracking} 1933 1934 The current length checking and communication works well in the style of example illustrated in this thesis. 1935 The characteristic of this style is: 1936 \begin{itemize} 1937 \item 1938 Alice holds and array and knows its size. 1939 \item 1940 Bob offers a service that can apply to an array of any size. 1941 \item 1942 Alice calls Bob, passing her array; Bob (implicitly) learns the array's (correct) size. 1943 \end{itemize} 1944 This interaction is straightforward universal polymorphism, where Bob is parameterized by size. 1945 And compared with the following alternative, it is much more common. 1946 1947 But size knowledge could just as well flow the other way: 1948 \begin{itemize} 1949 \item 1950 Bob offers a service that provides access to an array of a specific size known to Bob. 1951 \item 1952 Alice is content to work on an array of any size. 1953 \item 1954 Alice calls Bob, and receives his array as a return; Alice (implicitly) learns the array's (correct) size. 1955 \end{itemize} 1956 This interaction is the dual notion of existential polymorphism, working upon size information. 1957 Work\cite{arr:futhark:tytheory} accompanying the array-centric language Futhark has detailed its theory. 1958 1959 An existential size helps cope with unnecessary repetition that occurs as program size increases and more intermediaries separate an array producer and consumer. 1960 Consider the @School@ structure of \VRef[Figure]{DAM-declaration}, and imagine passing one through a chain of intermediaries. 1961 While the producer and consumer have every reason to know the school's sizes, the intermediaries do not. 1962 Yet, they too would have to be parameterized by the numbers of courses and students, so that they could have them in scope at the point where they name the school's type. 1963 1964 The desired solution is to write the size down, somewhere in relation to the school on which producer and consumer agree, but where it need not bother the intermediaries. 1965 \begin{cfa} 1966 struct SchoolRef { 1967 size_t classes; 1968 size_t students; 1969 School(classes, students) * school; 1970 }; 1971 \end{cfa} 1972 This declaration is wishful; at the @school@ member declaration site, the names @classes@ and @students@ are not actually in scope. 1973 But if it worked, @SchoolRef@ would be doing the intermediaries a tremendous service. 1974 It is not a polymorphic type. 1975 The intermediaries no longer need size information in scope to pass one along. 1976 @SchoolRef@ stops the polymorphism of @School@ from flowing farther outward. 1977 The final consumer need only open the box, discover what's inside, and proceed. 1978 \begin{cfa} 1979 forall( [C], [S] ) static void process( School(C, S) * ); 1980 void receive( SchoolRef r ) { 1981 with( r ) { 1982 classes; students; // are in scope 1983 School( classes, students ) * s2 = school; // we can state the type 1984 process( s2 ); // we can communicate the type 1985 } 1986 } 1987 \end{cfa} 1988 1989 1990 \subsection{Guarantee Zero Overhead} 1991 1992 \VRef[Section]{s:zero-overhead} demonstrates how \CFA does well at skipping unnecessary runtime bound checks. 1993 Nevertheless, the scheme presented puts a programmer's hopes in the hands of the optimizer. 1994 The programmer writes code in which it appears that a bound check ought to be unnecessary, but feedback about success or failure there is indirect. 1995 1996 Dex\cite{arr:dex:long} is a contemporary array-centric language that follows in the Ada array tradition\cite{arr:ada:learn} of designating a type to be used for subscripting an array. 1997 While \CFA has its new @[N]@ living in the type system, this @N@ is used to represent an array's size. 1998 An indexing type would be the type of @i@, in @ar[i]@. 1999 \begin{cfa} 2000 const size_t n = ask_the_user(); 2001 typedef range(0, n) till_n_t; // does not exist in CFA yet 2002 till_n_t x = 42; // static reject 2003 till_n_t y = check(42); // static accept, dynamic check 2004 void f( array(int, n) & ar, till_n_t i ) { 2005 sout | ar[ i ]; // bound check unnecessary 2006 } 2007 \end{cfa} 2008 In today's \CFA solution, any such @i@ is @size_t@. 2009 \begin{cfa} 2010 void f2( array(int, n) & ar, size_t i ) { 2011 sout | ar[ i ]; // bound check necessary 2012 } 2013 \end{cfa} 2014 2015 With a constrained index type, bound checks shift from being implicit at the time of an array access, to explicit at the time of a conversion. 2016 A user then has the option to import only the index-typed subscript operator. 2017 In such a setup, no implicit bound check can occur. 2018 Bound checks occur exactly where the user places them. 2019 If the user succeeds at needing no explicit bound checks, then the user is assured that there are no bound checks. 2020 2021 Prior \CFA work\cite{Liang24} has touched upon this topic. 2022 Therein, a user's enumeration type is adapted to serve as the domain of an array. 2023 However, this work progressed as far as inferring an array size from the enumeration type, \eg that @int happy_hours[ weekday_t ]@ should be a declaration with length seven. 2024 It did not address forbidding @happy_hours[ 42 ]@. 2025 2026 Both enumerations, and the above hand-waived @range(0, n)@ are valid indexing types. 2027 Supporting indexing types would put the user more in control of a program's performance and failure modes. 2028 1885 2029 \begin{comment} 1886 2030 \subsection{Range Slicing} … … 1889 2033 1890 2034 1891 \subsection{Retire Pointer Arithmetic}1892 2035 1893 2036
Note:
See TracChangeset
for help on using the changeset viewer.