Context Navigation

Reverse Diff

Changes in / [3a08cb1:e8b5ba4]

File:

: 1 edited

doc/theses/mike_brooks_MMath/array.tex (modified) (1 diff)

Legend:

: Unmodified
: Added
: Removed

doc/theses/mike_brooks_MMath/array.tex

-              r3a08cb1
+              re8b5ba4
 \end{figure}
-\section{Typing of C Arrays}
-Essential in giving a guarantee of accurate length is the compiler's ability
-to reject a program that presumes to mishandle length.
-By contrast, most discussion so far dealt with communicating length,
-from one party who knows it, to another who is willing to work with any given length.
-For scenarios where the concern is a mishandled length,
-the interaction is between two parties who both claim to know (something about) it.
-Such a scenario occurs in this pure C fragment, wich today's C compilers accept:
-\begin{cfa}
-        int n = @42@;
-        float x[n];
-        float (*xp)[@999@] = &x;
-        (*xp)[@500@];  // in "bound"?
-\end{cfa}
-Here, the array @x@ has length 42, while a pointer to it (@xp@) claims length 999.
-So, while the subscript of @xp@ at position 500 is out of bound of its referent @x@,
-the access appears in-bound of the type information available on @xp@.
-Truly, length is being mishandled in the previous step,
-where the type-carried length information on @x@ is not compatible with that of @xp@.
-The \CFA new-array rejects the analogous case:
-\begin{cfa}
-        int n = @42@;
-        array(float, n) x;
-        array(float, 999) * xp = x; // static rejection here
-        (*xp)[@500@]; // runtime check vs len 999
-\end{cfa}
-% TODO: kill the vertical whitespace around these lists
-% nothing from https://stackoverflow.com/questions/1061112/eliminate-space-before-beginitemize is working
-The way the \CFA array is implemented,
-the type analysis of this \CFA case reduces to a case similar to the earlier C version.
-The \CFA compiler's compatibility analysis proceeds as:
-\begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
-\item
-        Is @array(float, 999)@ type-compatible with @array(float, n)@?
-\item
-        Is @arrayX(float, char[999])@ type-compatible with @arrayX(float, char[n])@?
-        \footnote{Here, \lstinline{arrayX} represents the type that results
-                from desugaring the \lstinline{array} type
-                into a type whose generic parameters are all types.
-                This presentation elides the noisy fact that
-                \lstinline{array} is actually a macro for something bigger;
-                the reduction to \lstinline{char[-]} still proceeds as sketched.}
-\item
-        Is @char[999]@ type-compatible with @char[n]@?
-\end{itemize}
-I chose to achieve the necessary rejection of the \CFA case
-by adding a rejection of the corresponding C case.
-This decision is not backward compatible.
-There are two complementary mitigations for this incompatibility.
-First, a simple recourse is available to a programmer who intends to proceed
-with the statically unsound assignment.
-This situation might arise if @n@ were known to be 999,
-rather than 42, as in the introductory examples.
-The programmer can add a cast, as in:
-\begin{cfa}
-        xp = ( float (*)[999] ) & x;
-\end{cfa}
-This addition causes \CFA to accept, beacause now, the programmer has accepted blame.
-This addition is benign in plain C, because the cast is valid, just unnecessary there.
-Moreover, the addition can even be seen as appropriate ``eye candy,''
-marking where the unchecked length knowledge is used.
-Therefore, a program being onboarded to \CFA can receive a simple upgrade,
-to satisfy the \CFA rules (and arguably become clearer),
-without giving up its validity to a plain C compiler.
-Second, the incompatibility only affects types like pointer-to-array,
-which are are infrequently used in C.
-The more common C idiom for aliasing an array is to use the pointer-to-first-element type,
-which does not participate in the \CFA array's length checking.
-\footnote{Notably, the desugaring of the \lstinline{array@} type,
-        avoids letting any \lstinline{-[-]} type decay,
-        in order to preserve the length information that powers runtime bound checking.}
-Therefore, the frequency of needing to upgrade wild C code (as discussed in the first mitigation)
-is anticipated to be low.
-Because the incompatibility represents a low cost to a \CFA onboarding effort
-(with a plausible side benefit of linting the original code for a missing annotation),
-I elected not to add special measures to retain the compatibility.
-It would be possible to flag occurrences of @-[-]@ types that come from @array@ desugaring,
-treating those with stricter \CFA rules, while treating others with classic C rules.
-If future lessons from C project onboarding warrant it,
-this special compatibility measure can be added.
-Having allowed that both the initial C example's check
-\begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
-        \item
-                Is @float[999]@ type-compatible with @float[n]@?
-\end{itemize}
-and the second \CFA exmple's induced check
-\begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
-        \item
-                Is @char[999]@ type-compatible with @char[n]@?
-\end{itemize}
-shall have the same answer, (``no''),
-discussion turns to how I got the \CFA compiler to produce this answer.
-In its preexisting form, it produced a (buggy) approximation of the C rules.
-To implement the new \CFA rules, I took the syntactic recursion a step further, obtaining,
-in both cases:
-\begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
-        \item
-                Is @999@ TBD-compatible with @n@?
-\end{itemize}
-This compatibility question applies to a pair of expressions, where the earlier ones were to types.
-Such an expression-compatibility question is a new addition to the \CFA compiler.
-These questions only arise in the context of dimension expressions on (C) array types.
-TODO: ensure these compiler implementation matters are treated under \CFA compiler background:
-type unification,
-cost calculation,
-GenPoly.
-The relevant technical component of the \CFA compiler is,
-within the type resolver, the type unification procedure.
-I added rules for continuing this unification into expressions that occur within types.
-It is still fundamentally doing \emph{type} unification
-because it is participating in binding type variables,
-and not participating in binding any variables that stand in for expression fragments
-(for there is no such sort of variable in \CFA's analysis.)
-An unfortunate fact about the \CFA compiler's preexisting implementation is that
-type unification suffers from two forms of duplication.
-The first duplication has (many of) the unification rules stated twice.
-As a result, my additions for dimension expressions are stated twice.
-The extra statement of the rules occurs in the GenPoly module,
-where concrete types like @array(int, 5)@\footnote{
-        Again, the presentation is simplified
-        by leaving the \lstinline{array} macro unexpanded}
-are lowered into corresponding C types @struct __conc_array_1234@ (the suffix being a generated index).
-In this case, the struct's definition gives fields that hardcode the argument values of @float@ and @5@.
-The next time an @array(-,-)@ concrete instance is encountered,
-is the previous @struct __conc_array_1234@ suitable for it?
-Yes, for another occurrance of @array(int, 5)@;
-no, for either @array(rational(int), 5)@ or @array(int, 42)@.
-By the last example, this phase must ``reject''
-the hypothesis that it should reuse the dimension-5 instance's C-lowering for a dimension-42 instance.
-The second duplication has unification (proper) being invoked at two stages of expression resolution.
-As a result, my added rule set needs to handle more cases than the preceding discussion motivates.
-In the program
-\begin{cfa}
-        void f( double );
-        forall( T & ) void f( T & );
-        void g( int n ) {
-                array( float, n + 1 ) x;
-                f(x);
+        }
-\end{cfa}
-when resolving the function call, the first unification stage
-compares the types @T@, of the parameter, with @array( float, n + 1 )@, of the argument.
-TODO: finish.
-The actual rules for comparing two dimension expressions are conservative.
-To answer, ``yes, consider this pair of expressions to be matching,''
-is to imply, ``all else being equal, allow an array with length calculated by $e_1$
-to be passed to a function expecting a length-$e_2$ array.''\footnote{
-        TODO: Deal with directionality, that I'm doing exact-match, no ``at least as long as,'' no subtyping.
-        Should it be an earlier scoping principle?  Feels like it should matter in more places than here.}
-So, a ``yes'' answer must represent a guarantee that both expressions will evaluate the
-same result, while a ``no'' can tolerate ``they might, but we're not sure,'
-provided that practical recourses are available
-to let programmers express their better knowledge.
-The specific rule-set that I offer with the current release is, in fact, extremely conservative.
-I chose to keep things simple,
-and allow real future needs do drive adding additional complexity,
-within the framework that I laid out.
-For starters, the original motivating example's rejection
-is not based on knowledge that
-the @xp@ length of (the literal) 999 is value-unequal to
-the (obvious) runtime value of the variable @n@, which is the @x@ length.
-Rather, the analysis assumes a variable's value can be anything,
-and so there can be no guarantee that its value is 999.
-So, a variable use and a literal can never match.
-Two occurrences of the same literal value are obviously a fine match.
-For two occurrences of the same varialbe, more information is needed.
-For example, this one is fine
-\begin{cfa}
-        void f( const int n ) {
-                float x[n];
-                float (*xp)[n] = x; // accept
+        }
-\end{cfa}
-while this one is not:
-\begin{cfa}
-        void f() {
-                int n = 42;
-                float x[n];
-                n = 999;
-                float (*xp)[n] = x; // reject
+        }
-\end{cfa}
-Furthermore, the fact that the first example sees @n@ as @const@
-is not actually a sufficent basis.
-In this example, @f@'s length expression's declaration is as @const@ as it can be,
-yet its value still changes between the two invocations:
-\begin{cfa}
-        // compile unit 1
-        void g();
-        void f( const int & const nr ) {
-                float x[nr];
-                g();
-                float (*xp)[nr] = x; // reject
+        }
-        // compile unit 2
-        static int n = 42;
-        void g() {
-                n = 99;
+        }
-        void f( const int & );
-        int main () {
-                f(n);
-                return 0;
+        }
-\end{cfa}
-The issue in this last case is,
-just because you aren't able to change something doesn't mean someone else can't.
-My rule set also respects a feature of the C tradition.
-In spite of the several limitations of the C rules
-accepting cases that produce different values, there are a few mismatches that C stops.
-C is quite precise when working with two static values:
-\begin{cfa}
-        enum { fortytwo = 42 };
-        float x[fortytwo];
-        float (*xp1)[42] = &x; // accept
-        float (*xp2)[999] = &x; // reject
-\end{cfa}
-My \CFA rules agree with C's on these cases.
-My rules classify expressions into three groups:
-\begin{description}
-\item[Statically Evaluable]
-        Expressions for which a specific value can be calculated (conservatively)
-        at compile-time.
-        A preexisting \CFA compiler module defines which expressions qualify,
-        and evaluates them.
-        Includes literals and enumeration values.
-\item[Dynamic but Stable]
-        The value of a variable declared as @const@.
-        Includes a @const@ parameter.
-\item[Potentially Unstable]
-        The catch-all category.  Notable examples include:
-        any function-call result (@float x[foo()];@),
-        the particular function-call result that is a pointer dereference (@void f(const int * n) { float x[*n]; }@), and
-        any use of a reference-typed variable.
-\end{description}
-My \CFA rules are:
-\begin{itemize}
-\item
-        Accept a Statically Evaluable pair, if both expressions have the same value.
-        Notably, this rule allows a literal to match with an enumeration value, based on the value.
-\item
-        Accept a Dynamic but Stable pair, if both expressions are written out the same, e.g. refers to same variable declaration.
-\item
-        Otherwise, reject.
-        Notably, reject all pairs from the Potentially Unstable group.
-        Notably, reject all pairs that cross groups.
-\end{itemize}
-TODO: summarize the C rules and add the case-comparison table
-TODO: Discuss Recourse
-TODO: Discuss dimension hoisting, which addresses the challenge of extra unification for cost calculation
 \section{Multidimensional Arrays}

Note: See TracChangeset for help on using the changeset viewer.

Context Navigation

Changes in / [3a08cb1:e8b5ba4]

Legend:

doc/theses/mike_brooks_MMath/array.tex

Download in other formats: