Context Navigation

-                      r02ea981
+                      r187be97
 \end{figure}
+\section{Typing of C Arrays}
+Essential in giving a guarantee of accurate length is the compiler's ability
+to reject a program that presumes to mishandle length.
+By contrast, most discussion so far dealt with communicating length,
+from one party who knows it, to another who is willing to work with any given length.
+For scenarios where the concern is a mishandled length,
+the interaction is between two parties who both claim to know (something about) it.
+Such a scenario occurs in this pure C fragment, wich today's C compilers accept:
+\begin{cfa}
+        int n = @42@;
+        float x[n];
+        float (*xp)[@999@] = &x;
+        (*xp)[@500@];  // in "bound"?
+\end{cfa}
+Here, the array @x@ has length 42, while a pointer to it (@xp@) claims length 999.
+So, while the subscript of @xp@ at position 500 is out of bound of its referent @x@,
+the access appears in-bound of the type information available on @xp@.
+Truly, length is being mishandled in the previous step,
+where the type-carried length information on @x@ is not compatible with that of @xp@.
+The \CFA new-array rejects the analogous case:
+\begin{cfa}
+        int n = @42@;
+        array(float, n) x;
+        array(float, 999) * xp = x; // static rejection here
+        (*xp)[@500@]; // runtime check vs len 999
+\end{cfa}
+% TODO: kill the vertical whitespace around these lists
+% nothing from https://stackoverflow.com/questions/1061112/eliminate-space-before-beginitemize is working
+The way the \CFA array is implemented,
+the type analysis of this \CFA case reduces to a case similar to the earlier C version.
+The \CFA compiler's compatibility analysis proceeds as:
+\begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
+\item
+        Is @array(float, 999)@ type-compatible with @array(float, n)@?
+\item
+        Is @arrayX(float, char[999])@ type-compatible with @arrayX(float, char[n])@?
+        \footnote{Here, \lstinline{arrayX} represents the type that results
+                from desugaring the \lstinline{array} type
+                into a type whose generic parameters are all types.
+                This presentation elides the noisy fact that
+                \lstinline{array} is actually a macro for something bigger;
+                the reduction to \lstinline{char[-]} still proceeds as sketched.}
+\item
+        Is @char[999]@ type-compatible with @char[n]@?
+\end{itemize}
+I chose to achieve the necessary rejection of the \CFA case
+by adding a rejection of the corresponding C case.
+This decision is not backward compatible.
+There are two complementary mitigations for this incompatibility.
+First, a simple recourse is available to a programmer who intends to proceed
+with the statically unsound assignment.
+This situation might arise if @n@ were known to be 999,
+rather than 42, as in the introductory examples.
+The programmer can add a cast, as in:
+\begin{cfa}
+        xp = ( float (*)[999] ) & x;
+\end{cfa}
+This addition causes \CFA to accept, beacause now, the programmer has accepted blame.
+This addition is benign in plain C, because the cast is valid, just unnecessary there.
+Moreover, the addition can even be seen as appropriate ``eye candy,''
+marking where the unchecked length knowledge is used.
+Therefore, a program being onboarded to \CFA can receive a simple upgrade,
+to satisfy the \CFA rules (and arguably become clearer),
+without giving up its validity to a plain C compiler.
+Second, the incompatibility only affects types like pointer-to-array,
+which are are infrequently used in C.
+The more common C idiom for aliasing an array is to use the pointer-to-first-element type,
+which does not participate in the \CFA array's length checking.
+\footnote{Notably, the desugaring of the \lstinline{array@} type,
+        avoids letting any \lstinline{-[-]} type decay,
+        in order to preserve the length information that powers runtime bound checking.}
+Therefore, the frequency of needing to upgrade wild C code (as discussed in the first mitigation)
+is anticipated to be low.
+Because the incompatibility represents a low cost to a \CFA onboarding effort
+(with a plausible side benefit of linting the original code for a missing annotation),
+I elected not to add special measures to retain the compatibility.
+It would be possible to flag occurrences of @-[-]@ types that come from @array@ desugaring,
+treating those with stricter \CFA rules, while treating others with classic C rules.
+If future lessons from C project onboarding warrant it,
+this special compatibility measure can be added.
+Having allowed that both the initial C example's check
+\begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
+        \item
+                Is @float[999]@ type-compatible with @float[n]@?
+\end{itemize}
+and the second \CFA exmple's induced check
+\begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
+        \item
+                Is @char[999]@ type-compatible with @char[n]@?
+\end{itemize}
+shall have the same answer, (``no''),
+discussion turns to how I got the \CFA compiler to produce this answer.
+In its preexisting form, it produced a (buggy) approximation of the C rules.
+To implement the new \CFA rules, I took the syntactic recursion a step further, obtaining,
+in both cases:
+\begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
+        \item
+                Is @999@ TBD-compatible with @n@?
+\end{itemize}
+This compatibility question applies to a pair of expressions, where the earlier ones were to types.
+Such an expression-compatibility question is a new addition to the \CFA compiler.
+These questions only arise in the context of dimension expressions on (C) array types.
+TODO: ensure these compiler implementation matters are treated under \CFA compiler background:
+type unification,
+cost calculation,
+GenPoly.
+The relevant technical component of the \CFA compiler is,
+within the type resolver, the type unification procedure.
+I added rules for continuing this unification into expressions that occur within types.
+It is still fundamentally doing \emph{type} unification
+because it is participating in binding type variables,
+and not participating in binding any variables that stand in for expression fragments
+(for there is no such sort of variable in \CFA's analysis.)
+An unfortunate fact about the \CFA compiler's preexisting implementation is that
+type unification suffers from two forms of duplication.
+The first duplication has (many of) the unification rules stated twice.
+As a result, my additions for dimension expressions are stated twice.
+The extra statement of the rules occurs in the GenPoly module,
+where concrete types like @array(int, 5)@\footnote{
+        Again, the presentation is simplified
+        by leaving the \lstinline{array} macro unexpanded}
+are lowered into corresponding C types @struct __conc_array_1234@ (the suffix being a generated index).
+In this case, the struct's definition gives fields that hardcode the argument values of @float@ and @5@.
+The next time an @array(-,-)@ concrete instance is encountered,
+is the previous @struct __conc_array_1234@ suitable for it?
+Yes, for another occurrance of @array(int, 5)@;
+no, for either @array(rational(int), 5)@ or @array(int, 42)@.
+By the last example, this phase must ``reject''
+the hypothesis that it should reuse the dimension-5 instance's C-lowering for a dimension-42 instance.
+The second duplication has unification (proper) being invoked at two stages of expression resolution.
+As a result, my added rule set needs to handle more cases than the preceding discussion motivates.
+In the program
+\begin{cfa}
+        void f( double );
+        forall( T & ) void f( T & );
+        void g( int n ) {
+                array( float, n + 1 ) x;
+                f(x);
+        }
+\end{cfa}
+when resolving the function call, the first unification stage
+compares the types @T@, of the parameter, with @array( float, n + 1 )@, of the argument.
+TODO: finish.
+The actual rules for comparing two dimension expressions are conservative.
+To answer, ``yes, consider this pair of expressions to be matching,''
+is to imply, ``all else being equal, allow an array with length calculated by $e_1$
+to be passed to a function expecting a length-$e_2$ array.''\footnote{
+        TODO: Deal with directionality, that I'm doing exact-match, no ``at least as long as,'' no subtyping.
+        Should it be an earlier scoping principle?  Feels like it should matter in more places than here.}
+So, a ``yes'' answer must represent a guarantee that both expressions will evaluate the
+same result, while a ``no'' can tolerate ``they might, but we're not sure,'
+provided that practical recourses are available
+to let programmers express their better knowledge.
+The specific rule-set that I offer with the current release is, in fact, extremely conservative.
+I chose to keep things simple,
+and allow real future needs do drive adding additional complexity,
+within the framework that I laid out.
+For starters, the original motivating example's rejection
+is not based on knowledge that
+the @xp@ length of (the literal) 999 is value-unequal to
+the (obvious) runtime value of the variable @n@, which is the @x@ length.
+Rather, the analysis assumes a variable's value can be anything,
+and so there can be no guarantee that its value is 999.
+So, a variable use and a literal can never match.
+Two occurrences of the same literal value are obviously a fine match.
+For two occurrences of the same varialbe, more information is needed.
+For example, this one is fine
+\begin{cfa}
+        void f( const int n ) {
+                float x[n];
+                float (*xp)[n] = x; // accept
+        }
+\end{cfa}
+while this one is not:
+\begin{cfa}
+        void f() {
+                int n = 42;
+                float x[n];
+                n = 999;
+                float (*xp)[n] = x; // reject
+        }
+\end{cfa}
+Furthermore, the fact that the first example sees @n@ as @const@
+is not actually a sufficent basis.
+In this example, @f@'s length expression's declaration is as @const@ as it can be,
+yet its value still changes between the two invocations:
+\begin{cfa}
+        // compile unit 1
+        void g();
+        void f( const int & const nr ) {
+                float x[nr];
+                g();
+                float (*xp)[nr] = x; // reject
+        }
+        // compile unit 2
+        static int n = 42;
+        void g() {
+                n = 99;
+        }
+        void f( const int & );
+        int main () {
+                f(n);
+                return 0;
+        }
+\end{cfa}
+The issue in this last case is,
+just because you aren't able to change something doesn't mean someone else can't.
+My rule set also respects a feature of the C tradition.
+In spite of the several limitations of the C rules
+accepting cases that produce different values, there are a few mismatches that C stops.
+C is quite precise when working with two static values:
+\begin{cfa}
+        enum { fortytwo = 42 };
+        float x[fortytwo];
+        float (*xp1)[42] = &x; // accept
+        float (*xp2)[999] = &x; // reject
+\end{cfa}
+My \CFA rules agree with C's on these cases.
+My rules classify expressions into three groups:
+\begin{description}
+\item[Statically Evaluable]
+        Expressions for which a specific value can be calculated (conservatively)
+        at compile-time.
+        A preexisting \CFA compiler module defines which expressions qualify,
+        and evaluates them.
+        Includes literals and enumeration values.
+\item[Dynamic but Stable]
+        The value of a variable declared as @const@.
+        Includes a @const@ parameter.
+\item[Potentially Unstable]
+        The catch-all category.  Notable examples include:
+        any function-call result (@float x[foo()];@),
+        the particular function-call result that is a pointer dereference (@void f(const int * n) { float x[*n]; }@), and
+        any use of a reference-typed variable.
+\end{description}
+My \CFA rules are:
+\begin{itemize}
+\item
+        Accept a Statically Evaluable pair, if both expressions have the same value.
+        Notably, this rule allows a literal to match with an enumeration value, based on the value.
+\item
+        Accept a Dynamic but Stable pair, if both expressions are written out the same, e.g. refers to same variable declaration.
+\item
+        Otherwise, reject.
+        Notably, reject all pairs from the Potentially Unstable group.
+        Notably, reject all pairs that cross groups.
+\end{itemize}
+TODO: summarize the C rules and add the case-comparison table
+TODO: Discuss Recourse
+TODO: Discuss dimension hoisting, which addresses the challenge of extra unification for cost calculation
 \section{Multidimensional Arrays}

Note: See TracChangeset for help on using the changeset viewer.

Context Navigation

Changeset 187be97

Legend:

doc/theses/mike_brooks_MMath/array.tex

Download in other formats: