Changeset 187be97


Ignore:
Timestamp:
Oct 18, 2024, 2:42:59 PM (4 weeks ago)
Author:
Michael Brooks <mlbrooks@…>
Branches:
master
Children:
3a08cb1, d3942b9
Parents:
02ea981
Message:

Thesis, array, add content on C typing changes.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/array.tex

    r02ea981 r187be97  
    231231\end{figure}
    232232
     233
     234\section{Typing of C Arrays}
     235
     236Essential in giving a guarantee of accurate length is the compiler's ability
     237to reject a program that presumes to mishandle length.
     238By contrast, most discussion so far dealt with communicating length,
     239from one party who knows it, to another who is willing to work with any given length.
     240For scenarios where the concern is a mishandled length,
     241the interaction is between two parties who both claim to know (something about) it.
     242Such a scenario occurs in this pure C fragment, wich today's C compilers accept:
     243\begin{cfa}
     244        int n = @42@;
     245        float x[n];
     246        float (*xp)[@999@] = &x;
     247        (*xp)[@500@];  // in "bound"?
     248\end{cfa}
     249
     250Here, the array @x@ has length 42, while a pointer to it (@xp@) claims length 999.
     251So, while the subscript of @xp@ at position 500 is out of bound of its referent @x@,
     252the access appears in-bound of the type information available on @xp@.
     253Truly, length is being mishandled in the previous step,
     254where the type-carried length information on @x@ is not compatible with that of @xp@.
     255
     256The \CFA new-array rejects the analogous case:
     257\begin{cfa}
     258        int n = @42@;
     259        array(float, n) x;
     260        array(float, 999) * xp = x; // static rejection here
     261        (*xp)[@500@]; // runtime check vs len 999
     262\end{cfa}
     263
     264% TODO: kill the vertical whitespace around these lists
     265% nothing from https://stackoverflow.com/questions/1061112/eliminate-space-before-beginitemize is working
     266
     267The way the \CFA array is implemented,
     268the type analysis of this \CFA case reduces to a case similar to the earlier C version.
     269The \CFA compiler's compatibility analysis proceeds as:
     270\begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
     271\item
     272        Is @array(float, 999)@ type-compatible with @array(float, n)@?
     273\item
     274        Is @arrayX(float, char[999])@ type-compatible with @arrayX(float, char[n])@?
     275        \footnote{Here, \lstinline{arrayX} represents the type that results
     276                from desugaring the \lstinline{array} type
     277                into a type whose generic parameters are all types.
     278                This presentation elides the noisy fact that
     279                \lstinline{array} is actually a macro for something bigger;
     280                the reduction to \lstinline{char[-]} still proceeds as sketched.}
     281\item
     282        Is @char[999]@ type-compatible with @char[n]@?
     283\end{itemize}
     284
     285I chose to achieve the necessary rejection of the \CFA case
     286by adding a rejection of the corresponding C case.
     287
     288This decision is not backward compatible.
     289There are two complementary mitigations for this incompatibility.
     290
     291First, a simple recourse is available to a programmer who intends to proceed
     292with the statically unsound assignment.
     293This situation might arise if @n@ were known to be 999,
     294rather than 42, as in the introductory examples.
     295The programmer can add a cast, as in:
     296\begin{cfa}
     297        xp = ( float (*)[999] ) & x;
     298\end{cfa}
     299This addition causes \CFA to accept, beacause now, the programmer has accepted blame.
     300This addition is benign in plain C, because the cast is valid, just unnecessary there.
     301Moreover, the addition can even be seen as appropriate ``eye candy,''
     302marking where the unchecked length knowledge is used.
     303Therefore, a program being onboarded to \CFA can receive a simple upgrade,
     304to satisfy the \CFA rules (and arguably become clearer),
     305without giving up its validity to a plain C compiler.
     306
     307Second, the incompatibility only affects types like pointer-to-array,
     308which are are infrequently used in C.
     309The more common C idiom for aliasing an array is to use the pointer-to-first-element type,
     310which does not participate in the \CFA array's length checking.
     311\footnote{Notably, the desugaring of the \lstinline{array@} type,
     312        avoids letting any \lstinline{-[-]} type decay,
     313        in order to preserve the length information that powers runtime bound checking.}
     314Therefore, the frequency of needing to upgrade wild C code (as discussed in the first mitigation)
     315is anticipated to be low.
     316
     317Because the incompatibility represents a low cost to a \CFA onboarding effort
     318(with a plausible side benefit of linting the original code for a missing annotation),
     319I elected not to add special measures to retain the compatibility.
     320It would be possible to flag occurrences of @-[-]@ types that come from @array@ desugaring,
     321treating those with stricter \CFA rules, while treating others with classic C rules.
     322If future lessons from C project onboarding warrant it,
     323this special compatibility measure can be added.
     324
     325Having allowed that both the initial C example's check
     326\begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
     327        \item
     328                Is @float[999]@ type-compatible with @float[n]@?
     329\end{itemize}
     330and the second \CFA exmple's induced check
     331\begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
     332        \item
     333                Is @char[999]@ type-compatible with @char[n]@?
     334\end{itemize}
     335shall have the same answer, (``no''),
     336discussion turns to how I got the \CFA compiler to produce this answer.
     337In its preexisting form, it produced a (buggy) approximation of the C rules.
     338To implement the new \CFA rules, I took the syntactic recursion a step further, obtaining,
     339in both cases:
     340\begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
     341        \item
     342                Is @999@ TBD-compatible with @n@?
     343\end{itemize}
     344This compatibility question applies to a pair of expressions, where the earlier ones were to types.
     345Such an expression-compatibility question is a new addition to the \CFA compiler.
     346These questions only arise in the context of dimension expressions on (C) array types.
     347
     348TODO: ensure these compiler implementation matters are treated under \CFA compiler background:
     349type unification,
     350cost calculation,
     351GenPoly.
     352
     353The relevant technical component of the \CFA compiler is,
     354within the type resolver, the type unification procedure.
     355I added rules for continuing this unification into expressions that occur within types.
     356It is still fundamentally doing \emph{type} unification
     357because it is participating in binding type variables,
     358and not participating in binding any variables that stand in for expression fragments
     359(for there is no such sort of variable in \CFA's analysis.)
     360
     361An unfortunate fact about the \CFA compiler's preexisting implementation is that
     362type unification suffers from two forms of duplication.
     363
     364The first duplication has (many of) the unification rules stated twice.
     365As a result, my additions for dimension expressions are stated twice.
     366The extra statement of the rules occurs in the GenPoly module,
     367where concrete types like @array(int, 5)@\footnote{
     368        Again, the presentation is simplified
     369        by leaving the \lstinline{array} macro unexpanded}
     370are lowered into corresponding C types @struct __conc_array_1234@ (the suffix being a generated index).
     371In this case, the struct's definition gives fields that hardcode the argument values of @float@ and @5@.
     372The next time an @array(-,-)@ concrete instance is encountered,
     373is the previous @struct __conc_array_1234@ suitable for it?
     374Yes, for another occurrance of @array(int, 5)@;
     375no, for either @array(rational(int), 5)@ or @array(int, 42)@.
     376By the last example, this phase must ``reject''
     377the hypothesis that it should reuse the dimension-5 instance's C-lowering for a dimension-42 instance.
     378
     379The second duplication has unification (proper) being invoked at two stages of expression resolution.
     380As a result, my added rule set needs to handle more cases than the preceding discussion motivates.
     381In the program
     382\begin{cfa}
     383        void f( double );
     384        forall( T & ) void f( T & );
     385        void g( int n ) {
     386                array( float, n + 1 ) x;
     387                f(x);
     388        }
     389\end{cfa}
     390when resolving the function call, the first unification stage
     391compares the types @T@, of the parameter, with @array( float, n + 1 )@, of the argument.
     392TODO: finish.
     393
     394The actual rules for comparing two dimension expressions are conservative.
     395To answer, ``yes, consider this pair of expressions to be matching,''
     396is to imply, ``all else being equal, allow an array with length calculated by $e_1$
     397to be passed to a function expecting a length-$e_2$ array.''\footnote{
     398        TODO: Deal with directionality, that I'm doing exact-match, no ``at least as long as,'' no subtyping.
     399        Should it be an earlier scoping principle?  Feels like it should matter in more places than here.}
     400So, a ``yes'' answer must represent a guarantee that both expressions will evaluate the
     401same result, while a ``no'' can tolerate ``they might, but we're not sure,'
     402provided that practical recourses are available
     403to let programmers express their better knowledge.
     404The specific rule-set that I offer with the current release is, in fact, extremely conservative.
     405I chose to keep things simple,
     406and allow real future needs do drive adding additional complexity,
     407within the framework that I laid out.
     408
     409For starters, the original motivating example's rejection
     410is not based on knowledge that
     411the @xp@ length of (the literal) 999 is value-unequal to
     412the (obvious) runtime value of the variable @n@, which is the @x@ length.
     413Rather, the analysis assumes a variable's value can be anything,
     414and so there can be no guarantee that its value is 999.
     415So, a variable use and a literal can never match.
     416
     417Two occurrences of the same literal value are obviously a fine match.
     418For two occurrences of the same varialbe, more information is needed.
     419For example, this one is fine
     420\begin{cfa}
     421        void f( const int n ) {
     422                float x[n];
     423                float (*xp)[n] = x; // accept
     424        }
     425\end{cfa}
     426while this one is not:
     427\begin{cfa}
     428        void f() {
     429                int n = 42;
     430                float x[n];
     431                n = 999;
     432                float (*xp)[n] = x; // reject
     433        }
     434\end{cfa}
     435Furthermore, the fact that the first example sees @n@ as @const@
     436is not actually a sufficent basis.
     437In this example, @f@'s length expression's declaration is as @const@ as it can be,
     438yet its value still changes between the two invocations:
     439\begin{cfa}
     440        // compile unit 1
     441        void g();
     442        void f( const int & const nr ) {
     443                float x[nr];
     444                g();
     445                float (*xp)[nr] = x; // reject
     446        }
     447        // compile unit 2
     448        static int n = 42;
     449        void g() {
     450                n = 99;
     451        }
     452        void f( const int & );
     453        int main () {
     454                f(n);
     455                return 0;
     456        }
     457\end{cfa}
     458The issue in this last case is,
     459just because you aren't able to change something doesn't mean someone else can't.
     460
     461My rule set also respects a feature of the C tradition.
     462In spite of the several limitations of the C rules
     463accepting cases that produce different values, there are a few mismatches that C stops.
     464C is quite precise when working with two static values:
     465\begin{cfa}
     466        enum { fortytwo = 42 };
     467        float x[fortytwo];
     468        float (*xp1)[42] = &x; // accept
     469        float (*xp2)[999] = &x; // reject
     470\end{cfa}
     471My \CFA rules agree with C's on these cases.
     472
     473My rules classify expressions into three groups:
     474\begin{description}
     475\item[Statically Evaluable]
     476        Expressions for which a specific value can be calculated (conservatively)
     477        at compile-time.
     478        A preexisting \CFA compiler module defines which expressions qualify,
     479        and evaluates them.
     480        Includes literals and enumeration values.
     481\item[Dynamic but Stable]
     482        The value of a variable declared as @const@.
     483        Includes a @const@ parameter.
     484\item[Potentially Unstable]
     485        The catch-all category.  Notable examples include:
     486        any function-call result (@float x[foo()];@),
     487        the particular function-call result that is a pointer dereference (@void f(const int * n) { float x[*n]; }@), and
     488        any use of a reference-typed variable.
     489\end{description}
     490
     491My \CFA rules are:
     492\begin{itemize}
     493\item
     494        Accept a Statically Evaluable pair, if both expressions have the same value.
     495        Notably, this rule allows a literal to match with an enumeration value, based on the value.
     496\item
     497        Accept a Dynamic but Stable pair, if both expressions are written out the same, e.g. refers to same variable declaration.
     498\item
     499        Otherwise, reject.
     500        Notably, reject all pairs from the Potentially Unstable group.
     501        Notably, reject all pairs that cross groups.
     502\end{itemize}
     503
     504TODO: summarize the C rules and add the case-comparison table
     505
     506TODO: Discuss Recourse
     507
     508TODO: Discuss dimension hoisting, which addresses the challenge of extra unification for cost calculation
    233509
    234510\section{Multidimensional Arrays}
Note: See TracChangeset for help on using the changeset viewer.