Changes in / [3a08cb1:e8b5ba4]


Ignore:
File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/array.tex

    r3a08cb1 re8b5ba4  
    231231\end{figure}
    232232
    233 
    234 \section{Typing of C Arrays}
    235 
    236 Essential in giving a guarantee of accurate length is the compiler's ability
    237 to reject a program that presumes to mishandle length.
    238 By contrast, most discussion so far dealt with communicating length,
    239 from one party who knows it, to another who is willing to work with any given length.
    240 For scenarios where the concern is a mishandled length,
    241 the interaction is between two parties who both claim to know (something about) it.
    242 Such a scenario occurs in this pure C fragment, wich today's C compilers accept:
    243 \begin{cfa}
    244         int n = @42@;
    245         float x[n];
    246         float (*xp)[@999@] = &x;
    247         (*xp)[@500@];  // in "bound"?
    248 \end{cfa}
    249 
    250 Here, the array @x@ has length 42, while a pointer to it (@xp@) claims length 999.
    251 So, while the subscript of @xp@ at position 500 is out of bound of its referent @x@,
    252 the access appears in-bound of the type information available on @xp@.
    253 Truly, length is being mishandled in the previous step,
    254 where the type-carried length information on @x@ is not compatible with that of @xp@.
    255 
    256 The \CFA new-array rejects the analogous case:
    257 \begin{cfa}
    258         int n = @42@;
    259         array(float, n) x;
    260         array(float, 999) * xp = x; // static rejection here
    261         (*xp)[@500@]; // runtime check vs len 999
    262 \end{cfa}
    263 
    264 % TODO: kill the vertical whitespace around these lists
    265 % nothing from https://stackoverflow.com/questions/1061112/eliminate-space-before-beginitemize is working
    266 
    267 The way the \CFA array is implemented,
    268 the type analysis of this \CFA case reduces to a case similar to the earlier C version.
    269 The \CFA compiler's compatibility analysis proceeds as:
    270 \begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
    271 \item
    272         Is @array(float, 999)@ type-compatible with @array(float, n)@?
    273 \item
    274         Is @arrayX(float, char[999])@ type-compatible with @arrayX(float, char[n])@?
    275         \footnote{Here, \lstinline{arrayX} represents the type that results
    276                 from desugaring the \lstinline{array} type
    277                 into a type whose generic parameters are all types.
    278                 This presentation elides the noisy fact that
    279                 \lstinline{array} is actually a macro for something bigger;
    280                 the reduction to \lstinline{char[-]} still proceeds as sketched.}
    281 \item
    282         Is @char[999]@ type-compatible with @char[n]@?
    283 \end{itemize}
    284 
    285 I chose to achieve the necessary rejection of the \CFA case
    286 by adding a rejection of the corresponding C case.
    287 
    288 This decision is not backward compatible.
    289 There are two complementary mitigations for this incompatibility.
    290 
    291 First, a simple recourse is available to a programmer who intends to proceed
    292 with the statically unsound assignment.
    293 This situation might arise if @n@ were known to be 999,
    294 rather than 42, as in the introductory examples.
    295 The programmer can add a cast, as in:
    296 \begin{cfa}
    297         xp = ( float (*)[999] ) & x;
    298 \end{cfa}
    299 This addition causes \CFA to accept, beacause now, the programmer has accepted blame.
    300 This addition is benign in plain C, because the cast is valid, just unnecessary there.
    301 Moreover, the addition can even be seen as appropriate ``eye candy,''
    302 marking where the unchecked length knowledge is used.
    303 Therefore, a program being onboarded to \CFA can receive a simple upgrade,
    304 to satisfy the \CFA rules (and arguably become clearer),
    305 without giving up its validity to a plain C compiler.
    306 
    307 Second, the incompatibility only affects types like pointer-to-array,
    308 which are are infrequently used in C.
    309 The more common C idiom for aliasing an array is to use the pointer-to-first-element type,
    310 which does not participate in the \CFA array's length checking.
    311 \footnote{Notably, the desugaring of the \lstinline{array@} type,
    312         avoids letting any \lstinline{-[-]} type decay,
    313         in order to preserve the length information that powers runtime bound checking.}
    314 Therefore, the frequency of needing to upgrade wild C code (as discussed in the first mitigation)
    315 is anticipated to be low.
    316 
    317 Because the incompatibility represents a low cost to a \CFA onboarding effort
    318 (with a plausible side benefit of linting the original code for a missing annotation),
    319 I elected not to add special measures to retain the compatibility.
    320 It would be possible to flag occurrences of @-[-]@ types that come from @array@ desugaring,
    321 treating those with stricter \CFA rules, while treating others with classic C rules.
    322 If future lessons from C project onboarding warrant it,
    323 this special compatibility measure can be added.
    324 
    325 Having allowed that both the initial C example's check
    326 \begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
    327         \item
    328                 Is @float[999]@ type-compatible with @float[n]@?
    329 \end{itemize}
    330 and the second \CFA exmple's induced check
    331 \begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
    332         \item
    333                 Is @char[999]@ type-compatible with @char[n]@?
    334 \end{itemize}
    335 shall have the same answer, (``no''),
    336 discussion turns to how I got the \CFA compiler to produce this answer.
    337 In its preexisting form, it produced a (buggy) approximation of the C rules.
    338 To implement the new \CFA rules, I took the syntactic recursion a step further, obtaining,
    339 in both cases:
    340 \begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em]
    341         \item
    342                 Is @999@ TBD-compatible with @n@?
    343 \end{itemize}
    344 This compatibility question applies to a pair of expressions, where the earlier ones were to types.
    345 Such an expression-compatibility question is a new addition to the \CFA compiler.
    346 These questions only arise in the context of dimension expressions on (C) array types.
    347 
    348 TODO: ensure these compiler implementation matters are treated under \CFA compiler background:
    349 type unification,
    350 cost calculation,
    351 GenPoly.
    352 
    353 The relevant technical component of the \CFA compiler is,
    354 within the type resolver, the type unification procedure.
    355 I added rules for continuing this unification into expressions that occur within types.
    356 It is still fundamentally doing \emph{type} unification
    357 because it is participating in binding type variables,
    358 and not participating in binding any variables that stand in for expression fragments
    359 (for there is no such sort of variable in \CFA's analysis.)
    360 
    361 An unfortunate fact about the \CFA compiler's preexisting implementation is that
    362 type unification suffers from two forms of duplication.
    363 
    364 The first duplication has (many of) the unification rules stated twice.
    365 As a result, my additions for dimension expressions are stated twice.
    366 The extra statement of the rules occurs in the GenPoly module,
    367 where concrete types like @array(int, 5)@\footnote{
    368         Again, the presentation is simplified
    369         by leaving the \lstinline{array} macro unexpanded}
    370 are lowered into corresponding C types @struct __conc_array_1234@ (the suffix being a generated index).
    371 In this case, the struct's definition gives fields that hardcode the argument values of @float@ and @5@.
    372 The next time an @array(-,-)@ concrete instance is encountered,
    373 is the previous @struct __conc_array_1234@ suitable for it?
    374 Yes, for another occurrance of @array(int, 5)@;
    375 no, for either @array(rational(int), 5)@ or @array(int, 42)@.
    376 By the last example, this phase must ``reject''
    377 the hypothesis that it should reuse the dimension-5 instance's C-lowering for a dimension-42 instance.
    378 
    379 The second duplication has unification (proper) being invoked at two stages of expression resolution.
    380 As a result, my added rule set needs to handle more cases than the preceding discussion motivates.
    381 In the program
    382 \begin{cfa}
    383         void f( double );
    384         forall( T & ) void f( T & );
    385         void g( int n ) {
    386                 array( float, n + 1 ) x;
    387                 f(x);
    388         }
    389 \end{cfa}
    390 when resolving the function call, the first unification stage
    391 compares the types @T@, of the parameter, with @array( float, n + 1 )@, of the argument.
    392 TODO: finish.
    393 
    394 The actual rules for comparing two dimension expressions are conservative.
    395 To answer, ``yes, consider this pair of expressions to be matching,''
    396 is to imply, ``all else being equal, allow an array with length calculated by $e_1$
    397 to be passed to a function expecting a length-$e_2$ array.''\footnote{
    398         TODO: Deal with directionality, that I'm doing exact-match, no ``at least as long as,'' no subtyping.
    399         Should it be an earlier scoping principle?  Feels like it should matter in more places than here.}
    400 So, a ``yes'' answer must represent a guarantee that both expressions will evaluate the
    401 same result, while a ``no'' can tolerate ``they might, but we're not sure,'
    402 provided that practical recourses are available
    403 to let programmers express their better knowledge.
    404 The specific rule-set that I offer with the current release is, in fact, extremely conservative.
    405 I chose to keep things simple,
    406 and allow real future needs do drive adding additional complexity,
    407 within the framework that I laid out.
    408 
    409 For starters, the original motivating example's rejection
    410 is not based on knowledge that
    411 the @xp@ length of (the literal) 999 is value-unequal to
    412 the (obvious) runtime value of the variable @n@, which is the @x@ length.
    413 Rather, the analysis assumes a variable's value can be anything,
    414 and so there can be no guarantee that its value is 999.
    415 So, a variable use and a literal can never match.
    416 
    417 Two occurrences of the same literal value are obviously a fine match.
    418 For two occurrences of the same varialbe, more information is needed.
    419 For example, this one is fine
    420 \begin{cfa}
    421         void f( const int n ) {
    422                 float x[n];
    423                 float (*xp)[n] = x; // accept
    424         }
    425 \end{cfa}
    426 while this one is not:
    427 \begin{cfa}
    428         void f() {
    429                 int n = 42;
    430                 float x[n];
    431                 n = 999;
    432                 float (*xp)[n] = x; // reject
    433         }
    434 \end{cfa}
    435 Furthermore, the fact that the first example sees @n@ as @const@
    436 is not actually a sufficent basis.
    437 In this example, @f@'s length expression's declaration is as @const@ as it can be,
    438 yet its value still changes between the two invocations:
    439 \begin{cfa}
    440         // compile unit 1
    441         void g();
    442         void f( const int & const nr ) {
    443                 float x[nr];
    444                 g();
    445                 float (*xp)[nr] = x; // reject
    446         }
    447         // compile unit 2
    448         static int n = 42;
    449         void g() {
    450                 n = 99;
    451         }
    452         void f( const int & );
    453         int main () {
    454                 f(n);
    455                 return 0;
    456         }
    457 \end{cfa}
    458 The issue in this last case is,
    459 just because you aren't able to change something doesn't mean someone else can't.
    460 
    461 My rule set also respects a feature of the C tradition.
    462 In spite of the several limitations of the C rules
    463 accepting cases that produce different values, there are a few mismatches that C stops.
    464 C is quite precise when working with two static values:
    465 \begin{cfa}
    466         enum { fortytwo = 42 };
    467         float x[fortytwo];
    468         float (*xp1)[42] = &x; // accept
    469         float (*xp2)[999] = &x; // reject
    470 \end{cfa}
    471 My \CFA rules agree with C's on these cases.
    472 
    473 My rules classify expressions into three groups:
    474 \begin{description}
    475 \item[Statically Evaluable]
    476         Expressions for which a specific value can be calculated (conservatively)
    477         at compile-time.
    478         A preexisting \CFA compiler module defines which expressions qualify,
    479         and evaluates them.
    480         Includes literals and enumeration values.
    481 \item[Dynamic but Stable]
    482         The value of a variable declared as @const@.
    483         Includes a @const@ parameter.
    484 \item[Potentially Unstable]
    485         The catch-all category.  Notable examples include:
    486         any function-call result (@float x[foo()];@),
    487         the particular function-call result that is a pointer dereference (@void f(const int * n) { float x[*n]; }@), and
    488         any use of a reference-typed variable.
    489 \end{description}
    490 
    491 My \CFA rules are:
    492 \begin{itemize}
    493 \item
    494         Accept a Statically Evaluable pair, if both expressions have the same value.
    495         Notably, this rule allows a literal to match with an enumeration value, based on the value.
    496 \item
    497         Accept a Dynamic but Stable pair, if both expressions are written out the same, e.g. refers to same variable declaration.
    498 \item
    499         Otherwise, reject.
    500         Notably, reject all pairs from the Potentially Unstable group.
    501         Notably, reject all pairs that cross groups.
    502 \end{itemize}
    503 
    504 TODO: summarize the C rules and add the case-comparison table
    505 
    506 TODO: Discuss Recourse
    507 
    508 TODO: Discuss dimension hoisting, which addresses the challenge of extra unification for cost calculation
    509233
    510234\section{Multidimensional Arrays}
Note: See TracChangeset for help on using the changeset viewer.