Changeset 187be97
- Timestamp:
- Oct 18, 2024, 2:42:59 PM (4 weeks ago)
- Branches:
- master
- Children:
- 3a08cb1, d3942b9
- Parents:
- 02ea981
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/array.tex
r02ea981 r187be97 231 231 \end{figure} 232 232 233 234 \section{Typing of C Arrays} 235 236 Essential in giving a guarantee of accurate length is the compiler's ability 237 to reject a program that presumes to mishandle length. 238 By contrast, most discussion so far dealt with communicating length, 239 from one party who knows it, to another who is willing to work with any given length. 240 For scenarios where the concern is a mishandled length, 241 the interaction is between two parties who both claim to know (something about) it. 242 Such a scenario occurs in this pure C fragment, wich today's C compilers accept: 243 \begin{cfa} 244 int n = @42@; 245 float x[n]; 246 float (*xp)[@999@] = &x; 247 (*xp)[@500@]; // in "bound"? 248 \end{cfa} 249 250 Here, the array @x@ has length 42, while a pointer to it (@xp@) claims length 999. 251 So, while the subscript of @xp@ at position 500 is out of bound of its referent @x@, 252 the access appears in-bound of the type information available on @xp@. 253 Truly, length is being mishandled in the previous step, 254 where the type-carried length information on @x@ is not compatible with that of @xp@. 255 256 The \CFA new-array rejects the analogous case: 257 \begin{cfa} 258 int n = @42@; 259 array(float, n) x; 260 array(float, 999) * xp = x; // static rejection here 261 (*xp)[@500@]; // runtime check vs len 999 262 \end{cfa} 263 264 % TODO: kill the vertical whitespace around these lists 265 % nothing from https://stackoverflow.com/questions/1061112/eliminate-space-before-beginitemize is working 266 267 The way the \CFA array is implemented, 268 the type analysis of this \CFA case reduces to a case similar to the earlier C version. 269 The \CFA compiler's compatibility analysis proceeds as: 270 \begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em] 271 \item 272 Is @array(float, 999)@ type-compatible with @array(float, n)@? 273 \item 274 Is @arrayX(float, char[999])@ type-compatible with @arrayX(float, char[n])@? 275 \footnote{Here, \lstinline{arrayX} represents the type that results 276 from desugaring the \lstinline{array} type 277 into a type whose generic parameters are all types. 278 This presentation elides the noisy fact that 279 \lstinline{array} is actually a macro for something bigger; 280 the reduction to \lstinline{char[-]} still proceeds as sketched.} 281 \item 282 Is @char[999]@ type-compatible with @char[n]@? 283 \end{itemize} 284 285 I chose to achieve the necessary rejection of the \CFA case 286 by adding a rejection of the corresponding C case. 287 288 This decision is not backward compatible. 289 There are two complementary mitigations for this incompatibility. 290 291 First, a simple recourse is available to a programmer who intends to proceed 292 with the statically unsound assignment. 293 This situation might arise if @n@ were known to be 999, 294 rather than 42, as in the introductory examples. 295 The programmer can add a cast, as in: 296 \begin{cfa} 297 xp = ( float (*)[999] ) & x; 298 \end{cfa} 299 This addition causes \CFA to accept, beacause now, the programmer has accepted blame. 300 This addition is benign in plain C, because the cast is valid, just unnecessary there. 301 Moreover, the addition can even be seen as appropriate ``eye candy,'' 302 marking where the unchecked length knowledge is used. 303 Therefore, a program being onboarded to \CFA can receive a simple upgrade, 304 to satisfy the \CFA rules (and arguably become clearer), 305 without giving up its validity to a plain C compiler. 306 307 Second, the incompatibility only affects types like pointer-to-array, 308 which are are infrequently used in C. 309 The more common C idiom for aliasing an array is to use the pointer-to-first-element type, 310 which does not participate in the \CFA array's length checking. 311 \footnote{Notably, the desugaring of the \lstinline{array@} type, 312 avoids letting any \lstinline{-[-]} type decay, 313 in order to preserve the length information that powers runtime bound checking.} 314 Therefore, the frequency of needing to upgrade wild C code (as discussed in the first mitigation) 315 is anticipated to be low. 316 317 Because the incompatibility represents a low cost to a \CFA onboarding effort 318 (with a plausible side benefit of linting the original code for a missing annotation), 319 I elected not to add special measures to retain the compatibility. 320 It would be possible to flag occurrences of @-[-]@ types that come from @array@ desugaring, 321 treating those with stricter \CFA rules, while treating others with classic C rules. 322 If future lessons from C project onboarding warrant it, 323 this special compatibility measure can be added. 324 325 Having allowed that both the initial C example's check 326 \begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em] 327 \item 328 Is @float[999]@ type-compatible with @float[n]@? 329 \end{itemize} 330 and the second \CFA exmple's induced check 331 \begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em] 332 \item 333 Is @char[999]@ type-compatible with @char[n]@? 334 \end{itemize} 335 shall have the same answer, (``no''), 336 discussion turns to how I got the \CFA compiler to produce this answer. 337 In its preexisting form, it produced a (buggy) approximation of the C rules. 338 To implement the new \CFA rules, I took the syntactic recursion a step further, obtaining, 339 in both cases: 340 \begin{itemize}[noitemsep,partopsep=-\parskip,parsep=0pt,leftmargin=4em] 341 \item 342 Is @999@ TBD-compatible with @n@? 343 \end{itemize} 344 This compatibility question applies to a pair of expressions, where the earlier ones were to types. 345 Such an expression-compatibility question is a new addition to the \CFA compiler. 346 These questions only arise in the context of dimension expressions on (C) array types. 347 348 TODO: ensure these compiler implementation matters are treated under \CFA compiler background: 349 type unification, 350 cost calculation, 351 GenPoly. 352 353 The relevant technical component of the \CFA compiler is, 354 within the type resolver, the type unification procedure. 355 I added rules for continuing this unification into expressions that occur within types. 356 It is still fundamentally doing \emph{type} unification 357 because it is participating in binding type variables, 358 and not participating in binding any variables that stand in for expression fragments 359 (for there is no such sort of variable in \CFA's analysis.) 360 361 An unfortunate fact about the \CFA compiler's preexisting implementation is that 362 type unification suffers from two forms of duplication. 363 364 The first duplication has (many of) the unification rules stated twice. 365 As a result, my additions for dimension expressions are stated twice. 366 The extra statement of the rules occurs in the GenPoly module, 367 where concrete types like @array(int, 5)@\footnote{ 368 Again, the presentation is simplified 369 by leaving the \lstinline{array} macro unexpanded} 370 are lowered into corresponding C types @struct __conc_array_1234@ (the suffix being a generated index). 371 In this case, the struct's definition gives fields that hardcode the argument values of @float@ and @5@. 372 The next time an @array(-,-)@ concrete instance is encountered, 373 is the previous @struct __conc_array_1234@ suitable for it? 374 Yes, for another occurrance of @array(int, 5)@; 375 no, for either @array(rational(int), 5)@ or @array(int, 42)@. 376 By the last example, this phase must ``reject'' 377 the hypothesis that it should reuse the dimension-5 instance's C-lowering for a dimension-42 instance. 378 379 The second duplication has unification (proper) being invoked at two stages of expression resolution. 380 As a result, my added rule set needs to handle more cases than the preceding discussion motivates. 381 In the program 382 \begin{cfa} 383 void f( double ); 384 forall( T & ) void f( T & ); 385 void g( int n ) { 386 array( float, n + 1 ) x; 387 f(x); 388 } 389 \end{cfa} 390 when resolving the function call, the first unification stage 391 compares the types @T@, of the parameter, with @array( float, n + 1 )@, of the argument. 392 TODO: finish. 393 394 The actual rules for comparing two dimension expressions are conservative. 395 To answer, ``yes, consider this pair of expressions to be matching,'' 396 is to imply, ``all else being equal, allow an array with length calculated by $e_1$ 397 to be passed to a function expecting a length-$e_2$ array.''\footnote{ 398 TODO: Deal with directionality, that I'm doing exact-match, no ``at least as long as,'' no subtyping. 399 Should it be an earlier scoping principle? Feels like it should matter in more places than here.} 400 So, a ``yes'' answer must represent a guarantee that both expressions will evaluate the 401 same result, while a ``no'' can tolerate ``they might, but we're not sure,' 402 provided that practical recourses are available 403 to let programmers express their better knowledge. 404 The specific rule-set that I offer with the current release is, in fact, extremely conservative. 405 I chose to keep things simple, 406 and allow real future needs do drive adding additional complexity, 407 within the framework that I laid out. 408 409 For starters, the original motivating example's rejection 410 is not based on knowledge that 411 the @xp@ length of (the literal) 999 is value-unequal to 412 the (obvious) runtime value of the variable @n@, which is the @x@ length. 413 Rather, the analysis assumes a variable's value can be anything, 414 and so there can be no guarantee that its value is 999. 415 So, a variable use and a literal can never match. 416 417 Two occurrences of the same literal value are obviously a fine match. 418 For two occurrences of the same varialbe, more information is needed. 419 For example, this one is fine 420 \begin{cfa} 421 void f( const int n ) { 422 float x[n]; 423 float (*xp)[n] = x; // accept 424 } 425 \end{cfa} 426 while this one is not: 427 \begin{cfa} 428 void f() { 429 int n = 42; 430 float x[n]; 431 n = 999; 432 float (*xp)[n] = x; // reject 433 } 434 \end{cfa} 435 Furthermore, the fact that the first example sees @n@ as @const@ 436 is not actually a sufficent basis. 437 In this example, @f@'s length expression's declaration is as @const@ as it can be, 438 yet its value still changes between the two invocations: 439 \begin{cfa} 440 // compile unit 1 441 void g(); 442 void f( const int & const nr ) { 443 float x[nr]; 444 g(); 445 float (*xp)[nr] = x; // reject 446 } 447 // compile unit 2 448 static int n = 42; 449 void g() { 450 n = 99; 451 } 452 void f( const int & ); 453 int main () { 454 f(n); 455 return 0; 456 } 457 \end{cfa} 458 The issue in this last case is, 459 just because you aren't able to change something doesn't mean someone else can't. 460 461 My rule set also respects a feature of the C tradition. 462 In spite of the several limitations of the C rules 463 accepting cases that produce different values, there are a few mismatches that C stops. 464 C is quite precise when working with two static values: 465 \begin{cfa} 466 enum { fortytwo = 42 }; 467 float x[fortytwo]; 468 float (*xp1)[42] = &x; // accept 469 float (*xp2)[999] = &x; // reject 470 \end{cfa} 471 My \CFA rules agree with C's on these cases. 472 473 My rules classify expressions into three groups: 474 \begin{description} 475 \item[Statically Evaluable] 476 Expressions for which a specific value can be calculated (conservatively) 477 at compile-time. 478 A preexisting \CFA compiler module defines which expressions qualify, 479 and evaluates them. 480 Includes literals and enumeration values. 481 \item[Dynamic but Stable] 482 The value of a variable declared as @const@. 483 Includes a @const@ parameter. 484 \item[Potentially Unstable] 485 The catch-all category. Notable examples include: 486 any function-call result (@float x[foo()];@), 487 the particular function-call result that is a pointer dereference (@void f(const int * n) { float x[*n]; }@), and 488 any use of a reference-typed variable. 489 \end{description} 490 491 My \CFA rules are: 492 \begin{itemize} 493 \item 494 Accept a Statically Evaluable pair, if both expressions have the same value. 495 Notably, this rule allows a literal to match with an enumeration value, based on the value. 496 \item 497 Accept a Dynamic but Stable pair, if both expressions are written out the same, e.g. refers to same variable declaration. 498 \item 499 Otherwise, reject. 500 Notably, reject all pairs from the Potentially Unstable group. 501 Notably, reject all pairs that cross groups. 502 \end{itemize} 503 504 TODO: summarize the C rules and add the case-comparison table 505 506 TODO: Discuss Recourse 507 508 TODO: Discuss dimension hoisting, which addresses the challenge of extra unification for cost calculation 233 509 234 510 \section{Multidimensional Arrays}
Note: See TracChangeset
for help on using the changeset viewer.