Changeset 35fc819 for doc/theses/mike_brooks_MMath/array.tex
- Timestamp:
- Dec 13, 2025, 4:56:22 PM (7 days ago)
- Branches:
- master
- Children:
- 5d300ba
- Parents:
- 67748f9
- File:
-
- 1 edited
-
doc/theses/mike_brooks_MMath/array.tex (modified) (34 diffs)
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/array.tex
r67748f9 r35fc819 2 2 \label{c:Array} 3 3 4 Arrays in C are possibly the single most misunderstood and incorrectly used feature in the language , resulting in the largest proportion of runtime errors and security violations.4 Arrays in C are possibly the single most misunderstood and incorrectly used feature in the language \see{\VRef{s:Array}}, resulting in the largest proportion of runtime errors and security violations. 5 5 This chapter describes the new \CFA language and library features that introduce a length-checked array type, @array@, to the \CFA standard library~\cite{Cforall}. 6 6 … … 13 13 \label{s:ArrayIntro} 14 14 15 The new \CFA array is declared by instantiating the generic @array@ type, 16 much like instantiating any other standard-library generic type (such as \CC @vector@), 17 though using a new style of generic parameter. 15 The new \CFA array is declared by instantiating the generic @array@ type, much like instantiating any other standard-library generic type (such as \CC @vector@), using a new style of generic parameter. 18 16 \begin{cfa} 19 17 @array( float, 99 )@ x; $\C[2.5in]{// x contains 99 floats}$ … … 24 22 void f( @array( float, 42 )@ & p ) {} $\C{// p accepts 42 floats}$ 25 23 f( x ); $\C{// statically rejected: type lengths are different, 99 != 42}$ 24 26 25 test2.cfa:3:1 error: Invalid application of existing declaration(s) in expression. 27 26 Applying untyped: Name: f ... to: Name: x … … 37 36 g( x, 0 ); $\C{// T is float, N is 99, dynamic subscript check succeeds}$ 38 37 g( x, 1000 ); $\C{// T is float, N is 99, dynamic subscript check fails}$ 38 39 39 Cforall Runtime error: subscript 1000 exceeds dimension range [0,99) $for$ array 0x555555558020. 40 40 \end{cfa} … … 110 110 forall( T & | sized(T) ) 111 111 T * alloc() { 112 return @(T *)@malloc( @sizeof(T)@ ); 112 return @(T *)@malloc( @sizeof(T)@ ); // C malloc 113 113 } 114 114 \end{cfa} … … 132 132 The loops follow the familiar pattern of using the variable @dim@ to iterate through the arrays. 133 133 Most importantly, the type system implicitly captures @dim@ at the call of @f@ and makes it available throughout @f@ as @N@. 134 The example shows @dim@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@ , @N@ adapting in the same way at @f@'s loop bound,and a pass-thru use of @dim@ at @f@'s declaration of @ret@.134 The example shows @dim@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@; @N@ adapting in the same way at @f@'s loop bound; and a pass-thru use of @dim@ at @f@'s declaration of @ret@. 135 135 Except for the lifetime-management issue of @result@, \ie explicit @free@, this program has eliminated both the syntactic and semantic problems associated with C arrays and their usage. 136 136 The result is a significant improvement in safety and usability. … … 148 148 \end{itemize} 149 149 150 \VRef[Figure]{f:TemplateVsGenericType} shows @N@ is not the same as a @size_t@ declaration in a \CC \lstinline[language=C++]{template}. 150 \VRef[Figure]{f:TemplateVsGenericType} shows @N@ is not the same as a @size_t@ declaration in a \CC \lstinline[language=C++]{template}.\footnote{ 151 The \CFA program requires a snapshot declaration for \lstinline{n} to compile, as described at the end of \Vref{s:ArrayTypingC}.} 151 152 \begin{enumerate}[leftmargin=*] 152 153 \item 153 154 The \CC template @N@ can only be a compile-time value, while the \CFA @N@ may be a runtime value. 154 155 \item 155 \CC does not allow a template function to be nested, while \CFA lets its polymorphic functions tobe nested.156 \CC does not allow a template function to be nested, while \CFA allows polymorphic functions be nested. 156 157 Hence, \CC precludes a simple form of information hiding. 157 158 \item … … 176 177 % mycode/arrr/thesis-examples/check-peter/cs-cpp.cpp, v10 177 178 \end{enumerate} 178 The \CC template @array@ type mitigates points \VRef[]{p:DimensionPassing} and \VRef[]{p:ArrayCopy}, but it is also trying to accomplish a similar mechanism to \CFA @array@. 179 The \CC template @std::array@ tries to accomplish a similar mechanism to \CFA @array@. 180 It is an aggregate type with the same semantics as a @struct@ holding a C-style array \see{\VRef{s:ArraysCouldbeValues}}, which mitigates points \VRef[]{p:DimensionPassing} and \VRef[]{p:ArrayCopy}. 179 181 180 182 \begin{figure} 181 \begin{tabular}{@{}l @{\hspace{20pt}}l@{}}183 \begin{tabular}{@{}ll@{}} 182 184 \begin{c++} 183 185 … … 187 189 } 188 190 int main() { 189 190 int ret[ 10], x[10];191 for ( int i = 0; i < 10; i += 1 ) x[i] = i;192 @copy<int, 10>( ret, x );@193 for ( int i = 0; i < 10; i += 1 )191 const size_t n = 10; // must be constant 192 int ret[n], x[n]; 193 for ( int i = 0; i < n; i += 1 ) x[i] = i; 194 @copy<int, n >( ret, x );@ 195 for ( int i = 0; i < n; i += 1 ) 194 196 cout << ret[i] << ' '; 195 197 cout << endl; … … 203 205 for ( i; N ) ret[i] = x[i]; 204 206 } 205 206 const int n = promptForLength();207 size_t n; 208 sin | n; 207 209 array( int, n ) ret, x; 208 210 for ( i; n ) x[i] = i; … … 228 230 When the argument lengths themselves are statically unknown, 229 231 the static check is conservative and, as always, \CFA's casting lets the programmer use knowledge not shared with the type system. 230 \begin{tabular}{@{\hspace{0.5in}}l@{\hspace{1in}}l@{}} 231 \lstinput{90-97}{hello-array.cfa} 232 & 233 \lstinput{110-117}{hello-array.cfa} 234 \end{tabular} 235 236 \noindent 237 This static check's full rules are presented in \VRef[Section]{s:ArrayTypingC}. 232 \lstinput{90-96}{hello-array.cfa} 233 This static check's rules are presented in \VRef[Section]{s:ArrayTypingC}. 238 234 239 235 Orthogonally, the \CFA array type works within generic \emph{types}, \ie @forall@-on-@struct@. 240 236 The same argument safety and the associated implicit communication of array length occurs. 241 237 Preexisting \CFA allowed aggregate types to be generalized with type parameters, enabling parameterizing of element types. 242 This has beenextended to allow parameterizing by dimension.243 Doing so gives a refinement of C's ``flexible array member''~\cite[\S~6.7.2.1.18]{C11} .238 This feature is extended to allow parameterizing by dimension. 239 Doing so gives a refinement of C's ``flexible array member''~\cite[\S~6.7.2.1.18]{C11}: 244 240 \begin{cfa} 245 241 struct S { … … 264 260 \end{cfa} 265 261 This ability to avoid casting and size arithmetic improves safety and usability over C flexible array members. 266 Finally, inputs and outputs are given at the bottomfor different sized schools.262 Finally, inputs and outputs are given on the right for different sized schools. 267 263 The example program prints the courses in each student's preferred order, all using the looked-up display names. 268 264 269 265 \begin{figure} 270 \begin{cquote} 271 \lstinput{50-55}{hello-accordion.cfa} 266 \begin{lrbox}{\myboxA} 267 \begin{tabular}{@{}l@{}} 268 \lstinput{50-55}{hello-accordion.cfa} \\ 272 269 \lstinput{90-98}{hello-accordion.cfa} 273 \ \\ 274 @$ cat school1@ 275 \lstinput{}{school1} 276 277 @$ ./a.out < school1@ 278 \lstinput{}{school1.out} 279 280 @$ cat school2@ 281 \lstinput{}{school2} 282 283 @$ ./a.out < school2@ 284 \lstinput{}{school2.out} 285 \end{cquote} 270 \end{tabular} 271 \end{lrbox} 272 273 \begin{lrbox}{\myboxB} 274 \begin{tabular}{@{}l@{}} 275 @$ cat school1@ \\ 276 \lstinputlisting{school1} \\ 277 @$ ./a.out < school1@ \\ 278 \lstinputlisting{school1.out} \\ 279 @$ cat school2@ \\ 280 \lstinputlisting{school2} \\ 281 @$ ./a.out < school2@ \\ 282 \lstinputlisting{school2.out} 283 \end{tabular} 284 \end{lrbox} 285 286 \setlength{\tabcolsep}{10pt} 287 \begin{tabular}{@{}ll@{}} 288 \usebox\myboxA 289 & 290 \usebox\myboxB 291 \end{tabular} 286 292 287 293 \caption{\lstinline{School} Example, Input and Output} … … 290 296 291 297 When a function operates on a @School@ structure, the type system handles its memory layout transparently. 292 \lstinput{30-3 7}{hello-accordion.cfa}298 \lstinput{30-36}{hello-accordion.cfa} 293 299 In the example, function @getPref@ returns, for the student at position @is@, what is the position of their @pref@\textsuperscript{th}-favoured class? 294 300 … … 296 302 \section{Dimension Parameter Implementation} 297 303 298 The core of the preexisting \CFA compiler already ha dthe ``heavy equipment'' needed to provide the feature set just reviewed (up to bugs in cases not yet exercised).304 The core of the preexisting \CFA compiler already has the ``heavy equipment'' needed to provide the feature set just reviewed (up to bugs in cases not yet exercised). 299 305 To apply this equipment in tracking array lengths, I encoded a dimension (array's length) as a type. 300 306 The type in question does not describe any data that the program actually uses at runtime. … … 323 329 \begin{itemize}[leftmargin=*] 324 330 \item 325 Resolver provided values for a useddeclaration's type-system variables, gathered from type information in scope at the usage site.326 \item 327 The box pass, encoding information about type parameters into ``extra'' regular parameters /arguments on declarations and calls.331 Resolver provided values for a declaration's type-system variables, gathered from type information in scope at the usage site. 332 \item 333 The box pass, encoding information about type parameters into ``extra'' regular parameters and arguments on declarations and calls. 328 334 Notably, it conveys the size of a type @foo@ as a @__sizeof_foo@ parameter, and rewrites the @sizeof(foo)@ expression as @__sizeof_foo@, \ie a use of the parameter. 329 335 \end{itemize} … … 331 337 The rules for resolution had to be restricted slightly, in order to achieve important refusal cases. 332 338 This work is detailed in \VRef[Section]{s:ArrayTypingC}. 333 However, the resolution--boxing scheme, in its preexisting state, was alreadyequipped to work on (desugared) dimension parameters.339 However, the resolution--boxing scheme, in its preexisting state, is equipped to work on (desugared) dimension parameters. 334 340 The following discussion explains the desugaring and how correctly lowered code results. 335 341 … … 357 363 \end{enumerate} 358 364 The chosen solution is to encode the value @N@ \emph{as a type}, so items 1 and 2 are immediately available for free. 359 Item 3 needs a way to recover the encoded value from a (valid) type (and to reject invalid types occurring here).365 Item 3 needs a way to recover the encoded value from a (valid) type and to reject invalid types. 360 366 Item 4 needs a way to produce a type that encodes the given value. 361 367 … … 416 422 The type @thing(N)@ is (replaced by @void *@, but thereby effectively) gone. 417 423 \item 418 The @sout...@ expression (being an application of the @?|?@ operator)has a regular variable (parameter) usage for its second argument.424 The @sout...@ expression has a regular variable (parameter) usage for its second argument. 419 425 \item 420 426 Information about the particular @thing@ instantiation (value 10) is moved, from the type, to a regular function-call argument. … … 455 461 \begin{cfa} 456 462 enum { n = 42 }; 457 float x[@n@]; // or just 42458 float (*xp1)[@42@] = &x; // accept459 float (*xp2)[@999@] = &x; // reject463 float x[@n@]; $\C{// or just 42}$ 464 float (*xp1)[@42@] = &x; $\C{// accept}$ 465 float (*xp2)[@999@] = &x; $\C{// reject}$ 460 466 warning: initialization of 'float (*)[999]' from incompatible pointer type 'float (*)[42]' 461 467 \end{cfa} 462 468 When a variable is involved, C and \CFA take two different approaches. 463 Today's C compilers accept the following without warning.469 Today's C compilers accept the following without a warning. 464 470 \begin{cfa} 465 471 static const int n = 42; … … 482 488 The way the \CFA array is implemented, the type analysis for this case reduces to a case similar to the earlier C version. 483 489 The \CFA compiler's compatibility analysis proceeds as: 484 \begin{itemize}[ parsep=0pt]490 \begin{itemize}[leftmargin=*,parsep=0pt] 485 491 \item 486 492 Is @array( float, 999 )@ type-compatible with @array( float, n )@? … … 510 516 in order to preserve the length information that powers runtime bound-checking.} 511 517 Therefore, the need to upgrade legacy C code is low. 512 Finally, if this incompatibility is a problem onboarding C programs to \CFA, it isshould be possible to change the C type check to a warning rather than an error, acting as a \emph{lint} of the original code for a missing type annotation.518 Finally, if this incompatibility is a problem onboarding C programs to \CFA, it should be possible to change the C type check to a warning rather than an error, acting as a \emph{lint} of the original code for a missing type annotation. 513 519 514 520 To handle two occurrences of the same variable, more information is needed, \eg, this is fine, … … 516 522 int n = 42; 517 523 float x[@n@]; 518 float (*xp)[@n@] = x; // accept524 float (*xp)[@n@] = x; $\C{// accept}$ 519 525 \end{cfa} 520 526 where @n@ remains fixed across a contiguous declaration context. 521 However, intervening dynamic statement cause failures.527 However, intervening dynamic statements can cause failures. 522 528 \begin{cfa} 523 529 int n = 42; 524 530 float x[@n@]; 525 @n@ = 999; // dynamic change526 float (*xp)[@n@] = x; // reject527 \end{cfa} 528 However, side-effects can occur in a contiguous declaration context.531 @n@ = 999; $\C{// dynamic change}$ 532 float (*xp)[@n@] = x; $\C{// reject}$ 533 \end{cfa} 534 As well, side-effects can even occur in a contiguous declaration context. 529 535 \begin{cquote} 530 536 \setlength{\tabcolsep}{20pt} … … 536 542 void f() { 537 543 float x[@n@] = { g() }; 538 float (*xp)[@n@] = x; // reject544 float (*xp)[@n@] = x; // reject 539 545 } 540 546 \end{cfa} … … 544 550 int @n@ = 42; 545 551 void g() { 546 @n@ = 99 ;552 @n@ = 999; // accept 547 553 } 548 554 … … 553 559 The issue here is that knowledge needed to make a correct decision is hidden by separate compilation. 554 560 Even within a translation unit, static analysis might not be able to provide all the information. 555 However, if the example uses @const@, the check is possible .561 However, if the example uses @const@, the check is possible even though the value is unknown. 556 562 \begin{cquote} 557 563 \setlength{\tabcolsep}{20pt} … … 563 569 void f() { 564 570 float x[n] = { g() }; 565 float (*xp)[n] = x; // reject571 float (*xp)[n] = x; // accept 566 572 } 567 573 \end{cfa} … … 571 577 @const@ int n = 42; 572 578 void g() { 573 @n = 99 @; // allowed579 @n = 999@; // reject 574 580 } 575 581 … … 749 755 \end{comment} 750 756 751 The conservatism of the new rule set can leave a programmer needing a recourse, when needing to use a dimension expression whose stability argument is more subtle than current-state analysis.757 The conservatism of the new rule set can leave a programmer requiring a recourse, when needing to use a dimension expression whose stability argument is more subtle than current-state analysis. 752 758 This recourse is to declare an explicit constant for the dimension value. 753 759 Consider these two dimension expressions, whose uses are rejected by the blunt current-state rules: … … 755 761 void f( int @&@ nr, @const@ int nv ) { 756 762 float x[@nr@]; 757 float (*xp)[@nr@] = &x; // reject: nr varying (no references)763 float (*xp)[@nr@] = &x; // reject: nr varying (no references) 758 764 float y[@nv + 1@]; 759 float (*yp)[@nv + 1@] = &y; // reject: ?+? unpredictable (no functions)765 float (*yp)[@nv + 1@] = &y; // reject: ?+? unpredictable (no functions) 760 766 } 761 767 \end{cfa} 762 768 Yet, both dimension expressions are reused safely. 763 The @nr@ reference is never written, no t volatile meaning noimplicit code (load) between declarations, and control does not leave the function between the uses.769 The @nr@ reference is never written, no implicit code (load) between declarations, and control does not leave the function between the uses. 764 770 As well, the build-in @?+?@ function is predictable. 765 771 To make these cases work, the programmer must add the follow constant declarations (cast does not work): … … 768 774 @const int nx@ = nr; 769 775 float x[nx]; 770 float (*xp)[nx] = & x; // accept776 float (*xp)[nx] = & x; // accept 771 777 @const int ny@ = nv + 1; 772 778 float y[ny]; 773 float (*yp)[ny] = & y; // accept779 float (*yp)[ny] = & y; // accept 774 780 } 775 781 \end{cfa} … … 808 814 \end{cfa} 809 815 Dimension hoisting already existed in the \CFA compiler. 810 But itswas buggy, particularly with determining, ``Can hoisting the expression be skipped here?'', for skipping this hoisting is clearly desirable in some cases.816 However, it was buggy, particularly with determining, ``Can hoisting the expression be skipped here?'', for skipping this hoisting is clearly desirable in some cases. 811 817 For example, when a programmer has already hoisted to perform an optimization to prelude duplicate code (expression) and/or expression evaluation. 812 818 In the new implementation, these cases are correct, harmonized with the accept/reject criteria. … … 820 826 \item 821 827 Flexible-stride memory: 822 this model has complete independence between subscript ing ordering and memory layout, offering the ability to slice by (provide an index for) any dimension, \eg slice a plane, row, or column, \eg @c[3][*][*]@, @c[3][4][*]@, @c[3][*][5]@.828 this model has complete independence between subscript ordering and memory layout, offering the ability to slice by (provide an index for) any dimension, \eg slice a row, column, or plane, \eg @c[3][4][*]@, @c[3][*][5]@, @c[3][*][*]@. 823 829 \item 824 830 Fixed-stride memory: … … 839 845 Style 3 is the inevitable target of any array implementation. 840 846 The hardware offers this model to the C compiler, with bytes as the unit of displacement. 841 C offers this model to its programmeras pointer arithmetic, with arbitrary sizes as the unit.847 C offers this model to programmers as pointer arithmetic, with arbitrary sizes as the unit. 842 848 Casting a multidimensional array as a single-dimensional array/pointer, then using @x[i]@ syntax to access its elements, is still a form of pointer arithmetic. 843 849 844 Now stepping into the implementation of \CFA's new type-1 multidimensional arrays in terms of C's existing type-2 multidimensional arrays, it helps to clarify that even the interface is quite low-level. 845 A C/\CFA array interface includes the resulting memory layout. 846 The defining requirement of a type-2 system is the ability to slice a column from a column-finest matrix. 847 The required memory shape of such a slice is fixed, before any discussion of implementation. 848 The implementation presented here is how the \CFA array-library wrangles the C type system, to make it do memory steps that are consistent with this layout while not affecting legacy C programs. 850 To step into the implementation of \CFA's new type-1 multidimensional arrays in terms of C's existing type-2 multidimensional arrays, it helps to clarify that the interface is low-level, \ie a C/\CFA array interface includes the resulting memory layout. 851 Specifically, the defining requirement of a type-2 system is the ability to slice a column from a column-finest matrix. 852 Hence, the required memory shape of such a slice is fixed, before any discussion of implementation. 853 The implementation presented here is how the \CFA array-library wrangles the C type system to make it do memory steps that are consistent with this layout while not affecting legacy C programs. 849 854 % TODO: do I have/need a presentation of just this layout, just the semantics of -[all]? 850 855 … … 874 879 \lstinput[aboveskip=0pt]{145-145}{hello-md.cfa} 875 880 The nontrivial slicing in this example now allows passing a \emph{noncontiguous} slice to @print1d@, where the new-array library provides a ``subscript by all'' operation for this purpose. 876 In a multi-dimensional subscript operation, any dimension given as @all@ is a placeholder, \ie ``not yet subscripted by a value'', waiting for such a value,implementing the @ar@ trait.881 In a multi-dimensional subscript operation, any dimension given as @all@ is a placeholder, \ie ``not yet subscripted by a value'', waiting for a value implementing the @ar@ trait. 877 882 \lstinput{150-151}{hello-md.cfa} 878 883
Note:
See TracChangeset
for help on using the changeset viewer.