Changeset ad9f593 for doc/theses/mike_brooks_MMath/background.tex
- Timestamp:
- Nov 1, 2024, 4:48:29 PM (12 days ago)
- Branches:
- master
- Children:
- 63a7394
- Parents:
- b7921d8
- File:
-
- 1 edited
doc/theses/mike_brooks_MMath/background.tex
rb7921d8 rad9f593
168 168 The conjoining of pointers and arrays could also be applied to structures, where a pointer references a structure field like an array element.
169 169 Finally, while subscripting involves pointer arithmetic (as does a field reference @x.y.z@), the computation is complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions.
170 Many C errors result from manually performing pointer arithmetic instead of using language subscripting, letting the compiler performs any arithmetic;
171 some C textbooks erroneously suggest manual pointer arithmetic is faster than subscripting.
170 Many C errors result from manually performing pointer arithmetic instead of using language subscripting, letting the compiler perform any arithmetic.
171 172 Some C textbooks erroneously suggest manual pointer arithmetic is faster than subscripting.
172 173 A sound and efficient C program does not require explicit pointer arithmetic.
173 174 C semantics want a programmer to \emph{believe} an array variable is a ``pointer to its first element.''
174 TODO: provide an example, explain the belief, and give modern refutation
175 176 C semantics wants a programmer to \emph{believe} an array variable is a ``pointer to its first element.''
175 177 This desire becomes apparent by a detailed inspection of an array declaration.
176 178 \lstinput{34-34}{bkgd-carray-arrty.c}
177 179 The inspection begins by using @sizeof@ to provide program semantics for the intuition of an expression's type.
180 An architecture with 64-bit pointer size is used, to keep irrelevant details fixed.
178 181 \lstinput{35-36}{bkgd-carray-arrty.c}
179 182 Now consider the @sizeof@ expressions derived from @ar@, modified by adding pointer-to and first-element (and including unnecessary parentheses to avoid any confusion about precedence).
180 183 \lstinput{37-40}{bkgd-carray-arrty.c}
181 184 Given that arrays are contiguous and the size of @float@ is 4, then the size of @ar@ with 10 floats being 40 bytes is common reasoning for C programmers.
182 Equally, C programmers know the size of a pointer to the first array element is 8 (or 4 depending on the addressing architecture).
185 Equally, C programmers know the size of a pointer to the first array element is 8.
183 186 % Now, set aside for a moment the claim that this first assertion is giving information about a type.
184 187 Clearly, an array and a pointer to its first element are different.
185 188 186 In fact, the idea that there is such a thing as a pointer to an array may be surprising and it is not the same thing as a pointer to the first element.
189 In fact, the idea that there is such a thing as a pointer to an array may be surprising.
190 It is not the same thing as a pointer to the first element.
187 191 \lstinput{42-45}{bkgd-carray-arrty.c}
188 192 The first assignment generates:
… …
196 200 Using the type form yields the same results as the prior expression form.
197 201 \lstinput{46-49}{bkgd-carray-arrty.c}
198 The results are also the same when there is no allocation using a pointer-to-array type.
202 The results are also the same when there is no allocation at all.
203 This time, starting from a pointer-to-array type:
199 204 \lstinput{51-57}{bkgd-carray-arrty.c}
200 Hence, in all cases, @sizeof@ is informing about type information.
205 Hence, in all cases, @sizeof@ is reporting on type information.
201 206 202 207 Therefore, thinking of an array as a pointer to its first element is too simplistic an analogue and it is not backed up by the type system.
… …
527 532 \subsection{Array Parameter Declaration}
528 533 529 C has a formal and actual declaration for functions to allow definition-before-use and separate compilation, where formal describes a type and an actual defines the type.
530 \begin{cfa}
531 int foo( int, float, char ); $\C{// formal, parameter names option}$
532 int foo( int i, float f, char c ) { ... } $\C{// actual}$
533 \end{cfa}
534 For array parameters, a formal parameter array declaration can specify the first dimension with a dimension value, @[10]@ (which is ignored), an empty dimension list, @[ ]@, or a pointer, @*@:
534 Passing an array along with a function call is obviously useful.
535 Let us say that a parameter is an array parameter when the called function intends to subscript it.
536 This section asserts that a more satisfactory/formal characterization does not exist in C, surveys the ways that C API authors communicate ``@p@ has zero or more @T@s,'' and calls out the minority cases where the C type system is using or verifying such claims.
537 538 A C function's parameter declarations look different from the caller's and callee's perspectives.
539 Both perspectives consist of the text read by a programmer and the semantics enforced by the type system.
540 The caller's perspective is available from a mere function declaration (which allows definition-before-use and separate compilation), but can also be read from (the non-body part of) a function definition.
541 The callee's perspective is what is available inside the function.
542 \begin{cfa}
543 int foo( int, float, char ); $\C{// declaration, names optional}$
544 int bar( int i, float f, char c ) { $\C{// definition, names mandatory}$
545 $/* caller's perspective of foo's; callee's perspective of bar's */$
546 ...
547 }
548 $/* caller's perspectives of foo's and bar's */$
549 \end{cfa}
550 The caller's perspective is more limited.
551 The example shows, so far, that parameter names (by virtue of being optional) are really comments in the caller's perspective, while they are semantically significant in the callee's perspective.
552 Array parameters introduce a further, subtle, semantic difference and considerable freedom to comment.
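The point that parameter names act as comments in the caller's perspective can be checked with a minimal C sketch. The function @scale_add@, its parameter names, and the helper @demo_names@ are hypothetical, introduced here only for illustration; they do not appear in the thesis sources.

```c
#include <assert.h>

/* Hypothetical function, for illustration only. */
int scale_add( int, int, int );                     /* declaration: names omitted */
int scale_add( int base, int factor, int offset );  /* compatible redeclaration: names are only comments */

/* Definition: the names chosen here are the ones the body must use. */
int scale_add( int x, int y, int z ) {
    return x * y + z;
}

/* Hypothetical check that the three declarations name one function. */
void demo_names( void ) {
    assert( scale_add( 2, 3, 4 ) == 10 );
}
```

All three declarations are compatible, so a conforming compiler should accept them together even though the names differ or are absent; only the names in the definition matter to the body.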
553 554 At the semantic level, there is no such thing as an array parameter, except for one case (@T[static 5]@) discussed shortly.
555 Rather, there are only pointer parameters.
556 This fact probably shares considerable responsibility for the common sense of ``an array is just a pointer,'' which has been refuted in non-parameter contexts.
557 This fact holds in both the caller's and callee's perspectives.
558 However, a parameter's type can include ``array of.''
559 For example, the type ``pointer to array of 5 ints'' (@int(*)[5]@) is a pointer type, a fully meaningful parameter type (in the sense that this description does not contain any information that the type system ignores), and a type that appears the same in the caller's \vs callee's perspectives.
560 The outermost type constructor (syntactically first dimension) is really the one that determines the flavour of parameter.
561 562 \begin{figure}
535 563 \begin{cquote}
536 564 \begin{tabular}{@{}llll@{}}
537 565 \begin{cfa}
538 double sum( double [5] );
539 double sum( double *[5] );
566 float sum( float a[5] );
567 float sum( float a[5][4] );
568 float sum( float a[5][] );
569 float sum( float a[5]* );
570 float sum( float *a[5] );
540 571 \end{cfa}
541 572 &
542 573 \begin{cfa}
543 double sum( double [ ] );
544 double sum( double *[ ] );
574 float sum( float a[] );
575 float sum( float a[][4] );
576 float sum( float a[][] );
577 float sum( float a[]* );
578 float sum( float *a[] );
545 579 \end{cfa}
546 580 &
547 581 \begin{cfa}
548 double sum( double * );
549 double sum( double ** );
582 float sum( float *a );
583 float sum( float (*a)[4] );
584 float sum( float (*a)[] );
585 float sum( float (*a)* );
586 float sum( float **a );
550 587 \end{cfa}
551 588 &
552 589 \begin{cfa}
553 // array
554 // matrix
590 // ar of float
591 // mat of float
592 // invalid
593 // invalid
594 // ar of ptr to float
555 595 \end{cfa}
556 596 \end{tabular}
557 597 \end{cquote}
558 Good practice uses the middle form as it
clearly indicates the parameter is subscripted.
559 However, an actual declaration cannot use @[ ]@;
560 it must use @*@.
561 \begin{cfa}
562 double sum( double v[ ] ) { $\C{// formal declaration}$
563 double * cv; $\C{// actual declaration, think cv[ ]}$
564 sum( cv ); $\C{// address assignment v = cv}$
565 \end{cfa}
566 567 Given the formal dimension forms @[ ]@ or @[5]@, it raises the question of qualifying the implicit array pointer rather than the array element type.
598 \caption{Multiple ways to declare an array parameter. Across a valid row, every declaration is equivalent. Each column gives a declaration style. Really, the style can be read from the first row only. The second row shows how the style extends to multiple dimensions, with the rows thereafter providing context for the choice of which second-row \lstinline{[]} receives the column-style variation.}
599 \label{f:ArParmEquivDecl}
600 \end{figure}
601 602 Yet, C allows array syntax for the outermost type constructor, from which comes the freedom to comment.
603 An array parameter declaration can specify the outermost dimension with a dimension value, @[10]@ (which is ignored), an empty dimension list, @[ ]@, or a pointer, @*@, as seen in \VRef[Figure]{f:ArParmEquivDecl}. The rationale for rejecting the first ``invalid'' row follows shortly, while the second ``invalid'' row is simple nonsense, included to complete the pattern; its syntax hints at what the final row actually achieves.
604 605 In the leftmost style, the typechecker ignores the actual value in most practical cases.
606 This value is allowed to be a dynamic expression, so it is \emph{possible} to use the leftmost style in many practical cases.
607 608 % To help contextualize the matrix part of this example, the syntaxes @float [5][]@, @float [][]@ and @float (*)[]@ are all rejected, for reasons discussed shortly.
609 % So are @float[5]*@, @float[]*@ and @float (*)*@.
These latter ones are simply nonsense, though they hint at ``1d array of pointers'', whose equivalent syntax options are @float *[5]@, @float *[]@, and @float **@.
610 611 It is a matter of taste as to whether a programmer should use a form as far left as possible (getting the most out of syntactically integrated comments), sticking to the right (avoiding false comfort from suggesting the typechecker is checking more than it is), or compromising in the middle (reducing unchecked information, yet clearly stating, ``I will subscript this one'').
612 613 Note that this equivalence of pointer and array declarations is special to parameters.
614 It does not apply to local variables, where true array declarations are possible.
615 \begin{cfa}
616 void f( float * a ) {
617 float * b = a; // ok
618 float c[] = a; // reject
619 float d[] = { 1.0, 2.0, 3.0 }; // ok
620 static_assert( sizeof(b) == sizeof(float*) );
621 static_assert( sizeof(d) != sizeof(float*) );
622 }
623 \end{cfa}
624 This equivalence has the consequence that the type system does not help a caller get it right.
625 \begin{cfa}
626 float sum( float v[] );
627 float arg = 3.14;
628 sum( &arg ); $\C{// accepted, v := \&arg}$
629 \end{cfa}
630 631 The syntactic dimension forms @[ ]@ or @[5]@ raise the question of qualifying the implied array pointer rather than the array element type.
568 632 For example, the qualifiers after the @*@ apply to the array pointer.
569 633 \begin{cfa}
… …
571 635 void foo( const volatile int [ ] @const volatile@ ); // does not parse
572 636 \end{cfa}
573 C addressed this shortcoming by moving the pointer qualifiers into the first dimension.
637 C instead puts these pointer qualifiers syntactically into the first dimension.
574 638 \begin{cquote}
575 639 @[@ \textit{type-qualifier-list}$_{opt}$ \textit{assignment-expression}$_{opt}$ @]@
… …
580 644 \end{cfa}
581 645 582 To make the first formal dimension size meaningful, C adds this form.
646 To make the first dimension size meaningful, C adds this form.
583 647 \begin{cquote}
584 648 @[@ @static@ \textit{type-qualifier-list}$_{opt}$ \textit{assignment-expression} @]@
… …
590 654 \end{cfa}
591 655 Here, the @static@ storage qualifier defines the minimum array size for its argument.
592 @gcc@ ignores this dimension qualifier, \ie it gives no warning if the argument array size is less than the parameter minimum.
593 594 Finally, to handle VLAs, C repurposed the @*@ \emph{within} the dimension in the formal declaration context to mean the argument must be a VLA (contiguous).
656 @gcc@ ignores this dimension qualifier, \ie it gives no warning if the argument array size is less than the parameter minimum. However, @clang@ implements the check, in accordance with the standard. TODO: be specific about versions
657 658 Note that there are now two different meanings for modifiers in the same position. In
659 \begin{cfa}
660 void foo( int x[static const volatile 3] );
661 \end{cfa}
662 the @static@ applies to the 3, while the @const volatile@ applies to the @x@.
663 664 With multidimensional arrays, on dimensions after the first, a size is required and is not ignored.
665 These sizes are required for the callee to be able to subscript.
666 \begin{cfa}
667 void f( float a[][10], float b[][100] ) {
668 static_assert( ((char*)&a[1]) - ((char*)&a[0]) == 10 * sizeof(float) );
669 static_assert( ((char*)&b[1]) - ((char*)&b[0]) == 100 * sizeof(float) );
670 }
671 \end{cfa}
672 Here, the distance between the first and second elements of each array depends on the inner dimension size.
673 674 The last observation is a fact of the callee's perspective.
675 There is little type-system checking, in the caller's perspective, that what is being passed matches.
676 \begin{cfa}
677 void f( float [][10] );
678 int n = 100;
679 float a[100], b[n];
680 f(&a); // reject
681 f(&b); // accept
682 \end{cfa}
683 This size is, therefore, a callee's assumption.
684 685 Finally, to handle higher-dimensional VLAs, C repurposed the @*@ \emph{within} the dimension in a declaration to mean that the callee will have to make an assumption about the size here, but no (unchecked, possibly wrong) information about this assumption is included for the caller-programmer's benefit/overconfidence.
595 686 \begin{cquote}
596 687 @[@ \textit{type-qualifier-list$_{opt}$} @* ]@
597 688 \end{cquote}
598 689 \begin{cfa}
599 void foo( int [@*@][@*@] ); $\C{// formal}$
600 void foo( int ar[10][10] ) { ... } $\C{// actual}$
601 int ar[2][10]; $\C{// contiguous}$
602 foo( ar ); $\C{// valid}$
603 int * arp[10]; $\C{// non-contiguous}$
604 foo( arp ); $\C{// invalid}$
605 \end{cfa}
606 This syntactic form for the formal prototype means the header file does not have to commit to specific dimension values, but the compiler knows the argument is a contiguous array.
690 void foo( float [][@*@] ); $\C{// declaration}$
691 void foo( float a[][10] ) { ... } $\C{// definition}$
692 \end{cfa}
693 Repeating it with the full context of a VLA is useful:
694 \begin{cfa}
695 void foo( int, float [][@*@] ); $\C{// declaration}$
696 void foo( int n, float a[][n] ) { ... } $\C{// definition}$
697 \end{cfa}
698 Omitting the dimension from the declaration is consistent with omitting parameter names, for the declaration case has no name @n@ in scope.
699 The omission is also redacting all information not needed to generate correct caller-side code.
607 700 608 701
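The role of the inner dimension as a callee-side assumption, and the @[*]@ declaration form, can be exercised with a short C sketch. All names here (@row_stride_10@, @row_stride_100@, @row_stride_n@, @demo_strides@) are hypothetical and used only for illustration. The sketch assumes a C99-or-later compiler that supports variably modified parameter types, and it uses run-time @assert@ rather than @static_assert@ because the pointer differences are not constant expressions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, for illustration only. */

/* Byte distance between rows 0 and 1 when the inner dimension is 10. */
size_t row_stride_10( float a[][10] ) {
    return (size_t)( (char *)&a[1] - (char *)&a[0] );
}

/* Same computation, but the callee assumes an inner dimension of 100. */
size_t row_stride_100( float b[][100] ) {
    return (size_t)( (char *)&b[1] - (char *)&b[0] );
}

/* VLA form: the declaration redacts the inner dimension with [*] ... */
size_t row_stride_n( int, float [][*] );
/* ... and the definition names it, so the stride is computed at run time. */
size_t row_stride_n( int n, float a[][n] ) {
    return (size_t)( (char *)&a[1] - (char *)&a[0] );
}

/* Hypothetical driver: each stride follows the callee's inner dimension. */
void demo_strides( void ) {
    float m10[2][10], m100[2][100], m7[2][7];
    assert( row_stride_10( m10 ) == 10 * sizeof(float) );
    assert( row_stride_100( m100 ) == 100 * sizeof(float) );
    assert( row_stride_n( 7, m7 ) == 7 * sizeof(float) );
}
```

In each helper the stride comes entirely from the parameter's declared inner dimension, which is why a caller that passes a mismatched array would silently get the wrong row addresses.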